Data Science | Asked by nsalas on April 27, 2021
The authors of Hindsight Experience Replay list several strategies for sampling a set of additional goals $G$ in Section 4.5: *final* (the goal corresponding to the final state of the episode), *future* (replay with $k$ random states which come from the same episode as the transition being replayed and were observed after it), *episode* (replay with $k$ random states from the same episode as the transition being replayed), and *random* (replay with $k$ random states encountered so far in the whole training procedure).
My interpretation of the *future* strategy is that we can only select the $k$ random states if the current transition being replayed has already occurred earlier in the episode, i.e. this would be at least the second time we have seen this exact transition. That seems very unlikely in an environment with a large state space (especially with continuous features). Am I missing something obvious in how this strategy is meant to be implemented?
I was stuck on this for quite a while until I read the implementation of the algorithm in the OpenAI Baselines library, which you can find on GitHub. From what I understood, the $k$ factor mainly controls the proportion of sampled transitions (being replayed for the Q-network update step) that are relabelled with HER: for every transition sampled without HER, we sample another $k$ with HER. This can be implemented in the prescribed fashion by sampling a batch of transitions and relabelling a fraction $k/(k+1)$ of them with HER, simply replacing their goals with states observed later in the same episode. So "replayed" does not mean the transition has been seen before; it just means the stored transition is sampled from the replay buffer and its goal is swapped out.
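As a rough illustration (a minimal sketch, not the Baselines code itself; the transition layout and the function name `sample_her_batch` are hypothetical), the *future* strategy could look like this:

```python
import numpy as np

def sample_her_batch(episode, batch_size, k=4, rng=None):
    """Sketch of HER 'future' goal relabelling for a single stored episode.

    `episode` is assumed to be a list of dicts with keys
    'state', 'action', 'next_state', 'goal' (a hypothetical layout,
    not the actual Baselines API).
    """
    rng = rng or np.random.default_rng()
    T = len(episode)
    her_prob = k / (k + 1)  # fraction of the batch relabelled with HER
    batch = []
    for _ in range(batch_size):
        t = int(rng.integers(0, T))    # random timestep in the episode
        transition = dict(episode[t])  # shallow copy so we can relabel safely
        if t < T - 1 and rng.random() < her_prob:
            # 'future' strategy: the new goal is a state achieved at a
            # random later timestep of the *same* episode.
            future_t = int(rng.integers(t + 1, T))
            transition['goal'] = episode[future_t]['next_state']
            # a full implementation would also recompute the reward
            # against the relabelled goal here
        batch.append(transition)
    return batch
```

With the paper's default $k = 4$, this relabels roughly 80% of each batch, matching the $k/(k+1)$ ratio described above.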
Answered by Karim Ibrahim Yehia on April 27, 2021