Data Science | Asked by nsalas on April 27, 2021
The authors of Hindsight Experience Replay list several strategies for sampling a set of additional goals $G$ in Section 4.5: *final* (the goal corresponding to the final state of the episode), *future* (replay with $k$ random states which come from the same episode as the transition being replayed and were observed after it), *episode* (replay with $k$ random states from the same episode as the transition being replayed), and *random* (replay with $k$ random states encountered so far in the whole training procedure).
My interpretation of the *future* strategy is that we can only select the $k$ random states if the current transition being replayed has already occurred earlier in the episode, i.e. this would be at least the second time we have seen this exact transition. That seems very unlikely in an environment with a large state space (especially with continuous features). Am I missing something obvious in how this strategy is meant to be implemented?
I was stuck on this for quite a while until I read the implementation of the algorithm in the OpenAI Baselines library, which you can find on GitHub. From what I understood, the $k$ factor mainly controls the proportion of sampled transitions (being replayed for the Q-network update step) that are relabelled with HER: for every transition sampled without HER, we sample another $k$ with HER. This can be implemented in the prescribed fashion by sampling a batch of transitions and relabelling a fraction $k/(k+1)$ of them with HER, simply replacing their goals with states observed later in the same episode. So "replayed" does not mean the transition has been seen before; it just means the stored transition is sampled from the replay buffer and its goal is swapped out.
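As a rough illustration (a minimal sketch, not the Baselines code itself; the transition layout and the function name `sample_her_batch` are hypothetical), the *future* strategy could look like this:

```python
import numpy as np

def sample_her_batch(episode, batch_size, k=4, rng=None):
    """Sketch of HER 'future' goal relabelling for a single stored episode.

    `episode` is assumed to be a list of dicts with keys
    'state', 'action', 'next_state', 'goal' (a hypothetical layout,
    not the actual Baselines API).
    """
    rng = rng or np.random.default_rng()
    T = len(episode)
    her_prob = k / (k + 1)  # fraction of the batch relabelled with HER
    batch = []
    for _ in range(batch_size):
        t = int(rng.integers(0, T))    # random timestep in the episode
        transition = dict(episode[t])  # shallow copy so we can relabel safely
        if t < T - 1 and rng.random() < her_prob:
            # 'future' strategy: the new goal is a state achieved at a
            # random later timestep of the *same* episode.
            future_t = int(rng.integers(t + 1, T))
            transition['goal'] = episode[future_t]['next_state']
            # a full implementation would also recompute the reward
            # against the relabelled goal here
        batch.append(transition)
    return batch
```

With the paper's default $k = 4$, this relabels roughly 80% of each batch, matching the $k/(k+1)$ ratio described above.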
Answered by Karim Ibrahim Yehia on April 27, 2021