Prioritised Remembering in Experience Replay (Q-Learning)

Artificial Intelligence: asked by conscious_process on August 24, 2021

I’m using experience replay based on the original Prioritized Experience Replay (PER) paper. In that paper, the authors report roughly an order-of-magnitude increase in data efficiency from prioritized sampling. There is still room for improvement, though, because PER stores every experience regardless of its importance.
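
For reference, the proportional variant of PER samples transition i with probability P(i) = p_i^α / Σ_k p_k^α, where p_i = |δ_i| + ε is its absolute TD error plus a small constant. It then corrects the resulting bias with importance-sampling weights w_i = (N·P(i))^(−β), normalized by the largest weight. The Python sketch below computes this directly with a cumulative sum, which costs O(N) per call; the paper uses a sum-tree to bring sampling and priority updates down to O(log N). The function name `per_sample` and the default α, β and ε values are illustrative assumptions, not taken from the question.

```python
import bisect
import itertools
import random


def per_sample(td_errors, batch_size, alpha=0.6, beta=0.4, eps=1e-6, rng=random):
    """Proportional prioritized sampling with importance-sampling weights.

    p_i = (|delta_i| + eps) ** alpha is the scaled priority of transition i,
    P(i) = p_i / sum_k p_k, and w_i = (N * P(i)) ** -beta, divided by the
    largest possible weight so every w_i lies in (0, 1].
    This linear-time version is for clarity; a sum-tree makes it O(log N).
    """
    scaled = [(abs(d) + eps) ** alpha for d in td_errors]
    cumulative = list(itertools.accumulate(scaled))
    total = cumulative[-1]
    n = len(scaled)

    indices = []
    for _ in range(batch_size):
        u = rng.random() * total  # uniform point on [0, total)
        # first index whose running sum exceeds u; min() guards float round-off
        indices.append(min(bisect.bisect_right(cumulative, u), n - 1))

    # The largest weight belongs to the lowest-priority transition.
    max_weight = (n * min(scaled) / total) ** -beta
    weights = [(n * scaled[i] / total) ** -beta / max_weight for i in indices]
    return indices, weights


# Example: the transition with TD error 10.0 dominates the batch.
indices, weights = per_sample([0.0, 0.5, 10.0], batch_size=8, rng=random.Random(0))
```

High-priority transitions are drawn more often but receive smaller weights, so their larger number of updates does not bias the expected gradient.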

I’d like to extend PER so that it remembers selectively, using some metric that decides whether an experience is worth storing at all. The time spent sampling and re-adjusting priorities grows with the number of stored experiences, so being selective about what to remember should at the very least speed up replay, and hopefully also improve data efficiency.
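
One way to make selective remembering concrete is an admission rule in front of the buffer. The sketch below assumes, purely for illustration, that the metric is the absolute TD error at insertion time: transitions below a `threshold` are never stored, and once the memory is full a new transition only displaces the lowest-priority one if it is more important. The class `SelectiveReplayBuffer` and both rules are hypothetical and are not part of PER.

```python
import heapq
import itertools


class SelectiveReplayBuffer:
    """Hypothetical replay memory that only keeps 'surprising' transitions.

    Admission rule (assumption): store a transition only if |td_error| >= threshold.
    Eviction rule (assumption): when full, replace the lowest-priority entry,
    but only if the newcomer has a higher priority.
    """

    def __init__(self, capacity, threshold):
        self.capacity = capacity
        self.threshold = threshold
        self._heap = []  # min-heap of (priority, tie_breaker, transition)
        self._counter = itertools.count()  # unique tie-breaker, so transitions are never compared

    def __len__(self):
        return len(self._heap)

    def add(self, transition, td_error):
        """Try to remember a transition; return True if it was stored."""
        priority = abs(td_error)
        if priority < self.threshold:
            return False  # judged not worth remembering
        entry = (priority, next(self._counter), transition)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
            return True
        if priority > self._heap[0][0]:
            heapq.heapreplace(self._heap, entry)  # evict the least important memory
            return True
        return False

    def transitions(self):
        """Return the currently stored transitions (in heap order)."""
        return [t for _, _, t in self._heap]
```

A caveat this sketch ignores: stored priorities go stale as the Q-function changes. PER refreshes priorities after each replay, so a real implementation would need to update heap entries (for example with lazy deletion) or periodically re-score the buffer, which reintroduces some of the cost that selective remembering is meant to save.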

Important design constraints for this remembering metric:

Compatibility with Q-learning methods such as DQN
Low computation time: the metric should speed up learning overall, not merely trade one kind of computation for another
Simplicity
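
On the compatibility and computation-time constraints: one-step Q-learning already computes a TD error δ = r + γ max_a' Q(s', a') − Q(s, a) for every update, and this is also the quantity PER uses as its priority, so a metric built on δ adds almost nothing to the cost of an update. The tabular sketch below stands in for a DQN's output layer and is an assumption for illustration: `td_error` and `q_update` are hypothetical helpers, and in DQN the bootstrap term would come from the target network.

```python
def td_error(q, s, a, r, s_next, done, gamma=0.99):
    """One-step Q-learning TD error: r + gamma * max_a' Q(s', a') - Q(s, a).

    `q` maps each state to a list of action values (a tabular stand-in for a
    DQN's outputs). For terminal transitions the bootstrap term is dropped.
    """
    bootstrap = 0.0 if done else gamma * max(q[s_next])
    return r + bootstrap - q[s][a]


def q_update(q, s, a, r, s_next, done, lr=0.1, gamma=0.99):
    """Apply a tabular Q-learning step and return the TD error it used.

    The returned value can double as a priority or remembering score,
    so the metric costs nothing beyond the update itself.
    """
    delta = td_error(q, s, a, r, s_next, done, gamma)
    q[s][a] += lr * delta
    return delta


# Example: one update on a two-state, two-action table.
q = {"s0": [0.0, 1.0], "s1": [2.0, 0.5]}
delta = q_update(q, "s0", 0, 1.0, "s1", False, lr=0.5, gamma=0.9)
```

For a freshly collected transition that has not yet been used in an update, δ would still cost one extra forward pass; PER avoids this by giving new transitions the maximum priority seen so far.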

My questions:

What considerations would you take into account when designing such a metric?
Do you know of any articles that address prioritized memorization of experiences for Q-learning?
