Guidelines to debug REINFORCE-type algorithms?

Data Science Asked by Astariul on November 28, 2020

I implemented a self-critical policy gradient (as described here), for text summarization.
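
As a point of reference, the loss for one sampled sequence can be sketched as below. This assumes the usual self-critical formulation, where the reward of the greedily decoded sequence serves as the baseline; the function and argument names are illustrative and not taken from the linked paper.

```python
def self_critical_loss(sample_log_probs, sample_reward, greedy_reward):
    """Loss for one sampled sequence under a self-critical baseline.

    sample_log_probs: per-token log-probabilities of the sampled sequence.
    sample_reward: reward (e.g. ROUGE) of the sampled sequence.
    greedy_reward: reward of the greedily decoded sequence (the baseline).
    """
    advantage = sample_reward - greedy_reward
    # Minimizing this loss raises the log-probability of samples that beat
    # the greedy baseline and lowers it for samples that do worse.
    return -advantage * sum(sample_log_probs)


# Hypothetical numbers: the sample scores 0.6, greedy decoding scores 0.4.
loss = self_critical_loss([-0.5, -1.0], sample_reward=0.6, greedy_reward=0.4)
```

A sign error in the advantage, or a baseline that is accidentally part of the gradient computation, is a common reason for the reward to fall instead of rise, so this term is worth checking first.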

However, after training, the results are worse than expected: they are actually lower than without RL.

I’m looking for general guidelines on how to debug RL-based algorithms.

I tried:

  • Overfitting on a small dataset (~6 samples): I could increase the average reward, but training did not converge. Sometimes the average reward would go back down.
  • Changing the learning rate: I varied the learning rate and observed its effect on the small dataset. Based on these experiments I chose a fairly large learning rate (0.02, versus 1e-4 in the paper).
  • Watching how the average reward evolves during training on the full dataset: the average reward does not move significantly.

One Answer

The only resource I could find so far :

In my specific case, I made a few mistakes:

  • I froze part of the network that should not have been frozen
  • I used the wrong learning rate
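
The first mistake can be caught with a quick check that lists every parameter excluded from gradient updates. The sketch below is duck-typed: it assumes (name, parameter) pairs where each parameter has a boolean `requires_grad` attribute, which is what PyTorch's `module.named_parameters()` yields. The helper name and the stand-in parameters are hypothetical.

```python
from types import SimpleNamespace


def find_frozen_parameters(named_parameters):
    """Return the names of parameters that will not receive gradient updates.

    named_parameters: iterable of (name, parameter) pairs, where each
    parameter exposes a boolean `requires_grad` attribute.
    """
    return [name for name, param in named_parameters if not param.requires_grad]


# Stand-in objects so the sketch runs without a deep learning framework.
params = [
    ("encoder.weight", SimpleNamespace(requires_grad=False)),
    ("decoder.weight", SimpleNamespace(requires_grad=True)),
]
print(find_frozen_parameters(params))  # ['encoder.weight']
```

With a PyTorch model, the call would be `find_frozen_parameters(model.named_parameters())`; any name in the result that was not frozen on purpose deserves a second look.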

Even though I could overfit a small dataset, that did not mean anything: when training on the whole dataset, the average reward was not going up.

The thing to look for is an average reward that goes up over the course of training on the full dataset.
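
Because per-batch rewards are noisy, it can help to smooth them before judging the trend. The following is a minimal sketch, assuming the average reward is logged once per training step; the window size and the `min_gain` threshold are illustrative choices, not recommended values.

```python
def moving_average(values, window):
    """Average of each full window of consecutive values."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def reward_is_improving(rewards, window, min_gain=0.0):
    """Compare the last smoothed reward against the first smoothed reward."""
    smoothed = moving_average(rewards, window)
    return smoothed[-1] - smoothed[0] > min_gain


# Hypothetical reward logs: one noisy but rising, one oscillating in place.
rising = [0.1, 0.3, 0.2, 0.4, 0.3, 0.5]
flat = [0.2, 0.4, 0.2, 0.4, 0.2, 0.4]
print(reward_is_improving(rising, window=3))  # True
print(reward_is_improving(flat, window=2))    # False
```

Comparing only the first and last smoothed values is deliberately crude; it is enough to tell "not moving at all" apart from "slowly improving", which is the distinction that matters here.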

I'm not accepting this answer because I believe it is incomplete: it lacks general, systematic guidelines for debugging a reinforcement learning algorithm.

Answered by Astariul on November 28, 2020
