Guidelines to debug REINFORCE-type algorithms?

Question

I implemented a self-critical policy gradient (as described here), for text summarization.

However, after training, the scores are lower than expected (in fact, lower than without RL).

I'm looking for general guidelines on how to debug RL-based algorithms.

I tried:

Overfitting on a small dataset (~6 samples): I could increase the average reward, but training does not converge, and sometimes the average reward goes back down.
Changing the learning rate: I varied the learning rate and observed its effect on the small dataset. Based on these experiments I chose a fairly large learning rate (0.02, versus 1e-4 in the paper).
Tracking the average reward during training on the full dataset: the average reward barely moves.
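For a self-critical setup specifically, one check that may help is to log the sampled reward, the greedy (baseline) reward, and their difference for each batch. If most differences are exactly zero, the sampled and greedy outputs are probably identical and the policy gradient carries almost no signal. The sketch below is only an illustration: the function name `self_critical_stats` and the reward values are made up, and it assumes you already have one scalar reward per sampled sequence and per greedy sequence.

```python
from statistics import mean


def self_critical_stats(sample_rewards, greedy_rewards):
    """Summarize the training signal of one self-critical batch.

    Both arguments are lists of scalar rewards (hypothetical values here),
    one per example: the reward of the sampled output and the reward of
    the greedy output used as the baseline.
    """
    if len(sample_rewards) != len(greedy_rewards):
        raise ValueError("need one greedy reward per sampled reward")

    # Self-critical advantage: sampled reward minus greedy baseline reward.
    advantages = [s - g for s, g in zip(sample_rewards, greedy_rewards)]

    return {
        "mean_sample_reward": mean(sample_rewards),
        "mean_greedy_reward": mean(greedy_rewards),
        "mean_advantage": mean(advantages),
        # Examples with zero advantage contribute nothing to the gradient.
        "frac_zero_advantage": sum(a == 0 for a in advantages) / len(advantages),
    }


# Made-up rewards for a batch of four examples.
stats = self_critical_stats([0.5, 0.25, 0.75, 0.25], [0.5, 0.5, 0.5, 0.25])
print(stats)
```

Logging these numbers every few batches makes it easier to tell "the reward is flat because nothing is being learned" apart from "the reward is flat because the samples never differ from the greedy output".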

Astariul · Answer

The only resource I have found so far:

https://github.com/williamFalcon/DeepRLHacks

In my specific case, I made a few mistakes:

I froze part of the network that should not have been frozen
I used the wrong learning rate
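One way to catch an accidentally frozen part of the network is to snapshot the parameters before and after a single update and list the ones that did not move. The sketch below is framework-agnostic and purely illustrative: it assumes you can dump each parameter to a flat list of floats, and the helper `unchanged_parameters` and the parameter names are hypothetical.

```python
def unchanged_parameters(before, after, tol=0.0):
    """Return the names of parameters that did not change by more than tol.

    `before` and `after` map a parameter name to a flat list of floats,
    taken just before and just after one optimizer step.
    """
    stuck = []
    for name, old_values in before.items():
        new_values = after[name]
        if all(abs(n - o) <= tol for n, o in zip(new_values, old_values)):
            stuck.append(name)
    return stuck


# Hypothetical snapshots: the encoder did not move, the decoder did.
before = {"encoder.weight": [0.1, -0.2], "decoder.weight": [0.3, 0.4]}
after = {"encoder.weight": [0.1, -0.2], "decoder.weight": [0.29, 0.41]}

print(unchanged_parameters(before, after))  # ['encoder.weight']
```

A parameter can also show up here because its gradient happened to be zero on that batch, so it is worth repeating the check over a few updates before concluding that something is frozen.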

Being able to overfit a small dataset did not prove anything: when training on the whole dataset, the average reward was still not going up.

The signal to look for is an average reward that rises during training on the full dataset.
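Per-batch rewards are noisy, so it can be hard to tell by eye whether the reward is going up. A minimal sketch, assuming you log one average reward per training step: smooth the curve with an exponential moving average and compare the start of training with the end. The function names, the `alpha` value, and the window size are arbitrary choices for illustration, not recommended settings.

```python
from statistics import mean


def smooth(values, alpha=0.1):
    """Exponential moving average of a sequence of rewards."""
    smoothed = []
    avg = None
    for v in values:
        # The first value initializes the average; later values are blended in.
        avg = v if avg is None else (1 - alpha) * avg + alpha * v
        smoothed.append(avg)
    return smoothed


def reward_trend(rewards, window):
    """Mean reward over the last `window` steps minus the first `window` steps."""
    if len(rewards) < 2 * window:
        raise ValueError("need at least 2 * window logged rewards")
    return mean(rewards[-window:]) - mean(rewards[:window])


# Made-up reward logs: one that improves and one that stays flat.
improving = [0.0, 0.0, 1.0, 1.0]
flat = [0.2, 0.2, 0.2, 0.2]

print(reward_trend(improving, window=2))
print(reward_trend(flat, window=2))
print(smooth([1.0, 0.0], alpha=0.5))
```

A trend close to zero over many steps on the full dataset is the situation described above, even when a small dataset can be overfit.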

I'm not accepting this answer because I believe it is incomplete: it lacks general, systematic guidelines for debugging a reinforcement learning algorithm.
