
Policy gradient/REINFORCE algorithm with RNN: why does this converge with SGD but not Adam?

Data Science Asked by Kechen on December 23, 2020

I am training an RNN model for caption generation with the REINFORCE algorithm. I adopt the self-critical strategy (see the paper Self-Critical Sequence Training for Image Captioning) to reduce the variance. I initialize the model with a pre-trained RNN model (a.k.a. a warm start). This pre-trained model (trained with a log-likelihood objective) achieves an F1 score of 0.6 on my task.
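For context, the objective I am optimizing is roughly the following self-critical REINFORCE loss (this is an illustrative TensorFlow-style sketch, not my exact code; tensor names such as `sampled_logprobs`, `sampled_reward`, and `greedy_reward` are placeholders):

```python
import tensorflow as tf

# sampled_logprobs: [batch, T] log-probabilities of the sampled caption tokens
# sampled_reward:   [batch]    metric score (e.g. F1) of the sampled caption
# greedy_reward:    [batch]    metric score of the greedy-decoded caption (baseline)
# mask:             [batch, T] 1 for real tokens, 0 for padding

def self_critical_loss(sampled_logprobs, sampled_reward, greedy_reward, mask):
    # Advantage = reward of the sampled sequence minus the greedy baseline.
    advantage = tf.stop_gradient(sampled_reward - greedy_reward)      # [batch]
    # Sum the token log-probabilities over time, ignoring padding.
    seq_logprob = tf.reduce_sum(sampled_logprobs * mask, axis=1)      # [batch]
    # Negative sign: minimizing this loss maximizes the expected reward.
    return -tf.reduce_mean(advantage * seq_logprob)
```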

When I use the Adam optimizer to train this policy-gradient objective, the performance of my model drops to 0 after a few epochs. However, if I switch to the plain gradient-descent (SGD) optimizer and keep everything else the same, the performance looks reasonable and is slightly better than the pre-trained model. Does anyone have an idea why that is?

I use TensorFlow to implement my model.

One Answer

Without the code there's not much we can do, but I'd guess you need to significantly lower the learning rate. In my experience, Adam requires a much lower learning rate than SGD.
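For example, something along these lines (a Keras-style sketch; the exact values are only starting points and are not tuned for your model):

```python
import tensorflow as tf

# SGD often tolerates a comparatively large step size when fine-tuning
# a warm-started model with a policy-gradient objective.
sgd = tf.keras.optimizers.SGD(learning_rate=1e-2)

# Adam adapts per-parameter step sizes, so a much smaller base learning rate
# (optionally with gradient clipping) is usually needed to avoid destroying
# the pre-trained weights in the first few updates.
adam = tf.keras.optimizers.Adam(learning_rate=5e-5, clipnorm=1.0)
```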

Answered by Ran Elgiser on December 23, 2020
