
Policy gradient/REINFORCE algorithm with RNN: why does this converge with SGD but not Adam?

Data Science Asked by Kechen on December 23, 2020

I am training an RNN model for caption generation with the REINFORCE algorithm. I adopt the self-critical strategy (see the paper Self-Critical Sequence Training for Image Captioning) to reduce the variance. I initialize the model with a pre-trained RNN model (a.k.a. a warm start). This pre-trained model (trained with a log-likelihood objective) achieves an F1 score of 0.6 on my task.
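For context, the objective I am optimizing is roughly the following self-critical REINFORCE loss (this is an illustrative TensorFlow-style sketch, not my exact code; tensor names such as `sampled_logprobs`, `sampled_reward`, and `greedy_reward` are placeholders):

```python
import tensorflow as tf

# sampled_logprobs: [batch, T] log-probabilities of the sampled caption tokens
# sampled_reward:   [batch]    metric score (e.g. F1) of the sampled caption
# greedy_reward:    [batch]    metric score of the greedy-decoded caption (baseline)
# mask:             [batch, T] 1 for real tokens, 0 for padding

def self_critical_loss(sampled_logprobs, sampled_reward, greedy_reward, mask):
    # Advantage = reward of the sampled sequence minus the greedy baseline.
    advantage = tf.stop_gradient(sampled_reward - greedy_reward)      # [batch]
    # Sum the token log-probabilities over time, ignoring padding.
    seq_logprob = tf.reduce_sum(sampled_logprobs * mask, axis=1)      # [batch]
    # Negative sign: minimizing this loss maximizes the expected reward.
    return -tf.reduce_mean(advantage * seq_logprob)
```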

When I use the Adam optimizer to train this policy-gradient objective, the performance of my model drops to 0 after a few epochs. However, if I switch to the plain gradient-descent (SGD) optimizer and keep everything else the same, the performance looks reasonable and is slightly better than the pre-trained model. Does anyone have an idea why that is?

I use TensorFlow to implement my model.

One Answer

Without the code there's not much we can do, but I'd guess you need to significantly lower the learning rate. In my experience, Adam requires a much lower learning rate than SGD.
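For example, something along these lines (a Keras-style sketch; the exact values are only starting points and are not tuned for your model):

```python
import tensorflow as tf

# SGD often tolerates a comparatively large step size when fine-tuning
# a warm-started model with a policy-gradient objective.
sgd = tf.keras.optimizers.SGD(learning_rate=1e-2)

# Adam adapts per-parameter step sizes, so a much smaller base learning rate
# (optionally with gradient clipping) is usually needed to avoid destroying
# the pre-trained weights in the first few updates.
adam = tf.keras.optimizers.Adam(learning_rate=5e-5, clipnorm=1.0)
```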

Answered by Ran Elgiser on December 23, 2020
