Data Science Asked by Kechen on December 23, 2020
I am working on training an RNN model for caption generation with the REINFORCE algorithm. I adopt the self-critical strategy (see the paper Self-critical Sequence Training for Image Captioning) to reduce the variance. I initialize the model with a pre-trained RNN model (a.k.a. warm start). This pre-trained model (trained with a log-likelihood objective) achieves an F1 score of 0.6 on my task.
When I use the Adam optimizer to train this policy gradient objective, the performance of my model drops to 0 after a few epochs. However, if I switch to the plain gradient descent optimizer and keep everything else the same, the performance looks reasonable and is slightly better than the pre-trained model. Does anyone have an idea why that is?
I use TensorFlow to implement my model.
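For reference, the policy-gradient loss I compute is roughly the following (a minimal sketch with illustrative tensor names; the metric reward is non-differentiable, so it is treated as a constant):

```python
import tensorflow as tf

# Minimal sketch of the self-critical REINFORCE loss; tensor names are illustrative.
# sample_logprobs: [batch, time] log-probabilities of the sampled caption tokens
# sample_reward:   [batch] metric score (e.g. F1) of the sampled captions
# greedy_reward:   [batch] metric score of the greedily decoded captions (the baseline)
# mask:            [batch, time] 1.0 for real tokens, 0.0 for padding
def self_critical_loss(sample_logprobs, sample_reward, greedy_reward, mask):
    advantage = tf.stop_gradient(sample_reward - greedy_reward)  # reward is not differentiable
    per_token = -advantage[:, None] * sample_logprobs            # REINFORCE: -(r - b) * log p(w_t)
    return tf.reduce_sum(per_token * mask) / tf.reduce_sum(mask)
```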
Without the code there is not much we can do, but I'd guess you need to significantly lower the learning rate. In my experience, Adam requires a significantly lower learning rate than SGD.
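For example (the exact values are illustrative and depend on your setup), you could try dropping Adam's learning rate by one or two orders of magnitude relative to what worked with plain gradient descent:

```python
import tensorflow as tf

# Illustrative learning rates only; tune them for your own model.
sgd_opt = tf.keras.optimizers.SGD(learning_rate=1e-2)    # the rate that worked with plain gradient descent
adam_opt = tf.keras.optimizers.Adam(learning_rate=1e-5)  # start much lower for Adam and adjust from there
```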
Answered by Ran Elgiser on December 23, 2020