Data Science question, asked by MetaHG on August 5, 2021
I have been working on an offline/batch reinforcement learning problem where I am using a BCQ-DDQN network as my Q-function (in place of a Q-table). The model takes an 8-dimensional state as input and outputs a vector of Q-values for 8 different actions. The two hidden layers have 64 and 128 neurons, respectively.
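For concreteness, here is a minimal sketch of the kind of network I am describing; the activation functions and the Keras setup are illustrative, not my exact code:

```python
import tensorflow as tf

def build_q_network(state_dim: int = 8, num_actions: int = 8) -> tf.keras.Model:
    """Small MLP mapping an 8-dim state to one Q-value per action."""
    return tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(state_dim,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_actions),  # one Q-value for each of the 8 actions
    ])

q_network = build_q_network()
q_network.summary()
```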
I have run multiple experiments in different configurations, i.e., with different reward functions and different datasets (since we are in an offline RL setting).
For all experiments (with multiple seeds), I have noticed a weird behavior during the training of my agents.
While the loss of my models seems to have almost stabilized, my gradients keep growing for some reason. In the images, the "Grads (unclipped)/Mean" curve is computed with tf.global_norm() from TensorFlow over the network parameters, i.e., the function is fed the gradient tensors of the weights and biases of each network layer.
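Roughly, the metric is computed like this (a TF2-style sketch; the training step, loss_fn and variable names are placeholders, and tf.linalg.global_norm is the TF2 name of the tf.global_norm I use):

```python
import tensorflow as tf

def train_step(q_network, optimizer, loss_fn, states, targets):
    with tf.GradientTape() as tape:
        q_values = q_network(states, training=True)
        loss = loss_fn(targets, q_values)
    # Gradients of the loss w.r.t. every weight and bias tensor of the network.
    grads = tape.gradient(loss, q_network.trainable_variables)
    # Global (L2) norm over all gradient tensors, logged before any clipping.
    grad_norm = tf.linalg.global_norm(grads)
    optimizer.apply_gradients(zip(grads, q_network.trainable_variables))
    return loss, grad_norm
```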
Note that in one of the experiments, the gradient norm stabilised at a non-zero value while the loss was stable too.
With a stabilised loss, I would expect the norm of my gradients to converge to 0, meaning that the agent has converged to some optimum. However, in my case, the signals seem contradictory: the loss suggests that an optimum has been reached, while the growing gradient suggests the opposite.
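To illustrate the behaviour I would expect, here is a toy example on a simple convex objective (purely hypothetical and unrelated to my BCQ-DDQN setup), where the loss and the global gradient norm shrink together as training converges:

```python
import tensorflow as tf

w = tf.Variable(tf.random.normal((8,)))
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

for step in range(1, 201):
    with tf.GradientTape() as tape:
        loss = tf.reduce_sum(tf.square(w))  # convex loss with its minimum at w = 0
    grads = tape.gradient(loss, [w])
    if step % 50 == 0:
        # Both the loss and the gradient norm decay towards 0 here.
        print(step, float(loss), float(tf.linalg.global_norm(grads)))
    optimizer.apply_gradients(zip(grads, [w]))
```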
How is this possible? What should I conclude about my model: is it close to convergence or not?