Artificial Intelligence Asked by unter_983 on November 29, 2021
I’m trying to understand the DDPG algorithm shown on this page. I don’t know what the result of the gradient at step 14 should be.
Is it a scalar that I use to update all the weights (so every weight is updated with the same value)? Or is it a list with a different value for each weight? I’m used to working with loss functions and a target $y$, but here I don’t have them, so I’m quite confused.
Each Q output is a scalar, so the sum of all of them is a scalar too. Thus, you're taking the gradient of a scalar with respect to your parameters. The result is a vector with one entry per parameter, so each weight gets its own update value.
Answered by harwiltz on November 29, 2021
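To make the shapes concrete, here is a minimal sketch in plain Python. It is not taken from the linked page: the linear actor, the hand-written critic, the learning rate, and every name below are assumptions chosen only for illustration. The gradient is approximated with finite differences, whereas a real DDPG implementation would get it from automatic differentiation. The point is only that the objective is one scalar while its gradient has one entry per actor parameter.

```python
# Toy illustration (all names and functions here are hypothetical).
# Actor:  mu_theta(s) = theta . s        (linear, one scalar action)
# Critic: Q(s, a)     = -(a - 2.0)**2    (fixed, stands in for a learned Q)

def actor(theta, s):
    return sum(t * x for t, x in zip(theta, s))

def critic(s, a):
    return -(a - 2.0) ** 2

def objective(theta, batch):
    # Mean of Q(s, mu_theta(s)) over the batch: a single scalar.
    return sum(critic(s, actor(theta, s)) for s in batch) / len(batch)

def gradient(theta, batch, eps=1e-6):
    # Central finite differences: one partial derivative per parameter.
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        up[i] += eps
        down = list(theta)
        down[i] -= eps
        grad.append((objective(up, batch) - objective(down, batch)) / (2 * eps))
    return grad

theta = [0.5, -1.0, 0.25]                       # three actor parameters
batch = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0]]

J = objective(theta, batch)                     # scalar
grad = gradient(theta, batch)                   # list of three numbers
lr = 0.1                                        # assumed learning rate
# Gradient ascent: each parameter moves by its own gradient entry.
new_theta = [t + lr * g for t, g in zip(theta, grad)]

print("objective:", J)
print("gradient:", grad)
print("updated theta:", new_theta)
```

In this toy case the three gradient entries come out different from one another, so the three weights are each moved by a different amount, even though the quantity being differentiated is a single number.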