Data Science Asked by thrillingelf on April 15, 2021
In the PPO paper (Schulman et al., 2017), the final objective in equation 9 includes a value function loss term, which the authors describe as the MSE between the target value and the predicted value.
My question is, how do you compute the $V_t^{target}$ term? I’m guessing it’s the return or collected sum of rewards. Would that be discounted like
$V_t^{target} = \sum_{i=t}^{T} \gamma^{i-t} r_i$,
or $V_t^{target} = \sum_{i=t}^{T} r_i$,
or neither?
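To make the two candidates concrete, here is a minimal Python sketch of both sums. The function name `reward_to_go` and the example rewards are made up for illustration, and it assumes a single finished episode of length T. It only shows what each formula computes, not which one the paper intends.

```python
def reward_to_go(rewards, gamma=1.0):
    """Return sum_{i=t}^{T} gamma^(i-t) * r_i for every timestep t.

    Hypothetical helper: gamma=1.0 gives the undiscounted sum,
    gamma<1.0 gives the discounted one.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the sum from the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


rewards = [1.0, 1.0, 1.0]                        # made-up episode rewards
discounted = reward_to_go(rewards, gamma=0.5)    # first candidate
undiscounted = reward_to_go(rewards, gamma=1.0)  # second candidate
print(discounted)    # [1.75, 1.5, 1.0]
print(undiscounted)  # [3.0, 2.0, 1.0]
```

With `gamma=1.0` the two formulas coincide, so the second candidate is just a special case of the first.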