TransWikia.com

RL PPO Algorithm: Understanding the Value Function Loss term in PPO by OpenAI

Data Science Asked by thrillingelf on April 15, 2021

In the Schulman et al. 2017 PPO paper, the final objective in Equation 9 includes a value function loss term, which they state is the squared error between the predicted value and a target value, $(V_\theta(s_t) - V_t^{\text{target}})^2$.

My question is: how do you compute the $V_t^{\text{target}}$ term? I'm guessing it's the return, i.e. the collected sum of rewards. Would that be discounted, as in

$V_t^{\text{target}} = \sum_{i=t}^{T} \gamma^{(i-t)} r_i$,

or undiscounted, $V_t^{\text{target}} = \sum_{i=t}^{T} r_i$,

or neither?
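For what it's worth, the discounted version of the target can be computed efficiently with a single backward pass over the rewards of a collected trajectory. The sketch below implements that Monte Carlo discounted return; note that many PPO implementations instead use a bootstrapped or GAE-based target, so this is only one common choice, not necessarily what any particular codebase does.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute [V_0^target, ..., V_{T-1}^target] where
    V_t^target = sum_{i=t}^{T-1} gamma**(i-t) * r_i.

    Works backward so each return reuses the next one:
    V_t = r_t + gamma * V_{t+1}.
    """
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns
```

With `gamma=1.0` this reduces to the undiscounted sum-of-rewards variant; with `gamma < 1` it matches the discounted formula above.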
