RL PPO Algorithm: Understanding the Value Function Loss term in PPO by OpenAI

Data Science Asked by thrillingelf on April 15, 2021

In the Schulman et al. 2017 PPO paper, the final loss in equation 9 includes a value function loss term, which the authors describe as the MSE between the target value and the predicted value.

My question is, how do you compute the $V_t^{\text{target}}$ term? I’m guessing it’s the return, i.e. the collected sum of rewards. Would that be discounted, like

$V_t^{\text{target}} = \sum_{i=t}^{T} \gamma^{i-t} r_i$,

or undiscounted, like $V_t^{\text{target}} = \sum_{i=t}^{T} r_i$,

or neither?
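To make the two candidates concrete, here is a minimal sketch that computes both sums for a single finished episode. The function name `reward_to_go` and the sample rewards are hypothetical, chosen only for illustration; the sketch shows what each formula evaluates to and does not settle which one (if either) the paper intends.

```python
def reward_to_go(rewards, gamma=1.0):
    """Return G_t = sum_{i=t}^{T} gamma^(i-t) * r_i for every step t of one episode.

    With gamma=1.0 this is the plain (undiscounted) sum of future rewards.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it:
    # G_t = r_t + gamma * G_{t+1}
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical rewards r_0, r_1, r_2 from a three-step episode.
rewards = [1.0, 0.0, 2.0]

discounted = reward_to_go(rewards, gamma=0.5)    # first candidate
undiscounted = reward_to_go(rewards, gamma=1.0)  # second candidate

print(discounted)    # [1.5, 1.0, 2.0]
print(undiscounted)  # [3.0, 2.0, 2.0]
```

Either list could then be plugged in as the per-step target in a squared-error loss against the predicted values.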

Tags: actor-critic, policy-gradients, reinforcement-learning
