
Definition of the Q* function in reinforcement learning

Data Science Asked by marlineer43 on August 2, 2021

I’m making my way through Sutton and Barto’s Reinforcement Learning: An Introduction. They give the definition of the $q_*$ function as follows:

$$
q_*(a) = \mathbb{E}[R_t \mid A_t = a]
$$

where $A_t$ is the action taken at time $t$ and $R_t$ is the reward associated with taking $A_t$. From my understanding, $q_*$ represents the true value of taking action $a$, which is the mean reward when $a$ is selected.

But I’m confused about why $t$ is included in this equation at all. Should $q_*(a)$ really be $q_*(a, t)$? Or are we to understand $q_*$ as taking the expected reward across all $t$?

One Answer

The reward for action $a$ is drawn from a stationary probability distribution with mean $q_*(a)$. This distribution is independent of time $t$. However, the estimate of $q_*(a)$ at time $t$, denoted by $Q_t(a)$, does depend on $t$.

"Or are we to understand $q_*$ as taking the expected reward across all $t$?"

The expectation is not over time, but over a probability distribution with mean $q_*(a)$.

For example, in the 10-armed bandit problem, the reward for each of the 10 actions comes from a normal distribution with mean $q_*(a)$, $a = 1, dots, 10$, and variance 1.
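
To make the distinction concrete, here is a minimal numpy sketch of that 10-armed testbed: `q_star` is a fixed array of true action values, each reward is a fresh draw from a stationary $N(q_*(a), 1)$ distribution, and the sample-average estimate $Q_t(a)$ changes with $t$ but converges to the time-independent $q_*(a)$. The uniform action selection and the random seed are illustrative assumptions, not part of the answer.

import numpy as np

rng = np.random.default_rng(0)

k = 10                                   # number of actions (10-armed bandit)
q_star = rng.normal(0.0, 1.0, size=k)    # true action values q_*(a), fixed over time

def reward(a):
    # Reward for action a: a draw from the stationary N(q_*(a), 1) distribution.
    # The draw happens at some time step t, but the distribution itself
    # does not depend on t.
    return rng.normal(q_star[a], 1.0)

# Sample-average estimate Q_t(a): it depends on t because it uses the rewards
# observed so far, but it converges to the time-independent q_*(a).
counts = np.zeros(k)
Q = np.zeros(k)

for t in range(10_000):
    a = rng.integers(k)                  # illustrative: pick actions uniformly at random
    r = reward(a)
    counts[a] += 1
    Q[a] += (r - Q[a]) / counts[a]       # incremental sample-average update

print(np.max(np.abs(Q - q_star)))        # small: Q_t(a) is close to q_*(a) for every a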

Correct answer by vineet gundecha on August 2, 2021
