Artificial Intelligence Asked by Metrician on August 24, 2021
I’ve been looking online for a while for a source that explains these computations, but I can’t find anywhere what $|\mathcal{A}(s)|$ means. I guess $\mathcal{A}$ is the action set, but I’m not sure about that notation:
$$\frac{\varepsilon}{|\mathcal{A}(s)|} \sum_{a} Q^{\pi}(s, a)+(1-\varepsilon) \max_{a} Q^{\pi}(s, a)$$
Here is the source of the formula.
I also want to clarify that I understand the idea behind the $\varepsilon$-greedy approach and the motivation behind on-policy methods. I just had a problem understanding this notation (and also some other minor things). The author omitted some steps, so I feel like there was a jump in continuity, which is why I didn’t get the notation. I’d be more than glad to be pointed towards a better source where this is detailed.
This expression: $|\mathcal{A}(s)|$ means
$|\quad|$ the size of
$\mathcal{A}(s)$ the set of actions in state $s$
or more simply the number of actions allowed in the state.
This makes sense in the given formula because $\frac{\varepsilon}{|\mathcal{A}(s)|}$ is then the probability of taking each exploratory action in an $\varepsilon$-greedy policy. The overall expression is the expected return when following that policy, summing expected results from the exploratory and greedy action.
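To make the formula concrete, here is a minimal Python sketch that evaluates it for a single state. It is not taken from the linked source: the function names, the Q-values, and the value of epsilon are made up for illustration, and `len(q_values)` plays the role of $|\mathcal{A}(s)|$.

```python
def epsilon_greedy_value(q_values, epsilon):
    """Evaluate eps/|A(s)| * sum_a Q(s,a) + (1 - eps) * max_a Q(s,a) for one state."""
    n_actions = len(q_values)                      # |A(s)|: number of actions in the state
    explore = epsilon / n_actions * sum(q_values)  # uniform exploratory part
    greedy = (1 - epsilon) * max(q_values)         # greedy part
    return explore + greedy


def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of the epsilon-greedy policy (first maximum wins ties)."""
    n_actions = len(q_values)
    probs = [epsilon / n_actions] * n_actions      # every action gets eps/|A(s)|
    best = max(range(n_actions), key=lambda a: q_values[a])
    probs[best] += 1 - epsilon                     # greedy action gets the extra 1 - eps
    return probs


# Hypothetical Q-values for a state with four actions.
q = [1.0, 2.0, 4.0, 1.0]
eps = 0.2

value = epsilon_greedy_value(q, eps)
probs = epsilon_greedy_probs(q, eps)

# The same number, computed as an expectation over the policy's action probabilities.
value_from_probs = sum(p * qa for p, qa in zip(probs, q))
```

With these numbers $|\mathcal{A}(s)| = 4$, so each action is picked by exploration with probability $0.2 / 4 = 0.05$, and the greedy action ends up with probability $0.85$ in total. Both `value` and `value_from_probs` come out to about 3.6, which shows that the formula is just the expected $Q$ under the $\varepsilon$-greedy policy.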
Correct answer by Neil Slater on August 24, 2021