What does the term $|\mathcal{A}(s)|$ mean in the $\epsilon$-greedy policy?

Question

I've been looking online for a while for a source that explains these computations, but I can't find anywhere what $|\mathcal{A}(s)|$ means. I guess $\mathcal{A}$ is the action set, but I'm not sure about that notation:
$$\frac{\varepsilon}{|\mathcal{A}(s)|} \sum_{a} Q^{\pi}(s, a) + (1-\varepsilon) \max_{a} Q^{\pi}(s, a)$$
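As a minimal sketch, the formula can be evaluated directly for a single state, assuming the Q-values for that state are given as a Python list indexed by action. The function name and the Q-values below are hypothetical, chosen only for illustration:

```python
def epsilon_greedy_state_value(q_values, epsilon):
    """Evaluate eps/|A(s)| * sum_a Q(s, a) + (1 - eps) * max_a Q(s, a) for one state."""
    num_actions = len(q_values)  # |A(s)|: number of actions available in this state
    exploratory = epsilon / num_actions * sum(q_values)
    greedy = (1 - epsilon) * max(q_values)
    return exploratory + greedy


# Hypothetical Q-values for a state with 4 actions
q = [1.0, 2.0, 3.0, 4.0]
# 0.05 * 10 + 0.8 * 4 = 3.7 (up to floating-point rounding)
print(epsilon_greedy_state_value(q, epsilon=0.2))
```

With $\epsilon = 0$ this reduces to $\max_a Q^{\pi}(s, a)$ (purely greedy), and with $\epsilon = 1$ it reduces to the plain average of the Q-values (purely random).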
Here is the source of the formula.
I also want to clarify that I understand the idea behind the $\epsilon$-greedy approach and the motivation behind on-policy methods. I just had trouble understanding this notation (and a few other minor things). The author omitted some steps, so the explanation felt discontinuous to me, which is why I didn't get the notation. I'd be glad if someone could point me towards a better source where this is explained in detail.

Neil Slater · Accepted Answer

This expression $|\mathcal{A}(s)|$ means:

$|\quad|$ the size of

$\mathcal{A}(s)$ the set of actions in state $s$

or, more simply, the number of actions allowed in the state.
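To make the dependence on $s$ concrete, here is a small sketch assuming a hypothetical environment in which different states allow different actions. The state names and action sets are invented for illustration:

```python
# Hypothetical action sets: A(s) may differ from state to state
allowed_actions = {
    "corridor": {"left", "right"},
    "junction": {"left", "right", "up", "down"},
    "dead_end": {"back"},
}

for state, actions in allowed_actions.items():
    # |A(s)| is just the cardinality of the set of actions allowed in s
    print(state, len(actions))
```

In many environments every state allows the same actions, in which case $|\mathcal{A}(s)|$ is simply a constant $|\mathcal{A}|$.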
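Reading it as the number of actions makes sense in the given formula because $\frac{\epsilon}{|\mathcal{A}(s)|}$ is then the probability of taking each exploratory action in an $\epsilon$-greedy policy. The overall expression is the expected return when following that policy, summing the expected results from the exploratory and greedy actions. Note that the exploratory sum runs over all actions, including the greedy one, so the greedy action is selected with total probability $1-\epsilon+\frac{\epsilon}{|\mathcal{A}(s)|}$.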
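A minimal sketch of the action probabilities of an $\epsilon$-greedy policy in one state, checking that the probability-weighted average of the Q-values equals the formula. The function name, the Q-values, and the choice to break ties by taking the first maximal action are assumptions made for illustration:

```python
def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy for one state.

    Every action gets epsilon / |A(s)| from exploration; the greedy action
    additionally gets 1 - epsilon. Ties are broken by taking the first maximum
    (an assumption; other tie-breaking rules give the same expected value).
    """
    n = len(q_values)  # |A(s)|
    probs = [epsilon / n] * n
    greedy = max(range(n), key=lambda a: q_values[a])  # index of first maximal Q
    probs[greedy] += 1 - epsilon
    return probs


q = [1.0, 2.0, 3.0, 4.0]  # hypothetical Q-values for one state
eps = 0.2
probs = epsilon_greedy_probs(q, eps)
expected = sum(p * v for p, v in zip(probs, q))
formula = eps / len(q) * sum(q) + (1 - eps) * max(q)
print(probs)              # greedy action gets 1 - eps + eps/4
print(expected, formula)  # equal up to floating-point rounding
```

Writing the policy out explicitly like this shows why the formula splits into an exploratory sum and a greedy $\max$ term: it is just $\sum_a \pi(a \mid s)\, Q^{\pi}(s, a)$ with the two sources of probability separated.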
