Data Science Asked by corazza on July 7, 2021
On pg. 206 of Sutton and Barto's Reinforcement Learning, there is a curious statement about the result of a scalar product, namely the definition of the matrix
$$\mathbf{A} = \mathbb{E}\left[\mathbf{x}_t(\mathbf{x}_t - \gamma\mathbf{x}_{t+1})^T\right]$$
As I interpret it, $\mathbf{A}$ is the expectation of a scalar product of two $d$-dimensional vectors, which should be a scalar, right? So how do they get a $d \times d$ matrix from it? Is it shorthand for a scalar matrix (diagonal, with this scalar product as the repeated coefficient)?
In Sutton & Barto, vectors are considered column vectors by default. So if you have this kind of product:
$$\mathbf{a}\mathbf{b}^T$$
where $\mathbf{a}$ and $\mathbf{b}$ are $d$-dimensional vectors, it does not calculate the scalar product. Instead it treats both vectors as matrices and calculates a matrix product, which will be a $d \times d$ matrix because you are multiplying a $d \times 1$ matrix by a $1 \times d$ matrix.
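As a quick sanity check, here is a minimal NumPy sketch (the array names `a` and `b` are just illustrative) confirming that a column vector times a row vector yields a $d \times d$ matrix:

```python
import numpy as np

d = 3
a = np.random.rand(d, 1)   # d x 1 column vector
b = np.random.rand(d, 1)   # d x 1 column vector

outer = a @ b.T            # (d, 1) @ (1, d) -> (d, d) matrix
print(outer.shape)         # (3, 3)
```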
Worth noting that the scalar product can also be calculated as a $1 \times 1$ matrix if you follow the same matrix multiplication rules but with the first vector transposed instead:
$$\mathbf{a}^T\mathbf{b}$$
which leads to multiplying a $1 \times d$ matrix by a $d \times 1$ matrix. This is why the value function approximation can be written as $\mathbf{w}^T\mathbf{x}_t$ (there is a small liberty taken in treating a $1 \times 1$ matrix as a scalar value in terms of notation).
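And a matching sketch for the transposed case, using illustrative names `w` and `x` for the weight and feature vectors, showing that the result collapses to a $1 \times 1$ matrix holding the ordinary dot product:

```python
import numpy as np

d = 3
w = np.random.rand(d, 1)   # weight vector, d x 1
x = np.random.rand(d, 1)   # feature vector x_t, d x 1

v = w.T @ x                # (1, d) @ (d, 1) -> (1, 1) matrix
print(v.shape)             # (1, 1)
print(v.item())            # the scalar value estimate
assert np.isclose(v.item(), np.dot(w.ravel(), x.ravel()))
```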
Correct answer by Neil Slater on July 7, 2021