TransWikia.com

Matrix notation in Sutton and Barto

Data Science Asked by corazza on July 7, 2021

On pg. 206 of Sutton and Barto's Reinforcement Learning, there is a curious statement about the result of a scalar product:

[Image: the book's definition $mathbf{A} doteq mathbb{E}big[mathbf{x}_t(mathbf{x}_t - gamma mathbf{x}_{t+1})^Tbig]$, stated to be a $d times d$ matrix]

As I interpret it, $mathbf{A}$ is the expectation of a scalar product of two $d$-dimensional vectors, which should be a scalar, right? So how do they get a $d times d$ matrix from it? Is it shorthand for a scalar matrix (a diagonal matrix with that scalar product as the repeated coefficient)?

One Answer

In Sutton & Barto, vectors are considered column vectors by default. So if you have this kind of product:

$$mathbf{a}mathbf{b}^T$$

where $mathbf{a}$ and $mathbf{b}$ are $d$-dimensional vectors, it is not the scalar product. Instead, it treats both vectors as matrices and computes a matrix product, which yields a $d times d$ matrix because you are multiplying a $d times 1$ matrix by a $1 times d$ matrix. This is the outer product of the two vectors.
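The dimension bookkeeping can be checked in a few lines of NumPy; the vector values below are arbitrary, chosen only for illustration:

```python
import numpy as np

# Two d-dimensional column vectors (d = 3), stored as (3, 1) matrices
a = np.array([[1.0], [2.0], [3.0]])
b = np.array([[4.0], [5.0], [6.0]])

# a b^T: a (3, 1) matrix times a (1, 3) matrix gives a (3, 3) matrix
outer = a @ b.T
print(outer.shape)  # (3, 3)
print(outer)
```

Each entry `outer[i, j]` equals `a[i] * b[j]`, which is exactly the outer-product matrix the book's notation denotes.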

Worth noting that the scalar product can also be calculated as a $1 times 1$ matrix if you follow the same matrix multiplication rules but transpose the first vector instead:

$$mathbf{a}^Tmathbf{b}$$

which leads to multiplying a $1 times d$ matrix by a $d times 1$ matrix. This is why the value function approximation can be written as $mathbf{w}^Tmathbf{x}_t$ (there is a small liberty taken in treating a $1 times 1$ matrix as notationally the same as a scalar value).
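A matching sketch of the $mathbf{w}^Tmathbf{x}_t$ form, with hypothetical weight and feature values:

```python
import numpy as np

# Hypothetical weight and feature column vectors, each shape (3, 1)
w = np.array([[0.5], [-1.0], [2.0]])
x = np.array([[1.0], [2.0], [3.0]])

# w^T x: a (1, 3) matrix times a (3, 1) matrix gives a (1, 1) matrix
value = w.T @ x
print(value.shape)   # (1, 1)
print(value.item())  # 0.5*1 + (-1.0)*2 + 2.0*3 = 4.5
```

The `.item()` call extracts the scalar from the $1 times 1$ result, mirroring the notational liberty mentioned above.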

Correct answer by Neil Slater on July 7, 2021

