Data Science Asked on June 29, 2021
In chapter 6.1 on 'Example: Learning XOR', the bottom of page 168 mentions:
The activation function $g$ is typically chosen to be a function that
is applied element-wise, with $h_i = g(x^TW_{:,i}+c_i).$
Then equation 6.3 is given as (taking $g$ to be the ReLU):
We can now specify our complete network as
$f(x; W, c, w, b) = w^T \max\{0, W^T x + c\} + b.$
I am wondering why the book uses $W^Tx$ in equation 6.3, while I would expect $x^TW$. Unlike the XOR example in the book, where $W$ is a $2\times 2$ square matrix, $W$ may also be non-square, and in that case $x^TW$ is not the same as $W^Tx$.
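For example, here is a small NumPy sketch of the case I have in mind (the shapes are made up for illustration, not taken from the book):

```python
import numpy as np

# Hypothetical non-square case: 3 inputs, 2 hidden units.
x = np.random.randn(3, 1)   # input, column vector
W = np.random.randn(3, 2)   # non-square weight matrix

print((x.T @ W).shape)  # (1, 2): a row vector
print((W.T @ x).shape)  # (2, 1): a column vector
```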
Please help me understand if I'm missing something here.
Let $\mathbf{y} = \mathbf{W}^T \mathbf{x}$.
Then, $\mathbf{y}^T = (\mathbf{W}^T \mathbf{x})^T = \mathbf{x}^{T}(\mathbf{W}^T)^T = \mathbf{x}^{T}\mathbf{W}$. Note that $\mathbf{W}$ does not have to be a square matrix.
Let $e^{(i)}$ be the $i$-th standard basis vector, i.e. $e^{(i)}_{j} = \delta_{i,j}$.
Then, $y_{i} = \mathbf{y}^{T}e^{(i)} = (\mathbf{x}^T \mathbf{W}) e^{(i)} = \mathbf{x}^{T}(\mathbf{W}e^{(i)}) = \mathbf{x}^{T}\mathbf{W}_{:,i}$, and thus
$h_{i} = g(\mathbf{x}^T \mathbf{W}_{:,i}+c_{i}) = g(y_{i}+c_{i}).$
On the other hand, $f(\cdot) = \mathbf{w}^{T} \max\{\mathbf{0},\mathbf{W}^{T}\mathbf{x}+\mathbf{c}\}+b = \mathbf{w}^{T} \max\{\mathbf{0},\mathbf{y}+\mathbf{c}\}+b$. So $x^TW$ and $W^Tx$ contain exactly the same entries; the book simply writes the pre-activations as the column vector $\mathbf{W}^T\mathbf{x}+\mathbf{c}$ rather than as the row vector $\mathbf{x}^T\mathbf{W}+\mathbf{c}^T$.
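As a quick numerical sanity check, here is a minimal NumPy sketch (the shapes and values are made up for illustration, not taken from the book) verifying that the two forms hold the same entries and give the same network output for a non-square $W$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up dimensions for illustration: 3 inputs, 4 hidden units (so W is non-square).
x = rng.standard_normal((3, 1))   # input, column vector
W = rng.standard_normal((3, 4))   # hidden-layer weights
c = rng.standard_normal((4, 1))   # hidden-layer biases
w = rng.standard_normal((4, 1))   # output weights
b = rng.standard_normal()         # output bias (scalar)

# x^T W is just the transpose of W^T x, so both hold the same entries.
assert np.allclose((x.T @ W).T, W.T @ x)

# h_i = g(x^T W_{:,i} + c_i), computed element-wise with g = ReLU ...
h_elementwise = np.array(
    [max(0.0, (x.T @ W[:, i]).item() + c[i, 0]) for i in range(4)]
).reshape(-1, 1)
# ... agrees with the vectorized form h = max{0, W^T x + c}.
h_vectorized = np.maximum(0.0, W.T @ x + c)
assert np.allclose(h_elementwise, h_vectorized)

# Complete network: f(x; W, c, w, b) = w^T max{0, W^T x + c} + b
f = (w.T @ h_vectorized + b).item()
print(f)
```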
Does that answer your question?
Correct answer by Graph4Me Consultant on June 29, 2021