
Confusion with Notation in the Book on Deep Learning by Ian Goodfellow et al.

Data Science Asked on June 29, 2021

In Section 6.1, ‘Example: Learning XOR‘, the bottom of page 168 mentions:

The activation function $g$ is typically chosen to be a function that
is applied element-wise, with $h_i = g(x^TW_{:,i}+c_i).$

Then equation 6.3 (taking $g$ to be ReLU) defines the complete network:

We can now specify our complete network as
$f(x; W, c, w, b) = w^T \max\{0, W^T x + c\} + b.$

I am wondering why the book uses $W^T x$ in equation 6.3, while I expected $x^T W$. Unlike the XOR example in the book, where $W$ is a $2 \times 2$ square matrix, $W$ may also be non-square, and in that case $x^T W$ is not the same as $W^T x$.

Please help me understand if I'm missing something here.
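To make the shape concern concrete, here is a minimal NumPy sketch (the $3 \times 2$ matrix and the input values are made up for illustration, not taken from the book):

```python
import numpy as np

# Hypothetical non-square weight matrix: 3 inputs, 2 hidden units.
W = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0]])           # shape (3, 2)
x = np.array([[1.0], [2.0], [3.0]])  # column vector, shape (3, 1)

print((x.T @ W).shape)   # (1, 2) -- x^T W is a row vector
print((W.T @ x).shape)   # (2, 1) -- W^T x is a column vector
```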

One Answer

Let $\mathbf{y} = \mathbf{W}^T \mathbf{x}$.

Then $\mathbf{y}^T = (\mathbf{W}^T \mathbf{x})^T = \mathbf{x}^T (\mathbf{W}^T)^T = \mathbf{x}^T \mathbf{W}$. Note that $\mathbf{W}$ does not have to be a square matrix.

Let $e^{(i)}_{j} = \delta_{i,j}$, i.e. $e^{(i)}$ is the $i$-th standard basis vector.

Then $y_{i} = \mathbf{y}^T e^{(i)} = (\mathbf{x}^T \mathbf{W}) e^{(i)} = \mathbf{x}^T (\mathbf{W} e^{(i)}) = \mathbf{x}^T \mathbf{W}_{:,i}$, and thus

$h_{i} = g(\mathbf{x}^T \mathbf{W}_{:,i} + c_{i}) = g(y_{i} + c_{i}).$

On the other hand, $f(\cdot) = \mathbf{w}^T \max\{\mathbf{0}, \mathbf{W}^T \mathbf{x} + \mathbf{c}\} + b = \mathbf{w}^T \max\{\mathbf{0}, \mathbf{y} + \mathbf{c}\} + b$. In other words, $\mathbf{x}^T \mathbf{W}$ and $\mathbf{W}^T \mathbf{x}$ contain exactly the same entries (one is the transpose of the other), so the element-wise definition of $h_i$ and equation 6.3 agree even when $\mathbf{W}$ is not square.
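As a quick numeric check of this derivation, here is a sketch with arbitrary values (the $3 \times 2$ matrix $W$ and the vectors $x$, $c$, $w$ below are made up, not the book's XOR solution):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary non-square example: 3 inputs, 2 hidden units.
W = rng.normal(size=(3, 2))
x = rng.normal(size=(3, 1))
c = rng.normal(size=(2, 1))
w = rng.normal(size=(2, 1))
b = 0.5

relu = lambda z: np.maximum(0.0, z)

# x^T W is the transpose of W^T x, even though W is not square.
assert np.allclose(x.T @ W, (W.T @ x).T)

# Element-wise definition h_i = g(x^T W_{:,i} + c_i), one unit at a time ...
h_elementwise = np.array([relu(x[:, 0] @ W[:, i] + c[i, 0])
                          for i in range(W.shape[1])])
# ... agrees with the vectorised form h = g(W^T x + c) used in equation 6.3.
h_vectorised = relu(W.T @ x + c)
assert np.allclose(h_elementwise, h_vectorised[:, 0])

# Complete network: f(x) = w^T max{0, W^T x + c} + b, a scalar.
f = (w.T @ h_vectorised + b).item()
print(f)
```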

Does that answer your question?

Correct answer by Graph4Me Consultant on June 29, 2021
