Data Science Asked on June 29, 2021
In chapter 6.1 on 'Example: Learning XOR', the bottom of page 168 mentions:
The activation function $g$ is typically chosen to be a function that
is applied element-wise, with $h_i = g(x^TW_{:,i}+c_i).$
Then equation 6.3 is given as (taking $g$ to be the ReLU):
We can now specify our complete network as
$f(x; W, c, w, b) = w^T \max\{0, W^T x + c\} + b.$
I am wondering why the book uses $W^Tx$ in equation 6.3, while I would expect $x^TW$. Unlike the XOR example in the book, where $W$ is a $2\times 2$ square matrix, $W$ may also be non-square, and in that case $x^TW$ is not the same as $W^Tx$.
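For example, here is a small NumPy sketch of the case I have in mind (the shapes are made up for illustration, not taken from the book):

```python
import numpy as np

# Hypothetical non-square case: 3 inputs, 2 hidden units.
x = np.random.randn(3, 1)   # input, column vector
W = np.random.randn(3, 2)   # non-square weight matrix

print((x.T @ W).shape)  # (1, 2): a row vector
print((W.T @ x).shape)  # (2, 1): a column vector
```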
Please help me understand if I'm missing something here.
Let $\mathbf{y} = \mathbf{W}^T \mathbf{x}$.
Then, $\mathbf{y}^T = (\mathbf{W}^T \mathbf{x})^T = \mathbf{x}^{T}(\mathbf{W}^T)^T = \mathbf{x}^{T}\mathbf{W}$. Note that $\mathbf{W}$ does not have to be a square matrix.
Let $e^{(i)}$ be the $i$-th standard basis vector, i.e. $e^{(i)}_{j} = \delta_{i,j}$.
Then, $y_{i} = \mathbf{y}^{T}e^{(i)} = (\mathbf{x}^T \mathbf{W}) e^{(i)} = \mathbf{x}^{T}(\mathbf{W}e^{(i)}) = \mathbf{x}^{T}\mathbf{W}_{:,i}$, and thus
$h_{i} = g(\mathbf{x}^T \mathbf{W}_{:,i}+c_{i}) = g(y_{i}+c_{i}).$
On the other hand, $f(\cdot) = \mathbf{w}^{T} \max\{\mathbf{0},\mathbf{W}^{T}\mathbf{x}+\mathbf{c}\}+b = \mathbf{w}^{T} \max\{\mathbf{0},\mathbf{y}+\mathbf{c}\}+b$. So $x^TW$ and $W^Tx$ contain exactly the same entries; the book simply writes the pre-activations as the column vector $\mathbf{W}^T\mathbf{x}+\mathbf{c}$ rather than as the row vector $\mathbf{x}^T\mathbf{W}+\mathbf{c}^T$.
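As a quick numerical sanity check, here is a minimal NumPy sketch (the shapes and values are made up for illustration, not taken from the book) verifying that the two forms hold the same entries and give the same network output for a non-square $W$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up dimensions for illustration: 3 inputs, 4 hidden units (so W is non-square).
x = rng.standard_normal((3, 1))   # input, column vector
W = rng.standard_normal((3, 4))   # hidden-layer weights
c = rng.standard_normal((4, 1))   # hidden-layer biases
w = rng.standard_normal((4, 1))   # output weights
b = rng.standard_normal()         # output bias (scalar)

# x^T W is just the transpose of W^T x, so both hold the same entries.
assert np.allclose((x.T @ W).T, W.T @ x)

# h_i = g(x^T W_{:,i} + c_i), computed element-wise with g = ReLU ...
h_elementwise = np.array(
    [max(0.0, (x.T @ W[:, i]).item() + c[i, 0]) for i in range(4)]
).reshape(-1, 1)
# ... agrees with the vectorized form h = max{0, W^T x + c}.
h_vectorized = np.maximum(0.0, W.T @ x + c)
assert np.allclose(h_elementwise, h_vectorized)

# Complete network: f(x; W, c, w, b) = w^T max{0, W^T x + c} + b
f = (w.T @ h_vectorized + b).item()
print(f)
```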
Does that answer your question?
Correct answer by Graph4Me Consultant on June 29, 2021