I was reading the book ‘Make Your Own Neural Network’ by Tariq Rashid. In his book, he said:
(Note: he's talking about ordinary feed-forward neural networks.)
$$\frac{\partial E}{\partial W_{jk}} = -2(t_k - O_k) \cdot \text{sigmoid}\left(\sum_j W_{jk} \cdot O_j\right)\left(1 - \text{sigmoid}\left(\sum_j W_{jk} \cdot O_j\right)\right) \cdot O_j$$
Here $t_k$ is the target value at node $k$, $O_k$ is the predicted output at node $k$, $W_{jk}$ is the weight connecting node $j$ to node $k$, and $E$ is the error at node $k$.
Then he says that we can remove the $2$ because we only care about the direction of the slope of the error function, and the $2$ is just a scaling factor. But then can't we also remove $\text{sigmoid}\left(\sum_{j} W_{jk} \cdot O_j\right)$, since we know it lies between $0$ and $1$ and so would also just act as a scaling factor? Taking this further, we could remove everything after $(t_k - O_k)$, since that whole expression lies between $0$ and $1$ and so would also just be a scaling factor. That leaves us with just:
$$t_k-O_k$$
which is definitely the wrong derivative.
If we can't remove that whole expression, then why could he remove the $2$, when both are just scaling factors?
You can remove the factor of $2$ because:

1. It is constant with respect to the variables you differentiate with respect to (this is exactly where the sigmoid term fails; see the sketch after this list).
2. The constant is positive.
3. The same constant factor appears in the derivative with respect to every variable. So if $\nabla f(\mathbf{W})$ is the correct gradient, then setting the constant factor to $1$ gives you $\frac{1}{2} \nabla f(\mathbf{W})$, which points in the same direction.
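The sigmoid term from the question fails the first condition: $\text{sigmoid}\left(\sum_j W_{jk} \cdot O_j\right)$ depends on $W_{jk}$ itself, so dropping it changes the direction of the gradient, not just its length. Here is a minimal numerical sketch of both points (NumPy, with made-up shapes and values purely for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
o = rng.random(3)                # previous-layer outputs O_j
W = rng.normal(size=(2, 3))      # W[k, j]: weight from node j to node k
t = np.array([1.0, 0.0])         # targets t_k

z = W @ o                        # z_k = sum_j W_jk * O_j
O = sigmoid(z)                   # predicted outputs O_k

# True gradient: dE/dW_jk = -2 (t_k - O_k) sigmoid(z_k) (1 - sigmoid(z_k)) O_j
grad_true = (-2.0 * (t - O) * O * (1.0 - O))[:, None] * o[None, :]

# Dropping the 2: exactly half the true gradient, so the same direction.
grad_no_2 = (-(t - O) * O * (1.0 - O))[:, None] * o[None, :]
print(np.allclose(grad_no_2, 0.5 * grad_true))   # True

# Dropping everything after (t_k - O_k): not a constant multiple of the
# true gradient, so the direction itself changes.
grad_naive = -(t - O)[:, None] * np.ones_like(W)
ratio = grad_naive / grad_true   # constant everywhere iff directions match
print(np.allclose(ratio, ratio.flat[0]))         # False
```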
When you use (stochastic) gradient descent, you scale the gradient anyway (by the learning rate / step size). So it is enough to have the gradient correct up to a scaling factor, as long as that factor is independent of the variables and positive.
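To make that concrete, here is a tiny sketch (a made-up one-dimensional loss, purely for illustration) showing that dropping a factor of $2$ from the gradient is equivalent to doubling the step size:

```python
# Minimize f(w) = (w - 3)^2; the true gradient is 2 * (w - 3).
def descend(grad, lr, w=0.0, steps=20):
    for _ in range(steps):
        w -= lr * grad(w)    # standard gradient-descent update
    return w

with_2 = descend(lambda w: 2.0 * (w - 3.0), lr=0.125)  # true gradient
without_2 = descend(lambda w: w - 3.0, lr=0.25)        # factor dropped, lr doubled
print(with_2 == without_2)   # True: identical iterates either way
```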
You could also build the scaling factor $\frac{1}{2}$ into the loss function itself. So if $L$ is the unscaled loss and $L' = \frac{1}{2}L$, then $L(\mathbf{w})$ is minimal if and only if $L'(\mathbf{w})$ is minimal.
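With that convention, the $2$ produced by differentiating the square cancels against the $\frac{1}{2}$, which is why many texts define the squared-error loss this way:

$$L' = \frac{1}{2}(t_k - O_k)^2 \quad\Rightarrow\quad \frac{\partial L'}{\partial W_{jk}} = -(t_k - O_k) \cdot \text{sigmoid}\left(\sum_j W_{jk} \cdot O_j\right)\left(1 - \text{sigmoid}\left(\sum_j W_{jk} \cdot O_j\right)\right) \cdot O_j$$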
Correct answer by Graph4Me Consultant on December 21, 2020