Why is it valid to remove a constant factor from the derivative of an error function?

Data Science Asked on December 21, 2020

I was reading the book ‘Make your own neural network’ by Tariq Rashid. In his book, he said:

[Image: the book's expression for the derivative of the error $E$ with respect to the weight $W_{jk}$]

(Note – He’s talking about normal feed forward neural networks)

Here $t_k$ is the target value at node $k$, $O_k$ is the predicted output at node $k$, $W_{jk}$ is the weight connecting nodes $j$ and $k$, and $E$ is the error at node $k$.

He then says that we can remove the $2$, because we only care about the direction of the slope of the error function and the $2$ is just a scaling factor. So can't we also remove $\text{sigmoid}\left(\sum_{j} W_{jk} \cdot O_j\right)$? We know it lies between $0$ and $1$, so it would also just act as a scaling factor. Following that logic, we could remove everything after $(t_k - O_k)$, since the whole expression lies between $0$ and $1$ and so would just act as a scaling factor. That leaves us with just:

$$t_k-O_k$$

which is definitely the wrong derivative.

If we can't remove that whole expression, then why did he remove the $2$, given that both were scaling factors?

One Answer

You can remove the factor because:

  1. It is constant with respect to the variables you differentiate with respect to.

  2. The constant is positive.

  3. The same constant factor appears for all variables you compute the derivative with respect to. So if $\nabla f(\mathbf{W})$ is the correct gradient, setting the constant factor to $1$ gives $\frac{1}{2} \nabla f(\mathbf{W})$.
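The three conditions can be checked numerically. The following is a minimal sketch, assuming a squared error $E = \sum_k (t_k - O_k)^2$ and a single sigmoid output layer; the layer sizes, weights, and targets are made-up values chosen only for illustration. It compares the direction of the full gradient with two variants: one with the constant $2$ dropped, and one with the sigmoid slope dropped.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical layer: 3 incoming outputs O_j feeding 2 output nodes k.
o_hidden = np.array([0.2, 0.7, 0.9])
weights = np.array([[0.5, 2.0],
                    [0.1, 3.0],
                    [-0.6, 2.0]])          # weights[j, k] plays the role of W_jk
targets = np.array([1.0, 0.0])             # t_k

o_out = sigmoid(o_hidden @ weights)        # O_k
slope = o_out * (1.0 - o_out)              # sigmoid derivative, differs per node k

# Gradient of the assumed E = sum_k (t_k - O_k)^2 with respect to every W_jk.
full_grad = -2.0 * np.outer(o_hidden, (targets - o_out) * slope)

# Dropping the constant 2 rescales every entry by the same amount.
no_two = full_grad / 2.0

# Dropping the sigmoid slope rescales each column k by a different amount.
no_slope = -2.0 * np.outer(o_hidden, targets - o_out)

def cosine(a, b):
    """Cosine of the angle between two gradients, treated as flat vectors."""
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))

same_direction = cosine(full_grad, no_two)        # equal to 1 up to rounding
changed_direction = cosine(full_grad, no_slope)   # noticeably below 1
print(same_direction, changed_direction)
```

In this sketch the sigmoid slope is positive everywhere, but it takes a different value at each output node, so removing it changes the direction of the gradient instead of only its length. That is the sense in which it fails the conditions above while the $2$ satisfies them.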

When you use (stochastic) gradient descent, you scale the gradient anyway (by the learning rate / step size). So what matters is having the correct gradient up to a scaling factor, which must be positive and independent of the variables.
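To illustrate how the learning rate absorbs a constant factor, here is a small sketch using a hypothetical one-dimensional loss $f(w) = (w - 3)^2$, which is not from the book. Gradient descent with the full gradient and step size $0.1$ is compared against gradient descent with half the gradient and step size $0.2$.

```python
def grad(w):
    # Gradient of the hypothetical loss f(w) = (w - 3)^2.
    return 2.0 * (w - 3.0)

def descend(grad_fn, w0, learning_rate, steps):
    """Plain gradient descent; returns every iterate including the start."""
    w = w0
    path = [w]
    for _ in range(steps):
        w = w - learning_rate * grad_fn(w)
        path.append(w)
    return path

# Full gradient with step size 0.1.
full = descend(grad, w0=0.0, learning_rate=0.1, steps=20)

# Gradient with the factor 2 dropped, compensated by doubling the step size.
halved = descend(lambda w: 0.5 * grad(w), w0=0.0, learning_rate=0.2, steps=20)

# Largest difference between the two sequences of iterates.
max_gap = max(abs(a - b) for a, b in zip(full, halved))
print(max_gap, full[-1])
```

Both runs follow the same path toward $w = 3$, so dropping the factor amounts to nothing more than a different choice of learning rate.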

You could also put the scaling factor $\frac{1}{2}$ into the loss function $L$ itself. If $L$ is the unscaled loss and $L' = \frac{1}{2}L$, then $L(\mathbf{w})$ is minimal if and only if $L'(\mathbf{w})$ is minimal.
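As a worked example, assume the unscaled loss is the squared error $L = \sum_k (t_k - O_k)^2$ (an assumption here, using the symbols from the question). Since only $O_k$ depends on $W_{jk}$, the chain rule gives

$$\frac{\partial L}{\partial W_{jk}} = -2\,(t_k - O_k)\,\frac{\partial O_k}{\partial W_{jk}}, \qquad \frac{\partial L'}{\partial W_{jk}} = \frac{1}{2}\,\frac{\partial L}{\partial W_{jk}} = -(t_k - O_k)\,\frac{\partial O_k}{\partial W_{jk}}.$$

Under this assumption, the factor $2$ never appears when differentiating the scaled loss $L'$, and both losses have the same minimizers.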

Correct answer by Graph4Me Consultant on December 21, 2020
