Data Science Asked by rajarshi on February 27, 2021
Why is it necessary to calculate the derivative of activation functions while updating model (regression or NN) parameters? Why is the constant gradient of linear functions considered a disadvantage?
As far as I know, when we do stochastic gradient descent using the formula:
$$\text{weight} = \text{weight} + \text{learning rate} \times (\text{actual output} - \text{predicted output}) \times \text{input}$$
then the weights still get updated fine, so why is the calculation of the derivative considered so important?
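For concreteness, here is a minimal sketch of that update rule applied to a single linear neuron; the data, learning rate, and epoch count are made up for illustration:

```python
import numpy as np

# Minimal sketch of the update rule above on a single linear neuron.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # 100 samples, 3 features
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w                           # actual outputs

w = np.zeros(3)                          # weights to learn
lr = 0.01                                # learning rate
for _ in range(20):                      # a few passes over the data
    for x_i, y_i in zip(X, y):
        pred = x_i @ w                   # predicted output
        w = w + lr * (y_i - pred) * x_i  # weight += lr * error * input

print(w)  # close to true_w after training
```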
As the name suggests, Gradient Descent (GD) optimization works on the principle of the gradient, which is the vector of all partial derivatives of a function. According to Wikipedia,
In vector calculus, the gradient is a multi-variable generalization of the derivative.
At its core, GD computes the derivatives of a composite function (a neural network is itself a composite function) because of the gradient descent update rule:
$$\theta = \theta - \alpha \frac{\partial J}{\partial \theta}$$
where $\theta$ is the parameter which needs to be optimized. In a neural network, this parameter could be a weight or a bias. $J$ is the objective function (the loss function in a NN) which needs to be minimized. So for $\frac{\partial J}{\partial \theta}$, we need to repeatedly apply the chain rule until we have the derivative of the loss function with respect to that parameter.
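To see where the activation function's derivative enters $\frac{\partial J}{\partial \theta}$, here is a minimal sketch under an assumed setup (one sigmoid neuron with squared-error loss; all values are made up, not from the original post):

```python
import numpy as np

# Assumed setup: one sigmoid neuron, squared-error loss.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y = 2.0, 1.0           # one input and one target (made-up values)
w, b = 0.5, 0.1           # parameters theta to optimize
alpha = 0.1               # learning rate

z = w * x + b             # pre-activation
a = sigmoid(z)            # activation = predicted output
J = 0.5 * (a - y) ** 2    # squared-error loss

# Chain rule: dJ/dw = dJ/da * da/dz * dz/dw
dJ_da = a - y             # derivative of the loss w.r.t. the activation
da_dz = a * (1.0 - a)     # derivative of the sigmoid -- the activation
                          # derivative the question asks about
dz_dw = x                 # derivative of the pre-activation w.r.t. w
dJ_dw = dJ_da * da_dz * dz_dw

w = w - alpha * dJ_dw     # the gradient descent update rule from above
```

Note that for a linear activation $f(z) = z$, the factor $\frac{da}{dz}$ is the constant $1$, so it carries no information about the input; stacking such layers also collapses the whole network into a single linear function, which is why a constant gradient is considered a disadvantage.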
Intuition:
When GD is far from the minimum of the objective (where it is headed), the magnitude of $\frac{\partial J}{\partial \theta}$ is larger, and therefore the update to $\theta$ is larger; as it approaches the minimum, the gradient and hence the step size shrink. The gradient is scaled by the learning rate ($\alpha$), and the negative sign indicates that we move in the direction opposite to the gradient.
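A quick numerical sketch (using the toy objective $J(\theta) = \theta^2$, which is not in the original answer) shows this behaviour:

```python
# Toy objective J(theta) = theta**2, with dJ/dtheta = 2 * theta.
# Far from the minimum at theta = 0 the gradient (and the step) is large;
# it shrinks automatically as theta approaches the minimum.
alpha = 0.1   # learning rate
theta = 10.0  # start far from the minimum
for _ in range(5):
    grad = 2 * theta              # dJ/dtheta
    print(f"theta={theta:7.4f}  grad={grad:7.4f}  step={-alpha * grad:8.4f}")
    theta = theta - alpha * grad  # theta <- theta - alpha * dJ/dtheta
```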
Answered by Shubham Panchal on February 27, 2021