
Chain function in backpropagation

Data Science Asked by Alon on September 21, 2020

I’m reading a Neural Networks Tutorial. In order to answer my question you might have to take a brief look at it.

I understand everything until something they declare as "chain function":

[image: the chain-rule equation from the tutorial]

It’s located under 4.5 Backpropagation in depth.

I know that the chain rule says that the derivative of a composite function equals the product of the derivative of the inner function and the derivative of the outer function, but I still don't understand how they got to this equation.

Update:

Please read the article and try to only use the terms and notation that are declared there. It's confusing and tough enough anyway, so I really don't need extra knowledge now. Just use the same "language" as in the article and explain how they arrived at that equation.

2 Answers

This is the Neural Network -

[image: the network diagram from the tutorial]

This is the equation -


$\frac{\partial J}{\partial w_{12}^{(2)}} = \frac{\partial J}{\partial h_{1}^{(3)}} \frac{\partial h_{1}^{(3)}}{\partial z_{1}^{(2)}} \frac{\partial z_{1}^{(2)}}{\partial w_{12}^{(2)}}$

$J$ - the calculated loss

$w_{12}^{(2)}$ - the weight with respect to which the rate of change of the loss is calculated

$h_{1}^{(3)}$ - the output of the output neuron after its activation [this is compared with the true value to get the loss]

$z_{1}^{(2)}$ - the output-neuron input before the activation function is applied

We build the chain so that each individual partial derivative is easy to calculate, and so that we obtain the derivative relating two quantities that are not directly connected, i.e. an inner-layer weight and the loss.

First term - Derivative of the loss w.r.t. the output -
$J = (y_{true} - h_{1}^{(3)})^2$ [Assuming a squared loss]
The derivative w.r.t. $h_{1}^{(3)}$ will be $2(h_{1}^{(3)} - y_{true})$
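As a sanity check, here is a minimal plain-Python sketch (the names and numbers are mine, not from the article) that compares this derivative with a finite-difference estimate:

```python
# Quick numerical check of dJ/dh for the squared loss
def loss(h, y_true):
    return (y_true - h) ** 2       # squared loss J

h, y_true = 0.7, 1.0
analytic = 2 * (h - y_true)        # dJ/dh = 2(h - y_true)

eps = 1e-6
numeric = (loss(h + eps, y_true) - loss(h - eps, y_true)) / (2 * eps)

print(analytic, numeric)           # both approximately -0.6
```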

Second term - Derivative of the output after activation w.r.t. the output before activation -
$h_{1}^{(3)} = z_{1}^{(2)}$ [Assuming a linear output activation]
The derivative will be 1.
If the activation is a sigmoid, call the sigmoid $f(x)$:
$h_{1}^{(3)} = f(z_{1}^{(2)})$
We know the derivative of the sigmoid is $f(x)(1 - f(x))$,
so the derivative is $f(z_{1}^{(2)})(1 - f(z_{1}^{(2)}))$
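Here is a small NumPy sketch (names are mine) that checks $f'(x) = f(x)(1 - f(x))$ against a numerical derivative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)                 # f(x) * (1 - f(x))

z = 0.3                                  # some pre-activation value, e.g. z_1^(2)
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

print(sigmoid_prime(z), numeric)         # both approximately 0.2445
```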

Third term - Derivative of the output before activation w.r.t. one of the three participating weights -
$z_{1}^{(2)} = h_{1}^{(2)} w_{11}^{(2)} + h_{2}^{(2)} w_{12}^{(2)} + h_{3}^{(2)} w_{13}^{(2)} + b$
The first term, the last term, and the bias differentiate to 0; the derivative of the middle term is $h_{2}^{(2)}$.
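Putting the three terms together, here is a minimal NumPy sketch of the full chain for this one weight (it assumes the linear output activation and squared loss used above; the numbers and variable names are made up for illustration). The product of the three partial derivatives matches a finite-difference estimate of $\frac{\partial J}{\partial w_{12}^{(2)}}$:

```python
import numpy as np

h_prev = np.array([0.2, 0.5, 0.1])   # h_1^(2), h_2^(2), h_3^(2): previous-layer outputs
w = np.array([0.4, -0.3, 0.6])       # w_11^(2), w_12^(2), w_13^(2)
b, y_true = 0.05, 1.0

def forward(w):
    z = h_prev @ w + b               # z_1^(2): weighted sum before activation
    h_out = z                        # h_1^(3): linear output activation
    return (y_true - h_out) ** 2     # squared loss J

# The three terms of the chain
z = h_prev @ w + b
h_out = z
dJ_dh = 2 * (h_out - y_true)         # dJ / dh_1^(3)
dh_dz = 1.0                          # dh_1^(3) / dz_1^(2), linear activation
dz_dw12 = h_prev[1]                  # dz_1^(2) / dw_12^(2) = h_2^(2)
chain = dJ_dh * dh_dz * dz_dw12

# Finite-difference check on w_12^(2)
eps = 1e-6
w_plus, w_minus = w.copy(), w.copy()
w_plus[1] += eps
w_minus[1] -= eps
numeric = (forward(w_plus) - forward(w_minus)) / (2 * eps)

print(chain, numeric)                # the two values agree (about -0.96)
```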

Correct answer by 10xAI on September 21, 2020

Well, maybe you should first understand how it is done in a perceptron and then fill in the gaps.

but still don't understand how they got to this equation.

In gradient descent we want to find the direction of steepest descent along the error surface, which we can do by calculating the derivative of $E$ with respect to each component of the weight vector, so we can write it like this. The negative of this gradient gives the steepest descent direction.

$\nabla E(\vec{w}) = \left[\frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \ldots, \frac{\partial E}{\partial w_n}\right]$
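For illustration, a tiny sketch of one run of gradient descent on a toy error surface (the surface and the names are mine, just to show the update rule $w \leftarrow w - \eta \nabla E$):

```python
import numpy as np

# Toy error surface E(w) = ||w - w_star||^2, whose gradient is 2 * (w - w_star)
w_star = np.array([1.0, -2.0])
w = np.zeros(2)
eta = 0.1                            # learning rate

for _ in range(50):
    grad_E = 2 * (w - w_star)        # dE/dw_i for each weight component
    w = w - eta * grad_E             # step along the negative gradient

print(w)                             # approaches w_star = [1, -2]
```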

Now, in backpropagation, to put it in very simple terms, we have all these weights (now from one unit to another, since it is a network) which we want to tune so as to minimize the error for the complete network. For that we calculate the error in the forward pass and then go back and tune the weights accordingly. This time we have to figure out how to do it when there are weights from one node to another (your article is trying to show you this). The good news is that we have an equation which tells us how to do it for a single weight. From there it is basically straightforward: we just need to keep track of all these weights from one unit to another.

So now we have weights like this, from one unit to another, and the weighted sum into each unit:

[image: the network weights and the weighted sum into each unit]

And we again solve it by applying the chain rule, like this:

[image: the chain-rule expansion of the derivative of $E$ with respect to a weight]

Now, in your article they are doing the same thing; they wrote this equation to explain how a change in a weight can influence the error of the whole network, so it is just the mathematical representation. For the chain rule you can read https://www.khanacademy.org/math/ap-calculus-ab/ab-differentiation-2-new/ab-3-1a/v/chain-rule-introduction

Please read Tom Mitchell's chapter on Neural Networks.

Here is another answer to it: Compute backpropagation

Answered by BlackCurrant on September 21, 2020
