Data Science Asked by cdr on August 7, 2020
Hi, I am working on a simple convolutional neural network (image attached below). The input image is 5×5, the kernel is 2×2, and it goes through a ReLU activation function. After the ReLU it is max pooled with a 2×2 pool, the result is flattened, and it then heads into the fully connected layer. The outputs of the fully connected layer are converted into Softmax probabilities. I have propagated forward through the network and am now working on the backpropagation steps. I have taken the derivative of cross entropy and softmax, and calculated the derivatives of the weights in the fully connected layer.
Where I get confused is how to perform backpropagation through max pooling and then, ultimately, find the derivatives of the weights in the convolution layer.
What I've found online is that you need to find the derivative of the loss with respect to the flattened layer, but I am unsure how to do that.
If I could get some help with an explanation, ideally with equations, it would be awesome.
Cross-posted on Stack Overflow (https://stackoverflow.com/questions/63022091/back-propagation-through-a-simple-convolutional-neural-network)
The backpropagation algorithm assigns a share of the loss to each weight in the network. To get the gradient associated with each weight, we need to propagate the error back to that weight's layer using the chain rule.
The derivative of a layer depends on the function that layer applies. The flattening layer simply reshapes (re-maps) the values, so it adds nothing to the gradient at this stage. All you need to know is how the flattening was performed.
For example if the forward pass flattening is
$\operatorname{flatten}\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix}$,
then you can easily map the associated cost so far back to the $2 \times 2 \times 1$ feature map.
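To make that concrete, here is a minimal NumPy sketch (the feature-map values and the gradient numbers are made up for illustration): the backward pass of a flatten layer is just the inverse reshape of the incoming gradient.

```python
import numpy as np

# Hypothetical 2x2 feature map coming out of the pooling layer (forward pass).
feature_map = np.array([[11.0, 12.0],
                        [15.0, 16.0]])

# Forward: flattening is nothing more than a reshape into a vector.
flat = feature_map.reshape(-1)            # [11, 12, 15, 16]

# Backward: made-up gradients of the loss w.r.t. the flattened vector,
# as delivered by the fully connected layer.
grad_flat = np.array([0.1, -0.2, 0.3, 0.05])

# The backward pass of a reshape is simply the inverse reshape: each gradient
# goes back to the position its value came from.
grad_feature_map = grad_flat.reshape(feature_map.shape)
print(grad_feature_map)                   # [[ 0.1  -0.2 ]
                                          #  [ 0.3   0.05]]
```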
In the forward pass the max pooling layer takes the maximum value in a $3 \times 3$ window that is slid along your image (a $2 \times 2$ pool works the same way). For example, the bold values in the first $3 \times 3$ window have a maximum of $11$.
$\operatorname{maxpooling}\begin{pmatrix} \mathbf{1} & \mathbf{2} & \mathbf{3} & 4 \\ \mathbf{5} & \mathbf{6} & \mathbf{7} & 8 \\ \mathbf{9} & \mathbf{10} & \mathbf{11} & 12 \\ 13 & 14 & 15 & 16 \end{pmatrix} = \begin{pmatrix} \mathbf{11} & 12 \\ 15 & 16 \end{pmatrix}$
Thus the backpropagated error only passes through the values that were selected as maxima in the forward pass; for all other positions the error term does not backpropagate. The error matrix you have backpropagated up to this point therefore only flows through the positions marked with a $1$ in
$\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix}$
so only four error terms continue on to the earlier layers.
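As a minimal sketch of that routing in NumPy (the window size, stride, and upstream gradient values are assumptions matching the $4 \times 4$ example above, not the $2 \times 2$ pool from the question): each output gradient is sent back to the input position that held the window maximum.

```python
import numpy as np

def maxpool_backward(x, grad_out, size, stride):
    """Route each output gradient to the input position that held the
    window maximum in the forward pass; all other positions get zero."""
    grad_in = np.zeros_like(x, dtype=float)
    out_h, out_w = grad_out.shape
    for i in range(out_h):
        for j in range(out_w):
            window = x[i*stride:i*stride+size, j*stride:j*stride+size]
            # Position of the maximum inside this window
            r, c = np.unravel_index(np.argmax(window), window.shape)
            grad_in[i*stride + r, j*stride + c] += grad_out[i, j]
    return grad_in

x = np.arange(1, 17, dtype=float).reshape(4, 4)   # the 4x4 example above
grad_out = np.array([[0.1, -0.2],                 # made-up upstream gradients
                     [0.3, 0.05]])
print(maxpool_backward(x, grad_out, size=3, stride=1))
# Only the positions of 11, 12, 15 and 16 receive a gradient.
```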
I have gone into detail about how to do backpropagation through convolutions here: CNN backpropagation between layers.
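For completeness, here is a minimal sketch of the convolution weight gradient under the assumptions of a single channel, stride 1, and a valid convolution implemented as cross-correlation: the kernel gradient is the input correlated with the error that has already been backpropagated through the pooling layer and the ReLU. The function name and the example values are hypothetical.

```python
import numpy as np

def conv_weight_grad(x, grad_out):
    """Gradient of the loss w.r.t. a conv kernel (single channel, stride 1,
    valid cross-correlation): correlate the input with the upstream error."""
    kh = x.shape[0] - grad_out.shape[0] + 1
    kw = x.shape[1] - grad_out.shape[1] + 1
    grad_w = np.zeros((kh, kw))
    for a in range(kh):
        for b in range(kw):
            # dL/dW[a, b] = sum over output positions of grad_out * shifted input
            grad_w[a, b] = np.sum(
                x[a:a + grad_out.shape[0], b:b + grad_out.shape[1]] * grad_out
            )
    return grad_w

x = np.random.rand(5, 5)              # 5x5 input image, as in the question
grad_out = np.random.rand(4, 4)       # made-up error at the conv output (5 - 2 + 1 = 4)
print(conv_weight_grad(x, grad_out))  # shape (2, 2), matching the 2x2 kernel
```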
Correct answer by JahKnows on August 7, 2020