
Back propagation through a simple convolutional neural network

Data Science Asked on November 25, 2021

Hi, I am working on a simple convolutional neural network (image attached below). The input image is 5×5, the kernel is 2×2, and the output goes through a ReLU activation function. After ReLU it is max pooled with a 2×2 pool; the result is then flattened and fed into the fully connected layer. The outputs of the fully connected layer are converted into softmax probabilities. I have propagated forward through the network and am now working on the backpropagation steps. I have taken the derivative of cross entropy and softmax, and calculated the gradients for the weights in the fully connected layer.

Where I get confused is how to perform backpropagation through max pooling and then ultimately find the derivatives of the weights in the convolutional layer.

What I've found online is that you need to find the derivative of the loss with respect to the flattened layer, but I am unsure how to do that.

If I could get some help with an explanation, ideally with equations, that would be awesome.

Cross-posted on Stack Overflow (https://stackoverflow.com/questions/63022091/back-propagation-through-a-simple-convolutional-neural-network)

[Figure: diagram of the network described above: 5×5 input, 2×2 convolution, ReLU, 2×2 max pooling, flatten, fully connected layer, softmax]

One Answer

The backpropagation algorithm assigns a share of the loss to each weight in the network. To get the gradient for each weight, we need to propagate the error back to that weight's layer using the chain rule.

Flattening layer

The derivative of a layer depends on the function being applied. The flattening layer simply reshapes (remaps) the values, so it contributes nothing extra to the gradient. All you need to know is how the flattening occurred, so that you can reverse it.

For example if the forward pass flattening is

$\mathrm{flatten}\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix}$,

then you can easily map the error accumulated so far back to the $2 \times 2 \times 1$ feature map.
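As a concrete illustration (a minimal NumPy sketch of my own, not code from the question), the backward pass of a flattening layer is just the inverse reshape of the incoming gradient:

```python
import numpy as np

# Flattening is only a reshape, so its backward pass is the inverse reshape
# of the gradient arriving from the fully connected layer.
feature_map = np.array([[1.0, 2.0],
                        [3.0, 4.0]])        # 2x2 feature map out of the pooling layer

flat = feature_map.reshape(-1, 1)           # forward: (2, 2) -> (4, 1) column vector

# Example upstream gradient dL/d(flat) (values are made up for illustration):
d_flat = np.array([[0.10], [-0.20], [0.30], [0.05]])

# Backward: undo the reshape so each gradient lines up with its feature-map entry.
d_feature_map = d_flat.reshape(feature_map.shape)
print(d_feature_map)
# [[ 0.1  -0.2 ]
#  [ 0.3   0.05]]
```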

Max pooling layer

In the forward pass the max pooling layer takes the maximum value in a $3 \times 3$ window that is slid across your feature map. For example, the bold values in the first $3 \times 3$ window below have a maximum of $11$.

$\mathrm{maxpool}\begin{pmatrix} \mathbf{1} & \mathbf{2} & \mathbf{3} & 4 \\ \mathbf{5} & \mathbf{6} & \mathbf{7} & 8 \\ \mathbf{9} & \mathbf{10} & \mathbf{11} & 12 \\ 13 & 14 & 15 & 16 \end{pmatrix} = \begin{pmatrix} \mathbf{11} & 12 \\ 15 & 16 \end{pmatrix}$

Thus the error backpropagates only through the values that were selected as maxima in the forward pass; for every other value the error term is zero. The error matrix you have backpropagated up to this point is therefore routed back to the input of the pooling layer only at the positions marked by

$\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix}$

Thus only 4 error terms continue on to earlier layers.
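Here is a minimal NumPy sketch of that routing (my own illustration, using the $3 \times 3$, stride-1 window from the example above and a made-up upstream gradient):

```python
import numpy as np

x = np.array([[ 1,  2,  3,  4],
              [ 5,  6,  7,  8],
              [ 9, 10, 11, 12],
              [13, 14, 15, 16]], dtype=float)

window = 3                                   # 3x3 window, stride 1
out_h = x.shape[0] - window + 1
out_w = x.shape[1] - window + 1

d_out = np.array([[0.5, -1.0],               # made-up upstream gradient dL/d(out)
                  [2.0,  0.25]])
d_x = np.zeros_like(x)                       # gradient routed back to the pooling input

for i in range(out_h):
    for j in range(out_w):
        patch = x[i:i + window, j:j + window]
        # Only the position that produced the maximum receives the error term.
        mask = (patch == patch.max())
        d_x[i:i + window, j:j + window] += mask * d_out[i, j]

print(d_x)
# Non-zero entries appear only at the positions of 11, 12, 15 and 16,
# i.e. the bottom-right 2x2 block, matching the mask above.
```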

Convolutional layers

I have gone into detail about how to do backpropagation through convolutions here: CNN backpropagation between layers.
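As a rough sketch of the idea covered there (my own illustration, not the linked answer's code): for a single-channel, stride-1, "valid" convolution implemented as cross-correlation, the gradient of the loss with respect to the kernel is the cross-correlation of the layer's input with the error map that has been routed back through pooling and ReLU. The variable names below are assumptions made for the example:

```python
import numpy as np

def conv2d_valid(x, k):
    """Stride-1 'valid' cross-correlation (how most frameworks implement conv)."""
    out_h = x.shape[0] - k.shape[0] + 1
    out_w = x.shape[1] - k.shape[1] + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out

x = np.arange(25, dtype=float).reshape(5, 5)     # 5x5 input image
w = np.array([[1.0, 0.5],
              [0.5, 1.0]])                       # 2x2 kernel

z = conv2d_valid(x, w)                           # 4x4 pre-activation map

# Pretend dL/dz has already been routed back through pooling; apply the
# ReLU derivative (1 where z > 0, else 0).
d_z = np.ones_like(z) * (z > 0)

# dL/dw: correlate the input with the error map; the result has the kernel's shape.
d_w = conv2d_valid(x, d_z)
print(d_w.shape)   # (2, 2)
print(d_w)
```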

Answered by JahKnows on November 25, 2021
