Asked on Cross Validated, December 13, 2021
Online tutorials describe in depth the convolution of an image with a filter; however, I have not seen one that describes backpropagation through the filter (at least not visually).
First let me try to explain how I understand backpropagation on a fully connected network.
For example, the derivative of the $Error$ with respect to $W_1$ is the following:
$$
\frac{\partial Error}{\partial W_1} = \frac{\partial Error}{\partial HA_1} \frac{\partial HA_1}{\partial H_1} \frac{\partial H_1}{\partial W_1}
$$
The last partial derivative is the most interesting one in this case, and it is equal to the value of the first input (a single value):
$$
\frac{\partial H_1}{\partial W_1} = I_1
$$
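To make the chain rule concrete, here is a minimal numpy sketch of this single-weight case. The sigmoid activation, squared error, and the specific numbers are my own assumptions for illustration, not part of the original setup; only the factorization of the gradient mirrors the equation above.

```python
import numpy as np

# Minimal sketch of the chain rule for one hidden unit.
# Assumptions (not from the original post): sigmoid activation
# and squared error against a target t.
I1, W1, t = 0.5, 0.8, 1.0            # input, weight, target

H1 = W1 * I1                          # pre-activation
HA1 = 1.0 / (1.0 + np.exp(-H1))       # activation: HA_1 = sigmoid(H_1)
Error = 0.5 * (HA1 - t) ** 2

dError_dHA1 = HA1 - t                 # dError/dHA_1
dHA1_dH1 = HA1 * (1.0 - HA1)          # dHA_1/dH_1 (sigmoid derivative)
dH1_dW1 = I1                          # dH_1/dW_1 = I_1 (the input)

dError_dW1 = dError_dHA1 * dHA1_dH1 * dH1_dW1
print(dError_dW1)
```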
EDITED
The original question was how one performs backpropagation on a convolutional layer, for example: $$\frac{\partial Error}{\partial W_1} = \ ?$$
Here is the convolutional layer as typically described online: a $2 \times 2$ filter $W$ slid over a $3 \times 3$ input $V$ with stride 1, producing four outputs $G$.
$$
G_1 = V_1W_1 + V_2W_2 + V_4W_3 + V_5W_4 \\
G_2 = V_2W_1 + V_3W_2 + V_5W_3 + V_6W_4 \\
G_3 = V_4W_1 + V_5W_2 + V_7W_3 + V_8W_4 \\
G_4 = V_5W_1 + V_6W_2 + V_8W_3 + V_9W_4
$$
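As a sanity check, the four equations above are exactly a $2 \times 2$ filter slid over a $3 \times 3$ input with stride 1. A small numpy sketch (the numeric values of $V$ and $W$ are made up for illustration):

```python
import numpy as np

# The four G equations above: a 2x2 filter W slid over a
# 3x3 input V with stride 1. Values are made up.
V = np.arange(1.0, 10.0).reshape(3, 3)   # V_1 ... V_9, row-major
W = np.array([[0.1, 0.2],
              [0.3, 0.4]])               # W_1 ... W_4

G = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        # e.g. G_1 = V_1*W_1 + V_2*W_2 + V_4*W_3 + V_5*W_4
        G[i, j] = np.sum(V[i:i+2, j:j+2] * W)
print(G)
```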
Notice that there are groups of pixels that share the same weights ($W$s): each weight $W_k$ multiplies a different input value in each of the four windows.
So, applying the chain rule as in the first example, we get the following:
$$
\frac{\partial Error}{\partial W_1} = \frac{\partial Error}{\partial G_1}\frac{\partial G_1}{\partial W_1} + \frac{\partial Error}{\partial G_2}\frac{\partial G_2}{\partial W_1} + \frac{\partial Error}{\partial G_3}\frac{\partial G_3}{\partial W_1} + \frac{\partial Error}{\partial G_4}\frac{\partial G_4}{\partial W_1}
$$
And the derivatives of interest …
$$
\frac{\partial G_1}{\partial W_1} = V_1 \\
\frac{\partial G_2}{\partial W_1} = V_2 \\
\frac{\partial G_3}{\partial W_1} = V_4 \\
\frac{\partial G_4}{\partial W_1} = V_5
$$
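Putting the pieces together: the gradient of each weight is the sum, over all windows, of the upstream gradient times the input value that weight saw in that window. A sketch under the same made-up values as before; the upstream gradient `dG` is assumed given (it would flow back from later layers):

```python
import numpy as np

# Gradient of the filter, following the chain-rule sum above.
# dError/dG is assumed given; here it is a made-up 2x2 array dG.
V = np.arange(1.0, 10.0).reshape(3, 3)   # V_1 ... V_9
dG = np.array([[0.1, -0.2],
               [0.05, 0.3]])             # dError/dG_1 ... dError/dG_4

dW = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        # each window contributes dG_k times the patch it covered
        dW += dG[i, j] * V[i:i+2, j:j+2]

# dW[0, 0] is dError/dW_1 = dG_1*V_1 + dG_2*V_2 + dG_3*V_4 + dG_4*V_5
print(dW)
```

Note that this is itself a (cross-)correlation of the input with the upstream gradient, which is why the filter gradient in a conv layer can be computed as another convolution.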
That’s it! Chain rule all the way.
I don't like to answer my own question, so if you leave some feedback or tell me I am wrong, I'll give you the credit.
Could you not simply say that backpropagation through a convolutional layer is the sum of the backpropagation through each part (sliding window) of the image/tensor that the convolution covers?
This matters because it reflects the fact that the weights are shared across multiple pixels, so the weights come to capture general local features of the image independently of where they appear.
Answered by Dimitri Ognibene on December 13, 2021