Asked by user63067 on August 13, 2021
I know how to calculate the error for the weights based on the output and how to update the weights between the output–hidden and hidden–input layers.
The problem is that I have no idea how to calculate the delta for the values in the input layer based on the error and then use it in the convolution backpropagation.
Let's look at the layers before the reshaping stage since everything after that is simply a densely connected neural network.
Max pooling takes a window of values and passes only the maximum through. This means the error can only flow back through the positions that held the maximum, so only the weights that produced those values receive an update.
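As an illustration (not part of the original answer), here is a minimal numpy sketch of how a 2x2 max-pooling layer routes gradients back only to the winning positions; the function names and the shapes are made up for this sketch.

import numpy as np

def maxpool2x2_forward(x):
    # Keep the maximum of each non-overlapping 2x2 window and remember
    # where it came from so the backward pass can route the gradient.
    H, W = x.shape
    out = np.zeros((H // 2, W // 2))
    winners = {}
    for i in range(0, H, 2):
        for j in range(0, W, 2):
            window = x[i:i + 2, j:j + 2]
            r, c = np.unravel_index(np.argmax(window), window.shape)
            out[i // 2, j // 2] = window[r, c]
            winners[(i // 2, j // 2)] = (i + r, j + c)
    return out, winners

def maxpool2x2_backward(dout, winners, input_shape):
    # Every non-max position gets zero gradient; each max position simply
    # receives the corresponding upstream delta unchanged.
    dx = np.zeros(input_shape)
    for (oi, oj), (i, j) in winners.items():
        dx[i, j] = dout[oi, oj]
    return dx

Applied to the 4x4 convolution output in this example, this routing is what produces a delta map with exactly one non-zero entry per pooling window, as in the matrix d used later in this answer.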
This works the same way as for the densely connected layer: you take the derivative of the cross-correlation function (the mathematically accurate name for what a convolution layer computes) and then use that derivative in the backpropagation algorithm.
Let's look at the following example.
The forward pass of the convolutional layer can be expressed as
$x_{i, j}^l = \sum_m \sum_n w_{m,n}^l o_{i+m, j+n}^{l-1} + b_{i, j}^l$
where $m$ and $n$ iterate across the dimensions of the kernel, and $k_1$ and $k_2$ are the kernel dimensions, in our case $k_1 = k_2 = 2$. This gives, for example, the output $x_{0,0} = 0.25$ that you found.
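As a quick sketch of this forward pass (not from the original post), the cross-correlation can be computed directly with scipy. The 5x5 input below is the one used in the gradient calculation at the end of this answer, but the kernel values other than $w^1_{0,0} = -0.13$ and the zero bias are assumptions made purely for illustration.

import numpy as np
from scipy import signal

# 5x5 input feature map o^{l-1}
o = np.array([(0.51, 0.9, 0.88, 0.84, 0.05),
              (0.4, 0.62, 0.22, 0.59, 0.1),
              (0.11, 0.2, 0.74, 0.33, 0.14),
              (0.47, 0.01, 0.85, 0.7, 0.09),
              (0.76, 0.19, 0.72, 0.17, 0.57)])

# 2x2 kernel: only w[0, 0] = -0.13 is given in the post, the rest are made up
w = np.array([(-0.13, 0.15),
              (0.2, -0.05)])
b = 0.0  # bias, assumed zero here

# x^l_{i,j} = sum_m sum_n w_{m,n} o_{i+m, j+n} + b  (cross-correlation)
x = signal.correlate2d(o, w, 'valid') + b
print(x.shape)  # (4, 4): matches the 4x4 output you showed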
Assuming you are using the mean squared error (MSE) defined as
$E = \frac{1}{2}\sum_p (t_p - y_p)^2$,
we want to determine
$\frac{\partial E}{\partial w^l_{m', n'}}$ in order to update the weights. $m'$ and $n'$ are the indices in the kernel matrix, not to be confused with the iterators $m$ and $n$. For example, $w^1_{0,0} = -0.13$ in our example. We can also see that for an input image of size $H \times W$ the output dimension after the convolutional layer will be
$(H-k_1+1) \times (W-k_2+1)$.
In our case that would be $4 \times 4$, as you showed. Let's calculate the error term. Each term in the output space has been influenced by the kernel weights. The kernel weight $w^1_{0,0} = -0.13$ contributed to the output $x^1_{0,0} = 0.25$ and to every other output. Thus we express its contribution to the total error as
$\frac{\partial E}{\partial w^l_{m', n'}} = \sum_{i=0}^{H-k_1} \sum_{j=0}^{W-k_2} \frac{\partial E}{\partial x^l_{i, j}} \frac{\partial x^l_{i, j}}{\partial w^l_{m', n'}}$.
This iterates across the entire output space, determines the error each output contributes, and then determines how much the kernel weight contributed to that output.
For simplicity, and to keep track of the backpropagated error, let us call the contribution to the error from the output space $\delta$:
$\frac{\partial E}{\partial x^l_{i, j}} = \delta^l_{i,j}$.
The convolution is defined as
$x_{i, j}^l = \sum_m \sum_n w_{m,n}^l o_{i+m, j+n}^{l-1} + b_{i, j}^l$,
thus,
$\frac{\partial x^l_{i, j}}{\partial w^l_{m', n'}} = \frac{\partial}{\partial w^l_{m', n'}} \left( \sum_m \sum_n w_{m,n}^l o_{i+m, j+n}^{l-1} + b_{i, j}^l \right)$.
By expanding the summation we see that the derivative is non-zero only when $m = m'$ and $n = n'$. We then get
$\frac{\partial x^l_{i, j}}{\partial w^l_{m', n'}} = o^{l-1}_{i+m', j+n'}$.
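To make this concrete for the $2 \times 2$ kernel of this example, expanding the summation gives
$x^l_{i,j} = w^l_{0,0} o^{l-1}_{i,j} + w^l_{0,1} o^{l-1}_{i,j+1} + w^l_{1,0} o^{l-1}_{i+1,j} + w^l_{1,1} o^{l-1}_{i+1,j+1} + b^l_{i,j}$,
so differentiating with respect to, say, $w^l_{0,1}$ leaves only the single term $o^{l-1}_{i,j+1}$, which is exactly the case $m = m' = 0$, $n = n' = 1$.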
Then, back in our error term,
$\frac{\partial E}{\partial w^l_{m', n'}} = \sum_{i=0}^{H-k_1} \sum_{j=0}^{W-k_2} \delta_{i,j}^l o^{l-1}_{i+m', j+n'}$.
The weights are then updated with gradient descent,
$w^{(t+1)} = w^{(t)} - \eta \frac{\partial E}{\partial w^l_{m', n'}}$.
Let's calculate some of these gradient values:
import numpy as np
from scipy import signal

# o: the 5x5 input feature map o^{l-1} feeding the convolutional layer
o = np.array([(0.51, 0.9, 0.88, 0.84, 0.05),
              (0.4, 0.62, 0.22, 0.59, 0.1),
              (0.11, 0.2, 0.74, 0.33, 0.14),
              (0.47, 0.01, 0.85, 0.7, 0.09),
              (0.76, 0.19, 0.72, 0.17, 0.57)])

# d: the 4x4 map of backpropagated deltas; only the positions that
# survived max pooling carry a non-zero error
d = np.array([(0, 0, 0.0686, 0),
              (0, 0.0364, 0, 0),
              (0, 0.0467, 0, 0),
              (0, 0, 0, -0.0681)])

# Rotating d by 180 degrees and convolving is equivalent to
# cross-correlating o with d, which is exactly the gradient formula above.
gradient = signal.convolve2d(np.rot90(d, 2), o, 'valid')
print(gradient)
# [[ 0.044606  0.094061]
#  [ 0.011262  0.068288]]
Now you can put that into the SGD equation in place of $\frac{\partial E}{\partial w}$.
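For reference, here is a direct loop translation of the gradient formula together with the update step (not part of the original answer); the learning rate $\eta = 0.01$ and the kernel values other than $w_{0,0} = -0.13$ are assumptions chosen only for illustration.

import numpy as np

o = np.array([(0.51, 0.9, 0.88, 0.84, 0.05),
              (0.4, 0.62, 0.22, 0.59, 0.1),
              (0.11, 0.2, 0.74, 0.33, 0.14),
              (0.47, 0.01, 0.85, 0.7, 0.09),
              (0.76, 0.19, 0.72, 0.17, 0.57)])
d = np.array([(0, 0, 0.0686, 0),
              (0, 0.0364, 0, 0),
              (0, 0.0467, 0, 0),
              (0, 0, 0, -0.0681)])

H, W = o.shape
k1 = k2 = 2

# dE/dw_{m',n'} = sum_i sum_j delta_{i,j} * o_{i+m', j+n'}
grad = np.zeros((k1, k2))
for m in range(k1):
    for n in range(k2):
        for i in range(H - k1 + 1):
            for j in range(W - k2 + 1):
                grad[m, n] += d[i, j] * o[i + m, j + n]
# grad matches the convolve2d result above

# SGD step; kernel values (apart from w[0, 0] = -0.13) and eta are made up
w = np.array([(-0.13, 0.15),
              (0.2, -0.05)])
eta = 0.01
w_new = w - eta * grad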
Please let me know if there are errors in the derivation.
Answered by JahKnows on August 13, 2021