Imagine the following structure (for simplicity there is no bias and no activation function such as sigmoid or ReLU, just weights). The input layer has two neurons, the two hidden layers have three neurons each, and the output layer has two neurons, so the cost ($\sum C$) is made up of two "subcosts" ($C^0$, $C^1$).
(I’m new to machine learning and quite confused by the different notations, formats, and indices, so to clarify: for activations, the upper index is the neuron’s index within its layer and the lower index is the index of the layer it sits in, so the third neuron of the second layer is written $a^2_1$ (0-indexed). For weights, the upper index is the index of the neuron the weight is "coming from", the first lower index is the index of the neuron it is "going to", and the second lower index is the index of the layer it is "going to", so the weight of the third layer’s first neuron that connects it to the second layer’s first neuron is written $w^0_{0,2}$.)
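With the simplifications above (no bias, no activation), each neuron’s value is just the weighted sum of the previous layer; in this notation, for example:
$$a^0_2 = \sum_{i=0}^{2} w^i_{0,2}\,a^i_1 = w^0_{0,2}a^0_1 + w^1_{0,2}a^1_1 + w^2_{0,2}a^2_1$$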
Just for the example, I would like to backpropagate along the first row’s weights. To illustrate:
a0 -w1- a1 -w2- a2 -w3- a3 -> y0 -> C0
ax      ax      ax      ax -> y1 -> C1
        ax      ax
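For concreteness, here is a minimal numpy sketch of the forward pass for this structure under the simplifications above (no bias, no activation, squared-error subcosts); all names and numbers are made up, just to pin down the shapes.

```python
import numpy as np

# Forward pass for the 2-3-3-2 structure above (illustrative values only).
a0 = np.array([0.5, -1.0])     # input layer: a^0_0, a^1_0
y  = np.array([1.0, 0.0])      # targets: y^0, y^1

W1 = np.random.randn(3, 2)     # W1[j, i] = w^i_{j,1}
W2 = np.random.randn(3, 3)     # W2[j, i] = w^i_{j,2}
W3 = np.random.randn(2, 3)     # W3[j, i] = w^i_{j,3}

a1 = W1 @ a0                   # hidden layer 1
a2 = W2 @ a1                   # hidden layer 2
a3 = W3 @ a2                   # output layer

C = (a3 - y) ** 2              # the two subcosts C^0, C^1
total_cost = C.sum()           # sum C
```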
So, to get the slope of the first weight of the output layer in the backpropagation process:
$$\frac{\partial\sum C}{\partial w^0_{0,3}} = \frac{\partial a^0_3}{\partial w^0_{0,3}}\frac{\partial\sum C}{\partial a^0_3}$$
As I learned, because $w^0_{0,3}$ doesn’t affect $C^1$, only $C^0$, $\frac{\partial\sum C}{\partial a^0_3}$ is just $2(a^0_3-y^0)$. I’m not sure how this can be generalized, though, so that it can be applied dynamically without having to check whether a weight has a direct connection to every cost.
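Spelling it out, I think the general form is still a sum over both costs, where the term for the cost that $a^0_3$ never reaches is simply zero:
$$\frac{\partial\sum C}{\partial a^0_3} = \frac{\partial C^0}{\partial a^0_3} + \frac{\partial C^1}{\partial a^0_3} = 2(a^0_3-y^0) + 0$$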
From how I’ve pieced the tutorials together, calculating the first weight of the second layer (counting backwards):
$$\frac{\partial\sum C}{\partial w^0_{0,2}} = \frac{\partial a^0_2}{\partial w^0_{0,2}}\frac{\partial\sum C}{\partial a^0_2}$$
Here $w^0_{0,2}$ is connected to $a^0_2$, which is connected to every neuron of the output layer and therefore to both costs, so:
$$\frac{\partial\sum C}{\partial a^0_2}=\frac{\partial C^0}{\partial a^0_2}+\frac{\partial C^1}{\partial a^0_2}$$
Where:
$$\frac{\partial C^0}{\partial a^0_2}=\frac{\partial a^0_3}{\partial a^0_2}\frac{\partial C^0}{\partial a^0_3}$$
and:
$$\frac{\partial C^1}{\partial a^0_2}=\frac{\partial a^1_3}{\partial a^0_2}\frac{\partial C^1}{\partial a^1_3}$$
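Substituting the weighted-sum relation from above ($\frac{\partial a^k_3}{\partial a^0_2}=w^0_{k,3}$), the two pieces combine into:
$$\frac{\partial\sum C}{\partial a^0_2}=\sum_{k=0}^{1}\frac{\partial a^k_3}{\partial a^0_2}\frac{\partial C^k}{\partial a^k_3}=w^0_{0,3}\,2(a^0_3-y^0)+w^0_{1,3}\,2(a^1_3-y^1)$$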
This is clear so far. What I don’t understand, and can’t break down from this point, is:
$$\frac{\partial\sum C}{\partial w^0_{0,1}} = \frac{\partial a^0_1}{\partial w^0_{0,1}}\frac{\partial\sum C}{\partial a^0_1}$$
Because $a^0_1$ has connections to all three neurons of the next layer. In other words,
$$\frac{\partial C^0}{\partial a^0_1}=\frac{\partial a^0_2}{\partial a^0_1}\frac{\partial C^0}{\partial a^0_2}$$
can’t be true, since $a^0_1$ affects $C^0$ not just through $a^0_2$ but through $a^1_2$ and $a^2_2$ as well. So how can this be solved? Do I have to add them up, or does it need to be solved in a different way than the previous steps?
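To make the question concrete: if "adding them up" is indeed the right approach, I believe the whole backward pass would look something like the sketch below (plain numpy, reusing the forward pass from above; the names are mine and only illustrative). Is this the correct generalization?

```python
import numpy as np

# Backward pass sketch for the same 2-3-3-2 network (no bias, no activation).
# Illustrative values only.
a0 = np.array([0.5, -1.0])
y  = np.array([1.0, 0.0])
W1 = np.random.randn(3, 2)
W2 = np.random.randn(3, 3)
W3 = np.random.randn(2, 3)

a1 = W1 @ a0
a2 = W2 @ a1
a3 = W3 @ a2

# d(sum C)/d a3: each output neuron only appears in its own subcost
delta3 = 2 * (a3 - y)

# d(sum C)/d a2: every neuron of layer 2 feeds all output neurons,
# so the contributions through each downstream neuron are added up.
# With the weighted-sum relation this is just W3 transposed times delta3.
delta2 = W3.T @ delta3

# d(sum C)/d a1: the same summing rule, one layer further back
delta1 = W2.T @ delta2

# weight slopes: outer product of a layer's delta with the previous activations,
# e.g. grad_W3[0, 0] would be d(sum C)/d w^0_{0,3}
grad_W3 = np.outer(delta3, a2)
grad_W2 = np.outer(delta2, a1)
grad_W1 = np.outer(delta1, a0)
```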