Data Science — asked by Probability is wonderful on March 6, 2021
I am self-studying Andrew Ng’s deep learning notes from the Stanford machine learning course (CS 229). The material is available here.
I have a question about the chain rule used in deriving the backpropagation step shown below (Equation 3.28 on page 12). Specifically, I wonder why $\frac{\partial z^{[2]}}{\partial W^{[2]}} = a^{[1]}$. As far as I can tell, the result cannot be a vector (i.e., $a^{[1]}$) if we differentiate a vector ($z^{[2]}$) with respect to a matrix ($W^{[2]}$).
Also, the notes subsequently say that the sizes of the two sides do not match up. This really confuses me: if the derivation is correct, how can the sizes of the two sides not be equal (see below)?
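For reference, here is a minimal NumPy sketch of the shape bookkeeping I am trying to reconcile with the notes (the layer sizes are made up, and I am assuming the gradient with respect to $W^{[2]}$ is formed as an outer product of $dz^{[2]}$ and $a^{[1]}$, which is exactly the point I would like confirmed):

```python
import numpy as np

# Hypothetical layer sizes, just to make the shape bookkeeping concrete.
n_1, n_2, m = 3, 2, 5            # hidden dim, output dim, batch size

a1 = np.random.randn(n_1, m)     # activations of layer 1, shape (n_1, m)
W2 = np.random.randn(n_2, n_1)   # weights of layer 2, shape (n_2, n_1)
b2 = np.random.randn(n_2, 1)     # bias of layer 2
z2 = W2 @ a1 + b2                # pre-activation of layer 2, shape (n_2, m)

# Suppose dz2 = dJ/dz2 has the same shape as z2, i.e. (n_2, m).
dz2 = np.random.randn(*z2.shape)

# The only way the shapes work out is if dJ/dW2 = dz2 @ a1.T,
# which has shape (n_2, n_1) — the same shape as W2 itself.
dW2 = dz2 @ a1.T
assert dW2.shape == W2.shape
```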
I would greatly appreciate it if anyone could help explain the steps here! I have spent many days and nights on this already but have made no progress at all. Thanks!