Data Science — asked by Probability is wonderful on March 6, 2021
I am self-studying Andrew Ng’s deep learning notes from the Stanford machine learning course (CS 229). The material is available here.
I have a question about the chain rule used in deriving the backpropagation step shown below (Equation 3.28 on page 12). Specifically, I wonder why $\frac{\partial z^{[2]}}{\partial W^{[2]}} = a^{[1]}$. As far as I can tell, the result cannot be a vector (i.e., $a^{[1]}$) if we differentiate a vector ($z^{[2]}$) with respect to a matrix ($W^{[2]}$).
Also, the notes subsequently say that the sizes of the two sides do not match up. This really confuses me: if the derivation is correct, how can the sizes of the two sides not be equal (see below)?
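For reference, here is a minimal NumPy sketch of the shape bookkeeping I am trying to reconcile with the notes (the layer sizes are made up, and I am assuming the gradient with respect to $W^{[2]}$ is formed as an outer product of $dz^{[2]}$ and $a^{[1]}$, which is exactly the point I would like confirmed):

```python
import numpy as np

# Hypothetical layer sizes, just to make the shape bookkeeping concrete.
n_1, n_2, m = 3, 2, 5            # hidden dim, output dim, batch size

a1 = np.random.randn(n_1, m)     # activations of layer 1, shape (n_1, m)
W2 = np.random.randn(n_2, n_1)   # weights of layer 2, shape (n_2, n_1)
b2 = np.random.randn(n_2, 1)     # bias of layer 2
z2 = W2 @ a1 + b2                # pre-activation of layer 2, shape (n_2, m)

# Suppose dz2 = dJ/dz2 has the same shape as z2, i.e. (n_2, m).
dz2 = np.random.randn(*z2.shape)

# The only way the shapes work out is if dJ/dW2 = dz2 @ a1.T,
# which has shape (n_2, n_1) — the same shape as W2 itself.
dW2 = dz2 @ a1.T
assert dW2.shape == W2.shape
```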
I would greatly appreciate it if anyone could help explain the steps here! I have spent many days and nights on this already but have made no progress at all. Thanks!