Data Science Asked on August 14, 2021
The paper I am reading is Glorot et al. (2010), and the math in question is in Section 4.2.1.
Formulas (5) and (10) make sense to me, but I cannot derive formulas (6) and (7) from (2) and (3) myself.
Many tutorials I found online use the formula
$$Var[XY] = Var[X]Var[Y] + (E[X])^2 Var[Y] + Var[X](E[Y])^2$$
which requires X and Y to be independent.
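To make the role of the independence assumption concrete, here is a minimal Monte Carlo sketch in Python with NumPy. It only checks the identity above on sampled data; it is not a derivation of anything in the paper. The distributions, sample size, and the helper name `product_variance_formula` are all illustrative assumptions. The first case draws X and Y independently, so both sides should agree. The second case sets Y = X, an extreme form of dependence, so they should not.

```python
import numpy as np


def product_variance_formula(x, y):
    """Right-hand side of the identity, estimated from samples.

    Var[X]Var[Y] + (E[X])^2 Var[Y] + Var[X] (E[Y])^2
    """
    return (x.var() * y.var()
            + x.mean() ** 2 * y.var()
            + x.var() * y.mean() ** 2)


rng = np.random.default_rng(0)
n = 1_000_000  # sample size chosen arbitrarily for this illustration

# Case 1: X and Y drawn independently (means and scales are arbitrary).
x = rng.normal(loc=1.0, scale=2.0, size=n)
y = rng.normal(loc=-0.5, scale=1.5, size=n)
lhs_indep = (x * y).var()                   # sample estimate of Var[XY]
rhs_indep = product_variance_formula(x, y)  # estimate of the formula

# Case 2: Y is a copy of X, so the two are fully dependent.
x_dep = rng.normal(loc=0.0, scale=1.0, size=n)
y_dep = x_dep
lhs_dep = (x_dep * y_dep).var()             # Var[X^2] for a standard normal
rhs_dep = product_variance_formula(x_dep, y_dep)

print(f"independent: Var[XY] = {lhs_indep:.3f}, formula = {rhs_indep:.3f}")
print(f"dependent:   Var[XY] = {lhs_dep:.3f}, formula = {rhs_dep:.3f}")
```

In the independent case the two estimates should agree up to sampling noise. In the dependent case the formula should underestimate Var[XY] by roughly a factor of two, which is exactly why the independence question matters. Note also that when both means are zero the formula collapses to Var[X]Var[Y].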
But in formulas (2) and (3), the gradients are not independent of W and Z, because all of them depend on one another through the output of the last layer.
I would appreciate it if anyone could give me a derivation of formulas (6) and (7).
Thanks in advance.