Data Science Asked by user93607 on October 29, 2020
It is well known that SGD updates have high variance.
Given the iteration update:
$$
w^{k+1} := w^k - \underbrace{\alpha\, g_i(w^k)}_{p^k},
$$
where $w$ are the model weights and $g_i(w^k)$ is the gradient of the loss function evaluated on sample $i$. How do I compute the variance of each update $p^k$?
I would like to plot it for each iteration and study its behavior during the minimization process.
You could plot the update against the iteration number and analyze how its variation evolves as the number of iterations increases, as in the linked example comparing the variance of standard gradient descent with that of its stochastic version.
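A minimal sketch of one way to do this (not from the original answer, and using an assumed least-squares setup for illustration): at each iteration, compute the per-sample updates $p^k_i = \alpha\, g_i(w^k)$ over the dataset and record their total variance, then plot that series against the iteration index.

```python
# Sketch: track the variance of the SGD update p^k = alpha * g_i(w^k)
# per iteration. Assumed problem (for illustration only): least squares,
# f_i(w) = 0.5 * (x_i . w - y_i)^2, so g_i(w) = (x_i . w - y_i) * x_i.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

alpha = 0.01
w = np.zeros(d)
update_vars = []  # total variance of p^k across samples, one value per iteration

for k in range(500):
    residuals = X @ w - y            # shape (n,)
    grads = residuals[:, None] * X   # per-sample gradients g_i(w^k), shape (n, d)
    updates = alpha * grads          # candidate updates p^k_i, one per sample

    # Total variance = trace of the covariance of the per-sample updates
    update_vars.append(updates.var(axis=0).sum())

    # One SGD step on a single uniformly drawn sample
    i = rng.integers(n)
    w -= updates[i]

# update_vars can now be plotted against the iteration index,
# e.g. with matplotlib: plt.plot(update_vars)
print(f"variance at start: {update_vars[0]:.6f}, at end: {update_vars[-1]:.6f}")
```

Note that the variance does not go to zero: near the minimum the per-sample gradients still disagree (gradient noise), which is exactly the behavior one would study with such a plot.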
Answered by Guilherme Marques on October 29, 2020