
Computing variance of an SGD iteration

Data Science Asked by user93607 on October 29, 2020

It is known that SGD iterations have high variance.
Given the update rule:
$$
w^{k+1} := w^k - \underbrace{\alpha \, g_i(w^k)}_{p^k},
$$

where $w^k$ are the model weights at iteration $k$ and $g_i(w^k)$ is the gradient of the loss function evaluated at sample $i$, how do I compute the variance of each update $p^k$?
I would like to plot it for each iteration and study its behavior during the minimization process.

One Answer

You could plot the update against the iteration number and analyze how the variation of each update evolves as the number of iterations increases, as in the comparison below of the variance of standard gradient descent against its stochastic version.

[Figure: comparison of the variance of standard gradient descent versus stochastic gradient descent across iterations]
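A minimal sketch of how this could be done in practice, assuming a hypothetical least-squares toy problem (the dataset, step size, and loss are all illustrative choices, not part of the original question): at each iteration, compute the per-sample gradients $g_i(w^k)$ over the dataset, form the updates $p^k = \alpha\, g_i(w^k)$, and record their empirical variance.

```python
import numpy as np

# Toy least-squares problem: f(w) = (1/2n) * sum_i (x_i^T w - y_i)^2.
# All names and constants here are illustrative assumptions.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

alpha = 0.01          # step size
w = np.zeros(d)       # initial weights
variances = []        # one scalar per iteration: total variance of p^k

for k in range(500):
    # Per-sample gradients of the squared loss: g_i(w) = (x_i^T w - y_i) * x_i
    residuals = X @ w - y           # shape (n,)
    G = residuals[:, None] * X      # shape (n, d); row i is g_i(w)

    # Empirical variance of the update p^k = alpha * g_i(w^k) over the
    # sample index i, summed over coordinates (trace of the covariance).
    P = alpha * G
    variances.append(P.var(axis=0).sum())

    # Plain SGD step with one uniformly sampled index i
    i = rng.integers(n)
    w = w - alpha * G[i]

# variances[k] can now be plotted against k, e.g. with matplotlib:
# import matplotlib.pyplot as plt; plt.semilogy(variances); plt.show()
```

For this convex problem the recorded variance shrinks as $w^k$ approaches the minimizer, since the residuals (and hence the per-sample gradients) become small; a log-scale plot makes that decay easy to see.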

Answered by Guilherme Marques on October 29, 2020

