
Computing variance of an SGD iteration

Data Science Asked by user93607 on October 29, 2020

It is known that SGD iterations have high variance.
Given the update rule:
$$
w^{k+1} := w^k - \underbrace{\alpha \, g_i(w^k)}_{p^k},
$$

where $w^k$ are the model weights at iteration $k$ and $g_i(w^k)$ is the gradient of the loss function evaluated at sample $i$, how do I compute the variance of each update $p^k$?
I would like to plot it for each iteration and study its behavior during the minimization process.

One Answer

You could plot the update against the iteration number and analyze how the variation of each update evolves as the number of iterations increases, as in the comparison below of the variance of standard gradient descent against its stochastic version.

[Figure: comparison of the variance of standard gradient descent versus stochastic gradient descent across iterations]
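A minimal sketch of how this could be done in practice, assuming a hypothetical least-squares toy problem (the dataset, step size, and loss are all illustrative choices, not part of the original question): at each iteration, compute the per-sample gradients $g_i(w^k)$ over the dataset, form the updates $p^k = \alpha\, g_i(w^k)$, and record their empirical variance.

```python
import numpy as np

# Toy least-squares problem: f(w) = (1/2n) * sum_i (x_i^T w - y_i)^2.
# All names and constants here are illustrative assumptions.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

alpha = 0.01          # step size
w = np.zeros(d)       # initial weights
variances = []        # one scalar per iteration: total variance of p^k

for k in range(500):
    # Per-sample gradients of the squared loss: g_i(w) = (x_i^T w - y_i) * x_i
    residuals = X @ w - y           # shape (n,)
    G = residuals[:, None] * X      # shape (n, d); row i is g_i(w)

    # Empirical variance of the update p^k = alpha * g_i(w^k) over the
    # sample index i, summed over coordinates (trace of the covariance).
    P = alpha * G
    variances.append(P.var(axis=0).sum())

    # Plain SGD step with one uniformly sampled index i
    i = rng.integers(n)
    w = w - alpha * G[i]

# variances[k] can now be plotted against k, e.g. with matplotlib:
# import matplotlib.pyplot as plt; plt.semilogy(variances); plt.show()
```

For this convex problem the recorded variance shrinks as $w^k$ approaches the minimizer, since the residuals (and hence the per-sample gradients) become small; a log-scale plot makes that decay easy to see.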

Answered by Guilherme Marques on October 29, 2020

