
Stochastic Gradient Descent Batching

Data Science · Asked by foobarbaz on November 29, 2020

I’m new to regression, and we are doing a very simple exercise in a course I’m taking to get a basic understanding of GD and SGD for linear regression.

From my understanding, the only difference between GD and SGD is that instead of performing the update on the full dataset of size m, as GD does, SGD performs the update on subsets of m.

My question is: for SGD, does one simply perform the algorithm on the mini-batch, or is there some sort of summation of the results to arrive at a final answer? Apologies if I’m not asking in the correct terms; I’m new to some of the mathematical concepts involved.

One Answer

In SGD you feed a single example to your model, compute the gradient of the loss on that example, and update the weights according to that gradient.

In mini-batch gradient descent you feed a batch of examples to your model, compute the gradient of the loss of that batch (typically the mean of the per-example losses, so the per-example gradients do get summed and averaged rather than applied one by one), and update the weights according to that gradient.
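
To make this concrete, here is a minimal NumPy sketch of a mini-batch update loop for linear regression with a squared-error loss. The data, learning rate, and batch size are made-up illustrations, not something from the original post; the point is that each batch produces one averaged gradient and one weight update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (illustrative): m examples, 3 features, known weights plus noise.
m = 100
X = rng.normal(size=(m, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=m)

w = np.zeros(3)     # model weights, initialized at zero
lr = 0.1            # learning rate (illustrative)
batch_size = 10     # 1 -> SGD, m -> full-batch GD

for epoch in range(50):
    idx = rng.permutation(m)            # reshuffle the examples every epoch
    for start in range(0, m, batch_size):
        batch = idx[start:start + batch_size]
        Xb, yb = X[batch], y[batch]
        # Gradient of the mean squared error over this mini-batch:
        # per-example gradients are summed, then divided by the batch size.
        grad = 2.0 * Xb.T @ (Xb @ w - yb) / len(batch)
        w -= lr * grad                  # one weight update per mini-batch

print(w)  # should land close to true_w
```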

In fact, SGD is mini-batch gradient descent with batch size equal to 1.
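
In the sketch above, setting batch_size = 1 recovers SGD (one weight update per example), while setting batch_size = m recovers classic full-batch gradient descent; every choice in between is mini-batch gradient descent.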

Correct answer by David Masip on November 29, 2020
