Data Science — Asked by coolcat on August 28, 2021
As far as I know, mini-batches can be used to reduce the variance of the gradient, but I am wondering whether we can achieve the same effect by using a decreasing step size with only a single sample per iteration. Can we compare the convergence rates of the two approaches?
Generally, the answer is "it's not known". The similarity between the effects of increasing the mini-batch size and decreasing the learning rate is mostly empirical; there is no known asymptotic formula relating them. The effects of a small learning rate and a large mini-batch are also not identical: a batch normalization layer, for example, behaves completely differently under the two approaches. Likewise, the probability distribution of gradients produced by mini-batches differs substantially from that produced by single samples (or by mini-batches of a significantly different size).
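As a minimal sketch of the variance point (the toy least-squares objective, the evaluation point, and all names here are illustrative assumptions, not something from the question), one can estimate the empirical variance of single-sample versus mini-batch gradients directly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 5
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = X @ true_w + 0.5 * rng.normal(size=n)
w = np.zeros(d)  # fixed point at which gradients are evaluated

def grad(batch_idx):
    """Mean squared-error gradient over the given batch of indices."""
    Xb, yb = X[batch_idx], y[batch_idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(batch_idx)

def gradient_variance(batch_size, trials=2_000):
    """Trace of the empirical covariance of the stochastic gradient."""
    grads = np.stack([
        grad(rng.choice(n, size=batch_size, replace=False))
        for _ in range(trials)
    ])
    return grads.var(axis=0).sum()

for b in (1, 8, 64):
    print(f"batch size {b:>2}: gradient variance ~ {gradient_variance(b):.3f}")
```

In this kind of experiment the variance typically shrinks roughly like 1/batch size, which is the usual justification for "larger batch, lower-variance gradient", while the shape of the single-sample gradient distribution can still differ considerably from that of the mini-batch gradients.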
Answered by mirror2image on August 28, 2021
The main objective of mini-batch gradient descent is to achieve faster results than full-batch gradient descent, since it starts updating the weights before a full epoch is complete. SGD starts learning even earlier than mini-batch, but mini-batch reduces the variance of the gradient compared to SGD.
Coming to the question: you're right that it is possible to compare the convergence of the two scenarios. People used to run SGD with a decreasing step size until mini-batch algorithms came along. In practice, mini-batch gives better performance than SGD because of its vectorisation property, which makes the computation faster while giving results comparable to SGD.
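For a rough side-by-side comparison, here is a minimal sketch (the 1/(1 + t/100) decay schedule, the batch size of 32, and the toy problem are all illustrative assumptions) of single-sample SGD with a decreasing step size versus vectorised mini-batch SGD with a constant step size:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 5_000, 5
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = X @ true_w + 0.5 * rng.normal(size=n)

def loss(w):
    return np.mean((X @ w - y) ** 2)

def sgd_single(steps=5_000, eta0=0.05):
    """One sample per step, step size decayed as eta0 / (1 + t/100)."""
    w = np.zeros(d)
    for t in range(steps):
        i = rng.integers(n)
        g = 2.0 * (X[i] @ w - y[i]) * X[i]
        w -= eta0 / (1.0 + t / 100.0) * g
    return w

def sgd_minibatch(steps=5_000, eta=0.05, batch=32):
    """Vectorised mini-batch gradient, constant step size."""
    w = np.zeros(d)
    for t in range(steps):
        idx = rng.integers(n, size=batch)
        Xb, yb = X[idx], y[idx]
        g = 2.0 * Xb.T @ (Xb @ w - yb) / batch
        w -= eta * g
    return w

print("single-sample SGD, decaying step:", loss(sgd_single()))
print("mini-batch SGD, constant step:   ", loss(sgd_minibatch()))
```

Both variants typically reach a similar loss on a problem like this; the mini-batch version does more work per step but vectorises well, which is the practical advantage mentioned above.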
Answered by Abhishek Singla on August 28, 2021