Data Science Asked on August 18, 2021
I’m training a neural network for a classification task and experimenting with different batch sizes. I’m using the negative log likelihood loss averaged over the samples in the batch.
I realized that because I keep the number of epochs and the learning rate constant, and because I average the loss over the samples in the batch, I get slower convergence with larger batches simply because doubling the batch size halves the number of gradient updates per epoch.
How can I remove this artifact and study the real effect of batch size (and batch homogeneity) for my task, as is done here? Should I just stop averaging and instead sum the loss over the samples in the batch?
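For illustration, here is a minimal sketch of the two options implied above, assuming PyTorch (the question does not name a framework) and made-up values for the base learning rate and batch sizes: either keep the mean-reduced loss and scale the learning rate linearly with the batch size, or switch the loss reduction to a sum, which for plain SGD without momentum has the same effect on the update magnitude.

```python
# Sketch only; PyTorch, the toy model, and the learning-rate values are assumptions,
# not part of the original question.
import torch
import torch.nn as nn

base_batch_size = 32   # hypothetical batch size the base learning rate was tuned for
base_lr = 1e-3         # hypothetical base learning rate
batch_size = 64        # the larger batch being tested

# Toy classifier producing log-probabilities, as required by NLLLoss.
model = nn.Sequential(nn.Linear(20, 10), nn.LogSoftmax(dim=1))

# Option (a): keep the mean-reduced loss and scale the learning rate
# linearly with the batch size.
criterion_mean = nn.NLLLoss(reduction="mean")
optimizer = torch.optim.SGD(
    model.parameters(), lr=base_lr * batch_size / base_batch_size
)

# Option (b): sum the loss over the batch and keep the base learning rate;
# the gradient then grows proportionally to the batch size instead.
criterion_sum = nn.NLLLoss(reduction="sum")

# One training step with option (a) on random data.
x = torch.randn(batch_size, 20)
y = torch.randint(0, 10, (batch_size,))

loss = criterion_mean(model(x), y)  # or criterion_sum(model(x), y) with lr=base_lr
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Note that the two options coincide only for vanilla SGD; with momentum or adaptive optimizers such as Adam, the effective step size no longer scales linearly with the gradient magnitude, so summing the loss and scaling the learning rate are no longer interchangeable.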