Should I take random elements for mini-batch gradient descent?

Data Science Asked on May 8, 2021

When implementing mini-batch gradient descent for neural networks, is it important to take random elements in each mini-batch? Or is it enough to shuffle the elements at the beginning of the training once?

(I’m also interested in sources that explicitly state which approach they use.)

One Answer

It should be enough to shuffle the elements once at the beginning of training and then read them sequentially. This achieves the same objective as sampling random elements every time: breaking any predefined structure that may exist in your original dataset (e.g. all positive examples at the beginning, sequentially ordered images, etc.).
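A minimal sketch of this shuffle-once strategy (using NumPy; the function name and batch size are illustrative, not from any particular framework):

```python
import numpy as np

def minibatches(X, y, batch_size=32, seed=0):
    """Shuffle the dataset once, then yield sequential mini-batches."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))     # one-time shuffle
    X, y = X[idx], y[idx]
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]

# Toy dataset: 10 examples, 1 feature each
X = np.arange(10, dtype=float).reshape(10, 1)
y = np.arange(10)
batches = list(minibatches(X, y, batch_size=4))
```

Each epoch then simply iterates over the same pre-shuffled order; no per-batch random access into the dataset is needed.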

While fetching random elements every time would also work, it is typically suboptimal performance-wise. Datasets are usually too large to hold in memory with fast random access and instead live on a slow HDD, where sequential reads are pretty much the only option for good performance.
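The difference between the two access patterns can be sketched with a hypothetical on-disk dataset of fixed-size binary records (the record size and file layout here are assumptions for illustration): sequential reads stream the file in one pass, while random sampling forces a seek per example.

```python
import os
import random
import tempfile

RECORD = 64   # bytes per example (hypothetical layout)
N = 1000

# Build a throwaway dataset file: record i starts with its index.
path = tempfile.mkstemp()[1]
with open(path, "wb") as f:
    for i in range(N):
        f.write(i.to_bytes(4, "little") + b"\0" * (RECORD - 4))

def read_sequential(path):
    """One pass over the file: the disk-friendly access pattern."""
    with open(path, "rb") as f:
        return [f.read(RECORD) for _ in range(N)]

def read_random(path, seed=0):
    """Seek to a random record for each read: costly on an HDD."""
    order = list(range(N))
    random.Random(seed).shuffle(order)
    out = []
    with open(path, "rb") as f:
        for i in order:
            f.seek(i * RECORD)
            out.append(f.read(RECORD))
    return out

seq = read_sequential(path)
rnd = read_random(path)
os.remove(path)
```

Both variants return the same set of records; the random version just pays a seek for every one of them, which is exactly the cost the shuffle-once approach avoids.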

Caffe, for example, uses LevelDB, which does not support efficient random seeking. This confirms that the network is trained with images always read in the same order.

Correct answer by Clash on May 8, 2021
