Data Science Asked by Tarun Pratap on December 24, 2020
In data pre-processing, a stratified shuffle is used to ensure that the class distribution of the original dataset is reflected in the training, validation and test datasets.
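A stratified split can be sketched with scikit-learn's `train_test_split` and its `stratify` parameter. The dataset below is synthetic and the 90/10 class imbalance is an assumption chosen just to make the effect visible:

```python
# Stratified train/validation/test split sketch (synthetic, imbalanced data).
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = np.array([0] * 900 + [1] * 100)  # imbalanced labels: 90% vs 10%

# Carve off the test set first, then split the remainder into train/validation.
# stratify=... keeps the 90/10 class ratio in every subset.
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.25, stratify=y_tmp, random_state=0)

for name, labels in [("train", y_train), ("val", y_val), ("test", y_test)]:
    print(name, np.mean(labels == 1))  # close to 0.10 in each split
```

Without `stratify`, a plain random split of a small or heavily imbalanced dataset can leave one split with noticeably more (or fewer) minority-class samples than the original.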
Mini-batch gradient descent uses random shuffling to ensure randomness in the mini-batches.
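The per-epoch shuffling that mini-batch gradient descent relies on can be sketched with NumPy alone; the function name and shapes here are illustrative, not from the original post:

```python
# Per-epoch mini-batch shuffling sketch (NumPy only; names are illustrative).
import numpy as np

def minibatches(X, y, batch_size, rng):
    """Yield mini-batches over a fresh random permutation of the data."""
    idx = rng.permutation(len(X))  # new permutation => new batches each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

rng = np.random.default_rng(42)
X = np.arange(20).reshape(10, 2).astype(float)
y = np.arange(10)

for epoch in range(2):
    for xb, yb in minibatches(X, y, batch_size=4, rng=rng):
        pass  # a real training loop would compute the gradient on (xb, yb) here
```

Because `rng.permutation` is called at the start of every pass, each epoch groups the samples into different mini-batches, which is the randomness the questioner is asking about.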
My doubt is: why should we apply a stratified shuffle to our dataset if it is going to be shuffled randomly later during training anyway?
It doesn't make it redundant. The workflow when training a model looks like this:

1. Split the dataset into train, validation and test sets, using a stratified shuffle so each split mirrors the class distribution of the full dataset.
2. Train the model on the train set, typically with mini-batch gradient descent over many epochs.
3. Before each epoch, shuffle the train set and partition it into mini-batches.
If we skip the stratified shuffling in step 1, the classes in the train, validation and test sets won't be evenly distributed, i.e. their class proportions may differ from those of the original dataset.
If we skip the shuffling before each epoch in step 3, the mini-batches will be identical in every epoch.
The proportions of the train set, validation set and test set can of course vary.
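The effect described in step 3 can be demonstrated in a few lines (NumPy only; the array of 12 label indices is an illustrative stand-in for a train set):

```python
# Contrast "shuffle once" with "shuffle before every epoch" (NumPy only).
import numpy as np

y = np.arange(12)  # stand-in for train-set indices
rng = np.random.default_rng(7)

# Shuffle once and reuse the same order: every epoch sees identical batches.
fixed = rng.permutation(12)
epoch_a = [fixed[i:i + 4].tolist() for i in range(0, 12, 4)]
epoch_b = [fixed[i:i + 4].tolist() for i in range(0, 12, 4)]
print(epoch_a == epoch_b)  # True: the mini-batches never change

# Reshuffle before each epoch: batch composition changes between epochs.
epoch_c = [p.tolist() for p in np.split(rng.permutation(12), 3)]
epoch_d = [p.tolist() for p in np.split(rng.permutation(12), 3)]
print(epoch_c == epoch_d)
```

Either way, every epoch still visits all 12 samples exactly once; only the grouping into mini-batches changes when we reshuffle.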
Correct answer by Tim von Känel on December 24, 2020