How should I sample my validation set if I randomly sample training data?

Question

I have:
A training dataset of 150k samples.
A validation dataset of 19k samples.
At each epoch I randomly sample 10k data points without replacement for training, because otherwise I get out-of-memory errors.
I need to downsample my validation set too. Which of the following methods seems most appropriate?

Randomly sample a validation subset of size x% of 10k once, and reuse that same subset in every epoch.
Randomly sample a new validation subset of size x% of 10k at every epoch.
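The two options can be sketched with NumPy as follows. This is a minimal illustration, assuming indices into arrays of the stated sizes; the helper `epoch_indices` and the 20% value chosen for "x%" are hypothetical. Per-epoch training sampling without replacement is included for context.

```python
import numpy as np

def epoch_indices(n_total, n_sample, rng):
    """Draw n_sample distinct indices from range(n_total) (sampling without replacement)."""
    return rng.choice(n_total, size=n_sample, replace=False)

N_TRAIN, N_VAL = 150_000, 19_000
TRAIN_PER_EPOCH = 10_000
VAL_PERCENT = 20  # hypothetical value for "x%"
VAL_PER_EPOCH = TRAIN_PER_EPOCH * VAL_PERCENT // 100

rng = np.random.default_rng(seed=0)

# Option 1: draw the validation subset once and reuse it in every epoch.
fixed_val_idx = epoch_indices(N_VAL, VAL_PER_EPOCH, rng)

for epoch in range(3):
    # A fresh 10k training subset each epoch, without replacement within the epoch.
    train_idx = epoch_indices(N_TRAIN, TRAIN_PER_EPOCH, rng)
    # Option 2: draw a fresh validation subset each epoch.
    fresh_val_idx = epoch_indices(N_VAL, VAL_PER_EPOCH, rng)
    # ... train on train_idx, then evaluate on fixed_val_idx or fresh_val_idx ...
    print(epoch, len(train_idx), len(fixed_val_idx), len(fresh_val_idx))
```

With option 1, every epoch is scored on the same data, so differences between epochs reflect the model rather than the sample. With option 2, more of the validation set is seen over many epochs, but each epoch's score is measured on different data and is therefore noisier to compare.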

Predicted Life · Answer

Actually, you should never apply sampling techniques to your test/evaluation data, because the evaluation set should reflect the real data distribution; resampling it can produce misleading classification results.
If your dataset is imbalanced, you can apply upsampling or downsampling techniques (such as SMOTE, which oversamples the minority class) to your training data only.
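The sketch below shows training-only resampling. It swaps in plain random oversampling (duplicating existing minority-class rows) instead of SMOTE, so that it needs only NumPy. SMOTE instead synthesizes new points by interpolating between neighbouring minority samples; the imbalanced-learn package provides it as `imblearn.over_sampling.SMOTE`. The helper `random_oversample` and the toy data are hypothetical.

```python
import numpy as np

def random_oversample(X, y, rng):
    """Duplicate randomly chosen samples of each minority class until every
    class has as many samples as the largest class. Use on training data only."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    parts = [np.arange(len(y))]  # keep every original sample
    for cls, count in zip(classes, counts):
        if count < n_max:
            cls_idx = np.flatnonzero(y == cls)
            # Draw with replacement, since a minority class may need many copies.
            parts.append(rng.choice(cls_idx, size=n_max - count, replace=True))
    idx = np.concatenate(parts)
    return X[idx], y[idx]

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 4))
y_train = np.array([0] * 80 + [1] * 15 + [2] * 5)

X_res, y_res = random_oversample(X_train, y_train, rng)
print(np.bincount(y_res))  # → [80 80 80]
# The validation/test data are left untouched.
```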
If you want to benchmark your multi-class classifier, rely on metrics such as the confusion matrix, precision, recall and F1 score. Keep in mind that accuracy cannot be meaningfully interpreted on highly imbalanced data: a model that always predicts the majority class can reach high accuracy while never detecting the minority classes.
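To illustrate the point about accuracy, here is a small NumPy sketch that builds a confusion matrix and per-class precision, recall and F1 by hand. The function names and toy labels are hypothetical; `sklearn.metrics.confusion_matrix` and `sklearn.metrics.classification_report` compute the same quantities if scikit-learn is available.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] = number of samples with true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def per_class_scores(cm):
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1)     # row sums: how often each class truly occurs
    # Define 0/0 as 0 so unpredicted or absent classes do not produce NaN.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1

# Imbalanced toy labels: 90 samples of class 0, 8 of class 1, 2 of class 2.
y_true = np.array([0] * 90 + [1] * 8 + [2] * 2)
# A useless model that always predicts the majority class.
y_pred = np.zeros_like(y_true)

cm = confusion_matrix(y_true, y_pred, n_classes=3)
precision, recall, f1 = per_class_scores(cm)
accuracy = np.trace(cm) / cm.sum()
print(accuracy)   # → 0.9
print(recall)     # → [1. 0. 0.]
print(f1.mean())  # macro-averaged F1, far below the accuracy
```

The 90% accuracy hides the fact that classes 1 and 2 are never recognised, which the per-class recall and the macro-averaged F1 make visible.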
