TransWikia.com

How should I sample my validation set if I randomly sample training data?

Data Science Asked on January 18, 2021

I have:

  • a training dataset of size 150k
  • a validation dataset of size 19k

At each epoch I randomly sample 10k datapoints without replacement for training, because I otherwise get out-of-memory errors.

I need to downsample my validation set too. Which of the following methods seems most appropriate?

  • Randomly sample a validation set whose size is x% of 10k and reuse the same set in every epoch.
  • Randomly sample a new validation set whose size is x% of 10k at every epoch.
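For concreteness, the two options could be sketched as below. This is only an illustration using index sampling from the standard library. The dataset sizes come from the question, while `VAL_FRACTION` (a stand-in for "x%"), the fixed seed, and the helper name `indices_for_epoch` are assumptions made for the example.

```python
import random

N_TRAIN, N_VAL = 150_000, 19_000   # dataset sizes from the question
TRAIN_PER_EPOCH = 10_000           # per-epoch training sample from the question
VAL_FRACTION = 0.2                 # hypothetical value standing in for "x%"
N_VAL_SAMPLE = int(VAL_FRACTION * TRAIN_PER_EPOCH)

rng = random.Random(0)             # fixed seed (an assumption) for reproducibility

# Option 1: draw the validation subset once and reuse it in every epoch.
fixed_val_idx = rng.sample(range(N_VAL), N_VAL_SAMPLE)


def indices_for_epoch(resample_val):
    """Return (train_idx, val_idx) for one epoch, sampled without replacement."""
    train_idx = rng.sample(range(N_TRAIN), TRAIN_PER_EPOCH)
    if resample_val:
        # Option 2: draw a fresh validation subset at every epoch.
        val_idx = rng.sample(range(N_VAL), N_VAL_SAMPLE)
    else:
        val_idx = fixed_val_idx
    return train_idx, val_idx
```

With `resample_val=False` the validation score is computed on the same points in every epoch, so changes between epochs reflect the model only. With `resample_val=True` the score also varies with the draw of the validation subset.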

One Answer

You should never apply sampling techniques to your test/validation data, because this can distort the evaluation results. If your dataset is imbalanced, you can apply oversampling (e.g. SMOTE) or undersampling to your training data only. To benchmark a multi-class classifier, rely on measures such as the confusion matrix, recall, precision, and F1 score. Keep in mind that accuracy is hard to interpret when the data is heavily imbalanced.
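As a rough illustration of these metrics, the dependency-free sketch below accumulates confusion counts over the full validation set in batches and derives per-class precision, recall, and F1 from them. The function names, the batch size, and the toy labels are all made up for the example. In practice the predictions would come from your model one batch at a time, and a library such as scikit-learn would normally compute these metrics.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, batch_size=1000):
    """Count (true, predicted) label pairs over the full set, batch by batch."""
    counts = Counter()
    for start in range(0, len(y_true), batch_size):
        stop = start + batch_size
        counts.update(zip(y_true[start:stop], y_pred[start:stop]))
    return counts


def per_class_metrics(counts):
    """Return {label: (precision, recall, f1)} from confusion counts."""
    labels = {t for t, _ in counts} | {p for _, p in counts}
    metrics = {}
    for c in labels:
        tp = counts[(c, c)]
        fp = sum(n for (t, p), n in counts.items() if p == c and t != c)
        fn = sum(n for (t, p), n in counts.items() if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[c] = (precision, recall, f1)
    return metrics


# Toy imbalanced labels (invented for illustration): nine of class 0, one of class 1.
y_true = [0] * 9 + [1]
y_pred = [0] * 10  # a model that always predicts the majority class

counts = confusion_counts(y_true, y_pred, batch_size=4)
accuracy = sum(n for (t, p), n in counts.items() if t == p) / len(y_true)
metrics = per_class_metrics(counts)
```

In this toy case `accuracy` is 0.9 even though recall for class 1 is 0.0, which is why accuracy alone is hard to interpret on imbalanced data.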

Answered by Predicted Life on January 18, 2021
