Data Science Asked by Fazla Rabbi Mashrur on February 25, 2021
I am working with a 15k-image dataset for binary classification. It is a patient-based medical image dataset. Is it enough to use a randomized holdout strategy (train, validation, and test splits), or should I use k-fold cross-validation (a held-out test set plus 10-fold CV on the training data)? Which is best for a dataset of this size?
TIA
The benefit of k-fold is that it gives you a better idea of how your model will generalise in the real world.
If you plan to make a model that is useful in the real world, I recommend a k-fold cross-validation approach (or a leave-p-out approach if you have the time), so that you can construct nonparametric confidence intervals for your model's performance.
In each fold you can split the training data into training and validation if you need a validation set, e.g. for early stopping.
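As a minimal sketch of that setup with scikit-learn: since the data is patient-based, all images from one patient should stay in the same split to avoid leakage, which `GroupKFold` handles. The feature array, labels, and patient IDs below are synthetic stand-ins for the real dataset; the 10 folds and 80/20 train/validation split within each fold are illustrative choices, not prescriptions.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, train_test_split

rng = np.random.default_rng(0)
n_images = 150
X = rng.normal(size=(n_images, 8))             # stand-in for image features
y = rng.integers(0, 2, size=n_images)          # binary labels
patients = rng.integers(0, 30, size=n_images)  # hypothetical patient IDs

gkf = GroupKFold(n_splits=10)
for fold, (train_idx, test_idx) in enumerate(gkf.split(X, y, groups=patients)):
    # Carve a validation set out of this fold's training data
    # (e.g. for early stopping), again splitting by patient so no
    # patient appears in both train and validation.
    train_patients = np.unique(patients[train_idx])
    tr_pat, val_pat = train_test_split(
        train_patients, test_size=0.2, random_state=fold
    )
    tr_idx = train_idx[np.isin(patients[train_idx], tr_pat)]
    val_idx = train_idx[np.isin(patients[train_idx], val_pat)]

    # Sanity check: the three index sets share no patients.
    assert not set(patients[tr_idx]) & set(patients[val_idx])
    assert not set(patients[tr_idx]) & set(patients[test_idx])
    assert not set(patients[val_idx]) & set(patients[test_idx])
    # ...train on tr_idx, tune/early-stop on val_idx, evaluate on test_idx...
```

Collecting the per-fold test metrics then gives the spread from which the confidence intervals mentioned above can be computed.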
Answered by Nicholas James Bailey on February 25, 2021