Data Science Asked by Fazla Rabbi Mashrur on February 25, 2021
I am working with a dataset of 15k images for binary classification. It is a patient-based medical image dataset. Is a randomized holdout strategy (train, validation, and test) enough, or should I use k-fold cross-validation (a test set plus 10-fold on the training data)? Which is better for a dataset of this size?
TIA
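The benefit of k-fold is that every sample is used for testing exactly once, so you get k performance estimates instead of one. That gives you a better idea of how your model will generalise in the real world.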
If you plan to build a model that is useful in the real world, I recommend a k-fold cross-validation approach (or a leave-p-out approach if you have time), so that you can construct nonparametric confidence intervals for your model's performance.
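One way such an interval could be built is a percentile bootstrap over the per-fold scores. The sketch below is only an illustration: `bootstrap_ci` is a hypothetical helper, and the fold accuracies are made-up numbers, not results from any real model. Note also that fold scores are not fully independent (the training sets overlap), so the interval should be read as approximate.

```python
import random
import statistics

def bootstrap_ci(scores, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of per-fold scores."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample the fold scores with replacement and record each resample's mean.
    means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical accuracies from 10 folds (made-up numbers, for illustration only).
fold_scores = [0.91, 0.89, 0.93, 0.90, 0.92, 0.88, 0.94, 0.90, 0.91, 0.89]
low, high = bootstrap_ci(fold_scores)
print(f"mean={statistics.fmean(fold_scores):.3f}, "
      f"95% interval=({low:.3f}, {high:.3f})")
```

With a single holdout split there is only one score, so no interval of this kind can be formed; that is the practical difference between the two strategies.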
In each fold you can split the training data further into training and validation sets if you need a validation set, e.g. for early stopping.
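A minimal sketch of that nested split follows. Because the question describes a patient-based dataset, it splits patient IDs rather than individual images, so that no patient appears on both sides of a split. This is an assumption on my part about the data (several images per patient), and `kfold_with_validation` and the ID format are hypothetical names used only for illustration.

```python
import random

def kfold_with_validation(ids, k=10, val_fraction=0.1, seed=0):
    """Yield (train, val, test) lists of IDs for each of k folds."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    for i, test in enumerate(folds):
        # Everything outside the current test fold is available for training.
        rest = [x for j, fold in enumerate(folds) if j != i for x in fold]
        n_val = max(1, int(len(rest) * val_fraction))
        # `rest` is already shuffled, so slicing gives a random validation subset.
        yield rest[n_val:], rest[:n_val], test

# Hypothetical patient IDs; each would map to one or more images.
patient_ids = [f"patient_{n:04d}" for n in range(200)]
for fold, (train, val, test) in enumerate(kfold_with_validation(patient_ids)):
    print(fold, len(train), len(val), len(test))
```

The validation subset is used only for decisions made during training (such as when to stop), while the test fold stays untouched until the final evaluation of that fold.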
Answered by Nicholas James Bailey on February 25, 2021