
How to analyse the results of cross-validation to determine overfitting

Data Science Asked by MyName on March 5, 2021

I performed k-fold CV and measured the resulting error (RMSE) on each fold. This was done with 5 folds: four of the folds gave similar errors (between 10% and 12%), but one gave a 4% error.

What can be concluded in regards to overfitting in this experiment?

Is the model overfitted because it works much better in one of the situations than in the others?

Thanks.

One Answer

In short, k-fold CV is not a test for overfitting. The folds can never contain identical samples, so some variation between them is expected; all you can conclude is that your error is mean ± std across the folds.
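A minimal sketch of that summary with scikit-learn (the model, data, and numbers are illustrative, not taken from the question): compute the RMSE on each fold and report it as mean ± std.

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import KFold, cross_val_score

    # Synthetic regression data standing in for the question's dataset
    X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_val_score(Ridge(), X, y, cv=cv,
                             scoring="neg_root_mean_squared_error")
    rmse = -scores  # flip sign: scikit-learn maximizes scores

    print(f"per-fold RMSE: {np.round(rmse, 2)}")
    print(f"RMSE = {rmse.mean():.2f} ± {rmse.std():.2f}")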

If your model training process is iterative, you can detect overfitting by tracking the validation score over the course of training.
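A minimal sketch of that check, assuming a gradient-boosted model where each boosting stage counts as one training iteration (the model and data are illustrative): training RMSE keeps falling as stages are added, while validation RMSE bottoms out and then rises once the model starts overfitting.

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

    model = GradientBoostingRegressor(n_estimators=500, max_depth=4, random_state=0)
    model.fit(X_tr, y_tr)

    # RMSE on train and validation data after each boosting stage
    train_rmse = [np.sqrt(mean_squared_error(y_tr, p)) for p in model.staged_predict(X_tr)]
    val_rmse = [np.sqrt(mean_squared_error(y_val, p)) for p in model.staged_predict(X_val)]

    best = int(np.argmin(val_rmse))
    print(f"validation RMSE bottoms out at stage {best + 1}; "
          f"later stages only improve training RMSE (overfitting).")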

If you're running a hyper-parameter search with k-fold CV, perhaps over many search steps, you may eventually find that the holdout score is much worse than the average CV score. That is overfitting too, in this case overfitting the validation folds.
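A minimal sketch of that holdout check (model, grid, and data are illustrative): tune hyper-parameters with 5-fold CV on the training portion, then score the chosen model on a holdout set the search never touched. A holdout RMSE much worse than the best CV RMSE suggests the search has overfit the folds.

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.neighbors import KNeighborsRegressor

    X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
    X_tr, X_hold, y_tr, y_hold = train_test_split(X, y, test_size=0.2, random_state=0)

    # Hyper-parameter search scored by 5-fold CV on the training portion only
    search = GridSearchCV(
        KNeighborsRegressor(),
        param_grid={"n_neighbors": list(range(1, 31)),
                    "weights": ["uniform", "distance"]},
        cv=5,
        scoring="neg_root_mean_squared_error",
    )
    search.fit(X_tr, y_tr)

    cv_rmse = -search.best_score_
    holdout_rmse = np.sqrt(mean_squared_error(y_hold, search.predict(X_hold)))
    print(f"best CV RMSE: {cv_rmse:.2f}, holdout RMSE: {holdout_rmse:.2f}")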

Answered by roman on March 5, 2021
