Data Science Asked on September 29, 2021
This question has always been on my mind. Imagine you are doing 5- or 10-fold cross-validation and one model gives you a mean accuracy of 0.8 with a standard deviation of 0.2, while the other gives a mean accuracy of 0.7 with a standard deviation of 0.05. Which one is better?
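For concreteness, here is a minimal sketch of how such numbers are typically obtained, assuming scikit-learn and two hypothetical classifiers (neither is specified in the question): one accuracy score is computed per fold, and the mean and standard deviation summarize those per-fold scores.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # Hypothetical data and models, purely to illustrate the comparison.
    X, y = make_classification(n_samples=500, random_state=0)
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(random_state=0),
    }

    for name, model in models.items():
        # One accuracy score per fold; mean and std summarize the
        # distribution of the 10 per-fold scores.
        scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
        print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")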
First things first:

1. What is the problem statement? I mean, is it a regression problem or a classification problem?
2. How did you measure your accuracy? I mean, MSE, MAPE, OOB, RMSE, or SSE for a regression problem, or accuracy, precision, recall, or ROC for a classification problem?

Kindly clarify.
If it is a classification problem, how could you measure the SD of a single accuracy figure? (Note, though, that in k-fold cross-validation the standard deviation is taken over the k per-fold scores, so it is defined for any metric.) Assuming it is a regression problem, you need to answer the questions above. Moreover, every accuracy measure has its own appropriate use cases, so understand which accuracy measure you should go for; otherwise, go with the model that has less error. A sketch of switching metrics follows below.
When it comes to variance, I would assume that the higher the variance, the less reliable the model.
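To illustrate choosing the error measure, here is a minimal sketch assuming scikit-learn and a hypothetical regression setup (the data and model are placeholders): the scoring argument selects which metric is computed in each fold.

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score

    # Hypothetical regression data, just to illustrate metric selection.
    X, y = make_regression(n_samples=500, noise=10.0, random_state=0)
    model = Ridge()

    # scikit-learn reports error metrics as negated scores (so that
    # higher is always better); flip the sign to recover per-fold errors.
    mse = -cross_val_score(model, X, y, cv=10, scoring="neg_mean_squared_error")
    rmse = -cross_val_score(model, X, y, cv=10, scoring="neg_root_mean_squared_error")
    print(f"MSE:  mean={mse.mean():.2f}, std={mse.std():.2f}")
    print(f"RMSE: mean={rmse.mean():.2f}, std={rmse.std():.2f}")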
Answered by Kartik Patnaik on September 29, 2021
This is exactly the question I asked Sebastian Raschka (author of the great Python Machine Learning book). Here you can find his answer: "I also recommend the 1-standard error method, which basically means that you select the best model from k-fold based on pure performance, and then you select the simplest model that is within 1 standard error of that model."

A more extended explanation of his answer can be found at his GitHub link.
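As a rough sketch of the 1-standard-error method described above, assuming per-fold scores are already available for each candidate and the model names and scores are hypothetical:

    import numpy as np

    # Hypothetical per-fold CV accuracies, ordered from simplest to
    # most complex model.
    cv_scores = {
        "shallow_tree": np.array([0.68, 0.71, 0.70, 0.69, 0.72]),
        "deep_tree": np.array([0.74, 0.80, 0.85, 0.76, 0.83]),
    }

    # Step 1: find the best model by mean CV score, and compute the
    # standard error of its mean over the k folds.
    best = max(cv_scores, key=lambda m: cv_scores[m].mean())
    k = len(cv_scores[best])
    se = cv_scores[best].std(ddof=1) / np.sqrt(k)
    threshold = cv_scores[best].mean() - se

    # Step 2: among models within 1 SE of the best, pick the simplest
    # (here, the first match in complexity order).
    for name in cv_scores:  # dict preserves insertion order (simplest first)
        if cv_scores[name].mean() >= threshold:
            print(f"Selected: {name} (threshold={threshold:.3f})")
            break

The design choice is that simplicity breaks ties: any model whose mean score is statistically indistinguishable from the best (within one standard error) is eligible, and the least complex eligible model wins.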
Answered by German C M on September 29, 2021