Data Science Asked on September 29, 2021
This question has always been on my mind. Imagine you are doing 5- or 10-fold cross-validation and one model gives you a mean accuracy of 0.8 with a standard deviation of 0.2, while the other gives a mean accuracy of 0.7 with a standard deviation of 0.05. Which one is better?
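For concreteness, here is a minimal sketch of how such numbers are typically obtained, assuming scikit-learn and two hypothetical classifiers (neither is specified in the question): one accuracy score is computed per fold, and the mean and standard deviation summarize those per-fold scores.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # Hypothetical data and models, purely to illustrate the comparison.
    X, y = make_classification(n_samples=500, random_state=0)
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(random_state=0),
    }

    for name, model in models.items():
        # One accuracy score per fold; mean and std summarize the
        # distribution of the 10 per-fold scores.
        scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
        print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")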
First things first:

1. What is the problem statement? I mean, is it a regression problem or a classification problem?
2. How did you measure your accuracy? I mean, MSE, MAPE, OOB, RMSE, or SSE for a regression problem, or accuracy, precision, recall, or ROC for a classification problem?

Kindly clarify.
If it is a classification problem, how could you measure the SD of a single accuracy figure? (Note, though, that in k-fold cross-validation the standard deviation is taken over the k per-fold scores, so it is defined for any metric.) Assuming it is a regression problem, you need to answer the questions above. Moreover, every accuracy measure has its own appropriate use cases, so understand which accuracy measure you should go for; otherwise, go with the model that has less error. A sketch of switching metrics follows below.
When it comes to variance, I would assume that the higher the variance, the less reliable the model.
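To illustrate choosing the error measure, here is a minimal sketch assuming scikit-learn and a hypothetical regression setup (the data and model are placeholders): the scoring argument selects which metric is computed in each fold.

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score

    # Hypothetical regression data, just to illustrate metric selection.
    X, y = make_regression(n_samples=500, noise=10.0, random_state=0)
    model = Ridge()

    # scikit-learn reports error metrics as negated scores (so that
    # higher is always better); flip the sign to recover per-fold errors.
    mse = -cross_val_score(model, X, y, cv=10, scoring="neg_mean_squared_error")
    rmse = -cross_val_score(model, X, y, cv=10, scoring="neg_root_mean_squared_error")
    print(f"MSE:  mean={mse.mean():.2f}, std={mse.std():.2f}")
    print(f"RMSE: mean={rmse.mean():.2f}, std={rmse.std():.2f}")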
Answered by Kartik Patnaik on September 29, 2021
This is exactly the question I asked Sebastian Raschka (author of the great Python Machine Learning book). Here you can find his answer: "I also recommend the 1-standard error method, which basically means that you select the best model from k-fold based on pure performance, and then you select the simplest model that is within 1 standard error of that model."

A more extended explanation of his answer can be found at his GitHub link.
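As a rough sketch of the 1-standard-error method described above, assuming per-fold scores are already available for each candidate and the model names and scores are hypothetical:

    import numpy as np

    # Hypothetical per-fold CV accuracies, ordered from simplest to
    # most complex model.
    cv_scores = {
        "shallow_tree": np.array([0.68, 0.71, 0.70, 0.69, 0.72]),
        "deep_tree": np.array([0.74, 0.80, 0.85, 0.76, 0.83]),
    }

    # Step 1: find the best model by mean CV score, and compute the
    # standard error of its mean over the k folds.
    best = max(cv_scores, key=lambda m: cv_scores[m].mean())
    k = len(cv_scores[best])
    se = cv_scores[best].std(ddof=1) / np.sqrt(k)
    threshold = cv_scores[best].mean() - se

    # Step 2: among models within 1 SE of the best, pick the simplest
    # (here, the first match in complexity order).
    for name in cv_scores:  # dict preserves insertion order (simplest first)
        if cv_scores[name].mean() >= threshold:
            print(f"Selected: {name} (threshold={threshold:.3f})")
            break

The design choice is that simplicity breaks ties: any model whose mean score is statistically indistinguishable from the best (within one standard error) is eligible, and the least complex eligible model wins.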
Answered by German C M on September 29, 2021