Cross Validated
Asked by ka28mi on December 29, 2021
I am comparing different classification models, and for a given model (say, logistic regression), every time I run it, it produces a slightly different value for the accuracy and for the AUC. Since I want to compare the different models, I would like a fairly stable metric, so I thought of running each algorithm many times and taking the average of the accuracy and of the AUC. Is this something that is commonly done? If not, does it make sense to do this?
Computing it multiple times is more useful for some algorithms than for others. Take an example: logistic regression will do the same thing if you run it on the exact same data (i.e. if you split your train and test sets with a fixed random state so you get the exact same rows every time), whereas an algorithm like random forest randomly selects a subset of attributes for each tree. Imagine you have 200 attributes and build a random forest of 5 trees, each using 10 attributes: the algorithm builds 5 trees, each based on 10 random attributes out of your 200. Since this random selection happens at every fit, the results can differ noticeably between runs. I'd suggest looking at the theory behind the algorithm to know whether you need to repeat the same test multiple times, or whether the result will always be the same (see the sketch below).
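Here is a minimal sketch of that contrast, assuming scikit-learn and a synthetic dataset (the dataset and parameter choices are illustrative, not from the question): with the train/test split held fixed, logistic regression gives the same AUC every run, while a random forest's AUC shifts with its own seed because each tree sees a random subset of the data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Fixed synthetic data and a fixed train/test split.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Logistic regression: same split -> same coefficients -> same AUC every run.
lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("logistic regression AUC:", roc_auc_score(y_test, lr.predict_proba(X_test)[:, 1]))

# Random forest: the seed controls which rows/features each tree sees,
# so the AUC moves a little from fit to fit even on the same split.
for seed in range(3):
    rf = RandomForestClassifier(n_estimators=50, random_state=seed).fit(X_train, y_train)
    print(f"random forest AUC (seed={seed}):", roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1]))
```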
Also, coming back to the random forest example, you can grow more trees with more features per tree; this takes longer but gives more reliable results (just take care of overfitting).
One last thing, to check for overfitting: compute the usual AUC on the test set, and another AUC based on predictions on X_train (i.e. on the same set you used to fit your algorithm). If the two values are far apart, you are probably overfitting (your algorithm didn't learn a general tendency, but rather the exact outcomes on the training set, and tries to apply them to the test set).
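A rough sketch of that check, again assuming scikit-learn and a synthetic dataset (names and numbers are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# AUC on the data the model was fit on vs. AUC on held-out data.
train_auc = roc_auc_score(y_train, rf.predict_proba(X_train)[:, 1])
test_auc = roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1])
print(f"train AUC = {train_auc:.3f}, test AUC = {test_auc:.3f}")
# A large gap (train close to 1.0, test much lower) suggests the model has
# memorized the training set rather than learned a generalizable pattern.
```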
Answered by Adept on December 29, 2021
The first thing to do is to make sure that you're not overfitting. If there is no strong sign of overfitting, then averaging the performance metrics you mentioned makes sense. Reporting a basic confidence interval (or a mean ± standard deviation interval) over different runs of the same algorithm is common practice in research papers.
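A minimal sketch of that averaging, assuming scikit-learn and a synthetic dataset (the number of runs and the model are illustrative): repeat the split and fit with different seeds, then report mean ± standard deviation of accuracy and AUC.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

accs, aucs = [], []
for seed in range(30):  # 30 runs, each with a different train/test split
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=seed)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    accs.append(accuracy_score(y_te, model.predict(X_te)))
    aucs.append(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))

# Mean +/- standard deviation over the repeated runs.
print(f"accuracy: {np.mean(accs):.3f} +/- {np.std(accs):.3f}")
print(f"AUC:      {np.mean(aucs):.3f} +/- {np.std(aucs):.3f}")
```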
Answered by gunes on December 29, 2021