Data Science Asked on March 30, 2021
Let’s say I have 5 models cross-validated via a leave-one-out strategy, and I have the predictions and scores of each model.
Now it’s time to compute the average over the set of 5 models: how am I supposed to do that?
A standard way to report the performance of each model would be:

- for each split, the value of the chosen metric (accuracy, roc_auc, etc.) on the train and test sets (in your case, the single left-out sample);
- as the final performance of each of the 5 models, the mean metric value over the test-set splits together with its standard deviation, which conveys both the model's quality and its robustness.
scikit-learn can compute all of this for you automatically, as in the sketch below.
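For instance, a minimal sketch with scikit-learn's `cross_val_score` (the dataset and classifier here are placeholders, not from the question):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)          # placeholder dataset
clf = LogisticRegression(max_iter=1000)    # placeholder model

# Under leave-one-out, each split's test set is a single sample,
# so each per-split accuracy is either 0 or 1; the mean over splits
# is the overall accuracy estimate.
scores = cross_val_score(clf, X, y, cv=LeaveOneOut(), scoring="accuracy")

print(f"test accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Running this once per model gives you the mean ± standard deviation you would report for each of the 5 models.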
By the way, if you have many samples, consider another strategy such as stratified k-fold, since leave-one-out requires one fit per sample and becomes very costly; see the sketch below.
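A sketch of that swap (same placeholder data and model as above):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)          # placeholder dataset
clf = LogisticRegression(max_iter=1000)    # placeholder model

# 5 fits instead of one fit per sample; folds preserve class proportions
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")

print(f"test accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```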
Answered by German C M on March 30, 2021
There are multiple popular ways to ensemble models: averaging the predictions, majority voting, selecting the prediction of the most confident model, or learning a new model on top of these 5 outputs (stacking), among many other methods. Check also the Bayes optimal classifier, which 'averages' these probabilities in a Bayesian way: https://en.wikipedia.org/wiki/Ensemble_learning#Bayes_optimal_classifier
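For illustration, a minimal sketch of three of these combination rules for a binary problem (the five probabilities are made up, not from the question):

```python
import numpy as np

# Hypothetical class-1 probabilities produced by the 5 models for one sample
probs = np.array([0.62, 0.55, 0.71, 0.48, 0.66])

# 1) Averaging: mean probability, thresholded at 0.5
avg_pred = int(probs.mean() >= 0.5)

# 2) Majority voting: each model casts a hard 0/1 vote
votes = (probs >= 0.5).astype(int)
maj_pred = int(votes.sum() > len(votes) / 2)

# 3) Highest confidence: follow the model furthest from the 0.5 boundary
most_confident = np.abs(probs - 0.5).argmax()
conf_pred = int(probs[most_confident] >= 0.5)

print(avg_pred, maj_pred, conf_pred)  # -> 1 1 1
```

The "learn a new model" option corresponds to stacking; scikit-learn provides `StackingClassifier` for that.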
Answered by LuckyLuke on March 30, 2021