Data Science Asked on March 30, 2021
Let’s say I have 5 models cross-validated via a leave-one-out strategy, and I have the predictions and scores of each model.
Now it’s time to calculate the average for the set of 5 models. How am I supposed to do that?
A standard way to report the performance of each model would be:

- for each split, provide the value of the chosen metric (accuracy, ROC AUC, etc.) on the train and test sets (in your case, the held-out one-out sample);
- as the final performance of each of the 5 models, report the mean metric value together with its standard deviation across the test splits; this conveys both the model's quality and its robustness.
scikit-learn's cross-validation utilities can compute all of this for you automatically.
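For instance, here is a minimal sketch using scikit-learn's `cross_validate`. The dataset and the two models are illustrative placeholders, and a stratified 5-fold split stands in for leave-one-out to keep the run cheap:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

X, y = load_breast_cancer(return_X_y=True)

# Two illustrative models; in practice, plug in your own 5 models.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

# Stratified 5-fold stands in for leave-one-out to keep this example fast.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring="accuracy",
                            return_train_score=True)
    # Per-split metrics live in scores["train_score"] / scores["test_score"];
    # report their mean and standard deviation as the final figures.
    print(f"{name}: "
          f"train {scores['train_score'].mean():.3f} "
          f"+/- {scores['train_score'].std():.3f}, "
          f"test {scores['test_score'].mean():.3f} "
          f"+/- {scores['test_score'].std():.3f}")
```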
By the way, if you have a lot of samples, consider another strategy such as stratified k-fold, since leave-one-out would be very costly.
Answered by German C M on March 30, 2021
There are multiple popular ways to ensemble models: averaging the predicted probabilities, majority voting, selecting the prediction with the highest probability, or learning a new model on top of these 5 outputs (stacking). Check also the Bayes optimal classifier, which 'averages' these probabilities in a Bayesian way: https://en.wikipedia.org/wiki/Ensemble_learning#Bayes_optimal_classifier
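As an illustration, a minimal sketch of two of these strategies, probability averaging ("soft" voting) and majority voting ("hard" voting); the dataset and the five models are placeholders for your own 5 cross-validated models:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Five placeholder models; substitute your own 5 fitted models here.
models = [LogisticRegression(max_iter=1000, C=c).fit(X_train, y_train)
          for c in (0.01, 0.1, 1.0, 10.0, 100.0)]

# Soft voting: average the predicted probabilities, then take the argmax.
probas = np.mean([m.predict_proba(X_test) for m in models], axis=0)
soft_pred = probas.argmax(axis=1)

# Hard voting: each model predicts a label; keep the most common one.
labels = np.array([m.predict(X_test) for m in models])
hard_pred = np.apply_along_axis(
    lambda col: np.bincount(col).argmax(), 0, labels)

print("soft voting accuracy:", accuracy_score(y_test, soft_pred))
print("hard voting accuracy:", accuracy_score(y_test, hard_pred))
```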
Answered by LuckyLuke on March 30, 2021