scikit-learn classifier reset in loop

Data Science Asked by Mattia Colombo on February 10, 2021

I’m trying to compare classifiers by running the sample script that can be found here.

What I noticed is that in some cases the classifier does not seem to be reset.
In fact, when I duplicate some of them (with no parameter change), the score and the contour differ between the two copies.

This can be seen by simply replacing AdaBoostClassifier() in the classifier list with a second MLPClassifier(alpha=1).

I assume the classifier should be reset at every iteration of the for loop in order to make a fair comparison among the different models, so I would expect two identical entries to behave the same.

In particular, I see differences when duplicating the MLP (Neural Net) and the Random Forest, while there is no change when duplicating KNN or RBF SVM.

I also tried cloning the classifier, and even calling del clf inside the loop, but the behaviour stays the same.

How can I make the evaluation reproducible and not influenced by the previous run? I want to be sure that when I use the same model and only change the parameters, the result is correct, and that is only possible if two identical models yield the same result.

One Answer

The behaviour you are seeing is not caused by models failing to reset, but by the stochastic nature of most of these algorithms. If you set a random seed (in scikit-learn, via the random_state parameter), the same random numbers are generated on every run. See:

How to seed the random number generator for scikit-learn?
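
As a rough illustration of the point about seeding, here is a minimal sketch. It assumes a toy make_moons dataset and a small RandomForestClassifier; the helper name fit_and_score and all parameter values are made up for this example and are not taken from the original script.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy data (an assumption for this sketch); fixed seeds keep the data
# and the train/test split identical across runs.
X, y = make_moons(n_samples=200, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)


def fit_and_score(seed):
    """Fit a fresh random forest with the given seed and return test accuracy."""
    clf = RandomForestClassifier(n_estimators=10, max_depth=5, random_state=seed)
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)


# Same seed twice: both forests draw the same random numbers.
seeded_a = fit_and_score(0)
seeded_b = fit_and_score(0)
print("seeded runs identical:", seeded_a == seeded_b)

# No seed: each forest draws different random numbers, so the scores
# may or may not coincide from run to run.
print("unseeded runs:", fit_and_score(None), fit_and_score(None))
```

The same idea should carry over to MLPClassifier, which also accepts random_state. KNN has no random component, which is consistent with the duplicates of it giving identical results in the question.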

However, while this makes the result reproducible, the comparison might still not be fair. If one model happens to get a good seed and another a bad one, you will always unfairly favor the first. What you could do instead is run each model multiple times with the same hyperparameters but different seeds, and look at the average performance. This way you get both a fairer comparison and reproducibility. Pick the seeds up front, then loop over them. Something like this:

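seeds = (1, 2, 3, 4, 5)
performances = []
for seed in seeds:
    # Model and score are placeholders for your estimator and your evaluation function
    performances.append(score(Model(param1=1, param2=2, random_state=seed)))
# Compare models on the average over all seeds
mean_performance = sum(performances) / len(performances)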
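
A runnable version of this idea might look like the sketch below. It is only one possible setup: the make_moons data, the evaluate helper, the use of 5-fold cross_val_score as the scoring step, and the two max_depth settings being compared are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Toy data (an assumption for this sketch), generated with a fixed seed.
X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

# Seeds are chosen up front so every configuration sees the same ones.
seeds = (1, 2, 3, 4, 5)


def evaluate(make_model, seeds):
    """Return (mean, standard deviation) of the CV accuracy across seeds."""
    scores = []
    for seed in seeds:
        model = make_model(seed)  # fresh, unfitted estimator for each seed
        # cv=5 uses unshuffled stratified folds, so the splits are fixed
        # and only the model's own randomness varies with the seed.
        scores.append(float(cross_val_score(model, X, y, cv=5).mean()))
    return mean(scores), stdev(scores)


shallow = evaluate(
    lambda s: RandomForestClassifier(n_estimators=10, max_depth=2, random_state=s),
    seeds,
)
deep = evaluate(
    lambda s: RandomForestClassifier(n_estimators=10, max_depth=5, random_state=s),
    seeds,
)
print("max_depth=2: mean=%.3f std=%.3f" % shallow)
print("max_depth=5: mean=%.3f std=%.3f" % deep)
```

Reporting the standard deviation next to the mean gives a sense of how much of the gap between two configurations could be seed noise.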

Answered by Jan van der Vegt on February 10, 2021