
Why do my calibration curves for Platt scaling and isotonic regression have fewer points than the curve for my uncalibrated model?

Data Science Asked by Maths12 on January 8, 2021

I train a model using grid search, then take the best estimator it finds as my chosen model.

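from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from xgboost import XGBClassifier

model = XGBClassifier()
pipeline = make_pipeline(model)

# `parameters` is the hyperparameter grid (not shown in the question)
kfolds = StratifiedKFold(3)
clf = GridSearchCV(pipeline, parameters, cv=kfolds.split(x_train, y_train),
                   scoring='roc_auc', return_train_score=True)

# Fit on the same data the CV split indices were generated from
clf.fit(x_train, y_train)

model = clf.best_estimator_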

Using this model from the grid search, I then calibrate it and plot the uncalibrated vs. calibrated curves:

import matplotlib.pyplot as plt
from sklearn.calibration import CalibratedClassifierCV, calibration_curve

# Calibration curve of the uncalibrated model on the test set
y_test_uncalibrated = model.predict_proba(x_test)[:, 1]
fraction_of_positives, mean_predicted_value = calibration_curve(y_test, y_test_uncalibrated, n_bins=10)

plt.plot(mean_predicted_value, fraction_of_positives, 's-', label='Uncalibrated')

# Isotonic calibration on top of the already-fitted model
clf_isotonic = CalibratedClassifierCV(model, cv='prefit', method='isotonic')
clf_isotonic.fit(x_train, y_train)
y_test_iso = clf_isotonic.predict_proba(x_test)[:, 1]
fraction_of_positives, mean_predicted_value = calibration_curve(y_test, y_test_iso, n_bins=10)

plt.plot(mean_predicted_value, fraction_of_positives, 's-', color='red', label='Calibrated (Isotonic)')

I do the same for Platt scaling. However, I get the following results:

[Image not preserved: calibration plot comparing the uncalibrated, isotonic, and Platt-scaled curves]

I don't understand why there are now fewer points for the isotonic and Platt curves. I don't think I am doing anything wrong in my code. Am I making a mistake?

One Answer

The default strategy for calibration_curve is 'uniform', i.e. each of the n_bins bins has equal width on [0, 1]. calibration_curve only returns values for non-empty bins, so if, after calibration, your model makes no predictions inside a bin, no point is plotted for that range.
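To make the mechanism concrete, here is a minimal pure-Python sketch of equal-width binning. It is not scikit-learn's implementation (edge handling is simplified), and the score distributions are made-up assumptions: one set of scores spread over [0, 1), and one squeezed below 0.35, as calibrated scores might be.

```python
import random


def uniform_bin_counts(probs, n_bins=10):
    """Count how many predictions fall into each equal-width bin on [0, 1]."""
    counts = [0] * n_bins
    for p in probs:
        # Clamp the index so that p == 1.0 lands in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts


def n_plotted_points(probs, n_bins=10):
    """Number of non-empty bins, i.e. points that would appear on the curve."""
    return sum(1 for c in uniform_bin_counts(probs, n_bins) if c > 0)


random.seed(0)
# Hypothetical uncalibrated scores, spread over the whole [0, 1) range.
uncalibrated = [random.random() for _ in range(1000)]
# Hypothetical calibrated scores, squeezed into [0, 0.35).
calibrated = [random.uniform(0.0, 0.35) for _ in range(1000)]

points_uncalibrated = n_plotted_points(uncalibrated)
points_calibrated = n_plotted_points(calibrated)
print("uncalibrated points:", points_uncalibrated)
print("calibrated points:  ", points_calibrated)
```

The squeezed scores only ever reach the first four bins, so only four points can be plotted even though n_bins is 10.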

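You could change to strategy='quantile', which chooses the bin edges so that each bin holds roughly the same number of predictions. That will typically give 10 points for each curve (heavily tied predictions, which isotonic regression can produce, may still leave some bins empty), and you'll get many more of the red/yellow dots further to the left.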
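As a rough illustration of the difference between the two strategies, the sketch below calls calibration_curve on synthetic data. The Beta(1, 8) scores and the labels drawn from them are assumptions chosen only to mimic predictions concentrated near zero; they are not the asker's data.

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)

# Synthetic, skewed scores: most predictions are small, few are large.
y_prob = rng.beta(1.0, 8.0, size=5000)
# Labels drawn with these probabilities, so the scores are calibrated by construction.
y_true = rng.binomial(1, y_prob)

# Equal-width bins: the upper bins receive no predictions and are dropped.
frac_uniform, mean_uniform = calibration_curve(
    y_true, y_prob, n_bins=10, strategy="uniform"
)
# Equal-count bins: every bin holds about 500 predictions.
frac_quantile, mean_quantile = calibration_curve(
    y_true, y_prob, n_bins=10, strategy="quantile"
)

print("uniform points: ", len(mean_uniform))
print("quantile points:", len(mean_quantile))
```

With quantile bins, the points crowd into the low-probability region where the predictions actually are, so the curve has more resolution there and none where there is no data.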


Also, ideally you should not fit the calibration on the same data that was used to train the original model. From the User Guide:

"An already fitted classifier can be calibrated by setting cv="prefit". In this case, the data is only used to fit the regressor. It is up to the user to make sure that the data used for fitting the classifier is disjoint from the data used for fitting the regressor."
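One way to keep the two datasets disjoint is to hold out part of the training data for calibration only. The sketch below is an assumption-laden example rather than a prescription: it uses synthetic data from make_classification and a GradientBoostingClassifier as a stand-in for the XGBoost model, and the split sizes are arbitrary. As far as I'm aware, recent scikit-learn releases deprecate cv="prefit" in favor of wrapping the fitted model in FrozenEstimator, so the code tries that first and falls back to cv="prefit" on older versions.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data standing in for the real dataset.
X, y = make_classification(
    n_samples=3000, n_features=10, weights=[0.8, 0.2], random_state=0
)

# Three disjoint pieces: fit the model, fit the calibrator, evaluate.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
X_fit, X_cal, y_fit, y_cal = train_test_split(
    X_rest, y_rest, test_size=0.3, stratify=y_rest, random_state=0
)

# Stand-in for the tuned model; it never sees X_cal or X_test.
base = GradientBoostingClassifier(random_state=0).fit(X_fit, y_fit)

try:
    # Newer scikit-learn: freeze the fitted model so it is not refitted.
    from sklearn.frozen import FrozenEstimator
    calibrated = CalibratedClassifierCV(FrozenEstimator(base), method="isotonic")
except ImportError:
    # Older scikit-learn: the cv="prefit" route quoted above.
    calibrated = CalibratedClassifierCV(base, cv="prefit", method="isotonic")

# Only the isotonic regressor is learned here, on the held-out calibration set.
calibrated.fit(X_cal, y_cal)
y_test_iso = calibrated.predict_proba(X_test)[:, 1]
print(y_test_iso[:5])
```

If holding out data is too costly, passing an unfitted estimator with an integer cv to CalibratedClassifierCV lets it do the fit/calibrate split internally through cross-validation.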

Answered by Ben Reiniger on January 8, 2021
