Why do my calibration curves for Platt scaling and isotonic regression have fewer points than the curve for my uncalibrated model?

Question

I train a model using grid search, then use the best parameters from it to define my chosen model:
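from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from xgboost import XGBClassifier

model = XGBClassifier()
pipeline = make_pipeline(model)

# parameters is the hyperparameter grid (not shown)
kfolds = StratifiedKFold(3)
clf = GridSearchCV(pipeline, parameters, cv=kfolds,
                   scoring='roc_auc', return_train_score=True)

# fit on the same training split that the CV folds are drawn from
clf.fit(x_train, y_train)

# pipeline refit with the best parameters
model = clf.best_estimator_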

Using this model from the grid search, I then calibrate it and plot the uncalibrated and calibrated curves:
import matplotlib.pyplot as plt
from sklearn.calibration import CalibratedClassifierCV, calibration_curve

# uncalibrated probabilities from the tuned model
y_test_uncalibrated = model.predict_proba(x_test)[:, 1]
fraction_of_positives, mean_predicted_value = calibration_curve(y_test, y_test_uncalibrated, n_bins=10)

plt.plot(mean_predicted_value, fraction_of_positives, 's-', label='Uncalibrated')

# isotonic calibration of the already-fitted model
clf_isotonic = CalibratedClassifierCV(model, cv='prefit', method='isotonic')
clf_isotonic.fit(x_train, y_train)
y_test_iso = clf_isotonic.predict_proba(x_test)[:, 1]
fraction_of_positives, mean_predicted_value = calibration_curve(y_test, y_test_iso, n_bins=10)

plt.plot(mean_predicted_value, fraction_of_positives, 's-', color='red', label='Calibrated (Isotonic)')

I do the same for Platt scaling as above. However, I get the following results:

I don't understand why there are now fewer points for the isotonic and Platt curves. I don't think I am doing anything wrong in my code. Am I making a mistake?

Ben Reiniger · Answer

The default strategy for calibration_curve is 'uniform', i.e. each of the bins has equal width (0.1 wide on [0, 1] with n_bins=10). Empty bins are dropped, so if, after calibration, your model makes no predictions inside a bin, there will be no point plotted for that range.
You could change to strategy='quantile', which puts roughly the same number of samples in each bin and so should give 10 points for each curve (heavily tied predictions can still leave a bin empty); you'll get many more of the red/yellow dots further to the left.
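
To make the binning behaviour concrete, here is a minimal sketch on synthetic scores (not the asker's data). The probabilities are deliberately drawn from [0, 0.35) as an assumed stand-in for a calibrated model whose predictions cluster at the low end; the variable names are illustrative only.

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)

# Assumed stand-in for calibrated scores: every prediction lies in [0, 0.35),
# so the uniform bins covering 0.4 and above receive no samples.
y_prob = rng.uniform(0.0, 0.35, size=2000)
# Labels drawn so that the scores are (roughly) well calibrated.
y_true = (rng.uniform(size=2000) < y_prob).astype(int)

# Equal-width bins: empty bins are dropped from the returned arrays.
frac_u, mean_u = calibration_curve(y_true, y_prob, n_bins=10, strategy="uniform")

# Equal-count bins: bin edges follow the distribution of y_prob.
frac_q, mean_q = calibration_curve(y_true, y_prob, n_bins=10, strategy="quantile")

print("uniform bins with data: ", len(mean_u))
print("quantile bins with data:", len(mean_q))
```

With these scores only the lowest four uniform bins contain samples, while the quantile strategy returns a point for each of the ten bins.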

Also, ideally you should not fit the calibration on the same data that you used to train the original model. From the User Guide:

An already fitted classifier can be calibrated by setting cv="prefit". In this case, the data is only used to fit the regressor. It is up to the user to make sure that the data used for fitting the classifier is disjoint from the data used for fitting the regressor.
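
One way to keep the two data sets disjoint without managing a separate hold-out split is to pass an unfitted estimator and an integer cv, so that CalibratedClassifierCV does the splitting itself. The sketch below assumes synthetic data and uses GradientBoostingClassifier as a stand-in for the XGBoost pipeline; it is an alternative to the cv="prefit" route described in the quote, not a reproduction of it.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic data (assumption); the real x_train / x_test would go here.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stand-in for the tuned XGBoost model, left unfitted on purpose.
base = GradientBoostingClassifier(n_estimators=50, random_state=0)

# With an unfitted estimator and cv=3, each fold fits a clone of the
# classifier on the training part of the fold and fits the isotonic
# calibrator on the held-out part, so the two never share samples.
calibrated = CalibratedClassifierCV(base, method="isotonic", cv=3)
calibrated.fit(X_train, y_train)

# Calibrated probabilities for the positive class on unseen data.
y_test_iso = calibrated.predict_proba(X_test)[:, 1]
print(y_test_iso[:5])
```

If you prefer to keep the already-fitted grid-search model, the equivalent idea is to carve a calibration split out of the data before training and pass only that split to the calibrator's fit.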
