Data Science Asked by Maths12 on January 8, 2021
I train a model using grid search, then use the best parameters from it to define my chosen model.
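from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from xgboost import XGBClassifier
model = XGBClassifier()
pipeline = make_pipeline(model)
kfolds = StratifiedKFold(3)
# `parameters` is the search grid, defined elsewhere.
# Pass the splitter itself so the folds always match the data given to fit().
clf = GridSearchCV(pipeline, parameters, cv=kfolds,
                   scoring='roc_auc', return_train_score=True)
clf.fit(x_train, y_train)
model = clf.best_estimator_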
Using this model from the grid search, I then calibrate it and plot the uncalibrated vs. calibrated curves.
import matplotlib.pyplot as plt
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
# Reliability curve of the uncalibrated model on the test set
y_test_uncalibrated = model.predict_proba(x_test)[:, 1]
fraction_of_positives, mean_predicted_value = calibration_curve(y_test, y_test_uncalibrated, n_bins=10)
plt.plot(mean_predicted_value, fraction_of_positives, 's-', label='Uncalibrated')
# Isotonic calibration on top of the already fitted model
clf_isotonic = CalibratedClassifierCV(model, cv='prefit', method='isotonic')
clf_isotonic.fit(x_train, y_train)
y_test_iso = clf_isotonic.predict_proba(x_test)[:, 1]
fraction_of_positives, mean_predicted_value = calibration_curve(y_test, y_test_iso, n_bins=10)
plt.plot(mean_predicted_value, fraction_of_positives, 's-', color='red', label='Calibrated (Isotonic)')
I do the same for Platt scaling. However, I get the following results:
I don't understand why there are fewer points now for the isotonic and Platt curves. I don't feel as though I am doing anything wrong in my code. Am I making any mistakes?
The default strategy for calibration_curve is 'uniform', i.e. each of the bins has equal width. If, after calibration, your model makes no predictions inside a bin, there will be no point plotted for that range.
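You could change to strategy='quantile', which places the bin edges at quantiles of the predicted probabilities, so each curve generally gets 10 points (heavily tied predictions, which isotonic regression can produce, may still leave a bin empty); you'll get many more of the red/yellow dots further to the left.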
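To make the difference concrete, here is a minimal sketch on made-up data (not the asker's model). It assumes predicted probabilities that all fall in a narrow range, roughly what a calibrated model on an imbalanced problem might produce, and counts how many points each strategy returns. The variable names and the synthetic scores are purely illustrative.

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)

# Hypothetical predicted probabilities, all inside [0.05, 0.45),
# so the uniform bins above 0.5 receive no predictions at all.
y_prob = rng.uniform(0.05, 0.45, size=2000)
# Labels drawn so that the scores are (by construction) well calibrated.
y_true = (rng.uniform(size=2000) < y_prob).astype(int)

# Equal-width bins: empty bins are dropped from the returned arrays.
frac_u, mean_u = calibration_curve(y_true, y_prob, n_bins=10, strategy='uniform')
# Quantile bins: edges follow the data, so every bin is populated here.
frac_q, mean_q = calibration_curve(y_true, y_prob, n_bins=10, strategy='quantile')

n_uniform_points = len(mean_u)
n_quantile_points = len(mean_q)
print("uniform:", n_uniform_points, "points; quantile:", n_quantile_points, "points")
```

With 'uniform', only the bins that actually contain predictions come back, which is why the calibrated curves in the question show fewer markers.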
Also, ideally you should not fit the calibration on the same data that you used to train the original model. From the User Guide:
An already fitted classifier can be calibrated by setting cv="prefit". In this case, the data is only used to fit the regressor. It is up to the user make sure that the data used for fitting the classifier is disjoint from the data used for fitting the regressor.
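One way to follow that advice is to hold out a separate calibration set. The sketch below is only an illustration under stated assumptions: it uses synthetic data and a GradientBoostingClassifier as a stand-in for the tuned XGBoost pipeline, and the names X_fit, X_cal and X_test are hypothetical. Recent scikit-learn releases deprecate cv='prefit' in favour of wrapping the fitted model in FrozenEstimator, so the sketch tries that first and falls back to cv='prefit' on older versions.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the asker's x, y.
X, y = make_classification(n_samples=3000, n_features=10, random_state=0)

# Three disjoint parts: model fitting, calibration, final evaluation.
X_fit, X_rest, y_fit, y_rest = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)

# Stand-in for clf.best_estimator_; it sees only X_fit.
base = GradientBoostingClassifier(random_state=0).fit(X_fit, y_fit)

try:
    # Newer scikit-learn: freeze the fitted model so it is not refit.
    from sklearn.frozen import FrozenEstimator
    calibrated = CalibratedClassifierCV(FrozenEstimator(base), method='isotonic')
except ImportError:
    # Older scikit-learn: the cv='prefit' route quoted above.
    calibrated = CalibratedClassifierCV(base, cv='prefit', method='isotonic')

# The calibrator is fitted only on data the base model has never seen.
calibrated.fit(X_cal, y_cal)
y_test_iso = calibrated.predict_proba(X_test)[:, 1]
```

The calibration curve is then drawn from y_test and y_test_iso exactly as in the question, with the test set untouched by both fitting steps.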
Answered by Ben Reiniger on January 8, 2021