Determine how each feature contributes to XGBoost classification

Data Science Asked by Tran Tran on January 12, 2021

Here is a summary of what I have done:

My dataset has 5 classes and 10 parameters. I used XGBClassifier (the scikit-learn-style API from the xgboost package) to investigate whether I could use those 10 parameters to predict the class of each data point. After fitting the XGBClassifier, I checked feature_importances_ and found that 2 of the 10 parameters played a key role in the classification.
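For reference, a minimal sketch of that setup (the data here is synthetic via make_classification; the variable names are placeholders, not the real dataset):

from sklearn.datasets import make_classification
from xgboost import XGBClassifier

# synthetic stand-in for the real data: 10 features, 5 classes
X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=4, n_classes=5,
                           random_state=0)

model = XGBClassifier().fit(X, y)

# one importance score per feature; large values flag the key parameters
print(model.feature_importances_)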

So my question is:

Can I find out exactly how those 2 parameters contribute to the classification of each specific class? For example, can I find the cut-off values for parameter 1 and parameter 2 that will result in the prediction of class 1?

I am thinking of performing unsupervised clustering using those 2 parameters with k = 5. Afterwards, I can just eyeball the approximate cut-off values. However, I worry that the 5 clusters will not correspond closely to the 5 classes.
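A minimal sketch of that clustering idea, assuming scikit-learn's KMeans and that the two important parameters sit in columns 0 and 1 (placeholder indices):

import pandas as pd
from sklearn.cluster import KMeans

# cluster on the two most important parameters only
X_top2 = X[:, [0, 1]]  # placeholder column indices
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_top2)

# contingency table of true classes vs. cluster assignments, to see
# whether the 5 clusters actually track the 5 classes
print(pd.crosstab(y, kmeans.labels_))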

Thanks a lot in advance.

2 Answers

If you have just ten parameters and two of them are important, you can plot the trees and see the threshold for each of those parameters.

from xgboost import XGBClassifier
from xgboost import plot_tree
import matplotlib.pyplot as plt

# X, y are assumed to be your training features and labels
# (here, a 10-column feature matrix and a 5-class label vector)
# fit the model
model = XGBClassifier().fit(X, y)
# plot a single tree (rendering requires the graphviz package)
plot_tree(model)
plt.show()

The above code plots only the first tree. You can plot, for example, the 4th boosted tree in the sequence using the following line of code (num_trees is zero-indexed):

plot_tree(model, num_trees=3)

Note that in an ensemble method, each tree may split on the same parameter at a different threshold.
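If eyeballing many tree plots gets tedious, the split thresholds can also be pulled out programmatically. A sketch using the booster's trees_to_dataframe method (available in recent xgboost releases; requires pandas, and assumes the default f0, f1, ... feature names you get when fitting on a NumPy array):

# one row per node: Tree, Node, Feature, Split (threshold), Gain, ...
df = model.get_booster().trees_to_dataframe()

# keep only the split nodes for the two important features
# ("f0" and "f1" are placeholder names; leaf rows show Feature == "Leaf")
splits = df[df["Feature"].isin(["f0", "f1"])]
print(splits[["Tree", "Feature", "Split", "Gain"]])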

Answered by nimar on January 12, 2021

I think SHAP values might be able to help you; check out this link. You can look at both local and global interpretability.
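A minimal sketch with the shap package, assuming a fitted XGBClassifier named model and the older API where shap_values returns one array of contributions per class:

import shap

# TreeExplainer supports XGBoost models natively
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# shap_values holds one (n_samples, n_features) array per class here;
# plot the global feature impact for class index 1, say
shap.summary_plot(shap_values[1], X)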

Answered by saurabh kumar on January 12, 2021
