Determine how each feature contributes to XGBoost classification

Data Science Asked by Tran Tran on January 12, 2021

Here is a summary of what I have done:

My dataset has 5 classes and 10 parameters. I used XGBClassifier (the scikit-learn-style API from the xgboost package) to investigate whether I could use those 10 parameters to predict the class of each data point. After fitting the XGBClassifier, I checked feature_importances_ and found that 2 of the 10 parameters played a key role in the classification.
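For reference, a minimal sketch of that setup (the data here is synthetic via make_classification; the variable names are placeholders, not the real dataset):

from sklearn.datasets import make_classification
from xgboost import XGBClassifier

# synthetic stand-in for the real data: 10 features, 5 classes
X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=4, n_classes=5,
                           random_state=0)

model = XGBClassifier().fit(X, y)

# one importance score per feature; large values flag the key parameters
print(model.feature_importances_)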

So my question is:

Can I find out exactly how those 2 parameters contribute to the classification of each specific class? For example, can I find the cut-off values for parameter 1 and parameter 2 that will result in the prediction of class 1?

I am thinking of performing unsupervised clustering using those 2 parameters with k = 5. Afterwards, I can just eyeball the approximate cut-off values. However, I worry that the 5 clusters will not correspond closely to the 5 classes.
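A minimal sketch of that clustering idea, assuming scikit-learn's KMeans and that the two important parameters sit in columns 0 and 1 (placeholder indices):

import pandas as pd
from sklearn.cluster import KMeans

# cluster on the two most important parameters only
X_top2 = X[:, [0, 1]]  # placeholder column indices
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_top2)

# contingency table of true classes vs. cluster assignments, to see
# whether the 5 clusters actually track the 5 classes
print(pd.crosstab(y, kmeans.labels_))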

Thanks a lot in advance.

2 Answers

If you have just ten parameters and two of them are important, you can plot the trees and see the threshold for each of those parameters.

from xgboost import XGBClassifier
from xgboost import plot_tree
import matplotlib.pyplot as plt

# X, y are assumed to be your training features and labels
# (here, a 10-column feature matrix and a 5-class label vector)
# fit the model
model = XGBClassifier().fit(X, y)
# plot a single tree (rendering requires the graphviz package)
plot_tree(model)
plt.show()

The above code plots only the first tree. You can plot, for example, the 4th boosted tree in the sequence using the following line of code (num_trees is zero-indexed):

plot_tree(model, num_trees=3)

Note that in an ensemble method, each tree may split on the same parameter at a different threshold.
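If eyeballing many tree plots gets tedious, the split thresholds can also be pulled out programmatically. A sketch using the booster's trees_to_dataframe method (available in recent xgboost releases; requires pandas, and assumes the default f0, f1, ... feature names you get when fitting on a NumPy array):

# one row per node: Tree, Node, Feature, Split (threshold), Gain, ...
df = model.get_booster().trees_to_dataframe()

# keep only the split nodes for the two important features
# ("f0" and "f1" are placeholder names; leaf rows show Feature == "Leaf")
splits = df[df["Feature"].isin(["f0", "f1"])]
print(splits[["Tree", "Feature", "Split", "Gain"]])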

Answered by nimar on January 12, 2021

I think SHAP values might be able to help you; check out this link. You can look at both local and global interpretability.
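A minimal sketch with the shap package, assuming a fitted XGBClassifier named model and the older API where shap_values returns one array of contributions per class:

import shap

# TreeExplainer supports XGBoost models natively
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# shap_values holds one (n_samples, n_features) array per class here;
# plot the global feature impact for class index 1, say
shap.summary_plot(shap_values[1], X)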

Answered by saurabh kumar on January 12, 2021
