Data Science Asked on November 13, 2021
Should SHAP value analysis be done on the train or test set?
What does it mean if the feature importance based on mean |SHAP value| differs between the train and test set of my LightGBM model?
I intend to use SHAP analysis to identify how each feature contributes to each individual prediction, and possibly to identify individual predictions that are anomalous. For instance, if an individual prediction's top (+/-) contributing features differ vastly from the model's overall feature importance, then that prediction is less trustworthy. Does this approach make sense?
SHAP is a local explainer: each set of SHAP values explains the prediction for one particular instance.
When you compare the train and test sets, you are comparing explanations for different instances, so getting different results is normal. It does not by itself mean that your train/test split is bad; the split could be perfectly fine.
In the end, SHAP helps you understand how the model behaves for a particular instance, so it should be computed on whichever data you are interested in understanding. I guess you could also use SHAP values to look for differences between train and test, but since they are local explanations you might not have much success.
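To make the train/test comparison concrete, here is a minimal sketch of how mean |SHAP value| per feature could be computed and ranked for each set. It assumes you already have SHAP matrices of shape (n_samples, n_features), for example from `shap.TreeExplainer(model).shap_values(X)`; the feature names and numbers below are placeholders, not real results, and `mean_abs_shap` and `rank_features` are illustrative helper names.

```python
import numpy as np

def mean_abs_shap(shap_values):
    """Global importance: mean |SHAP value| per feature over all rows."""
    return np.abs(np.asarray(shap_values, dtype=float)).mean(axis=0)

def rank_features(importance, feature_names):
    """Feature names sorted from most to least important."""
    order = np.argsort(-importance, kind="stable")
    return [feature_names[i] for i in order]

# Placeholder SHAP matrices (rows = instances, columns = features).
# With a real model these would come from the explainer; for classifiers,
# some shap versions return one array per class, so pick one class first.
feature_names = ["age", "income", "tenure"]  # hypothetical names
shap_train = np.array([[0.5, -0.2, 0.1],
                       [-0.7, 0.1, 0.0],
                       [0.6, -0.3, 0.2]])
shap_test = np.array([[0.1, -0.6, 0.1],
                      [-0.2, 0.8, 0.0]])

imp_train = mean_abs_shap(shap_train)
imp_test = mean_abs_shap(shap_test)
rank_train = rank_features(imp_train, feature_names)
rank_test = rank_features(imp_test, feature_names)

print("train ranking:", rank_train)
print("test ranking: ", rank_test)
```

Because each ranking is an average over a different set of instances, some disagreement between the two is expected, especially when one of the sets is small.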
I wouldn't draw any conclusions about the quality of a prediction from the feature importance.
Answered by Carlos Mougan on November 13, 2021
First, make sure that the problem doesn't come from your data or your model:
Make sure that your data does not change significantly between train and test: the class proportions should be the same, and so should the general distribution of the features, the correlations between features, and the correlations between features and the output.
Make sure that your model is not overfitting your training data.
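The two checks above could be sketched roughly as follows. This is only an illustration using NumPy: the helper names (`class_proportions`, `standardized_mean_diff`, `generalization_gap`) are made up for this example, the data is placeholder data, and what counts as a "significant" shift or gap is left to your judgment.

```python
import numpy as np

def class_proportions(y):
    """Map each class label to its share of the samples."""
    labels, counts = np.unique(np.asarray(y), return_counts=True)
    return {label.item(): float(count) / counts.sum()
            for label, count in zip(labels, counts)}

def standardized_mean_diff(X_train, X_test):
    """Per-feature |mean difference| in units of the train standard deviation."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return np.abs(X_train.mean(axis=0) - X_test.mean(axis=0)) / std

def generalization_gap(train_score, test_score):
    """Train score minus test score; a large positive gap hints at overfitting."""
    return train_score - test_score

# Placeholder data, chosen only to show a visible shift.
y_train, y_test = [0, 0, 0, 1], [0, 1, 1, 1]
X_train = [[1.0, 10.0], [3.0, 10.0]]
X_test = [[5.0, 10.0], [5.0, 10.0]]

props_train = class_proportions(y_train)
props_test = class_proportions(y_test)
shift = standardized_mean_diff(X_train, X_test)
gap = generalization_gap(0.99, 0.80)  # hypothetical train/test scores

print(props_train, props_test)
print("per-feature shift:", shift)
print("train - test score:", gap)
```

A standardized mean difference only catches shifts in the mean; comparing full distributions or correlation matrices would be a stricter check.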
Once you have made sure of that, the idea of using SHAP to look for outliers is interesting, but it might not work at all, depending on your variables and your problem.
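One possible way to try the outlier idea from the question is to measure, for each instance, how many of its top-k features (by |SHAP value|) also appear in the model's global top-k. The sketch below assumes a precomputed SHAP matrix and a global importance vector (for example mean |SHAP| on the training set); the function name `top_k_overlap`, the 0.5 cutoff, and all numbers are illustrative assumptions, not an established method.

```python
import numpy as np

def top_k_overlap(shap_values, global_importance, k=3):
    """Fraction of each row's top-k features (by |SHAP|) found in the global top-k."""
    abs_vals = np.abs(np.asarray(shap_values, dtype=float))
    global_order = np.argsort(-np.asarray(global_importance, dtype=float), kind="stable")
    global_top = set(global_order[:k].tolist())
    local_top = np.argsort(-abs_vals, axis=1, kind="stable")[:, :k]
    return np.array([len(global_top & set(row.tolist())) / k for row in local_top])

# Placeholder values: 3 instances, 4 features.
global_importance = np.array([0.6, 0.4, 0.1, 0.2])  # e.g. mean |SHAP| on train
shap_test = np.array([[0.9, -0.5, 0.0, 0.1],
                      [-0.8, 0.6, 0.1, 0.0],
                      [0.0, 0.1, -0.7, 0.9]])

overlap = top_k_overlap(shap_test, global_importance, k=2)
# Arbitrary cutoff: flag rows sharing fewer than half of the global top-k.
flagged = np.where(overlap < 0.5)[0]

print("overlap per instance:", overlap)
print("instances to inspect:", flagged)
```

A low overlap only says that the instance is explained by unusual features; whether its prediction is actually less reliable would still have to be checked against held-out labels.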
Answered by lcrmorin on November 13, 2021