Data Science Asked by lona on July 22, 2020
Suppose I have a dataset labeled with two classes, such as healthy and unhealthy, and I applied feature selection (feature importance) on the dataset.
How can I know which class (healthy or unhealthy) each feature is important to?
Assuming we are talking about feature importance for decision-tree-based algorithms here: you cannot really say. It only tells you how useful a feature was for splitting the two classes apart, not which class it points toward.
If you want more insight into how your model makes decisions, you could look into SHAP and LIME. Both are methods that approximate your model and then try to explain it. Both are available as Python libraries (shap and lime).
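As a minimal, hedged sketch (not from the original answer) of how the shap package could be pointed at a fitted tree-based classifier to see which class each feature pushes predictions toward; here model and X are placeholder names for your fitted classifier and feature DataFrame, and the exact return type of shap_values depends on your shap version:

import shap

# explain a fitted tree-based classifier (e.g. a RandomForestClassifier)
explainer = shap.TreeExplainer(model)

# for binary classification this is typically a list with one array per class
# (a single array in newer shap versions); positive values push the
# prediction toward the corresponding class
shap_values = explainer.shap_values(X)

# per-feature view of the magnitude and direction of the contributions
shap.summary_plot(shap_values, X)

LIME works in a similar spirit through lime.lime_tabular.LimeTabularExplainer, explaining one prediction at a time.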
Answered by Simon Larsson on July 22, 2020
Something like this should get you going.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier

# load the credit dataset
df = pd.read_csv("https://rodeo-tutorials.s3.amazonaws.com/data/credit-data-trainingset.csv")
df.head()

# predictor columns used to train the model
features = np.array(['revolving_utilization_of_unsecured_lines',
                     'age', 'number_of_time30-59_days_past_due_not_worse',
                     'debt_ratio', 'monthly_income',
                     'number_of_open_credit_lines_and_loans',
                     'number_of_times90_days_late',
                     'number_real_estate_loans_or_lines',
                     'number_of_time60-89_days_past_due_not_worse',
                     'number_of_dependents'])

# fit a random forest on the binary target (serious_dlqin2yrs)
clf = RandomForestClassifier()
clf.fit(df[features], df['serious_dlqin2yrs'])

# from the calculated importances, order them from least to most important
# and make a barplot so we can visualize what is/isn't important
importances = clf.feature_importances_
sorted_idx = np.argsort(importances)

padding = np.arange(len(features)) + 0.5
plt.barh(padding, importances[sorted_idx], align='center')
plt.yticks(padding, features[sorted_idx])
plt.xlabel("Relative Importance")
plt.title("Variable Importance")
plt.show()
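Note that feature_importances_ still will not tell you which class a feature favours. As a rough, informal follow-up (not part of the original answer), you could compare class-conditional means of the features, reusing the df and features variables from the snippet above:

# quick directional check: average feature values within each target class;
# a feature whose mean differs strongly between the two groups is, loosely,
# associated with the class where it takes its more extreme values
class_means = df.groupby('serious_dlqin2yrs')[list(features)].mean()
print(class_means.T)  # rows = features, columns = the two target classes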
Answered by ASH on July 22, 2020