Data Science Asked by ssy on March 31, 2021
I am working on a model to predict which employee is going to resign from a firm. The dataset has columns like Date of Birth, Date of Joining, Department, Gender, Marital Status, Years at company etc.
Using TensorFlow, I created a neural network classifier that outputs both a prediction (Going to leave/Not going to leave) and the probability that an employee is going to leave.
Let's say the model gives an 87% probability that a particular person is going to leave. I want to know how much each factor contributes to this prediction (i.e., to find out why this particular person is likely to leave).
How can I do that?
I’m using Jupyter notebook for the code and Keras for the neural network.
It's very difficult to isolate the effect of each variable on the final output, because the weights of an input variable propagate its effect to all the units of the following hidden layers. Truth is, neural networks are very powerful predictors, but they are not very good at estimating feature importance.
One (very time-consuming) way of doing it: replace each variable with random noise that has the same mean and variance, and observe how the model's performance changes. Repeat this for all your variables. If performance drops significantly once a variable is replaced with random noise, that variable contributes substantially to the model's accuracy.
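A minimal sketch of the noise-substitution idea in NumPy, assuming the trained network is available as a callable that maps an (n, d) array of already-encoded numeric features to P(leave) (a Keras binary classifier's `model.predict` returns an (n, 1) array of this kind). Gaussian noise, the 0.5 threshold, the function name `noise_substitution_importance`, and the toy model in the demo are hypothetical choices for illustration, not part of the original answer.

```python
import numpy as np


def noise_substitution_importance(predict_proba, X, y, n_repeats=5, seed=0):
    """Accuracy drop when each column is swapped for random noise.

    Assumption: the noise is Gaussian with the column's mean and standard
    deviation; predict_proba is any callable mapping an (n, d) array to
    P(leave), e.g. a Keras model's `predict` method.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    def accuracy(data):
        # Threshold at 0.5 (hypothetical choice); ravel flattens Keras' (n, 1) output.
        preds = (np.ravel(predict_proba(data)) >= 0.5).astype(int)
        return float(np.mean(preds == y))

    baseline = accuracy(X)
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        mu, sigma = X[:, j].mean(), X[:, j].std()
        scores = []
        for _ in range(n_repeats):
            X_noisy = X.copy()
            X_noisy[:, j] = rng.normal(mu, sigma, size=X.shape[0])
            scores.append(accuracy(X_noisy))
        # Average over repeats so a single lucky noise draw doesn't dominate.
        drops[j] = baseline - np.mean(scores)
    return baseline, drops


# Demo with a toy stand-in for the network: only column 0 affects its output.
demo_rng = np.random.default_rng(1)
X_demo = demo_rng.normal(size=(500, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)


def toy_model(data):
    return 1.0 / (1.0 + np.exp(-5.0 * data[:, 0]))


baseline, drops = noise_substitution_importance(toy_model, X_demo, y_demo)
print(baseline, drops)  # column 0 shows a large drop; columns 1 and 2 show none
```

Each model call is a full pass over the data, so the cost grows with (number of features) × `n_repeats`, which is why the approach is described as time-consuming.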
As a more practical alternative, train a tree-based classifier (Random Forest or XGBoost) on the neural network's output, i.e. using the network's predictions as labels. This class of models returns feature importance scores, so you can use the neural network for prediction and the tree-based model to estimate which variables are most relevant for your task.
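A sketch of this surrogate approach with scikit-learn's `RandomForestClassifier`, again assuming the network is a callable that returns P(leave) for numeric, already-encoded features. The forest is fitted to the network's predicted labels rather than the true labels, so its `feature_importances_` describe what the network has learned. The helper `surrogate_importances`, the toy network, and the feature names in the demo are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def surrogate_importances(nn_predict_proba, X, feature_names, threshold=0.5, seed=0):
    """Fit a random forest to the network's own predictions and rank features."""
    X = np.asarray(X, dtype=float)
    # Labels come from the network, not the ground truth, so the forest
    # learns to mimic the network's decisions.
    nn_labels = (np.ravel(nn_predict_proba(X)) >= threshold).astype(int)
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X, nn_labels)
    # Agreement with the network on the fitting data (an optimistic estimate).
    fidelity = forest.score(X, nn_labels)
    ranking = sorted(
        zip(feature_names, forest.feature_importances_),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return fidelity, ranking


# Demo with a toy stand-in for the network and hypothetical feature names.
demo_rng = np.random.default_rng(2)
X_demo = demo_rng.normal(size=(500, 3))
names = ["years_at_company", "age", "unrelated_noise"]


def toy_network(data):
    return 1.0 / (1.0 + np.exp(-(3.0 * data[:, 0] + 1.0 * data[:, 1])))


fidelity, ranking = surrogate_importances(toy_network, X_demo, names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because `fidelity` is measured on the same rows the forest was fitted to, it overstates how well the surrogate mimics the network; checking agreement on a held-out split is a more honest test before trusting the importance scores.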
Answered by Leevo on March 31, 2021