Data Science: Asked by J.GUO on March 18, 2021
Context:
To predict employee turnover (will an employee leave?), I trained a classification algorithm (LDA) on my dataset and then made predictions.
The dataset is quite small: about 500 rows and 20 features, including Years_Spent, Salary_Increase, and Sale_Bonus.
However, HR experience tells us that:

- For employees with Years_Spent < 1.5, Salary_Increase has no impact on turnover (because Salary_Increase > 0 only when Years_Spent > 1.5).
- Sale_Bonus has no impact on employees who are not in sales (because IT staff never receive a sales bonus).
Here comes the problem: if I set Salary_Increase = 0 for employees with Years_Spent < 1.5, and Sale_Bonus = 0 for those who are not in sales, the classification algorithm will take 0 as a very small value. A possible conclusion drawn by the algorithm could then be: "employee A will leave because he never receives a sale bonus." In reality, however, employee A is from the IT department, never receives a sales bonus, and will not leave because of that. As we see, the constructed model is not correct.
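To make this concrete, here is a minimal sketch of that zero-fill encoding in pandas (the column names are hypothetical, inferred from the HR rules above; the real dataset has about 20 features):

```python
import pandas as pd

# Toy rows illustrating the encoding described above; column names are
# hypothetical, taken from the HR rules in the question.
df = pd.DataFrame({
    "Years_Spent":     [0.8, 3.2, 5.1, 4.0],
    "Salary_Increase": [None, 0.05, 0.10, 0.08],
    "Department":      ["IT", "Sales", "Sales", "IT"],
    "Sale_Bonus":      [None, 2000.0, 3500.0, None],
})

# Zero-fill: after this step the classifier sees 0 as "a very small
# raise/bonus", not as "this feature does not apply to this employee".
df.loc[df["Years_Spent"] < 1.5, "Salary_Increase"] = 0.0
df.loc[df["Department"] != "Sales", "Sale_Bonus"] = 0.0
```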
My question is: how can I handle this kind of problem so that HR experience can be understood by classification algorithms?
Thank you for your patient reading; all sorts of discussion are welcome!
Welcome to the site!
What you are describing above is known as an interaction between features.
You should consider the algorithm you wish to use and whether it allows for interactions between predictors. Some techniques - like generalised linear models, or the LDA you are using - require interactions to be stated explicitly as constructed features, while tree-based algorithms will capture interactions automatically.
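For example, since scikit-learn's LinearDiscriminantAnalysis has no formula interface, the HR rules can be encoded as explicit indicator and interaction features before fitting. A minimal, self-contained sketch with hypothetical column names and an assumed binary target column Left (toy data, not from the question):

```python
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Toy data with hypothetical column names from the question;
# "Left" is an assumed binary turnover target (1 = employee left).
df = pd.DataFrame({
    "Years_Spent":     [0.8, 3.2, 5.1, 4.0, 0.9, 0.6],
    "Salary_Increase": [0.0, 0.05, 0.10, 0.08, 0.0, 0.0],
    "Department":      ["IT", "Sales", "Sales", "IT", "Sales", "IT"],
    "Sale_Bonus":      [0.0, 2000.0, 3500.0, 0.0, 800.0, 0.0],
    "Left":            [0, 0, 1, 0, 1, 1],
})

# Indicator columns state whether each feature applies at all.
df["Is_Sales"] = (df["Department"] == "Sales").astype(int)
df["Raise_Eligible"] = (df["Years_Spent"] >= 1.5).astype(int)

# Explicit interaction terms: the raw value only carries signal
# where the indicator marks the feature as applicable.
df["Raise_x_Eligible"] = df["Salary_Increase"] * df["Raise_Eligible"]
df["Bonus_x_Sales"] = df["Sale_Bonus"] * df["Is_Sales"]

X = df[["Years_Spent", "Is_Sales", "Raise_Eligible",
        "Raise_x_Eligible", "Bonus_x_Sales"]]
y = df["Left"]

model = LinearDiscriminantAnalysis().fit(X, y)
print(model.predict(X))
```

With the indicator columns in the model, a zero in an interaction column no longer reads as "a very small bonus": the Is_Sales and Raise_Eligible terms absorb the "not applicable" case.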
Answered by bradS on March 18, 2021