Data Science Asked by Nimrod Ets on December 16, 2020
I have dataset 1 (stayers), consisting of 1,500 records of HR demographic data (11 features) on employees who are currently in the company. Dataset 2 (leavers) contains 180 records with the same features, describing people who voluntarily left the company.
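Before any splitting, the two tables have to become one labeled dataset with an explicit target. The sketch below uses random numbers as hypothetical stand-ins for the real tables, and it assumes the 11 demographic features have already been encoded as numbers (categorical columns would need one-hot or similar encoding first).

```python
import numpy as np

# Hypothetical stand-ins for the real HR tables: random numbers with the
# shapes described in the question (11 numeric features per employee).
rng = np.random.default_rng(0)
n_features = 11
stayers = rng.normal(size=(1500, n_features))  # dataset 1
leavers = rng.normal(size=(180, n_features))   # dataset 2

# Stack both tables and attach an explicit target: 0 = stayed, 1 = left.
X = np.vstack([stayers, leavers])
y = np.concatenate([np.zeros(len(stayers), dtype=int),
                    np.ones(len(leavers), dtype=int)])

print(X.shape)  # (1680, 11)
print(y.sum())  # 180 leavers, i.e. 180 / 1680, roughly 10.7% of all rows
```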
My aim is to identify which employees in dataset 1 are at risk of leaving the company.
Question: what would be a good approach to building a training dataset?
I am thinking about some kind of train_test_split.
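If the split is done with scikit-learn's `train_test_split`, passing `stratify=y` keeps the small leaver share the same in the training and test portions, so the test set still contains leavers to evaluate on. This is a minimal sketch on hypothetical random data with the question's class sizes; the 80/20 ratio and `random_state` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labeled data shaped like the question's: 1500 stayers (y=0)
# and 180 leavers (y=1), 11 numeric features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(1680, 11))
y = np.array([0] * 1500 + [1] * 180)

# stratify=y keeps the leaver rate (180 / 1680) identical in both splits,
# so the held-out set still contains leavers for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(len(y_test), y_test.sum())  # 336 36
```

With these numbers the test set gets 300 stayers and 36 leavers, and the training set gets 1,200 stayers and 144 leavers.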
My current idea is to split the stayers (dataset 1) into 8 groups of roughly 180 records each, then combine each of these groups individually with the complete dataset 2 (leavers) to build a logistic regression.
For each of these combinations I fit a logistic regression, use it to predict attrition (yes/no) on the remaining stayers, and then compare all the resulting models.
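The proposed scheme could be sketched roughly as follows. Assumptions: the data are hypothetical random stand-ins (leavers are shifted slightly so the models have something to learn); "the remaining stayers" is read as the stayers outside the group a model was trained on; averaging those predictions into one risk score per stayer and comparing models through the spread of their coefficients are illustrative choices, not part of the original plan.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data with the question's shapes (replace with the real,
# numerically encoded HR features).
rng = np.random.default_rng(0)
stayers = rng.normal(size=(1500, 11))
leavers = rng.normal(loc=0.5, size=(180, 11))

n_groups = 8
# Shuffle stayer indices, then cut them into 8 groups of 187 or 188 rows.
groups = np.array_split(rng.permutation(len(stayers)), n_groups)

score_sum = np.zeros(len(stayers))    # summed leave-probabilities per stayer
score_count = np.zeros(len(stayers))  # how many models scored each stayer
models = []

for group in groups:
    # Roughly balanced training set: one group of stayers plus all leavers.
    X_train = np.vstack([stayers[group], leavers])
    y_train = np.concatenate([np.zeros(len(group), dtype=int),
                              np.ones(len(leavers), dtype=int)])
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    models.append(model)

    # Score only the stayers this model did NOT see during training.
    unseen = np.setdiff1d(np.arange(len(stayers)), group)
    score_sum[unseen] += model.predict_proba(stayers[unseen])[:, 1]
    score_count[unseen] += 1

# Every stayer is scored by the 7 models that never trained on it.
risk = score_sum / score_count
top_at_risk = np.argsort(risk)[::-1][:20]  # indices of the 20 highest scores

# Compare the 8 models: a large spread in a coefficient across models
# suggests that feature's effect is unstable between stayer subsets.
coefs = np.vstack([m.coef_[0] for m in models])  # shape (8, 11)
coef_spread = coefs.std(axis=0)
```

One property of this design: the stayers being scored contain no leavers, so these predictions rank current employees by how much they resemble leavers but cannot by themselves measure how accurate the models are; that would need some leavers held out from training.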
What do you think? Any glaring pitfalls or risks in my approach?