Cross Validated Asked by Rishu on November 6, 2021
Let’s say I want to predict customer attrition. Attrition can happen at any time during a year. There are two ways I can think of to set up the problem.
1. Fix a reference date, e.g. 1 Nov’16. The dependent variable (in the observation period) is calculated by considering customers who churned in the next 3 months (Nov/Dec/Jan). The independent variables can be calculated over Nov’15-Oct’16 (1 yr), and variables such as transactions in the last 3/6 months can be created. (I think this is the better approach. It also makes more sense if I want to score the model and build campaigns.)
2. Consider the year 2016. For customers who churned in July’16 (observation period), use Jan-June’16 as the window for creating independent variables; for customers who churned in Aug’16, use Feb-July’16. Append this data row-wise, take a random sample of it for training and keep the rest for testing. (Here I feel the dependent variables will have seasonality, as the variables created would have considered different months.)
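To make the two setups concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the helper names (`add_months`, `build_snapshot`), the input layout (a list of `(customer_id, date, amount)` tuples and a dict of churn dates) and the two spend features are hypothetical, not taken from any particular library or dataset. Note that the second call stacks snapshots for several reference dates and keeps non-churners as well, which is a slight generalisation of the second setup as described.

```python
from datetime import date

def add_months(d, months):
    """Shift a first-of-month date by a whole number of months."""
    idx = d.year * 12 + (d.month - 1) + months
    return date(idx // 12, idx % 12 + 1, 1)

def build_snapshot(transactions, churn_dates, ref_date,
                   feature_months=12, label_months=3):
    """Build one row per customer as seen at ref_date (hypothetical helper).

    transactions: list of (customer_id, date, amount)
    churn_dates:  dict customer_id -> churn date (absent if never churned)
    Customers with no transactions in the feature window get no row.
    """
    feat_start = add_months(ref_date, -feature_months)
    recent_start = add_months(ref_date, -3)
    label_end = add_months(ref_date, label_months)
    rows = {}
    for cust, day, amount in transactions:
        churned_on = churn_dates.get(cust)
        # Skip customers who had already churned before the reference date.
        if churned_on is not None and churned_on < ref_date:
            continue
        # Features only use data strictly before the reference date.
        if not (feat_start <= day < ref_date):
            continue
        row = rows.setdefault(cust, {"customer_id": cust, "ref_date": ref_date,
                                     "spend_window": 0.0, "spend_3m": 0.0})
        row["spend_window"] += amount
        if day >= recent_start:
            row["spend_3m"] += amount
    for cust, row in rows.items():
        churned_on = churn_dates.get(cust)
        # Label: churn inside [ref_date, ref_date + label_months).
        row["churned"] = int(churned_on is not None
                             and ref_date <= churned_on < label_end)
    return list(rows.values())

# Made-up toy data, only to show the shape of the inputs.
transactions = [
    ("a", date(2016, 1, 15), 50.0),
    ("a", date(2016, 9, 10), 20.0),
    ("b", date(2016, 10, 5), 30.0),
    ("c", date(2015, 6, 1), 99.0),
]
churn_dates = {"a": date(2016, 12, 20), "c": date(2016, 3, 1)}

# Setup 1: a single reference date, 12 months of features, 3-month label.
option_1 = build_snapshot(transactions, churn_dates, date(2016, 11, 1))

# Setup 2 (generalised): one snapshot per month, appended row-wise.
option_2 = [row
            for ref in (date(2016, 7, 1), date(2016, 8, 1))
            for row in build_snapshot(transactions, churn_dates, ref,
                                      feature_months=6, label_months=1)]
```

In both cases the features are computed only from data before the reference date and the label only from data on or after it, so the difference between the setups comes down to how many reference dates end up in the training data.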
Can someone please let me know which of these is right (or whether either is correct)? This would be helpful, as I have not been able to figure it out.
Thanks
In my applications I use a kind of rolling window, but of course a lot of customization may apply, depending on the features of the customer churn data for a specific product:
Example (supposing a quarterly forecast):
Of course, depending on your model, you may want to use a training, a validation and a testing set while training (as you already mention).
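One way such a three-way split could be done with windowed data is to split by reference date rather than at random, so that validation and test rows always come from later windows than the training rows. This is a sketch of that idea and not necessarily the setup I used; the function name `split_by_ref_date`, the row layout and the cut-off dates are all assumptions made for illustration.

```python
from datetime import date

def split_by_ref_date(rows, valid_from, test_from):
    """Assign each snapshot row to train/validation/test by its reference date."""
    train, valid, test = [], [], []
    for row in rows:
        if row["ref_date"] < valid_from:
            train.append(row)      # earliest windows
        elif row["ref_date"] < test_from:
            valid.append(row)      # later windows, used for tuning
        else:
            test.append(row)       # latest windows, held out
    return train, valid, test

# Made-up rows with quarterly reference dates (Jan, Apr, Jul, Oct 2016).
rows = [{"customer_id": i, "ref_date": date(2016, m, 1)}
        for i, m in enumerate([1, 4, 7, 10])]

train, valid, test = split_by_ref_date(rows,
                                       valid_from=date(2016, 7, 1),
                                       test_from=date(2016, 10, 1))
```

A random split is simpler, but with overlapping windows the same customer and nearly the same months can land in both training and test data, which tends to make test scores look better than they will be when scoring future periods.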
Post-validation of this configuration gave good results in my case, but success always depends on many factors (such as model choice, choice of periods, seasonality considerations, data quality, etc.).
Answered by Thanos on November 6, 2021