How to formulate a classification problem with time series element

Question

Let’s say I want to do customer attrition prediction. Customer attrition can happen at any time during the year. There are two ways I can think of to set up the problem:

1. Fix a reference date, e.g. 1 Nov’16. The dependent variable (observation period) flags customers who churned in the next 3 months (Nov/Dec/Jan). Independent variables are calculated over Nov’15-Oct’16 (1 year), and variables such as transactions in the last 3/6 months can be created. (I think this is the better approach. It also makes more sense if I want to score the model and build campaigns.)
2. Consider the year 2016. For customers who churned in July’16 (observation period), use Jan-June’16 as the window for creating independent variables; for customers who churned in Aug’16, use Feb-July’16, and so on. Append this data row-wise, then take a random sample for training and use the rest for testing. (Here I feel the dependent variable will have seasonality, since the variables would be created over different months.)
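To make the difference between the two setups concrete, here is a minimal Python sketch that only computes the date windows each one implies; it does not build features or train anything. The function names (`shift_month`, `fixed_reference_windows`, `churn_anchored_window`) are hypothetical, windows are half-open [start, end) month ranges, and the second setup is sketched for churners only, since the question does not say how windows are chosen for customers who did not churn.

```python
from datetime import date

def shift_month(d: date, months: int) -> date:
    """First day of the month that lies `months` months away from d's month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)

# Setup 1: one fixed reference date shared by every customer.
def fixed_reference_windows(reference: date, label_months: int = 3, feature_months: int = 12):
    """Feature window = the `feature_months` before the reference date;
    label window = the `label_months` starting at it. End dates are exclusive."""
    features = (shift_month(reference, -feature_months), shift_month(reference, 0))
    label = (shift_month(reference, 0), shift_month(reference, label_months))
    return features, label

# Setup 2: each churner's feature window is anchored on their own churn month.
def churn_anchored_window(churn_date: date, feature_months: int = 6):
    """Feature window = the `feature_months` before the churn month (end exclusive)."""
    return shift_month(churn_date, -feature_months), shift_month(churn_date, 0)
```

With a 1 Nov’16 reference, the first setup gives every customer the same Nov’15-Oct’16 feature window and a Nov-Jan label window. In the second setup, a July’16 churner gets Jan-June’16 and an Aug’16 churner gets Feb-July’16, so the stacked rows describe different calendar months, which is where the seasonality concern comes from.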

Can someone please let me know which of these is right (or whether either is correct)? This will be helpful, as I have not been able to figure this out.

Thanks

Thanos · Answer

In my applications, I use a rolling-window setup, though a lot of customization may apply depending on the characteristics of the churn data for a specific product:

Prediction covers a period of X months, and the input features come from the 2*X months before it.
To train the model, I use the X months before the prediction period as the training (label) period, and the 2*X months before those to get the input features.
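The rule above can be expressed as a small window calculator. This is a hedged Python sketch, not the answerer's code: months are represented by their first day, each window is an inclusive (first month, last month) pair, and the names `shift_month` and `rolling_windows` are hypothetical.

```python
from datetime import date

def shift_month(d: date, months: int) -> date:
    """First day of the month that lies `months` months away from d's month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)

def rolling_windows(predict_start: date, x: int) -> dict:
    """Inclusive (first month, last month) windows for an X-month churn forecast."""
    def window(start: date, length: int) -> tuple:
        return (shift_month(start, 0), shift_month(start, length - 1))

    return {
        # the X months we want to predict churn for
        "predict": window(predict_start, x),
        # the 2*X months right before them supply the scoring features
        "predict_features": window(shift_month(predict_start, -2 * x), 2 * x),
        # the X months before the prediction period provide the training labels
        "train": window(shift_month(predict_start, -x), x),
        # the 2*X months before the training period supply the training features
        "train_features": window(shift_month(predict_start, -3 * x), 2 * x),
    }
```

For example, `rolling_windows(date(2017, 3, 1), 3)` puts the prediction window at March-May 2017, its features at Sep 2016-Feb 2017, the training labels at Dec 2016-Feb 2017 and the training features at Jun-Nov 2016. Keeping the feature span at a fixed 2*X for both training and scoring means the model sees features built over the same length of history in both cases.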

Example (assuming a quarterly forecast, i.e. X = 3):

Predict period: 03/2017 - 05/2017
Features for prediction, from period: 09/2016 - 02/2017
Model training period: 12/2016 - 02/2017
Features for model training, from period: 06/2016 - 11/2016
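Given those windows, the training table and the scoring table can be built the same way, differing only in whether a label is attached. The sketch below hardcodes the example's periods and assumes a hypothetical transaction log of (customer_id, date, amount) tuples plus a dict of churn dates (None for active customers); the two features (transaction count and total amount) are placeholders for whatever the real product offers. It also assumes that customers who had already churned by the end of a feature window are dropped, which the answer does not state explicitly.

```python
from datetime import date

# Periods from the quarterly example, as inclusive (first month, last month) pairs.
TRAIN_FEATURES = (date(2016, 6, 1), date(2016, 11, 1))
TRAIN_LABEL = (date(2016, 12, 1), date(2017, 2, 1))
PREDICT_FEATURES = (date(2016, 9, 1), date(2017, 2, 1))

def in_window(d: date, window: tuple) -> bool:
    """True if d falls in a month covered by the inclusive month window."""
    first, last = window
    return first <= date(d.year, d.month, 1) <= last

def build_rows(transactions, churn_dates, feature_window, label_window=None):
    """One row of features (and optionally a churn label) per customer.

    transactions: iterable of (customer_id, date, amount), a hypothetical schema
    churn_dates: dict of customer_id -> churn date, or None if still active
    """
    rows = {}
    for customer, churned in churn_dates.items():
        # skip customers who were already gone before the labelled/scored period
        if churned is not None and date(churned.year, churned.month, 1) <= feature_window[1]:
            continue
        row = {"n_txn": 0, "total": 0.0}
        if label_window is not None:
            row["churned"] = int(churned is not None and in_window(churned, label_window))
        rows[customer] = row
    # aggregate only the transactions that fall inside the feature window
    for customer, when, amount in transactions:
        if customer in rows and in_window(when, feature_window):
            rows[customer]["n_txn"] += 1
            rows[customer]["total"] += amount
    return rows
```

The model would be fitted on `build_rows(transactions, churn_dates, TRAIN_FEATURES, TRAIN_LABEL)` and then applied to `build_rows(transactions, churn_dates, PREDICT_FEATURES)`, whose rows carry the same feature columns but no `churned` key.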

Of course, depending on your model, you may want to use training, validation and test sets while performing the training task (as you already mentioned).
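If you do carve out validation and test sets from the stacked rows, one option (an assumption here, not something the answer specifies) is to split by customer rather than by row, so that the same customer never appears in two sets. A minimal Python sketch, with a hypothetical `split_customers` helper and illustrative 60/20/20 fractions:

```python
import random

def split_customers(customer_ids, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly assign each customer to train/validation/test.

    Splitting by customer (not by row) keeps every row of a given customer
    in a single set, which matters when rows from several windows are stacked.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    ids = sorted(set(customer_ids))      # deduplicate and fix the order before shuffling
    random.Random(seed).shuffle(ids)     # seeded, so the split is reproducible
    n_train = int(len(ids) * fractions[0])
    n_valid = int(len(ids) * fractions[1])
    return ids[:n_train], ids[n_train:n_train + n_valid], ids[n_train + n_valid:]
```

A purely random split like this still mixes time periods; holding out a later window instead is generally a stricter check of how the model will behave on future months.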

Post-validation of this configuration yielded good results in my case, but success always depends on many factors (such as model choice, choice of periods, seasonality considerations, data quality, etc.).
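One way to do this kind of post-validation, assumed here rather than taken from the answer, is a walk-forward (rolling-origin) backtest: slide the whole layout back in steps of X months so that each evaluation window lies entirely in the past and its churn labels are already known. The sketch below only generates the label-window start months for each fold; the feature windows follow from them by the same 2*X rule, and `backtest_origins` is a hypothetical name.

```python
from datetime import date

def shift_month(d: date, months: int) -> date:
    """First day of the month that lies `months` months away from d's month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)

def backtest_origins(last_observed: date, x: int, n_folds: int):
    """Yield (train_label_start, eval_label_start) month pairs, newest fold first.

    Each evaluation window (x months from eval_label_start) ends no later than
    the month of `last_observed`, so its labels are known; the training label
    window is the x months immediately before it.
    """
    newest_eval_start = shift_month(last_observed, -(x - 1))
    for k in range(n_folds):
        eval_start = shift_month(newest_eval_start, -k * x)
        yield shift_month(eval_start, -x), eval_start
```

With data observed through Feb 2017 and X = 3, the two most recent folds train on labels from Sep-Nov 2016 and evaluate on Dec 2016-Feb 2017, then train on Jun-Aug 2016 and evaluate on Sep-Nov 2016. Comparing scores across folds also gives a rough view of how stable the setup is across seasons.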
