Data Science Asked on April 10, 2021
This is a follow-up question to General approach on time series for customer retention/churn in retail.
I have a time series of data in the following form:
| purchase_date | customer_id | num_purchases | churned |
|---------------|-------------|---------------|---------|
| 2018-10-31    | id1         | 39            | 0       |
| 2018-11-30    | id1         | 0             | 0       |
| 2019-01-31    | id1         | 6             | 0       |
| ...           | ...         | ...           | ...     |
| 2019-03-31    | id1         | 88            | 1       |
| 2019-03-31    | id2         | 300           | 0       |
| 2018-04-30    | id2         | 2             | 1       |
| 2019-02-28    | id3         | 1             | 1       |
| 2019-07-31    | id4         | 100           | 0       |
| ...           | id5         | ...           | ...     |
I grouped the data by month and summed num_purchases per month. The churned column marks the month in which a customer churned; for example, id1 churned in March. Before this, to label who has churned or not, we labeled customers as churned based on a 2-month inactivity period from the churn date. I need to predict whether a user is going to churn 2 months from now.
I am getting very bad prediction results using, for example, logistic regression with the churned column as the class label. I suspect this is because some users like id3 and id4 appear only once (or a very small number of times), while other users like id1 appear many times. I am not sure how to approach imputation in this case, because these users simply didn't exist before or after, and I am not sure whether imputing them would make sense. Does anyone have an idea how to improve my model's results? I am getting 0.85 for accuracy, and 0 for precision, recall and F1.
It would be interesting to treat this as a sequence classification problem. For instance, you could use an HMM (Hidden Markov Model) or an equivalent model to classify the sequences. The data format would be:
| ID  | sequence          | label |
|-----|-------------------|-------|
| id1 | 39, 0, 6, ..., 88 | 1     |
| id2 | 300, 2            | 1     |
| id3 | 1                 | 1     |
| id4 | 100               | 0     |
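Starting from the monthly table above, these per-customer sequences can be built with a groupby, e.g. (a sketch reusing the monthly frame from the question):

```python
# one purchase-count sequence per customer, ordered by month;
# label 1 if the customer churned at any point
sequences = (
    monthly.sort_values("purchase_date")
           .groupby("customer_id")["num_purchases"]
           .apply(list)
)
labels = monthly.groupby("customer_id")["churned"].max()
```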
Some suggestions:

- Use sub-sequences as additional training samples, with the label the customer had at that point in time, e.g. id1: 39, 0 → 0.
- Round the purchase counts to reduce the observation space, e.g. 6 -> 10, 39 -> 40.
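A minimal sketch of this approach, assuming the hmmlearn library (the answer doesn't name one): fit one HMM per class on that class's sequences, then assign a new sequence to whichever model gives it the higher log-likelihood.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

def fit_class_hmm(seqs, n_states=3):
    """Fit one HMM on all sequences belonging to a single class."""
    X = np.concatenate([np.asarray(s, dtype=float).reshape(-1, 1) for s in seqs])
    lengths = [len(s) for s in seqs]
    model = GaussianHMM(n_components=n_states, n_iter=100)
    model.fit(X, lengths)
    return model

# split the per-customer sequences built earlier by their label
seqs_churned = sequences[labels == 1].tolist()
seqs_active = sequences[labels == 0].tolist()

hmm_churned = fit_class_hmm(seqs_churned)
hmm_active = fit_class_hmm(seqs_active)

def predict_churn(seq):
    """Classify a sequence by comparing per-class log-likelihoods."""
    x = np.asarray(seq, dtype=float).reshape(-1, 1)
    return int(hmm_churned.score(x) > hmm_active.score(x))

print(predict_churn([39, 0, 6, 88]))  # 1 -> predicted to churn
```

Fitting one generative model per class is a standard way to use HMMs for classification, and sequences of different lengths are handled naturally via the lengths argument passed to fit.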
Answered by 20roso on April 10, 2021