What is best practice to feature engineer from prior event counts?

Question

Say, for example, I am building a model to predict customer churn at Spotify, with my target being whether a customer churns in the next 90 days.
One feature I expect could be predictive of churn is how often customers check their billing statements online, so for each customer on each training date I might engineer features that count how many times they have checked their billing statements.
For example, I might create a feature CHECKBILL_CNT_0_10, which counts how many times the customer has checked their online bill in the last 10 days, along with many similar features across different time ranges.
I have seen two different styles of how data scientists do this:

1. CHECKBILL_0_10, CHECKBILL_0_30, CHECKBILL_0_90 ...
2. CHECKBILL_0_10, CHECKBILL_10_30, CHECKBILL_30_90 ...

Both technically encode the same information, since each set can be computed from the other by summing or differencing. However, I'm wondering whether one of these options offers advantages over the other. I'm inclined to think that option 2 would be preferable, since the features would be less correlated and the model might therefore learn more easily, but this is speculative.
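To make the two styles concrete, here is a minimal sketch using only the Python standard library. The window edges, the helper names (`window_count`, `cumulative_features`, `disjoint_features`), the half-open window convention, and the toy event dates are all assumptions made for illustration, not part of any real pipeline.

```python
from datetime import date, timedelta
from itertools import accumulate

# Assumed window edges, in days before the training date.
EDGES = [0, 10, 30, 90]

def window_count(event_dates, as_of, start, end):
    """Count events that occurred at least `start` and fewer than `end` days before `as_of`.

    Events after `as_of` have a negative age and are never counted,
    which avoids leaking future information into the feature.
    """
    return sum(1 for d in event_dates if start <= (as_of - d).days < end)

def cumulative_features(event_dates, as_of):
    """Style 1: every window starts at day 0 (0_10, 0_30, 0_90)."""
    return {f"CHECKBILL_0_{end}": window_count(event_dates, as_of, 0, end)
            for end in EDGES[1:]}

def disjoint_features(event_dates, as_of):
    """Style 2: non-overlapping windows (0_10, 10_30, 30_90)."""
    return {f"CHECKBILL_{start}_{end}": window_count(event_dates, as_of, start, end)
            for start, end in zip(EDGES, EDGES[1:])}

# Toy data: one customer's bill checks, expressed as days before the training date.
as_of = date(2024, 4, 1)
events = [as_of - timedelta(days=n) for n in [1, 3, 5, 12, 40, 85, 120]]

cumulative = cumulative_features(events, as_of)
disjoint = disjoint_features(events, as_of)

# A running sum over the disjoint counts recovers the cumulative counts.
recovered = list(accumulate(disjoint.values()))
```

With this toy data the running sum of the style 2 counts equals the style 1 counts, which is the sense in which the two encodings carry the same information. They differ only in how directly a given model can use that information.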

Gozie · Answer

You may want to try both options and see which one performs better. Feature engineering is, I think, largely an iterative, trial-and-error process.
