TransWikia.com

What is best practice to feature engineer from prior event counts?

Data Science Asked by user3555243 on May 11, 2021

Say for example I am building a model to predict a customer churn event from Spotify, with my target being whether a customer churns in the next 90 days.

One behaviour I expect to be predictive of churn is a customer checking their billing statements online, so for each customer on each training date I might engineer features that encode how many times they have checked their statements.

For example, I might create a feature CHECKBILL_CNT_0_10, which counts how many times the customer has checked their online bill in the last 10 days, along with several similar features covering other time ranges.

I have seen two different styles of how data scientists do this:

  1. CHECKBILL_0_10, CHECKBILL_0_30, CHECKBILL_0_90 …
  2. CHECKBILL_0_10, CHECKBILL_10_30, CHECKBILL_30_90 …

Both technically encode the same information, since each set of counts can be derived from the other by adding or subtracting; however, I’m wondering whether one of these options offers advantages over the other. I’m inclined to think that option 2 would be preferable because the features would be less correlated, and therefore the model might learn more easily, but this is speculative.
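To make the two styles concrete, here is a minimal sketch in plain Python of how both sets of counts could be computed from a customer's event dates. The function names, the window edges, the training date, and the event dates are all made up for illustration, and the choice to treat each window as [lo, hi) days before the training date is an assumption, not an established convention.

```python
from datetime import date, timedelta
from itertools import accumulate

# Hypothetical window edges, in days before the training date.
EDGES = [0, 10, 30, 90]

def cumulative_counts(event_dates, as_of, edges=EDGES):
    """Style 1: overlapping windows [0, hi) days before as_of."""
    ages = [(as_of - d).days for d in event_dates]
    return {
        f"CHECKBILL_{edges[0]}_{hi}": sum(edges[0] <= a < hi for a in ages)
        for hi in edges[1:]
    }

def disjoint_counts(event_dates, as_of, edges=EDGES):
    """Style 2: non-overlapping windows [lo, hi) days before as_of."""
    ages = [(as_of - d).days for d in event_dates]
    return {
        f"CHECKBILL_{lo}_{hi}": sum(lo <= a < hi for a in ages)
        for lo, hi in zip(edges, edges[1:])
    }

# Illustrative data only: one customer, seven bill checks.
as_of = date(2021, 5, 11)
events = [as_of - timedelta(days=n) for n in (1, 4, 12, 25, 40, 85, 120)]

style_1 = cumulative_counts(events, as_of)
style_2 = disjoint_counts(events, as_of)

# The cumulative counts are the running sum of the disjoint counts,
# so either set can be recovered from the other.
assert list(accumulate(style_2.values())) == list(style_1.values())

print(style_1)
print(style_2)
```

Events dated after the training date get a negative age and are excluded from every window, which avoids leaking future information into the features. Because the two encodings differ only by an invertible linear transformation, a linear model can in principle represent the same function with either one; models that split on one feature at a time, such as tree ensembles, may behave differently, which is one reason to test both.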

One Answer

You may want to try both options and see which one performs better. I think feature engineering is largely an iterative, trial-and-error process.

Answered by Gozie on May 11, 2021
