Data Science Asked on May 2, 2021
I’m looking to build a response model (click or no click) on marketing data in which each person is shown a varying number of offers. I don’t want to model which offer they click, only whether they click any of the offers presented to them. My issue is how to deal with the differing number and types of offers.
Example data could be one table of ids:
id clicked
001 1
002 0
003 1
And varying number of offers per id:
id discount_rate on_amt
001 0.05 100
001 0.10 500
002 0.03 50
003 0.05 100
003 0.10 300
003 0.15 500
Do I create features from the offer data set such as average discount_rate, max on_amt etc.? Or create a very large binary sparse matrix of binned offer types such as rate_5-10_amt_0-50 1/0 and rate_5-10_amt_50-100 1/0 …?
Or is there a good model that handles variable data like this?
You need to create a tidy version of the data with on_amt and discount_rate encoded as categorical variables (e.g., one-hot encoded). If they are continuous, they need to be binned into categorical variables and then encoded.
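One way the binning and one-hot step could look in pandas is sketched below, using the example rows from the question. The bin edges, the bin labels, and the combined offer_type column are assumptions made purely for illustration; sensible cut points depend on the real data.

```python
import pandas as pd

# Example rows copied from the question.
clicks = pd.DataFrame({"id": ["001", "002", "003"], "clicked": [1, 0, 1]})
offers = pd.DataFrame({
    "id": ["001", "001", "002", "003", "003", "003"],
    "discount_rate": [0.05, 0.10, 0.03, 0.05, 0.10, 0.15],
    "on_amt": [100, 500, 50, 100, 300, 500],
})

# Assumed bin edges and labels (illustrative only). Intervals are
# right-closed, so a rate of 0.05 falls in the first rate bin.
offers["rate_bin"] = pd.cut(
    offers["discount_rate"],
    bins=[0.0, 0.05, 0.10, 0.20],
    labels=["rate_0-5", "rate_5-10", "rate_10-20"],
)
offers["amt_bin"] = pd.cut(
    offers["on_amt"],
    bins=[0, 100, 300, 1000],
    labels=["amt_0-100", "amt_100-300", "amt_300-1000"],
)

# Combine the two bins into one categorical offer type per row.
offers["offer_type"] = (
    offers["rate_bin"].astype(str) + "_" + offers["amt_bin"].astype(str)
)

# One row per id, one 0/1 column per offer type shown to that id.
features = pd.crosstab(offers["id"], offers["offer_type"]).clip(upper=1)
features.columns.name = None

# Attach the target so there is exactly one training row per person.
model_table = clicks.merge(features.reset_index(), on="id", how="left")
print(model_table)
```

Each person ends up with a fixed-width row regardless of how many offers they saw, which is what most classifiers expect. With finer bins the matrix becomes wide and sparse quickly.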
Answered by Brian Spiering on May 2, 2021
Our team uses featuretools’s deep feature synthesis for exactly this scenario. In this way you can capture much more signal via various aggregations per feature (mean, most_recent, mode, etc.).
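To make the idea of per-id aggregations concrete, here is a minimal sketch in plain pandas rather than featuretools itself. It hand-writes a few of the aggregations that deep feature synthesis would generate automatically; the feature names (n_offers, mean_discount_rate, and so on) are assumptions chosen for this example.

```python
import pandas as pd

# Example rows copied from the question.
clicks = pd.DataFrame({"id": ["001", "002", "003"], "clicked": [1, 0, 1]})
offers = pd.DataFrame({
    "id": ["001", "001", "002", "003", "003", "003"],
    "discount_rate": [0.05, 0.10, 0.03, 0.05, 0.10, 0.15],
    "on_amt": [100, 500, 50, 100, 300, 500],
})

# Collapse the variable number of offer rows into one row per id.
agg = (
    offers.groupby("id")
    .agg(
        n_offers=("discount_rate", "count"),
        mean_discount_rate=("discount_rate", "mean"),
        max_discount_rate=("discount_rate", "max"),
        mean_on_amt=("on_amt", "mean"),
        max_on_amt=("on_amt", "max"),
    )
    .reset_index()
)

# Join the aggregated features onto the click labels.
train = clicks.merge(agg, on="id", how="left")
print(train)
```

The resulting table has one row per person and can be passed to any standard classifier. A library such as featuretools mainly saves you from writing and maintaining this list of aggregations by hand.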
Answered by Anders Swanson on May 2, 2021