Model for Differing Number of Rows per Observation

Question

I'm looking to build a response model (click or no click) on marketing data in which each person is shown a varying number of offers. I don't want to model which offer they click, but whether they click any of the offers presented to them. My issue is how to deal with the differing number and types of offers.

Example data could be one table of ids:

id   clicked
001       1
002       0
003       1

And a varying number of offers per id:

id  discount_rate  on_amt
001     0.05       100
001     0.10       500
002     0.03        50
003     0.05       100
003     0.10       300
003     0.15       500

Should I create features from the offer dataset, such as average discount_rate, max on_amt, etc.? Or should I create a very large, sparse binary matrix of binned offer types, such as rate_5-10_amt_0-50 (1/0), rate_5-10_amt_50-100 (1/0), ...?
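The first option (summary features per id) can be sketched in pandas roughly as follows. This is a minimal illustration using only the example tables above; the particular aggregates chosen (offer count, mean and max of each column) are assumptions, not a recommendation.

```python
import pandas as pd

# Example tables from the question (id kept as a string to preserve leading zeros).
people = pd.DataFrame({"id": ["001", "002", "003"], "clicked": [1, 0, 1]})
offers = pd.DataFrame({
    "id": ["001", "001", "002", "003", "003", "003"],
    "discount_rate": [0.05, 0.10, 0.03, 0.05, 0.10, 0.15],
    "on_amt": [100, 500, 50, 100, 300, 500],
})

# Collapse the variable number of offers into one fixed-length row per id.
summary = offers.groupby("id").agg(
    n_offers=("discount_rate", "size"),
    mean_discount_rate=("discount_rate", "mean"),
    max_discount_rate=("discount_rate", "max"),
    mean_on_amt=("on_amt", "mean"),
    max_on_amt=("on_amt", "max"),
)

# One row per person: the click target plus the aggregated offer features.
model_df = people.merge(summary.reset_index(), on="id", how="left")
print(model_df)
```

Any standard classifier can then be trained on `model_df`, with `clicked` as the target and the remaining non-id columns as features.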

Or is there a good model that handles variable-length data like this?

Brian Spiering · Answer

You need to create a tidy version of the data with on_amt and discount_rate encoded as categorical variables (e.g., one-hot encoded). If they are continuous, they first need to be binned into categorical variables and then encoded.
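One way to read this answer, sketched in pandas: bin both continuous columns, combine the bins into a single offer-type category, one-hot encode it, and sum the dummies per id so each person ends up with exactly one row. The bin edges and label names below are assumptions chosen to resemble the question's rate_5-10_amt_0-50 example.

```python
import pandas as pd

people = pd.DataFrame({"id": ["001", "002", "003"], "clicked": [1, 0, 1]})
offers = pd.DataFrame({
    "id": ["001", "001", "002", "003", "003", "003"],
    "discount_rate": [0.05, 0.10, 0.03, 0.05, 0.10, 0.15],
    "on_amt": [100, 500, 50, 100, 300, 500],
})

# Assumed bin edges; pd.cut intervals are right-closed, e.g. (0.05, 0.10].
rate_bin = pd.cut(offers["discount_rate"], bins=[0, 0.05, 0.10, 0.15, 1.0],
                  labels=["rate_0-5", "rate_5-10", "rate_10-15", "rate_15+"])
amt_bin = pd.cut(offers["on_amt"], bins=[0, 50, 100, 500, float("inf")],
                 labels=["amt_0-50", "amt_50-100", "amt_100-500", "amt_500+"])

# One categorical "offer type" per offer row, e.g. "rate_5-10_amt_100-500".
offer_type = rate_bin.astype(str) + "_" + amt_bin.astype(str)

# One-hot encode each offer, then sum per id: each column counts how many
# offers of that type the person was shown.
dummies = pd.get_dummies(offer_type, dtype=int)
counts = dummies.groupby(offers["id"]).sum()

# People with no offers at all would get NaN from the merge; treat as zero counts.
model_df = people.merge(counts.reset_index(), on="id", how="left").fillna(0)
print(model_df)
```

Summing gives counts; calling `counts.clip(upper=1)` instead would give the pure 1/0 presence flags described in the question. Only offer types that actually occur get a column, so the matrix stays smaller than the full grid of bin combinations.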

Answered by Brian Spiering on May 2, 2021

Anders Swanson · Answer

Our team uses featuretools' Deep Feature Synthesis for exactly this scenario. This way you can capture much more signal via various aggregations per feature (mean, most_recent, mode, etc.).
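The core idea (applying many aggregation functions to every column of the child table and joining the results back to the parent table) can be imitated in plain pandas. The sketch below is not the featuretools API; it is a hand-rolled stand-in, and the list of aggregations is an assumption. A "most recent" aggregation is left out because the example offer table has no timestamp column to order by.

```python
import pandas as pd

people = pd.DataFrame({"id": ["001", "002", "003"], "clicked": [1, 0, 1]})
offers = pd.DataFrame({
    "id": ["001", "001", "002", "003", "003", "003"],
    "discount_rate": [0.05, 0.10, 0.03, 0.05, 0.10, 0.15],
    "on_amt": [100, 500, 50, 100, 300, 500],
})

def mode_first(s: pd.Series):
    """Most frequent value; ties are broken by taking the smallest value."""
    return s.mode().iloc[0]

# Hypothetical set of aggregations applied to every numeric column.
# Note: "std" is NaN for ids with a single offer (sample std, ddof=1).
AGGS = ["mean", "min", "max", "std", "count", mode_first]
numeric_cols = ["discount_rate", "on_amt"]

agg_df = offers.groupby("id")[numeric_cols].agg(AGGS)

# Flatten the (column, aggregation) MultiIndex into names like "on_amt_max".
agg_df.columns = [f"{col}_{agg}" for col, agg in agg_df.columns]

model_df = people.merge(agg_df.reset_index(), on="id", how="left")
print(model_df)
```

Generating many candidate aggregates this way and letting feature selection or regularization sort out which ones matter is the trade-off the answer describes: more potential signal, at the cost of a wider feature matrix.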
