Data Science Asked by NuValue on January 7, 2021
I have a time series data set with the lifecycle of 9000 different B2B sales leads. What I call lifecycle consists of a dataset with one registry per day for every different sales Lead identifier with 4 predictive variables (DAYS_SINCE_START
, LEAD_ID
, CUSTOMER_INTEREST
, MARKET
, TYPE_SERVICE
) and one response variable (OUTCOME
). The response variable outcome can have 2 different values: Won
(1) or Lost
(0).
A mock example of the data frame would be the following:
As it can be seen, some leads “die” before others, this is because we receive the final outcome of the customer at that day of its lifecycle (we won or lost the lead), so that lead identifier drops from the dataset.
My mission is to create one single model that could be able to define the outcome of a new sales lead that is entering to the data set, and the prediction would be done 30 days after current time (Why 30 days after? The first 30 days after the resources of the company have already been assigned). How would I model this?
This problem can be modeled with a standard binary classification since you labeled data. Time can be considered a feature which can be encoded relative to outcome date. Random forest would be a good algorithm to try.
You need to rearrange the data to be tidy data where each LEAD_ID
is in a single with row with all features for that id in the same row.
Answered by Brian Spiering on January 7, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP