Data Science Asked by Yatoom on September 16, 2020
I was wondering what the best way is to make a model for predicting credit applications.
I have two tables, which look like this:
   client_nr  yearmonth  total_nr_trx  nr_debit_trx  volume_debit_trx  ... etc.
1          1     201201            94            49           6527529
2          1     201202            85            58           3475518
3          1     201203            94            61          31317405
4          1     201204            85            52          18869967
5          1     201205            93            53           2893105
   client_nr  yearmonth  credit_application  nr_credit_applications
1          1     201201                   0                       0
2          1     201202                   0                       0
3          1     201203                   0                       0
4          1     201204                   1                       1
5          1     201205                   0                       1
The goal is to determine which clients are likely to apply for credit. So far, I have built a sequence of shape (months, features) for each client, using the first table.
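My questions:
- Would it now be a good idea to create train/test folds based on the client_nr? Or should I make splits by month?
- Should I then select the first $n$ months as features, and create one label from $k$ months after $n$ that indicates whether the client has applied for a credit in those $k$ months? Or is there a better way?
- Would it be better to use regression using nr_credit_applications, or classification on credit_application?
Intuitively I think that the model would need some additional features based on the economic context in order to be more accurate, and that would also be a part where the evolution across time really matters.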
- Would it now be a good idea to create train/test folds based on the client_nr? Or should I make splits by month?
If possible, each instance should be the full time series of one client, so I'd say splitting on the client number is much better. But select the client numbers randomly from the full set of client ids, because the client numbers may have been assigned in a particular order over time.
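A minimal sketch of such a client-level random split, in plain Python. The function name `split_clients`, the `test_fraction` and `seed` parameters, and the sample rows for clients 2 to 5 are assumptions made for illustration; only the first two rows of client 1 come from the question.

```python
import random

def split_clients(client_ids, test_fraction=0.2, seed=0):
    """Randomly assign whole clients to either the train or the test set."""
    unique_ids = sorted(set(client_ids))
    rng = random.Random(seed)
    # Shuffle so that any time ordering in how ids were assigned is broken.
    rng.shuffle(unique_ids)
    n_test = max(1, round(len(unique_ids) * test_fraction))
    test_ids = set(unique_ids[:n_test])
    train_ids = set(unique_ids[n_test:])
    return train_ids, test_ids

# Hypothetical rows: (client_nr, yearmonth, total_nr_trx)
rows = [
    (1, 201201, 94), (1, 201202, 85),
    (2, 201201, 40), (2, 201202, 42),
    (3, 201201, 12), (3, 201202, 15),
    (4, 201201, 77), (4, 201202, 70),
    (5, 201201, 33), (5, 201202, 31),
]

train_ids, test_ids = split_clients([r[0] for r in rows])

# Every month of a given client lands on the same side of the split.
train_rows = [r for r in rows if r[0] in train_ids]
test_rows = [r for r in rows if r[0] in test_ids]
```

If scikit-learn is available, `GroupShuffleSplit` or `GroupKFold` with `groups=client_nr` should give the same kind of split without hand-written code.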
- Should I then select the first $n$ months as features, and create one label from $k$ months after $n$ that indicates whether the client has applied for a credit in those $k$ months? Or is there a better way?
That depends on the exact goal and on the kind of algorithm being used, but as far as I know, with a time series the label is usually predicted at each time step from the past and current features and the past labels (or past predicted labels).
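One way to read this is as a sliding window: instead of a single (first $n$ months, next $k$ months) pair per client, every position in the series yields one training sample. The sketch below is only an illustration under that assumption; the helper name `make_windows` and the choice of two feature columns are hypothetical, and the values are those of client 1 in the question.

```python
def make_windows(features, applied, n, k):
    """Build (window, label) pairs from one client's monthly series.

    features: list of per-month feature lists, oldest month first
    applied:  list of 0/1 flags (credit_application), same length
    n:        number of past months used as input
    k:        number of following months covered by the label
    """
    samples = []
    for end in range(n, len(features) - k + 1):
        window = features[end - n:end]          # the n months before `end`
        label = int(any(applied[end:end + k]))  # applied in the next k months?
        samples.append((window, label))
    return samples

# Client 1: [total_nr_trx, nr_debit_trx] for 201201 .. 201205
features = [[94, 49], [85, 58], [94, 61], [85, 52], [93, 53]]
applied = [0, 0, 0, 1, 0]

samples = make_windows(features, applied, n=2, k=1)
labels = [label for _, label in samples]  # [0, 1, 0]
```

Past labels can be appended to each month's feature list in the same way if the model is supposed to see earlier applications.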
- Would it be better to use regression using nr_credit_applications, or classification on credit_application?
This is more a matter of convenience for your application or for the algorithm being used, as the accuracy should be very similar (assuming you use the same kind of method of course).
Answered by Erwan on September 16, 2020