Cross Validated Asked by 1111ktq on December 27, 2021
I have a question about the sample size needed for a probability-of-default model.
For each consumer, I have a binary indicator of whether the client defaults (1 = default, 0 = current), and I have each consumer's occupation (e.g., doctor, student). So I can calculate the probability of default for each occupation (#default clients / #total clients) and then continue with my analysis.
The problem is that each occupation group has a different size (e.g., 1,000 clients are students, but only 50 are doctors). How can I say that a group is statistically large enough to estimate the probability of default for that occupation?
For example, only 1 out of 50 doctors in my database defaulted. Does this 2% correctly reflect the default behavior of that occupation? If the group is too small, I don't want to include it in my future analysis. What is the minimum group size at which I can be confident in the estimate?
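One way to quantify how far the observed 2% can be trusted is a confidence interval for a binomial proportion; the Wilson score interval behaves reasonably even for small counts and rare events. A minimal sketch, using the 1-default-out-of-50-doctors numbers from the question and z = 1.96 for a 95% interval:

```python
import math

def wilson_interval(k, n, z=1.96):
    """95% Wilson score interval for a binomial proportion k/n."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half, center + half

# 1 default observed among 50 doctors
low, high = wilson_interval(1, 50)
print(f"{low:.4f} to {high:.4f}")
```

For 1 default in 50, the interval spans roughly 0.4% to 10.5%, so the data are consistent with a true default rate anywhere in that range. This suggests that, rather than a single fixed minimum group size, the practical question is whether the interval for each group is narrow enough for the intended use.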
Much Appreciated!