Data Science Asked on June 8, 2021
I have read this article about developing a credit scorecard in Python, which states that, when binning continuous variables, the following should be ensured:
1. Each bin should have at least 5% of the observations
2. Each bin should contain at least one good and one bad loan
3. The WoE (weight of evidence) should be distinct for each category. Similar groups should be aggregated or binned together, because bins with similar WoE have almost the same proportion of good and bad loans and therefore the same predictive power
4. The WoE should be monotonic, i.e., either increasing or decreasing across the bins
5. Missing values are binned separately
This seems like a lot of work to do manually: each column needs to be divided into bins, each of these five conditions needs to be checked, and the bins then have to be adjusted and the conditions checked again. Is there a faster way to do it? Or is there any algorithm/function that bins continuous variables in the most practical way?
"Or is there any algorithm/function that bins continuous variables in the most practical way?"
Sure there is, but that's the wrong question: the standard method for discretizing a continuous variable is simply to split its range into equal-width intervals, that's it. Of course, this doesn't guarantee any of the five conditions, since they are additional constraints that depend almost exclusively on expert knowledge and the specifics of the data.
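To make the "equal intervals" idea concrete, here is a minimal sketch in plain Python, assuming that equal-width binning is what is meant. The function name `equal_width_bins` and the sample `ages` are made up for illustration; libraries such as pandas offer the same thing out of the box.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width intervals (0-based index)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is only one meaningful bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical data: ten applicant ages split into four bins.
ages = [21, 25, 33, 38, 41, 47, 52, 58, 64, 70]
print(equal_width_bins(ages, 4))
```

Nothing here looks at the target (good/bad loan), which is exactly why none of the five conditions is guaranteed by this step alone.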
Note that checking these conditions can certainly be automated; there's no need for manual verification. There might be some domain-specific packages that do this for you, but there's no reason a standard ML/statistics library would provide methods for every specific problem like this one.
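As a rough illustration of what such automation could look like, the sketch below checks conditions 1 to 4 for a given binning. Everything in it is an assumption made for the example: the function names `woe_table` and `check_binning`, the WoE convention ln(share of goods / share of bads) (some sources use the inverse), and the `min_woe_gap` threshold used to decide whether two WoE values are "distinct". Bin labels are assumed to be sortable in their natural order, and the separate bin for missing values (condition 5) is assumed to be left out before calling these functions, since it has no place in the ordering.

```python
import math


def woe_table(bins, is_bad):
    """Per-bin counts and WoE, assumed here to be ln(share of goods / share of bads)."""
    total_good = sum(1 for b in is_bad if not b)
    total_bad = sum(1 for b in is_bad if b)
    table = []
    for label in sorted(set(bins)):
        good = sum(1 for x, b in zip(bins, is_bad) if x == label and not b)
        bad = sum(1 for x, b in zip(bins, is_bad) if x == label and b)
        woe = None  # undefined when a bin has no goods or no bads
        if good > 0 and bad > 0:
            woe = math.log((good / total_good) / (bad / total_bad))
        table.append({"bin": label, "n": good + bad, "good": good, "bad": bad, "woe": woe})
    return table


def check_binning(table, min_share=0.05, min_woe_gap=0.1):
    """Return one True/False flag per condition (1 to 4) for an ordered WoE table."""
    n_total = sum(row["n"] for row in table)
    woes = [row["woe"] for row in table]
    defined = all(w is not None for w in woes)
    # Differences between neighbouring bins; only meaningful if every WoE exists.
    diffs = [b - a for a, b in zip(woes, woes[1:])] if defined else []
    return {
        "min_share": all(row["n"] / n_total >= min_share for row in table),
        "non_zero": defined,
        "distinct_woe": defined and all(abs(d) >= min_woe_gap for d in diffs),
        "monotonic": defined and (all(d > 0 for d in diffs) or all(d < 0 for d in diffs)),
    }


# Hypothetical data: three bins of ten loans with 5, 3 and 1 bad loans.
bins = [0] * 10 + [1] * 10 + [2] * 10
is_bad = [1] * 5 + [0] * 5 + [1] * 3 + [0] * 7 + [1] * 1 + [0] * 9
print(check_binning(woe_table(bins, is_bad)))
```

A loop that merges neighbouring bins whenever one of these flags is False and then re-checks would automate the adjustment step as well, though which bins to merge is still a judgment call that depends on the data.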
Answered by Erwan on June 8, 2021