Cross Validated Asked on December 21, 2021
I am trying to understand the basics of how and when it is OK to discretize a variable.
Below are some papers that support supervised discretization:
Improving Classification Performance with Discretization on Biomedical Datasets
Feature selection via discretization
On the other hand, there is
Frank Harrell's page on problems caused by discretization
and a lot of other posts that discourage binning:
What is the benefit of breaking up a continuous predictor variable?
Is binning of continuous data always bad for statistical tests?
So is it correct that choosing the bins with the target class taken into account (supervised discretization) can help feature selection for classification, whereas arbitrary binning of continuous values does not?
There is also an argument that tree models already do implicit binning, and that by pre-binning we withhold information from the model:
Does binning of ranges make sense for a Random Forest?
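To make the "implicit binning" argument concrete, here is a minimal sketch of what I understand a single tree split to do: it searches the observed values for the cut point that minimizes impurity. The toy data, the helper names (`gini`, `best_split`, `equal_width_bin`) and the bin width of 2.5 are my own assumptions for illustration, not taken from any of the linked posts or papers.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(x, y):
    """Return (threshold, weighted Gini) of the best single split on x."""
    values = sorted(set(x))
    best = (None, gini(y))  # baseline: no split at all
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate cut: midpoint between neighbouring values
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best[1]:
            best = (t, score)
    return best

def equal_width_bin(x, width):
    """Arbitrary (unsupervised) binning: map each value to its bin index."""
    return [int(v // width) for v in x]

# Toy data (assumed): the class flips at x = 3.7
x = [1.0, 2.0, 3.0, 3.5, 3.9, 4.5, 5.0, 6.0, 7.0, 8.0]
y = [0 if v < 3.7 else 1 for v in x]

# Split search on the raw feature: the cut can land anywhere between values
raw_threshold, raw_score = best_split(x, y)

# Split search after pre-binning: cuts are restricted to the bin boundaries
binned = equal_width_bin(x, width=2.5)
bin_threshold, bin_score = best_split(binned, y)

print("raw:   ", raw_threshold, raw_score)
print("binned:", bin_threshold, bin_score)
```

On this toy data the split on the raw feature separates the classes perfectly, while the split on the pre-binned feature cannot, because the true cut point falls inside a bin. That is only one constructed case, so it illustrates the argument rather than proving that pre-binning always hurts.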
I would appreciate any clarification.