Does Discretization improve Classifier Performance?

Cross Validated Asked on December 21, 2021

I am trying to understand the basics of how and when it is OK to discretize a variable.

Below are some papers that support Supervised Discretization:

Improving Classification Performance with Discretization on Biomedical Datasets

Feature selection via discretization

On the other hand, there are:

Visual Revelations

Frank Harrell’s page on problems caused by discretization

and a lot of other posts that discourage binning:

What is the benefit of breaking up a continuous predictor variable?

Is binning of continuous data always bad for statistical tests?

So, is it correct that choosing the bins using the target class (supervised discretization) helps feature selection in classification, whereas arbitrary binning of continuous values does not?
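To make the distinction concrete, here is a minimal sketch of what I mean by the two kinds of binning. It is not the method of any of the linked papers: it picks a single cut point by minimising the weighted class entropy (in the spirit of entropy-based discretizers) and compares it with an equal-width cut that ignores the target. The toy data and the helper names (`entropy`, `supervised_cut`, `equal_width_cut`, `weighted_entropy_at`) are made up for illustration.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / n
        result -= p * math.log2(p)
    return result

def weighted_entropy_at(x, y, cut):
    """Size-weighted class entropy of the two bins produced by `cut`."""
    left = [yi for xi, yi in zip(x, y) if xi <= cut]
    right = [yi for xi, yi in zip(x, y) if xi > cut]
    n = len(y)
    return (len(left) * entropy(left) + len(right) * entropy(right)) / n

def supervised_cut(x, y):
    """Cut point (midpoint between neighbours) with the lowest weighted entropy."""
    xs = sorted(set(x))
    best_cut, best_score = None, float("inf")
    for lo, hi in zip(xs, xs[1:]):
        cut = (lo + hi) / 2
        score = weighted_entropy_at(x, y, cut)
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score

def equal_width_cut(x):
    """Unsupervised two-bin cut: the midpoint of the observed range."""
    return (min(x) + max(x)) / 2

# Hypothetical toy data: the class flips between x = 3 and x = 4,
# and one large value (20) stretches the range.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

cut_sup, score_sup = supervised_cut(x, y)
cut_eq = equal_width_cut(x)
score_eq = weighted_entropy_at(x, y, cut_eq)

print("supervised cut:", cut_sup, "weighted entropy:", score_sup)
print("equal-width cut:", cut_eq, "weighted entropy:", score_eq)
```

On this toy data the supervised cut lands on the class boundary, while the equal-width cut is pulled toward the large value and leaves the classes mixed in one bin. This only illustrates the mechanism on training data; it says nothing about out-of-sample performance.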

There is also an argument that tree models already bin implicitly through their split thresholds, and that by pre-binning we withhold information from the model.

Does binning of ranges make sense for a Random Forest?
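As I understand the implicit-binning argument, it can be sketched with a one-split "stump": on the raw values it may place its threshold between any two neighbouring observations, but on pre-binned values only the bin edges remain available. The sketch below is a simplified stand-in for a single tree node (binary labels, majority vote on each side), not a Random Forest, and the data and function names are hypothetical.

```python
def side_errors(labels):
    """Training errors when predicting the majority class for binary 0/1 labels."""
    ones = sum(labels)
    return min(ones, len(labels) - ones)

def best_stump_errors(values, labels):
    """Fewest training errors achievable with at most one threshold split."""
    best = side_errors(labels)  # baseline: no split at all
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        best = min(best, side_errors(left) + side_errors(right))
    return best

def equal_width_bin(values, n_bins):
    """Replace each value by the index of its equal-width bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical toy data: the class flips between x = 3 and x = 4.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

raw_errors = best_stump_errors(x, y)

binned = equal_width_bin(x, 2)
binned_errors = best_stump_errors(binned, y)

print("errors on raw values:   ", raw_errors)
print("errors on pre-binned x: ", binned_errors)
```

Here the pre-binned feature collapses the first nine values into one bin, so the threshold between 3 and 4 is no longer reachable. With finer or target-aware bins the gap would shrink, which is part of what I am asking about.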

Appreciate any clarification.

binning classification random forest
