Can setting of different thresholds help in model performance in case of handling class imbalances?

Data Science Asked on September 5, 2021

In a binary classification problem with class imbalance, after applying undersampling/oversampling or SMOTE techniques, is it still conventional to use a 0.5 threshold if we make the two classes completely balanced? Or should we still change the threshold based on what we are trying to optimize?

2 Answers

There is no hard rule that the ratio of good vs. bad must be 50-50. It depends on your scenario: for example, if you have 70% goods and 30% bads, this ratio is decent and your model should be able to learn the patterns in the data well. If you have only 5-10% bads and want to improve model performance, then oversampling/undersampling is needed, and ratios like 60-40, 70-30, 65-35, or 55-45 are reasonable.
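As a rough illustration, here is a minimal sketch of resampling to one of these intermediate ratios rather than a strict 50-50, assuming the imbalanced-learn package is available and using a toy dataset in place of real data:

```python
import numpy as np
from sklearn.datasets import make_classification
from imblearn.over_sampling import RandomOverSampler

# Toy imbalanced dataset: roughly 90% "good" (class 0) vs 10% "bad" (class 1)
X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=42)

# sampling_strategy is the desired minority/majority ratio after resampling;
# 2/3 gives roughly a 60-40 split instead of forcing a perfect 50-50.
ros = RandomOverSampler(sampling_strategy=2/3, random_state=42)
X_res, y_res = ros.fit_resample(X, y)

print(np.bincount(y), np.bincount(y_res))  # class counts before and after
```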

If you are concerned about the distribution of each variable after oversampling, take a look at this post too.

Answered by Deepak on September 5, 2021

If changing the threshold improves the performance of the model, it is better to change it and use the optimal value. Balancing the dataset does not mean you cannot change the discrimination threshold. Even in cases where the data is originally balanced, changing the threshold can be very useful and is a smart move.
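As a sketch of what tuning the threshold might look like, assuming a scikit-learn classifier with predict_proba, a held-out validation set, and F1 as the metric being optimized (all of these are illustrative choices, not part of the original answer):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Toy imbalanced dataset and a simple baseline classifier
X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = clf.predict_proba(X_val)[:, 1]

# Sweep candidate thresholds on the validation set and keep the best one
thresholds = np.linspace(0.05, 0.95, 91)
scores = [f1_score(y_val, (probs >= t).astype(int)) for t in thresholds]
best_t = thresholds[int(np.argmax(scores))]
print(f"best threshold: {best_t:.2f}, F1: {max(scores):.3f}")
```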

Answered by nimar on September 5, 2021
