TransWikia.com

Is there an appropriate use of adjusting class weights for a balanced dataset?

Data Science Asked by GeeKandaa on April 14, 2021

I ask this because I am currently working with a CNN model built for the diagnosis of pneumonia. Originally, I followed a notebook on Kaggle to build the model and thereby learn what each part of the code does.
The dataset used was rather imbalanced, with a far greater number of pneumonia cases than normal (healthy) ones. Hence the model.fit class_weight parameter was set to {0:6.0, 1:0.5}.
(0 being normal, 1 being pneumonia)
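
For reference, weights of this shape are what the common "balanced" heuristic produces: each class gets n_samples / (n_classes * count_of_that_class), the formula scikit-learn documents for class_weight='balanced'. The sketch below is pure Python, and the class counts are hypothetical, chosen only for illustration; they are not the counts of the dataset in the question.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Return {class: n_samples / (n_classes * class_count)} for a list of labels."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in sorted(counts.items())}

# Hypothetical counts (assumption): 0 = normal, 1 = pneumonia.
imbalanced_labels = [0] * 100 + [1] * 1200
balanced_labels = [0] * 500 + [1] * 500

imbalanced_weights = balanced_class_weights(imbalanced_labels)
balanced_weights = balanced_class_weights(balanced_labels)

print(imbalanced_weights)  # the rare class gets the larger weight
print(balanced_weights)    # equal counts give a weight of 1.0 for every class
```

Under this heuristic, an exactly balanced dataset yields a weight of 1.0 per class, which is equivalent to passing no class weights at all.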

Since then, while working on the model and making adjustments, I acquired additional data, so the dataset is now fairly balanced. In fact, I ensure that the data loaded into the model is exactly balanced: the dataframes are built to contain an equal number of pneumonia and normal cases in the training, validation, and testing sets.

Accordingly, I am now trying to remove the class_weight parameter since, as far as I understand it, it is no longer necessary and may bias the results. However, once I do so, the model's accuracy no longer improves: it stalls at 0.5 indefinitely, whereas with the weights applied I achieve 0.90+ accuracy.

Simply put, is there some reason for this? The code is quite long, and I'm happy to post it if required, but I suspect this is due more to my lack of understanding than to an error in the code (which has otherwise been working as expected). Thanks in advance.

EDIT: For the sake of clarity, I performed a grid search over candidate class weight values. It confirmed 0:~4.0, 1:0.4 as an appropriate choice, but also suggests 0:1, 1:5.0.

EDIT 2: For further clarity, here is a link to a GitHub repository containing the model code, output files, etc.: https://github.com/GeeKandaa/ML-Code

One Answer

Class weights can be important even for balanced data if, for example, one class is more significant than the others, so that the loss with respect to this class should count more.
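
To make this concrete, here is a minimal sketch of a class-weighted binary cross-entropy in pure Python. It is not the implementation of any particular framework: the function name, the example probabilities, and the choice to average over the batch size are all assumptions made for illustration, and real libraries may normalise differently.

```python
import math

def weighted_bce(y_true, y_prob, class_weight, eps=1e-7):
    """Mean binary cross-entropy with each sample scaled by its class weight."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += class_weight[y] * loss
    return total / len(y_true)

# A perfectly balanced toy batch: 0 = normal, 1 = pneumonia.
y_true = [0, 0, 1, 1]
miss_normal = [0.9, 0.1, 0.9, 0.9]     # one normal case scored as pneumonia
miss_pneumonia = [0.1, 0.1, 0.1, 0.9]  # one pneumonia case scored as normal

uniform = {0: 1.0, 1: 1.0}
favour_pneumonia = {0: 1.0, 1: 5.0}    # hypothetical weights

for name, weights in [("uniform", uniform), ("favour_pneumonia", favour_pneumonia)]:
    print(name,
          weighted_bce(y_true, miss_normal, weights),
          weighted_bce(y_true, miss_pneumonia, weights))
```

With uniform weights the two mistakes cost the same. With the larger weight on class 1, missing a pneumonia case is penalised more heavily even though the batch is balanced.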

One can even think of class weights as additional hyperparameters with their own effect on the outcome (either positive or negative) and tune them as such, without needing to attach any interpretation to their values.
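
As a sketch of that idea, the snippet below tunes the two class weights by grid search, scoring each pair on a held-out validation set. It uses a toy one-dimensional logistic regression in pure Python rather than the CNN from the question; the data, the grid values, and all function names are made up for illustration.

```python
import math
from itertools import product

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(xs, ys, class_weight, epochs=200, lr=0.1):
    """Fit p(y=1|x) = sigmoid(w*x + b) by gradient descent on a class-weighted log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of the log loss w.r.t. the logit
            scale = class_weight[y]       # the class weight scales this sample's gradient
            grad_w += scale * err * x
            grad_b += scale * err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def accuracy(xs, ys, w, b):
    correct = sum((sigmoid(w * x + b) >= 0.5) == (y == 1) for x, y in zip(xs, ys))
    return correct / len(xs)

# Made-up, balanced, overlapping 1-D data (assumption): 0 = normal, 1 = pneumonia.
train_x = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
train_y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
val_x = [-1.8, -1.2, -0.6, 0.2, -0.2, 0.6, 1.2, 1.8]
val_y = [0, 0, 0, 0, 1, 1, 1, 1]

# Treat the two class weights like any other hyperparameters.
grid = [0.5, 1.0, 2.0, 4.0]
results = []
for w0, w1 in product(grid, grid):
    w, b = train_logistic(train_x, train_y, {0: w0, 1: w1})
    results.append((w0, w1, accuracy(val_x, val_y, w, b)))

best = max(results, key=lambda r: r[2])
print("best (weight_0, weight_1, val_accuracy):", best)
```

Scoring on validation data rather than training data matters here, since the weights are being selected like any other tuned setting.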

Related: How does class_weight work in Decision Tree?

Correct answer by Nikos M. on April 14, 2021
