TransWikia.com

Unbalanced data set - how to optimize hyperparams via grid search?

Data Science Asked by Code Now on July 31, 2021

I would like to optimize the hyperparameters C and gamma of an SVC using grid search on an unbalanced data set. So far I have used class_weight='balanced' and selected the best hyperparameters based on the average of the F1 scores.
However, the data set is very unbalanced, i.e. if I choose GridSearchCV with cv=10, some minority classes are not represented in the validation data.
I'm thinking of using SMOTE, but the problem I see is that I would have to set k_neighbors=1, because some minority classes have only 1-2 samples.
Does anyone have a tip on how to optimize the hyperparameters in this case? Are there any alternatives?

Many thanks for every hint

One Answer

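When the estimator is a classifier and cv is an integer, scikit-learn's GridSearchCV uses StratifiedKFold, so the folds preserve the class proportions as closely as possible. A class can only appear in every validation fold if it has at least as many samples as there are folds, so with classes of 1-2 samples cv=10 is too many; scikit-learn warns when the least populated class has fewer members than n_splits. With a smaller number of folds, GridSearchCV can be used for the hyperparameter search.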
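As a minimal sketch of the stratified grid search described above: the data here is synthetic and purely hypothetical (three classes with 60, 12 and 4 samples), and the parameter grid is only a placeholder. The number of folds is capped at the size of the rarest class, and macro-averaged F1 is used as the score, matching the averaging mentioned in the question.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Hypothetical, synthetic data: three classes with 60, 12 and 4 samples.
rng = np.random.default_rng(0)
counts = {0: 60, 1: 12, 2: 4}
X = np.vstack([
    rng.normal(loc=3.0 * label, scale=1.0, size=(n, 2))
    for label, n in counts.items()
])
y = np.concatenate([np.full(n, label) for label, n in counts.items()])

# Cap the number of folds at the size of the rarest class so that
# every validation fold can hold at least one sample of each class.
n_splits = min(10, int(np.bincount(y).min()))
cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)

# Placeholder grid; real ranges depend on the data.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(
    SVC(kernel="rbf", class_weight="balanced"),
    param_grid,
    scoring="f1_macro",
    cv=cv,
)
search.fit(X, y)

# Sanity check: does every validation fold contain all classes?
folds_complete = all(
    len(np.unique(y[test_idx])) == len(counts)
    for _, test_idx in cv.split(X, y)
)
print(n_splits, folds_complete, search.best_params_)
```

If a class has only one sample, no split can place it in both the training and the validation data, so such classes would have to be merged, dropped, or evaluated separately.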

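Imbalanced-learn's SMOTE can also be used, but it does not reduce k on its own: k_neighbors has to be smaller than the number of samples in the smallest class being oversampled, otherwise the nearest-neighbor search fails with an error. A class with a single sample cannot be oversampled with SMOTE at all.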

Answered by Brian Spiering on July 31, 2021
