TransWikia.com

How to achieve better accuracy of 90+ on a 3 class highly skewed dataset?

Data Science Asked by user3243499 on June 4, 2021

I have a 3-class dataset with highly imbalanced classes:

class 1: 75000
class 2: 27000
class 3: 3000

With simple learning algorithms, accuracy is 84.6%, but as expected the model mostly predicts class 1, a few instances of class 2, and no class 3.

Techniques like oversampling, SMOTE, undersampling, XGBoost, and AdaBoost showed some increase in F1 score, but the overall accuracy either stays at ~84% or drops.

Is there any promising technique that I can explore for improving accuracy to at least 90%? I am not concerned with improving the accuracy of class 3 alone, but with the accuracy of the overall classifier. Thanks.

One Answer

First, be careful: looking only at accuracy in a multiclass problem can be misleading. With about 71% of the data in the majority class (75,000 of 105,000 instances), a dummy model which always predicts the majority class achieves about 71% accuracy. Measuring performance with macro F1-score would be more informative (micro F1-score equals accuracy when each instance has exactly one label, so it adds nothing here).
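To make the baseline concrete, here is a minimal sketch that computes the accuracy and macro F1 of an always-predict-the-majority model directly from the class counts given in the question. The function name `majority_baseline_scores` is made up for illustration, and F1 for a class that is never predicted is taken as 0, which is a common convention but an assumption here.

```python
# Class counts taken from the question.
counts = {1: 75000, 2: 27000, 3: 3000}


def majority_baseline_scores(counts):
    """Accuracy and macro F1 of a model that always predicts the largest class.

    Hypothetical helper for illustration; F1 of a never-predicted class is
    assumed to be 0.
    """
    total = sum(counts.values())
    majority = max(counts, key=counts.get)
    accuracy = counts[majority] / total

    f1_per_class = {}
    for label, n in counts.items():
        if label == majority:
            # Everything is predicted as the majority class:
            # TP = n, FP = total - n, FN = 0.
            precision = n / total
            recall = 1.0
            f1_per_class[label] = 2 * precision * recall / (precision + recall)
        else:
            # Never predicted, so TP = 0 and F1 is taken as 0.
            f1_per_class[label] = 0.0

    macro_f1 = sum(f1_per_class.values()) / len(f1_per_class)
    return accuracy, macro_f1


accuracy, macro_f1 = majority_baseline_scores(counts)
print(f"accuracy={accuracy:.3f}, macro_f1={macro_f1:.3f}")
# prints: accuracy=0.714, macro_f1=0.278
```

The gap between the two numbers is the point: the dummy model looks acceptable by accuracy and very poor by macro F1. With scikit-learn, the same comparison can be made using `DummyClassifier(strategy="most_frequent")` together with `f1_score(..., average="macro")`.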

Now about designing your experiments: currently you seem to be trying various methods at random, including sampling techniques and classification algorithms. That can work, but this way you rely entirely on luck to improve performance. In particular, what strikes me is that you don't mention anything about the task or the features (by the way, that's probably the reason why some people downvoted the question). The type and nature of the features, their number, and their relation to the class can be important for understanding why certain methods work and others don't. There might be some feature engineering to do; in particular, feature selection methods sometimes bring great improvement. It would also be useful to get an idea of the performance obtained with simple methods (like decision trees, SVM, logistic regression). Finally, you could investigate in more detail which kinds of cases get misclassified and/or study how stable the model is with respect to varying the number of instances or features.
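As a rough illustration of looking at which cases get misclassified, the sketch below tallies (true, predicted) label pairs and per-class recall in plain Python. The helper names and the toy labels are hypothetical and only show the shape of the output; they are not results from the dataset in the question.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    support = Counter(y_true)
    pairs = confusion_counts(y_true, y_pred)
    # A missing (label, label) pair counts as 0 correct predictions.
    return {label: pairs[(label, label)] / n for label, n in support.items()}


# Hypothetical toy labels, made up purely for illustration.
y_true = [1, 1, 1, 1, 2, 2, 2, 3, 3, 1]
y_pred = [1, 1, 1, 1, 2, 1, 1, 1, 2, 1]

pairs = confusion_counts(y_true, y_pred)
for (true_label, pred_label), n in sorted(pairs.items()):
    print(f"true={true_label} predicted={pred_label} count={n}")

recall = per_class_recall(y_true, y_pred)
print(recall)
```

Reading the off-diagonal pairs (for example, class 3 being predicted as class 1 or 2) shows where the errors concentrate, which is usually more actionable than a single accuracy figure. scikit-learn's `confusion_matrix` and `classification_report` give the same information on real predictions.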

Correct answer by Erwan on June 4, 2021
