TransWikia.com

Non-uniform class occurrences in input data for a classification task - how to tackle it?

Data Science Asked by Mikołaj Wróblewski on December 5, 2020

So, I gathered political articles for my thesis, and now I want to be able to classify a given text. However, the class distribution is extremely skewed:

  • Class 1: 964 docs
  • Class 2: 37,020
  • Class 3: 640
  • Class 4: 2,675
  • Class 5: 793
  • Class 6: 23,160
  • Class 7: 2,665
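To put numbers on the skew, here is a minimal Python sketch (not from the original post) that computes each class's share of the 67,917 documents and an inverse-frequency "balanced" weight, n_samples / (n_classes * count_c). This is the same heuristic scikit-learn uses for `class_weight="balanced"`; the function name `balanced_weights` is illustrative only.

```python
# Class sizes from the question (class label -> number of documents).
COUNTS = {1: 964, 2: 37_020, 3: 640, 4: 2_675, 5: 793, 6: 23_160, 7: 2_665}


def balanced_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * count_c).

    Every class then contributes the same total weight (n_samples / n_classes),
    so rare classes get large weights and frequent classes small ones.
    """
    total = sum(counts.values())
    k = len(counts)
    return {c: total / (k * n) for c, n in counts.items()}


if __name__ == "__main__":
    total = sum(COUNTS.values())
    weights = balanced_weights(COUNTS)
    for c, n in COUNTS.items():
        print(f"class {c}: {n:>6} docs  share={n / total:6.2%}  weight={weights[c]:6.3f}")
```

With these counts, class 2 gets a weight of about 0.26 and class 3 about 15.2, a ratio of roughly 58 (37,020 / 640).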

Such skewed data is obviously going to favor classes 2 and 6. I thought about amplifying the error from the last layer for the classes with fewer observations. Is that worth a shot, or will it actually make the model overfit those classes? Unfortunately, I can't scrape more data: the websites with the articles don't have any more (at least for now). Data augmentation is not an option either.
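One common way to implement "amplifying the error for rare classes" is a class-weighted cross-entropy loss. The sketch below assumes that is what the idea means; it is plain Python (no framework), the helper names are hypothetical, and the batch reduction divides by the sum of the target weights, which is how PyTorch's `CrossEntropyLoss` averages when `weight` is given.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def weighted_cross_entropy(logits, target, class_weights):
    """Cross-entropy for one example, scaled by the weight of its true class.

    logits: raw last-layer scores, one per class (index 0..K-1).
    target: index of the true class.
    class_weights: per-class weights in the same order as logits.
    """
    probs = softmax(logits)
    return -class_weights[target] * math.log(probs[target])


def batch_loss(batch, class_weights):
    """Weighted mean over (logits, target) pairs: sum(w_y * l) / sum(w_y)."""
    num = sum(weighted_cross_entropy(z, y, class_weights) for z, y in batch)
    den = sum(class_weights[y] for _, y in batch)
    return num / den


# Classes 1..7 from the question, stored at indices 0..6.
COUNTS = [964, 37_020, 640, 2_675, 793, 23_160, 2_665]
TOTAL = sum(COUNTS)
WEIGHTS = [TOTAL / (len(COUNTS) * n) for n in COUNTS]

# The same uninformative prediction costs far more when the true class is rare.
uniform = [0.0] * len(COUNTS)
loss_common = weighted_cross_entropy(uniform, 1, WEIGHTS)  # true class 2
loss_rare = weighted_cross_entropy(uniform, 2, WEIGHTS)    # true class 3
print(f"class 2 loss: {loss_common:.3f}, class 3 loss: {loss_rare:.3f}")
```

In practice the framework usually does this for you: PyTorch's `torch.nn.CrossEntropyLoss` takes a `weight` tensor, and Keras' `Model.fit` accepts a `class_weight` dict. Weighting adds no new information about the rare classes, so with weights up to about 15 it is worth checking per-class metrics on a held-out set to see whether the model starts fitting those few hundred documents too closely.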

