TransWikia.com

Text classification of an imbalanced dataset

Data Science Asked on June 7, 2021

[Images: class distribution; sample of the dataset]

I have a dataset of about 500k entries with two columns, ‘product description’ and ‘level 1’. I am training a model on 350k of those entries so that, given the product description of a test entry, it predicts the value of ‘level 1’. A simple linear classifier reaches 85% accuracy, which is too low; I am aiming for at least 97%. I suspect the cause is that the ‘level 1’ values in the training data are imbalanced. How do I resolve this? Can upsampling the minority and downsampling the majority work here?
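The upsampling and downsampling idea raised in the question can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not a tested recipe for this dataset: the toy rows, the label names `Electronics` and `Apparel`, and the helper names `rebalance` and `balanced_class_weights` are all hypothetical. The second helper shows a common alternative to resampling, weighting each class by `n_samples / (n_classes * class_count)`.

```python
import random
from collections import Counter, defaultdict


def rebalance(rows, target_per_class, seed=0):
    """Return a shuffled list with exactly target_per_class rows per label.

    rows: list of (description, label) pairs from the TRAINING split only.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for description, label in rows:
        by_label[label].append((description, label))

    balanced = []
    for label, group in by_label.items():
        if len(group) >= target_per_class:
            # Majority class: downsample without replacement.
            balanced.extend(rng.sample(group, target_per_class))
        else:
            # Minority class: keep every row, then upsample with replacement.
            extra = rng.choices(group, k=target_per_class - len(group))
            balanced.extend(group + extra)
    rng.shuffle(balanced)
    return balanced


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Hypothetical toy rows standing in for ('product description', 'level 1').
train = ([("usb cable 1m", "Electronics")] * 8
         + [("cotton t-shirt", "Apparel")] * 2)

resampled = rebalance(train, target_per_class=5)
resampled_counts = Counter(label for _, label in resampled)
weights = balanced_class_weights([label for _, label in train])

print(resampled_counts)
print(weights)
```

Resampling would normally be applied to the training split only, with the test split left at its original distribution so the reported score reflects real data. Because accuracy can look good on an imbalanced set while rare classes are mostly wrong, a per-class metric such as macro-averaged F1 is usually worth tracking alongside it. Whether either approach closes the gap from 85% to 97% depends on the data and cannot be assumed.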
