Text classification of an imbalanced dataset

Data Science Asked on June 7, 2021

I have a dataset with size ~ 500k entries. There are 2 columns, ‘product description’ and ‘level 1’. I am developing my model such that it learns from a training set of 350k and based on the product description for test data, it gives the values in ‘Level 1’. A simple linear classifier gives an accuracy of 85% which is too low, I am aiming for 97% atleast. I think this might be because the dataset is imbalanced, the level 1 values in the training data are imbalanced. How do I resolve this? Can I make the upsampling minority and downsampling majority work here?

classification python

Add your own answers!

Ask a Question

Get help from others!

Recent Answers

haakon.io on Why fry rice before boiling?
Joshua Engel on Why fry rice before boiling?
Lex on Does Google Analytics track 404 page responses as valid page views?
Jon Church on Why fry rice before boiling?
Peter Machado on Why fry rice before boiling?