TransWikia.com

How to deal with imbalanced text data

Data Science Asked by glowingdoodle on January 30, 2021

I am working on a problem where I have to classify products into multiple classes based on their descriptions. For instance:

“Tresemme shampoo and conditioner – sulfate-free” = Personal Hygiene
“Lavender-scented handwash with moisturizer” = Personal Hygiene
“Doritos Ranch flavor 18 oz mega party pack” = Snacks
“Painting and Craft kit for adults above 18” = Art and Craft

However, my training dataset is highly imbalanced. A few classes have only 10 records, while one class has 3,000. There are 50,000 records overall.

Can anyone suggest any good techniques to deal with the imbalance in text data?

Thanks,
GD

One Answer

I am working on the same problem and found the links below very useful for getting started with oversampling and undersampling:

https://machinelearningmastery.com/random-oversampling-and-undersampling-for-imbalanced-classification/

https://en.wikipedia.org/wiki/Oversampling_and_undersampling_in_data_analysis
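
To give a rough idea of what those links describe, here is a minimal sketch of random oversampling and random undersampling in plain Python. The function names `random_oversample` and `random_undersample` are my own, not from any library, and the toy data just reuses the examples from the question. Treat it as an illustration of the idea rather than a tuned solution.

```python
import random
from collections import Counter, defaultdict


def _group_by_class(texts, labels):
    """Collect the texts belonging to each label."""
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)
    return by_class


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen examples until every class is as large as the biggest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(texts, labels)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_texts.extend(items + extra)
        out_labels.extend([label] * target)
    return out_texts, out_labels


def random_undersample(texts, labels, seed=0):
    """Randomly drop examples until every class is as small as the smallest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(texts, labels)
    target = min(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Sample without replacement to keep only `target` examples.
        out_texts.extend(rng.sample(items, target))
        out_labels.extend([label] * target)
    return out_texts, out_labels


# Toy data taken from the question (assumed labels, one per description).
texts = [
    "Tresemme shampoo and conditioner - sulfate-free",
    "Lavender-scented handwash with moisturizer",
    "Doritos Ranch flavor 18 oz mega party pack",
    "Painting and Craft kit for adults above 18",
]
labels = ["Personal Hygiene", "Personal Hygiene", "Snacks", "Art and Craft"]

over_texts, over_labels = random_oversample(texts, labels)
under_texts, under_labels = random_undersample(texts, labels)
print(Counter(over_labels))
print(Counter(under_labels))
```

Resample only the training split, after you have set aside validation and test data. Otherwise duplicated records can end up on both sides of the split and inflate your scores. With a 10 vs 3,000 gap, undersampling down to the smallest class would throw away most of the data, so oversampling (or a mix of both) is probably the more practical starting point.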

Answered by BlackCurrant on January 30, 2021
