Data Science Asked by glowingdoodle on January 30, 2021
I am working on a problem where I have to classify products into multiple classes based on their descriptions. For instance:
“Tresemme shampoo and conditioner – sulfate-free” = Personal Hygiene
“Lavender-scented handwash with moisturizer” = Personal Hygiene
“Doritos Ranch flavor 18 oz mega party pack” = Snacks
“Painting and Craft kit for adults above 18” = Art and Craft
However, my training dataset is highly imbalanced: a few classes have only 10 records, while one class has 3,000. There are 50,000 records overall.
Can anyone suggest any good techniques to deal with the imbalance in text data?
Thanks,
GD
I am working on the same problem too, and found the link below very useful for getting started with oversampling and undersampling:
https://en.wikipedia.org/wiki/Oversampling_and_undersampling_in_data_analysis
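As a rough illustration of what random oversampling and undersampling look like on labelled text, here is a minimal Python sketch that uses only the standard library. The function name `resample_to_target`, the `target` size, and the toy descriptions and counts are all assumptions made up for this example; they do not come from the question's dataset or from the linked article.

```python
import random
from collections import Counter, defaultdict


def resample_to_target(texts, labels, target, seed=0):
    """Randomly over- or undersample every class to exactly `target` rows.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Group the descriptions by their class label.
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)

    out_texts, out_labels = [], []
    for label, items in by_class.items():
        if len(items) >= target:
            # Undersample: draw `target` rows without replacement.
            chosen = rng.sample(items, target)
        else:
            # Oversample: keep every original row, then add duplicates
            # drawn with replacement until the class reaches `target`.
            chosen = items + rng.choices(items, k=target - len(items))
        out_texts.extend(chosen)
        out_labels.extend([label] * target)
    return out_texts, out_labels


# Toy data standing in for a real training split (made-up counts).
texts = (
    [f"shampoo {i}" for i in range(30)]
    + [f"chips {i}" for i in range(6)]
    + [f"craft kit {i}" for i in range(2)]
)
labels = ["Personal Hygiene"] * 30 + ["Snacks"] * 6 + ["Art and Craft"] * 2

balanced_texts, balanced_labels = resample_to_target(texts, labels, target=10)
print(Counter(balanced_labels))
```

If you try something like this, resample only the training split and leave the validation and test splits untouched, so that duplicated rows cannot appear on both sides of the evaluation. With classes as small as 10 records, plain duplication adds no new information, so it may be worth comparing against a model trained on the original data.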
Answered by BlackCurrant on January 30, 2021