TransWikia.com

What ML techniques work on imbalanced datasets?

Data Science Asked on April 18, 2021

I have some specific questions for which I could not find answers in textbooks or research articles, and I would be grateful for any answers. These are:

  1. Are there ML techniques that can be applied directly to class-imbalanced datasets, or is it standard practice to balance the dataset first, either with a weighting approach or with SMOTE-style oversampling? What is the standard approach for real-world/industrial datasets? I am referring to fraud detection, anomaly detection, and water leak detection, where the dataset is inherently imbalanced.

  2. Let’s say I do class balancing with a weighted loss function. This loss function would be calculated on some amount of data, say 100 examples of streaming data. During the deployment phase, I may not receive 100 examples at a time; there could be more or fewer than were used in training. Suppose the weighting scheme is inverse class frequency, which depends on the number of examples in each class. If the weights were computed on 100 examples during training, do I always need 100 examples at deployment in order to make predictions on them? Or is there no dependency between the number of training examples and the number of examples at deployment when doing class balancing?
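To make the setup in question 2 concrete, here is a minimal pure-Python sketch of inverse class frequency weighting. It rests on several assumptions that are not part of the question: the function names (`inverse_frequency_weights`, `train`, `predict`) are hypothetical, the data is made up, the model is a one-feature logistic regression, and the weight formula `n_samples / (n_classes * class_count)` is one common variant of "inverse class frequency". In this sketch the weights are computed once from the training labels and enter only the training loss; the prediction function takes no weights and accepts a batch of any size.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(xs, ys, weights, lr=0.1, epochs=1000):
    """Fit 1-D logistic regression by gradient descent on the weighted log loss."""
    w, b = 0.0, 0.0
    total_weight = sum(weights[y] for y in ys)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            # Each example's gradient is scaled by the weight of its class.
            err = weights[y] * (sigmoid(w * x + b) - y)
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / total_weight
        b -= lr * grad_b / total_weight
    return w, b


def predict(model, xs):
    """Classify any number of examples; class weights are not used here."""
    w, b = model
    return [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]


# Made-up training set of 100 examples: 95 negatives, 5 positives.
train_x = [0.01 * i for i in range(95)] + [3.0 + 0.1 * i for i in range(5)]
train_y = [0] * 95 + [1] * 5

class_weights = inverse_frequency_weights(train_y)  # computed once, at training time
model = train(train_x, train_y, class_weights)

print(class_weights)
print(predict(model, [0.2]))            # a single example at deployment
print(predict(model, [0.1, 3.2, 0.5]))  # a batch of three examples
```

Under these assumptions, the number 100 only influences the values of the class weights, and through them the fitted parameters `w` and `b`. Nothing in `predict` refers to the training set size.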
