
Should I oversample my validation data to get better F1 score and PRC?

Data Science, asked by Frank Xu on January 22, 2021

I am currently working with an imbalanced dataset of about 30k rows and 14 features, in which 99.5% of the rows are labeled 0. Since the data is so strongly imbalanced, I decided to use precision, recall, and F1 score to judge the performance of the model.

I used SMOTE to oversample my training data (after splitting off the validation set). My model is now trained on the oversampled training data, and I am going to test it on the validation set. If I validate it on the original validation data, I get an F1 score of around 0.05, and the classification report is as follows:

          precision    recall  f1-score   support

 Class 0       1.00      0.86      0.93      7606
 Class 1       0.03      0.75      0.05        36

If I oversample my validation data as well, I get an F1 score of around 0.85:

          precision    recall  f1-score   support

 Class 0       0.84      0.86      0.85      7606
 Class 1       0.86      0.83      0.85      7606
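
For reference, the workflow I am describing looks roughly like this (a minimal sketch assuming imbalanced-learn's SMOTE and scikit-learn; the classifier, split sizes, and the synthetic data stand-in are just placeholders for my real setup):

    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import classification_report, average_precision_score

    # Stand-in for the real data: ~30k rows, 14 features, ~99.5% labeled 0
    X, y = make_classification(n_samples=30000, n_features=14,
                               weights=[0.995], random_state=42)

    # Split first, so the validation set is never touched by SMOTE
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=42)

    # Oversample the minority class in the training split only
    X_train_res, y_train_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

    model = DecisionTreeClassifier(random_state=42)  # placeholder classifier
    model.fit(X_train_res, y_train_res)

    # Evaluate against the original, imbalanced validation distribution
    print(classification_report(y_val, model.predict(X_val)))
    print("PR AUC:", average_precision_score(y_val, model.predict_proba(X_val)[:, 1]))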

My questions are:

  1. Should I use an oversampled validation set? (The results look much better, but I think the underlying model is the same either way.)

  2. Why do I get such bad metrics on the original validation data? Is it because the dataset is not big enough?

One Answer

What you are encountering are real-world problems that are rarely taught in classes.

  • For training, I would try scikit-learn's class_weight="balanced", or explicit weights roughly inverse to the class frequencies, e.g. class_weight={0: 0.005, 1: 0.995}. It's a very robust technique (see the sketch after this list).
  • For testing, you can't fiddle with the class weights. The validation set is meant to simulate real-world data.
  • Make sure you don't overfit. For a decision tree, for example, limit max_leaf_nodes to 5-10 or (not both) max_depth to 3-5.
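
A minimal sketch of that idea with a shallow tree (the estimator and the exact values are placeholders, not a prescription; X_train and y_train stand for your original, non-oversampled training split):

    from sklearn.tree import DecisionTreeClassifier

    # class_weight="balanced" derives the weights from the class frequencies,
    # so the rare class 1 gets a much larger weight and misclassifying it
    # costs correspondingly more during training.
    clf = DecisionTreeClassifier(class_weight="balanced", max_depth=4, random_state=42)

    # The same idea with explicit weights roughly inverse to the class shares:
    # clf = DecisionTreeClassifier(class_weight={0: 0.005, 1: 0.995}, max_depth=4)

    clf.fit(X_train, y_train)  # fit on the original training data, no SMOTE needed here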

Your results aren't that bad. A class-1 precision of 0.03 is low, but you only had 36 positive labels. Roughly speaking, the model flagged about 900 rows as 1 (0.75 * 36 / 0.03 ≈ 900, call it 1,000), i.e. about 97% of the alarms are false. But with a good recall, you caught most of the real 1's.

Putting it differently: in a validation set of around 7,600 rows, about 1,000 were labeled as 1. That's around 13%. Out of those 1,000, roughly 27 are real 1's, which is most of the 36 actual positives (a recall of about 75%).
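
As a rough back-of-the-envelope check, those counts follow directly from the numbers in your report:

    support_1   = 36    # actual 1's in the validation set (from the report)
    recall_1    = 0.75  # recall for class 1
    precision_1 = 0.03  # precision for class 1

    true_positives      = recall_1 * support_1          # ~27 real 1's that were caught
    predicted_positives = true_positives / precision_1  # ~900 rows flagged as 1
    print(round(true_positives), round(predicted_positives))
    # and 1 - precision_1, i.e. ~97% of the flagged rows, are false alarms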

Imagine this was churn prediction. You don't want to lose customers. Without ML, you might send all 8,000 customers a 10% discount. That's expensive. With ML, you would only send a discount to about 1,000 customers and still reach most of those who are about to leave. That's a strong improvement over having no model.

The same argument applies to many other cases such as Predictive Maintenance.

Answered by FrancoSwiss on January 22, 2021
