Cross Validated Asked on November 12, 2021
I have a highly imbalanced dataset (0.21 percent positives, rest negatives) for which I am trying to build a classifier.
I tried to improve the F1 score using hyperparameter tuning, but in every iteration I got either good recall or good precision, never both. One always came at the cost of the other.
Is there a way to use these two models to improve the F1 score and reduce the number of false positives produced by the model with good recall but bad precision?
I would start by looking at the formulas:
$\text{recall} = \frac{TP}{TP+FN}$
$\text{precision} = \frac{TP}{TP+FP}$
$F_1 = 2 \, \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$
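To make the formulas concrete, here is a minimal sketch that evaluates them for two made-up confusion-matrix outcomes. The function name `precision_recall_f1` and all counts are hypothetical, chosen only for illustration: one "strict" model with few false positives and one "lenient" model with few false negatives.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, not taken from the question's data.
strict = precision_recall_f1(tp=40, fp=10, fn=60)    # high precision, low recall
lenient = precision_recall_f1(tp=80, fp=120, fn=20)  # low precision, high recall

for name, (p, r, f1) in [("strict", strict), ("lenient", lenient)]:
    print(f"{name}: precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

With these particular counts the two models swap precision and recall and end up with the same F1, which is the situation described in the question.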
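From these formulas you can see that precision and recall share the numerator TP but are penalized by different errors: recall by false negatives, precision by false positives. They are not strictly inversely proportional, but for a given model they usually trade off: moving the decision threshold to catch more positives (fewer FN) tends to let in more false positives, and vice versa. One option is to adjust your threshold and analyze the F1 score at each setting. If you are working in Python, try looking into the `classification_report` function from `sklearn.metrics`, which yields a very useful table of precision, recall and F1 per class for these cases.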
Try reducing your false negatives (by lowering your threshold) to increase recall. This inherently comes with a precision cost, so F1 improves only if the gain in recall outweighs the loss in precision. How much precision can be sacrificed? That depends on you and the context of your problem.
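The threshold adjustment described above can be sketched as a simple sweep. This is an illustration under stated assumptions, not the asker's setup: the labels and scores are synthetic (about 0.21 percent positives, Gaussian scores), and the helper `metrics_at_threshold` is a hypothetical name. It uses only the standard library; with real model outputs, `precision_recall_curve` from `sklearn.metrics` produces the same kind of curve.

```python
import random


def metrics_at_threshold(y_true, scores, threshold):
    """Return (precision, recall, f1) when predicting positive for score >= threshold."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Synthetic data (an assumption for illustration): ~0.21% positives,
# positives score around 0.7 and negatives around 0.3, clipped to [0, 1].
rng = random.Random(0)
y_true = [1 if rng.random() < 0.0021 else 0 for _ in range(20000)]
scores = [min(1.0, max(0.0, rng.gauss(0.7 if y else 0.3, 0.15))) for y in y_true]

# Evaluate every threshold from 0.01 to 0.99 and keep the one with the best F1.
thresholds = [i / 100 for i in range(1, 100)]
results = [(t, *metrics_at_threshold(y_true, scores, t)) for t in thresholds]
best_t, best_p, best_r, best_f1 = max(results, key=lambda row: row[3])

print(f"best threshold={best_t:.2f} precision={best_p:.3f} "
      f"recall={best_r:.3f} F1={best_f1:.3f}")
```

Recall can only fall as the threshold rises, while precision usually rises, so the F1-optimal threshold sits somewhere in between and is often not the default 0.5. To avoid an optimistic estimate, choose the threshold on a validation set rather than on the test set.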
Answered by PLanderos33 on November 12, 2021