TransWikia.com

Accuracy is lower than f1-score for imbalanced data

Data Science Asked by ds_newbie on September 29, 2021

For a binary classification task, I have a dataset with 55% negative labels and 45% positive labels.

The results of the classifier show that the accuracy is lower than the F1 score.
Does that mean that the model is learning the negative instances much better than the positive ones?

Does that even make sense, to have accuracy less than the f1-score?

2 Answers

I'll try to answer this with a couple of examples:

Say we have 100 instances (55 negative, 45 positive). Let's say we predict 1/45 positives and 55/55 negatives correctly. Then our accuracy is 0.56 but our F1 score is 0.0435.

Now suppose we predict everything as positive: we get an accuracy of 0.45 and an F1 score of 0.6207.

Therefore, accuracy does not have to be greater than F1 score.
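The two toy examples above can be checked directly from the confusion-matrix counts. This is a minimal sketch with no external libraries; the helper names `accuracy` and `f1` are my own:

```python
# Verify the two toy examples above by computing accuracy and F1
# directly from confusion-matrix counts.

def accuracy(tp, tn, fp, fn):
    # fraction of all instances classified correctly
    return (tp + tn) / (tp + tn + fp + fn)

def f1(tp, fp, fn):
    # harmonic mean of precision and recall, in confusion-count form
    return 2 * tp / (2 * tp + fp + fn)

# Example 1: 1/45 positives and 55/55 negatives predicted correctly.
# TP=1, FN=44, TN=55, FP=0
print(accuracy(1, 55, 0, 44))        # 0.56
print(round(f1(1, 0, 44), 4))        # 0.0435

# Example 2: everything predicted positive.
# TP=45, FN=0, TN=0, FP=55
print(accuracy(45, 0, 55, 0))        # 0.45
print(round(f1(45, 55, 0), 4))       # 0.6207
```

Accuracy comes out above F1 in the first case and below it in the second, matching the numbers quoted above.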

Because the F1 score is the harmonic mean of precision and recall, intuition can be somewhat difficult. I think it is much easier to grasp the equivalent Dice coefficient.

As a side note, the F1 score is inherently skewed because it does not account for true negatives. It also depends on which class you choose to call "positive" and which "negative", so it is relatively arbitrary in that sense. That's why other metrics, such as the Matthews correlation coefficient (MCC), are often preferable.
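The label-dependence point can be demonstrated on the first example above: swapping which class counts as "positive" changes F1 dramatically, while the MCC is unchanged. A small sketch (helper names are mine; MCC is computed from its standard confusion-count formula):

```python
import math

# F1 depends on which class is labeled "positive"; MCC does not.

def f1(tp, fp, fn):
    return 2 * tp / (2 * tp + fp + fn)

def mcc(tp, tn, fp, fn):
    # Matthews correlation coefficient from confusion counts
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Counts from Example 1 above: TP=1, TN=55, FP=0, FN=44.
print(round(f1(1, 0, 44), 4))      # 0.0435 with positives as "positive"
print(round(f1(55, 44, 0), 4))     # 0.7143 with the labels swapped
print(round(mcc(1, 55, 0, 44), 4)) # 0.1111
print(round(mcc(55, 1, 44, 0), 4)) # 0.1111 -- identical after the swap
```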

Answered by Benji Albert on September 29, 2021

It's helpful to look at the formulas for accuracy and F1 score: $$Accuracy = \frac{TP+TN}{TP+TN+FP+FN}$$ and $$F1 = \frac{2TP}{2TP+FP+FN}.$$ You are in the situation where Accuracy < F1. Cross-multiplying and cancelling (valid as long as $FP + FN > 0$) reduces this inequality to $TN < TP$. So your model predicts the positive class better than the negative one. Whether that matters depends on other factors, but in your case (only mildly imbalanced) I'd guess it's fine.
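The equivalence claimed above (Accuracy < F1 exactly when TN < TP, provided FP + FN > 0) can be spot-checked numerically. A sketch over random confusion counts; the helper names are my own:

```python
import random

# Numerical check: for confusion counts with FP + FN > 0,
# Accuracy < F1 holds exactly when TN < TP.

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def f1(tp, fp, fn):
    return 2 * tp / (2 * tp + fp + fn)

random.seed(0)
for _ in range(10_000):
    tp, tn = random.randint(0, 50), random.randint(0, 50)
    fp, fn = random.randint(0, 50), random.randint(0, 50)
    if fp + fn == 0:
        continue  # the equivalence assumes at least one misclassification
    assert (accuracy(tp, tn, fp, fn) < f1(tp, fp, fn)) == (tn < tp)

print("equivalence holds on all sampled counts")
```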

Answered by SiXUlm on September 29, 2021
