TransWikia.com

Evaluating performance of classifier on lopsided dataset

Data Science Asked by Mikkel bruun on February 23, 2021

I have a binary classifier that I would like to evaluate the performance of. It’s been both trained and tested on a data set where the ratio of true to false labels is lopsided. This means that while it’s quite poor at correctly guessing true, its overall performance on the test set looks very good when using a metric such as right_guesses/total.

What is a better metric to use? Preferably one where the true and false labels each account for the same share of the score, even though their counts are unequal.

2 Answers

sklearn has a balanced accuracy score (the average of the recall obtained on each class), which works just fine:

sklearn.metrics.balanced_accuracy_score()
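To illustrate what this metric measures, here is a minimal pure-Python sketch of balanced accuracy as the mean of per-class recall. The helper name `balanced_accuracy` and the toy labels are made up for illustration and are not from sklearn. With default arguments, `sklearn.metrics.balanced_accuracy_score(y_true, y_pred)` should report the same value on this data.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class contributes equally."""
    recalls = []
    for label in sorted(set(y_true)):
        # Indices of the samples that truly belong to this class
        idx = [i for i, t in enumerate(y_true) if t == label]
        correct = sum(1 for i in idx if y_pred[i] == label)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical lopsided test set: 90 negatives, 10 positives
y_true = [0] * 90 + [1] * 10
# Hypothetical classifier: every negative right, only 2 of 10 positives right
y_pred = [0] * 90 + [1] * 2 + [0] * 8

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)

print(f"right_guesses/total = {plain_accuracy:.2f}")  # 0.92
print(f"balanced accuracy   = {balanced:.2f}")        # 0.60
```

The plain ratio looks good because the 90 negatives dominate it, while the balanced score averages a recall of 1.0 on the negatives with a recall of 0.2 on the positives.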

Answered by Mikkel bruun on February 23, 2021

It's better to use the F-score (F-measure) to evaluate your model in this case. You can calculate the F-score with the following equation:

$\textrm{F score} = \frac{2 \cdot \textrm{Precision} \cdot \textrm{Recall}}{\textrm{Precision} + \textrm{Recall}} = \frac{tp}{tp + \frac{1}{2}(fp + fn)}$

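where tp, fp, and fn are the numbers of true positives, false positives, and false negatives, respectively.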

Don't use accuracy: on an imbalanced dataset you can get a high accuracy while misclassifying most of one class.
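As a rough worked example of the F-score formula and the warning about accuracy, the sketch below computes both from confusion-matrix counts. The counts and the helper name `f_score` are hypothetical, and returning 0.0 when precision or recall is undefined is an assumed convention, not part of the formula.

```python
def f_score(tp, fp, fn):
    """F-score from confusion-matrix counts, via precision and recall."""
    # Assumed convention: treat an undefined precision/recall as 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for a lopsided test set (10 positives, 90 negatives)
tp, fp, fn, tn = 2, 3, 8, 87

accuracy = (tp + tn) / (tp + fp + fn + tn)
f1 = f_score(tp, fp, fn)
# Second form of the equation, written directly in terms of the counts
f1_direct = tp / (tp + 0.5 * (fp + fn))

print(f"accuracy = {accuracy:.2f}")
print(f"F score  = {f1:.3f}")
print(f"direct   = {f1_direct:.3f}")
```

Both forms of the equation give the same value here, and it is far lower than the accuracy, because the true negatives that inflate accuracy do not appear in the F-score at all.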

This is an article if you'd like to read more on imbalanced data.

Answered by Anoop A Nair on February 23, 2021
