Data Science Asked on May 28, 2021
I’m looking for a metric that can be used to quantify how imbalanced the labels are in a dataset.
I’m not looking for a strategy to solve the imbalance problem, I just want to present how imbalanced my dataset is. I’ve computed the ratio of the most frequent to the least frequent label, which is probably an OK way of doing it, but I’m sure there’s a more robust way?
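For reference, the max/min ratio described in the question can be computed in a few lines. This is a minimal sketch. The function name `imbalance_ratio` is hypothetical, and it only sees classes that actually occur in `labels`. Because it looks only at the two extreme classes, it says nothing about how the remaining classes are distributed, which is one reason to look for a more robust measure.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Hypothetical helper: count of the most frequent label divided by
    the count of the least frequent label (1.0 = perfectly balanced).

    Only labels that appear at least once are considered.
    """
    counts = Counter(labels).values()
    return max(counts) / min(counts)


labels = ["a"] * 90 + ["b"] * 10
print(imbalance_ratio(labels))  # 9.0
```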
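You are looking for entropy. The higher the entropy, the more balanced the labels are: it is maximal when every class is equally frequent and zero when all samples belong to a single class. You can use this function for calculating it.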
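A minimal sketch of the entropy approach follows. Dividing by log(k), where k is the number of distinct labels, is an assumption added here (sometimes called Pielou's evenness) so that datasets with different numbers of classes land on the same 0 to 1 scale. The function name `normalized_entropy` is hypothetical, and k counts only the labels observed in the data.

```python
import math
from collections import Counter


def normalized_entropy(labels):
    """Hypothetical helper: Shannon entropy of the label distribution
    divided by log(k), k = number of distinct observed labels.

    Returns 1.0 for perfectly balanced labels; values near 0.0 mean
    one class dominates.
    """
    counts = Counter(labels)
    n = sum(counts.values())
    k = len(counts)
    if k < 2:
        # log(1) == 0, so the normalized value is undefined for one class.
        raise ValueError("need at least two distinct labels")
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(k)


print(round(normalized_entropy(["a"] * 50 + ["b"] * 50), 3))  # 1.0
print(round(normalized_entropy(["a"] * 90 + ["b"] * 10), 3))  # 0.469
```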
Correct answer by Abhishek Verma on May 28, 2021
A very simple measure of imbalance would be the standard deviation of the classes proportions.
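A minimal sketch of this measure, assuming the population standard deviation (`statistics.pstdev`) since the class proportions cover every observed class rather than a sample of them. The function name `proportion_std` is hypothetical. A value of 0.0 means all classes have the same proportion. Note that the largest possible value depends on the number of classes k (it is sqrt(k - 1) / k when one class holds every sample), so values are not directly comparable across datasets with different k.

```python
import statistics
from collections import Counter


def proportion_std(labels):
    """Hypothetical helper: population standard deviation of the
    observed class proportions (0.0 = perfectly balanced)."""
    counts = Counter(labels)
    n = sum(counts.values())
    proportions = [c / n for c in counts.values()]
    return statistics.pstdev(proportions)


print(proportion_std(["a"] * 50 + ["b"] * 50))  # 0.0
print(round(proportion_std(["a"] * 90 + ["b"] * 10), 3))  # 0.4
```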
Answered by Erwan on May 28, 2021
I'd recommend looking at the Gini index as a measure of the inequality in the class sizes. Unlike entropy or standard deviation, the Gini index is explicitly designed to capture the amount of inequality in a distribution.
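A minimal sketch, assuming "Gini index" here means the Gini coefficient from economics (mean absolute difference between class sizes divided by twice the mean size), not the Gini impurity used in decision trees. The function name `gini_coefficient` is hypothetical. It returns 0.0 when all classes are the same size; with k classes the maximum is (k - 1) / k, reached when one class holds every sample, so you could multiply by k / (k - 1) if you want a 0 to 1 scale.

```python
from collections import Counter


def gini_coefficient(labels):
    """Hypothetical helper: Gini coefficient of the observed class sizes
    (0.0 = all classes equal, larger = more unequal)."""
    sizes = list(Counter(labels).values())
    k = len(sizes)
    total = sum(sizes)
    # Sum of |x_i - x_j| over all ordered pairs, divided by 2 * k^2 * mean,
    # which simplifies to 2 * k * total.
    abs_diffs = sum(abs(a - b) for a in sizes for b in sizes)
    return abs_diffs / (2 * k * total)


print(gini_coefficient(["a"] * 50 + ["b"] * 50))  # 0.0
print(gini_coefficient(["a"] * 90 + ["b"] * 10))  # 0.4
```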
Answered by kfx on May 28, 2021