TransWikia.com

Preparing Dataset Minority Class vs Majority Class

Data Science Asked by misheekoh on September 10, 2020

I’m working on a binary classification task for sentiment prediction. The majority class (~90% of the data) is my positive class (labeled 1), and the minority class (~10% of the data) is my negative class (labeled 0). My goal in this experiment is to maximize the detection of negative sentiment, so I’d like to maximize the precision (and recall) of the minority class.

However, in many similar datasets that prioritize detection of the minority class, such as credit card fraud detection or cancer detection, the minority class is usually set as the positive class and the majority class as the negative class.

My question is: does it matter whether the minority class is given the positive or the negative label, in terms of model training performance or the behavior of a loss function such as cross entropy?

One Answer

My question is: does it matter whether the minority class is given the positive or the negative label, in terms of model training performance or the behavior of a loss function such as cross entropy?

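No, it doesn't. Binary cross entropy treats the two labels symmetrically: if you swap the 0/1 encoding and the predicted probabilities are swapped to match (p becomes 1 - p), the loss value is unchanged, so the choice of encoding does not make the problem easier or harder to learn.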
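To see why the loss is unaffected, here is a minimal sketch in plain Python. The toy labels and probabilities are made up for illustration, and `binary_cross_entropy` is a hypothetical helper written for this example, not a library function. It evaluates the loss once with the original encoding and once with the labels and predicted probabilities both flipped.

```python
import math


def binary_cross_entropy(labels, probs):
    """Mean binary cross entropy; each prob is the predicted P(label == 1)."""
    total = 0.0
    for y, p in zip(labels, probs):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


# Hypothetical toy data: 1 = majority class, 0 = minority class.
labels = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
probs = [0.9, 0.8, 0.95, 0.7, 0.85, 0.9, 0.6, 0.99, 0.75, 0.2]

# Swap the encoding: the minority class becomes 1, and the probability
# of the "new 1" is one minus the old probability.
flipped_labels = [1 - y for y in labels]
flipped_probs = [1 - p for p in probs]

loss_original = binary_cross_entropy(labels, probs)
loss_flipped = binary_cross_entropy(flipped_labels, flipped_probs)

# The two values agree up to floating-point rounding.
print(loss_original, loss_flipped)
```

The two terms of the loss simply trade places when the encoding is swapped, so the sum is the same. Class imbalance itself can still affect training, but that is independent of which class is called 1.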

However, in binary classification it's customary to call the main class of interest "positive", so be clear about which class is positive and which is negative when you present your results to other people.

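Also note that precision and recall are usually calculated for whatever is labeled the positive class, so don't inadvertently report the results for the majority class instead of the one you're interested in. With your current encoding, the default numbers would describe the majority class (label 1), not the negative-sentiment class (label 0).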
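As an illustration of that pitfall, the sketch below computes precision and recall for each class by hand on made-up predictions. The helper `precision_recall` and the toy data are hypothetical. Libraries typically expose the same choice through a parameter; scikit-learn's `precision_score` and `recall_score`, for instance, take a `pos_label` argument that defaults to 1.

```python
def precision_recall(y_true, y_pred, pos_label):
    """Precision and recall, treating `pos_label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == pos_label and p == pos_label)
    fp = sum(1 for t, p in pairs if t != pos_label and p == pos_label)
    fn = sum(1 for t, p in pairs if t == pos_label and p != pos_label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: 18 majority examples (1) and 2 minority examples (0).
# The classifier gets every majority example right but misses one minority example.
y_true = [1] * 18 + [0, 0]
y_pred = [1] * 18 + [1, 0]

majority = precision_recall(y_true, y_pred, pos_label=1)
minority = precision_recall(y_true, y_pred, pos_label=0)

print("class 1 (majority):", majority)
print("class 0 (minority):", minority)
```

In this toy case the majority class has perfect recall, while the minority class has a recall of only 0.5, so reporting the label-1 numbers would hide exactly the failure the question cares about.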

Correct answer by Erwan on September 10, 2020
