Cross Validated Asked by Aishwarya A R on January 2, 2021
I am working on a multi-label classification problem. Each sample can take more than one label, and some samples have no labels at all.
In my dataset, 50% of the samples have one or more labels associated with them; the rest have none. I am sure that among future "test" samples there will also be a population with no labels attached.
So far, I've been dropping the 50% of samples with no labels and training a multi-label classifier. Recently, I realized that this model will end up predicting labels for a sample even when none of the labels are appropriate for it. This leaves me with 2 options –
Am I thinking in the right direction? I’d also like to know your suggestions on this problem.
Let $n$ be the number of distinct labels. The problem with your first proposed solution is that your multi-label method now has to learn that the label "NONE" never co-occurs with any other label. If the multi-label method assumes nothing about the distribution of labels, it has to learn that none of the $2^n - 1$ label combinations in which "NONE" $= 1$ and at least one of the other $n$ labels is also $1$ ever occurs (for $n = 10$, that is already $2^{10} - 1 = 1023$ impossible combinations). It also does not prevent the model from predicting all zeroes, i.e. "NONE" $= 0$ with every other label $= 0$ as well.
As your problem has a lot of samples with no labels at all, a simple and effective solution is to build your own hierarchical classifier out of two models. The first is a binary classifier that just detects whether all labels are zero. To train it, map every sample with no labels to class "A" and every sample with at least one label to class "B"; an "A" prediction means no labels at all, and a "B" means at least one label exists. The second model is any multi-label classifier you want, but trained only on the samples with at least one label. In the prediction/test phase, this second classifier is only called if the first binary classifier predicts "B" (at least one label). Details on more elaborate hierarchical classifiers can be found in: https://www.researchgate.net/publication/306040749_Consistency_of_Probabilistic_Classifier_Trees
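A minimal sketch of this two-stage idea with scikit-learn is below. The estimator choice (LogisticRegression), the function names, and the Binary Relevance wrapper for stage 2 are illustrative assumptions on my part, not something the approach prescribes; any gate classifier and any multi-label classifier would slot in the same way.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

def fit_hierarchical(X, Y):
    # X: (n_samples, n_features); Y: (n_samples, n_labels) binary indicator matrix.
    has_label = Y.any(axis=1)                        # True = class "B" (>= 1 label)
    gate = LogisticRegression(max_iter=1000).fit(X, has_label)
    # Stage 2: multi-label model (here Binary Relevance), trained only on "B"
    # samples. Assumes every label appears at least once among those samples.
    multi = MultiOutputClassifier(LogisticRegression(max_iter=1000))
    multi.fit(X[has_label], Y[has_label])
    return gate, multi

def predict_hierarchical(gate, multi, X):
    Y_hat = np.zeros((X.shape[0], len(multi.estimators_)), dtype=int)
    is_b = gate.predict(X).astype(bool)              # stage 1: any labels at all?
    if is_b.any():
        Y_hat[is_b] = multi.predict(X[is_b])         # stage 2: only for "B" samples
    return Y_hat                                     # all-zero rows for "A" samples
```

By construction, a sample gated to "A" gets an all-zero label vector, which is exactly the behavior the single-model "NONE" approach struggles to guarantee.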
Other common solutions combine a base multi-class classifier (e.g., k-NN or SVM) with one of these three multi-label methods: Binary Relevance, Classifier Chain, or Label Powerset. Scikit-learn implements Binary Relevance (as MultiOutputClassifier) and Classifier Chain; Label Powerset is available in the scikit-multilearn package. I suggest Classifier Chain, which takes dependencies among labels into account, since it seems from your question that you want the algorithm to predict well when there are no labels at all. Label Powerset is also a good choice, unless you have a "lot" of labels ($n \geq 20$) and not enough data.
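For completeness, here is a minimal Classifier Chain sketch with scikit-learn on synthetic data; the base estimator and all parameter values are illustrative choices, not the only ones that work.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

# Synthetic multi-label data; allow_unlabeled=True keeps some all-zero rows.
X, Y = make_multilabel_classification(n_samples=500, n_classes=5,
                                      allow_unlabeled=True, random_state=0)

# Each classifier in the chain also sees the predictions of the earlier
# ones, which is how dependencies among labels are taken into account.
chain = ClassifierChain(LogisticRegression(max_iter=1000),
                        order="random", random_state=0)
chain.fit(X, Y)
print(chain.predict(X[:5]))   # rows of 0/1 indicators, one column per label
```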
Correct answer by lhsmello on January 2, 2021