Can I modify a Logistic Regression classifier to output more than one class based on the probability?

Question

I'm training a Logistic Regression classifier on text data. I found that many of my data points have more than one target class. Is it possible to modify my model to output more than one class per data point?
I plan to split data points with multiple classes in my training set into separate examples (i.e. if one x has 3 classes, I will duplicate that text three times so that each copy has a single class associated with it). Then, when I predict on the test data, I will output n classes such that

Probability(Class_1) + Probability(Class_2) + ... + Probability(Class_n) > 0.95

I will use the predict_proba method of LogisticRegression for this.
Is this a correct way of doing it? Your help is much appreciated.

10xAI · Answer

You shouldn't feed the same text to a single model several times with a different label each time.
The contradictory targets will confuse the model.
What you need is multiple models, e.g. the OneVsRest strategy, i.e. one binary classifier per class.
A little clarification:
What you are looking for is multi-label classification, not multi-class. Please search online for the difference; this SO Answer gives a quick explanation.
You can implement a OneVsRest multi-label classifier yourself, but it is better to use the Scikit-learn implementation (Link).

OneVsRestClassifier can also be used for multilabel classification. To use this feature, provide an indicator matrix for the target y when calling .fit.

In other words, encode your targets in a multi-label indicator format before calling fit().
You can use MultiLabelBinarizer from sklearn.preprocessing for this.
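from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
classif = OneVsRestClassifier(LogisticRegression())
classif.fit(X, Y)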
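
To make the steps above concrete, here is a minimal end-to-end sketch. The toy texts, the label names, the TfidfVectorizer features and the 0.5 threshold are all assumptions made for illustration; they are not from the question. It binarizes the label lists, fits one logistic regression per label, and then keeps every label whose probability clears the threshold.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data: each text carries one or more labels.
texts = [
    "the striker scored a late goal",
    "parliament passed the budget bill",
    "the minister attended the cup final",
    "voters elected a new president",
    "the team won the league title",
    "the senator praised the national team",
]
labels = [
    ["sports"],
    ["politics"],
    ["politics", "sports"],
    ["politics"],
    ["sports"],
    ["politics", "sports"],
]

# Turn the label lists into a binary indicator matrix (one column per label).
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)

# Assumed feature extraction step; any text vectorizer would do here.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# One binary logistic regression is fitted per label column.
classif = OneVsRestClassifier(LogisticRegression())
classif.fit(X, Y)

# Per-label probabilities; in the multi-label case they need not sum to 1.
X_new = vectorizer.transform(["the president watched the goal"])
proba = classif.predict_proba(X_new)

# Keep every label whose probability reaches the (assumed) threshold.
threshold = 0.5
predicted = mlb.inverse_transform((proba >= threshold).astype(int))
print(mlb.classes_, proba, predicted)
```

Because each label gets its own probability, the threshold is applied per label rather than to a cumulative sum across classes. The value 0.5 is only a starting point and would normally be tuned on validation data.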

See this SO Answer for an example.
You can also use the MultiOutputClassifier wrapper from sklearn.multioutput for the same purpose; see this SO Answer.
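
A rough sketch of the MultiOutputClassifier route, assuming synthetic numeric features and three made-up binary labels in place of real text data. Note that here predict_proba returns a list with one array per label, so the positive-class column has to be collected from each.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

# Hypothetical synthetic data: 40 samples, 5 features, 3 binary labels.
rng = np.random.RandomState(0)
X = rng.rand(40, 5)
Y = np.column_stack([
    (X[:, 0] > 0.5).astype(int),
    (X[:, 1] > 0.5).astype(int),
    (X[:, 2] + X[:, 3] > 1.0).astype(int),
])

# Fits one independent logistic regression per label column.
clf = MultiOutputClassifier(LogisticRegression())
clf.fit(X, Y)

# Hard 0/1 predictions, shape (n_samples, n_labels).
pred = clf.predict(X[:3])

# predict_proba gives a list of n_labels arrays, each (n_samples, 2).
proba_list = clf.predict_proba(X[:3])

# Stack the probability of the positive class (column 1) for each label.
positive_proba = np.column_stack([p[:, 1] for p in proba_list])
print(pred)
print(positive_proba)
```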
