Data Science Asked on June 5, 2021
I’m training a Logistic Regression classifier on text data. I found that many of my data points have more than one target class. Is it possible to modify my model to output more than one class per data point?
I plan to split the multi-class data points in my training set into separate examples (i.e. if one x has 3 classes, I will duplicate that text three times so that each copy has a single class associated with it). Then, when I predict on the test data, I will output the top n classes such that
Probability(Class_1) + Probability(Class_2) + … + Probability(Class_n) > 0.95
I will use the predict_proba method of LogisticRegression for this.
Is this a correct way of doing it? Your help is much appreciated.
We can't use the same data with different labels in the same model.
The contradictory targets will confuse the model.
What you need is multiple models, one for each class, e.g. a OneVsRest strategy.
A little clarification:
What you are looking for is multi-label classification, not multi-class. Please look up the difference online. You may read this SO Answer for a quick understanding.
You can implement a OneVsRest multi-label classifier yourself, but it is better to use the Scikit-learn implementation (Link).
OneVsRestClassifier can also be used for multilabel classification. To use this feature, provide an indicator matrix for the target y when calling .fit.
What this means is that you encode your targets in a multi-label format when calling fit().
You may use MultiLabelBinarizer (from sklearn.preprocessing import MultiLabelBinarizer) for this.
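As a minimal sketch of what the indicator-matrix encoding looks like, the snippet below runs MultiLabelBinarizer on a few made-up label sets. The label names are hypothetical and only serve to illustrate the output format.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical label sets, one list of classes per text (not from the question).
labels = [["sports", "politics"], ["tech"], ["sports", "tech", "politics"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)  # one row per text, one 0/1 column per class

print(mlb.classes_)  # column order: classes sorted alphabetically
print(Y)
# [[1 1 0]
#  [0 0 1]
#  [1 1 1]]
```

Each row keeps all of a text's classes together, so the text never has to be duplicated with different labels.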
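from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
classif = OneVsRestClassifier(LogisticRegression())
classif.fit(X, Y)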
Read this SO Answer for an example.
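For text data specifically, the pieces above could be put together roughly as follows. This is only a sketch under assumptions: the toy texts and labels are invented, TfidfVectorizer is my choice of featurizer (the question does not say which one is used), and the 0.5 cut-off is just the default decision threshold.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy corpus; every class has both positive and negative examples.
texts = [
    "the team won the football match",
    "parliament passed the new budget",
    "new smartphone released with faster chip",
    "minister attends the football final",
    "government funds chip research",
    "football league adopts video technology",
]
labels = [
    ["sports"],
    ["politics"],
    ["tech"],
    ["sports", "politics"],
    ["politics", "tech"],
    ["sports", "tech"],
]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)  # indicator matrix, shape (6, 3)

# One binary LogisticRegression is fitted per class on the TF-IDF features.
model = make_pipeline(TfidfVectorizer(), OneVsRestClassifier(LogisticRegression()))
model.fit(texts, Y)

new_text = ["the minister watched the football match"]
proba = model.predict_proba(new_text)  # per-class probabilities, shape (1, 3)
pred = model.predict(new_text)         # 0/1 indicator row per text
predicted_labels = mlb.inverse_transform(pred)  # back to label names

print(dict(zip(mlb.classes_, proba[0].round(2))))
print(predicted_labels)
```

Note that in this multi-label setting the per-class probabilities are independent and do not sum to 1, so you would threshold each class separately instead of accumulating probabilities until they exceed 0.95.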
You may also use the MultiOutputClassifier wrapper (from sklearn.multioutput import MultiOutputClassifier) for the same purpose. Check this SO Answer.
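A rough sketch of the MultiOutputClassifier variant is shown below. The small numeric feature matrix and the two label columns are made up for illustration; in practice X would be your vectorized text.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

# Hypothetical features and 0/1 targets for two labels (illustration only).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 2.0], [2.0, 0.0]])
Y = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0, 1], [1, 0]])

# Fits one independent LogisticRegression per column of Y.
clf = MultiOutputClassifier(LogisticRegression())
clf.fit(X, Y)

pred = clf.predict(X)         # indicator matrix, shape (n_samples, n_labels)
proba = clf.predict_proba(X)  # list with one (n_samples, 2) array per label

print(pred.shape, len(proba), proba[0].shape)
```

One practical difference from OneVsRestClassifier is the shape of predict_proba: here it is a list with one array per label, so the probability of label j being present is proba[j][:, 1].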
Answered by 10xAI on June 5, 2021