Cross Validated Asked by Naveen Y on February 9, 2021
I have built a text classifier using OneClassSVM.
I have a training set that contains only one label (“Yes”); I don’t have any data for the other label (“No”).
My task is to build a classifier that labels a new, unseen sentence (test data) as 1 if it is very similar to the training data, and as -1 (an anomaly) otherwise.
I have used Word2Vec to build the word embeddings for my training data.
Then I use word-vector averaging with OneClassSVM to build an anomaly-detection classifier.
This classifier currently achieves an accuracy of about 50% to 55%, and I need to improve it to get a robust classifier.
Any suggestions for this problem would be helpful.
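For reference, here is a minimal sketch of the pipeline described in the question (averaged word vectors fed to a one-class SVM), assuming scikit-learn's `OneClassSVM` and NumPy. The tiny `EMBEDDINGS` dictionary is hypothetical and stands in for a trained Word2Vec model; `sentence_vector`, the toy sentences and the `nu` value are illustrative choices, not the asker's actual code.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Hypothetical toy embeddings standing in for a trained Word2Vec model
# (with gensim, the trained vectors would be looked up as model.wv[word]).
EMBEDDINGS = {
    "refund": np.array([1.0, 0.0, 0.2]),
    "order": np.array([0.9, 0.1, 0.1]),
    "delivery": np.array([0.8, 0.2, 0.0]),
    "late": np.array([0.7, 0.3, 0.1]),
    "weather": np.array([-0.9, 0.8, 0.5]),
    "sunny": np.array([-1.0, 0.9, 0.4]),
}
DIM = 3


def sentence_vector(sentence):
    """Average the vectors of in-vocabulary tokens; zero vector if none are known."""
    vectors = [EMBEDDINGS[t] for t in sentence.lower().split() if t in EMBEDDINGS]
    if not vectors:
        return np.zeros(DIM)
    return np.mean(vectors, axis=0)


# Training data: "Yes" sentences only.
train = ["refund order", "late delivery", "order delivery late", "refund late order"]
X_train = np.vstack([sentence_vector(s) for s in train])

# nu is an upper bound on the fraction of training points treated as outliers;
# gamma="scale" sets the RBF width from the data (1 / (n_features * X.var())).
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X_train)

test = ["refund delivery", "sunny weather"]
preds = clf.predict(np.vstack([sentence_vector(s) for s in test]))
print(preds)  # +1 = similar to the training data, -1 = anomaly
```

`predict` returns +1 for points inside the learned boundary and -1 for points outside it; `nu` and `gamma` are usually the first parameters worth tuning before changing the features.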
The paper "Outlier Detection for Text Data" discusses a similar problem. I believe that for a robust classifier you need to capture the latent topics in the corpus, either with the LSI approach discussed in that paper or with a clustering approach in the latent space. I think using a denoising autoencoder to learn features from sentence embeddings is the most straightforward way to obtain a robust classifier.
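A rough sketch of the LSI idea, assuming only NumPy: build TF-IDF vectors for the training sentences, keep the top-k singular vectors as latent topics, and score a new sentence by how much of its TF-IDF vector falls outside that topic subspace. This is a simplified subspace-residual score written for illustration, not the algorithm from the cited paper; the toy corpus, the smoothed idf and `k = 2` are arbitrary assumptions.

```python
import numpy as np

# Hypothetical "Yes"-only training corpus.
train = [
    "refund for late order",
    "order arrived late need refund",
    "delivery was late",
    "refund my order please",
]

# Vocabulary and document frequencies come from the training set only.
vocab = sorted({w for doc in train for w in doc.split()})
index = {w: i for i, w in enumerate(vocab)}
n_docs = len(train)
df = np.zeros(len(vocab))
for doc in train:
    for w in set(doc.split()):
        df[index[w]] += 1
idf = np.log((1 + n_docs) / (1 + df)) + 1  # smoothed idf


def tfidf(doc):
    """L2-normalized TF-IDF vector; out-of-vocabulary words are ignored."""
    v = np.zeros(len(vocab))
    for w in doc.split():
        if w in index:
            v[index[w]] += 1
    v *= idf
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


X = np.vstack([tfidf(d) for d in train])  # shape (n_docs, n_terms)

# LSI: keep the top-k right singular vectors as "latent topics".
k = 2
_, _, vt = np.linalg.svd(X, full_matrices=False)
topics = vt[:k]  # shape (k, n_terms), orthonormal rows


def outlier_score(doc):
    """Fraction of the document's TF-IDF vector lying outside the topic subspace."""
    v = tfidf(doc)
    if not v.any():
        return 1.0  # no known words: treat as maximally unlike the training data
    residual = v - topics.T @ (topics @ v)
    return float(np.linalg.norm(residual) / np.linalg.norm(v))


for s in ["refund for late order", "late delivery", "sunny weather today"]:
    print(s, round(outlier_score(s), 3))
```

A higher score means the sentence is less well explained by the topics of the "Yes" data; the cut-off for calling something an anomaly would have to be chosen on held-out data.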
Answered by Akbari on February 9, 2021
The question, then, is about your data: how representative are the cases in your training set of the whole "yes" subset?
And what types of errors does your classifier make?
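To look at error types, one option is to hand-label a small validation set that includes some "No" sentences (an assumption, since the asker reports having none) and break the predictions down by outcome. The labels below are made up purely to show the bookkeeping; `error_breakdown` is a hypothetical helper.

```python
from collections import Counter


def error_breakdown(y_true, y_pred):
    """Count outcomes for labels +1 ("Yes", inlier) and -1 (anomaly)."""
    counts = Counter(zip(y_true, y_pred))
    return {
        "true_inlier": counts[(1, 1)],
        "false_anomaly": counts[(1, -1)],   # genuine "Yes" sentences rejected
        "missed_anomaly": counts[(-1, 1)],  # anomalies accepted as "Yes"
        "true_anomaly": counts[(-1, -1)],
    }


# Made-up labels and predictions, for illustration only.
y_true = [1, 1, 1, -1, -1, -1]
y_pred = [1, -1, 1, 1, -1, 1]
report = error_breakdown(y_true, y_pred)
accuracy = (report["true_inlier"] + report["true_anomaly"]) / len(y_true)
print(report, accuracy)
```

Two classifiers with the same accuracy can fail in opposite ways: many rejected "Yes" sentences suggests a boundary that is too tight, while many accepted anomalies suggests a boundary that is too loose or features that do not separate the classes.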
You could also try using word2vec to produce embeddings of whole texts.
Answered by MkL on February 9, 2021