Data Science Asked on December 5, 2021
I’m doing sentiment analysis on a Twitter dataset (problem link). I extracted the POS tags from the tweets, created TF-IDF vectors from those tags, and used them as features, which gave an accuracy of 65%. I think we can achieve a lot more with POS tags, since they help distinguish how a word is being used within a phrase. The model I’m training is MultinomialNB().
The problem I’m trying to solve is to classify the sentiment of each tweet as positive, negative, or neutral.
I created TF-IDF vectors from the POS tags of the tweets and passed them to my model:
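from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# TF-IDF over unigrams and bigrams of each tweet's POS-tag sequence
tfidf_vectorizer1 = TfidfVectorizer(
    max_features=5000, min_df=2, max_df=0.9, ngram_range=(1, 2))
train_pos = tfidf_vectorizer1.fit_transform(train_data['pos'])
test_pos = tfidf_vectorizer1.transform(test_data['pos'])
# Fit Naive Bayes on the training vectors, then predict on the test vectors
clf = MultinomialNB(alpha=0.1).fit(train_pos, train_labels)
predicted = clf.predict(test_pos)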
With the above code I got 65% accuracy. Rather than creating TF-IDF vectors of the POS tags and using them as model inputs, is there any other way to use POS tags to increase the accuracy of the model?
There are many ways you could go about this. For starters, you could use Conditional Random Fields (CRF). There is a nice Python implementation in which you can set POS features and more. There is also a page from the same source you posted on how to use CRF for your purpose (I have not read it thoroughly). spaCy is another great resource for quickly getting all the features you need. Nonetheless, for state-of-the-art (SOTA) results you will need a neural network (NN) implementation.
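As a rough illustration of using POS tags for more than a separate tag-only vector, here is a minimal sketch that attaches each tag to its word, so that the same word used in different roles becomes a different feature. This is not taken from any of the libraries mentioned above: the helper name `word_pos_tokens`, the tag set, and the hand-tagged toy tweets are all assumptions made for the example, and in practice the tags would come from whatever tagger you already use.

```python
from typing import List, Tuple

# Hypothetical input format: one tweet as a list of (word, pos_tag) pairs.
TaggedTweet = List[Tuple[str, str]]


def word_pos_tokens(tagged: TaggedTweet) -> str:
    """Join each lowercased word with its tag, e.g. ('like', 'VERB') -> 'like_VERB'."""
    return " ".join(f"{word.lower()}_{tag}" for word, tag in tagged)


# Toy, hand-tagged tweets (illustrative only, not real tagger output).
tweet_a = [("I", "PRON"), ("like", "VERB"), ("this", "DET"), ("phone", "NOUN")]
tweet_b = [("Feels", "VERB"), ("like", "ADP"), ("a", "DET"), ("brick", "NOUN")]

doc_a = word_pos_tokens(tweet_a)
doc_b = word_pos_tokens(tweet_b)

# "like" as a verb and "like" as a preposition are now distinct tokens.
print(doc_a)
print(doc_b)
```

Strings built this way could be fed to the same TfidfVectorizer in place of the tag-only column. If you want to keep the tag casing, passing lowercase=False and token_pattern=r"\S+" to the vectorizer should preserve tokens such as like_VERB as they are. Whether this improves on the 65% would need to be checked on your own data.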
Answered by 20roso on December 5, 2021