Data Science Asked by Doofenshmirtz on September 30, 2021
I have a large number of tweets on a particular topic, say ‘ABC’, and the data is not labelled. I want to perform multi-class sentiment analysis on these tweets. I tried several unsupervised clustering techniques from sklearn (KMeans, DBSCAN, agglomerative clustering), but the highest silhouette score I have reached is 0.31, and KMeans gives a large negative score. I have cleaned the tweets and encoded them with BERT embeddings and Word2Vec, but nothing seems to change.
Suppose I take some other labelled multi-class dataset, build a classifier on it, and then use that classifier to identify sentiment in my target data. Will that be good enough? Is this approach correct and logical?
I have found these general speech datasets. Will they serve my purpose of getting correct sentiments for the "ABC" tweets dataset?
I also found this emotion dataset related to tweets.
The natural approach is to use a labelled dataset and a supervised learning technique. You can start with something simple, such as generating features with tf-idf and training a simple logistic regression model on them.
I think this is the first thing you should try: it seems more likely to succeed than the unsupervised techniques, and it is simple enough.
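As a rough illustration of that baseline, here is a minimal sketch using scikit-learn. The tiny `labelled_texts` / `labelled_sentiments` lists and the `target_tweets` list are made-up placeholders, not real data: in practice the first two would come from whichever external labelled dataset you pick, and the last would be your unlabelled "ABC" tweets. The three class names and the vectorizer settings are also just assumptions for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in for an external labelled multi-class dataset.
labelled_texts = [
    "I love this, it is wonderful",
    "What a great and happy day",
    "This is awful, I hate it",
    "Terrible service, very disappointed",
    "The meeting is at noon",
    "The report was published today",
]
labelled_sentiments = [
    "positive", "positive",
    "negative", "negative",
    "neutral", "neutral",
]

# tf-idf features (unigrams and bigrams) feeding a logistic regression.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(labelled_texts, labelled_sentiments)

# Hypothetical stand-in for the unlabelled tweets about 'ABC'.
target_tweets = [
    "I love ABC, great day",
    "ABC is awful",
    "ABC published a report today",
]

predicted = model.predict(target_tweets)
probabilities = model.predict_proba(target_tweets)

# Print each tweet with its predicted label and the highest class probability.
for tweet, label, probs in zip(target_tweets, predicted, probabilities):
    print(f"{label:>8}  (max prob {probs.max():.2f})  {tweet}")
```

Whether the predictions transfer well depends on how close the labelled dataset is to your tweets in vocabulary and style, so it is worth hand-labelling a small sample of the target tweets and checking the classifier against it before trusting the output.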
Answered by David Masip on September 30, 2021