Sentiment analysis of tweets (Train model on a labelled dataset and use on some other unlabelled data)

Question

I have a large number of tweets on a particular topic, say 'ABC', and the data is not labelled. I want to perform multi-class sentiment analysis on these tweets. I tried several unsupervised clustering techniques from sklearn (KMeans, DBSCAN, agglomerative clustering), but the maximum silhouette score I have reached is 0.31, and KMeans gives a large negative score. I have cleaned the tweets and encoded them using BERT embeddings and Word2Vec, but nothing seems to change.
Suppose I take some other labelled multi-class dataset, build a classifier on it, and then use that classifier to identify sentiment in my target data. Will that be good enough? Is this approach correct and logical?
I have found these general speech datasets. Will they suffice for getting correct sentiments on the "ABC" tweets dataset?
I also found another emotion dataset related to tweets.

David Masip · Answer

The natural approach is to use a labelled dataset and a supervised learning technique. You can start with something simple, such as generating tf-idf features and training a logistic regression model on them.
I think this is the first thing you should try: it seems more likely to succeed than the unsupervised techniques, and it is simple enough.
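
As a rough illustration of that baseline, here is a minimal sketch using scikit-learn. The tiny training set, the three label names, and the example "unlabelled" tweets are all hypothetical stand-ins. In practice you would load a real labelled tweet dataset for training and your 'ABC' tweets as the target data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labelled tweets (stand-in for a real labelled dataset).
train_texts = [
    "I love this, it is great",
    "what a great and wonderful day",
    "I love it so much",
    "this is terrible, I hate it",
    "I hate this awful thing",
    "what a terrible and awful day",
    "the meeting is at noon",
    "the report is due on monday",
    "the meeting is on monday",
]
train_labels = ["positive"] * 3 + ["negative"] * 3 + ["neutral"] * 3

# tf-idf features (unigrams and bigrams) feeding a logistic regression classifier.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_labels)

# Hypothetical unlabelled target tweets (stand-in for the 'ABC' tweets).
unlabelled = ["I love this great day", "I hate this terrible thing"]
predictions = model.predict(unlabelled)          # one label per tweet
probabilities = model.predict_proba(unlabelled)  # one row per tweet, one column per class

for text, label in zip(unlabelled, predictions):
    print(f"{label}: {text}")
```

Since the classifier is trained on a different dataset from the one it is applied to, it is probably worth hand-labelling a small sample of the target tweets and checking the predictions against it before trusting the output at scale.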
