TransWikia.com

Sentiment analysis of tweets (Train model on a labelled dataset and use on some other unlabelled data)

Data Science Asked by Doofenshmirtz on September 30, 2021

I have a large number of tweets on a particular topic, say ‘ABC’, and the data is not labelled. I want to perform multi-class sentiment analysis of these tweets. I tried several unsupervised clustering techniques from sklearn, such as k-means, DBSCAN, and agglomerative clustering, but the maximum silhouette score I have reached is 0.31, and k-means gives a large negative score. I have cleaned the tweets and encoded them using BERT embeddings and Word2Vec, but nothing seems to change.

Suppose I use some other labelled multi-class dataset to build a classifier and then use that classifier to identify sentiment in my target data. Will it be good enough? Is this approach correct and logical?

I have found these general speech datasets. Will they be sufficient for getting correct sentiments for the "ABC" tweets dataset?

I also found another emotion dataset related to tweets.

One Answer

The natural approach is to use a labelled dataset and a supervised learning technique. You can start with something simple, like using tf-idf for feature generation and training a simple logistic regression model.

I think this is the first thing you should try. I see it as more likely to succeed than the unsupervised techniques, and it is simple enough.
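A minimal sketch of that baseline, assuming scikit-learn is available. The toy training texts, the three label names, and the two "ABC" tweets are hypothetical placeholders, not data from the question. In practice the training set would be a real labelled tweet dataset and the targets would be the unlabelled tweets.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labelled data standing in for a real labelled tweet dataset.
train_texts = [
    "I love this, absolutely fantastic",
    "what a great and happy day",
    "this is terrible, I hate it",
    "awful experience, really bad",
    "the meeting is at noon",
    "the report was published today",
]
train_labels = [
    "positive", "positive",
    "negative", "negative",
    "neutral", "neutral",
]

# tf-idf features (unigrams and bigrams) feeding a logistic regression classifier.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_labels)

# Hypothetical unlabelled target tweets (placeholders for the "ABC" tweets).
target_tweets = ["I love ABC, great stuff", "ABC is terrible and bad"]

predicted = model.predict(target_tweets)            # one label per tweet
probabilities = model.predict_proba(target_tweets)  # one row of class probabilities per tweet

for tweet, label in zip(target_tweets, predicted):
    print(f"{label}: {tweet}")
```

Because the classifier is trained on one dataset and applied to another, hand-labelling a small sample of the target tweets and comparing those labels with the predictions is one way to check whether it transfers well enough.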

Answered by David Masip on September 30, 2021
