TransWikia.com

Biasing SVM algorithm towards particular subset of data

Data Science Asked by Datashapa123 on September 4, 2021

I’m training an SVM model for sentiment analysis on social media data, e.g. tweets.

The model will be trained on a small selection of a particular company’s tweets in order to classify new ones. However, since that training set is too small to give an accurate model, I will combine the company’s data with a much larger dataset of general tweets.

Because it is specialised to one company, the content of the company data differs slightly from the content of the general dataset. Since the data to be predicted is company-specific, it seems logical to me to bias the model’s training so that the company-related tweets are given greater importance, in order to improve accuracy. My first thought was simply to increase the magnitude of the polarity of the company’s tweets, i.e. general tweets are labelled -1 or 1 and company tweets are labelled -3 or 3, for example.

Is this the right idea/method?

2 Answers

I don't think that's a very good idea: the goal is not to make the model predict a more extreme polarity when the tweet relates to the company.

Instead, you might want to consider oversampling the few instances from this specific company. For instance, if you have 100 company-specific tweets and 1000 general tweets in your training set, you could duplicate each company-specific tweet 10 times so that the specific tweets carry a higher weight in the data. If possible, treat the number of duplications as a parameter and tune it to find the optimal value.
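As a rough illustration of the duplication idea, here is a minimal sketch using only the Python standard library. The function names `oversample` and `sample_weights`, the toy tweets and the company name "AcmeCo" are all made up for this example; the labels stay at -1 / 1 and only the number of copies changes.

```python
from collections import Counter


def oversample(company, general, factor):
    """Return a training list in which every company example appears `factor` times."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return general + company * factor


def sample_weights(company, general, factor):
    """Per-example weights for the list `general + company` (no duplication):
    1.0 for each general example, `factor` for each company example."""
    return [1.0] * len(general) + [float(factor)] * len(company)


# Toy (text, label) pairs, invented for illustration only.
company = [
    ("love the new AcmeCo app", 1),
    ("AcmeCo support never replies", -1),
]
general = [
    ("great day", 1),
    ("worst commute ever", -1),
    ("nice weather", 1),
    ("so tired", -1),
]

factor = 3  # the duplication count is the parameter to tune
train = oversample(company, general, factor)
weights = sample_weights(company, general, factor)

counts = Counter(train)  # how often each example occurs in the training list
```

If the SVM implementation accepts per-example weights (scikit-learn's `SVC.fit`, for example, takes a `sample_weight` argument), passing weights like the ones above for the un-duplicated data is an alternative to physically copying rows.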

Answered by Erwan on September 4, 2021

Try duplicating the specific company's data ten times or more, and include more samples from that company-specific data in the cross-validation/test data (3:1). I hope this has a positive effect.
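One practical detail when combining duplication with a company-specific test set: if tweets are duplicated before the data is split, copies of the same tweet can land in both the training and the test data and inflate the score. The sketch below holds out some company tweets first and duplicates only the rest. The helper name `split_then_oversample`, the 25% hold-out fraction and the toy data are assumptions made for this example, not something taken from the answer.

```python
import random


def split_then_oversample(company, general, factor, holdout_fraction=0.25, seed=0):
    """Hold out part of the company data for evaluation, then duplicate only
    the remaining company examples `factor` times in the training list."""
    rng = random.Random(seed)
    shuffled = company[:]          # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    holdout = shuffled[:n_holdout]
    company_train = shuffled[n_holdout:]
    train = general + company_train * factor
    return train, holdout


# Toy (text, label) pairs, invented for illustration only.
company = [("company tweet %d" % i, 1 if i % 2 == 0 else -1) for i in range(8)]
general = [("general tweet %d" % i, 1 if i % 2 == 0 else -1) for i in range(20)]

train, holdout = split_then_oversample(company, general, factor=10)
```

The held-out company tweets can then be used to compare different duplication factors, since they resemble the data the model will actually have to classify.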

Answered by Muhammad Shahzad on September 4, 2021
