Data Science Asked by user84037 on January 11, 2021
I am going to build a machine learning model to identify fake tweets. The data set contains a large number of retweets, which I think might be an issue. Given that the focus is on the original tweets, do you think it is better to remove all the retweets?
Thank you.
No. I do not believe so and I can explain a few reasons why.
You should remove retweets if.
Answered by Michael Hearn on January 11, 2021
A retweet may have an entirely different context from the original tweet. It is also possible for a retweet that adds a different opinion or comment to gain more popularity than the original.
In these cases, I don't think you can classify them as fake tweets.
You can classify tweets as fake when they are widely retweeted but with no context. One such example is retweets driven by a giveaway or charity.
If you can figure out how to separate the spam retweets from the original tweets, it would help produce a better analysis and more accurate results.
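As a rough starting point for that separation, here is a minimal sketch that splits a list of tweet texts into originals and retweets. It assumes retweets carry the conventional "RT @user:" text prefix, which may not hold for your data set (some exports mark retweets in a separate field instead). The function name `split_retweets` and the sample tweets are made up for illustration, and the sketch does not try to decide which retweets are spam.

```python
import re

# Assumption: a retweet starts with the conventional "RT @user:" prefix.
RT_PREFIX = re.compile(r"^RT @\w+:\s*")


def split_retweets(tweets):
    """Return (originals, retweets) from a list of tweet texts."""
    originals, retweets = [], []
    for text in tweets:
        if RT_PREFIX.match(text):
            retweets.append(text)
        else:
            originals.append(text)
    return originals, retweets


# Made-up sample data, for illustration only.
sample = [
    "Breaking: city council approves new budget",
    "RT @news_bot: Breaking: city council approves new budget",
    "RT @promo: Retweet to win a free phone!",
]

originals, retweets = split_retweets(sample)
```

Once the two groups are separated, you can inspect the retweets on their own (for example, giveaway-style ones) before deciding whether to drop them.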
Answered by Uday T on January 11, 2021
To me, it depends on what you want to focus on. Do you want to create a model that deals with original posts that are fake news, and then build an algorithm that finds the original from a retweet and applies your model to it? Or do you just want a model that takes one tweet, without checking whether it's a retweet, and tries to guess whether it's fake?
In the first case, you should remove them, because you'll have a lot of information about the people retweeting fake news while you only want information about the original posters, which will make your model biased. In the second case, of course, you should keep them, since that's exactly what your model aims to handle.
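For the first case, one possible way to "find the original from a retweet" is sketched below: strip the retweet prefix, then collapse identical texts so each original appears once, keeping a count of how often it showed up. This assumes the "RT @user:" prefix convention and that a retweet repeats the original text verbatim, neither of which is guaranteed for real data. The name `collapse_to_originals` and the sample tweets are hypothetical.

```python
import re
from collections import Counter

# Assumption: a retweet starts with the conventional "RT @user:" prefix.
RT_PREFIX = re.compile(r"^RT @\w+:\s*")


def collapse_to_originals(tweets):
    """Map each tweet to its original text and count the occurrences."""
    return Counter(RT_PREFIX.sub("", text).strip() for text in tweets)


# Made-up sample data, for illustration only.
sample = [
    "Breaking: city council approves new budget",
    "RT @news_bot: Breaking: city council approves new budget",
    "RT @promo: Retweet to win a free phone!",
]

counts = collapse_to_originals(sample)
unique_originals = list(counts)  # each original text appears once
```

The model would then be trained or applied on `unique_originals` only, so heavily retweeted posts are not over-represented.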
Answered by BeamsAdept on January 11, 2021