Cross Validated Asked by Sharki on December 6, 2020
I was thinking of training a neural network that would be able to classify twitter users according to their followers. For example, I would like to know if a user is “gamer” or not by the people they follow (not the number, but the list of the people they follow). I’d have a dataset of people who are gamers or not based on the accounts they’re following and train them.
The problem that I find is that the number of followers of each person will vary … One person could follow thousands of users while another could only follow hundreds, so since the list of followers of each person won’t be the same lenght, how could I determine my feature there?
This would be an example of my dataset:
Features would be every people an account is following, and the label would say if it’s a gamer or not.
As I said, the problem I can see is that the number of labels won’t be the same…
This is a very interesting problem that can be approached in a lot of different ways!
Even though you are asking for a neural network, I will apply my own experience with Bayesian networks as I think they are more than suitable for this task (they might be even the best choice).
You could start of with a small seed of highly popular users which you can label as gamers/not gamers. Afterwards, using some threshold (which should vary in proportion to the number of users which have been labelled P(θ = X | labelledUsers = 100)), you should decide whether a particular user is a gamer or not (I would start off with people that follow a lot of "twitterers" to maximize the chances of following one of your "seed".
To help you find the previously mentioned probability, you can also help yourself with a test set to validate the probability. I hope this was helpful even though it doesn't use a neural network. BTW, you could also model P(θ = X | labelledUsers = 100) with a beta distribution. Using this technique, the algorithm would choose how much to weigh each feature using a PDF and you could just feed it as many features as you like without bothering about how relevant they are. As an extra, I wholeheartedly encourage you to buy/download John Kruschke's book on Doing Bayesian Data Analysis (A tutorial with R and BUGS)
, it will give you an excellent insight into data analysis for solving problems like this one.
In case you still want to use ANNs, I would recommend learning the parameters (the weights) for the neural network in a special environment (which you have to manually label) and using features such as the number of people following, ratio: gamers/all, you can try to create a computer vision algorithm for the images he posts (gaming vs non gaming). The list is infinite! Still, you should be careful not to include too many random (meaningless) features otherwise it might cause underfitting!
If you would also like details on which programming language would be best for this, I would recommend R or Matlab (my personal choice even though R comes with more premade functions) for developing the algorithm and Fortran or C for a final exportable version (if you wished to do so).
Correct answer by david david on December 6, 2020
I would start by giving each Twitter user a unique ID - this will become the index of your dataset. Then for each unique ID (twitter user), do feature engineering and brainstorm different features that you think would be useful. I would highly recommend reading 'Hands On Machine Learning with Scikit-Learn and Tensorflow' if you don't have much experience with machine learning and neural networks for an implementation based approach. I'd also recommend researching and reading papers about Twitter user trends and the different category of users if you haven't already. Knowing the subject matter of what you are trying to model is very important.
Keep in mind, that you don't need to be very selective with the features you add. As long as you can come up with a reasonable suspicion, go ahead and add the data. Keep in mind, this will add to the amount of time you spend cleaning the data.
Some features I can think of:
Another thought: a problem like this is ripe for a network analysis and graph based network model. Here is an example.
Answered by FifthCode on December 6, 2020
Your problem is that you are taking each followed user as an input feature and that the number of followed users can vary so you don't have a fixed length vector for your NN.
Two possible approaches (that I can think of):
eg: the target account X has 3 followers:
Feature vector for the target account should aggregate these values, for instance:
X = {'gamer': 3; 'influencer': 1; 'youngster': 2; 'leader': 1; 'political activist': 1}
You might want to normalize so all values are in [0,1].
In both cases, you need to flip your view on the inputs and move from each followed user as a dimension to something else.
Answered by gdupont on December 6, 2020
If I correctly understand your question, your main problem here is the length of your features which are varying. So I think you need to use Long short-term memory (LSTM ) model or Gated Recurrent Units (GRU). These types of models are based on Recurrent neural networks(RNN)s architecture which are used widely in deep learning models specially when you need some memory in your model or where your models input sequence length are varying. Here is a good article about Sequence Classification with LSTM.
Answered by pouyan on December 6, 2020
I guess your question is: you want to use a neural network to predict whether a user is a "gamer" or not, and you want to find some features from his/her followers. Well here are some possible features you may want to use.
Hope these can help you.
Answered by Nick Chen on December 6, 2020
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP