TransWikia.com

User profiling based on multiple posts

Data Science: asked by Mac Sat on December 7, 2020

I have collected a dataset of social media posts from many users, with a label assigned to each user. I tried LSTM and BERT for this text classification problem, predicting the label (for example, age) for each post individually. This is not sufficient, because determining a user's age, for example, requires the information contained in all of their posts taken together.
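
A simple baseline for turning per-post predictions into one user-level label is to average each post's class probabilities over all of that user's posts and take the most likely class. The sketch below assumes a post-level classifier has already produced a probability vector per post; the user IDs, age brackets, and probabilities are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def user_level_labels(
    post_probs: Sequence[Tuple[str, Sequence[float]]],
    labels: Sequence[str],
) -> Dict[str, str]:
    """Average per-post class probabilities per user and pick the top label.

    post_probs: (user_id, probability vector) pairs, one per post, e.g. the
    softmax output of a post-level BERT or LSTM classifier (assumed here).
    """
    sums: Dict[str, List[float]] = defaultdict(lambda: [0.0] * len(labels))
    counts: Dict[str, int] = defaultdict(int)
    for user, probs in post_probs:
        acc = sums[user]
        for i, p in enumerate(probs):
            acc[i] += p
        counts[user] += 1
    result: Dict[str, str] = {}
    for user, acc in sums.items():
        mean = [s / counts[user] for s in acc]
        # Argmax over the averaged distribution.
        result[user] = labels[max(range(len(labels)), key=mean.__getitem__)]
    return result


# Hypothetical example: three age brackets, two users.
labels = ["18-24", "25-34", "35+"]
posts = [
    ("u1", [0.6, 0.3, 0.1]),
    ("u1", [0.2, 0.5, 0.3]),
    ("u1", [0.1, 0.6, 0.3]),
    ("u2", [0.1, 0.2, 0.7]),
]
print(user_level_labels(posts, labels))  # → {'u1': '25-34', 'u2': '35+'}
```

Here u1's first post alone points to "18-24", but the average over all three posts points to "25-34". Averaging treats each post as an independent, equally weighted vote, so it cannot capture evidence that only emerges from combining posts; it is best seen as a baseline.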

My first thought was to concatenate all of a user's posts, but since I am using BERT, which has a maximum sequence length of 512 tokens, that would not work. My second idea was to summarize the posts, combine the summaries into a single input, and hope it does not exceed the maximum length.
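
To see how quickly concatenation hits the limit, it helps to count tokens. BERT's 512 positions also have to hold the [CLS] and [SEP] special tokens, so a single-segment input leaves 510 positions for text. The sketch below greedily packs a user's posts, in order, into windows that fit that budget. It approximates tokens by whitespace splitting, which is an assumption (BERT's WordPiece tokenizer usually produces more tokens than words), and it truncates any single post that is longer than the budget on its own.

```python
from typing import List, Sequence

MAX_LEN = 512
SPECIAL_TOKENS = 2  # [CLS] and [SEP] for a single-segment BERT input
BUDGET = MAX_LEN - SPECIAL_TOKENS


def pack_posts(posts: Sequence[str], budget: int = BUDGET) -> List[List[str]]:
    """Greedily pack whole posts into token windows of at most `budget` tokens.

    Tokens are approximated by whitespace splitting (an assumption; a real
    WordPiece tokenizer would normally produce more tokens).
    """
    windows: List[List[str]] = []
    current: List[str] = []
    for post in posts:
        tokens = post.split()[:budget]  # truncate a single oversized post
        if current and len(current) + len(tokens) > budget:
            windows.append(current)  # this window is full; start a new one
            current = []
        current.extend(tokens)
    if current:
        windows.append(current)
    return windows


# Hypothetical user with 30 posts of 40 words each (1,200 words in total).
user_posts = [" ".join(["word"] * 40) for _ in range(30)]
windows = pack_posts(user_posts)
print(len(windows), [len(w) for w in windows])  # → 3 [480, 480, 240]
```

Even this modest user needs three BERT inputs. Each window could then be encoded separately and the window representations combined, for example by averaging; how to combine them is a design choice that the question leaves open.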

Do you have any suggestions for a possible solution? I assume this problem has been addressed in the scientific literature, and I would be grateful if anyone could point me in the right direction.

2 Answers

If I understood the problem statement correctly, you could vectorize the entire text of the posts using fastText or gensim. Here is a sample tutorial.
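
One way to read this suggestion: map all of a user's posts to a single fixed-size vector by averaging word vectors, similar in spirit to how fastText builds sentence vectors, so the amount of text never hits a length limit. The sketch uses a tiny hand-made embedding table as a hypothetical stand-in; in practice the vectors would come from a trained fastText or gensim model, whose APIs are not shown here.

```python
from typing import Dict, List, Sequence


def user_vector(
    posts: Sequence[str],
    embeddings: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Average the word vectors of every known token across all of a user's posts.

    Out-of-vocabulary words are skipped; a user with no known words gets a
    zero vector. The result has length `dim` no matter how much text there is.
    """
    total = [0.0] * dim
    n = 0
    for post in posts:
        for word in post.lower().split():
            vec = embeddings.get(word)
            if vec is None:
                continue  # skip out-of-vocabulary words
            for i in range(dim):
                total[i] += vec[i]
            n += 1
    return [x / n for x in total] if n else total


# Toy 2-dimensional embedding table (hypothetical values, not a trained model).
emb = {
    "exam": [1.0, 0.0],
    "homework": [0.8, 0.2],
    "mortgage": [0.0, 1.0],
}
toy_posts = ["Exam tomorrow", "so much homework", "unknown words only"]
print(user_vector(toy_posts, emb, dim=2))
```

The resulting vector can be fed to any ordinary classifier. The trade-off is that averaging discards word order and post boundaries, which BERT-style models would otherwise exploit.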

Answered by b.k on December 7, 2020

You might want to have a look at XLNet.

In XLNet, you can input a sentence, take the resulting state (its cached memory), and inject it into a new run of XLNet with a second input sentence, then inject that state into a third run with a third sentence, and so on.

This, in essence, allows you to process text of virtually unlimited length.
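
The mechanism described above is the segment-level recurrence XLNet inherits from Transformer-XL: hidden states from earlier segments are cached as a memory of bounded length and reused as extra context for the next segment. The loop below sketches only that control flow. `toy_encoder` is a hypothetical stand-in for a real XLNet forward pass (it just records how many cached states it was given), and the segment and memory lengths are illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

State = List[float]  # one hidden-state vector per token (toy representation)
Encoder = Callable[[Sequence[int], List[State]], List[State]]


def process_long_text(
    tokens: Sequence[int],
    encode_segment: Encoder,
    seg_len: int = 512,
    mem_len: int = 512,
) -> Tuple[List[State], List[State]]:
    """Encode an arbitrarily long token sequence segment by segment.

    Each segment is encoded together with a memory of hidden states from
    earlier segments. The memory is then updated to the most recent
    `mem_len` states (previous memory followed by the new states). In the
    real model, gradients are not propagated back through the memory.
    """
    memory: List[State] = []
    outputs: List[State] = []
    for start in range(0, len(tokens), seg_len):
        segment = tokens[start:start + seg_len]
        states = encode_segment(segment, memory)
        outputs.extend(states)
        memory = (memory + states)[-mem_len:] if mem_len > 0 else []
    return outputs, memory


def toy_encoder(segment: Sequence[int], memory: List[State]) -> List[State]:
    # Hypothetical stand-in for an XLNet forward pass: each "state" records
    # the token id and how many cached states were visible as context.
    return [[float(tok), float(len(memory))] for tok in segment]


outputs, memory = process_long_text(list(range(1200)), toy_encoder)
print(len(outputs), len(memory), outputs[0], outputs[600], outputs[-1])
# → 1200 512 [0.0, 0.0] [600.0, 512.0] [1199.0, 512.0]
```

The total input length is unbounded, but each segment directly sees only itself plus `mem_len` cached states; information from further back reaches it only indirectly, through cached states that were themselves computed with earlier memory.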

Please also see this.

Answered by user2182857 on December 7, 2020
