How to give name to topics created using LDA?

Question

I have categorized 800,000 documents into 500 categories using Mahout's topic modelling.

Instead of representing each topic by its top 5 or 10 words, I want to infer a generic name for the group using an existing algorithm.
For the time being, I use the following procedure to arrive at a name for each topic:

For each topic:

Take all the documents belonging to the topic (using the document-topic distribution output).
Run Python NLTK to extract the noun phrases.
Create a term-frequency (TF) file from the output.
The name for the topic is the most frequent phrase (limited to a maximum of 5 words).
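
For reference, here is a rough sketch of the procedure above. It is only an illustration: candidate_phrases is a hypothetical, stdlib-only stand-in for the NLTK noun-phrase step (it simply splits text on stopwords and punctuation), and the stopword list and function names are my own assumptions, not part of the original pipeline.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stopword list; a real pipeline would use
# NLTK's chunker instead of this crude stand-in.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "to",
             "and", "is", "are", "was", "with", "by"}

def candidate_phrases(text, max_words=5):
    """Return runs of consecutive non-stopword words (1..max_words long)."""
    phrases, current = [], []
    # Tokens are either lowercase words or single non-letter characters.
    for token in re.findall(r"[a-z]+|[^a-z\s]", text.lower()):
        if token.isalpha() and token not in STOPWORDS:
            current.append(token)
            continue
        # A stopword or punctuation mark ends the current phrase.
        if 1 <= len(current) <= max_words:
            phrases.append(" ".join(current))
        current = []
    if 1 <= len(current) <= max_words:
        phrases.append(" ".join(current))
    return phrases

def name_topic(docs, max_words=5):
    """Pick the most frequent candidate phrase across a topic's documents."""
    counts = Counter()
    for doc in docs:
        counts.update(candidate_phrases(doc, max_words))
    return counts.most_common(1)[0][0] if counts else None

topic_docs = [
    "The neural network is trained on images.",
    "A neural network for speech.",
    "Training of the neural network.",
]
print(name_topic(topic_docs))
```

Because this ranks by raw frequency only, phrases that are common in every topic can still win, which is likely one reason the resulting names feel generic.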

Please suggest an approach to arrive at more relevant names for the topics.

chewpakabra · Answer

If you don't want to dig too deeply into NLP for this task, I suggest generating the most frequent n-grams (of lengths 2 to 5) from your documents and then finding the most distinctive n-grams for each category. Use the TF*IDF metric as the measure of how important a particular n-gram is (normalizing the measure by word count), and select the n-grams that are used in a particular category and are not used, or only rarely used, in the others.
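
A minimal sketch of this idea, under some assumptions of mine: each category is treated as one "document" when computing IDF, term frequency is normalized by the category's total word count, and the function names (ngrams, label_categories) and toy data are purely illustrative.

```python
import math
from collections import Counter

def ngrams(tokens, n_min=2, n_max=5):
    """Yield all word n-grams of length n_min..n_max as strings."""
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def label_categories(category_docs, top_k=1):
    """category_docs maps a category name to a list of token lists."""
    tf = {}
    for cat, docs in category_docs.items():
        counts = Counter()
        total_words = 0
        for tokens in docs:
            counts.update(ngrams(tokens))
            total_words += len(tokens)
        # Term frequency normalized by the category's word count.
        tf[cat] = {g: c / max(total_words, 1) for g, c in counts.items()}

    n_cats = len(category_docs)
    # Document frequency: in how many categories each n-gram occurs.
    df = Counter(g for scores in tf.values() for g in scores)

    labels = {}
    for cat, scores in tf.items():
        # Unsmoothed IDF: n-grams present in every category score zero.
        ranked = sorted(
            scores,
            key=lambda g: (-scores[g] * math.log(n_cats / df[g]), g),
        )
        labels[cat] = ranked[:top_k]
    return labels

toy = {
    "sports": [["the", "football", "match", "was", "great"],
               ["football", "match", "tickets"]],
    "cooking": [["the", "pasta", "recipe", "was", "great"],
                ["pasta", "recipe", "ideas"]],
}
print(label_categories(toy))
```

With 500 categories you would probably want a smoothed IDF and a minimum-count threshold, so that rare one-off n-grams do not dominate the ranking.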

Answered by chewpakabra on August 8, 2020

Emre · Answer

I can suggest several papers on this topic:

Automatic Labelling of Topic Models
Automatic Labeling Hierarchical Topics
Representing Topics Labels for Exploring Digital Libraries

You can find more by looking at their citations.

CpILL · Answer

You might try averaging the word vectors of the top N words in a topic and then using cosine similarity to find the closest word in the corpus.

Just a quick and dirty idea...
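
To make that concrete, here is a small sketch. The two-dimensional embeddings are made-up toy values; in practice you would load pretrained vectors (word2vec, GloVe or similar), and closest_word is a hypothetical helper name of mine.

```python
import math

def mean_vector(vectors):
    """Component-wise average of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def closest_word(top_words, embeddings, exclude_top_words=False):
    """Return the vocabulary word nearest to the centroid of top_words."""
    vectors = [embeddings[w] for w in top_words if w in embeddings]
    if not vectors:
        return None
    centroid = mean_vector(vectors)
    candidates = (
        w for w in embeddings
        if not (exclude_top_words and w in top_words)
    )
    return max(candidates,
               key=lambda w: cosine(centroid, embeddings[w]),
               default=None)

# Toy embeddings, invented for illustration only.
toy_embeddings = {
    "cat": [0.9, 0.1],
    "dog": [0.8, 0.2],
    "animal": [0.85, 0.15],
    "car": [0.1, 0.9],
    "truck": [0.2, 0.8],
}
print(closest_word(["cat", "dog"], toy_embeddings))
```

Setting exclude_top_words=True forces the label to be a word that is not already among the topic's top words, which may or may not be what you want.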

Learning stats by example · Answer

A few ideas you'll often see:

Generate a list from Wikipedia titles, extract keyphrases, predict the related Wikipedia pages, and use the keyphrases.
Generate a hand-labeled dataset.
Use a graph populated with topics and the relations between words and topics to predict the most likely topics.
Use abstractive summarization and keyphrase extraction.
