TransWikia.com

How to give name to topics created using LDA?

Data Science Asked by adihere on August 8, 2020

I have categorized 800,000 documents into 500 categories using Mahout's topic modelling.

Instead of representing each topic by its top 5 or 10 words, I want to infer a generic name for the group using an existing algorithm.
For the time being, I use the following procedure to arrive at a name for each topic:

For each topic

  • Take all the documents belonging to the topic (using the document-topic distribution output).
  • Run Python NLTK to extract the noun phrases.
  • Build a term-frequency (TF) file from the output.
  • Name the topic after the highest-frequency phrase (limited to at most 5 words).
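The counting and naming steps above can be sketched roughly as follows. This is only a minimal illustration: it assumes the noun phrases have already been extracted per document (for example with NLTK), and the function name `name_topic` and the sample phrases are hypothetical.

```python
from collections import Counter


def name_topic(phrases_per_doc, max_words=5):
    """Return the most frequent phrase (at most max_words words) for one topic.

    phrases_per_doc: one list of noun phrases per document in the topic.
    Phrase extraction itself is assumed to have happened upstream.
    """
    counts = Counter()
    for phrases in phrases_per_doc:
        for phrase in phrases:
            # Lowercase and collapse whitespace so variants are counted together.
            normalized = " ".join(phrase.lower().split())
            if normalized and len(normalized.split()) <= max_words:
                counts[normalized] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical noun phrases for the documents assigned to one topic.
topic_docs = [
    ["neural network", "training data"],
    ["Neural Network", "gradient descent"],
    ["neural network", "training data",
     "a very long phrase with far too many words"],
]
print(name_topic(topic_docs))
```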

Please suggest an approach to arrive at more relevant names for the topics.

4 Answers

If you don't want to dig too deeply into NLP for this task, I suggest generating a set of the most frequent n-grams (of lengths 2-5) from your documents and finding the most distinctive n-grams for each category. Use TF*IDF as the measure of an n-gram's importance (normalizing by word count), and select the n-grams that are used in a particular category but not (or only rarely) in the others.
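A minimal sketch of this idea, using only the standard library, might look like the following. It makes several assumptions that are not part of the suggestion itself: each category is treated as one "document" for the IDF part, IDF is computed as log(N / df) so that n-grams present in every category score zero, tokenization is plain whitespace splitting, and the names `ngrams` and `label_categories` and the sample data are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=2, n_max=5):
    """Yield all word n-grams of length n_min..n_max as space-joined strings."""
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def label_categories(docs_by_category, n_min=2, n_max=5, top_k=1):
    """Return the top_k highest TF*IDF n-grams for each category."""
    # Term frequency per category, normalized by the category's word count.
    tf = {}
    for cat, docs in docs_by_category.items():
        counts = Counter()
        total_tokens = 0
        for doc in docs:
            tokens = doc.lower().split()
            total_tokens += len(tokens)
            counts.update(ngrams(tokens, n_min, n_max))
        tf[cat] = {g: c / max(total_tokens, 1) for g, c in counts.items()}

    # Document frequency: number of categories in which each n-gram occurs.
    df = Counter()
    for grams in tf.values():
        df.update(grams.keys())

    n_cats = len(docs_by_category)
    labels = {}
    for cat, grams in tf.items():
        scored = {g: f * math.log(n_cats / df[g]) for g, f in grams.items()}
        # Sort by score (descending), then alphabetically to break ties.
        ranked = sorted(scored, key=lambda g: (-scored[g], g))
        labels[cat] = ranked[:top_k]
    return labels


# Hypothetical toy corpus: two categories sharing one generic document.
docs_by_category = {
    "topic_0": ["stock market news stock market prices",
                "stock market crash",
                "breaking news today"],
    "topic_1": ["football match report football match result",
                "football match preview",
                "breaking news today"],
}
labels = label_categories(docs_by_category)
print(labels)
```

In this toy data, "breaking news today" occurs in both categories, so its n-grams get an IDF of zero and cannot become a label.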

Answered by chewpakabra on August 8, 2020

I can suggest several papers on this topic:

  • Automatic Labelling of Topic Models
  • Automatic Labeling Hierarchical Topics
  • Representing Topics Labels for Exploring Digital Libraries

You can find more by looking at their citations.

Answered by Emre on August 8, 2020

You might try averaging the word vectors of the top N words in a topic and then using cosine similarity to find the closest word in the corpus.

Just a quick and dirty idea...
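As a rough sketch of the averaging idea: the vectors below are made-up 3-dimensional toy values, whereas in practice they would come from a trained embedding model (word2vec, GloVe, or similar). The function names and the choice to exclude the topic's own top words from the candidates are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def closest_word(top_words, vectors, exclude_top_words=True):
    """Average the vectors of top_words and return the nearest vocabulary word."""
    known = [vectors[w] for w in top_words if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    centroid = [sum(v[i] for v in known) / len(known) for i in range(dim)]
    # Optionally skip the topic's own words so the label is a new word.
    candidates = (w for w in vectors
                  if not (exclude_top_words and w in top_words))
    return max(candidates,
               key=lambda w: cosine(vectors[w], centroid),
               default=None)


# Hypothetical toy embeddings; real ones would have hundreds of dimensions.
toy_vectors = {
    "goal": [0.9, 0.1, 0.0],
    "striker": [0.8, 0.2, 0.1],
    "penalty": [0.85, 0.05, 0.1],
    "football": [0.88, 0.12, 0.05],
    "election": [0.1, 0.9, 0.2],
    "recipe": [0.0, 0.2, 0.9],
}
print(closest_word(["goal", "striker", "penalty"], toy_vectors))
```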

Answered by CpILL on August 8, 2020

A few ideas you'll often see:

  • Generate a list from Wikipedia titles, extract keyphrases, predict the related Wikipedia pages, and use the keyphrases.
  • Generate a hand-labeled dataset.
  • Use a graph populated with topics and the relations between words and topics to predict the most likely topics.
  • Use abstractive summarization and keyphrase extraction.

Answered by Learning stats by example on August 8, 2020
