How to map topic to a document after topic modeling is done with LDA

Question

Is there a way to map the topics generated by LDA back to the list of documents and identify which topic each document belongs to?
I am interested in clustering documents with unsupervised learning and separating them into the appropriate clusters.
Any links, code examples, or papers would be greatly appreciated.

Sid · Answer

After training your LDA topic model, you can feed documents into the model and it will assign them to the predefined number of topics. In gensim (Python), this would look something like this:

ques_vec = dictionary.doc2bow(tokenized_document)
topic_vec = ldamodel[ques_vec]

The dictionary is the one you should have created for training.
ldamodel is the model that you trained.
topic_vec will contain a list of (topic number, probability) pairs: each topic (class) the model assigns to the document, together with the probability that the document belongs to that topic.
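To turn these per-document vectors into clusters, one option is to assign each document to its most probable topic. The following is a minimal sketch, assuming each topic_vec is a non-empty list of (topic number, probability) pairs as described above. The helper names dominant_topic and cluster_by_topic are hypothetical, and the example vectors are made up for illustration rather than produced by a real model.

```python
from collections import defaultdict


def dominant_topic(topic_vec):
    """Return the (topic_id, probability) pair with the highest probability.

    Assumes topic_vec is a non-empty list of (topic_id, probability) pairs.
    """
    return max(topic_vec, key=lambda pair: pair[1])


def cluster_by_topic(doc_topic_vecs):
    """Group document indices by their most probable topic."""
    clusters = defaultdict(list)
    for doc_index, topic_vec in enumerate(doc_topic_vecs):
        topic_id, _ = dominant_topic(topic_vec)
        clusters[topic_id].append(doc_index)
    return dict(clusters)


# Made-up vectors standing in for [ldamodel[bow] for bow in corpus].
example_vecs = [
    [(0, 0.85), (1, 0.15)],
    [(0, 0.10), (2, 0.90)],
    [(0, 0.60), (1, 0.40)],
]

clusters = cluster_by_topic(example_vecs)
print(clusters)  # {0: [0, 2], 2: [1]}
```

Taking only the top topic discards the rest of the distribution, so documents that are split almost evenly between two topics may deserve a probability threshold or a second look.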

At this point, you will not know what each topic (class) means, because it is the result of unsupervised learning. To find out what each topic that your LDA model clusters your documents into means, look at the trained parameters like this:

words = ldamodel.show_topic(topic_number, topn = 200)

If you print that, you'll see the top 200 words that make up that topic, each with its weight. Based on the meaning of those words, you can give the topic an appropriate name.
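Two hundred words is a lot to read per topic, so it can help to condense the list before naming. The following is a rough sketch, assuming the input is a list of (word, weight) pairs, which is the shape recent gensim versions return from show_topic. The helper summarize_topic is hypothetical and the example pairs are made up.

```python
def summarize_topic(top_words, n=10):
    """Join the n highest-weighted words into one line for manual inspection."""
    ranked = sorted(top_words, key=lambda pair: pair[1], reverse=True)
    return ", ".join(word for word, _ in ranked[:n])


# Made-up (word, weight) pairs standing in for ldamodel.show_topic(topic_number).
example_topic = [("goal", 0.04), ("team", 0.06), ("match", 0.05), ("season", 0.02)]

summary = summarize_topic(example_topic, n=3)
print(summary)  # team, match, goal
```

The name itself is still a human judgment; the helper only makes the strongest words easier to scan.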
