Extract document-topic vectors from an LDA model

Question

How can I extract the document-topic matrix from an LDA model and use it as input features for an SVM classifier? I am using gensim for the implementation.

Marc Kelechava · Answer

I've done this before in Gensim; hopefully this helps:

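train_vecs = []
for bow in train_corpus:
    # minimum_probability=0.0 asks for a (topic_id, probability) pair for every topic
    top_topics = lda_train.get_document_topics(bow, minimum_probability=0.0)
    # keep only the probabilities; assumes lda_train was trained with 20 topics
    topic_vec = [top_topics[k][1] for k in range(20)]
    train_vecs.append(topic_vec)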

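The above gives a vector of topic probabilities for every document, one entry per topic (20 here), ordered by topic id rather than by probability. 'train_corpus' is a bag-of-words corpus, built with something like the following once you have bigram-processed token lists from Gensim's 'Phrases' model class ('id2word' is a Gensim Dictionary):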

train_corpus = [id2word.doc2bow(text) for text in bigram]
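Indexing the result by position only works if every topic is present and the model has exactly 20 topics. Below is a minimal sketch of a more defensive conversion. It assumes only that each document comes back as a list of (topic_id, probability) pairs, which is the shape get_document_topics returns. The helper name to_dense and the sample values are hypothetical, not taken from any real model.

```python
def to_dense(doc_topics, num_topics):
    """Turn sparse (topic_id, probability) pairs into a fixed-length list."""
    vec = [0.0] * num_topics  # topics missing from the input stay at 0.0
    for topic_id, prob in doc_topics:
        vec[topic_id] = float(prob)  # place each probability at its topic id
    return vec


# Hypothetical sparse output for two documents from a 4-topic model.
sparse_docs = [
    [(0, 0.7), (2, 0.3)],
    [(1, 0.1), (2, 0.2), (3, 0.7)],
]

# One row per document, one column per topic: the document-topic matrix.
train_vecs = [to_dense(doc, num_topics=4) for doc in sparse_docs]
```

With a real model, doc_topics would come from lda_train.get_document_topics(bow) and num_topics from lda_train.num_topics. The resulting train_vecs, together with one label per document, can then be passed to a classifier such as scikit-learn's SVC through its fit(X, y) method.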
