LSA, LDA or NMF in Topic Modeling?

Question

I'm trying to implement topic modeling in Python but can't figure out which algorithm I should use. I have studied Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF), but how do I decide which algorithm best fits a given task? And if I simply try all of them one after another, how do I measure the results?

Erwan · Answer

As far as I know, LDA is the state-of-the-art approach for topic modeling, but I'm not following the field very closely. So I would say that it's pretty safe to use LDA, and my guess is that the different approaches are likely to give similar results overall.
In case you want to try different methods, the question of evaluating topic models is quite complex. This article might help.
Side note: there is a non-parametric variant of LDA (no need to choose the number of topics in advance) called the Hierarchical Dirichlet Process (HDP).
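On the evaluation point: one automatic measure that is often used is topic coherence, which checks whether a topic's top words tend to occur in the same documents. Below is a minimal sketch of a simplified UMass-style coherence score in plain Python. The function name `umass_coherence`, the toy corpus and the two example topics are all hypothetical, and this is not necessarily the measure the linked article recommends.

```python
import math

def umass_coherence(topic_words, tokenized_docs):
    """Simplified UMass-style coherence for one topic.

    topic_words: the topic's top words, most important first.
    tokenized_docs: list of token lists, one per document.
    Scores closer to zero mean the words co-occur more often.
    """
    doc_sets = [set(doc) for doc in tokenized_docs]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(topic_words)):
        for l in range(m):
            d_l = doc_freq(topic_words[l])
            if d_l == 0:
                continue  # skip words that never occur in the corpus
            d_ml = doc_freq(topic_words[m], topic_words[l])
            score += math.log((d_ml + 1) / d_l)
    return score

# Toy corpus (hypothetical), already tokenized and stop-word filtered.
docs = [
    ["cat", "sat", "mat"],
    ["dog", "cat", "pets"],
    ["dog", "chased", "cat"],
    ["stock", "markets", "fell", "investors"],
    ["bank", "raised", "rates"],
    ["investors", "stock", "rates"],
]

coherent_topic = ["cat", "dog", "pets"]   # words from one theme
mixed_topic = ["cat", "stock", "rates"]   # words from two themes

coherent_score = umass_coherence(coherent_topic, docs)
mixed_score = umass_coherence(mixed_topic, docs)
print("coherent:", coherent_score)
print("mixed:   ", mixed_score)
```

A score like this only lets you rank topics or models against each other on the same corpus; it does not replace reading the topics yourself.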

Brian Spiering · Answer

Since all three algorithms have standard implementations in Python, you should try all three.
One of the best ways to evaluate topic modeling is to randomly sample the topics and see if they "make sense". Manually inspecting which documents end up in which cluster is a good way to see whether the topic model is doing what you intended it to do.
