Data Science Asked on February 20, 2021
I'm trying to implement topic modeling with Python and NLP, but I can't figure out which algorithm I should use. I have studied Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF), but how do I decide which algorithm fits a given task best? And if I simply try all of them in turn, how do I measure the results?
As far as I know, LDA is the state-of-the-art approach for topic modeling, but I'm not following the field very closely. So I would say that it's pretty safe to use LDA, and my guess is that the different approaches are likely to give similar results overall.
In case you want to try different methods, the question of evaluating topic models is quite complex. This article might help.
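One common automatic measure is topic coherence, which checks whether the top words of a topic tend to appear in the same documents. The following is a minimal sketch of a UMass-style coherence score in plain Python. The function name `umass_coherence`, the smoothing constant `eps`, and the toy corpus are assumptions made for illustration, and the score is averaged over word pairs rather than summed, so it is not the exact formula of any particular library.

```python
import math


def umass_coherence(top_words, tokenized_docs, eps=1.0):
    """Average UMass-style coherence of one topic (higher means more coherent).

    top_words: the topic's top words, ordered from most to least probable.
    tokenized_docs: the corpus as a list of token lists.
    """
    doc_sets = [set(doc) for doc in tokenized_docs]

    def doc_freq(*words):
        # Number of documents that contain every one of the given words.
        return sum(all(w in d for w in words) for d in doc_sets)

    scores = []
    for i in range(1, len(top_words)):
        for j in range(i):
            w_i, w_j = top_words[i], top_words[j]
            d_j = doc_freq(w_j)
            if d_j == 0:
                continue  # word never occurs in the corpus, skip the pair
            scores.append(math.log((doc_freq(w_i, w_j) + eps) / d_j))
    return sum(scores) / len(scores) if scores else float("nan")


# Toy corpus (made up for this example).
toy_docs = [
    ["cat", "dog", "pet"],
    ["dog", "pet", "vet"],
    ["stock", "market", "trade"],
    ["market", "trade", "price"],
]
coherent_score = umass_coherence(["dog", "pet"], toy_docs)
mixed_score = umass_coherence(["dog", "market"], toy_docs)
print(coherent_score, mixed_score)
```

Words that co-occur often give a score closer to zero or above it, while words that never share a document push the score down. Such scores only approximate human judgment, so they are best used to compare models trained on the same corpus.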
Side note: there is a non-parametric variant (no need to choose the number of topics) of LDA called Hierarchical Dirichlet Processes.
Answered by Erwan on February 20, 2021
Since all three algorithms have standard implementations in Python, you should try all three.
One of the best ways to evaluate topic modeling is to randomly sample the topics and see if they "make sense". Manually inspecting which documents are in which cluster is a good way to see if the topic modeling is doing what you intended it to do.
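The manual check can be sketched as follows: assign each document to its highest-weighted topic, then draw a few random documents per topic to read. The function name `sample_docs_per_topic`, the `doc_topic` weights, and the toy documents are hypothetical; in practice the weights would come from the fitted model (for example, the output of `transform` in scikit-learn).

```python
import random
from collections import defaultdict


def sample_docs_per_topic(docs, doc_topic, k=3, seed=0):
    """Group documents by dominant topic and sample up to k from each group."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_topic = defaultdict(list)
    for doc, weights in zip(docs, doc_topic):
        # The dominant topic is the index of the largest weight.
        dominant = max(range(len(weights)), key=lambda t: weights[t])
        by_topic[dominant].append(doc)
    return {
        topic: rng.sample(members, min(k, len(members)))
        for topic, members in sorted(by_topic.items())
    }


# Hypothetical documents and document-topic weights, for illustration only.
example_docs = ["cats and dogs", "dog at the vet", "stock market", "share prices"]
example_weights = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]

samples = sample_docs_per_topic(example_docs, example_weights, k=1)
for topic, members in samples.items():
    print(f"Topic {topic}:", members)
```

Reading a handful of sampled documents per topic is usually enough to notice topics that mix unrelated themes or that duplicate each other.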
Answered by Brian Spiering on February 20, 2021