Data Science Asked by xiaodai on November 4, 2020
I can’t seem to find a tidytext (R library) equivalent in Python. Text mining in Python seems quite weak compared to R.
Scikit-learn has a great implementation of latent Dirichlet allocation, which I would argue is as straightforward to use as the implementation in tidytext. There’s a tutorial here.
Also, Python has spaCy, which is slicker than anything R has so far in terms of tooling for NLP pipelines.
I do love R, and I feel it’s still a better language for tidying and processing data than Python. Tidytext is currently nicer than anything in Python in terms of getting data in and out of topic models. However, Python has a lot better resources than R for text mining overall.
Answered by Nicholas James Bailey on November 4, 2020
To add onto @Nicholas James Bailey's answer:
tidytext provides functionality for two main operations: text mining and text modeling.
I think the text mining part, where we tokenize, tidy and prep text data, is the more unique of the two. As pointed out, there are several modeling alternatives for text data, some of which are arguably better.
In terms of text mining in Python, here is my experience summed up. There are some helpful libraries like NLTK and others. Additionally, many text processing operations like tokenization are simply easier to implement with base functionality in Python than in R, eliminating the need for an external package.
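To make the "base functionality" point concrete, here is a minimal sketch of tokenization using only the standard library. The `tokenize` helper and the tiny stop-word list are hypothetical, written for this example rather than taken from any package.

```python
import re

# Tiny stop-word list, invented for illustration; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "is", "to"}

def tokenize(text):
    """Lowercase the text and return its word tokens, minus stop words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

tokens = tokenize("The quick brown fox jumps over the lazy dog.")
print(tokens)
# ['quick', 'brown', 'fox', 'jumps', 'over', 'lazy', 'dog']
```

A regular expression this simple only handles ASCII letters and digits, so something like NLTK is probably still the better choice for anything beyond quick English-only prep.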
However, the biggest advantage of tidytext is its tidy approach, which is pretty unique to R and specifically the tidyverse environment.
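For readers who want to approximate the one-token-per-row layout in Python anyway, a rough sketch with pandas might look like the following. This is not a tidytext port: the column names and toy data are assumptions made for this example, and it only mimics the general idea of unnesting tokens and counting them.

```python
import pandas as pd

# Toy data, invented for illustration.
df = pd.DataFrame({
    "doc_id": [1, 2],
    "text": ["Text mining is fun", "Tidy text is tidy"],
})

# One token per row: lowercase, split on whitespace, then explode the lists.
tidy = (
    df.assign(word=df["text"].str.lower().str.split())
      .explode("word")
      .drop(columns="text")
      .reset_index(drop=True)
)

# Word counts per document, similar in spirit to a dplyr count().
counts = (
    tidy.groupby(["doc_id", "word"])
        .size()
        .reset_index(name="n")
        .sort_values(["doc_id", "n"], ascending=[True, False])
)
print(counts)
```

This gets the data into a long format that is easy to filter and join, though it lacks the surrounding tidyverse verbs that make the R workflow feel cohesive.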
My preferred solution
Due to this I have actually stopped looking for a Python alternative to tidytext. Instead, I prep and tidy my data in R and then model in Python, integrating the two via reticulate in my R notebooks.
Answered by Fnguyen on November 4, 2020