Does Python have R's tidytext equivalent?

Question

I can't seem to find a tidytext (R library) equivalent in Python. Text mining in Python seems quite weak compared to R.

Nicholas James Bailey · Answer

Scikit-learn has a great implementation of latent Dirichlet allocation (LDA), which I would argue is as straightforward to use as the one you would use with tidytext. There’s a tutorial here.

Also, Python has spaCy, which is slicker than anything R has so far in terms of tooling for NLP pipelines.

I do love R, and I feel it’s still a better language for tidying and processing data than Python. Tidytext is currently nicer than anything in Python in terms of getting data in and out of topic models. However, Python has much better resources than R for text mining overall.

Fnguyen · Answer

To add onto @Nicholas James Bailey's answer:
tidytext provides functionality for two different main operations: text mining and text modeling.
I think the text mining part of it, where we tokenize, tidy and prep text data, is a bit more unique. As pointed out, there are several model alternatives for text data, some of which are arguably better.
In terms of text mining in Python, here is my experience summed up. There are some helpful libraries like NLTK and others. Additionally, many text processing operations like tokenization are simply easier to implement with base functionality in Python than in R, eliminating the need for an external package.
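
To illustrate the point about base functionality, here is a rough sketch of a one-token-per-row table built with nothing but the standard library. The sample documents, the tiny stop-word set, and the `unnest_tokens` helper (named after the tidytext function it loosely imitates) are all hypothetical, and the regex is a deliberately simple tokenizer, not a substitute for NLTK or spaCy.

```python
import re
from collections import Counter

# Sample documents, made up for illustration.
docs = {
    "doc1": "The cat sat on the mat.",
    "doc2": "The dog chased the cat!",
}

# A tiny, hand-picked stop-word set (an assumption, not a standard list).
stop_words = {"the", "on"}


def unnest_tokens(documents):
    """Yield (doc_id, word) pairs, one lowercase word per row."""
    for doc_id, text in documents.items():
        # Keep runs of letters and apostrophes; everything else splits tokens.
        for word in re.findall(r"[a-z']+", text.lower()):
            yield doc_id, word


# One token per row, with stop words filtered out.
tidy_rows = [(d, w) for d, w in unnest_tokens(docs) if w not in stop_words]

# Word frequencies across the whole corpus.
word_counts = Counter(word for _, word in tidy_rows)

print(tidy_rows)
print(word_counts.most_common(3))
```

The list of tuples can be handed to `pandas.DataFrame` if you want something closer to a tibble.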
However, the biggest advantage of tidytext is its tidy approach, which is pretty unique to R and specifically the tidyverse environment.
My preferred solution
Due to this, I have actually stopped looking for a Python alternative to tidytext. Instead, I prep and tidy my data in R and then model in Python, integrating the two via reticulate in my R notebooks.
