TransWikia.com

Does Python have R's tidytext equivalent?

Data Science Asked by xiaodai on November 4, 2020

I can’t seem to find a tidytext (R library) equivalent in Python. Text mining in Python seems quite weak compared to R.

2 Answers

Scikit-learn has a great implementation of latent Dirichlet allocation (LDA), which I would argue is as straightforward to use as the implementation in tidytext. There’s a tutorial here.

Python also has spaCy, which offers slicker tooling for NLP pipelines than anything R has so far.
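As a rough idea of what a spaCy pipeline looks like, here is a sketch that streams texts through `nlp.pipe` and filters tokens. It assumes `spacy.blank("en")`, a tokenizer-only English pipeline, so that no trained model has to be downloaded; a full pipeline (tagger, parser, NER) would come from a model such as one loaded with `spacy.load`.

```python
# Sketch of a spaCy pipeline; spacy.blank("en") is an assumption that avoids
# downloading a trained model (it provides tokenization and lexical flags only).
import spacy

nlp = spacy.blank("en")

texts = ["Text mining in Python is not weak.", "spaCy streams documents efficiently!"]

# nlp.pipe processes texts as a stream and yields Doc objects.
# Keep lower-cased tokens that are neither punctuation nor stop words.
cleaned = [
    [tok.lower_ for tok in doc if not tok.is_punct and not tok.is_stop]
    for doc in nlp.pipe(texts)
]
print(cleaned)
```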

I do love R, and I feel it’s still a better language than Python for tidying and processing data. Tidytext is currently nicer than anything in Python in terms of getting data in and out of topic models. However, Python has far better resources than R for text mining overall.

Answered by Nicholas James Bailey on November 4, 2020

To add onto @Nicholas James Bailey's answer:

tidytext provides functionality for two main operations: text mining and text modeling.

I think the text-mining part, where we tokenize, tidy, and prep text data, is the more distinctive of the two. As pointed out above, there are several alternatives for modeling text data, some of which are arguably better.

As for text mining in Python, here is my experience in summary. There are helpful libraries such as NLTK, among others. Additionally, many text-processing operations such as tokenization are simply easier to implement with base Python than with base R, which removes the need for an external package.
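To illustrate the point about base functionality, a sketch of tokenizing and counting words with only the standard library (`re` and `collections.Counter`). The regular expression (runs of lowercase letters, digits and apostrophes) is an assumption for illustration, not a complete tokenizer: hyphenated words, URLs and non-ASCII text are not handled specially.

```python
# Tokenize and count words using only the standard library.
# The token pattern is a simplifying assumption, not a full tokenizer.
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9']+")

def tokenize(text: str) -> list[str]:
    """Lower-case the text and return its word tokens."""
    return TOKEN_RE.findall(text.lower())

text = "Tokenization is easy in Python; tokenization needs no extra package."
tokens = tokenize(text)
print(tokens)
print(Counter(tokens).most_common(2))
```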

However, the biggest advantage of tidytext is its tidy approach, which is fairly unique to R and specifically to the tidyverse ecosystem.
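For comparison, the closest thing I can sketch in pandas is a one-token-per-row table built with `str.findall` and `DataFrame.explode`, followed by a per-document word count. This only mimics the shape of tidytext's `unnest_tokens()` output; the regex tokenizer and the column names `doc_id`, `word` and `n` are assumptions for illustration, and it lacks the pipe-friendly verbs that make the tidyverse approach pleasant.

```python
# Rough pandas analogue of a tidytext one-token-per-row table (a sketch, not a port).
import pandas as pd

docs = pd.DataFrame({
    "doc_id": [1, 2],
    "text": ["The tidy approach is unique.", "Tidy data: one token per row, one row per token."],
})

# Split each text into lower-case word tokens, then give every token its own row.
tidy = (
    docs.assign(word=docs["text"].str.lower().str.findall(r"[a-z']+"))
        .explode("word")
        .drop(columns="text")
        .reset_index(drop=True)
)

# Count words per document, similar in spirit to dplyr::count(doc_id, word).
counts = tidy.groupby(["doc_id", "word"]).size().reset_index(name="n")
print(counts.sort_values("n", ascending=False).head())
```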

My preferred solution

Because of this, I have stopped looking for a Python alternative to tidytext. Instead, I prep and tidy my data in R and then model in Python, integrating the two via reticulate in my R notebooks.
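A sketch of what the Python side of such a workflow might look like. It assumes the R notebook produces a tidy table with columns `doc_id`, `word` and `n`, and loads this file with `reticulate::source_python()`, which converts R data frames to pandas DataFrames. The function name `fit_topics` is hypothetical; it casts the tidy counts into a document-term matrix (roughly what tidytext's `cast_dtm()` does on the R side) and fits scikit-learn's LDA.

```python
# Python side of a hypothetical R -> Python topic-modelling hand-off.
# Assumes a tidy (doc_id, word, n) table arriving from R via reticulate.
import pandas as pd
from sklearn.decomposition import LatentDirichletAllocation

def fit_topics(tidy_counts: pd.DataFrame, n_topics: int = 2) -> pd.DataFrame:
    """Cast tidy counts to a document-term matrix and return per-document topic proportions."""
    n_topics = int(n_topics)  # R numerics arrive as Python floats, e.g. 2.0
    dtm = tidy_counts.pivot_table(index="doc_id", columns="word", values="n",
                                  aggfunc="sum", fill_value=0)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    doc_topics = lda.fit_transform(dtm.to_numpy())
    return pd.DataFrame(doc_topics, index=dtm.index,
                        columns=[f"topic_{k}" for k in range(n_topics)])

# Local usage example with a tiny, made-up tidy table.
example = pd.DataFrame({
    "doc_id": [1, 1, 2, 2],
    "word": ["tidy", "text", "topic", "model"],
    "n": [3, 1, 2, 2],
})
result = fit_topics(example, n_topics=2.0)
print(result)
```

From R, the same function would then be called after `reticulate::source_python("topics.py")` (file name assumed) on a tidied data frame.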

Answered by Fnguyen on November 4, 2020
