TransWikia.com

Python vs R (vs Stata): the old battle revisited

Economics Asked on May 3, 2021

I am an avid Python user. I know Stata but I’m not a pro. I don’t know R. I do econometrics (mostly time series, but also cross-section and panel), and statistics, and Python seems quite sufficient in meeting my needs.

I know that economists (at least the old-school ones) mainly use Stata, and statisticians mostly use R.

I read here and there that 1) R’s libraries are superior to Python’s, and 2) when it comes to visualization nothing can beat R’s ggplot.

My questions are:

a) What are examples of econometric/statistical analyses that can be easily done in R but not (as easily or at all) in Python?

b) Is visualization in R really superior to Python? In what sense: less coding, ease of coding, more intuitive, or the quality of the final output?

c) Where will you put Stata in this comparison?

3 Answers

I'd rate my experience: Stata 9/10, Python 7/10, R 3/10

on a) There are many econometric approaches specific to a certain field for which packages have been developed for R and Stata, but not (yet) for Python. One example is the Heckman selection approach in labor economics, which I had to implement myself in Python. There are numerous other examples.

on b) Possibly. ggplot is the best combination in terms of flexibility and intuition. Matplotlib in Python is pretty much a pain; I use Bokeh as often as possible.

on c) In my view, Stata beats both others in terms of ease of coding. Something like a bar chart with two nested categories can be done extremely quickly and intuitively. The Stata output, though, is so far mostly limited to publication graphs, so the quality of output is inferior. The data handling before you can put the data into a graph is also likely to be more painful in Stata (only one dataset) compared to the other two (multiple parallel dataframes).
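To give an idea of what "doing it myself" under a) can involve, here is a minimal sketch of Heckman's two-step estimator using only NumPy and the standard library. It is not the code from this answer: the function names, the simulated data, and the true parameter values are all made up for illustration, and it reports point estimates only (the second-step standard errors would need the usual correction, which is omitted here).

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # NumPy has no erf, so vectorize the stdlib one


def norm_pdf(t):
    return np.exp(-0.5 * t ** 2) / math.sqrt(2.0 * math.pi)


def norm_cdf(t):
    return 0.5 * (1.0 + _erf(t / math.sqrt(2.0)))


def fit_probit(Z, s, n_iter=50, tol=1e-10):
    """Maximum-likelihood probit via Newton-Raphson; returns the coefficients."""
    q = 2.0 * s - 1.0  # +1 if selected, -1 otherwise
    gamma = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        idx = Z @ gamma
        lam = q * norm_pdf(idx) / np.clip(norm_cdf(q * idx), 1e-12, None)
        grad = Z.T @ lam                                    # score vector
        neg_hess = (Z * (lam * (lam + idx))[:, None]).T @ Z  # minus the Hessian
        step = np.linalg.solve(neg_hess, grad)
        gamma = gamma + step
        if np.max(np.abs(step)) < tol:
            break
    return gamma


def heckman_two_step(y, X, Z, s):
    """Step 1: probit for selection. Step 2: OLS with the inverse Mills ratio."""
    gamma = fit_probit(Z, s)
    idx = Z @ gamma
    mills = norm_pdf(idx) / np.clip(norm_cdf(idx), 1e-12, None)
    sel = s == 1
    X_aug = np.column_stack([X[sel], mills[sel]])
    coef, *_ = np.linalg.lstsq(X_aug, y[sel], rcond=None)
    return coef, gamma  # last element of coef belongs to the Mills ratio


# Simulated data (hypothetical): w enters selection only (exclusion restriction).
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
w = rng.normal(size=n)
u = rng.normal(size=n)                                      # selection error
eps = 0.7 * u + math.sqrt(1 - 0.7 ** 2) * rng.normal(size=n)  # corr(u, eps) = 0.7
s = (0.5 + 0.5 * x + 1.0 * w + u > 0).astype(float)
y = 1.0 + 2.0 * x + eps                                     # used only where s == 1

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), x, w])
coef, gamma = heckman_two_step(y, X, Z, s)
print("outcome equation (const, x, mills):", coef)
print("selection equation (const, x, w):", gamma)
```

In R and Stata this is a single call to an existing package or command, which is the point being made under a).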

Answered by E. Sommer on May 3, 2021

I use all three programs.

Python can do everything that R can do and R can do everything that Python does, but I must say R is superior to Python when it comes to packages. For that reason, I usually default to R for most econometric analysis. I also find producing nice standard statistical graphics easier with R (but for maps I prefer Python).

However, Python is far superior for web scraping, numerical analysis and sentiment/text analysis (although R has some good packages for that as well). Also, I prefer to use Python when I need to set up my own program, as programming in Python is more natural (if that makes any sense) than in R, unless the program can be built easily from ready-made functions from various packages.

I always recommend that people around me learn both Python and R - the difference between them is not that big, and with R you don’t really need to invest heavily in programming skills; you can learn just the basics and then use packages.

Also, Jupyter Notebooks that can accommodate both R and Python make using both of them easier.

When it comes to Stata, I use it only for educational purposes (I teach econometrics tutorials at university). To be honest, I don’t like Stata for several reasons:

  1. Stata is not freeware, and I don’t think the price tag is justified given that it’s an inferior product compared to free programs like R and Python. If you can get it for free from your university then you probably don’t care about that, but it’s still something to keep in mind.

  2. Stata is a program, not a language, so if you want to create a new complex function you need to separately get and learn Mata (Stata’s programming language).

  3. Stata has some serious limitations on matrix sizes. Even in the most expensive edition the maximum matrix size is 11000, which is a serious limit when you work with panel data and have to run some iterative model with a large number of variables. You will routinely be forced to run, for example, a panel LR heteroskedasticity test on random subsamples even with the most expensive edition.

  4. Creating beautiful graphics in Stata is nearly impossible. Don’t get me wrong, you can make some decent graphics in Stata, but they pale in comparison to what you can do with Python or R.

  5. Stata is clunky with time series analysis. If you are looking for an easy-to-use program for time series analysis, it would be EViews. For example, Stata can’t handle quarters that are expressed in date format: it will think you have gaps in your time series and won’t let you run basic time series commands until you create a new time series variable. Also, the selection of time series models is quite small and programming your own is painful.

  6. While technically possible, web scraping or numerical analysis in Stata is hell - if you need that for your work, don’t use it.
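For comparison with point 5, here is a rough sketch of how quarterly observations stored as calendar dates might be handled in Python. It assumes pandas is installed; the column names and GDP figures are made up for illustration and are not from this answer.

```python
import pandas as pd

# Hypothetical quarterly data where each quarter is stored as a calendar date.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2020-01-01", "2020-04-01", "2020-07-01", "2020-10-01", "2021-01-01"]
    ),
    "gdp": [100.0, 91.0, 98.5, 99.6, 101.1],  # made-up numbers
})

# Convert the dates to quarterly periods and use them as the index.
series = df.set_index(df["date"].dt.to_period("Q"))["gdp"]

# Check for gaps: every quarter between the first and last should be present.
expected = pd.period_range(series.index.min(), series.index.max(), freq="Q")
missing = expected.difference(series.index)

# Quarter-on-quarter growth, computed on the gap-free quarterly index.
growth = series.pct_change()

print(series)
print("missing quarters:", list(missing))
```

Once the index is expressed in quarters rather than days, consecutive observations are adjacent periods, so lag-based operations do not see spurious gaps.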

However, Stata also has a few advantages:

  1. It’s more user-friendly than R or Python, and can even be used without coding through the interface (that’s why we use it in tutorials as the first program students see, so they don’t get overwhelmed).

  2. Making adjustments to datasets (creating dummy variables, etc.) is easier compared to Python or R, but that’s mostly because in both Python and R you can have various types of data: lists, data frames, etc.

  3. With a purchase of Stata you get access to the Stata forums - it’s something like Stack Exchange, but they actually pay professionals to give you answers there. Besides support, you can often get very good advice there even on econometrics, and you will usually get it really quickly.

I personally would not use Stata for my own scientific work. Python and R are superior to Stata in every aspect - but it’s a very good starting program for students.

Answered by 1muflon1 on May 3, 2021

In my experience, I think Python is better for econometrics than R and Stata for the following reasons:

a) In real applications, getting and transforming data is 60% of the work. For these tasks Python is better.

b) To select the best model and features it’s necessary to use loops. Loops are difficult in R but easier to use in Python.

c) Object-oriented programming is easier in Python. This means that we can develop our own objects and libraries more easily than in R.

d) Python is a Swiss Army knife. It can be used for econometrics, web scraping, machine learning, ETL and quantitative finance, among other applications.
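As an illustration of point b), a loop-based search over candidate feature sets might look like the sketch below. This is only one possible approach: the choice of BIC as the criterion, the helper name `ols_bic`, and the simulated data are assumptions made for the example, not something taken from this answer.

```python
import itertools
import math
import numpy as np


def ols_bic(y, X):
    """Fit OLS by least squares and return the BIC (up to an additive constant)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * math.log(rss / n) + k * math.log(n)


# Simulated data (hypothetical): only x0 and x2 actually drive y.
rng = np.random.default_rng(42)
n = 500
features = {f"x{i}": rng.normal(size=n) for i in range(4)}
y = 1.0 + 2.0 * features["x0"] - 1.5 * features["x2"] + rng.normal(size=n)

# Loop over every non-empty subset of candidate regressors.
names = list(features)
results = []
for r in range(1, len(names) + 1):
    for subset in itertools.combinations(names, r):
        X = np.column_stack([np.ones(n)] + [features[c] for c in subset])
        results.append((ols_bic(y, X), subset))

best_bic, best_subset = min(results)  # lowest BIC wins
print("best subset:", best_subset, "BIC:", round(best_bic, 2))
```

An exhaustive search like this grows as 2^p in the number of candidates, so it is only practical for a handful of regressors.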

If you want examples of Python applied to econometrics, you can check this book, https://www.amazon.com/dp/B08KJ1322G, which has several of them.

Answered by dany on May 3, 2021
