TransWikia.com

Is there a package for using SQL to manipulate Pandas dataframes in Python?

Data Science Asked by Sean McCarthy on January 30, 2021

Rather than learn a new package/language, I’d like to use my existing SQL skills to manipulate pandas dataframes in Python. Does anyone know of a way to do this, or perhaps a package that will allow me to do this?

4 Answers

In my experience, almost anything you can do with pandas can also be done in SQL. I have not looked at recent versions of pandas, but as I remember it, SQL has an advantage with large data: pandas has to hold the whole DataFrame in memory, so the process may crash when memory fills up, whereas a database engine does not need the full dataset in memory. You can save your pandas DataFrame to a CSV file, import that file into your database, and manipulate it with SQL. This link and also here may help you. As for importing the CSV file into your database, you have not said which SQL database you use, but this link may help you. Other SQL databases offer similar import features.
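The CSV route described above can be sketched with nothing but the Python standard library. This is a minimal illustration, not the answerer's exact setup: the CSV text, the `employees` table, and its columns are hypothetical, and the CSV is assumed to look like what `df.to_csv(index=False)` would write. SQLite stands in for whichever database you actually use.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content, standing in for the output of df.to_csv(index=False).
csv_text = "name,dept,salary\nAnn,eng,120\nBob,eng,100\nCara,ops,90\n"

# Parse the CSV and skip the header row.
rows = list(csv.reader(io.StringIO(csv_text)))
records = rows[1:]

# Load the rows into an in-memory SQLite table (pass a file path to persist it).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", records)

# From here on, plain SQL does the manipulation.
result = conn.execute(
    "SELECT dept, SUM(salary) AS total FROM employees GROUP BY dept ORDER BY dept"
).fetchall()
conn.close()

print(result)
```

Most database systems ship their own bulk CSV import command, which is usually a better choice than row-by-row inserts for large files.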

Answered by Media on January 30, 2021

I found a package called pandasql, which is based on sqldf for R. It seems quite a bit slower than doing the transformations with the pandas package, but it gets the job done. Just put the SQL query into a string like this:

query_string = """
    select * from df
"""

Then pass the string to the pandasql.sqldf function, as follows:

import pandasql
new_dataframe = pandasql.sqldf(query_string, globals())

Choose globals() or locals(), depending on the scope you want for your variables.

As I mentioned, it seems a bit slow, but I couldn't find anything else. I may use this from time to time until I become better at Pandas.
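For comparison, a similar effect is possible without an extra package by using pandas' own SQL helpers with an in-memory SQLite database, which, as far as I know, is also what pandasql uses behind the scenes by default. The sketch below is an assumption-laden illustration: the DataFrame contents and the `city`/`sales` columns are made up, and the table is named `df` only to mirror the query above.

```python
import sqlite3

import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"city": ["Oslo", "Rome", "Oslo"], "sales": [10, 20, 5]})

# Copy the DataFrame into an in-memory SQLite table named "df".
conn = sqlite3.connect(":memory:")
df.to_sql("df", conn, index=False)

# Run the SQL and get the result back as a new DataFrame.
query_string = """
    select city, sum(sales) as total
    from df
    group by city
    order by city
"""
new_dataframe = pd.read_sql_query(query_string, conn)
conn.close()

print(new_dataframe)
```

The copy into SQLite costs time and memory, so this is unlikely to be faster than native pandas operations; it mainly trades speed for familiar syntax.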

Sean

Answered by Sean McCarthy on January 30, 2021

There's actually a new package called dataframe_sql that does just what you're looking for. It differs from pandasql in that it translates SQL directly into pandas methods, which eliminates the slowdown caused by that package. If you want information about installation or how it works, you can check it out here.
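To make "translating SQL into pandas methods" concrete, here is a hand-written translation of one query. It does not use dataframe_sql and is not that package's generated output; the data and column names are hypothetical, and the method chain is just one reasonable pandas equivalent of the SQL in the comment.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame(
    {"dept": ["eng", "eng", "ops", "ops"], "salary": [120, 100, 90, 40]}
)

# SQL being translated by hand:
#   SELECT dept, AVG(salary) AS avg_salary
#   FROM df
#   WHERE salary > 50
#   GROUP BY dept
#   ORDER BY dept
result = (
    df[df["salary"] > 50]                 # WHERE salary > 50
    .groupby("dept", as_index=False)      # GROUP BY dept
    .agg(avg_salary=("salary", "mean"))   # AVG(salary) AS avg_salary
    .sort_values("dept")                  # ORDER BY dept
    .reset_index(drop=True)
)

print(result)
```

Because everything stays inside pandas, no data is copied into a separate database engine, which is where the speed advantage over a SQLite-backed approach would come from.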

Answered by Zach Brookler on January 30, 2021

If your data lives in Google BigQuery, you can run the SQL there and load the query result into a pandas DataFrame:

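import pandas as pd  # pandas must be installed for to_dataframe()
from google.cloud import bigquery

# mydataset.mytable is a placeholder; substitute your own dataset and table.
selectQuery = """SELECT * FROM mydataset.mytable"""
# The client picks up credentials and project from the environment.
bigqueryClient = bigquery.Client()
# Run the query in BigQuery and convert the result to a DataFrame.
df = bigqueryClient.query(selectQuery).to_dataframe()
print(df)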

Answered by Soumendra Mishra on January 30, 2021
