Data Science Asked by Sean McCarthy on January 30, 2021
Rather than learn a new package/language, I’d like to use my existing SQL skills to manipulate pandas dataframes
in Python. Does anyone know of a way to do this, or perhaps a package that will allow me to do this?
In my experience, you can do almost everything in SQL that you can do with pandas. I have not looked at recent versions of pandas, but as I remember it, SQL can even be the better choice, because with pandas you are limited by the size of memory. If memory fills up, your program may crash, which does not happen with SQL commands. You can save your pandas dataframe to a CSV file and manipulate that CSV file using SQL. This link and also here may help you. As for importing the CSV file into your database, you have not specified which SQL system you use, but this link may help you. Other SQL systems provide this feature as well.
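As a rough illustration of the CSV route described above, the sketch below loads a CSV file into SQLite and queries it with plain SQL. It uses only Python's standard library. The file name people.csv, the table name people, and the sample rows are made up for the example; the CSV stands in for one written with something like df.to_csv("people.csv", index=False).

```python
import csv
import os
import sqlite3
import tempfile

# Hypothetical sample data; in practice this file would come from
# something like df.to_csv("people.csv", index=False).
csv_path = os.path.join(tempfile.mkdtemp(), "people.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "age"])
    writer.writerows([["Ann", 34], ["Bob", 27], ["Cid", 45]])

# Load the CSV into an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
with open(csv_path, newline="") as f:
    reader = csv.DictReader(f)
    # Each CSV row is a dict, so named placeholders can be used.
    conn.executemany(
        "INSERT INTO people (name, age) VALUES (:name, :age)",
        reader,
    )

# From here on, ordinary SQL works on the data.
rows = conn.execute(
    "SELECT name, age FROM people WHERE age > 30 ORDER BY age"
).fetchall()
print(rows)
conn.close()
```

Passing a file path to sqlite3.connect instead of ":memory:" keeps the table on disk rather than in RAM.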
Answered by Media on January 30, 2021
I found a package called pandasql, which is based on sqldf for R. It seems quite a bit slower than doing the transformations with the pandas package, but it gets the job done. Just put the SQL query into a string like this:
query_string = """
select * from df
"""
Then pass the string to the pandasql.sqldf function, as follows:
import pandasql
new_dataframe = pandasql.sqldf(query_string, globals())
Choose globals() or locals(), depending on which scope holds the dataframes that your query refers to.
As I mentioned, it seems a bit slow, but I couldn't find anything else. I may use this from time to time until I become better at Pandas.
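For comparison, here is a minimal sketch of the same kind of query without pandasql, using pandas' own to_sql and read_sql_query together with an in-memory SQLite database from the standard library. The dataframe contents, the column names, and the table name df are invented for the example, and this is not a claim about how pandasql works internally.

```python
import sqlite3

import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"city": ["Oslo", "Rome", "Oslo"], "sales": [10, 20, 5]})

conn = sqlite3.connect(":memory:")
# Copy the dataframe into a SQLite table named "df".
df.to_sql("df", conn, index=False)

query_string = """
select city, sum(sales) as total_sales
from df
group by city
order by city
"""
# Run the SQL and read the result back as a new dataframe.
new_dataframe = pd.read_sql_query(query_string, conn)
conn.close()
print(new_dataframe)
```

Whether this is faster or slower than pandasql for a given workload would need to be measured.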
Sean
Answered by Sean McCarthy on January 30, 2021
There's actually a new package called dataframe_sql that does just what you're looking for. It differs from pandasql in that it translates SQL directly into pandas methods, which eliminates the slowdown caused by that package. If you want information about installation or how it works, you can check it out here
Answered by Zach Brookler on January 30, 2021
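You can use the following option for Google BigQuery SQL:
import pandas as pd  # pandas must be installed for to_dataframe()
from google.cloud import bigquery
# Replace mydataset.mytable with your own dataset and table.
selectQuery = """SELECT * FROM mydataset.mytable"""
# The client picks up credentials from the environment.
bigqueryClient = bigquery.Client()
# Run the query and load the result into a pandas dataframe.
df = bigqueryClient.query(selectQuery).to_dataframe()
print(df)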
Answered by Soumendra Mishra on January 30, 2021