
Azure Cloud SQL - Querying large number of rows with Python

Data Science Asked on March 29, 2021

I have a Python Flask application that connects to an Azure Cloud SQL Database, and uses the Pandas read_sql method with SQLAlchemy to perform a select operation on a table and load it into a dataframe.

recordsdf = pd.read_sql(recordstable.select(), connection)

The recordstable has around 5000 records, and the function is taking around 10 seconds to execute (I have to pull all records every time). However, the exact same operation with the same data takes around 0.5 seconds when I’m selecting from a local SQL Server database.

What can I do to reduce the time it takes to load data from Azure to a dataframe? Would moving the entire Python application to Azure serverless help? Thanks
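A self-contained sketch of the pattern described in the question, using an in-memory SQLite database in place of the Azure SQL connection (against Azure, the URL would instead be something like `mssql+pyodbc://user:pass@server.database.windows.net/db?driver=ODBC+Driver+17+for+SQL+Server` — the driver is an assumption, since the question does not state one):

```python
import pandas as pd
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

# In-memory SQLite stands in for the Azure SQL connection string.
engine = create_engine("sqlite://")

# Hypothetical schema; the real table layout is not shown in the question.
metadata = MetaData()
recordstable = Table(
    "records", metadata,
    Column("id", Integer, primary_key=True),
    Column("value", String),
)
metadata.create_all(engine)

# Seed a couple of rows so the select has something to return.
with engine.begin() as connection:
    connection.execute(
        recordstable.insert(),
        [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}],
    )

# The operation from the question: load the whole table into a DataFrame.
with engine.connect() as connection:
    recordsdf = pd.read_sql(recordstable.select(), connection)
```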

Additional Information

  • Azure database is on Standard tier with 20 DTUs
  • Database region has been configured to be close to my location
  • Ideally looking for the operation to take under 2 seconds

One Answer

The data retrieval process has several phases: connection time, download time, and database processing time.
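To see which phase dominates, the phases can be timed separately — a rough sketch (the SQLite URL and the trivial query are placeholders for the real connection string and table select):

```python
import time

import pandas as pd
from sqlalchemy import create_engine, text

t0 = time.perf_counter()
engine = create_engine("sqlite://")   # placeholder for the Azure SQL URL
connection = engine.connect()
t1 = time.perf_counter()              # t1 - t0: connection time

# Placeholder query; in the real app this would be the table select.
df = pd.read_sql(text("SELECT 1 AS x"), connection)
t2 = time.perf_counter()              # t2 - t1: query + download time
connection.close()

print(f"connect: {t1 - t0:.3f}s, query: {t2 - t1:.3f}s")
```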

If you store your data as a CSV file in blob storage, the database processing time disappears (it is essentially zero). So every day you could export the data from the database to a CSV file and then read that file when you need it.
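The daily-export idea can be sketched as follows; a local file stands in for blob storage here (with `azure-storage-blob`, the file would be uploaded to a container, and the app could read it back via `pd.read_csv` on a SAS URL):

```python
import pandas as pd

# Stand-in for the query result; in the real app this DataFrame would come
# from the nightly pd.read_sql call against the database.
recordsdf = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})

# Nightly job: export the full table to CSV. With Azure Blob Storage, this
# file would then be uploaded to a container.
recordsdf.to_csv("records.csv", index=False)

# App side: load the cached file instead of querying the database.
cached = pd.read_csv("records.csv")
```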

Azure serverless will reduce the connection time and download time (if your internet connection is slow), but will not reduce the processing time of the database.

Answered by keiv.fly on March 29, 2021
