Database Administrators Asked by Alex Lo on October 28, 2021
I have a MySQL RDS instance (“production”) from which I need to copy incremental data to another MySQL RDS instance (“reporting”). “Incremental data” being the new rows in certain tables. Currently I accomplish this via a PHP script which queries from “production” based on the last update time and writes on CSV files to disk and then I use mysqlimport.exe to import the files into “reporting”.
My current method takes ~80 seconds. Is this the most robust and efficient way to do this? I’ve found other tools for “one off” transfers, but nothing in terms of an “enterprise incremental database replication” tool.
Our current process is not a “continuous” replication because of batch operations on the reporting side. This may be negotiable though. Would easing that constraint open up tools that would make this easy?
Relevant related articles/questions:
Amazon’s article on importing data seems to mainly deal with the batch (initial load) case.
http://aws.amazon.com/articles/2933
This can be done with a stored proc when moving from one DB to another on the same instance (the problem with my case is they are different instances). See: Keeping MySQL staging database on same server updated from production
I can’t put both DBs on the same instance because our instances are under quite a bit of load.
Write your own bash script and schedule Incremental Load Job to sync Prod and reporting server/DWH DB,
Step 1: Does a mysqldump on Prod (without lock) to fetch records from tables (Where updated_at between Previous_Day_Start and Previous_Day_End)
Step2: Restore the backed up SQL file into reporting/dwh instance, (Delta DB - create your staging DB here)
Step3: Does an SP call to sync records in reporting/dwh delta DB to actual reporting/dwh DB (both DBs are in the same instance)
SP Logic: First do the update by matching primary key, then do an insert of new records.
Script Commands: #(1)Do the mysql database backup (dump) mysqldump -h$vHost1 -u$vUser -p$vPass1 --skip-comments --quick --extended-insert --set-gtid-purged=OFF --skip-triggers --skip-add-locks $vDb1 $vTable --where="updated_at between '$vStartDate' and '$vEndDate'" > $vFile
#(2) DO mysql restore mysql -h$vHost2 -u$vUser -p$vPass2 --comments --database=$vDb2 < $vFile
#(3) SP call mysql -h$vHost2 -u$vUser -p$vPass2 --skip-column-names --database=$vDb2 --execute="CALL your_sp();" >> "$vDailyIncrementalLoad/$vFileName"
Warning: mysqldump is costly operation, so run the job in non-business hours on a nightly basis, Thanks
Answered by Sathyanath Ravichandran on October 28, 2021
MySQL RDS allows you to do two options
You set up a Read Slave. Then, run the reports from it. There is a tool in the RDS CLI called rds-create-db-instance-read-replica
to create a Read Slave from a Running RDS instance. The following are done under the hood:
Take a live snapshot of Production. Then, open a new instance using that snapshot.
The tools for this are
rds-create-db-snapshot
rds-restore-db-instance-from-db-snapshot
The only difference between these options
In both cases, once you done, simply drop the new Instance (saves money) at your discretion.
If you combine the two ideas, you could load your PHP script's output into the point-time instance. Any needed data to retrieve from production could be read from the Read Slave to prevent further load issues with production.
Bringing up instances can be time-consuming. So, plan to have the instance brought up early enough for report and import time.
This is more ranting and brainstorming than a concrete answer.
Answered by RolandoMySQLDBA on October 28, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP