TransWikia.com

Issue using pyModis.createMosaicGDAL() on Debian

Geographic Information Systems Asked by Nart Barileva on February 17, 2021

I’m currently trying to automate some data processing with MODIS data using pyModis but am running into an odd issue.

I’ve developed a script that I’ve run locally just fine. It downloads all the tiles for a date using downModis() and creates the mosaic using createMosaicGDAL() and then does some additional processing. Locally, the downloading takes around 30 minutes IIRC and the mosaicking takes around 45 minutes.

I’ve set up a Google Cloud VM to automate the processing. On this machine, the downloading is much faster (10 minutes), but the mosaicking either takes incredibly long or gets stuck.

My local machine is a MacBook Pro with 4 CPU cores and 16GB RAM and a 256GB SSD.

The VM is running Debian with 2 cores and 16GB RAM and a 100GB hard drive.

The image below shows the disk I/O while processing. You can see a spike in red, which is the reads, and then it drops down. The writes then spike in blue but plateau at 12 MiB/s for over two hours. The final mosaic is only about 10GB, so this can’t just be the mosaic being written once (12 MiB/s * 2 hours = 86400 MiB, roughly 84 GiB).
[Image: disk I/O graph for the VM during processing, reads in red and writes in blue]
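To make the arithmetic above explicit, here is a small sketch that works out how much data a sustained 12 MiB/s write rate adds up to over two hours and compares it with the mosaic size. The numbers come from the question; the variable names are only illustrative, and the 10GB figure is treated as decimal gigabytes, which is an assumption.

```python
# Figures taken from the question: a 12 MiB/s write plateau lasting
# about two hours, and a final mosaic of roughly 10 GB.
write_rate_mib_s = 12
duration_s = 2 * 60 * 60
mosaic_size_gb = 10  # assumed to be decimal gigabytes

# Total volume written during the plateau.
total_written_mib = write_rate_mib_s * duration_s
total_written_gib = total_written_mib / 1024

# Convert the mosaic size to GiB so both values use the same unit.
mosaic_size_gib = mosaic_size_gb * 1e9 / 2**30

print(f"written during plateau: {total_written_mib} MiB (~{total_written_gib:.1f} GiB)")
print(f"ratio to mosaic size:   {total_written_gib / mosaic_size_gib:.1f}x")
```

The plateau accounts for roughly nine times the size of the finished mosaic, which is why it does not look like a single pass writing the output file.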

I’m wondering if there’s any chance the code could be getting stuck in some sort of loop. I don’t see why else it would keep writing data. Also, the disk usage stays constant at around 10GB from the start, so it’s not just writing additional files or anything like that.

This might totally be an issue with GCP but I figured I’d ask here to see if I could get any help.

One Answer

I managed to sort things out. As I suspected, the issue had to do with GCP.

If it helps anyone else out there, the slowdown was due to slow I/O on the GCP disks. Throughput depends on the type of disk as well as its size, with larger disks being faster. I wasn't aware of this, but I ran a test that took 17 minutes on my laptop and 204 minutes on an n2-highmem-2 machine with a 100GB standard persistent disk. It took 53 minutes on the same machine with a 100GB SSD. This is to be expected, as these SSDs have 4x the disk throughput (204 / 53 is roughly 3.8x).
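For anyone wanting to check their own disk before starting a long mosaicking run, a rough sequential write test along these lines may help. This is only a minimal sketch using the Python standard library: `measure_write_throughput` is a hypothetical helper, not part of pyModis or any GCP tooling, and a dedicated benchmarking tool such as fio will give more reliable numbers.

```python
import os
import tempfile
import time


def measure_write_throughput(directory, total_mib=64, chunk_mib=4):
    """Write about total_mib of data into directory and return MiB/s.

    Hypothetical helper for a quick sanity check, not a rigorous benchmark.
    """
    chunk = os.urandom(chunk_mib * 1024 * 1024)
    n_chunks = total_mib // chunk_mib
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_chunks):
                f.write(chunk)
            f.flush()
            # Force the data to disk so the page cache does not hide the cost.
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        # Always clean up the temporary file, even if the write failed.
        os.remove(path)
    return (n_chunks * chunk_mib) / elapsed


# Example: test the directory where the mosaic would be written.
# The system temp directory is used here only as a stand-in.
rate = measure_write_throughput(tempfile.gettempdir(), total_mib=64)
print(f"sequential write: {rate:.1f} MiB/s")
```

A small test file like this can be dominated by caching and burst behaviour, so a larger `total_mib` is likely needed to see a sustained rate comparable to the plateau in the graph.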

I'll be playing around with what makes sense to do, but I at least know the issue.

Answered by Nart Barileva on February 17, 2021
