
Spatially merging NetCDF climate grids?

Geographic Information Systems Asked by Trevor J. Smith on May 3, 2021

I’m working with a large number of CF-compliant NetCDF climate grids with global coverage, stored as tiles grouped by “variable” and “year” (e.g. precipitation-year1950-001.nc, precipitation-year1961-125.nc, tmax-year2008-650.nc, etc.). They sit in a sensible directory structure (:project/variable/year/files.nc).

For each variable*year, I would like to spatially merge the tiles (similar in call to gdal_merge.py -o outputfile.tif /path/to/data/*.tif) by walking the directory tree, and I’m wondering what would be the most efficient and reliable way of doing so.

From my research, two (labour-intensive) methods come to mind:

  1. Performing a walking loop in Bash that calls the Climate Data Operators (CDO) operator mergegrid. Since mergegrid accepts only two input files per output, even an efficient merge order would need more than 400 calls for each variable*year.

  2. Performing a walking loop in Python with the netCDF4 and NumPy libraries that creates an empty array the size of the Earth, reads each NetCDF tile into an np.array and writes it out to the new NetCDF file, appends the metadata, and iterates through the variables*years.
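As a rough illustration of the second approach, the sketch below pastes tiles into a global array using NumPy alone. The function name paste_tile, the 1-degree resolution, and the assumption that every tile has ascending coordinates falling exactly on the global grid are all hypothetical. Reading the tiles with netCDF4 and writing the merged result and its metadata back out are left out.

```python
import numpy as np

# Hypothetical global grid at 1-degree resolution (cell centres).
RES = 1.0
GLOBAL_LAT = np.arange(-89.5, 90.0, RES)    # 180 rows, south to north
GLOBAL_LON = np.arange(-179.5, 180.0, RES)  # 360 columns, west to east


def paste_tile(mosaic, tile, tile_lat, tile_lon):
    """Copy one tile into the global mosaic in place.

    Assumes the tile's coordinates are ascending and fall exactly on the
    global grid, so the first coordinate of each axis fixes the offset.
    """
    row = int(round((tile_lat[0] - GLOBAL_LAT[0]) / RES))
    col = int(round((tile_lon[0] - GLOBAL_LON[0]) / RES))
    mosaic[row:row + tile.shape[0], col:col + tile.shape[1]] = tile
    return mosaic


# Start from an all-missing grid and fill it tile by tile.
mosaic = np.full((GLOBAL_LAT.size, GLOBAL_LON.size), np.nan)
tile = np.ones((10, 20))  # stand-in for data read from one NetCDF tile
paste_tile(mosaic, tile, GLOBAL_LAT[80:90], GLOBAL_LON[100:120])
```

Cells that no tile covers stay NaN, which makes gaps in the tiling easy to spot before anything is written to disk.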

Ideally, I need a method that is easy to repeat once I’ve written the function, and if I can stay in Python 3, that would be great. Which is the better way of merging lots of nested NetCDF data, or is there another method or module I haven’t come across that would be better at performing large spatial merges of NetCDF data?

One Answer

I did some reading around and came across the somewhat obscure CDO documentation for the collgrid operator. With that, I was finally able to perform this series of merges in Python by walking through the files and making a subprocess call to CDO, using something similar to the following:

#!/usr/bin/env python3

import os
import subprocess

source = "/path/to/source"
destination = "/path/to/destination"

# Map each group name (e.g. "precipitation-year1950") to its tile paths.
groups = {}
for dirpath, subdirs, files in os.walk(source):
    print(dirpath)
    for f in sorted(files):
        if f.endswith(".nc"):
            # Strip the extension, then the trailing "-<tile number>".
            group = os.path.splitext(f)[0].rpartition('-')[0]
            if group != '':
                groups.setdefault(group, []).append(os.path.join(dirpath, f))

# Merge only the tiles that were actually found for each group.
for group, infiles in groups.items():
    output = os.path.join(destination, group + ".nc")
    # Each input path must be its own argument: a single space-joined
    # string would reach cdo as one filename.
    subprocess.call(["cdo", "-z", "zip", "collgrid", *infiles, output])
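To check the grouping logic without running CDO, the filename handling in the script above can be pulled out into small functions. This is only a sketch: the helper names group_name and build_collgrid_command are hypothetical, and the sample filenames follow the naming pattern from the question.

```python
import os


def group_name(filename):
    """Return the name with its extension and trailing '-<tile>' removed."""
    stem = os.path.splitext(filename)[0]
    return stem.rpartition("-")[0]


def build_collgrid_command(infiles, outfile):
    """Build the argument list for one collgrid call (does not run it)."""
    return ["cdo", "-z", "zip", "collgrid", *infiles, outfile]


names = [
    "precipitation-year1950-001.nc",
    "precipitation-year1950-002.nc",
    "tmax-year2008-650.nc",
]

# Collect the tiles belonging to each variable*year group.
groups = {}
for name in names:
    groups.setdefault(group_name(name), []).append(name)

command = build_collgrid_command(
    groups["precipitation-year1950"], "precipitation-year1950.nc"
)
print(command)
```

Printing the command list first is a cheap way to confirm that every tile path is passed to cdo as a separate argument before launching hundreds of merges.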

Correct answer by Trevor J. Smith on May 3, 2021
