Adding compression to a NetCDF file using xarray

Geographic Information Systems Asked by gwydion93 on December 7, 2020

I have a raster that I am trying to compress and convert to NetCDF format at compression level 9 using the xarray package. I assume that compression is specified through the encoding parameter as a dict, but I am not sure exactly what to put in it:

 f = directory + "/D_Passaic_F02_NBR_E0001_WGS84_comp"
 t = xarray.open_rasterio(f)
 encode = {'zlib': True, 'complevel': 9}
 t.to_netcdf(output_dir+"/Test2.nc", encoding=encode)

I get an error, KeyError: 'zlib', but I am not sure what I am supposed to use here. Suggestions?
The data array has a single band plus x and y coordinates, like this:

 <xarray.DataArray (band: 1, y: 9635, x: 14564)>  

or, in more detail:

 <bound method ImplementsArrayReduce._reduce_method.<locals>.wrapped_func of 
 <xarray.DataArray (band: 1, y: 9635, x: 14564)>
 [140324140 values with dtype=float32]
 Coordinates:
   * band     (band) int32 1
   * y        (y) float64 41.06 41.06 41.06 41.06 ... 40.74 40.74 40.74 40.74
   * x        (x) float64 -74.45 -74.45 -74.45 -74.45 ... -73.97 -73.97 -73.97
 Attributes:
     transform:                 (3.2670488250568696e-05, 0.0, -74.447024371179...
     crs:                       +init=epsg:4326
     res:                       (3.2670488250568696e-05, 3.2670488250568696e-05)
     is_tiled:                  1
     nodatavals:                (-9999.0,)
     scales:                    (1.0,)
     offsets:                   (0.0,)
     AREA_OR_POINT:             Area
     HISTOGRAM:                 9090|9307|9097|9209|8729|8864|8744|8864|9181|9...
     TIFFTAG_ARTIST:            HEC-RAS
     TIFFTAG_IMAGEDESCRIPTION:  Depth (Max)>

One Answer

The error hints at xarray trying to find a variable called "zlib" in your data. The correct structure for the encoding dict would be something like:

encode = {"precipitation": {'zlib': True, ...}}

But due to the way the data was loaded this is tricky.

What you get back from open_rasterio is a DataArray. A DataArray holds a single array and has no named variables of its own. When writing to NetCDF, your data needs to be a Dataset.

If you call to_netcdf on an unnamed DataArray, as you have here, xarray automatically generates a Dataset whose data variable is called "__xarray_dataarray_variable__". So this will work, but the resulting file will be ugly with a name like that:

encode = {"__xarray_dataarray_variable__": {'zlib': True, ...}}  # yuck!
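To see the placeholder name for yourself, here is a minimal sketch. It assumes xarray, numpy and a NetCDF backend are installed, and it uses a tiny made-up array (`unnamed`) and a temporary file instead of the real raster:

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Tiny unnamed array standing in for the raster (hypothetical data).
unnamed = xr.DataArray(
    np.zeros((1, 4, 5), dtype="float32"),
    dims=("band", "y", "x"),
)

# Write the DataArray directly, without converting it to a Dataset first.
path = os.path.join(tempfile.mkdtemp(), "unnamed.nc")
unnamed.to_netcdf(path)

# Reopen as a Dataset to inspect the variable name that ended up in the file.
with xr.open_dataset(path) as ds:
    var_names = list(ds.data_vars)

print(var_names)
```

The printed list should contain only the placeholder name, which is the key the encoding dict has to use in this case.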

When you build the Dataset yourself, you can choose the variable name. For example, if your band contains DEM data, something like this might make sense:

dem = t.to_dataset(name="dem")

Afterwards you can specify the encoding dict for that variable and get a reasonable name in the result:

encoding = {"dem": {'zlib': True, ...}}
dem.to_netcdf("dem.nc", encoding=encoding)

Sadly, compression with xarray is very RAM-hungry. Save any unsaved work before you run it, in case your system runs out of memory.

Correct answer by bugmenot123 on December 7, 2020
