Zonal statistics using two rasters, one with zones and the other with data, using Python

Question

I am trying to calculate the mean of a raster using zone data from another raster. The zone data was created using the Whitebox clump tool, where each group of pixels has a unique ID number. There are ~800,000 unique clumps and the rasters are fairly big (27700, 31511), so I am avoiding converting the clumps to vector format because this causes memory issues. I would like the output of this analysis to be either another raster where the clump IDs in the original raster are replaced with the mean value of that clump, or a table containing the clump IDs and the mean value of each clump. This process is the same as ArcMap's Zonal Statistics, but I would like to use Python and open source packages. I tried this using the code below that I wrote, but it is way too slow for the size of data I'm working with.
import numpy as np

# clumps: 2D integer array read from the clump (zone) raster
# values: 2D array read from the data raster, same shape as clumps
IDs = np.unique(clumps)
means = clumps.copy()  # copy so the clump IDs are not overwritten in place
for id in IDs:
    # Boolean mask of the current clump; this scans the whole raster
    # once per ID, which is why the loop is so slow
    mask = clumps == id
    mean_value = values[mask].mean()
    means[mask] = mean_value.astype(np.int64)

Andrew · Answer

I found a solution by tiling the input rasters and running the above code for each tile. I used the clump output from Whitebox Tools as my zone data.
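One caveat with tiling is that a clump can straddle a tile boundary, so per-tile means may not match the whole-clump mean. A minimal sketch of one way around this (not necessarily how the tiling above was done) is to accumulate per-clump sums and pixel counts across tiles with np.bincount and divide only at the end. It assumes clump IDs are non-negative integers and ignores NoData handling; the function name accumulate_tile, the tile width, and the small arrays are made up for illustration.

```python
import numpy as np


def accumulate_tile(clump_tile, value_tile, sums, counts):
    """Add one tile's per-clump sums and pixel counts to the running totals."""
    ids = clump_tile.ravel()
    vals = value_tile.ravel().astype(np.float64)
    n = sums.size
    sums += np.bincount(ids, weights=vals, minlength=n)
    counts += np.bincount(ids, minlength=n)


# Tiny made-up rasters standing in for the real clump and value arrays.
clumps = np.array([[1, 1, 2, 2],
                   [1, 3, 3, 2],
                   [3, 3, 3, 2]])
values = np.array([[10., 20., 5., 7.],
                   [30., 1., 2., 9.],
                   [3., 4., 5., 11.]])

# One slot per possible ID (assumes IDs are non-negative integers).
n_ids = int(clumps.max()) + 1
sums = np.zeros(n_ids)
counts = np.zeros(n_ids, dtype=np.int64)

tile_cols = 2  # hypothetical tile width; clump 3 spans both tiles
for start in range(0, clumps.shape[1], tile_cols):
    accumulate_tile(clumps[:, start:start + tile_cols],
                    values[:, start:start + tile_cols], sums, counts)

# Mean per clump ID; IDs that never occur stay NaN.
means = np.full(n_ids, np.nan)
np.divide(sums, counts, out=means, where=counts > 0)

# Map the means back onto the grid by indexing with the clump IDs.
mean_raster = means[clumps]
```

Here means serves as the ID-to-mean table (index = clump ID) and mean_raster is the raster form. The running totals hold only one entry per ID, so they should stay small even with ~800,000 clumps; for rasters of the size described, the final lookup can be applied tile by tile as well.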

WhiteboxDev · Answer

You could use WhiteboxTools' ZonalStatistics tool to do this. As a Python script, it would look like the following:
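# Assumes the pip-installable "whitebox" package is used to access WhiteboxTools
import whitebox

wbt = whitebox.WhiteboxTools()
wbt.zonal_statistics(
    i="raster.tif",         # data raster
    features="clumps.tif",  # zone raster (clump output)
    output="mean.tif",      # raster with each clump replaced by its mean
    stat="mean",
    out_table=None
)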

There should be no problem running this tool with 800,000 features from the clump output.
