
Fastest way to extract a raster in R (improve the time of my reproducible code)

Geographic Information Systems. Asked by Neal Barsch on April 9, 2021.

I’m wondering whether I have maximized the speed at which the mean of raster cells within a buffer around a point can be extracted.

The fastest way I have found to extract from a raster, using the pre-cropping suggestion from @dbaston:

If you already have the velox raster (even if you have to buffer the shape dynamically), this is lightning fast:

test7_LIGHTNING <- system.time(mclapply(seq_along(x1), function(x){
  q <- vras$copy()   # crop a copy, since $crop() modifies the VeloxRaster in place
  q$crop(poly)
  q$extract(poly, fun = function(t) mean(t, na.rm = TRUE))
}))

> test7_LIGHTNING
      user  system elapsed 
     0.001   0.005   0.355 

These are still quick if you need to load the velox raster dynamically (simulating loading a different raster for each point; here the same raster is loaded repeatedly, using the sample reproducible data below):

test8 <- system.time(mclapply(seq_along(x1), function(x){
  ras <- velox("testras_so.tif")
  ras$crop(poly)
  ras$extract(poly, fun = function(t) mean(t, na.rm = TRUE))
}))

test9 <- system.time(mclapply(seq_along(x1), function(x){
  # build the per-point buffer once, crop to it, then extract from it
  buf <- st_buffer(st_sfc(st_point(c(x1[x], y1[x]), dim = "XY"), crs = 32651), 200000)
  ras <- velox("testras_so.tif")
  ras$crop(buf)
  ras$extract(buf, fun = function(t) mean(t, na.rm = TRUE))
}))


> test8
   user  system elapsed 
  0.011   0.016   4.450 
> test9
   user  system elapsed 
  0.006   0.012   4.333 

ORIGINAL QUESTION:
Can performance be improved any further on these LOCALLY?
I already use parallel mclapply, and I know I could get further gains by setting this up on a cluster ("use a cluster" or "get more CPUs" is not the answer I'm looking for).
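For reference, mclapply() only uses getOption("mc.cores", 2L) workers unless told otherwise. A minimal sketch of setting the worker count explicitly, using vras, x1, and poly from the reproducible data below (the choice of detectCores() is an assumption; tune it for your hardware):

n_cores <- parallel::detectCores()
res <- mclapply(seq_along(x1), function(x){
  # same extraction as test1 below, but with an explicit worker count
  vras$extract(poly, fun = function(t) mean(t, na.rm = TRUE))
}, mc.cores = n_cores)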

Replicate some data:

library(raster)
library(parallel)
library(truncnorm)
library(gdalUtils)
library(velox)
library(sf)
ras <- raster(ncol=1000, nrow=1000, xmn=2001476, xmx=11519096, ymn=9087279, ymx=17080719)
ras[] <- rtruncnorm(n = ncell(ras), a = 0, b = 10, mean = 5, sd = 2)
crs(ras) <- "+proj=utm +zone=51 +ellps=WGS84"

writeRaster(ras, "testras_so.tif", format = "GTiff", overwrite = TRUE)

gdalbuildvrt(gdalfile = "testras_so.tif", 
             output.vrt = "testvrt_so.vrt")

x1 <- runif(100,2001476,11519096)
y1 <- runif(100, 9087279,17080719)

poly <- st_buffer(st_sfc(st_point(c(x1[1],y1[1]), dim="XY"),crs=32651),200000)
vras <- velox("testvrt_so.vrt")
###########

Tests:

- test1: polygon and velox raster already in memory
- test2: buffer must be generated, velox raster already in memory
- test3: velox raster must be built from the VRT (simulating having different rasters), buffer already in memory
- test4: both must be generated (VRT version)
- test5: velox raster must be built from the GeoTIFF, buffer already in memory
- test6: both must be generated (GeoTIFF version)

#Test time if have poly and velox raster
test1 <- system.time(mclapply(seq_along(x1), function(x){
  vras$extract(poly, fun=function(t) mean(t,na.rm=T))
}))

#Test time if have to generate buffer but have velox raster
test2 <- system.time(mclapply(seq_along(x1), function(x){
  vras$extract(st_buffer(st_sfc(st_point(c(x1[x],y1[x]), dim="XY"),crs=32651),200000), fun=function(t) mean(t,na.rm=T))
}))

#Test time if have to generate velox from VR (simulating having different rasters) but having the buffer
test3 <- system.time(mclapply(seq_along(x1), function(x){
  velox("testvrt_so.vrt")$extract(poly, fun=function(t) mean(t,na.rm=T))
}))

#Test time if have to generate velox from VR AND generate buffer (simulating a list of rasters with different buffers each)
test4 <- system.time(mclapply(seq_along(x1), function(x){
  velox("testvrt_so.vrt")$extract(st_buffer(st_sfc(st_point(c(x1[x],y1[x]), dim="XY"),crs=32651),200000), fun=function(t) mean(t,na.rm=T))
}))

#Test time if have to generate velox from TIF (simulating having different rasters) but having the buffer
test5 <- system.time(mclapply(seq_along(x1), function(x){
  velox("testras_so.tif")$extract(poly, fun=function(t) mean(t,na.rm=T))
}))

#Test time if have to generate velox from TIF AND generate buffer (simulating a list of rasters with different buffers each)
test6 <- system.time(mclapply(seq_along(x1), function(x){
  velox("testras_so.tif")$extract(st_buffer(st_sfc(st_point(c(x1[x],y1[x]), dim="XY"),crs=32651),200000), fun=function(t) mean(t,na.rm=T))
}))

My results (yours will vary with the number of cores, since mclapply runs in parallel):

> test1
   user  system elapsed 
  0.007   0.022   3.501 
> test2
   user  system elapsed 
  0.008   0.025   3.851 
> test3
   user  system elapsed 
  0.018   0.036  10.546 
> test4
   user  system elapsed 
  0.020   0.042  10.754 
> test5
   user  system elapsed 
  0.015   0.034   9.143 
> test6
   user  system elapsed 
  0.015   0.033   9.364

Can anybody suggest ways to make this faster, or have I maxed out local speed here?

One Answer

I get much faster results with velox if I crop the raster before running extract, e.g.:

r <- velox("testras_so.tif")
r$crop(poly)
r$extract(poly)
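Note that velox methods operate by reference: $crop() modifies the VeloxRaster in place. If the uncropped raster will be reused, crop a copy instead, the same pattern used in the benchmark below:

q <- r$copy()   # keep r intact; $crop() mutates in place
q$crop(poly)
q$extract(poly)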

I've also been working on a package with an optimized extract function that may be of interest:

library(exactextractr)

exact_extract(ras, poly)                # get a matrix with weights and values,
                                        # as with raster::extract(, weights=TRUE)
exact_extract(ras, poly, weighted.mean) # pass weights and values to an R function
exact_extract(ras, poly, 'mean')        # use a built-in stat

For this case, runtimes look comparable to velox, but the results take partially covered cells into account (comparable to weights=TRUE with raster::extract).
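To see what those weights mean, here is a minimal sketch reproducing the built-in 'mean' stat by hand. It assumes a recent exactextractr, where exact_extract() without a summary function returns one data frame per polygon with value and coverage_fraction columns:

# value is the cell value; coverage_fraction is the fraction of the
# cell covered by the polygon
cells <- exact_extract(ras, poly)[[1]]

# coverage-weighted mean; should match exact_extract(ras, poly, 'mean')
weighted.mean(cells$value, cells$coverage_fraction, na.rm = TRUE)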

library(microbenchmark)

v <- velox(ras)
microbenchmark(
  v$extract(poly),
  {q <- v$copy(); q$crop(poly); q$extract(poly)},
  exact_extract(ras, poly)
)

# Unit: milliseconds
#                      expr       min        lq      mean    median        uq       max neval
#           v$extract(poly) 38.937180 39.749301 41.703686 40.247412 41.217893 56.372611   100
#        { q <- v$copy()...  2.510894  2.639588  2.769510  2.725196  2.844109  4.850091   100
#  exact_extract(ras, poly)  1.818466  2.039748  3.193071  2.255309  2.382927 35.236612   100

Correct answer by dbaston on April 9, 2021
