TransWikia.com

R raster extract() and keep the original shapefile attributes

Geographic Information Systems Asked by ODstuck on December 24, 2020

I use the R function extract() in order to extract the values from a raster dataset. The shapefile that I use with the extract() function has some columns/attributes -for example the unique ID and a characteristic of each polygon- that I need to remain within the extracted dataframe.
The readOGR function returns a SpatialPolygonsDataframe that has a “data” dataframe and a “polygons” list.
I would need the extracted dataframe with extract(raster, shapefile, df=TRUE) to associate the extracted values with the shapefile “data” attributes, also to be sure of what is really what (maybe someone can explain -if it is relevant- in order to avoid possible confusion, about the attribute from readOGR named “plotOrder”).

A sample code:

require(sp)
require(raster)

setwd('FILEPATH')
shapefilepath <- 'SHAPEFILEPATH'
rasterpath <-'RASTERFILEPATH'
shp <- readOGR(dsn = ".", layer = "SHAPEFILENAME_withoutDOTshp")
pts <- slot(shp, "data")
img <- brick(rasterpath)
rastervalues <- extract(img, shp, cellnumbers=TRUE, df=TRUE)

So, this returns a dataframe that does not include the associated “data” from the SpatialPolygonsDataframe from readOGR().
Any way to overcome this?

One Answer

The results are ordered to the points being extract so, there is a one-to-one row match. You can join them after running the function or on-the-fly by just doing some like:

shp@data <- data.frame(shp@data, extract(img, shp, cellnumbers=TRUE))

However, keep in mind that certain extract arguments or data types return a list object. This is the case if the extract buffer argument is used with points or if the vector data passed to extract is a polygon. Each element in the list represents the multiple values associated with each vector feature. To get this type of data so that it can be joined back to the vector, one must summarize it so that it is represented as a matching length vector or data.frame. This can be done with a function such as lapply. I would be reluctant to force output to a data.frame (ie., df=TRUE) as this limits the use of lapply. Here is a worked example for SINGLEPART polygons.

First, create some data that represents SINGLEPART polygons and a raster stack.

data(meuse)
  coordinates(meuse) <- ~x+y
  proj4string(meuse) <- CRS("+init=epsg:28992")

b <- rgeos::gBuffer(meuse[sample(1:nrow(meuse),5),], 
                    byid = TRUE, width = 250)

r <- raster(extent(meuse), resolution=30,
             crs=CRS("+init=epsg:28992"))
    r[] <- runif(ncell(r))
r <- stack(r, focal(r, gaussian.kernel(sigma=2, n=11), mean))

plot(r[[2]])
  plot(b,add=TRUE)

Here we extract raster values for each polygon and let the extract function summarize the data for us. This can be easily be related back to your vector data as the results are ordered to your vector data.

( e <- extract(r, b, fun=mean, na.rm=TRUE) )
  ( b@data <- data.frame(b@data, e) )

Alternately, we can extract the values to a list object and summarize it our-self. This results in a list object containing a data.frame (with columns for each raster) for each polygon.

( e <- extract(r, b) )

Here we display the number of elements in list and dimensions of data.frame in each list element

length(e)
lapply(e, dim)

Now we can use lapply to apply a mode function to our data. I wrap lapply in do.call (using rbind) to collapse the results into a matrix rather than returning a list.

mode <- function(x){
  d <- stats::density(x[!is.na(x)], kernel = "gaussian")
  return(d$x[d$y == max(d$y)])
}

e.mode <- do.call(rbind, lapply(e, FUN=function(x) apply(x, MARGIN=2, FUN=mode)))
( b@data <- data.frame(b@data, e.mode) )

Things get complicated when your data is MULTIPART geometry (many features associated with single rows in the attributes @data data.frame). In this case values associated with all of the features need to be aggregated down to the attribute level (each row in @data). This could plausibly be done with a function such as tapply that can produce a summary based on an aggregating value (eg., polygon ID). In the case of MULTIPART geometry I would highly recommend coercing to SINGLEPART (each feature has a row in @data) using a function such as explode in the spatialEco package. A sure way to check MULTIPART in sp objects is to look at the dimension of the object using dim(x) and compare that to the dimensions of the data slot dim(x@data). If they are different then it is MULTIPART. In sf objects the geometry column will clearly show MULTIPART classes.

Answered by Jeffrey Evans on December 24, 2020

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP