TransWikia.com

Using st_distance with apply functions and iterators

Geographic Information Systems Asked by Danple on August 8, 2020

I’m working with a dataset of islands in which I need to calculate the distance between each island and the nearest landmass with an area greater than that island. I can do this successfully with a for loop, but this takes too long and I want to rewrite the code so that it can eventually work with parallel processing and apply functions.

Here is example code for the for loop:

library(rgdal)
library(tidyverse)
library(sf)
library(spData)

alaska <- alaska

#this dissolves all features to remove internal divisions and 
#then converts multi-part polygons to single part. This
#effectively creates a separate feature for each isolated
#geometry (i.e. island)

alaska <- alaska %>% 
  st_buffer(0) %>%
  summarise(area = AREA) %>% 
  st_cast("POLYGON")

#recalculate area for each island
alaska$area <- alaska %>% 
  st_area() %>% 
  units::set_units(value = km^2) %>% 
  as.numeric()

#spatial object with Alaska landmass still included
alaska_all <- alaska 

#spatial object with Alaska landmass removed, leaving only islands
alaska_islands <- alaska[!alaska$area == sort(alaska$area, decreasing = T)[1],]


for (i in 1:nrow(alaska_islands)){
  d <- st_distance(alaska_islands[i,],
                   alaska_all[alaska_all$area > alaska_islands$area[i],],
                   by_element = T)
  alaska_islands$nn[i] <- sort(unique(d))[1]
}

What I’m having trouble figuring out is how to keep the alaska_islands$area[i],] iteration as part of the function. I’ve tried apply, lapply, and mapply and have had limited success. The most successful thus far has been:

apply(st_distance(alaska_islands, alaska_all[alaska_all$area > alaska_islands$area[i],]), 1, min)

but it seems to ignore the iterator. Some islands return the correct values (i.e. same as from the for loop) while others simply return 0.

One Answer

The usual trick to turning a loop over i into an apply style loop is to apply over i and write a function that does the job for argument i. For example:

sapply(1:nrow(alaska_islands), 
  function(i){
    min(
     st_distance(
       alaska_islands[i,],
       alaska_all[alaska_all$area > alaska_islands$area[i],])
     )
     }
  )

gives:

 [1]  59657.8767  25262.5830   5546.5902  51010.3289  18465.6949   4590.8588
 [7]   5137.2093   5379.8038   3629.5807   7366.8446    190.1415   5194.3717
[13]   4258.3425   7284.6550  18060.4840  43035.8838   7747.8683  28012.9886
[19]   8728.0644 176939.7554  33350.3792  51818.7938   2178.2563 343380.1486
[25] 253544.2871   2653.4898  18492.8536  14964.9450   8390.1484  15273.4045
[31] 250720.1601   7470.6006   8685.0207   3328.7401  11072.0120  16976.9256
[37]   7945.7566   6188.7005  38892.5618 158952.4960  12986.4244   3583.9433
[43]   4802.9676  54793.4105   3607.2833  90103.9748  14677.1710  21871.5170
[49] 564263.9674

which is the same as your $nn values from the loop.

That said, it might be easier to make his parallel using foreach from the foreach package. But if you are happy with the parallel versions of apply, then this should get you on your way.

Correct answer by Spacedman on August 8, 2020

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP