Parallel process for spatial weights matrix production in R

Question

I have an sf object pData that is rather large (300,000 polygons) and I am trying to produce spatial weights for each polygon:
tilesNb=knn2nb(knearneigh(st_centroid(pData), k = 2))
tilesWeights=nb2listw(tilesNb, style="W")

But the first line ran for hours and then timed out. Is there a way to conduct this step through a parallel process? I tried modifying code from here, but I couldn't quite figure out how to modify their technique for my simpler approach to neighbor matching.

Spacedman · Accepted Answer

You might not need to parallelise at all if you use a faster nearest-neighbour algorithm. The FNN package provides one. Let's test it:
Make 100,000 points:
> xy = cbind(runif(100000),runif(100000))

And get the 2 nearest neighbours.
> kxyF = FNN::knn.index(xy, k=2)

I blinked and it was done. It gives me a 100,000 x 2 matrix of indexes showing the nearest points as rows of xy:
> str(kxyF)
 int [1:100000, 1:2] 82552 71951 71149 16282 14576 88806 87619 19744 55467 60788 ...

Let's try with knearneigh:
> kxySP = knearneigh(xy, k=2)

I'm still waiting. I've typed this whole answer up to here and I'm still waiting... I give up. Here's a smaller example (10,000 points) so we can compare the output:
With FNN::knn.index:
> str(kxyF)
 int [1:10000, 1:2] 1618 2026 2426 7932 9634 5257 9092 6751 4080 3892 ...

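Note that FNN::knn.index gives you only the index matrix. knearneigh wraps the same kind of matrix (nn) in a "knn" object together with the point count, k, the dimension and the coordinates: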
> str(kxySP)
List of 5
 $ nn       : int [1:10000, 1:2] 1618 2026 2426 7932 9634 5257 9092 6751 4080 3892 ...
 $ np       : int 10000
 $ k        : num 2
 $ dimension: int 2
 $ x        : num [1:10000, 1:2] 0.3509 0.5742 0.0598 0.5555 0.7335 ...
 - attr(*, "class")= chr "knn"
 - attr(*, "call")= language knearneigh(x = xy, k = 2)

but the indexes look the same as with the fast FNN function.
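Rather than eyeballing the first ten values, the two index matrices can be compared directly. This is a minimal sketch on the smaller 10,000-point example; the set.seed call and the timing wrappers are my additions, and the timings will vary by machine:

```r
library(FNN)
library(spdep)

set.seed(42)  # added only so the example is reproducible
xy <- cbind(runif(10000), runif(10000))

# Time both approaches on the same points
tF  <- system.time(kxyF  <- FNN::knn.index(xy, k = 2))
tSP <- system.time(kxySP <- spdep::knearneigh(xy, k = 2))
print(rbind(FNN = tF, spdep = tSP)[, "elapsed"])

# TRUE if every neighbour index agrees
# (exact distance ties are effectively impossible with random uniform points)
same <- all(kxyF == kxySP$nn)
print(same)
```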
If you want the distances, they can be computed and since you know the indexes you only have a few distances to compute.
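As a rough sketch of that distance step in base R: given the coordinate matrix and an n x k index matrix, only n * k distances are needed. The helper name nn_dist and the tiny four-point example are made up for illustration:

```r
# Hypothetical helper: Euclidean distance from each point to each of its
# listed neighbours. xy is an n x 2 coordinate matrix, idx an n x k index matrix.
nn_dist <- function(xy, idx) {
  # One column of distances per neighbour column in idx
  apply(idx, 2, function(j) {
    sqrt(rowSums((xy - xy[j, , drop = FALSE])^2))
  })
}

# Tiny hand-made example: four points and their two nearest neighbours
pts <- rbind(c(0, 0), c(1, 0), c(0, 2), c(5, 5))
idx <- rbind(c(2, 3), c(1, 3), c(1, 2), c(3, 2))

d <- nn_dist(pts, idx)
print(d)
```

If you are using FNN anyway, FNN::get.knn(xy, k = 2) returns the distances (nn.dist) alongside the indexes (nn.index), so the manual step is only needed when you have the index matrix alone.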
Yes, there might be a little bit of manipulation needed to get the structure you get from nb2listw, but I don't see any computational complexity there.
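One possible way to do that manipulation is sketched below. It assumes that knn2nb only needs the fields shown in the str() output above (nn, np, k, dimension, x) and the "knn" class, so the object is assembled by hand around the FNN index matrix. That is an assumption about spdep's internals, so check it against your installed version:

```r
library(FNN)
library(spdep)

set.seed(1)  # added only so the example is reproducible
xy <- cbind(runif(10000), runif(10000))
k <- 2

# Fast neighbour lookup with FNN
idx <- FNN::knn.index(xy, k = k)

# Assumption: a list with these five fields and class "knn" is enough
# for knn2nb(); the field names mirror the str(kxySP) output above.
knn_obj <- structure(
  list(nn = idx, np = nrow(xy), k = k, dimension = ncol(xy), x = xy),
  class = "knn"
)

# Same two steps as in the question, minus the slow knearneigh() call
tilesNb <- knn2nb(knn_obj)
tilesWeights <- nb2listw(tilesNb, style = "W")
```

For the polygons in the question, the coordinate matrix could come from sf::st_coordinates(sf::st_centroid(pData)) in place of the random xy used here.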
