Choosing projection for running polygon intersections at global scale (i.e, geodesic intersections)?

Question

How do I choose a projection for intersecting two global polygon datasets in WGS-84 (4326), and get accurate estimates of their overlap area?
I searched the internet for answers to this question, and all turned out to be wrong I think, except for the lovely projection table posted by Jochen Albrecht.
So, when you are intersecting two polygons in 4326, you will need currently in R (sf package, as of 12/2020) to reproject them into a common projection, run the intersection, and then back-transform them into 4326 to get a correct estimate of the overlap area (using sf::st_area).
Most sources (e.g. Chapter 6 Reprojecting geographic data of Geocomputation with R) will recommend an azimuthal equidistant or equal-area projection centered on the focal polygon, which if you have lots of polygons (I have >200K), means a for-loop.  Not that slow actually, but many of the suggested projections will do geodesic intersections between polygons with some bias.

Matt F · Answer

Jochen's table led me to an accurate answer (I think/hope).
Here is the answer I came up with.  First, if you can run this in ArcGIS Pro, go for it, because it has a geodesic option, but it is not a possibility for my analysis (too much data, etc.).  Second, try the new s2 functionality in sf in R. Sadly, I could not use the new s2 functionality since my data kept throwing validity errors. But it turns out there may be a simple solution.
If you have a global dataset that stretches over the international dateline, the first step is to clip it in two at the dateline.  That solves a lot of problems and makes the solution below possible. if you can't do that, and the solutions above don't work, try the local azimuth solution (see below).
So I did some trial analyses, and I think what you actually want is a projection that preserves SHAPE, not distance or area.  And the best projections for shape are cylindrical, which means that sf's basic equirectangular projection is best for geodesic intersection (i.e., the default st_intersection on 4326 data, which throws an error you can ignore).
Here's an example set of analyses that show that equirectangular works best, with zero bias in area estimation between two large example polygons.  An equidistant azimuthal projection, centered on the local polygon centroid, had the second lowest bias.
library(sf)

#  create a polygon box (pol) centered on 0,0 lat/long, a second shifted half a degree east (pol2, and a third (pol1b) representing the area of overlap.

center_loc = 0
a=center_loc-0.5
b=center_loc+0.5
pol = st_polygon(list(rbind(c(a,a), c(b,a), c(b,b), c(a,b), c(a,a))))
pol2 = st_polygon(list(rbind(c(0,a), c(1,a), c(1,b), c(0,b), c(0,a))))
pol1b=st_polygon(list(rbind(c(0,a), c(b,a), c(b,b), c(0,b), c(0,a))))

plot(pol); plot(pol2, add=TRUE); plot(pol1b, add=TRUE, col="red")

pol=st_sfc(pol)
st_crs(pol)=4326
pol2=st_sfc(pol2)
st_crs(pol2)=4326
pol1b=st_sfc(pol1b)
st_crs(pol1b)=4326

#  okay!  now you can calculate the area of each:
st_area(pol)
st_area(pol2)
st_area(pol1b); truval=st_area(pol1b)

#  now to transform the focal poly,  pol, to a few different azimuthal systems.
pol_aeqd=st_transform(pol, "+proj=aeqd +lat_0=0 +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs") #aeqd
pol_l=st_transform(pol, "+proj=laea") #lambert
pol_st=st_transform(pol, "+proj=stere") #stereographic

#  now to do the same for pol2 (note that the projection is centered on the neighboring polygon, which we would do for a local azimuthal projection) 
pol2_aeqd=st_transform(pol2, "+proj=aeqd +lat_0=0 +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs") #aeqd
pol2_l=st_transform(pol2, "+proj=laea") #lambert
pol2_st=st_transform(pol2, "+proj=stere") #stereographic

#  and to intersect and back convert to 4326.
inter_a=st_intersection(pol_aeqd, pol2_aeqd)
inter_l=st_intersection(pol_l, pol2_l)
inter_st=st_intersection(pol_st, pol2_st)
fin_a=st_transform(inter_a, st_crs(4326))
fin_l=st_transform(inter_l, st_crs(4326))
fin_st=st_transform(inter_st, st_crs(4326))

#  I added equirectangular here for fun...
inter_fun=st_intersection(pol, pol2)

#  now for the results, showing the bias of different projections.
st_area(fin_a); truval-st_area(fin_a)
st_area(fin_l); truval-st_area(fin_l)
st_area(fin_st); truval-st_area(fin_st)
st_area(inter_fun); truval-st_area(inter_fun)

#  So it appears that equirectangular does the best job maintaining shape, so it is the most accurate for intersection, as long as you back-transform to 4326 for overlap area calc following intersection.

##  and it hold true if you go to a random location, like y+20, x+100
coords=rbind(c(a,a), c(b,a), c(b,b), c(a,b), c(a,a))
coord_rhalf=rbind(c(0,a), c(b,a), c(b,b), c(0,b), c(0,a))
plot(coords[,1]+10, coords[,2])
pol3=st_polygon(list(cbind(coords[,1]+100, coords[,2]+20)))
pol3b=st_polygon(list(cbind(coords[,1]+100.5, coords[,2]+20)))
pol3c=st_polygon(list(cbind(coord_rhalf[,1]+100, coord_rhalf[,2]+20)))
pol3=st_sfc(pol3); st_crs(pol3)=4326
pol3b=st_sfc(pol3b); st_crs(pol3b)=4326
pol3c=st_sfc(pol3c); st_crs(pol3c)=4326

st_area(pol3); st_area(pol3b); st_area(pol3c); truval2=st_area(pol3c)
#
st_area(st_intersection(pol3, pol3b)); truval2-st_area(st_intersection(pol3, pol3b))
pol3_a=st_transform(pol3, "+proj=aeqd +lat_0=20 +lon_0=100 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs") #aeqd
pol3b_a=st_transform(pol3b, "+proj=aeqd +lat_0=20 +lon_0=100 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs") #aeqd
st_intersection(pol3_a, pol3b_a) %>% st_transform(st_crs(4326)) %>% st_area()
truval2-st_intersection(pol3_a, pol3b_a) %>% st_transform(st_crs(4326)) %>% st_area()
#

Robert Hijmans · Answer

You should not use a projection (except for making maps; or when you must use an algorithm that really cannot deal with angular data). For area computations in R, you should use the lon/lat data. That is the best way to avoid inaccuracies. The thing to consider is the vertex density of the polygons. If that is too low, the intersection does not happen at the right (close enough) place. But you can increase the vertex density (see my example below). If low vertex density is a problem, making lon/lat intersections inaccurate, projecting the data to a planar crs does not help at all. That is a common, but big, misconception.
Here is an extreme example. I do not know sf very well, so I use sp/raster/geosphere
Example data
library(raster)
crs = "+proj=longlat +datum=WGS84"
# very large polygons with very few vertices    
p1 <- spPolygons(rbind(c(-180,-20), c(-140,55), c(10, 0), c(-140,-60), c(-180,-20)), crs=crs)
p2 <- spPolygons(rbind(c(-125,0), c(0,60), c(40,5), c(15,-45), c(-125,0)), crs=crs)

First the naïve approach
x1 <- intersect(p1, p2) 
raster::area(x1) / 1e+12 
#[1] 50.02117

plot(bind(p1, p2), border=c("red", "blue"), lwd=5)
plot(x1, add=TRUE, density=20)

Now add vertices (in this case I ask for at least one every 1000 m)
pp1 <- geosphere::makePoly(p1, interval=1000, sp=TRUE)
pp2 <- geosphere::makePoly(p2, interval=1000, sp=TRUE)

The number of vertices was
nrow(geom(p1)) - 1
#[1] 4

And is now
nrow(geom(pp1))-1
[1] 40785

Try again. The area of intersection is 2.5 times larger.
x2 <- intersect(pp1, pp2) 
raster::area(x2) / 1e+12 
#[1] 127.8921
 
plot(bind(pp1, pp2), border=c("red", "blue"), lwd=5)
plot(x2, add=TRUE, density=20)

I am implementing these things in terra to streamline and speed things up.
A comparison using a projection:
crs2 <- "+proj=sinu"
p1q <- spTransform(p1, crs2)
p2q <- spTransform(p2, crs2)
qi <- intersect(p1q, p2q)

Areas are way off
area(p1) / 1e+12
#[1] 211.2407
area(p1q) / 1e+12
#[1] 127.7939
area(p2) / 1e+12
#[1] 161.2815
area(p2q )/ 1e+12
#[1] 107.1201

And the intersected area is about as bad as the naive approach
area(x1) / 1e+12
#[1] 50.02117
area(x2) / 1e+12
#[1] 127.8921
area(qi) / 1e+12
#[1] 56.00833

plot(bind(p1q, p2q), border=c("red", "blue"), lwd=5)
plot(qi, add=TRUE, density=20)

From the comment (sinu is not a great example):
 crs3 <- "+proj=eqc"; 
 p1q <- spTransform(p1, crs3); 
 p2q <- spTransform(p2, crs3); 
 qi <- intersect(p1q, p2q); 
 qi2 <-spTransform(qi, crs); 
 area(x1) / 1e+12; 
 #[1] 50.02117
 area(qi2) / 1e+12
 #[1] 50.02117

Notes (in repsonse to the comments):
raster::intersect is a wrapper for rgeos::gIntersection that also keeps the attributes (but the latter should be faster if you do not care about that). You can also use sf::st_intersection. Since you are not projecting, the data-line should not be an issue?.
You can find out if it matters by changing interval in makePoly. You can check the number of vertices with nrow(geom(p1) (see above). My experience with the WDPA is that it has poor topology (e.g. self intersections), but much detail, so none of this might be of much practical concern.
Whether adding vertices adds unintended realism would depend on how the data were created. If they were GPS points it would avoid losing accuracy and I it would be good practice (best practice would be an algorithm that finds the correct intersection without such help, e.g. s2)). But the boundaries are more likely from tracing on a map. Then it would depend on the underlying map projection that was used for that. So it may be impossible to really know?

Choosing projection for running polygon intersections at global scale (i.e, geodesic intersections)?

2 Answers

Add your own answers!

Ask a Question