Geographic Information Systems Asked by enbecker on July 31, 2021
I am working with geospatial data to be used to make calls to the Foursquare API and conduct cluster analysis on extracted data. To do this, it is helpful if I can project data using both GeoPandas.to_crs()
when wrangling entire GeoDataFrames, and PyProj.Transformer.transform(
) when dealing with NumPy Arrays and Pandas DataFrames. While doing this, I came across an issue where .to_crs()
and .transform()
produce different projection results (off by anywhere from 300-600m) even though I am using the same input data and the same string arguments when designating the CRS for transformation (UTM Zone 33 North).
In order to define the bounds of my sampling grid, I dissolve district-level polygons from my original GeoDataFrame to get an aggregated polygon of the entire area of Berlin. To do this, I apply GeoPandas.to_crs()
to produce the berlin_polyg_utm
object and everything appears fine when I am only using .to_crs()
but as soon as I use Pyproj.Transformer.transform()
it becomes clear I am getting different results from these two methods, when they should be the same since GeoPandas
is built on PyProj
.
In similar questions I have seen here problems were found concerning axis order (lat/lon instead of lon/lat when Proj assumes x/y) and failure to set the correct CRS when creating the GeoDataFrame, but after trying this it has not fixed this issue. I do not believe this is the issue as the polygons are stored in the lon/lat format in my GeoDataFrames, but perhaps I am not understanding the full mechanics of GeoPandas.
For the sake of thoroughness I tried using both authority string and proj string arguments when creating the transformer object, and it produces the same discrepancies between transformed values.
The answer below has shown that issue is due to transforming the bounds individually rather than transforming the entire geometry, so now I am avoiding use transformations on individual points in this case.
Full relevant code for this question is shown below:
# set up environment
import numpy as np
import pandas as pd
import pyproj
import geopandas as gpd
from geopy.geocoders import Nominatim
from shapely.geometry import shape, Polygon, Point
from shapely.ops import transform
# extract data from geojson file at a url
urldistr = 'https://tsb-opendata.s3.eu-central-1.amazonaws.com/bezirksgrenzen/bezirksgrenzen.geojson'
distr_gdf = gpd.read_file(urldistr)
distr_gdf = distr_gdf[['Gemeinde_name', 'Land_name','geometry']]
distr_gdf = distr_gdf.rename(columns={'Gemeinde_name':'District Name', 'Land_name':'State Name'})
distr_gdf = distr_gdf.set_crs(epsg=4326, allow_override=True)
# dissolve polygon to get total bounds in order to determine sample grid size
berlin_polyg = distr_gdf[['State Name','geometry']].dissolve(by='State Name')
# project polygon into UTM Zone 33 North using GeoPandas.to_crs()
berlin_polyg_utm = berlin_polyg.to_crs(epsg=32633)
# use GeoPy to get a test point for Berlin to use in testing PyProj projection object
address = 'Berlin Germany'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
berlin = [location.longitude, location.latitude]
# create and test authority string Transformer object
trans2xy = pyproj.Transformer.from_crs("epsg:4326","epsg:32633", always_xy=True)
trans2ll = pyproj.Transformer.from_crs("epsg:32633","epsg:4326", always_xy=True)
print('Berlin center coordinate, Lon= {}; Lat= {}'.format(berlin[0], berlin[1]))
print('Berlin center transform to UTM, X= {}; Y= {}'.format(x,y))
print('Berlin center reverse transform, Lon= {}; Lat= {}'.format(lon, lat))
# create and test proj string Transformer object
transform2xy = pyproj.Transformer.from_crs("+proj=latlong +datum=WGS84 +ellps=WGS84" , "+proj=utm +zone=33 +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0")
transform2ll = pyproj.Transformer.from_crs("+proj=utm +zone=33 +datum=WGS84 +ellps=WGS84" , "+proj=latlong +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0")
print('Berlin center coordinate, Lon= {}; Lat= {}'.format(berlin[0], berlin[1]))
print('Berlin center transform to UTM, X= {}; Y= {}'.format(x,y))
print('Berlin center reverse transform, Lon= {}; Lat= {}'.format(lon, lat))
# finding bounds using GeoPandas.to_crs() projected Multipolygon
minx = berlin_polyg_utm.bounds.iloc[0,0]
maxx = berlin_polyg_utm.bounds.iloc[0,2]
miny = berlin_polyg_utm.bounds.iloc[0,1]
maxy = berlin_polyg_utm.bounds.iloc[0,3]
print("Min X = {}nMax X = {}nMin Y = {}nMax Y = {}".format(minx ,maxx ,miny, maxy))
berlin_x_range = maxx - minx
berlin_y_range = maxy - miny
print("The length of Berlin's X range is {} meters and Y range is {} meters.".format(
berlin_x_range, berlin_y_range))
berlin_x_radius = berlin_x_range/2
berlin_y_radius = berlin_y_range/2
print("The length of Berlin's X radius is {} meters and Y radius is {} meters.".format(
berlin_x_radius, berlin_y_radius))
# transformation on individual bounds using authority string
minx1, miny1 = trans2xy.transform(minlon, minlat)
maxx1, maxy1 = trans2xy.transform(maxlon, maxlat)
print("Min X = {}nMax X = {}nMin Y = {}nMax Y = {}".format(minx1 ,maxx1 ,miny1, maxy1))
berlin_x_range1 = maxx1 - minx1
berlin_y_range1 = maxy1 - miny1
print("The length of Berlin's X range is {} meters and Y range is {} meters.".format(
berlin_x_range1, berlin_y_range1))
berlin_x_radius1 = berlin_x_range1/2
berlin_y_radius1 = berlin_y_range1/2
print("The length of Berlin's X radius is {} meters and Y radius is {} meters.".format(
berlin_x_radius1, berlin_y_radius1))
# transformation on individual bounds using proj string
minx2, miny2 = transform2xy.transform(minlon, minlat)
maxx2, maxy2 = transform2xy.transform(maxlon, maxlat)
print("Min X = {}nMax X = {}nMin Y = {}nMax Y = {}".format(minx2 ,maxx2 ,miny2, maxy2))
berlin_x_range2 = maxx2 - minx2
berlin_y_range2 = maxy2 - miny2
print("The length of Berlin's X range is {} meters and Y range is {} meters.".format(
berlin_x_range2, berlin_y_range2))
berlin_x_radius2 = berlin_x_range2/2
berlin_y_radius2 = berlin_y_range2/2
print("The length of Berlin's X radius is {} meters and Y radius is {} meters.".format(
berlin_x_radius2, berlin_y_radius2))
# using PyProj Transformer class on entire polygon to confirm cause of different results
berlin_shape = shape(berlin_polyg.loc['Berlin']['geometry'])
shape2xy = pyproj.Transformer.from_crs("epsg:4326","epsg:32633", always_xy=True).transform
berlin_polyg_utm1 = transform(shape2xy, berlin_shape)
berlin_polyg_utm1.bounds
These screenshots show the discrepancies in the results from transformations on entire geometries and individual points.
geopandas uses the Transformer
class in the to_crs method. The reason is because it takes into account datum shifts. For more information about the difference, see: https://pyproj4.github.io/pyproj/stable/gotchas.html#proj-not-a-generic-latitude-longitude-to-projection-converter
EDIT: Based on your comment, I think that the issue is that you are transforming the bounds instead of the entire geometry. This can lead to different results depending on your geometry. You can use shapely's transform method to transform the full geometry: https://shapely.readthedocs.io/en/stable/manual.html#other-transformations
Answered by snowman2 on July 31, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP