Reading raw data into geopandas

Geographic Information Systems Asked on December 15, 2021

Is it possible to read raw data into a geopandas GeoDataFrame, a la a pandas DataFrame?

For example, the following works:

import io
import pandas as pd
import requests
data = requests.get("https://data.cityofnewyork.us/api/geospatial/arq3-7z49?method=export&format=GeoJSON")
pd.read_json(io.BytesIO(data.content))

The following does not:

import geopandas as gpd
import io
import requests
data = requests.get("https://data.cityofnewyork.us/api/geospatial/arq3-7z49?method=export&format=GeoJSON")
gpd.read_file(io.BytesIO(data.content))

In other words, is it possible to read geospatial data that’s in memory without saving that data to disk first?

7 Answers

As indicated by @littlexsparkee, geopandas can now read known file formats directly from URLs (this has been possible since version 0.4), e.g.:

import geopandas as gpd

geojson_url = "https://data.cityofnewyork.us/api/geospatial/arq3-7z49?method=export&format=GeoJSON"
gdf1 = gpd.read_file(geojson_url)

gpkg_url = 'http://www.geopackage.org/data/gdal_sample.gpkg'
gdf2 = gpd.read_file(gpkg_url)

zip_url = 'https://www2.census.gov/geo/tiger/TIGER2010/STATE/2010/tl_2010_31_state10.zip'
gdf3 = gpd.read_file(zip_url)

Since geopandas 0.8 it is also possible to read file-like objects directly, so the example in the question now works:

import geopandas as gpd
import io
import requests

request = requests.get("https://data.cityofnewyork.us/api/geospatial/arq3-7z49?method=export&format=GeoJSON")
gpd.read_file(io.BytesIO(request.content))

Or, similarly, for a GeoPackage:

request = requests.get('http://www.geopackage.org/data/gdal_sample.gpkg')
gpd.read_file(io.BytesIO(request.content))

(I have not managed to reproduce this for shapefiles or zip-files however.)
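
The file-like route can be tried without any network access. The following is a minimal sketch, assuming geopandas 0.8 or later with a working I/O backend installed; the two-point FeatureCollection is hand-written sample data for illustration, not the real dataset from the question.

```python
import io
import json

import geopandas as gpd

# Hand-written sample data (an assumption for illustration, not the real feed).
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "Astor Pl"},
            "geometry": {"type": "Point", "coordinates": [-73.99107, 40.73005]},
        },
        {
            "type": "Feature",
            "properties": {"name": "Canal St"},
            "geometry": {"type": "Point", "coordinates": [-74.00019, 40.71880]},
        },
    ],
}

# Encode to bytes, which is what requests exposes as response.content.
raw_bytes = json.dumps(sample).encode("utf-8")

# Wrap the bytes in a file-like object and let geopandas detect the format.
gdf = gpd.read_file(io.BytesIO(raw_bytes))
print(gdf)
```

Nothing is written to disk here: the bytes go straight from memory into the reader.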

See the geopandas docs for some more examples.
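
Because the behaviour depends on the installed version, a small helper can make the choice explicit. This is only a sketch of the version thresholds mentioned in this answer (0.4 for URLs, 0.8 for file-like objects); the function name and the returned labels are hypothetical, not part of geopandas.

```python
import re


def pick_read_strategy(geopandas_version: str) -> str:
    """Map a version string such as '0.8.1' to a hypothetical strategy label."""
    match = re.match(r"(\d+)\.(\d+)", geopandas_version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {geopandas_version!r}")
    # Compare as integer tuples so that 0.14 sorts after 0.8.
    version = (int(match.group(1)), int(match.group(2)))
    if version >= (0, 8):
        return "file-like"  # gpd.read_file(io.BytesIO(...)) should work
    if version >= (0, 4):
        return "url"  # gpd.read_file(url) should work
    return "fiona"  # fall back to Fiona's in-memory helpers


print(pick_read_strategy("0.14.4"))
```

With geopandas installed, calling `pick_read_strategy(gpd.__version__)` would report which route applies to the current environment.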

Answered by onietosi on December 15, 2021

I prefer the result obtained by using the undocumented GeoDataFrame.from_features() rather than passing the GeoJSON to the GDF constructor directly:

import geopandas as gpd
import requests
data = requests.get("https://data.cityofnewyork.us/api/geospatial/arq3-7z49?method=export&format=GeoJSON")
# from_features is a classmethod, so no empty GeoDataFrame needs to be created first
gpd.GeoDataFrame.from_features(data.json())

Output

                       geometry                         name                                url           line objectid                                              notes
0    POINT (-73.99107 40.73005)                     Astor Pl  http://web.mta.info/nyct/service/  4-6-6 Express        1  4 nights, 6-all times, 6 Express-weekdays AM s...
1    POINT (-74.00019 40.71880)                     Canal St  http://web.mta.info/nyct/service/  4-6-6 Express        2  4 nights, 6-all times, 6 Express-weekdays AM s...
2    POINT (-73.98385 40.76173)                      50th St  http://web.mta.info/nyct/service/            1-2        3                              1-all times, 2-nights
3    POINT (-73.97500 40.68086)                    Bergen St  http://web.mta.info/nyct/service/          2-3-4        4           4-nights, 3-all other times, 2-all times
4    POINT (-73.89489 40.66471)             Pennsylvania Ave  http://web.mta.info/nyct/service/            3-4        5                        4-nights, 3-all other times

The resulting GeoDataFrame has the geometry column set correctly and all the columns as I would expect, without needing to unnest any FeatureCollections.
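
To make "unnesting" concrete, here is a rough pure-Python sketch of the flattening involved: each feature's properties become columns next to its geometry. It is a conceptual illustration only, not geopandas's actual implementation, and the function name and sample data are made up for the example.

```python
from typing import Any


def flatten_feature_collection(collection: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn a GeoJSON FeatureCollection dict into one flat row per feature."""
    rows = []
    for feature in collection["features"]:
        # Keep the geometry as the raw GeoJSON mapping; a real reader would
        # convert it into a geometry object.
        row = {"geometry": feature.get("geometry")}
        # Properties may be null in GeoJSON, so fall back to an empty dict.
        row.update(feature.get("properties") or {})
        rows.append(row)
    return rows


sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "Astor Pl", "objectid": "1"},
            "geometry": {"type": "Point", "coordinates": [-73.99107, 40.73005]},
        }
    ],
}

rows = flatten_feature_collection(sample)
print(rows)
```

Each row now has the same shape as a line of the output above: a geometry plus one key per property.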

Answered by dericke on December 15, 2021

When using Fiona 1.8, this can (must?) be done using that project's MemoryFile or ZipMemoryFile.

For example:

import fiona.io
import geopandas as gpd
import requests

response = requests.get('http://example.com/Some_shapefile.zip')
data_bytes = response.content

with fiona.io.ZipMemoryFile(data_bytes) as zip_memory_file:
    with zip_memory_file.open('Some_shapefile.shp') as collection:
        geodf = gpd.GeoDataFrame.from_features(collection, crs=collection.crs)
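
The snippet above needs the name of the .shp member inside the archive. If that name is not known in advance, one option is to inspect the downloaded bytes with the standard library first. This is a sketch under that assumption; the helper name is hypothetical and the archive is a fake one with empty members, built only for the demonstration.

```python
import io
import zipfile


def find_shapefile_members(zip_bytes: bytes) -> list[str]:
    """Return the names of all .shp members in an in-memory zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [name for name in archive.namelist() if name.lower().endswith(".shp")]


# Build a tiny fake archive in memory (empty members, for illustration only).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for member in ("Some_shapefile.shp", "Some_shapefile.dbf",
                   "Some_shapefile.shx", "readme.txt"):
        archive.writestr(member, b"")

demo_members = find_shapefile_members(buffer.getvalue())
print(demo_members)
```

The returned name can then be passed to `zip_memory_file.open(...)` in place of the hard-coded file name.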

Answered by esmail on December 15, 2021

The easiest way is inputting the GeoJSON URL directly into the gpd.read_file() function. I'd tried extracting a shapefile from a zip before this using BytesIO & zipfile and had issues with gpd (specifically Fiona) accepting file-like objects.

import geopandas as gpd
# the answerer's own local module; it supplies sql.state for the URL below
import David.SQL_pull_by_placename as sql
import os

os.environ['PROJ_LIB'] = r'C:\Users\littlexsparkee\Anaconda3\Library\share\proj'

geojson_url = f'https://github.com/loganpowell/census-geojson/blob/master/GeoJSON/500k/2018/{sql.state}/block-group.json?raw=true'
census_tracts_gdf = gpd.read_file(geojson_url)

Answered by littlexsparkee on December 15, 2021

Since fiona.BytesCollection doesn't seem to work for TopoJSON, here is a solution that works for all formats without needing the gdal Python package:

import fiona
import geopandas as gpd
import requests

# download the TopoJSON file and expose it as an in-memory virtual file
request = requests.get('https://vega.github.io/vega-datasets/data/us-10m.json')
visz = fiona.ogrext.buffer_to_virtual_file(bytes(request.content))

# read the features from a fiona collection into a GeoDataFrame
with fiona.Collection(visz, driver='TopoJSON') as f:
    gdf = gpd.GeoDataFrame.from_features(f, crs=f.crs)

Answered by Mattijn on December 15, 2021

Yes, it is possible now with Fiona (see https://github.com/Toblerity/Fiona/issues/409). I'm not sure if this feature is exposed yet in Geopandas.

Answered by sgillies on December 15, 2021

You can pass the JSON directly to the GeoDataFrame constructor, although the features then stay nested in a single column:

import geopandas as gpd
import requests
data = requests.get("https://data.cityofnewyork.us/api/geospatial/arq3-7z49?method=export&format=GeoJSON")
gdf = gpd.GeoDataFrame(data.json())
gdf.head()

Outputs:

                                            features               type
0  {'type': 'Feature', 'geometry': {'type': 'Poin...  FeatureCollection
1  {'type': 'Feature', 'geometry': {'type': 'Poin...  FeatureCollection
2  {'type': 'Feature', 'geometry': {'type': 'Poin...  FeatureCollection
3  {'type': 'Feature', 'geometry': {'type': 'Poin...  FeatureCollection
4  {'type': 'Feature', 'geometry': {'type': 'Poin...  FeatureCollection

For supported single-file formats or zipped shapefiles, you can use fiona.BytesCollection and GeoDataFrame.from_features:

import requests
import fiona
import geopandas as gpd

url = 'http://www.geopackage.org/data/gdal_sample.gpkg'
request = requests.get(url)
b = bytes(request.content)
with fiona.BytesCollection(b) as f:
    crs = f.crs
    gdf = gpd.GeoDataFrame.from_features(f, crs=crs)
    print(gdf.head())

And for zipped shapefiles (supported as of Fiona 1.7.2):

url = 'https://www2.census.gov/geo/tiger/TIGER2010/STATE/2010/tl_2010_31_state10.zip'
request = requests.get(url)
b = bytes(request.content)
with fiona.BytesCollection(b) as f:
    crs = f.crs
    gdf = gpd.GeoDataFrame.from_features(f, crs=crs)
    print(gdf.head())

You can find out what formats Fiona supports using something like:

import fiona
for name, access in fiona.supported_drivers.items():
    print('{}: {}'.format(name, access))
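
The access value printed for each driver is a short string of mode letters (r for read, a for append, w for write), so the mapping can be filtered for drivers that support a given mode. The sketch below runs on a hand-written sample dictionary whose entries are illustrative assumptions, not an authoritative list; with Fiona installed, `fiona.supported_drivers` could be passed in instead. The helper name is hypothetical.

```python
def drivers_with_mode(drivers: dict[str, str], mode: str) -> list[str]:
    """Return the sorted driver names whose access string contains `mode`."""
    return sorted(name for name, access in drivers.items() if mode in access)


# Illustrative sample only; real values depend on the Fiona/GDAL build.
sample_drivers = {
    "GeoJSON": "raw",
    "GPKG": "raw",
    "ESRI Shapefile": "raw",
    "TopoJSON": "r",
}

print(drivers_with_mode(sample_drivers, "r"))
print(drivers_with_mode(sample_drivers, "w"))
```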

And a hacky workaround for reading in-memory zipped data in fiona 1.7.1 or earlier:

import requests
import uuid
import fiona
import geopandas as gpd
from osgeo import gdal

request = requests.get('https://github.com/OSGeo/gdal/blob/trunk/autotest/ogr/data/poly.zip?raw=true')
vsiz = '/vsimem/{}.zip'.format(uuid.uuid4().hex)  # gdal/ogr requires a .zip extension

gdal.FileFromMemBuffer(vsiz, bytes(request.content))
with fiona.Collection(vsiz, vsi='zip', layer='poly') as f:
    gdf = gpd.GeoDataFrame.from_features(f, crs=f.crs)
    print(gdf.head())

Answered by user2856 on December 15, 2021
