TransWikia.com

BigQuery GIS table to GeoJSON type Feature Collection with SQL and/or Python

Geographic Information Systems Asked by SeaGo on May 2, 2021

I have some BigQuery tables with spatial data (lon, lat, point, and linestring) representing boat tracks.
I’m trying to get them into a GeoJSON featurecollection
like [https://geojson.org/], for an API output.
I have one method that uses Python to query my table into a pandas dataframe, which I can then use a function to create a GeoJSON featurecollection.

But, there’s so much data/day that it’d be ideal if I could perform some kind of efficient linestring union to turn my linestrings into multilinestrings, grouped by ship and date, and then simplify those features.

Overall, I can’t figure out a way to go from a BigQuery table with properties and linestring or multilinestring geography to a proper GeoJSON FeatureCollection.

My current method uses LEAD(lon), LEAD(lat) and lon, lat to create linestrings in the Python data2geojson() function:

# Query BigQuery table
sql = """SELECT 
    lon, lat, lead_lon, lead_lat, 
    CAST(CAST(timestamp AS DATE) as string) as date,
    UNIX_SECONDS(timestamp) as unix_secs,
    CAST(ship_num AS STRING) AS ship_num,
    op,
    CAST(knots AS FLOAT64) AS knots,
    point,
    linestring
    FROM `ship_segments`  
    WHERE timestamp BETWEEN '2020-04-16' AND '2020-04-17';"""

# Make into pandas dataframe
df = client.query(sql).to_dataframe()

#df to geojson fn
def data2geojson(df):
        features = []
        insert_features = lambda X: features.append(
                geojson.Feature(geometry=geojson.LineString(([X["lead_lon"], X["lead_lat"], X["knots"], X["unix_secs"]],
                                                             [X["lon"], X["lat"], X["knots"], X["unix_secs"]])),
                                properties=dict(date=X["date"],
                                                mmsi=X["ship_num"],
                                                operator=X["op"]
                                                )))
        df.apply(insert_features, axis=1)

        geojson_obj = geojson.dumps(geojson.FeatureCollection(features, indent=2, sort_keys=True), sort_keys=True, ensure_ascii=False)
        return(geojson_obj)

results = data2geojson(df)

This returns a GeoJSON kinda like:

{"features": [{"geometry": {"coordinates": [[-119.049945, 33.983277, 10.5502, 1587104709], [-119.034677, 33.975823, 10.5502, 1587104709]], "type": "LineString"}, "properties": {"date": "2020-04-17", "mmsi": "235098383", "operator": "Evergreen Marine Corp"}, "type": "Feature"}, {"geometry": {"coordinates": [[-120.176933, 34.282107, 22.7005, 1587114969], [-120.144453, 34.275147, 22.7005, 1587114969]], "type": "LineString"}, "properties": {"date": "2020-04-17", "mmsi": "235098383", "operator": "Evergreen Marine Corp"}, "type": "Feature"}, {"geometry": {"coordinates": [[-118.361737, 33.64647, 11.3283, 1587096305], [-118.356308, 33.643713, 11.3283, 1587096305]], "type": "LineString"}, "properties": {"date": "2020-04-17", "mmsi": "538005412", "operator": "Scorpio MR Pool Ltd"}, "type": "Feature"}, {"geometry": {"coordinates": [[-118.414667, 33.673013, 12.7684, 1587097278], [-118.411707, 33.671493, 12.7684, 1587097278]], "type": "LineString"}, "properties": {"date": "2020-04-17", "mmsi": "538005412", "operator": "Scorpio MR Pool Ltd"}, "type": "Feature"}, {"geometry": {"coordinates": [[-119.377783, 34.062612, 10.5456, 1587102119], [-119.384212, 34.064217, 10.5456, 1587102119]], "type": "LineString"}, "properties": {"date": "2020-04-17", "mmsi": "636018225", "operator": "Ocean Network Express Pte Ltd"}, "type": "Feature"}], "indent": 2, "sort_keys": true, "type": "FeatureCollection"}

Which can be visualized on a web map
[http://geojson.tools/][1]

But I’m trying something like this to join my linestrings into multilinestrings:

select 
ship_num,
date(timestamp) as date,
AVG(speed_knots) as avg_speed_knots,
st_union_agg(linestring) as multiline
from(
SELECT 
  *,
  row_number() OVER w AS num,
  ST_GeogPoint(lon,lat) as geom,
  LEAD(ST_GeogPoint(lon,lat)) OVER w AS geom2,
  ST_MAKELINE((ST_GeogPoint(lon,lat)), (LEAD(ST_GeogPoint(lon,lat)) OVER w)) AS linestring,
  LEAD(STRING(timestamp), 0) OVER w AS t1, 
  LEAD(STRING(timestamp), 1) OVER w AS t2,
  FROM
  `ship_data`
  where timestamp >= '2020-04-10'
  WINDOW w AS (PARTITION BY ship_num ORDER BY timestamp)) AS q
group by
ship_num, date(timestamp);

This gives me multilinestrings in a table, but I need to simplify them and then get them into a GeoJSON FeatureCollection output.

Any ideas that don’t use PostGIS?

One Answer

GDAL's ogr2ogr tool can do this if you output the result of that query to a CSV file.

ogr2ogr result.geojson result.csv 
  -f GeoJSON 
  -oo GEOM_POSSIBLE_NAMES=multiline 
  -oo KEEP_GEOM_COLUMNS=NO

Answered by Jeremy Malczyk on May 2, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP