Choosing efficient output format for spatial data in R

Geographic Information Systems Asked by max norton on June 8, 2021

I’m assembling a moderately large spatial dataset as simple features in R (4.0.5).

Is there a good way to select the most performance-efficient output format/OGR driver to save my assembled data?

I’m interested in a general approach, but here are some particulars of my current case:

  • The data consists of multiple vector types plus non-spatial columns.
  • The spatial inputs include ESRI shapefiles, ESRI file geodatabases, and GeoJSON.
  • My purpose in writing the data is to read back into R for analysis.
  • My constraints are processing power and working memory, not disk space.

I’m looking for a general algorithm for selecting among the many available OGR formats.

One Answer

Performance can depend heavily on the structure and size of the data, the complexity of the geometry relative to the attributes, and so on. The problem space is highly multi-dimensional, so there is no straightforward, general approach.
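Since the answer depends on the data, one practical option is simply to time a round trip on a representative sample. The following is a minimal sketch, not a rigorous benchmark: it assumes the sf package is installed, the sample points and the helper `time_round_trip` are hypothetical names made up for illustration, and the three formats compared are only examples.

```r
library(sf)

# Hypothetical stand-in data: replace with a representative sample of the real layers.
set.seed(1)
n <- 1000
sample_pts <- st_as_sf(
  data.frame(id = seq_len(n), x = runif(n), y = runif(n)),
  coords = c("x", "y"), crs = 4326
)

# Hypothetical helper: write `obj` to a temporary file with the given extension,
# read it back, and return the elapsed seconds for each step.
time_round_trip <- function(obj, ext) {
  path <- tempfile(fileext = ext)
  on.exit(unlink(path), add = TRUE)
  if (ext == ".rds") {
    write_s <- system.time(saveRDS(obj, path))[["elapsed"]]
    read_s  <- system.time(back <- readRDS(path))[["elapsed"]]
  } else {
    # sf guesses the OGR driver from the file extension.
    write_s <- system.time(st_write(obj, path, quiet = TRUE))[["elapsed"]]
    read_s  <- system.time(back <- st_read(path, quiet = TRUE))[["elapsed"]]
  }
  stopifnot(nrow(back) == nrow(obj))  # sanity check: nothing was lost
  c(write = write_s, read = read_s)
}

# One column per format, rows "write" and "read" (seconds).
timings <- sapply(c(".gpkg", ".geojson", ".rds"), time_round_trip, obj = sample_pts)
timings
```

Timings from a single run on a small sample are noisy, so it is worth repeating the measurement and using data that resembles the real layers in size and geometry type.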

For your use case: GeoPackage. It can store several sets of spatial features in one file along with non-spatial tables, places few constraints on column name length, is well supported, and is an open specification. You can also use SQL to select subsets at load time. All good stuff.
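As a rough illustration of the GeoPackage route, here is a minimal sketch using the sf package. The example layers, the layer names, and the file name `assembled.gpkg` are all hypothetical; only `st_write`, `st_read`, and `st_layers` are real sf functions.

```r
library(sf)

# Hypothetical example data: one point layer and one polygon layer.
pts <- st_sf(
  id = 1:3,
  name = c("a", "b", "c"),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), st_point(c(2, 2)),
                    crs = 4326)
)
poly <- st_sf(
  id = 1L,
  geometry = st_sfc(
    st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))),
    crs = 4326
  )
)

gpkg <- file.path(tempdir(), "assembled.gpkg")  # hypothetical file name
unlink(gpkg)                                    # start from a clean file

# Each call adds a named layer to the same GeoPackage file.
st_write(pts,  gpkg, layer = "points",   quiet = TRUE)
st_write(poly, gpkg, layer = "polygons", quiet = TRUE)

# List the layers stored in the file.
st_layers(gpkg)$name

# Read back only a subset, filtered by SQL at load time.
subset_pts <- st_read(gpkg, query = "SELECT * FROM points WHERE id >= 2",
                      quiet = TRUE)
nrow(subset_pts)
```

Filtering with `query` means only the matching rows are materialised in R, which can help when working memory is the constraint.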

But there's one other option if all you care about is getting this data back into R: save each set of features as an .RDS file using saveRDS. It's not an OGR format, so you can't load it into other GIS software, but it would probably be the most efficient solution for an R -> files -> R round trip.
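The RDS round trip might look something like the sketch below. It uses base R only, with plain data frames as hypothetical stand-ins for the real feature sets; `saveRDS` serialises any R object, so an sf object can be passed in the same way (with sf loaded when reading it back). The directory and layer names are made up for illustration, and `compress = FALSE` is an assumption chosen to trade disk space for less CPU work.

```r
# Hypothetical stand-ins for the assembled feature sets.
layers <- list(
  points = data.frame(id = 1:3, x = c(0, 1, 2), y = c(0, 1, 2)),
  lookup = data.frame(id = 1:3, value = c(10.5, 20.1, 30.7))
)

out_dir <- file.path(tempdir(), "rds_layers")  # hypothetical output directory
dir.create(out_dir, showWarnings = FALSE)

# Write one .rds file per feature set; skipping compression favours speed over size.
for (nm in names(layers)) {
  saveRDS(layers[[nm]], file.path(out_dir, paste0(nm, ".rds")), compress = FALSE)
}

# Read everything back into a named list.
files <- file.path(out_dir, paste0(names(layers), ".rds"))
restored <- lapply(files, readRDS)
names(restored) <- names(layers)

# Check that the round trip preserved the objects.
identical(restored$points, layers$points)
```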

Note that most of the OGR formats are limited in functionality, derived from vendor-specific formats, or both. I would only use GeoPackages for saving spatial data to disk now, unless something forced my hand: an end user with a desperate and unfixable need for something else, such as a shapefile; a web-oriented application that needed a GeoJSON file (which is practically limited to WGS84 coordinates); or a GPS receiver, in which case I'd probably have to use GPX (but that only stores points and tracks, not polygons).

Correct answer by Spacedman on June 8, 2021
