TransWikia.com

Accuracy of country shapefiles

Geographic Information Systems Asked on September 4, 2021

I am working with a country shapefile for Pakistan, but it does not seem to be completely accurate. Is this an inherent limitation of shapefiles, or could I find a more accurate shapefile, and if so, where?

For example, with the shapefile I am currently using, some places close to the border between Pakistan and Afghanistan are classified as being outside Pakistan, even though Google Maps shows them inside Pakistan (I have manually checked their latitude and longitude).

3 Answers

A shapefile is capable of representing features on the ground to within a centimetre of precision. Disagreements between two spatial data sets that claim to represent the same entity may occur for several reasons.

  • Representing a curved border with centimetre precision requires a lot of points along the border, which makes the data set very large. In practice spatial data is often "generalised": many of the points are dropped to create a smaller data file, which ends up a bit further from the ground truth in places.
  • Converting data between different projection systems (e.g. lat-long to some flat projection) can introduce differences when going back and forth. These are usually small, unless an incorrect projection is used at some stage or the border consists of long segments, which would be curves in lat-long but become straight lines in projected coordinates.
  • Borders do change through the years. Areas are disputed, and you may be looking at two different governments' ideas of where their borders are, or the same government's idea at different times.

However, none of these reasons is a limitation of shapefiles; they can occur with misuse of any spatial data format.
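To illustrate the projection point in the list above, here is a minimal sketch that takes one long segment between two made-up vertices and compares its midpoint when the segment is treated as straight in lat-long versus straight in spherical Web Mercator. The projection choice, the coordinates, and the function names are assumptions for illustration only, not taken from any real border data set.

```python
import math

R = 6378137.0  # sphere radius in metres used by spherical Web Mercator

def to_mercator(lon_deg, lat_deg):
    """Forward spherical Mercator: degrees -> metres."""
    x = R * math.radians(lon_deg)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

def from_mercator(x, y):
    """Inverse spherical Mercator: metres -> degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat

# Two hypothetical vertices joined by a single long segment (lon, lat).
a = (61.0, 25.0)
b = (75.0, 37.0)

# Midpoint if the segment is a straight line in lat-long.
mid_lonlat = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)

# Midpoint if the segment is a straight line in Mercator, converted back.
ax, ay = to_mercator(*a)
bx, by = to_mercator(*b)
mid_mercator = from_mercator((ax + bx) / 2, (ay + by) / 2)

lat_gap_deg = abs(mid_lonlat[1] - mid_mercator[1])
print(f"lat-long midpoint: {mid_lonlat}")
print(f"Mercator midpoint: {mid_mercator}")
print(f"latitude gap:      {lat_gap_deg:.3f} degrees")
```

The two midpoints share a longitude but not a latitude, so a place near the middle of such a segment could fall on either side depending on where the line was drawn straight. Adding intermediate vertices before reprojecting shrinks the gap.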

Answered by Spacedman on September 4, 2021

Given all of the political and military contention surrounding Pakistan's borders through the years, my guess is that you are seeing some "version" of the border on Google Maps that does not match the one in your shapefile. Try checking other sources too, like OpenStreetMap.

It is also possible that the shapefile you are using is intended for small-scale maps (as in, maps intended to show multiple countries all in one image, "zoomed out"), which entails a great deal of simplification, rounding off of corners, etc. This means that when you "zoom in" there will be spatial imprecision--not because the shapefile is "wrong," but because it is being used at a different scale than intended.
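As a rough sketch of how simplification can flip a point from "inside" to "outside", the example below tests one point against a detailed ring and against a generalised copy with a single vertex dropped. The coordinates are invented, the vertex is removed by hand rather than by a real generalisation algorithm such as Douglas-Peucker, and `point_in_polygon` is a plain ray-casting helper written for this example.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the closed ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical detailed border: a square with a small eastward bulge.
detailed = [(0, 0), (4, 0), (4, 2), (5, 3), (4, 4), (0, 4)]

# Generalised copy: the bulge vertex (5, 3) and its now-collinear
# neighbours on the east edge are dropped, leaving a plain square.
generalised = [(0, 0), (4, 0), (4, 4), (0, 4)]

# A hypothetical village sitting inside the bulge.
village = (4.4, 3.1)

in_detailed = point_in_polygon(*village, detailed)
in_generalised = point_in_polygon(*village, generalised)
print("inside detailed border:   ", in_detailed)
print("inside generalised border:", in_generalised)
```

The village is inside the detailed ring but outside the generalised one, even though both are perfectly valid representations of the same border at different scales.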

You can download very precise, high zoom-level national borders from OpenStreetMap here. If the border is contested, the data should include all of the different versions--though OSM depends on people digitizing manually, and so the coverage is better in some parts of the world than in others.

As others have suggested, you could also go with the US State Department's borders, or presumably Pakistan's national geographic agency has a data portal.

Answered by LAT on September 4, 2021

Shapefiles utilize IEEE 64-bit floating-point representation to store coordinates. They have a precision of at least micrometres at the Equator and 180th meridian using decimal degrees, far in excess of the capabilities of geodata collection. Shapefiles can map the logic gates of modern semiconductor CPUs within a UTM zone, if x-ray crystallography or a scanning electron microscope were used to collect the locations.
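The precision claim can be checked with a short calculation. This sketch uses `math.ulp` (available from Python 3.9) to get the spacing between adjacent 64-bit floats at 180 degrees, then converts it to metres along the Equator. The WGS84 equatorial radius of 6378137 m is an assumption of this example.

```python
import math

# Metres per degree of longitude along the Equator (WGS84 equatorial radius).
METRES_PER_DEGREE = 2 * math.pi * 6378137.0 / 360.0

# Smallest step between representable doubles at the largest longitude value.
ulp_degrees = math.ulp(180.0)
ulp_metres = ulp_degrees * METRES_PER_DEGREE

print(f"step at 180 degrees: {ulp_degrees:.3e} degrees")
print(f"on the ground:       {ulp_metres:.3e} metres")
```

The resulting step is far smaller than a micrometre, which supports the point that the storage format is not what limits accuracy.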

When working with global international boundaries, the US State Department "Large Scale International Boundaries" (LSIB) dataset is designed for accuracy (albeit to US policy), and its metadata includes (emphasis mine):

The LSIB is in WGS84 datum and is generally accurate to within a couple hundred meters or better.

If you compare this with your five decimal place coordinate values (precise to ~1 meter at the Equator), you can begin to see that the issue is not in the shapefile format, but the coordinate generalization of the dataset in use.
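The comparison above can be put into numbers. This sketch converts a count of decimal places in decimal degrees into an approximate ground distance at the Equator and sets it against an assumed 200 m figure standing in for "a couple hundred meters". The helper name and the 200 m value are illustrative choices for this example, not part of the LSIB metadata.

```python
import math

# Metres per degree along the Equator (WGS84 equatorial radius assumed).
METRES_PER_DEGREE = 2 * math.pi * 6378137.0 / 360.0

def decimal_places_to_metres(places):
    """Ground size of one unit in the last decimal place, at the Equator."""
    return 10.0 ** -places * METRES_PER_DEGREE

# Illustrative stand-in for "a couple hundred meters" of boundary accuracy.
ASSUMED_BOUNDARY_ACCURACY_M = 200.0

point_precision_m = decimal_places_to_metres(5)
ratio = ASSUMED_BOUNDARY_ACCURACY_M / point_precision_m

print(f"5 decimal places: about {point_precision_m:.2f} m")
print(f"boundary accuracy is roughly {ratio:.0f}x coarser than the point")
```

A point known to about a metre is being tested against a line that is only good to a couple of hundred metres, so mismatches near the border are expected.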

Large-scale boundaries are slower to render due to their storage requirements (often one hundred times larger than their small-scale brethren). Ultra-large-scale boundaries would be very difficult to use at most scales, and fantastically expensive to collect (even in places where the boundary is undisputed).

The problem here is not a matter of the data format, but of the applicability of the dataset to the intended use.

Answered by Vince on September 4, 2021
