Geographic Information Systems Asked by Antonello on May 18, 2021
I have two files:

- fileLUse.tif defines land cover classes in GeoTiff format (from https://www.esa-landcover-cci.org/?q=node/164)
- fileRegions.shp defines administrative borders and is a shapefile (from https://gadm.org)

Both files are in EPSG:4326, which, if I understood it well, is the WGS84 datum with geographic (not projected) coordinates, and both cover the whole World.
I need to select the land use classes from fileLUse.tif whose value is in a certain range ([40-130]) and compute the area and the area ratio of this aggregate for each region in fileRegions.shp.
My "output" would then be a CSV with region name (from fileRegions), total area, share of aggregate land use.
My main problem is that the files are not projected, so I don't really know how to give them a unit, as all the "equal area" projections I have seen are specific to a certain area of the World.
Which approach should I take? I could use QGIS, GRASS, R, Python… (I'm on Linux)
[EDIT]
What I have done so far:

1. Select the range and build a 0/1 indicator raster:
gdal_calc.py -A fileLUse.tif --outfile=fileNatVegAreas.tif --calc="logical_and(A>=40,A<=140)"

2. Compute zonal statistics of the indicator over the regions:
from rasterstats import zonal_stats
stats = zonal_stats("fileRegions.shp", "fileNatVegAreas.tif")
What I am left with: stats is a vector of dictionaries with the various stats, e.g.:

>>> stats[0]
{'min': 0.0, 'max': 1.0, 'mean': 0.5842002754571467, 'count': 739861}
How do I link my stats back to the field "RegionName" in my shapefile, and possibly export everything as a CSV?
In the end I did it like this:
gdal_calc.py -A fileLUse.tif --outfile=natVegIndicator2019.tif --calc="logical_and(A>=40,A<=140)"
For some reason this created an 8 GB map out of a 300 MB original tif map.
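The blow-up is most likely because gdal_calc.py writes an uncompressed Float64 band by default, while the 0/1 indicator only needs one byte per cell. A hedged sketch of the same command with an explicit output type and a compression creation option (both are standard gdal_calc.py/GDAL options; flag values here are an assumption about what suits this dataset):

```shell
# Sketch: force a Byte band and DEFLATE compression so the 0/1 indicator
# raster stays close to (or below) the size of the source file.
gdal_calc.py -A fileLUse.tif --outfile=natVegIndicator2019.tif \
    --calc="logical_and(A>=40,A<=140)" \
    --type=Byte --co="COMPRESS=DEFLATE"
```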
from rasterstats import zonal_stats
import geopandas as gpd
import pandas as pd

vectorBoundariesFile = "gadm36_1.shp"
rasterDataFile = "natVegIndicator2019.tif"
sqKmPerCell = (300*300)/(1000*1000)  # nominal 300 m cells of the CCI land-cover raster

# Zonal statistics of the 0/1 indicator raster over each administrative region
stats = zonal_stats(vectorBoundariesFile, rasterDataFile, stats=['mean', 'count', 'sum'])
regdf = gpd.read_file(vectorBoundariesFile)

countCells = pd.Series([i['count'] for i in stats])  # all valid cells in the region
sqKm = countCells * sqKmPerCell                      # nominal region area
means = pd.Series([i['mean'] for i in stats])        # mean of a 0/1 raster = share in range
sums = pd.Series([i['sum'] for i in stats])          # cells whose class is in range

data = [regdf['GID_0'], regdf['NAME_0'], regdf['GID_1'], regdf['NAME_1'], means, sqKm, countCells, sums]
headers = ["GID_0", "NAME_0", "GID_1", "NAME_1", "natVegRatio", "SqKm", "countAllCells", "countForCell"]
out = pd.concat(data, axis=1, keys=headers)
out.to_csv("vegIndex2019.csv")
Another (and perhaps better) approach is to just repeat step #2 passing the parameter categorical=True to the zonal_stats function call (see the rasterstats user manual) and then compute the ratio of the desired range ex-post from the dataframe in Python.
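The ex-post computation could be sketched as follows. The stats list below is a hypothetical example of what zonal_stats(..., categorical=True) returns (one dict per region, mapping class code to cell count; the counts are invented for illustration):

```python
import pandas as pd

# Hypothetical categorical zonal_stats output: class code -> cell count per region
stats = [
    {10: 1200, 60: 800, 130: 500, 210: 300},
    {30: 400, 90: 1000, 160: 600},
]

df = pd.DataFrame(stats).fillna(0)                   # rows: regions, columns: class codes
inRange = [c for c in df.columns if 40 <= c <= 140]  # classes in the desired range
natVegRatio = df[inRange].sum(axis=1) / df.sum(axis=1)
print(natVegRatio.tolist())  # share of in-range cells per region
```

Because the ratio is taken per region, it does not depend on the nominal per-cell area.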
Note that the final SqKm could be biased and differ from official country-level statistics. I learned that even just "measuring an area" over a large part of the Earth is not a trivial task. However, using the counts of cells to obtain the ratio of the different land classes should be an unbiased method.
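The bias can be illustrated with a back-of-the-envelope sketch (my assumptions: a spherical Earth of mean radius 6371 km, and a 10 arc-second grid, i.e. the "300 m" cell size is only nominal at the equator). The true area of a geographic cell shrinks with latitude, so a constant sqKmPerCell overestimates areas away from the equator:

```python
import math

R = 6371.0  # mean Earth radius in km (spherical approximation)

def cell_area_km2(lat_deg, cell_deg=1/360):
    """Spherical area of a cell_deg x cell_deg geographic cell with its
    lower edge at latitude lat_deg: R^2 * dlon * (sin(lat2) - sin(lat1))."""
    dlon = math.radians(cell_deg)
    phi1 = math.radians(lat_deg)
    phi2 = math.radians(lat_deg + cell_deg)
    return R**2 * dlon * (math.sin(phi2) - math.sin(phi1))

# A 10 arc-second cell covers roughly 0.095 km^2 at the equator,
# but only about half of that at 60 degrees latitude.
print(cell_area_km2(0.0), cell_area_km2(60.0))
```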
Answered by Antonello on May 18, 2021