Geostatistics in SAGA: Strange variogram output

Question

I'm very new to geostatistics. I'm using a variogram to visualize a dataset (available as .shp here). The data look like this, or can be downloaded from this link (the updated zip file contains shp, shx, prj, mshp, dbf, cpg). The sensor value is in the column value (air pollution concentration).

The shape looks "inverted" compared to the common variogram shape. Does it violate any law? Or is there another type of data that has a variogram like that?
Update:
Steps

The CSV data contain lat, lon, and a physical value (e.g. temperature). The data were taken in the morning over several days.
Import into SAGA GIS, projecting to wgs72 to get units in meters.
Geoprocessing > Spatial and Geostatistics > Variogram
Scatter plot of variance vs. distance. There is also a "var. cum" option in the program.

saQuist · Accepted Answer

Unfortunately, I cannot help you with your implementation in SAGA GIS, but I believe the problem lies with the data rather than with the implementation. I have been playing around with the data in R, so I'll share the scripts and outputs below. If you want to learn geostatistics in R, I recommend the courses DataCamp Spatial Stats in R and DataCamp Spatial Data in R.

So if we look at the data, we see that the sensor values are very skewed towards the left and that we do not have a clear spatial trend in the data. So, the sensor values do not seem to be dependent on their location.

# load packages 
library(sf)       # reading and plotting vector data
library(gstat)    # variogram estimation
library(magrittr) # the %>% pipe

# read the data 
sensordata <- read_sf("72_variogram_20210111_aoi_sore.shp")
sensordata$Value <- as.numeric(sensordata$Value) # convert the Value column to numeric

# plot histogram of sensor values
hist(sensordata$Value, main = "Histogram of Sensor Values") 
# plot map of the sensor values 
plot(sensordata["Value"], key.pos = 4, main = "Sensor Values")

We can decide to transform the data to make it more normally distributed. If we take the log transform, for instance, we start to see more of a pattern, but it still doesn't help much. (Note that the data become harder to interpret once they are transformed.)

# take a log transform of your data 
sensordata$logvalue <- log(sensordata$Value) 
hist(sensordata$logvalue, main = "Histogram of Log Transformed Sensor Values") 
plot(sensordata["logvalue"], key.pos = 4)
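One rough way to check whether a log transform actually reduced the skew is to compare the sample skewness before and after. The sketch below uses simulated log-normal values as a stand-in for the sensor readings; the `skewness` helper and the simulated vector `values` are assumptions made for illustration and are not part of the original data or scripts. Note that `log()` only makes sense for strictly positive values.

```r
# Sample skewness (third standardized moment); hypothetical helper.
skewness <- function(v) {
  v <- v[is.finite(v)]
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

set.seed(1)
# Simulated, strictly positive, heavily skewed values (NOT the real sensor data).
values <- rlnorm(500, meanlog = 3, sdlog = 1)

# log() is undefined for zero or negative readings, so check first.
stopifnot(all(values > 0))

skew_raw <- skewness(values)
skew_log <- skewness(log(values))

cat("skewness before log transform:", round(skew_raw, 2), "\n")
cat("skewness after log transform: ", round(skew_log, 2), "\n")
```

A skewness close to zero after the transform suggests a roughly symmetric distribution. With the real data you would pass `sensordata$Value` instead of `values`.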

Now, let's look at the semivariogram. If we look at this brief explanation of a semivariogram, we expect a curve that begins somewhere near the origin and rises with distance until it levels off at a plateau. The intercept at the shortest distances is the "nugget", or your short-distance variation (including measurement error). The plateau is the "sill", or roughly the total variation in your data. The graph shows how the dissimilarity between observations depends on the distance between them. Usually, things that are close by are more similar than things that are far away. If it does not matter whether you are close by or far away, then there is no spatial dependency in your data, and your semivariogram is "pure nugget", or a straight horizontal line.
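To make these terms concrete, here is a minimal base R sketch (not the gstat implementation) of the classical empirical semivariogram: half the mean squared difference between pairs of observations, binned by separation distance. The function name `empirical_variogram`, the simulated coordinates, and the exponential covariance with range parameter 0.2 are all assumptions chosen for illustration; none of it is derived from the sensor data.

```r
# Classical empirical semivariogram: gamma(h) = mean((z_i - z_j)^2) / 2
# over all pairs whose separation distance falls into the bin around h.
empirical_variogram <- function(x, y, z, cutoff, n_bins = 10) {
  d  <- as.matrix(dist(cbind(x, y)))            # pairwise distances
  sq <- outer(z, z, function(a, b) (a - b)^2)   # squared value differences
  keep <- upper.tri(d) & d > 0 & d <= cutoff    # use each pair only once
  breaks <- seq(0, cutoff, length.out = n_bins + 1)
  bin <- cut(d[keep], breaks = breaks, include.lowest = TRUE)
  data.frame(
    dist  = as.numeric(tapply(d[keep], bin, mean)),       # mean distance per bin
    gamma = as.numeric(tapply(sq[keep], bin, mean)) / 2,  # semivariance per bin
    np    = as.numeric(table(bin))                        # number of pairs per bin
  )
}

set.seed(42)
n <- 300
x <- runif(n)
y <- runif(n)

# Case 1: values unrelated to location, so we expect a flat "pure nugget" line.
z_noise <- rnorm(n)

# Case 2: spatially correlated values drawn from an exponential covariance
# (a small diagonal term is added for numerical stability).
D <- as.matrix(dist(cbind(x, y)))
C <- exp(-D / 0.2) + diag(1e-8, n)
z_corr <- as.numeric(t(chol(C)) %*% rnorm(n))

vg_noise <- empirical_variogram(x, y, z_noise, cutoff = 0.5)
vg_corr  <- empirical_variogram(x, y, z_corr,  cutoff = 0.5)

print(vg_noise)  # gamma stays near var(z_noise) at all distances
print(vg_corr)   # gamma is small at short distances and rises towards a sill
```

Plotting `gamma` against `dist` for both cases gives the two textbook shapes: a horizontal line for the noise and a rising curve that levels off for the correlated field.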

In your case, the semivariogram is odd indeed. To the left, we see the camel-shaped figure, which would imply something like "if you are close by, the observations are all over the place, but if you are further away (400 m), the observations are more similar". But if we take the semivariogram of the transformed sensor values, we start to see more of a straight horizontal line.

# make a gstatVariogram object for the sensor values 
vgval <-   gstat(id = c("Value"), formula = Value~1, data = sensordata) %>% variogram()
# make a gstatVariogram object for the log transformed sensor values 
vglogval <-   gstat(id = c("logvalue"), formula = logvalue~1, data = sensordata) %>% variogram()
# plot the semivariograms 
plot(vgval, main = "Semivariogram of Sensor Values")
plot(vglogval, main = "Semivariogram of Log-Transformed Values")

So by transforming the data, we confirmed what could be suspected from the start, namely that the data are not spatially dependent. Because we see an almost straight horizontal line in the semivariogram, it does not matter whether two observations are far apart or close together: on average, their values differ by about the same amount. Therefore, the variation measured by your sensor does not depend on where you are, but on other things, such as the time of the measurement or the calibration of your instrument.

This is my conclusion based on the data as they were, without any context of the situation. If you believe there should be a spatial dependency in the data that is tied to some variable you have recorded, I recommend subsetting the data by that variable and running the variogram again.
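As a rough sketch of that last suggestion, the snippet below splits a table by a grouping column and computes one semivariogram per group with gstat. The data are simulated, and the column names `day`, `x`, `y`, and `value` are hypothetical stand-ins, since I don't know which variables you recorded.

```r
library(gstat)

set.seed(7)
# Simulated stand-in for the sensor table (NOT the real data).
# `day` is a hypothetical grouping column, e.g. the measurement date.
sim <- data.frame(
  x   = runif(180, 0, 1000),   # projected coordinates in meters
  y   = runif(180, 0, 1000),
  day = rep(c("d1", "d2", "d3"), each = 60)
)
sim$value <- rlnorm(180)

# One empirical semivariogram of the log-transformed values per day.
vg_by_day <- lapply(split(sim, sim$day), function(sub) {
  variogram(log(value) ~ 1, locations = ~x + y, data = sub)
})

print(names(vg_by_day))
print(head(vg_by_day[["d1"]]))
```

Each element of `vg_by_day` can then be plotted with `plot()` as before. If one of the subsets shows a rising curve with a sill while the pooled data do not, that would suggest the pooled variogram was mixing measurements that should not be compared directly.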
