NCSU GIS 714:
Geospatial Computing and Simulations

Geostatistical simulations

This assignment was adapted from A Practical Guide to Geostatistical Mapping by T. Hengl. Download and unzip dataset baranjahill.

Start GRASS GIS and create new location (e.g. use Create new location in current GRASS database in Data tab toolbar or right click on GRASS database in Data tab). Going through the wizard name the new location baranja, and then in Select Coordinate Reference System select Read CRS from a georeferenced data file. Select the downloaded (and extracted) elevations.shp and finish creating the location. It asks if you want to import it, say no and import the dataset separately once the location is created with:

 v.import input=/path/to/elevations.shp  
 

Compute a smooth DEM and extract stream network

Get the basic information about the data, such as number of points, attribute names, and univariate statistics. Then explore the spatial pattern of the data points: first display selected subsets of points and then display all points. Add 1km grid to your display to understand the scale of the dataset.
v.info -c elevations
v.univar elevations column=VALUE
d.vect map=elevations where="cat<800" color=grey
d.vect map=elevations where="VALUE=180" color=brown
d.vect map=elevations size=1
d.grid 1000
What can you tell about the data? Looking at the spatial pattern, how they might have been acquired?

To compute a histogram of the given data distribution we can bin the points to 10m resolution grid and then run the histogram. The region is set to match grid created by R spsample function used later on.

g.region n=5074351 s=5070471 e=6555639 w=6551799 res=10 -p
v.to.rast elevations out=elevations_bin10m type=point use=attr attr="VALUE"
d.histogram elevations_bin10m
Can you explain the discrete spikes in the histogram?

Interpolate a smooth DEM using spline with tension with simultaneous computation of slope, aspect and curvatures to visualize the surface and its parameters.

v.surf.rst input=elevations zcolumn=VALUE elevation=elevation_10 slope=slope_10 aspect=aspect_10 pcurv=pcurv_10 tcurv=tcurv_10
d.histogram elevation_10
Why is the histogram of interpolated DEM so different from the histogram of binned given data?

Extract the streams. Which method for extracting streams are we using in terms of method used for flow direction, flow routing and handling of depressions? What does the threshold 300 mean?

r.watershed elevation=elevation_10 threshold=300 stream=stream10_300 accum=accum10
r.thin stream10_300 out=stream10_300t --o
r.to.vect stream10_300t out=stream10_300t type=line
d.vect stream10_300t co=blue

Estimate the uncertainty in extracted streams

We will estimate uncertainty in the extracted streams due to errors in elevation using a combination of R and GRASS GIS tools. To start R from GRASS type R in the GRASS terminal. You will need several packages:
install.packages(c("rgdal", "gstat", "geoR", "rgrass7"))
Read in libraries and vector data:
library(rgrass7)
library(rgdal)
library(gstat)
library(geoR)
use_sp()

elevations <- readVECT("elevations")
Subsample data and explore distribution of values in the data.
sel <- runif(length(elevations@data[[2]])) < 0.2
Z.geo <- as.geodata(elevations[sel,"VALUE"])
plot(Z.geo, qt.col=grey(runif(4)))
How many points are subsampled and displayed out of the given 6367? What is the method used for subsampling? What does Density in the histogram mean? Why is the histogram different from the histograms derived in GRASS GIS?

We plot empirical variogram in four directions using the subsampled data, Then we fit isotropic model variogram using the Matern covariance function:

plot(variog4(Z.geo, max.dist=1000, messages=FALSE), lwd=2)
Z.svar <- variog(Z.geo, max.dist=1000, messages=FALSE)
Z.vgm <- variofit(Z.svar, ini=c(var(Z.geo$data), 1000), fix.nugget=T, nugget=0)
env.model <- variog.model.env(Z.geo, obj.var=Z.svar, model=Z.vgm)
plot(Z.svar, envelope=env.model); lines(Z.vgm, lwd=2);
Why could we use isotropic model variogram? We used max distance 1000, compute the variogram with 3000 and discuss the differences. Which max.dist value would you recommend for interpolating or simulating a DEM from our point data set?

Compute 100 realizations of a DEM (10m resolution) using Stochastic Conditional Gaussian Simulation with Matern covariance function (this may take few minutes so be patient):

demgrid <- spsample(elevations, type="regular", cellsize=c(10,10), offset=c(0.5,0.5))
gridded(demgrid) <- TRUE
fullgrid(demgrid) <- TRUE
Z.ovgm <- vgm(psill=Z.vgm$cov.pars[1], model="Mat", range=Z.vgm$cov.pars[2], nugget=Z.vgm$nugget, kappa=1.2)
N.sim <- 100
DEM.sim <- krige(elevations$VALUE ~ 1, elevations, demgrid, Z.ovgm, nmax=30, nsim=N.sim)
fullgrid(DEM.sim) <- TRUE
spplot(DEM.sim[1:4], col.regions=grey(seq(0,1,0.025)))
Save simulated elevation rasters as GRASS rasters, compute a mean simulated DEM and its standard deviation. Derive streams from each of the simulated DEMs:
for(i in 1:N.sim) {writeRAST(DEM.sim[i], paste("simulated_", i, sep=""))}
execGRASS("g.list", type="raster", pattern="simulated_*", output="tmp_list_simulated.txt")
execGRASS("r.series", file="tmp_list_simulated.txt", output="simulated_mean", method="average")
execGRASS("r.series", file="tmp_list_simulated.txt", output="simulated_std", method="stddev")
for(i in 1:N.sim) {execGRASS("r.watershed", elevation=paste("simulated_", i, sep=""),
 threshold=30, stream=paste("stream_", i, sep=""), flags=c("quiet"))}
Compute number of streams extracted at each 10m grid cell (count), derive probability that a grid cell has a stream and associated error map:
execGRASS("g.list", type="raster", pattern="stream*", output="tmp_list_stream.txt")
execGRASS("r.series", file="tmp_list_stream.txt", output="count", method="count")
execGRASS("r.mapcalc", expression=paste("prob = float(count)/", N.sim))
execGRASS("r.mapcalc", expression="error = -prob * log(prob) - (1-prob)* log(1-prob)")
Quit R quit() and we can now visualize and analyze our results in GRASS GIS.

First, we compare the smooth DEM with the simulated mean and a single realization i=10. We use aspect to visualy compare the roughness of the DEMs. We compute the difference between the interpolated and simulated DEM.

r.colors simulated_mean rast=elevation_10
r.colors simulated_10 rast=elevation_10
r.colors simulated_std co=bcyr -e
r.colors aspect_10 co=aspect
r.slope.aspect simulated_mean slo=slp_mean asp=asp_mean
r.slope.aspect simulated_10 slo=slp_10 asp=asp_10
r.mapcalc "diff_spline_gsim = elevation_10 - simulated_mean"
r.colors diff_spline_gsim co=differences
What are the white spots in the simulated_mean map? Can you explain the pattern of standard deviations in simulated_std (hint: switch on the elevations point layer)? You can use File > mapswipe to compare asp10_10, asp10_mean and asp10_mean, aspect_10.

Assign new color ramp and visualize the stream probability map along with the streams extracted from the smooth DEM.

r.colors prob co=viridis -i
d.rast prob values=0.001-1.0
d.vect stream10_300t
Use 3D view to drape the probability raster prob over elevation layer simulated_mean. Where is the high probability of streams and where we cannot map the streams reliably? Which topographic parameter may explain the stream extraction uncertainty? How does the stream probability map compare with the streams extracted from the smooth DEM stream10_300t?