Urban growth modeling with FUTURES on Hazel
Here we demonstrate how to run parallel jobs in GRASS using an urban growth modeling case study.
We will all share the PERMANENT mapset of the Location albers
with prepared data, and each student will create
their own mapset (named after their user name) for computations in their own folder
(see the description of this setup).
Typically, the PERMANENT mapset may reside on research storage to which you do not have write access.
Please do not modify anything in the PERMANENT mapset.
mkdir /share/gis714s23/$USER/grassdata/albers
ln -s /share/gis714s23/grassdata/albers/PERMANENT/ /share/gis714s23/$USER/grassdata/albers/
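As a quick optional check, list the contents of your new folder; the symlinked PERMANENT should point to the shared location:
ls -l /share/gis714s23/$USER/grassdata/albers/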
Install pynodelauncher:
module use --append /usr/local/usrapps/geospatial/modulefiles
module load gcc
module load grass
module load PrgEnv-intel
pip install git+https://github.com/ncsu-landscape-dynamics/pynodelauncher.git
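To confirm that the installation succeeded, you can optionally check that pip can find the package:
pip show pynodelauncher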
Start GRASS GIS in a temporary mapset:
grass --tmp-mapset /share/gis714s23/$USER/grassdata/albers/
Check that you can access the data:
g.list type=raster,vector -mt
Install FUTURES extension:
g.extension r.futures
Exit GRASS session (temporary mapset will be removed):
exit
We won't be going through the entire workflow, but rather demonstrating some parallel computation techniques on selected
parts of the workflow. The next three sections (computing slope, distance to features, and development pressure)
are independent, so you can submit all the jobs at the same time.
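For example, once you have saved the three job scripts from the following sections, you can submit them all at once (the file names slope.sh, dist.sh, and devpressure.sh are just examples, use whatever names you choose) and monitor them with bjobs:
bsub < slope.sh
bsub < dist.sh
bsub < devpressure.sh
bjobs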
Computing slope
The tool r.slope.aspect
can take advantage of multiple cores, because it uses the OpenMP library to automatically distribute
work among a given number of threads. OpenMP uses shared memory and therefore the job needs to run on a single node.
We will submit a parallel job
that requests 4 cores and specifies that those cores need to be on one compute node (span[hosts=1]).
The number of cores specified in the header should match the number in the nprocs parameter.
We will compute slope in a separate mapset called slope.
Note that we don't have to set the computational region for the new mapset, because all new mapsets use the default region
that is set in PERMANENT.
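If you want to see the default region that new mapsets inherit, one optional way (with the grass module loaded) is to print it from a temporary mapset:
grass --tmp-mapset /share/gis714s23/$USER/grassdata/albers/ --exec g.region -p
The job script for computing slope follows: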
#!/bin/bash
#BSUB -n 4
#BSUB -W 15
#BSUB -R "rusage[mem=1GB]"
#BSUB -R span[hosts=1]
#BSUB -oo slope_out
#BSUB -eo slope_err
#BSUB -J slope
module use --append /usr/local/usrapps/geospatial/modulefiles
module load grass
# remove this mapset which may exist from a previous run of this job
rm -rf /share/gis714s23/$USER/grassdata/albers/slope
# create a new mapset with -c flag
grass -c /share/gis714s23/$USER/grassdata/albers/slope --exec r.slope.aspect elevation=DEM slope=slope nprocs=4
Submit this job with a different number of cores and compare the run times. Change the name of the mapset if you want to run both jobs at the same time.
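One simple way to compare run times is to look at the resource usage summary that LSF appends to the job's output file once the job finishes (the exact wording of the report may vary slightly):
grep -A 10 "Resource usage summary" slope_out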
Computing distance
Here we compute distance to roads, water, and forest, which will serve as predictors. Since the tool r.grow.distance
is not internally parallelized, we will execute the distance computations for roads, water, and forest
so that they run concurrently on 3 cores. This can be achieved simply by executing them in the background.
Alternatively, you can split the computation and execute it as 3 separate jobs (a sketch of this alternative follows the job script below).
#!/bin/bash
#BSUB -n 3
#BSUB -W 3:00
#BSUB -R "rusage[mem=1GB]"
#BSUB -oo dist_out
#BSUB -eo dist_err
#BSUB -J dist
module use --append /usr/local/usrapps/geospatial/modulefiles
module load grass
rm -rf /share/gis714s23/$USER/grassdata/albers/roads
rm -rf /share/gis714s23/$USER/grassdata/albers/water
rm -rf /share/gis714s23/$USER/grassdata/albers/forest
# run in background by appending & after the tool
grass -c /share/gis714s23/$USER/grassdata/albers/roads --exec r.grow.distance input=roads distance=dist_to_roads -m &
grass -c /share/gis714s23/$USER/grassdata/albers/water --exec r.grow.distance input=water distance=dist_to_water -m &
grass -c /share/gis714s23/$USER/grassdata/albers/forest --exec r.grow.distance input=forest distance=dist_to_forest -m &
# this will wait for the processes to finish before starting the next set of processes
wait
grass /share/gis714s23/$USER/grassdata/albers/roads --exec r.mapcalc "log_dist_to_roads = log(dist_to_roads + 1)" &
grass /share/gis714s23/$USER/grassdata/albers/water --exec r.mapcalc "log_dist_to_water = log(dist_to_water + 1)" &
grass /share/gis714s23/$USER/grassdata/albers/forest --exec r.mapcalc "log_dist_to_forest = log(dist_to_forest + 1)" &
wait
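As a sketch of the alternative mentioned above, each distance computation could also be submitted as its own single-core job directly with bsub. This assumes the modules are loaded in your submission shell (bsub propagates the environment by default); the job names and output file names are illustrative:
for FEATURE in roads water forest
do
    rm -rf /share/gis714s23/$USER/grassdata/albers/${FEATURE}
    bsub -n 1 -W 3:00 -R "rusage[mem=1GB]" -J dist_${FEATURE} -oo dist_${FEATURE}_out -eo dist_${FEATURE}_err \
        "grass -c /share/gis714s23/$USER/grassdata/albers/${FEATURE} --exec r.grow.distance input=${FEATURE} distance=dist_to_${FEATURE} -m"
done
The r.mapcalc step for the log-transformed distances would then be submitted the same way once these jobs finish.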
Computing development pressure
Here we combine the two previous approaches to compute development pressure for 2001 and 2019. The tool r.futures.devpressure is internally parallelized, so we can run it for both years concurrently and request a certain number of cores for each of them. Notice that we then need to request twice as many cores (on a single node).
#!/bin/bash
#BSUB -n 16
#BSUB -W 10:00
#BSUB -R span[hosts=1]
#BSUB -R "rusage[mem=2GB]"
#BSUB -oo devpressure_out
#BSUB -eo devpressure_err
#BSUB -J devpressure
module use --append /usr/local/usrapps/geospatial/modulefiles
module load grass
rm -rf /share/gis714s23/$USER/grassdata/albers/devpressure_2001
rm -rf /share/gis714s23/$USER/grassdata/albers/devpressure_2019
# run in background by appending & after the tool
grass -c /share/gis714s23/$USER/grassdata/albers/devpressure_2001 --exec r.futures.devpressure input=urban_no_roads_2001 output=devpressure_2001 size=20 gamma=1 nprocs=8 &
grass -c /share/gis714s23/$USER/grassdata/albers/devpressure_2019 --exec r.futures.devpressure input=urban_no_roads_2019 output=devpressure_2019 size=20 gamma=1 nprocs=8 &
# this will wait for the processes to finish
wait
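Once the job finishes, you can optionally confirm that both development pressure rasters were created:
grass --tmp-mapset /share/gis714s23/$USER/grassdata/albers/ --exec g.list type=raster pattern="devpressure_*" mapset="*"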
Running the simulation
Once all the previous steps are finished, we can run the FUTURES simulation. We will use already precomputed input CSV files. First, in your home directory create a new file named "pga.sh" with the following content:
#!/bin/bash
g.region raster=state_${1}
r.mask raster=masking
r.futures.pga developed=urban_2019 development_pressure=devpressure_2019@devpressure_2019 \
compactness_mean=0.5 compactness_range=0.1 discount_factor=1 \
predictors=log_dist_to_forest@forest,log_dist_to_roads@roads,log_dist_to_water@water,slope@slope \
n_dev_neighbourhood=20 devpot_params=/share/gis714s23/input_csv/best_model.csv num_neighbors=4 seed_search=probability \
development_pressure_approach=gravity gamma=1 scaling_factor=1 num_steps=31 \
subregions=state_${1} demand=/share/gis714s23/input_csv/demand_${1}.csv \
output=out_state_${1}_seed_${2} patch_sizes=/share/gis714s23/input_csv/patch_sizes_${1}.csv random_seed=${2}
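Before submitting the full job, you may want to test pga.sh for a single state and seed as a small separate job; the state, seed, memory, and time limits below are only example values:
module use --append /usr/local/usrapps/geospatial/modulefiles
module load grass
rm -rf /share/gis714s23/$USER/grassdata/albers/pga_37_1
bsub -n 1 -W 4:00 -R "rusage[mem=16GB]" -J pga_test -oo pga_test_out -eo pga_test_err \
    "grass -c /share/gis714s23/$USER/grassdata/albers/pga_37_1 --exec bash /home/$USER/pga.sh 37 1"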
Then create a script "bsub_futures.sh" with the content below:
#!/bin/bash
#BSUB -n 21
#BSUB -R "rusage[mem=150GB]"
#BSUB -R span[ptile=10]
#BSUB -W 40:00
#BSUB -oo futures_out
#BSUB -eo futures_err
#BSUB -J futures
module use --append /usr/local/usrapps/geospatial/modulefiles
module load grass
module load PrgEnv-intel
rm -f /share/gis714s23/${USER}/pga_jobs.sh
for SEED in `seq 1 10`
do
    for STATE in 1 12 13 37 45 47
    do
        rm -rf /share/gis714s23/$USER/grassdata/albers/pga_${STATE}_${SEED}
        echo grass -c /share/gis714s23/$USER/grassdata/albers/pga_${STATE}_${SEED} --exec bash /home/${USER}/pga.sh ${STATE} ${SEED} >> /share/gis714s23/${USER}/pga_jobs.sh
    done
done
mpiexec python -m mpi4py -m pynodelauncher /share/gis714s23/${USER}/pga_jobs.sh
This submission script first generates a file, which looks like this:
grass -c /share/gis714s23/akratoc/grassdata/albers/pga_1_1 --exec bash /home/akratoc/pga.sh 1 1
grass -c /share/gis714s23/akratoc/grassdata/albers/pga_12_1 --exec bash /home/akratoc/pga.sh 12 1
...
This file contains commands that will be distributed for computation across multiple nodes, each of them computing the simulation for a single state and random seed. Notice that the submission parameters are slightly more complex. We estimate the memory to be 10-15 GB per state (it will differ based on the state size), so we can fit around 10 concurrent simulations on a node when we request 150 GB. Therefore, we specify that with rusage[mem=150GB] and span[ptile=10].
When we request 20 cores, that results in requesting 2 nodes.
We request 21 instead of 20 to include one extra core for MPI thread management. This is not necessarily the most efficient way;
we could also submit separate jobs, but a large number of job submissions is generally discouraged. There will be a total of 60 (6 states x 10 seeds) FUTURES computations on 18 cores; once a computation from the list finishes,
the next one is started. When you submit your job, it may be pending for a while, since you are requesting a lot of resources for a long time.
For more complex submissions, you can consult HPC IT.
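While the job is running (or after it finishes), one way to check progress is to count how many of the 60 expected output rasters already exist:
grass --tmp-mapset /share/gis714s23/$USER/grassdata/albers/ --exec g.list type=raster pattern="out_state_*_seed_*" mapset="*" | wc -l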
Postprocessing
We need to patch together the results for the individual states. For that we will use the tool r.patch,
which is internally parallelized.
First, create a bash script patch.sh
that accepts 2 parameters (seed number and number of cores),
loads GRASS, lists all FUTURES output rasters belonging to the specified seed,
and executes r.patch in a newly created mapset.
#!/bin/bash
# get parameters
SEED=${1}
N=${2}
module use --append /usr/local/usrapps/geospatial/modulefiles
module load grass
module load PrgEnv-intel
# list results for each state for a single seed and save to a variable
MAPS=`grass --tmp-mapset /share/gis714s23/$USER/grassdata/albers --exec \
g.list type=raster pattern="out_state_*_seed_${SEED}" mapset="*" -m separator=comma`
rm -rf /share/gis714s23/$USER/grassdata/albers/results_${SEED}
grass -c /share/gis714s23/$USER/grassdata/albers/results_${SEED} --exec r.patch input=${MAPS} output="out_seed_${SEED}" nprocs=${N}
grass /share/gis714s23/$USER/grassdata/albers/results_${SEED} --exec r.colors map=out_seed_${SEED} raster=out_state_1_seed_${SEED}@pga_1_${SEED}
Then create another script that submits the patching jobs for each seed (submit_patch.sh):
N=4
for SEED in {1..10}
do
bsub -n ${N} -W 1:00 -R span[hosts=1] -J patch_${SEED} -oo patch_${SEED}_out -eo patch_${SEED}_err bash patch.sh ${SEED} ${N}
done
Run the script with bash submit_patch.sh. This will submit 10 jobs.
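After the jobs finish, a quick way to confirm that all 10 completed successfully is to check their LSF output files (the exact wording of the report may vary slightly); this should print 10:
grep -l "Successfully completed" patch_*_out | wc -l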
The results are rasters out_seed_1@results_1, out_seed_2@results_2, etc.
Report the run times for all your computations and include a figure with the results.
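One possible way to create a figure for the report is to export one of the patched results to PNG with r.out.png (the output path below is just an example; you could also render the maps in the GRASS GUI instead):
grass /share/gis714s23/$USER/grassdata/albers/results_1 --exec r.out.png input=out_seed_1 output=/share/gis714s23/$USER/out_seed_1.png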