Parallel GRASS jobs: Difference between revisions

From GRASS-Wiki
Jump to navigation Jump to search
(gcollector cleanup fixed)
(fix broken URLs)
 
(72 intermediate revisions by 7 users not shown)
Line 1: Line 1:
== Parallel GRASS jobs ==
== Parallel GRASS jobs ==


NOTE: GRASS 6 libraries are NOT thread safe (except for GPDE, see below).
The idea of parallel GRASS GIS jobs is to speed up the computation.


=== Background ===
=== Background ===
This presentation from 2022 is a "must see":
'''[https://htmlpreview.github.io/?https://github.com/petrasovaa/FUTURES-CONUS-talk/blob/main/foss4g2022.html Tips for parallelization in GRASS GIS]'''


This you should know about GRASS' behaviour concerning multiple jobs:
This you should know about GRASS' behaviour concerning multiple jobs:
* You can run '''multiple processes''' in '''multiple locations''' ([http://grass.osgeo.org/grass64/manuals/html64_user/helptext.html what's that?]). ''Peaceful coexistence.''
* You can run '''multiple processes''' in '''multiple locations''' ([https://grass.osgeo.org/grass-stable/manuals/helptext.html what's that?]). ''Peaceful coexistence.''
* You can run multiple processes in the same mapset, but only if the region is untouched (but it's really not recommended). Better launch each job in its own mapset within the location.
* You can run multiple processes in the same mapset, but only if the region is untouched. If you are unsure, it's recommended to  launch each job in its own mapset within the location.
* You can run '''multiple processes''' in the '''same location''', but in '''different mapsets'''. ''Peaceful coexistence.''
* You can run '''multiple processes''' in the '''same location''', but in '''different mapsets'''. ''Peaceful coexistence.''


=== Approach ===
=== Approaches ===


Essentially there are at least two approaches of "poor man" parallelization without modifying GRASS source code:
See also the [[Parallelizing Scripts]] wiki page
* split map into spatial chunks (possibly with overlap to gain smooth results)
* time series: run each map elaboration on a different node.


== Parallelized code ==
== File locking ==
=== GPDE using OpenMP ===


The only parallelized library in GRASS >=6.3 is GRASS Partial Differential Equations Library (GPDE). Read more in [[OpenMP]].
GRASS doesn't perform any locking on the files within a GRASS
database, so the user may end up with one process reading a file while
another process is in the middle of writing it. The most problematic
case is the WIND file, which contains the [[current region]], although
there are others.


=== GPU Programming ===
If a user wants to run multiple commands concurrently, steps need to
be taken to ensure that this type of conflict doesn't happen. For the
current region, the user can use the WIND_OVERRIDE environment variable to
specify a named region which should be used instead of the WIND file.


* See [[GPU]]
Or the user can use the GRASS_REGION environment variable to specify the
region parameters (the syntax is the same as the WIND file, but with
newlines replaced with semicolons). With this approach, the region can
only be read, not modified.


== Cluster and Grid computing ==
Problems can also arise if the user reads files from another mapset while
another session is modifying those files. The WIND file isn't an issue
=== PBS Scheduler ===
here, nor are the files containing raster data (which are updated
 
atomically), but the various support files may be.
Note: For PBS details, read on [http://en.wikipedia.org/wiki/Portable_Batch_System here].


You need essentially two scripts:
See below for ways around these limitations.
* GRASS job script (which takes the name(s) of map(s) to elaborate from environmental variables
* script to launch this GRASS-script as job for each map to elaborate


'''General steps (for multiple serial jobs on many CPUs):'''
== Working with tiles ==


* Job definition
Huge map reprojection example:
** PBS setup (in the header): define calculation time, number of nodes, number of processors, amount of RAM for individual job;
** data are stored in centralized directory which is seen by all nodes;
* Job execution (launch of jobs)
** user launches all jobs ("qsub"), they are submitted to the queue. Use the [[GRASS and Shell#GRASS_Batch_jobs|GRASS_BATCH_JOB]] variable to define the name of the elaboration script.
** the scheduler optimizes among all user the execution of the jobs according to available resources and requested resources;
** for the user this means that 0..max jobs are executed in parallel (unless the administrators didn't define either priority or limits). The user can then observe the job queue ("showq") to see other jobs ahead and scheduling of own jobs. Once a job is running, the cluster possibly sends a notification email to the user, the same again when a job is terminating.
** At the end of the elaboration call a second batch job which only contains g.copy to copy the result into a common mapset. There is a low risk of race condition here in case that two nodes finish at the same time but this could be even trapped in a loop which checks if the target mapset is locked and, if needed, launches g.copy again 'till success.
* Job planning
** The challenging part for the user is to estimate the execution time since PBS kills jobs which exceed the requested time. The same applies to the request for number of nodes and CPUs per node as well as the amount of needed RAM. Usually tests are needed to see the performance in order to impose the values correctly in the PBS script (see below).


'''How to write the scripts:'''
'''Q:''' I'd like to try splitting a large raster into small chunks and then projecting each one separately, sending the project command to the background. The problem is that, if the GRASS command changes the region settings, things might not work.
To avoid race conditions, you can automatically generate multiple mapsets in a given location. When you start GRASS (in your script) with path to grassdata/location/mapset/ and the requested mapset does not yet exist, it will be automatically created. So, as first step in your job script, be sure to run
      g.mapsets add=mapset1_with_data[,mapset2_with_data]
to make the data which you want to elaborate accessible. You would then loop over many map names (e.g. "aqua_lst1km20020706.LST_Night_1km.filt") and launch the script with map name as first parameter:


<source lang="bash">
'''A:''' {{cmd|r.proj}} doesn't change the region.
      ------- snip (you need to update this for PBS stuff and save as 'launch_grassjob_55min.sh' -----------
      #!/bin/sh
      ### Project number, enter if applicable (needed to manage your CPU hours)
      #PBS -A HPC2N-2008-001
      #
      ### Job name - defaults to name of submit script
      #PBS -N modis_interpolation_GRASS_RST.sh
      #
      ### Output files - defaults to jobname.[eo]jobnumber
      #PBS -o modis_rst.$MYMODIS.out
      #PBS -e modis_rst.$MYMODIS.err
      #
      ### Mail on - a=abort, b=beginning, e=end - defaults to a
      #PBS -m abe
      ### Number of nodes - defaults to 1:1
      ### Requesting 1 nodes with 1 processor per node:
      #PBS -l nodes=1:ppn=1
      ### Requesting time - defaults to 30 minutes
      #PBS -l walltime=00:55:00
      ### amount of physical memory (in MB) each processor will use with a line:
      #PBS -l pvmem=3000m
      # we'll call this script below in a loop, giving the name of the map to elaborate as parameter
      MYMAPSET=$1
      MYLOC=nc_spm08
      TARGETMAPSET=results
     
      # define as env. variable the batch job which does the real GRASS elaboration (so, which contains the GRASS commands)
      # (job scripts need executable flag be set)
      GRASS_BATCH_JOB=/shareddisk/modis_job.sh
      grass64 -c -text /shareddisk/grassdata/$MYLOC/$MYMAPSET
     
      # since we write results to a temporary mapset, copy over result to target mapset
      # this we launch again as small GRASS job
      export INMAP=${CURRMAP}_rst
      export INMAPSET=$MYMAPSET
      export OUTMAP=$INMAP
      export GRASS_BATCH_JOB=/shareddisk/gcopyjob.sh
      grass64 -c -text  /shareddisk/grassdata/$MYLOC/$TARGETMAPSET
      if [ $? -ne 0 ] ; then
        # race condition with other copy job...
        echo "ERROR while running $GRASS_BATCH_JOB, retrying..."
        sleep 2
        $HOME/binaries/bin/grass64 -text grassdata/$MYLOC/$TARGETMAPSET
        if [ $? -ne 0 ] ; then
            echo "ERROR while running $GRASS_BATCH_JOB, retrying..."
            sleep 7
            $HOME/binaries/bin/grass64 -text grassdata/$MYLOC/$TARGETMAPSET
            if [ $? -ne 0 ] ; then
              echo "FINAL ERROR while running $GRASS_BATCH_JOB, giving up!"
              exit 1
            else
              echo "Done $GRASS_BATCH_JOB of <$INMAP>"
            fi
        else
            echo "Done $GRASS_BATCH_JOB of <$INMAP>"
        fi
      else
        echo "Done $GRASS_BATCH_JOB of <$INMAP>"
      fi


Processing the map in chunks requires setting a different region for
each command. That can be done by creating named regions and using the
WIND_OVERRIDE environment variable, e.g.:


      exit 0
      ------- snap ----------
</source>
You see, that GRASS is run twice. Note that you need GRASS Version >=6.3 to make use of GRASS_BATCH_JOB (if variable is present, GRASS automatically executes that job instead of launching the normal interactive user interface).
The script 'gcopyjob.sh' simply contains:
<source lang="bash">
<source lang="bash">
      ------- snip -----------
      g.region ... save=region1
      #!/bin/sh
      g.region ... save=region2
     
      ...
      # copy files from one mapset to another avoiding race conditions on target mapset
      WIND_OVERRIDE=region1 r.proj ... &
      LIST=`g.mlist type=rast mapset=$INMAPSET`
      WIND_OVERRIDE=region2 r.proj ... &
     
      ...
      for map in $LIST ; do
        g.copy rast=$map@$INMAPSET,$map --o
        if [ $? -ne 0 ] ; then
                # maybe race condition with other copy job...
                g.message -e 'ERROR while <g.copy ...>, retrying...'
                sleep 2
                g.copy rast=$map@$INMAPSET,$map --o
                if [ $? -ne 0 ] ; then
                        g.message -e 'ERROR while <g.copy ...>, retrying...'
                        sleep 3
                        g.copy rast=$map@$INMAPSET,$map --o
                        if [ $? -ne 0 ] ; then
                                g.message -e 'FINAL ERROR while <g.copy ...>, giving up!'
                                exit 1
                        else
                                echo "Done g.copy of <$map>"
                        fi
                else
                        echo "Done g.copy of <$map>"
                fi
        else
                echo "Done g.copy of <$map>"
        fi
      done
     
      ------- snap ----------
</source>
</source>


'''Launching many jobs:'''
(for python see the grass.use_temp_region() function)


We do this by simply looping over all map names to elaborate:
The main factor which is likely to affect parallelism is the fact that the processes won't share their caches, so there'll be some degree of inefficiency if there's substantial overlap between the source areas for the processes.
<source lang="bash">
      cd /shareddisk/
      # loop and launch (we just pick the names from the GRASS DB itself; here: do all maps)
      # instead of launching immediately, we create a launch script:
      for i in `find grassdata/myloc/modis_originals/cell/ -name '*'` ; do
          NAME=`basename $i`
          echo qsub -v MYMODIS=$NAME ./launch_grassjob_55min.sh
      done | sort > launch1.sh


      # now really launch the jobs:
If you have more than one such map to project, processing entire maps in parallel might be a better choice (so that you get N maps projected in 10 hours rather than 1 map in 10/N hours).
      sh launch1.sh
</source>


That's it! Emails will arrive to notify upon begin, abort (hopefully not!) and end of job execution.
== Parallelized code ==


=== Grid Engine ===
=== OpenMP ===


* URL: <strike>[http://waybackmachine.org/*/gridengine.sunsource.net http://gridengine.sunsource.net/]</strike> (new site at http://sourceforge.net/projects/gridscheduler/)
Good for a single system with a multi-core CPU.
* Lauching jobs: qsub
* Navigating the Grid Engine System with GUI: qmon
* Job statstics: qstat -f
* User statstics: qacct -o


Example script 'launch_grassjob_on_GE.sh' to lauch GRASS jobs with Grid Engine:
Configure GRASS 7 with:
<source lang="bash">
  ./configure --with-openmp
#!/bin/sh
   
# Serial job, MODIS LST elaboration
# Markus Neteler, 2008, 2011
## GE settings
# request Bourne shell as shell for job
#$ -S /bin/sh
# run in current working directory
#$ -cwd
# We want Grid Engine to send mail when the job begins and when it ends.
#$ -M neteler@somewhere.it
#$ -m abe
# uncomment next line for bash debug output
#set -x
############
# SUBMIT from home dir to node:
#  cd $HOME
#  qsub -cwd -l mem_free=3000M -v MYMODIS=aqua_lst1km20020706.LST_Night_1km.filt launch_grassjob_on_GE.sh
#
# WATCH
#  watch 'qstat | grep "$USER\|job-ID"'
#    Under the state column you can see the status of your job. Some of the codes are
#    * r: the job is running
#    * t: the job is being transferred to a cluster node
#    * qw: the job is queued (and not running yet)
#    * Eqw: an error occurred with the job
#
###########


# better say where to find libs and bins:
==== GPDE using OpenMP ====
export PATH=$PATH:$HOME/binaries/bin:/usr/local/bin:$HOME/sge_batch_jobs
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/binaries/lib/


# generate machine (blade) unique TMP string
The only parallelized library in GRASS >=6.3 is GRASS Partial Differential Equations Library (GPDE) and the gmath library in GRASS 7. Read more in [[OpenMP]].
UNIQUE=`mktemp`
MYTMP=`basename $UNIQUE`


# path to GRASS binaries and libraries:
=== Python ===
export GISBASE=/usr/local/grass-6.4.1svn
export PATH=$PATH:$GISBASE/bin:$GISBASE/scripts
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$GISBASE/lib


# use process ID (PID) as GRASS lock file number:
[http://grass.osgeo.org/grass73/manuals/libpython/pygrass.modules.interface.html?highlight=parallelmodulequeue#pygrass.modules.interface.module.ParallelModuleQueue PyGRASS ParallelModuleQueue]
export GIS_LOCK=$$
export GRASS_MESSAGE_FORMAT=plain
export TERM=linux


# use Grid Engine jobid + unique string as MAPSET to avoid GRASS lock
=== pthreads ===
MYMAPSET=sge.$JOB_ID.$MYTMP
MYLOC=patUTM32
MYUSER=$MYMAPSET
TARGETMAPSET=modisLSTinterpolation


# Note: remember to to use env vars for variable transfer
Note: only used in the r.mapcalc ''parser''!
GRASS_BATCH_JOB=$HOME/binaries/bin/modis_interpolation_GRASS_RST.sh


# Note: remember to to use env vars for variable transfer
Good for a single system with a multi-core CPU.
export MODISMAP="$MYMODIS"


# print nice percentages:
Configure GRASS 7 with:
export GRASS_MESSAGE_FORMAT=plain
./configure --with-pthread


################ nothing to change below ############
The ''parser'' of {{cmd|r.mapcalc}} in GRASS 7 has been parallelized using GNU {{wikipedia|pthreads}}. The computation itself is executed serially.
#execute_your_command


echo "************ Starting job at `date` on blade `hostname` *************"
=== Bourne and Python Scripts ===


# DEBUG turn on bash debugging, i.e. print all lines to
Good for a single system with a multi-core CPU.
# standard error before they are executed
# set -x


echo "SGE_ROOT $SGE_ROOT"
Often very easy & can be done without modification to the main source code.
echo "The cell in which the job runs $SGE_CELL. Got $NSLOTS slots on $NHOSTS hosts"


# temporarily store *locally* the stuff to avoid NFS overflow/locking problem
* See the [[Parallelizing Scripts]] wiki page
## Remove XXX to enable
grep XXX/storage/local /etc/mtab >/dev/null && (
if test "$MYMAPSET" = ""; then
  echo "You are crazy to not define \$MYMAPSET. Emergency exit."
  exit 1
else
  rm -rf /storage/local/$MYMAPSET
fi
        # (create new mapset on the fly)
mkdir /storage/local/$MYMAPSET
ln -s /storage/local/$MYMAPSET /grassdata/$MYLOC/$MYMAPSET
if [ $? -ne 0 ] ; then
  echo "ERROR: Something went wrong creating the local storage link..."
  exit 1
fi
) || ( # in case that /storage/local is unmounted:
        # (create new mapset on the fly)
        mkdir /grassdata/$MYLOC/$MYMAPSET
)


=== OpenMPI ===


# Set the global grassrc file to individual file name
Good for a multi-system cluster connected by a fast network.
MYGISRC="$HOME/.grassrc6.$MYUSER.`uname -n`.$MYTMP"


#generate GISRCRC
The {{AddonCmd|GIPE}} ''i.vi.mpi'' addon module has been created as a MPI ({{wikipedia|Message Passing Interface}}) implementation of the {{AddonCmd|GIPE}} ''i.vi'' addon module.
echo "GISDBASE: /grassdata" > "$MYGISRC"
* See also the [[Agriculture and HPC]] wiki page.
echo "LOCATION_NAME: $MYLOC" >> "$MYGISRC"
echo "MAPSET: $MYMAPSET" >> "$MYGISRC"
echo "GRASS_GUI: text" >> "$MYGISRC"


# path to GRASS settings file
=== MPI Programming ===
export GISRC=$MYGISRC


# fix WIND in the newly created mapset
There is a sample implementation at module level in [https://github.com/OSGeo/grass-addons/tree/grass8/src/imagery/i.vi.mpi i.vi.mpi]
cp "/grassdata/$MYLOC/PERMANENT/DEFAULT_WIND" "/grassdata/$MYLOC/$MYMAPSET/WIND"
db.connect -c --quiet


# run the GRASS job:
=== GPU Programming ===
. $GRASS_BATCH_JOB


# cleaning up temporary files
Good for certain kinds of calculations (e.g. ray-tracing) on a single system with a fast graphics card.
$GISBASE/etc/clean_temp > /dev/null
rm -f ${MYGISRC}
rm -rf /tmp/grass6-$USER-$GIS_LOCK
 
# move data back from local disk to NFS
## Remove XXX to enable
grep XXX/storage/local /etc/mtab >/dev/null && (
if test "$MYMAPSET" = ""; then
  echo "You are crazy to not define \$MYMAPSET. Emergency exit."
  exit 1
else
  # rm temporary link:
  rm -fr /grassdata/patUTM32/$MYMAPSET/.tmp
  rm -f /grassdata/$MYLOC/$MYMAPSET
  # copy stuff
  cp -rp /storage/local/$MYMAPSET /grassdata/$MYLOC/$MYMAPSET
  if [ $? -ne 0 ] ; then
    echo "ERROR: Something went wrong copying back from local storage to NFS directory..."
    exit 1
  fi
  rm -rf /storage/local/$MYMAPSET
fi
)


# leave breadcrumb to find related mapsets back when moving results
There is a version of the {{cmd|r.sun}} module which has been modified to use {{wikipedia|OpenCL}}. (works; still experimental)
# into final mapset:
touch /grassdata/$MYLOC/$MYMAPSET/$JOB_NAME


echo "************ Finished at `date` *************"
* See [[GPU]]
exit 0
</source>


Configure GRASS GIS with:
  ./configure --with-opencl


The '''real GRASS job''' is just the bare script (no #!/bin/sh nor exit 0 must be included) containing the commands to be executed. As defined in 'launch_grassjob_on_GE.sh', it must be stored as '$HOME/binaries/bin/modis_interpolation_GRASS_RST.sh':
== Cluster and Grid computing ==
 
<source lang="bash">
# GRASS job for usage in Grid Engine
###################################
# MN 2008, 2011
 
# $MYMODIS is passed on via qsub job submission (-v variable)
r.info $MYMODIS
# do something
</source>


Finally, since we leave the temporary mapsets as is in the Grid Engine processing, we have to '''collect all results''' and store them into the target mapset. We do this by checking all mapsets which contain the breadcrumb left therein (launch job script name as emtpy file, see above). This scripts 'gcollector_job.sh' checks and transfers the map(s):
A cluster or grid computing system  consists of a number of computers that are tightly coupled together. The manager or master controls the utilization of compute nodes.
<source lang="bash">
#!/bin/sh


# Markus Neteler, 2011
=== Job scheduler ===
# Copy all raster maps from the temporaneous Grid Engine mapsets into target mapset


if [ $# -ne 3 ] ; then
Common job schedulers are [https://slurm.schedmd.com/ SLURM], [http://www.adaptivecomputing.com/products/open-source/torque/ TORQUE], [http://www.pbspro.org/ PBS], [https://arc.liv.ac.uk/trac/SGE Son of Grid Engine], and [https://kubernetes.io/ kubernetes]
  echo "Script to move Grid Engine job results to GRASS target mapset"
  echo ""
  echo "Usage: $0 targetlocation targetmapset name_of_launch_script"
  echo "Example:"
  echo "  $0 patUTM32  modis_final launch_grassjob_on_GE.sh"
  exit 0
fi


# define location to work in
See also [https://slurm.schedmd.com/rosetta.html Rosetta Stone of Workload Managers]
GRASSDBROOT=/grassdata
MYLOC=$1
MYMAPSET=$2
BREADCRUMBFILE=$3


# path to GRASS binaries and libraries:
A job consists of tasks, e.g. processing of a single raster map in a time series of many raster maps. Jobs are assigned to a queue and started as soon as a slot in the queue is free. Jobs are removed from the queue once they finished.
export GISBASE=/usr/local/grass-6.4.1svn
export PATH=$PATH:$GISBASE/bin:$GISBASE/scripts
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$GISBASE/lib


# use process ID (PID) as GRASS lock file number:
=== GRASS on a cluster ===
export GIS_LOCK=$$
export GRASS_MESSAGE_FORMAT=plain


# path to GRASS settings file
If you want to launch several GRASS jobs in parallel, you might consider to launch each job in its own mapset.
export GISRC=$HOME/.grassrc6
GISDBASE=`cat $GISRC | grep GISDBASE | cut -d' ' -f2`


TOCOLLECTFILE=to_be_collected_$JOB_NAME.csv
* set up chunks of data to be processed (temporal or spatial, temporal chunks are usually easier to handle)
find $GRASSDBROOT/$MYLOC/sge* -type f -name "$BREADCRUMBFILE" > $GRASSDBROOT/$MYLOC/$TOCOLLECTFILE
* write a script with the actual processing of one chunk
* write a script that initializes GRASS, creates a unique mapset, executes the script with the actual processing, and copies the results to a common mapset
* add that script as a task to a job, create one job for each data chunk


rm -f $GRASSDBROOT/$MYLOC/clean_$TOCOLLECTFILE
The general concept is to create a script that
for myname in `cat $GRASSDBROOT/$MYLOC/$TOCOLLECTFILE` ; do
    basename `dirname $myname` >> $GRASSDBROOT/$MYLOC/clean_$TOCOLLECTFILE
done


LIST=`cat $GRASSDBROOT/$MYLOC/clean_$TOCOLLECTFILE`
# creates a unique temporary mapset
rm -f $GRASSDBROOT/$MYLOC/clean_$TOCOLLECTFILE
# creates a unique temporary GISRC file for this mapset
# does the processing in this mapset
# changes to the target mapset by updating the GISRC file, verify with g.gisenv
# copies results from the temporary mapset to the target mapset
# deletes the temporary mapset and GISRC file


countmapsets=0
Such a script should not call grassXY and must be executable outside GRASS, because it is establishing a GRASS session, performing some processing, and closing the GRASS session, all by itself. See also [[GRASS_and_Shell|GRASS and Shell]].
countmaps=0
for mapset in $LIST ; do
    countmapsets=`expr $countmapsets + 1`
    MAPS=`g.mlist rast mapset=$mapset`
    for map in $MAPS ; do
        countmaps=`expr $countmaps + 1`
g.copy rast=$map@$mapset,$map --o
    done
done


rm -f $GRASSDBROOT/$MYLOC/$MYMAPSET/.gislock
Such a script will take arguments to specify the particular data to be processed.
unset $GISBASE $GIS_LOCK $GRASS_MESSAGE_FORMAT $GISRC


echo "----------------------------------------------------"
A job specification for an HPC job scheduler would then contain this script with specific arguments.
echo "In total $countmaps maps transferred from $countmapsets mapsets into $MYMAPSET"
echo "Now you can remove all maps with:
  cd $GISDBASE/$MYLOC/
  rm -rf \`cat $GRASSDBROOT/$MYLOC/$TOCOLLECTFILE | sed \"s+$BREADCRUMBFILE++g\"\`
  rm -f $GRASSDBROOT/$MYLOC/$TOCOLLECTFILE
"
exit 0
</source>


The common bottleneck when using GRASS on a cluster is often disk I/O. Try to start the jobs with nice/ionice to reduce strain on the storage devices.


To submit a '''single''' job from home directory to cluster node, enter:
== Cloud computing ==
<source lang="bash">
cd $HOME
qsub -cwd -l mem_free=3000M -v MYMODIS=aqua_lst1km20020706.LST_Night_1km.filt launch_grassjob_on_GE.sh
sh gcollector_job.sh patUTM32 modis_final launch_grassjob_on_GE.sh
</source>


GRASS GIS is running in the cloud as web processing service backend. Have a look at:


To submit a '''series of jobs''', use a "for" loop as described in the PBS section above.
{{YouTube|jg2pb_Xjq8Y|desc=GRASS 7 in the cloud (by Sören Gebbert)}}
In the HOME directory stdout and stderr logs will be stored for each job.
After '''all jobs are completed''', run the 'gcollector_job.sh' script.


=== OpenMosix ===
This Open Cloud GIS has been set up in a private Amazon compatible cloud environment using:
* Ubuntu 10.04 LTS and 10.10 cloud server edition
* Eucalyptus Cloud
* GRASS GIS 7 latest svn
* PyWPS latest svn
* wps-grass-bridge latest svn
* QGIS 1.7 with a modified QWPS plugin


''NOTE: The openMosix Project has officially closed as of March 1, 2008.''
For latest development, visit: https://github.com/actinia-org/actinia-core


If you want to launch several GRASS jobs in parallel, you have to launch each job in its own mapset. Be sure to indicate the mapset correctly in the GISRC file (see above). You can use the process ID (PID, get with $$ or use PBS jobname) to generate a almost unique number which you can add to the mapset name.
== GRASS GIS on VPS ==


Instructions to run GRASS GIS on a commercial VPS to do some memory-intensive operations:


Now you could launch the jobs on an [http://openmosix.sourceforge.net/ openMosix cluster] (just install openMosix on your colleague's computers...).
https://plantarum.ca/2014/08/19/medium-performance-cluster-computing/


== Hints for NFS users ==
== Hints for NFS users ==
* AVOID script forking on the cluster (but inclusion via ". script.sh" works ok). This means that the GRASS_BATCH_JOB approach is prone to '''fail'''. It is highly recommended to simply set a series of environmental variables to define the GRASS session, see [[GRASS_and_Shell#Automated_batch_jobs:_Setting_the_GRASS_environmental_variables|here]] how to do that.
* AVOID script forking on the cluster (but inclusion via ". script.sh" works ok). This means that the GRASS_BATCH_JOB approach is prone to '''fail'''. It is highly recommended to simply set a series of environmental variables to define the GRASS session, see [[GRASS_and_Shell#Automated_batch_jobs:_Setting_the_GRASS_environmental_variables|here]] how to do that.
* be careful with concurrent file writing (use "lockfile" locking, the lockfile command is provided by [http://www.procmail.org/ procmail]);
* be careful with concurrent file writing (use "lockfile" locking, the lockfile command is provided by [http://www.procmail.org/ procmail]);
* store as much temporary data as possible (even the final maps) on the local blade disks if you have.
* store as much temporary data as possible (even the final maps) on the local blade disks if you have.
* finally collect all results from local blade disks *after* the parallel job execution in a sequential collector jog (I am writing that now) to not kill NFS. For example, Grid Engine offers a "hold" function to only execute the collector job after having done the rest.
* finally collect all results from local blade disks *after* the parallel job execution in a sequential collector job (I am writing that now) to not kill NFS. For example, Grid Engine offers a "hold" function to only execute the collector job after having done the rest.
* If all else fails, and the I/O load is not too great, consider using {{wikipedia|sshfs}} with ssh passkeys instead of NFS.
* In some situations it is necessary to preserve the same directory structure on all nodes, and symlinks are a nice way to do that, but some (closed source 3rd party which will remain nameless) software insists on expanding symlinks. In this situation the [http://code.google.com/p/bindfs/ bindfs] FUSE extension can help. It is safer to use than "mount" binds, and you don't have to be root to set them up. As with ''sshfs'' there is a performance penalty so it may not be appropriate in high I/O situations.
 
== Error: Too many open files ==
 
When working with long time series and {{cmd|r.series}} starts to complain that files are missing/not readable or the message
Too many open files
 
For a solution, see [[Large_raster_data_processing#Number_of_open_files_limitation]]


== Misc Tips & Tricks ==
== Misc Tips & Tricks ==
See the poor-man's multi-processing script on the [[Parallelizing Scripts]] wiki page. This approach has been used in the {{cmd|r3.in.xyz}} script.
== Workshop on Parallelization ==
Haedrich, C., Petrasova, A. Parallelization for big EO data processing, OpenGeoHub Summer School 2023
* https://doi.org/10.5446/63123


* When working with long time series and {{cmd|r.series}} starts to complain that files are missing/not readable, check if you opened more than 1024 files in this process. this is the typical limit (check with ulimit -n). To overcome this problem, the admin has to add in /etc/security/limits.conf something like this:
'''Schedule'''


  # Limit user nofile - max number of open files
Session 1
  * soft  nofile 1500
* [Slides] Introduction to Parallelization, GRASS GIS and Python
  * hard  nofile 1800
* [Lab] Introduction to Parallelization with GRASS GIS and Python Notebook


For less invasive solution which still requires root access, see [http://lists.osgeo.org/pipermail/grass-dev/2008-November/040886.html here].
Session 2
* [Lab] Parallelization Case Study: Urban Growth Modeling


* See the poor-man's multi-processing script on the [[OpenMP]] wiki page.
Jupyter Notebook:
* https://github.com/ncsu-geoforall-lab/opengeohub-2023
** https://github.com/ncsu-geoforall-lab/opengeohub-2023/blob/main/01_intro_to_GRASS_parallelization.ipynb


== See also ==
== See also ==


This Wiki:
* [[GRASS_and_Shell#Automated_batch_jobs:_Setting_the_GRASS_environmental_variables|GRASS batch jobs]] (by settings env. variables)
* The [[OpenMP]] wiki page.
* The [[OpenMP]] wiki page.
* [http://gfoss.blogspot.com/2008/11/building-cluster-for-grass-gis-and.html Building a cluster for GRASS GIS and other software from the OSGeo stack]
* The [[Parallelizing Scripts]] wiki page.
* [[GPU]]
* [[GPU]] computing


Elsewhere:
* [https://neteler.org/blog/building-a-cluster-for-grass-gis-and-other-software-from-the-osgeo-stack/ Building a cluster for GRASS GIS and other software from the OSGeo stack]
* [https://research.csc.fi/geocomputing Using supercomputers for spatial analysis]
* [https://github.com/csc-training/geocomputing/tree/master/grass GRASS batch job and parallelization examples for a supercomputer with SLURM]


[[Category:Parallelization]]
[[Category:Parallelization]]
[[Category: massive data analysis]]

Latest revision as of 19:43, 27 December 2023

Parallel GRASS jobs

The idea of parallel GRASS GIS jobs is to speed up the computation.

Background

This presentation from 2022 is a "must see":

Tips for parallelization in GRASS GIS

This you should know about GRASS' behaviour concerning multiple jobs:

  • You can run multiple processes in multiple locations (what's that?). Peaceful coexistence.
  • You can run multiple processes in the same mapset, but only if the region is untouched. If you are unsure, it's recommended to launch each job in its own mapset within the location.
  • You can run multiple processes in the same location, but in different mapsets. Peaceful coexistence.

Approaches

See also the Parallelizing Scripts wiki page

File locking

GRASS doesn't perform any locking on the files within a GRASS database, so the user may end up with one process reading a file while another process is in the middle of writing it. The most problematic case is the WIND file, which contains the current region, although there are others.

If a user wants to run multiple commands concurrently, steps need to be taken to ensure that this type of conflict doesn't happen. For the current region, the user can use the WIND_OVERRIDE environment variable to specify a named region which should be used instead of the WIND file.

Or the user can use the GRASS_REGION environment variable to specify the region parameters (the syntax is the same as the WIND file, but with newlines replaced with semicolons). With this approach, the region can only be read, not modified.

Problems can also arise if the user reads files from another mapset while another session is modifying those files. The WIND file isn't an issue here, nor are the files containing raster data (which are updated atomically), but the various support files may be.

See below for ways around these limitations.

Working with tiles

Huge map reprojection example:

Q: I'd like to try splitting a large raster into small chunks and then projecting each one separately, sending the project command to the background. The problem is that, if the GRASS command changes the region settings, things might not work.

A: r.proj doesn't change the region.

Processing the map in chunks requires setting a different region for each command. That can be done by creating named regions and using the WIND_OVERRIDE environment variable, e.g.:

       g.region ... save=region1
       g.region ... save=region2
       ...
       WIND_OVERRIDE=region1 r.proj ... &
       WIND_OVERRIDE=region2 r.proj ... &
       ...

(for python see the grass.use_temp_region() function)

The main factor which is likely to affect parallelism is the fact that the processes won't share their caches, so there'll be some degree of inefficiency if there's substantial overlap between the source areas for the processes.

If you have more than one such map to project, processing entire maps in parallel might be a better choice (so that you get N maps projected in 10 hours rather than 1 map in 10/N hours).

Parallelized code

OpenMP

Good for a single system with a multi-core CPU.

Configure GRASS 7 with:

./configure --with-openmp

GPDE using OpenMP

The only parallelized library in GRASS >=6.3 is GRASS Partial Differential Equations Library (GPDE) and the gmath library in GRASS 7. Read more in OpenMP.

Python

PyGRASS ParallelModuleQueue

pthreads

Note: only used in the r.mapcalc parser!

Good for a single system with a multi-core CPU.

Configure GRASS 7 with:

./configure --with-pthread

The parser of r.mapcalc in GRASS 7 has been parallelized using GNU pthreads. The computation itself is executed serially.

Bourne and Python Scripts

Good for a single system with a multi-core CPU.

Often very easy & can be done without modification to the main source code.

OpenMPI

Good for a multi-system cluster connected by a fast network.

The GIPE i.vi.mpi addon module has been created as a MPI (Message Passing Interface) implementation of the GIPE i.vi addon module.

MPI Programming

There is a sample implementation at module level in i.vi.mpi

GPU Programming

Good for certain kinds of calculations (e.g. ray-tracing) on a single system with a fast graphics card.

There is a version of the r.sun module which has been modified to use OpenCL. (works; still experimental)

Configure GRASS GIS with:

 ./configure --with-opencl

Cluster and Grid computing

A cluster or grid computing system consists of a number of computers that are tightly coupled together. The manager or master controls the utilization of compute nodes.

Job scheduler

Common job schedulers are SLURM, TORQUE, PBS, Son of Grid Engine, and kubernetes

See also Rosetta Stone of Workload Managers

A job consists of tasks, e.g. processing of a single raster map in a time series of many raster maps. Jobs are assigned to a queue and started as soon as a slot in the queue is free. Jobs are removed from the queue once they finished.

GRASS on a cluster

If you want to launch several GRASS jobs in parallel, you might consider to launch each job in its own mapset.

  • set up chunks of data to be processed (temporal or spatial, temporal chunks are usually easier to handle)
  • write a script with the actual processing of one chunk
  • write a script that initializes GRASS, creates a unique mapset, executes the script with the actual processing, and copies the results to a common mapset
  • add that script as a task to a job, create one job for each data chunk

The general concept is to create a script that

  1. creates a unique temporary mapset
  2. creates a unique temporary GISRC file for this mapset
  3. does the processing in this mapset
  4. changes to the target mapset by updating the GISRC file, verify with g.gisenv
  5. copies results from the temporary mapset to the target mapset
  6. deletes the temporary mapset and GISRC file

Such a script should not call grassXY and must be executable outside GRASS, because it is establishing a GRASS session, performing some processing, and closing the GRASS session, all by itself. See also GRASS and Shell.

Such a script will take arguments to specify the particular data to be processed.

A job specification for an HPC job scheduler would then contain this script with specific arguments.

The common bottleneck when using GRASS on a cluster is often disk I/O. Try to start the jobs with nice/ionice to reduce strain on the storage devices.

Cloud computing

GRASS GIS is running in the cloud as web processing service backend. Have a look at:


GRASS 7 in the cloud (by Sören Gebbert)

This Open Cloud GIS has been set up in a private Amazon compatible cloud environment using:

  • Ubuntu 10.04 LTS and 10.10 cloud server edition
  • Eucalyptus Cloud
  • GRASS GIS 7 latest svn
  • PyWPS latest svn
  • wps-grass-bridge latest svn
  • QGIS 1.7 with a modified QWPS plugin

For latest development, visit: https://github.com/actinia-org/actinia-core

GRASS GIS on VPS

Instructions to run GRASS GIS on a commercial VPS to do some memory-intensive operations:

https://plantarum.ca/2014/08/19/medium-performance-cluster-computing/

Hints for NFS users

  • AVOID script forking on the cluster (but inclusion via ". script.sh" works ok). This means that the GRASS_BATCH_JOB approach is prone to fail. It is highly recommended to simply set a series of environmental variables to define the GRASS session, see here how to do that.
  • be careful with concurrent file writing (use "lockfile" locking, the lockfile command is provided by procmail);
  • store as much temporary data as possible (even the final maps) on the local blade disks if you have.
  • finally collect all results from local blade disks *after* the parallel job execution in a sequential collector job (I am writing that now) to not kill NFS. For example, Grid Engine offers a "hold" function to only execute the collector job after having done the rest.
  • If all else fails, and the I/O load is not too great, consider using sshfs with ssh passkeys instead of NFS.
  • In some situations it is necessary to preserve the same directory structure on all nodes, and symlinks are a nice way to do that, but some (closed source 3rd party which will remain nameless) software insists on expanding symlinks. In this situation the bindfs FUSE extension can help. It is safer to use than "mount" binds, and you don't have to be root to set them up. As with sshfs there is a performance penalty so it may not be appropriate in high I/O situations.

Error: Too many open files

When working with long time series and r.series starts to complain that files are missing/not readable or the message

Too many open files

For a solution, see Large_raster_data_processing#Number_of_open_files_limitation

Misc Tips & Tricks

See the poor-man's multi-processing script on the Parallelizing Scripts wiki page. This approach has been used in the r3.in.xyz script.

Workshop on Parallelization

Haedrich, C., Petrasova, A. Parallelization for big EO data processing, OpenGeoHub Summer School 2023

Schedule

Session 1

  • [Slides] Introduction to Parallelization, GRASS GIS and Python
  • [Lab] Introduction to Parallelization with GRASS GIS and Python Notebook

Session 2

  • [Lab] Parallelization Case Study: Urban Growth Modeling

Jupyter Notebook:

See also

This Wiki:

Elsewhere: