Parallel GRASS jobs: Difference between revisions
(gcollector cleanup fixed) |
(PBS removed as outdated; layout improved) |
||
Line 26: | Line 26: | ||
== Cluster and Grid computing == | == Cluster and Grid computing == | ||
=== Grid Engine === | |||
* URL: <strike>[http://waybackmachine.org/*/gridengine.sunsource.net http://gridengine.sunsource.net/]</strike> (new site at http://sourceforge.net/projects/gridscheduler/) | |||
* | * Lauching jobs: qsub | ||
* | * Navigating the Grid Engine System with GUI: qmon | ||
* Job statstics: qstat -f | |||
* User statstics: qacct -o | |||
'''General steps (for multiple serial jobs on many CPUs):''' | '''General steps (for multiple serial jobs on many CPUs):''' | ||
* Job definition | * Job definition | ||
** | ** Grid Engine setup (in the header): define calculation time, number of nodes, number of processors, amount of RAM for individual job; | ||
** data are stored in centralized directory which is seen by all nodes; | ** data are stored in centralized directory which is seen by all nodes, often mounted via NFS; | ||
* Job execution (launch of jobs) | * Job execution (launch of jobs) | ||
** user launches all jobs ("qsub"), they are submitted to the queue | ** user launches all jobs ("qsub"), they are submitted to the queue. | ||
** the scheduler optimizes among all user the execution of the jobs according to available resources and requested resources; | ** the scheduler optimizes among all user the execution of the jobs according to available resources and requested resources; | ||
** for the user this means that 0..max jobs are executed in parallel (unless the administrators didn't define either priority or limits). The user can then observe the job queue (" | ** for the user this means that 0..max jobs are executed in parallel (unless the administrators didn't define either priority or limits). The user can then observe the job queue ("qstat") to see other jobs ahead and scheduling of own jobs. Once a job is running, the cluster possibly sends a notification email to the user, the same again when a job is terminating or is aborted. | ||
** At the end of the | ** At the end of the worker script call a second batch job which only contains g.copy to copy the result into a common mapset. | ||
* Job planning | * Job planning | ||
** The challenging part for the user is to estimate | ** The challenging part for the user is to estimate the request for number of nodes and CPUs per node as well as the amount of needed RAM. Usually tests are needed to see the performance in order to impose the values correctly in the Grid Engine script (see below). | ||
==== The Grid Engine script ==== | |||
Job launcher: 'launch_grassjob_on_GE.sh' to launch GRASS jobs with Grid Engine: | |||
=== Grid Engine === | |||
<source lang="bash"> | <source lang="bash"> | ||
#!/bin/sh | #!/bin/sh | ||
Line 338: | Line 203: | ||
</source> | </source> | ||
==== The GRASS worker script ==== | |||
The '''real GRASS job''' is just the bare script (no #!/bin/sh nor exit 0 must be included) containing the commands to be executed. As defined in 'launch_grassjob_on_GE.sh', it must be stored as '$HOME/binaries/bin/modis_interpolation_GRASS_RST.sh': | The '''real GRASS job''' is just the bare script (no #!/bin/sh nor exit 0 must be included) containing the commands to be executed. As defined in 'launch_grassjob_on_GE.sh', it must be stored as '$HOME/binaries/bin/modis_interpolation_GRASS_RST.sh': | ||
Line 350: | Line 217: | ||
# do something | # do something | ||
</source> | </source> | ||
==== Collecting the results ==== | |||
Finally, since we leave the temporary mapsets as is in the Grid Engine processing, we have to '''collect all results''' and store them into the target mapset. We do this by checking all mapsets which contain the breadcrumb left therein (launch job script name as emtpy file, see above). This scripts 'gcollector_job.sh' checks and transfers the map(s): | Finally, since we leave the temporary mapsets as is in the Grid Engine processing, we have to '''collect all results''' and store them into the target mapset. We do this by checking all mapsets which contain the breadcrumb left therein (launch job script name as emtpy file, see above). This scripts 'gcollector_job.sh' checks and transfers the map(s): | ||
Line 421: | Line 291: | ||
</source> | </source> | ||
==== Launching a single job ==== | |||
To submit a '''single''' job from home directory to cluster node, enter: | To submit a '''single''' job from home directory to cluster node, enter: | ||
Line 429: | Line 301: | ||
</source> | </source> | ||
==== Launching many jobs ==== | |||
To submit a '''series of jobs''', use a "for" loop as described in the PBS section above. | To submit a '''series of jobs''', use a "for" loop as described in the PBS section above. | ||
In the HOME directory stdout and stderr logs will be stored for each job. | In the HOME directory stdout and stderr logs will be stored for each job. | ||
After '''all jobs are completed''', run the 'gcollector_job.sh' script. | |||
We do this by simply looping over all map names to elaborate: | |||
<source lang="bash"> | |||
cd $HOME | |||
# loop and launch (we just pick the names from the GRASS DB itself; here: do all maps) | |||
# instead of launching immediately, we create a launch script: | |||
for i in `find /grassdata/myloc/modis_originals/cell/ -name '*'` ; do | |||
NAME=`basename $i` | |||
echo qsub -v MYMODIS=$NAME ./launch_grassjob_on_GE.sh | |||
done | sort > launch1.sh | |||
# now really launch the jobs: | |||
sh launch1.sh | |||
</source> | |||
That's it! Emails will arrive to notify upon begin, abort (hopefully not!) and end of job execution. | |||
After '''all jobs are completed''', run the 'gcollector_job.sh' script from above | |||
=== PBS Scheduler === | |||
It works similar to Grid Engine, see above. | |||
=== OpenMosix === | === OpenMosix === |
Revision as of 22:10, 26 March 2011
Parallel GRASS jobs
NOTE: GRASS 6 libraries are NOT thread safe (except for GPDE, see below).
Background
This you should know about GRASS' behaviour concerning multiple jobs:
- You can run multiple processes in multiple locations (what's that?). Peaceful coexistence.
- You can run multiple processes in the same mapset, but only if the region is untouched (but it's really not recommended). Better launch each job in its own mapset within the location.
- You can run multiple processes in the same location, but in different mapsets. Peaceful coexistence.
Approach
Essentially there are at least two approaches of "poor man" parallelization without modifying GRASS source code:
- split map into spatial chunks (possibly with overlap to gain smooth results)
- time series: run each map elaboration on a different node.
Parallelized code
GPDE using OpenMP
The only parallelized library in GRASS >=6.3 is GRASS Partial Differential Equations Library (GPDE). Read more in OpenMP.
GPU Programming
- See GPU
Cluster and Grid computing
Grid Engine
- URL:
http://gridengine.sunsource.net/(new site at http://sourceforge.net/projects/gridscheduler/) - Lauching jobs: qsub
- Navigating the Grid Engine System with GUI: qmon
- Job statstics: qstat -f
- User statstics: qacct -o
General steps (for multiple serial jobs on many CPUs):
- Job definition
- Grid Engine setup (in the header): define calculation time, number of nodes, number of processors, amount of RAM for individual job;
- data are stored in centralized directory which is seen by all nodes, often mounted via NFS;
- Job execution (launch of jobs)
- user launches all jobs ("qsub"), they are submitted to the queue.
- the scheduler optimizes among all user the execution of the jobs according to available resources and requested resources;
- for the user this means that 0..max jobs are executed in parallel (unless the administrators didn't define either priority or limits). The user can then observe the job queue ("qstat") to see other jobs ahead and scheduling of own jobs. Once a job is running, the cluster possibly sends a notification email to the user, the same again when a job is terminating or is aborted.
- At the end of the worker script call a second batch job which only contains g.copy to copy the result into a common mapset.
- Job planning
- The challenging part for the user is to estimate the request for number of nodes and CPUs per node as well as the amount of needed RAM. Usually tests are needed to see the performance in order to impose the values correctly in the Grid Engine script (see below).
The Grid Engine script
Job launcher: 'launch_grassjob_on_GE.sh' to launch GRASS jobs with Grid Engine:
#!/bin/sh
# Serial job, MODIS LST elaboration
# Markus Neteler, 2008, 2011
## GE settings
# request Bourne shell as shell for job
#$ -S /bin/sh
# run in current working directory
#$ -cwd
# We want Grid Engine to send mail when the job begins and when it ends.
#$ -M neteler@somewhere.it
#$ -m abe
# uncomment next line for bash debug output
#set -x
############
# SUBMIT from home dir to node:
# cd $HOME
# qsub -cwd -l mem_free=3000M -v MYMODIS=aqua_lst1km20020706.LST_Night_1km.filt launch_grassjob_on_GE.sh
#
# WATCH
# watch 'qstat | grep "$USER\|job-ID"'
# Under the state column you can see the status of your job. Some of the codes are
# * r: the job is running
# * t: the job is being transferred to a cluster node
# * qw: the job is queued (and not running yet)
# * Eqw: an error occurred with the job
#
###########
# better say where to find libs and bins:
export PATH=$PATH:$HOME/binaries/bin:/usr/local/bin:$HOME/sge_batch_jobs
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/binaries/lib/
# generate machine (blade) unique TMP string
UNIQUE=`mktemp`
MYTMP=`basename $UNIQUE`
# path to GRASS binaries and libraries:
export GISBASE=/usr/local/grass-6.4.1svn
export PATH=$PATH:$GISBASE/bin:$GISBASE/scripts
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$GISBASE/lib
# use process ID (PID) as GRASS lock file number:
export GIS_LOCK=$$
export GRASS_MESSAGE_FORMAT=plain
export TERM=linux
# use Grid Engine jobid + unique string as MAPSET to avoid GRASS lock
MYMAPSET=sge.$JOB_ID.$MYTMP
MYLOC=patUTM32
MYUSER=$MYMAPSET
TARGETMAPSET=modisLSTinterpolation
# Note: remember to to use env vars for variable transfer
GRASS_BATCH_JOB=$HOME/binaries/bin/modis_interpolation_GRASS_RST.sh
# Note: remember to to use env vars for variable transfer
export MODISMAP="$MYMODIS"
# print nice percentages:
export GRASS_MESSAGE_FORMAT=plain
################ nothing to change below ############
#execute_your_command
echo "************ Starting job at `date` on blade `hostname` *************"
# DEBUG turn on bash debugging, i.e. print all lines to
# standard error before they are executed
# set -x
echo "SGE_ROOT $SGE_ROOT"
echo "The cell in which the job runs $SGE_CELL. Got $NSLOTS slots on $NHOSTS hosts"
# temporarily store *locally* the stuff to avoid NFS overflow/locking problem
## Remove XXX to enable
grep XXX/storage/local /etc/mtab >/dev/null && (
if test "$MYMAPSET" = ""; then
echo "You are crazy to not define \$MYMAPSET. Emergency exit."
exit 1
else
rm -rf /storage/local/$MYMAPSET
fi
# (create new mapset on the fly)
mkdir /storage/local/$MYMAPSET
ln -s /storage/local/$MYMAPSET /grassdata/$MYLOC/$MYMAPSET
if [ $? -ne 0 ] ; then
echo "ERROR: Something went wrong creating the local storage link..."
exit 1
fi
) || ( # in case that /storage/local is unmounted:
# (create new mapset on the fly)
mkdir /grassdata/$MYLOC/$MYMAPSET
)
# Set the global grassrc file to individual file name
MYGISRC="$HOME/.grassrc6.$MYUSER.`uname -n`.$MYTMP"
#generate GISRCRC
echo "GISDBASE: /grassdata" > "$MYGISRC"
echo "LOCATION_NAME: $MYLOC" >> "$MYGISRC"
echo "MAPSET: $MYMAPSET" >> "$MYGISRC"
echo "GRASS_GUI: text" >> "$MYGISRC"
# path to GRASS settings file
export GISRC=$MYGISRC
# fix WIND in the newly created mapset
cp "/grassdata/$MYLOC/PERMANENT/DEFAULT_WIND" "/grassdata/$MYLOC/$MYMAPSET/WIND"
db.connect -c --quiet
# run the GRASS job:
. $GRASS_BATCH_JOB
# cleaning up temporary files
$GISBASE/etc/clean_temp > /dev/null
rm -f ${MYGISRC}
rm -rf /tmp/grass6-$USER-$GIS_LOCK
# move data back from local disk to NFS
## Remove XXX to enable
grep XXX/storage/local /etc/mtab >/dev/null && (
if test "$MYMAPSET" = ""; then
echo "You are crazy to not define \$MYMAPSET. Emergency exit."
exit 1
else
# rm temporary link:
rm -fr /grassdata/patUTM32/$MYMAPSET/.tmp
rm -f /grassdata/$MYLOC/$MYMAPSET
# copy stuff
cp -rp /storage/local/$MYMAPSET /grassdata/$MYLOC/$MYMAPSET
if [ $? -ne 0 ] ; then
echo "ERROR: Something went wrong copying back from local storage to NFS directory..."
exit 1
fi
rm -rf /storage/local/$MYMAPSET
fi
)
# leave breadcrumb to find related mapsets back when moving results
# into final mapset:
touch /grassdata/$MYLOC/$MYMAPSET/$JOB_NAME
echo "************ Finished at `date` *************"
exit 0
The GRASS worker script
The real GRASS job is just the bare script (no #!/bin/sh nor exit 0 must be included) containing the commands to be executed. As defined in 'launch_grassjob_on_GE.sh', it must be stored as '$HOME/binaries/bin/modis_interpolation_GRASS_RST.sh':
# GRASS job for usage in Grid Engine
###################################
# MN 2008, 2011
# $MYMODIS is passed on via qsub job submission (-v variable)
r.info $MYMODIS
# do something
Collecting the results
Finally, since we leave the temporary mapsets as is in the Grid Engine processing, we have to collect all results and store them into the target mapset. We do this by checking all mapsets which contain the breadcrumb left therein (launch job script name as emtpy file, see above). This scripts 'gcollector_job.sh' checks and transfers the map(s):
#!/bin/sh
# Markus Neteler, 2011
# Copy all raster maps from the temporaneous Grid Engine mapsets into target mapset
if [ $# -ne 3 ] ; then
echo "Script to move Grid Engine job results to GRASS target mapset"
echo ""
echo "Usage: $0 targetlocation targetmapset name_of_launch_script"
echo "Example:"
echo " $0 patUTM32 modis_final launch_grassjob_on_GE.sh"
exit 0
fi
# define location to work in
GRASSDBROOT=/grassdata
MYLOC=$1
MYMAPSET=$2
BREADCRUMBFILE=$3
# path to GRASS binaries and libraries:
export GISBASE=/usr/local/grass-6.4.1svn
export PATH=$PATH:$GISBASE/bin:$GISBASE/scripts
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$GISBASE/lib
# use process ID (PID) as GRASS lock file number:
export GIS_LOCK=$$
export GRASS_MESSAGE_FORMAT=plain
# path to GRASS settings file
export GISRC=$HOME/.grassrc6
GISDBASE=`cat $GISRC | grep GISDBASE | cut -d' ' -f2`
TOCOLLECTFILE=to_be_collected_$JOB_NAME.csv
find $GRASSDBROOT/$MYLOC/sge* -type f -name "$BREADCRUMBFILE" > $GRASSDBROOT/$MYLOC/$TOCOLLECTFILE
rm -f $GRASSDBROOT/$MYLOC/clean_$TOCOLLECTFILE
for myname in `cat $GRASSDBROOT/$MYLOC/$TOCOLLECTFILE` ; do
basename `dirname $myname` >> $GRASSDBROOT/$MYLOC/clean_$TOCOLLECTFILE
done
LIST=`cat $GRASSDBROOT/$MYLOC/clean_$TOCOLLECTFILE`
rm -f $GRASSDBROOT/$MYLOC/clean_$TOCOLLECTFILE
countmapsets=0
countmaps=0
for mapset in $LIST ; do
countmapsets=`expr $countmapsets + 1`
MAPS=`g.mlist rast mapset=$mapset`
for map in $MAPS ; do
countmaps=`expr $countmaps + 1`
g.copy rast=$map@$mapset,$map --o
done
done
rm -f $GRASSDBROOT/$MYLOC/$MYMAPSET/.gislock
unset $GISBASE $GIS_LOCK $GRASS_MESSAGE_FORMAT $GISRC
echo "----------------------------------------------------"
echo "In total $countmaps maps transferred from $countmapsets mapsets into $MYMAPSET"
echo "Now you can remove all maps with:
cd $GISDBASE/$MYLOC/
rm -rf \`cat $GRASSDBROOT/$MYLOC/$TOCOLLECTFILE | sed \"s+$BREADCRUMBFILE++g\"\`
rm -f $GRASSDBROOT/$MYLOC/$TOCOLLECTFILE
"
exit 0
Launching a single job
To submit a single job from home directory to cluster node, enter:
cd $HOME
qsub -cwd -l mem_free=3000M -v MYMODIS=aqua_lst1km20020706.LST_Night_1km.filt launch_grassjob_on_GE.sh
sh gcollector_job.sh patUTM32 modis_final launch_grassjob_on_GE.sh
Launching many jobs
To submit a series of jobs, use a "for" loop as described in the PBS section above. In the HOME directory stdout and stderr logs will be stored for each job.
We do this by simply looping over all map names to elaborate:
cd $HOME
# loop and launch (we just pick the names from the GRASS DB itself; here: do all maps)
# instead of launching immediately, we create a launch script:
for i in `find /grassdata/myloc/modis_originals/cell/ -name '*'` ; do
NAME=`basename $i`
echo qsub -v MYMODIS=$NAME ./launch_grassjob_on_GE.sh
done | sort > launch1.sh
# now really launch the jobs:
sh launch1.sh
That's it! Emails will arrive to notify upon begin, abort (hopefully not!) and end of job execution.
After all jobs are completed, run the 'gcollector_job.sh' script from above
PBS Scheduler
It works similar to Grid Engine, see above.
OpenMosix
NOTE: The openMosix Project has officially closed as of March 1, 2008.
If you want to launch several GRASS jobs in parallel, you have to launch each job in its own mapset. Be sure to indicate the mapset correctly in the GISRC file (see above). You can use the process ID (PID, get with $$ or use PBS jobname) to generate a almost unique number which you can add to the mapset name.
Now you could launch the jobs on an openMosix cluster (just install openMosix on your colleague's computers...).
Hints for NFS users
- AVOID script forking on the cluster (but inclusion via ". script.sh" works ok). This means that the GRASS_BATCH_JOB approach is prone to fail. It is highly recommended to simply set a series of environmental variables to define the GRASS session, see here how to do that.
- be careful with concurrent file writing (use "lockfile" locking, the lockfile command is provided by procmail);
- store as much temporary data as possible (even the final maps) on the local blade disks if you have.
- finally collect all results from local blade disks *after* the parallel job execution in a sequential collector jog (I am writing that now) to not kill NFS. For example, Grid Engine offers a "hold" function to only execute the collector job after having done the rest.
Misc Tips & Tricks
- When working with long time series and r.series starts to complain that files are missing/not readable, check if you opened more than 1024 files in this process. this is the typical limit (check with ulimit -n). To overcome this problem, the admin has to add in /etc/security/limits.conf something like this:
# Limit user nofile - max number of open files * soft nofile 1500 * hard nofile 1800
For less invasive solution which still requires root access, see here.
- See the poor-man's multi-processing script on the OpenMP wiki page.