Parallel GRASS jobs

NOTE: GRASS 6 libraries are NOT thread safe.

OpenMosix

If you want to launch several GRASS jobs in parallel, you have to launch each job in its own mapset. Be sure to indicate that mapset correctly in the job's GISRC file (see the sketch below). You can use the process ID (PID, obtained with $$, or the PBS job name) to generate an almost unique number which you can add to the mapset name.
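
A minimal sketch of such a per-job mapset setup for GRASS 6 (the paths, the location name and the GISRC file name are example assumptions; adapt them to your installation):

 # assumed shared GRASS database and location
 GISDBASE=/shared/grassdata
 LOCATION=mylocation
 MAPSET=job_$$                     # the PID makes the mapset name almost unique

 # create the new mapset: a directory plus a WIND file copied from PERMANENT
 mkdir -p $GISDBASE/$LOCATION/$MAPSET
 cp $GISDBASE/$LOCATION/PERMANENT/DEFAULT_WIND $GISDBASE/$LOCATION/$MAPSET/WIND

 # write a private GISRC file for this job and point GRASS at it
 export GISRC=$HOME/.grassrc6_$$
 echo "GISDBASE: $GISDBASE"      >  $GISRC
 echo "LOCATION_NAME: $LOCATION" >> $GISRC
 echo "MAPSET: $MAPSET"          >> $GISRC
 echo "GRASS_GUI: text"          >> $GISRC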


Now you could launch the jobs on an openMosix cluster (just install openMosix on your colleagues' computers...).

PBS

You need essentially two scripts:

  • a GRASS job script, which takes the name(s) of the map(s) to process from environment variables (see the sketch below);
  • a launcher script that submits this GRASS job script as a batch job for each map to process (see the PBS sketch at the end of this section).
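
As an illustration, a minimal GRASS job script could look like the sketch below; the environment variable name MAPNAME and the module r.slope.aspect are assumptions used here as placeholders for whatever your launcher actually exports and whatever processing you actually run:

 #!/bin/sh
 # grassjob.sh - processes one map whose name arrives in the environment
 # variable MAPNAME (exported by the launcher, see the PBS sketch below)

 # set the region to the input raster and run the actual computation
 g.region rast=$MAPNAME
 r.slope.aspect elevation=$MAPNAME slope=${MAPNAME}_slope aspect=${MAPNAME}_aspect

 echo "Finished $MAPNAME"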

Steps (for multiple serial jobs on many CPUs):

  • Job definition
    • PBS setup (in the job script header): define calculation time, number of nodes, number of processors per node, and amount of RAM for each individual job;
    • data are stored in a centralized directory which is visible to all nodes;
  • Job execution (launch of jobs)
    • the user launches all jobs ("qsub"); they are submitted to the queue. Use the GRASS_BATCH_JOB variable to define the name of the processing script (see the launcher sketch at the end of this section);
    • the scheduler optimizes the execution of the jobs among all users according to the available and the requested resources;
    • for the user this means that 0..max jobs are executed in parallel (unless the administrators have defined priorities or limits). The user can observe the job queue ("showq") to see other jobs ahead and the scheduling of their own jobs. Once a job is running, the cluster may send a notification email to the user, and again when the job terminates.
  • Job planning
    • The challenging part for the user is to estimate the execution time, since PBS kills jobs which exceed the requested time. The same applies to the requested number of nodes and CPUs per node, as well as to the amount of RAM needed. Usually test runs are needed to gauge the performance.
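
Putting the pieces together, a PBS launcher/wrapper script might look like the following sketch. The resource requests in the #PBS header (calculation time, nodes, processors per node, RAM), the mail option, the paths, the GRASS start command (grass63) and the variable names are example assumptions only and must be adapted to your cluster and data:

 #!/bin/sh
 # launch_grassjob.sh - PBS wrapper, submitted once per map with qsub
 #PBS -N grassjob                  # job name
 #PBS -l walltime=01:00:00         # requested calculation time
 #PBS -l nodes=1:ppn=1             # one node, one processor per node
 #PBS -l mem=1gb                   # requested RAM
 #PBS -m abe                       # mail when the job begins, ends or aborts

 # run the processing script non-interactively via GRASS_BATCH_JOB; the
 # target mapset (here derived from the map name) must already exist or be
 # created first, e.g. as sketched in the OpenMosix section above
 export GRASS_BATCH_JOB=$HOME/grassjob.sh
 grass63 -text /shared/grassdata/mylocation/mapset_$MAPNAME

Submitting one job per map then reduces to a simple loop on the head node, for example:

 for map in map1 map2 map3 ; do
    qsub -v MAPNAME=$map launch_grassjob.sh
 done
 showq      # watch the queue and the scheduling of your own jobs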