Parallelizing Scripts

From GRASS-Wiki
Revision as of 04:50, 19 June 2012 by ⚠️HamishBowman (talk | contribs) (init)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search
The printable version is no longer supported and may have rendering errors. Please update your browser bookmarks and please use the default browser print function instead.

Bourne shell script

  • Poor-man's multithreading using Bourne shell script & backgrounding. WARNING: not all GRASS modules and scripts are safe to have other things happening in the same mapset while they are running. Try at your own risk after performing a suitable safety audit. e.g. Make sure g.region is not run, externally changing the region settings.

Example:

 ### r.sun mode 1 loop ###
 SUNRISE=7.67
 SUNSET=16.33
 STEP=0.01
 # | wc -l   867
 CORES=4
 
 DAY=355
 for TIME in `seq $SUNRISE $STEP $SUNSET` ; do
    echo "time=$TIME"
    CMD="r.sun -s elevin=gauss day=$DAY time=$TIME \
          beam_rad=rad1_test.${DAY}_${TIME}_beam --quiet"
 
    # poor man's multi-threading for a multi-core CPU
    MODULUS=`echo "$TIME $STEP $CORES" | awk '{print $1 % ($2 * $3)}'`
    if [ "$MODULUS" = "$STEP" ] || [ "$TIME" = "$SUNSET" ] ; then
       # stall to let the background jobs finish
       $CMD
       sleep 2
       wait
       #while [ `pgrep -c r.sun` -ne 0 ] ; do
       #   sleep 5
       #done
    else
      $CMD &
    fi
 done
 wait   # wait for background jobs to finish to avoid race conditions
  • This approach has been used in the r3.in.xyz addon script.
  • Another example using r.sun Mode 2 can be found on the r.sun wiki page.

Python

  • Due to the "GIL" in Python 2.x-3.0, pure python will only run on a single core, even when multi-threaded. All multithreading schemes & modules for (pure) Python are therefore wrappers around multiple system processes, which are a lot more expensive than threads to create and destroy. Thus it is more efficient to create large high-level Python 'threads' (processes) than to bury them deep inside of a loop.

Example of multiprocessing at the GRASS module level:

Similar to the Bourne shell example above, but using the subprocess python module. The i.oif script in GRASS7 is using this method.

bands = [1,2,3,4,5,7]

# run all bands in parallel
if "WORKERS" in os.environ:
    workers = int(os.environ["WORKERS"])
else:
    workers = 6

proc = {}
pout = {}

# spawn jobs in the background
for band in bands:
    grass.debug("band %d, <%s>  %% %d" % (band, image[band], band % workers))
    proc[band] = grass.pipe_command('r.univar', flags = 'g', map = image[band])
    if band % workers is 0:
	# wait for the ones launched so far to finish
	for bandp in bands[:band]:
	    if not proc[bandp].stdout.closed:
		pout[bandp] = proc[bandp].communicate()[0]
	    proc[bandp].wait()

# wait for jobs to finish, collect the output
for band in bands:
    if not proc[band].stdout.closed:
	pout[band] = proc[band].communicate()[0]
    proc[band].wait()

# parse the results
for band in bands:
    kv = grass.parse_key_val(pout[band])
    stddev[band] = float(kv['stddev'])