Parallelizing Scripts

From GRASS-Wiki
Revision as of 04:50, 19 June 2012 by HamishBowman

Bourne shell script

  • Poor man's multithreading using a Bourne shell script and backgrounding (&). WARNING: not all GRASS modules and scripts are safe to run while other processes are working in the same mapset. Try this at your own risk, and only after performing a suitable safety audit. For example, make sure that g.region is not run in the meantime, as it would change the region settings from outside while the jobs are running.

Example:

 ### r.sun mode 1 loop ###
 SUNRISE=7.67
 SUNSET=16.33
 STEP=0.01
 # number of time steps: seq $SUNRISE $STEP $SUNSET | wc -l  gives 867
 CORES=4
 
 DAY=355
 for TIME in `seq $SUNRISE $STEP $SUNSET` ; do
    echo "time=$TIME"
    CMD="r.sun -s elevin=gauss day=$DAY time=$TIME \
          beam_rad=rad1_test.${DAY}_${TIME}_beam --quiet"
 
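    # poor man's multi-threading for a multi-core CPU:
    # every CORES-th time step (and the last one) runs in the foreground,
    # all others are sent to the background. The test below is a string
    # comparison, so write STEP the way awk prints it (0.01, not .01).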
    MODULUS=`echo "$TIME $STEP $CORES" | awk '{print $1 % ($2 * $3)}'`
    if [ "$MODULUS" = "$STEP" ] || [ "$TIME" = "$SUNSET" ] ; then
       # stall to let the background jobs finish
       $CMD
       sleep 2
       wait
       #while [ `pgrep -c r.sun` -ne 0 ] ; do
       #   sleep 5
       #done
    else
      $CMD &
    fi
 done
 wait   # wait for background jobs to finish to avoid race conditions
  • This approach has been used in the r3.in.xyz addon script.
  • Another example using r.sun Mode 2 can be found on the r.sun wiki page.

Python

  • Due to the Global Interpreter Lock ("GIL") in the standard CPython interpreter (Python 2.x and 3.x), pure Python code only runs on a single core at a time, even when multi-threaded. Schemes and modules that give (pure) Python true multi-core parallelism are therefore wrappers around multiple system processes, which are a lot more expensive than threads to create and destroy. Thus it is more efficient to create a few large, high-level Python 'threads' (processes) than to bury them deep inside of a loop.
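
To illustrate the point about keeping the parallel units large, here is a minimal sketch that splits a job into one chunk per worker and starts one system process per chunk, rather than one per item. It uses only the Python standard library. The names split_chunks and sum_squares_parallel, and the toy workload (a sum of squares computed by a child Python process), are made up for this example and are not part of GRASS.

```python
import subprocess
import sys

# Toy workload run by each child process: sum the squares of its arguments.
CHILD_CODE = "import sys; print(sum(int(a) ** 2 for a in sys.argv[1:]))"


def split_chunks(items, n):
    """Split items into at most n contiguous chunks of near-equal size."""
    n = max(1, min(n, len(items)))
    size, extra = divmod(len(items), n)
    chunks = []
    start = 0
    for i in range(n):
        # the first `extra` chunks get one additional item
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def sum_squares_parallel(items, workers=4):
    """Start one process per chunk (not per item) and add up the results."""
    procs = []
    for chunk in split_chunks(items, workers):
        args = [sys.executable, "-c", CHILD_CODE] + [str(x) for x in chunk]
        procs.append(subprocess.Popen(args, stdout=subprocess.PIPE))
    # communicate() waits for each child and returns its output
    return sum(int(p.communicate()[0].decode().strip()) for p in procs)


if __name__ == "__main__":
    print(sum_squares_parallel(list(range(1, 101)), workers=4))
```

With 100 items and 4 workers this starts 4 processes instead of 100, so the cost of creating and destroying processes is paid only once per worker.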

Example of multiprocessing at the GRASS module level:

This is similar to the Bourne shell example above, but uses Python's subprocess module (here through the grass.pipe_command() wrapper). The i.oif script in GRASS 7 uses this method.

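import os
import grass.script as grass

# NOTE: `image` maps each band number to its raster map name and is
# set up elsewhere in the calling script
bands = [1,2,3,4,5,7]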

# run all bands in parallel
if "WORKERS" in os.environ:
    workers = int(os.environ["WORKERS"])
else:
    workers = 6

proc = {}
pout = {}

# spawn jobs in the background
for n, band in enumerate(bands, 1):
    grass.debug("band %d, <%s>  %% %d" % (band, image[band], n % workers))
    proc[band] = grass.pipe_command('r.univar', flags = 'g', map = image[band])
    # count launched jobs (n), not band numbers: the band list has gaps
    if n % workers == 0:
        # wait for the ones launched so far to finish
        for bandp in bands[:n]:
            if not proc[bandp].stdout.closed:
                pout[bandp] = proc[bandp].communicate()[0]
            proc[bandp].wait()

# wait for jobs to finish, collect the output
for band in bands:
    if not proc[band].stdout.closed:
        pout[band] = proc[band].communicate()[0]
    proc[band].wait()

# parse the results (r.univar -g prints key=value pairs)
stddev = {}
for band in bands:
    kv = grass.parse_key_val(pout[band])
    stddev[band] = float(kv['stddev'])
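
The batching pattern above can also be tried outside of a GRASS session with the standard subprocess module alone. In the sketch below the child command is a stand-in Python one-liner that prints key=value lines loosely resembling r.univar -g output. The helpers run_batched and parse_kv are hypothetical names written for this example; parse_kv is only a simplified imitation of grass.parse_key_val(), not the GRASS function itself.

```python
import subprocess
import sys


def parse_kv(text):
    """Parse 'key=value' lines into a dict (simplified stand-in)."""
    result = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            result[key.strip()] = value.strip()
    return result


def run_batched(commands, workers=2):
    """Run {name: argv} commands, at most `workers` at a time.

    Returns {name: stdout text}. Each batch is fully collected before
    the next one is started.
    """
    names = list(commands)
    output = {}
    for start in range(0, len(names), workers):
        batch = names[start:start + workers]
        # spawn the whole batch in the background
        procs = {}
        for name in batch:
            procs[name] = subprocess.Popen(commands[name],
                                           stdout=subprocess.PIPE,
                                           universal_newlines=True)
        # wait for the batch to finish and collect its output
        for name in batch:
            output[name] = procs[name].communicate()[0]
    return output


if __name__ == "__main__":
    # stand-in jobs: each child prints a fake statistic for one "band"
    child = "import sys; print('stddev=%.1f' % (int(sys.argv[1]) * 1.5))"
    jobs = {}
    for band in [1, 2, 3, 4, 5, 7]:
        jobs[band] = [sys.executable, "-c", child, str(band)]
    out = run_batched(jobs, workers=3)
    stddev = {}
    for band in jobs:
        stddev[band] = float(parse_kv(out[band])["stddev"])
    print(stddev)
```

Note that workers must be at least 1. As in the example above, a slow job holds up its whole batch, so this simple scheme works best when the jobs take roughly the same time.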