OpenMP
Latest revision as of 10:40, 4 December 2018
Multithreaded jobs in GRASS
- See also the Parallel GRASS jobs and Parallelizing Scripts wiki pages.
OpenMP is an implementation of multithreading, a method of parallelization whereby the master "thread" (a series of instructions executed consecutively) "forks" a specified number of slave "threads" and a task is divided among them (from wikipedia). The job is distributed over the available processor cores (2-core, 4-core, ...).
In GRASS 6 (>= 6.3), the only parallelized library is the GRASS Partial Differential Equations Library (GPDE). The library is designed to be thread-safe and supports threaded parallelism with OpenMP, but its code is not yet widely used in GRASS. See the gpde programmer's manual for details.
To activate it with GCC >= 4.2 (both the compiler flag '-fopenmp' and the library '-lgomp' are needed):
# GPDE with OpenMP support:
cd lib/gpde/
vim Makefile
# uncomment the EXTRA_CFLAGS row and switch the two existing EXTRA_LIBS rows
Integrated system-wide support for OpenMP via a --with-openmp ./configure flag has been requested in trac #657 and the configure switch is now present in trunk (grass7).
OpenMP support in GRASS 7
Several modules have fully functioning OpenMP support:
- v.surf.rst, r.sim.water and r.sim.sediment (available in GRASS GIS 7.4+).
- r.sun.mp has been merged and is available in GRASS GIS 7.4+.
- The gpde and gmath libraries provide functions that are partly parallelized with OpenMP. All BLAS level 2 and 3 functions, as well as many linear equation solvers in the gmath library, are parallelized using OpenMP pragmas. Several numerical modules that use these functions can now benefit from multi-core systems (e.g. r.proj, r.gwflow, r3.gwflow, r.solute.transport).
Other modules like v.surf.bspline have initial support (via the LU solvers in the gmath library) but are still in the prototype stage and not very efficient yet.
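To illustrate why a BLAS level 2 operation such as a matrix-vector product parallelizes well with OpenMP pragmas, here is a minimal sketch. The function name dense_matvec and the row-major array layout are assumptions for illustration only; this is not the gmath API (e.g. G_math_d_Ax), whose signature may differ.

```c
#include <stddef.h>

/*
 * Hypothetical example, not the gmath API: computes y = A * x for a dense,
 * row-major matrix A with `rows` rows and `cols` columns.
 * Each y[i] depends only on row i of A, so rows can be split among threads
 * without any locking.
 */
void dense_matvec(const double *A, const double *x, double *y,
                  int rows, int cols)
{
#pragma omp parallel for schedule(static)
    for (int i = 0; i < rows; i++) {
        double sum = 0.0;   /* declared inside the loop: private to each thread */

        for (int j = 0; j < cols; j++)
            sum += A[(size_t)i * cols + j] * x[j];
        y[i] = sum;         /* each y[i] is written by exactly one thread */
    }
}
```

Compiled without -fopenmp, GCC ignores the pragma and the loop runs serially with the same result, which makes it easy to compare serial and parallel timings.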
OpenMP flags are compiler-dependent, so OpenMP support has to be enabled through the C compiler and linker flags before calling configure. For example, for GCC >= 4.2:
CFLAGS="-O3 -Wall -Werror-implicit-function-declaration -fno-common -fopenmp"
LDFLAGS="-lgomp"
- Update: in GRASS 7 you can now switch on OpenMP support with:
./configure --with-openmp
This should enable OpenMP support in the libraries and in all dependent modules.
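A quick way to confirm that the OpenMP flags actually reached the compiler is to check the standard _OPENMP macro, which OpenMP-enabled compilers define as a yyyymm date identifying the supported OpenMP version. The sketch below is illustrative only; the function names are hypothetical and not part of GRASS.

```c
#include <stdio.h>

/*
 * Returns the value of _OPENMP (e.g. 200505 for OpenMP 2.5) if this file
 * was compiled with OpenMP enabled, or 0 if it was not.
 */
long openmp_build_version(void)
{
#ifdef _OPENMP
    return (long)_OPENMP;
#else
    return 0L;
#endif
}

/* Prints a one-line report: a quick sanity check of the build flags. */
void report_openmp_build(void)
{
    long v = openmp_build_version();

    if (v)
        printf("OpenMP enabled (_OPENMP = %ld)\n", v);
    else
        printf("OpenMP NOT enabled (compiled without OpenMP flags)\n");
}
```

Compiling the same file once with and once without -fopenmp (GCC) should switch between the two messages.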
You can test OpenMP support by compiling the gpde and gmath tests by hand (change into the test directory of each library and run make). The test modules "test.gmath.lib" and "test.gpde.lib" should then be available in the path after starting GRASS.
The gmath library test module "test.gmath.lib" additionally provides benchmarks for the BLAS level 2 and 3 functions and for many solvers.
gmath/test> test.gmath.lib --help
Description:
 Performs benchmarks, unit and integration tests for the gmath library
Usage:
 test.gmath.lib [-uia] [unit=string] [integration=string] [rows=value]
   [solverbench=string] [blasbench=string] [--verbose] [--quiet]
Flags:
  -u   Run all unit tests
  -i   Run all integration tests
  -a   Run all unit and integration tests
 --v   Verbose module output
 --q   Quiet module output
Parameters:
         unit   Choose the unit tests to run
                options: blas1,blas2,blas3,solver,ccmath,matconv
  integration   Choose the integration tests to run
                options:
         rows   The size of the matrices and vectors for benchmarking
                default: 1000
  solverbench   Choose solver benchmark
                options: krylov,direct
    blasbench   Choose blas benchmark
                options: blas2,blas3
For example, testing the speedup of the BLAS level 2 and 3 functions in the SVN trunk of GRASS 7 (compiled with the flags mentioned above) on an 8-core Intel Xeon system (setenv is the csh/tcsh syntax for setting an environment variable):
gmath/test> setenv OMP_NUM_THREADS 1
gmath/test> test.gmath.lib blasbench=blas2 rows=5000
++ Running blas level 2 benchmark ++
Computation time G_math_Ax_sparse: 0.244123
Computation time G_math_Ax_sband: 0.280636
Computation time G_math_d_Ax: 0.134494
Computation time G_math_d_Ax_by: 0.18556
Computation time G_math_d_x_dyad: 0.268684
-- gmath lib tests finished successfully --
gmath/test> setenv OMP_NUM_THREADS 4
gmath/test> test.gmath.lib blasbench=blas2 rows=5000
++ Running blas level 2 benchmark ++
Computation time G_math_Ax_sparse: 0.072549
Computation time G_math_Ax_sband: 0.192712
Computation time G_math_d_Ax: 0.036652
Computation time G_math_d_Ax_by: 0.047904
Computation time G_math_d_x_dyad: 0.080534
-- gmath lib tests finished successfully --
gmath/test> setenv OMP_NUM_THREADS 1
gmath/test> test.gmath.lib blasbench=blas3 rows=1000
++ Running blas level 3 benchmark ++
Computation time G_math_d_aA_B: 0.013263
Computation time G_math_d_AB: 18.729
-- gmath lib tests finished successfully --
gmath/test> setenv OMP_NUM_THREADS 4
gmath/test> test.gmath.lib blasbench=blas3 rows=1000
++ Running blas level 3 benchmark ++
Computation time G_math_d_aA_B: 0.006946
Computation time G_math_d_AB: 4.80446
-- gmath lib tests finished successfully --
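These timings can be summarized as the speedup S_p (time with one thread divided by time with p threads) and the parallel efficiency E_p = S_p / p. Working two of the measurements above through by hand:

$$S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p}$$

$$S_4(\texttt{G\_math\_d\_AB}) = \frac{18.729}{4.80446} \approx 3.90, \qquad E_4 \approx 0.97$$

$$S_4(\texttt{G\_math\_Ax\_sband}) = \frac{0.280636}{0.192712} \approx 1.46, \qquad E_4 \approx 0.36$$

So on this system the dense matrix-matrix product scales almost linearly to 4 threads, while the band matrix-vector product gains much less.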
General code structure
Example quoted from the "OpenMP tutorial" (see below):
#include <omp.h>

int main(void)
{
    int var1, var2, var3;

    /* Some serial code ... */

    /* Beginning of parallel section. Fork a team of threads. */
    /* Specify variable scoping */
    #pragma omp parallel private(var1, var2) shared(var3)
    {
        /* Parallel section executed by all threads */
        /* ... */

        /* All threads join master thread and disband */
    } /* end pragma */

    /* Resume serial code */
    /* ... */

    return 0;
}
And in the Makefile, add something like this:
# OpenMP support
EXTRA_CFLAGS=-fopenmp
EXTRA_LIBS=$(GISLIB) -lgomp $(MATHLIB)
- Examples: https://computing.llnl.gov/tutorials/openMP/exercise.html
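As a slightly more concrete variant of the skeleton above, the following sketch parallelizes an accumulation loop with a reduction clause, the kind of loop found in univariate statistics. The function name sum_and_sumsq is hypothetical and not part of any GRASS library.

```c
/*
 * Hypothetical helper: sum and sum of squares of n cell values.
 * reduction(+ : sum, sumsq) gives each thread private copies of the two
 * accumulators and adds them together at the end of the loop, which avoids
 * a data race on the shared variables.
 */
void sum_and_sumsq(const double *values, long n,
                   double *sum_out, double *sumsq_out)
{
    double sum = 0.0, sumsq = 0.0;

#pragma omp parallel for reduction(+ : sum, sumsq)
    for (long i = 0; i < n; i++) {
        sum += values[i];
        sumsq += values[i] * values[i];
    }

    *sum_out = sum;
    *sumsq_out = sumsq;
}
```

With floating-point data the parallel result may differ in the last digits from the serial one, because the additions are performed in a different order.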
Run time
By default, OpenMP implementations such as GCC's typically create as many threads as the system has processor cores. To use a different number, set the OMP_NUM_THREADS environment variable. For example, to request 3 threads from a Bourne shell:
OMP_NUM_THREADS=3
export OMP_NUM_THREADS
g.module ...
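From within C code, the thread count set via OMP_NUM_THREADS can be overridden with the standard OpenMP runtime call omp_set_num_threads(). The sketch below (hypothetical function name, not a GRASS API) counts how many threads actually execute a parallel region and falls back to 1 when compiled without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/*
 * Returns the number of threads that executed a parallel region.
 * If requested > 0, omp_set_num_threads() overrides OMP_NUM_THREADS for
 * subsequent parallel regions; otherwise the environment (or the
 * implementation default) decides. Without OpenMP this always returns 1.
 */
int threads_in_parallel_region(int requested)
{
    int nthreads = 1;

#ifdef _OPENMP
    if (requested > 0)
        omp_set_num_threads(requested);

#pragma omp parallel
    {
        /* one thread records the team size; the others skip the block */
#pragma omp single
        nthreads = omp_get_num_threads();
    }
#else
    (void)requested;   /* unused without OpenMP */
#endif
    return nthreads;
}
```

For example, with OMP_NUM_THREADS=3 exported, threads_in_parallel_region(0) would usually report 3, although the OpenMP runtime is allowed to provide fewer threads than requested.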
Candidates
It is important to understand which modules are processor-bound and to concentrate on them, i.e. do not needlessly complicate the code of short-running or I/O-bound modules. Almost none of the GIS libraries are thread-safe; however, they are typically I/O-bound rather than processor-bound, so parallelizing them is not critical. Most of the CPU-bound loops that will benefit from parallelization are expected to be found in the modules.
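Since most GIS library functions are not thread-safe, one common pattern is to keep all library calls (typically I/O) in the serial part of the code and to parallelize only the CPU-bound work on data already in memory. A minimal sketch, where read_row_not_thread_safe() is a hypothetical stand-in for a non-thread-safe library call and not a real GRASS function:

```c
#define MAX_COLS 1024

/*
 * Hypothetical stand-in for a non-thread-safe library call: it returns a
 * pointer to one internal static buffer, so two threads calling it at the
 * same time would overwrite each other's data.
 */
static const double *read_row_not_thread_safe(int row, int ncols)
{
    static double rowbuf[MAX_COLS];

    for (int c = 0; c < ncols; c++)
        rowbuf[c] = (double)row * ncols + c;   /* fake cell values */
    return rowbuf;
}

/*
 * The library call is made only in the serial outer loop; the CPU-bound
 * per-cell work on the row already in memory runs in parallel.
 * Returns the sum of the squared cell values, or -1.0 if ncols is invalid.
 */
double sum_of_squares_by_row(int nrows, int ncols)
{
    double total = 0.0;

    if (ncols < 0 || ncols > MAX_COLS)
        return -1.0;

    for (int r = 0; r < nrows; r++) {
        const double *row = read_row_not_thread_safe(r, ncols);  /* serial */

#pragma omp parallel for reduction(+ : total)
        for (int c = 0; c < ncols; c++)
            total += row[c] * row[c];   /* parallel, read-only access to row */
    }
    return total;
}
```

Opening a new parallel region for every row has a cost of its own, so this only pays off when rows are long or the per-cell work is expensive.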
A good place to start is to run a profiling tool to find the worst offending functions and deal with them first. Blindly parallelizing every loop you can find can slow things down because of the overhead of creating and destroying threads.
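One way to limit that overhead for small inputs is the OpenMP if clause, which runs the parallel region with a single thread when its condition is false. In this sketch the function name and the PARALLEL_THRESHOLD value are hypothetical; a suitable threshold has to be found by profiling on the target machine.

```c
/* Hypothetical cut-off: below it the loop runs serially. Measure, don't guess. */
#define PARALLEL_THRESHOLD 10000L

/*
 * Scales n values in place. The if() clause makes the region run on a
 * single thread when n is small, so short loops do not pay the cost of
 * starting a team of threads.
 */
void scale_values(double *values, long n, double factor)
{
#pragma omp parallel for if (n >= PARALLEL_THRESHOLD)
    for (long i = 0; i < n; i++)
        values[i] *= factor;
}
```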
- v.lidar: parallelize tcholDec() in lib/lidar/TcholBand.c (the file may have moved)
- This would speed up the CPU-bound v.surf.bspline and v.lidar.edgedetection considerably. [Partially done in the gmath lib by Soeren?]
- (suggestion) r.watershed
- Please contact and coordinate with Markus Metz before starting work on this.
- (suggestion) r.viewshed (C++)
- Please contact and coordinate with Laura Toma before starting work on this.
- The bug described in trac #390 should be fixed first; once that is done, the module can be moved into the main repository.
- (done) r.sun [Note: OpenCL GPU support was already added by Seth Price as part of his Summer of Code project; OpenCL supports multiple CPUs as well as GPUs; configure GRASS 7 with ./configure --with-opencl]
- Please contact and coordinate with Markus Neteler / Jaro Hofierka before starting work on this.
- (suggestion) r.univar when run with multiple input maps. Send each input map to its own thread.
- (suggestion) r.neighbors particularly for large neighborhood sizes.
- (suggestion) r.mfilter particularly for large neighborhood sizes.
- (suggestion) r.texture particularly for multiple methods and/or large regions.
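For the r.neighbors suggestion, a parallel moving-window average might look roughly like the sketch below. Everything here is hypothetical (function name, grid layout, edge handling) and it is not how r.neighbors is implemented; it only shows that output rows can be computed independently when the input and output grids are separate buffers.

```c
/*
 * Hypothetical sketch: moving-window mean over a row-major nrows x ncols
 * grid with an odd window width `size`. Near the edges only the neighbors
 * that exist are averaged. Each output row is written by one thread and
 * reads only from `in`, so no synchronization is needed.
 */
void neighborhood_mean(const double *in, double *out,
                       int nrows, int ncols, int size)
{
    int half = size / 2;

#pragma omp parallel for schedule(dynamic)
    for (int r = 0; r < nrows; r++) {
        for (int c = 0; c < ncols; c++) {
            double sum = 0.0;
            int count = 0;   /* always >= 1: the center cell is included */

            for (int dr = -half; dr <= half; dr++) {
                int rr = r + dr;

                if (rr < 0 || rr >= nrows)
                    continue;
                for (int dc = -half; dc <= half; dc++) {
                    int cc = c + dc;

                    if (cc < 0 || cc >= ncols)
                        continue;
                    sum += in[(long)rr * ncols + cc];
                    count++;
                }
            }
            out[(long)r * ncols + c] = sum / count;
        }
    }
}
```

schedule(dynamic) is one possible choice for rows of uneven cost; schedule(static) has lower overhead when all rows take the same time. The work per cell grows with the square of the window size, which is why large neighborhoods are the interesting case.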
Complete
- Support added to the build system in GRASS 7 (enable with ./configure --with-openmp).
- GRASS 7 also has a ./configure switch --with-pthread.
- The GPDE library (lib/gpde/) has OpenMP support
- The gmath library (lib/gmath/) has OpenMP support for the GRASS BLAS level 1, 2 and 3 algorithms as well as several iterative and direct linear equation solvers. This is a work in progress and may not be very efficient yet. When finished, a parallelized LU solver will help with tasks like v.surf.rst and v.surf.bspline.
- Experimental Pthread support for r.mapcalc is now in GRASS 7svn
- Yann added OpenMP support to i.atcorr (not in SVN, since it did not work properly).
- MarkusM added OpenMP support to r.proj in GRASS 7 (later removed, since it did not work properly).
Alternatives
- pthreads
- GPU using OpenCL and CUDA
- For some tasks it may be possible to parallelize easily at the module level. See Parallelizing Scripts.
- WARNING: Not all GRASS modules and scripts are safe to run while other processes are working in the same mapset. Try this at your own risk after performing a suitable safety audit. For example, make sure g.region is not run from outside, (temporarily) changing the region settings.
Benchmarks
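Neighborhood analysis performance using OpenMP and different compilers:
- OpenMP/Benchmarks
- http://trac.osgeo.org/grass/browser/sandbox/soeren/benchmarks/neighborhood_openmp/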
See also
- Introduction to OpenMP
- Using OpenMP to Simply Parallelize CPU-Intensive C Code - Klaas van Gend, 2014, FOSDEM Brussels
- OpenMP tutorial
- Easy OpenMP tutorial showing multithreading a program by sections, as opposed to the usual loop examples.
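- Another, introducing loops
- Threading Models for High-Performance Computing: Pthreads or OpenMP?
- GRASS mailing list discussions:
  - http://thread.gmane.org/gmane.comp.gis.grass.devel/16410/
  - http://lists.osgeo.org/pipermail/grass-dev/2009-April/043375.html
- OpenCL (Wikipedia entry)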
- idea: You might be able to run the mpd daemon and then launch jobs via mpirun -np 4 <command> in order to make your quad-core into a mini self-contained Beowulf cluster.