OpenMP/Benchmarks
Latest revision as of 16:43, 15 September 2013
This wiki page explores which combinations of compiler and compiler flags are most useful for speeding up computations without degrading data quality.
For an overview of compiler flags, see HPC Compendium: Compilers (https://doc.zih.tu-dresden.de/hpc-wiki/bin/view/Compendium/Compilers).
Neighborhood analysis
- Performance test using OpenMP with different compilers and compiler options.
Code and an automated run-script can be found at: https://trac.osgeo.org/grass/browser/sandbox/soeren/benchmarks/neighborhood_openmp
It's best to run the benchmark 4 times for each case, discard the first run, and average the remaining 3.
Example usage:
unset OMP_NUM_THREADS
time ./neighbor 5000 5000 23
export OMP_NUM_THREADS=1
time ./neighbor 5000 5000 23
...
export OMP_NUM_THREADS=6
time ./neighbor 5000 5000 23
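The advice above (four runs per case, first one discarded, remaining three averaged) can be scripted. The following is a minimal sketch, not the run-script from the repository: the helper name `avg_real_time`, the `RUNS` variable, and the thread counts in the loop are assumptions made for illustration, and `date +%s.%N` assumes GNU date.

```shell
# avg_real_time CMD [ARGS...]
# Hypothetical helper: runs CMD RUNS times (default 4), ignores the first
# measurement and prints the mean wall-clock time of the rest in seconds.
# The body is a subshell so its variables do not leak into the caller.
avg_real_time() (
    runs="${RUNS:-4}"
    if [ "$runs" -lt 2 ]; then
        echo "RUNS must be at least 2" >&2
        exit 1
    fi
    i=0
    total=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s.%N)   # assumes GNU date (nanosecond support)
        "$@" > /dev/null
        end=$(date +%s.%N)
        # Skip the first run, accumulate the elapsed time of the others.
        if [ "$i" -gt 0 ]; then
            total=$(awk -v t="$total" -v s="$start" -v e="$end" \
                'BEGIN { printf "%.6f", t + (e - s) }')
        fi
        i=$((i + 1))
    done
    awk -v t="$total" -v n="$((runs - 1))" 'BEGIN { printf "%.2f\n", t / n }'
)

# Example loop; it only runs if the benchmark binary is present.
if [ -x ./neighbor ]; then
    for n in 1 2 4 6; do
        export OMP_NUM_THREADS="$n"
        printf '%s threads: %s s\n' "$n" "$(avg_real_time ./neighbor 5000 5000 23)"
    done
fi
```

This measures only wall-clock ("real") time; the "user" and "sys" columns in the results table would still need the `time` output shown in the example usage.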
Results table
Test setup: 5000x5000 array with a window size of 23x23 cells
CPU | Available cores | OMP_NUM_THREADS | Time "real" | Time "user" | Time "sys" | Compiler | Compiler version | Compiler flags | OS | System RAM | Data sum | Data mean |
---|---|---|---|---|---|---|---|---|---|---|---|---|
AMD Phenom II X6 1090T | 6 | 1 | 129.17s | 124.94s | 0.75s | gcc | 4.4.5 | -O0 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
AMD Phenom II X6 1090T | 6 | 2 | 64.64s | 127.89s | 0.31s | gcc | 4.4.5 | -O0 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
AMD Phenom II X6 1090T | 6 | 4 | 37.26s | 145.96s | 0.52s | gcc | 4.4.5 | -O0 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
AMD Phenom II X6 1090T | 6 | 6 | 25.17s | 147.70s | 0.49s | gcc | 4.4.5 | -O0 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
AMD Phenom II X6 1090T | 6 | 1 | 34.86s | 34.67s | 0.19s | gcc | 4.4.5 | -O3 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
AMD Phenom II X6 1090T | 6 | 2 | 18.38s | 35.71s | 0.13s | gcc | 4.4.5 | -O3 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
AMD Phenom II X6 1090T | 6 | 4 | 10.04s | 38.40s | 0.32s | gcc | 4.4.5 | -O3 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
AMD Phenom II X6 1090T | 6 | 6 | 7.03s | 37.69s | 0.29s | gcc | 4.4.5 | -O3 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 1 | 64.08s | 63.90s | 0.07s | gcc | 4.6.3 | -O0 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 2 | 32.64s | 64.69s | 0.21s | gcc | 4.6.3 | -O0 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 4 | 17.54s | 68.18s | 0.20s | gcc | 4.6.3 | -O0 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 6 | 16.07s | 85.88s | 0.19s | gcc | 4.6.3 | -O0 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 8 | 13.80s | 106.93s | 0.24s | gcc | 4.6.3 | -O0 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 1 | 19.56s | 19.38s | 0.07s | gcc | 4.6.3 | -O3 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 2 | 10.49s | 20.53s | 0.07s | gcc | 4.6.3 | -O3 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 4 | 5.67s | 21.57s | 0.07s | gcc | 4.6.3 | -O3 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 6 | 4.75s | 25.64s | 0.08s | gcc | 4.6.3 | -O3 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 8 | 3.80s | 28.97s | 0.10s | gcc | 4.6.3 | -O3 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 1 | 7.96s | 7.94s | 0.07s | icc | 12.1.3 20120212 | -Ofast | Ubuntu 12.04.2 LTS | 15.5 GB | ? | ? |
Intel Core i7-3770 @ 3.40GHz | 8 | 2 | 4.69s | 8.89s | 0.08s | icc | 12.1.3 20120212 | -Ofast | Ubuntu 12.04.2 LTS | 15.5 GB | ? | ? |
Intel Core i7-3770 @ 3.40GHz | 8 | 4 | 2.83s | 10.01s | 0.08s | icc | 12.1.3 20120212 | -Ofast | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 6 | 2.69s | 13.26s | 0.10s | icc | 12.1.3 20120212 | -Ofast | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 8 | 2.49s | 17.55s | 0.12s | icc | 12.1.3 20120212 | -Ofast | Ubuntu 12.04.2 LTS | 15.5 GB | ? | ? |
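One way to read the table is to convert the "real" times into speedup (single-thread time divided by multi-thread time) and parallel efficiency (speedup divided by thread count). The sketch below does this arithmetic with `awk`; the helper name `speedup` is an assumption for illustration, and these two derived metrics are not columns of the original table.

```shell
# speedup T1 TN N
# Hypothetical helper: T1 is the "real" time with 1 thread, TN the "real"
# time with N threads. Prints speedup (T1 / TN) and efficiency (speedup / N).
speedup() {
    awk -v t1="$1" -v tn="$2" -v n="$3" 'BEGIN {
        s = t1 / tn
        printf "speedup %.2f efficiency %.2f\n", s, s / n
    }'
}

# gcc 4.4.5 -O3 on the AMD Phenom II X6 1090T: 34.86 s (1 thread), 7.03 s (6 threads)
speedup 34.86 7.03 6   # prints: speedup 4.96 efficiency 0.83

# icc 12.1.3 -Ofast on the Intel Core i7-3770: 7.96 s (1 thread), 2.49 s (8 threads)
speedup 7.96 2.49 8    # prints: speedup 3.20 efficiency 0.40
```

The lower efficiency on the Intel system at 8 threads is likely related to the i7-3770 having 4 physical cores with Hyper-Threading, so the 8 "available cores" listed in the table are probably hardware threads.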