OpenMP/Benchmarks
Latest revision as of 16:43, 15 September 2013
This wiki page explores which compilers and compiler flag combinations are most useful for speeding up computations without degrading data quality.
For an overview of compiler flags, see HPC Compendium: Compilers (https://doc.zih.tu-dresden.de/hpc-wiki/bin/view/Compendium/Compilers).
Neighborhood analysis
- Performance test using OpenMP with different compilers and compiler options.
Code and an automated run-script can be found at:
https://trac.osgeo.org/grass/browser/sandbox/soeren/benchmarks/neighborhood_openmp
It is best to run each case 4 times, discard the first run, and average the remaining 3.
Example usage:
unset OMP_NUM_THREADS
time ./neighbor 5000 5000 23
export OMP_NUM_THREADS=1
time ./neighbor 5000 5000 23
...
export OMP_NUM_THREADS=6
time ./neighbor 5000 5000 23
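The advice above (four runs per case, drop the first, average the remaining three) can be scripted. The following is only a minimal sketch and is not the run-script from the repository: the function names `mean_after_warmup` and `bench_case` are hypothetical, and it assumes bash (for the `time` keyword and `TIMEFORMAT`) and awk are available.

```bash
#!/bin/bash
# Hypothetical helpers for illustration; not part of the benchmark repository.

# Read wall-clock times in seconds (one per line) on stdin, skip the
# first line (the warm-up run) and print the mean of the remaining lines.
mean_after_warmup() {
    awk 'NR > 1 { sum += $1; n++ }
         END { if (n > 0) printf "%.2f\n", sum / n }'
}

# Run a command 4 times with the given thread count and print the mean
# "real" time of runs 2-4. Usage: bench_case THREADS COMMAND [ARGS...]
bench_case() {
    local threads=$1
    shift
    local TIMEFORMAT=%R   # bash "time" keyword then prints only the real time in seconds
    local i
    for i in 1 2 3 4; do
        # The program's own output is discarded; only the timing line,
        # which bash writes to stderr of the group, is passed on.
        { time OMP_NUM_THREADS=$threads "$@" > /dev/null 2>&1; } 2>&1
    done | mean_after_warmup
}
```

With these definitions, `bench_case 6 ./neighbor 5000 5000 23` would print the mean real time of runs 2 to 4 with six threads, assuming `./neighbor` is in the current directory.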
Results table
Test setup: 5000x5000 array with a window size of 23x23 cells
CPU | Available cores | OMP_NUM_THREADS | Time "real" | Time "user" | Time "sys" | Compiler | Compiler version | Compiler flags | OS | System RAM | Data sum | Data mean |
---|---|---|---|---|---|---|---|---|---|---|---|---|
AMD Phenom II X6 1090T | 6 | 1 | 129.17s | 124.94s | 0.75s | gcc | 4.4.5 | -O0 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
AMD Phenom II X6 1090T | 6 | 2 | 64.64s | 127.89s | 0.31s | gcc | 4.4.5 | -O0 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
AMD Phenom II X6 1090T | 6 | 4 | 37.26s | 145.96s | 0.52s | gcc | 4.4.5 | -O0 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
AMD Phenom II X6 1090T | 6 | 6 | 25.17s | 147.70s | 0.49s | gcc | 4.4.5 | -O0 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
AMD Phenom II X6 1090T | 6 | 1 | 34.86s | 34.67s | 0.19s | gcc | 4.4.5 | -O3 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
AMD Phenom II X6 1090T | 6 | 2 | 18.38s | 35.71s | 0.13s | gcc | 4.4.5 | -O3 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
AMD Phenom II X6 1090T | 6 | 4 | 10.04s | 38.40s | 0.32s | gcc | 4.4.5 | -O3 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
AMD Phenom II X6 1090T | 6 | 6 | 7.03s | 37.69s | 0.29s | gcc | 4.4.5 | -O3 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 1 | 64.08s | 63.90s | 0.07s | gcc | 4.6.3 | -O0 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 2 | 32.64s | 64.69s | 0.21s | gcc | 4.6.3 | -O0 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 4 | 17.54s | 68.18s | 0.20s | gcc | 4.6.3 | -O0 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 6 | 16.07s | 85.88s | 0.19s | gcc | 4.6.3 | -O0 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 8 | 13.80s | 106.93s | 0.24s | gcc | 4.6.3 | -O0 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 1 | 19.56s | 19.38s | 0.07s | gcc | 4.6.3 | -O3 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 2 | 10.49s | 20.53s | 0.07s | gcc | 4.6.3 | -O3 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 4 | 5.67s | 21.57s | 0.07s | gcc | 4.6.3 | -O3 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 6 | 4.75s | 25.64s | 0.08s | gcc | 4.6.3 | -O3 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 8 | 3.80s | 28.97s | 0.10s | gcc | 4.6.3 | -O3 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 1 | 7.96s | 7.94s | 0.07s | icc | 12.1.3 20120212 | -Ofast | Ubuntu 12.04.2 LTS | 15.5 GB | ? | ? |
Intel Core i7-3770 @ 3.40GHz | 8 | 2 | 4.69s | 8.89s | 0.08s | icc | 12.1.3 20120212 | -Ofast | Ubuntu 12.04.2 LTS | 15.5 GB | ? | ? |
Intel Core i7-3770 @ 3.40GHz | 8 | 4 | 2.83s | 10.01s | 0.08s | icc | 12.1.3 20120212 | -Ofast | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 6 | 2.69s | 13.26s | 0.10s | icc | 12.1.3 20120212 | -Ofast | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 8 | 2.49s | 17.55s | 0.12s | icc | 12.1.3 20120212 | -Ofast | Ubuntu 12.04.2 LTS | 15.5 GB | ? | ? |
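
To compare how well the configurations in the table scale, the "real" times can be converted to a speedup (single-thread time divided by n-thread time) and a parallel efficiency (speedup divided by the thread count). The sketch below is an illustration only and was not part of the original benchmark; the function name `speedup` is hypothetical, and times are passed in seconds without the trailing "s".

```bash
#!/bin/bash
# Hypothetical helper for illustration; not part of the benchmark repository.
# Usage: speedup T1 TN N
#   T1 = "real" time with 1 thread, TN = "real" time with N threads
speedup() {
    awk -v t1="$1" -v tn="$2" -v n="$3" 'BEGIN {
        s = t1 / tn                       # speedup relative to one thread
        printf "speedup %.2f, efficiency %.2f\n", s, s / n
    }'
}
```

For the gcc -O3 rows, `speedup 19.56 3.80 8` (Intel Core i7-3770) prints `speedup 5.15, efficiency 0.64`, and `speedup 34.86 7.03 6` (AMD Phenom II X6 1090T) prints `speedup 4.96, efficiency 0.83`.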