OpenMP/Benchmarks
This wiki page explores which combinations of compiler and compiler flags speed up computations the most without degrading data quality.
For an overview of compiler flags, see HPC Compendium: Compilers.
Neighborhood analysis
- Performance test using OpenMP with different compilers and compiler options.
Code and an automated run-script can be found at:
It is best to run each case four times, discard the first run, and average the remaining three.
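The averaging rule above can be sketched in a few lines of Python. This is only an illustration: the function name `average_after_warmup` and the four sample timings are hypothetical and are not taken from the benchmark's run-script.

```python
from statistics import mean


def average_after_warmup(times, discard=1):
    """Average run times after dropping the first `discard` runs."""
    if len(times) <= discard:
        raise ValueError("need more runs than are discarded")
    return mean(times[discard:])


# Hypothetical wall-clock times (seconds) for four runs of one case.
runs = [8.4, 8.0, 7.9, 8.1]
print(f"{average_after_warmup(runs):.2f} s")
```

Only the last three values contribute to the printed average; the first run is ignored.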
Example usage:
unset OMP_NUM_THREADS
time ./neighbor 5000 5000 23
export OMP_NUM_THREADS=1
time ./neighbor 5000 5000 23
...
export OMP_NUM_THREADS=6
time ./neighbor 5000 5000 23
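The manual steps above could be automated roughly as follows. This is a sketch under assumptions, not the run-script mentioned earlier: the helper names `openmp_env`, `time_run` and `benchmark` are hypothetical, and only the wall-clock ("real") time is measured, not the "user" or "sys" time reported by `time`.

```python
import os
import subprocess
import time


def openmp_env(threads=None):
    """Copy the environment; set OMP_NUM_THREADS, or remove it when threads is None."""
    env = dict(os.environ)
    if threads is None:
        env.pop("OMP_NUM_THREADS", None)  # same effect as `unset OMP_NUM_THREADS`
    else:
        env["OMP_NUM_THREADS"] = str(threads)
    return env


def time_run(cmd, threads=None):
    """Run cmd once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, env=openmp_env(threads), check=True)
    return time.perf_counter() - start


def benchmark(cmd, thread_counts, repeats=4):
    """Time each thread count `repeats` times; discard the first run, average the rest."""
    if repeats < 2:
        raise ValueError("repeats must be at least 2")
    results = {}
    for threads in thread_counts:
        times = [time_run(cmd, threads) for _ in range(repeats)]
        results[threads] = sum(times[1:]) / (repeats - 1)
    return results
```

A call such as `benchmark(["./neighbor", "5000", "5000", "23"], [None, 1, 2, 4, 6])` would then cover the cases shown above, assuming the `neighbor` binary is in the current directory.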
Results table
Test setup: 5000x5000 array with a window size of 23x23 cells
CPU | Available cores | OMP_NUM_THREADS | Time "real" | Time "user" | Time "sys" | Compiler | Compiler version | Compiler flags | OS | System RAM | Data sum | Data mean |
---|---|---|---|---|---|---|---|---|---|---|---|---|
AMD Phenom II X6 1090T | 6 | 1 | 129.17s | 124.94s | 0.75s | gcc | 4.4.5 | -O0 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
AMD Phenom II X6 1090T | 6 | 2 | 64.64s | 127.89s | 0.31s | gcc | 4.4.5 | -O0 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
AMD Phenom II X6 1090T | 6 | 4 | 37.26s | 145.96s | 0.52s | gcc | 4.4.5 | -O0 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
AMD Phenom II X6 1090T | 6 | 6 | 25.17s | 147.70s | 0.49s | gcc | 4.4.5 | -O0 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
AMD Phenom II X6 1090T | 6 | 1 | 34.86s | 34.67s | 0.19s | gcc | 4.4.5 | -O3 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
AMD Phenom II X6 1090T | 6 | 2 | 18.38s | 35.71s | 0.13s | gcc | 4.4.5 | -O3 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
AMD Phenom II X6 1090T | 6 | 4 | 10.04s | 38.40s | 0.32s | gcc | 4.4.5 | -O3 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
AMD Phenom II X6 1090T | 6 | 6 | 7.03s | 37.69s | 0.29s | gcc | 4.4.5 | -O3 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 1 | 64.08s | 63.90s | 0.07s | gcc | 4.6.3 | -O0 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 2 | 32.64s | 64.69s | 0.21s | gcc | 4.6.3 | -O0 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 4 | 17.54s | 68.18s | 0.20s | gcc | 4.6.3 | -O0 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 6 | 16.07s | 85.88s | 0.19s | gcc | 4.6.3 | -O0 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 8 | 13.80s | 106.93s | 0.24s | gcc | 4.6.3 | -O0 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 1 | 19.56s | 19.38s | 0.07s | gcc | 4.6.3 | -O3 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 2 | 10.49s | 20.53s | 0.07s | gcc | 4.6.3 | -O3 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 4 | 5.67s | 21.57s | 0.07s | gcc | 4.6.3 | -O3 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 6 | 4.75s | 25.64s | 0.08s | gcc | 4.6.3 | -O3 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 8 | 3.80s | 28.97s | 0.10s | gcc | 4.6.3 | -O3 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 1 | 7.96s | 7.94s | 0.07s | icc | 12.1.3 20120212 | -Ofast | Ubuntu 12.04.2 LTS | 15.5 GB | ? | ? |
Intel Core i7-3770 @ 3.40GHz | 8 | 2 | 4.69s | 8.89s | 0.08s | icc | 12.1.3 20120212 | -Ofast | Ubuntu 12.04.2 LTS | 15.5 GB | ? | ? |
Intel Core i7-3770 @ 3.40GHz | 8 | 4 | 2.83s | 10.01s | 0.08s | icc | 12.1.3 20120212 | -Ofast | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 6 | 2.69s | 13.26s | 0.10s | icc | 12.1.3 20120212 | -Ofast | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 8 | 2.49s | 17.55s | 0.12s | icc | 12.1.3 20120212 | -Ofast | Ubuntu 12.04.2 LTS | 15.5 GB | ? | ? |
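
To compare how well each configuration scales, the "real" times in the table can be turned into speedup (single-thread time divided by n-thread time) and parallel efficiency (speedup divided by thread count). The sketch below does this for the gcc 4.4.5 -O3 rows of the AMD Phenom II X6 1090T; the function name `speedup_table` is hypothetical, and the same calculation applies to any other group of rows.

```python
def speedup_table(real_times):
    """Map thread count -> (speedup, efficiency) relative to the 1-thread run."""
    base = real_times[1]
    return {n: (base / t, base / t / n) for n, t in sorted(real_times.items())}


# "real" times in seconds, copied from the table above
# (AMD Phenom II X6 1090T, gcc 4.4.5, -O3).
phenom_gcc_o3 = {1: 34.86, 2: 18.38, 4: 10.04, 6: 7.03}

for n, (speedup, efficiency) in speedup_table(phenom_gcc_o3).items():
    print(f"{n} threads: speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

An efficiency close to 100% means the extra threads are used almost fully; lower values point to overhead or contention.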