OpenMP/Benchmarks
This wiki page explores which combinations of compiler and compiler flags speed up computations the most without degrading the quality of the results.
Neighborhood analysis
- Performance test using OpenMP with different compilers and compiler options.
Code and an automated run-script can be found at:
It is best to run each case four times, discard the first run, and average the remaining three.
Example usage:
unset OMP_NUM_THREADS
time ./neighbor 5000 5000 23
export OMP_NUM_THREADS=1
time ./neighbor 5000 5000 23
...
export OMP_NUM_THREADS=6
time ./neighbor 5000 5000 23
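The procedure above (four runs per case, first run discarded, remaining three averaged) can be sketched in Python. This is only an illustration and not the run-script mentioned above: the function names `time_command`, `average_after_warmup` and `benchmark` are hypothetical, and the sketch records wall-clock time only (the "real" value reported by `time`), not the "user" or "sys" values.

```python
import os
import subprocess
import time


def time_command(cmd, threads=None):
    """Run cmd once and return its wall-clock time in seconds.

    threads=None removes OMP_NUM_THREADS from the environment, so the
    OpenMP runtime chooses its own default thread count.
    """
    env = dict(os.environ)
    env.pop("OMP_NUM_THREADS", None)
    if threads is not None:
        env["OMP_NUM_THREADS"] = str(threads)
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def average_after_warmup(times):
    """Discard the first measurement and average the remaining ones."""
    if len(times) < 2:
        raise ValueError("need at least two runs")
    rest = times[1:]
    return sum(rest) / len(rest)


def benchmark(cmd, threads=None, runs=4):
    """Time cmd `runs` times and average all runs except the first."""
    times = [time_command(cmd, threads) for _ in range(runs)]
    return average_after_warmup(times)
```

Assuming the `neighbor` binary sits in the current directory, a call such as `benchmark(["./neighbor", "5000", "5000", "23"], threads=6)` would correspond to the six-thread case shown above.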
Results table
Test setup: 5000x5000 array with a window size of 23x23 cells
CPU | Available cores | OMP_NUM_THREADS | Time "real" | Time "user" | Time "sys" | Compiler | Compiler version | Compiler flags | OS | System RAM | Data sum | Data mean |
---|---|---|---|---|---|---|---|---|---|---|---|---|
AMD Phenom II X6 1090T | 6 | 1 | 129.17s | 124.94s | 0.75s | gcc | 4.4.5 | -O0 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
AMD Phenom II X6 1090T | 6 | 2 | 64.64s | 127.89s | 0.31s | gcc | 4.4.5 | -O0 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
AMD Phenom II X6 1090T | 6 | 4 | 37.26s | 145.96s | 0.52s | gcc | 4.4.5 | -O0 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
AMD Phenom II X6 1090T | 6 | 6 | 25.17s | 147.70s | 0.49s | gcc | 4.4.5 | -O0 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
AMD Phenom II X6 1090T | 6 | 1 | 34.86s | 34.67s | 0.19s | gcc | 4.4.5 | -O3 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
AMD Phenom II X6 1090T | 6 | 2 | 18.38s | 35.71s | 0.13s | gcc | 4.4.5 | -O3 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
AMD Phenom II X6 1090T | 6 | 4 | 10.04s | 38.40s | 0.32s | gcc | 4.4.5 | -O3 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
AMD Phenom II X6 1090T | 6 | 6 | 7.03s | 37.69s | 0.29s | gcc | 4.4.5 | -O3 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 1 | 64.08s | 63.90s | 0.07s | gcc | 4.6.3 | -O0 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 2 | 32.64s | 64.69s | 0.21s | gcc | 4.6.3 | -O0 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 4 | 17.54s | 68.18s | 0.20s | gcc | 4.6.3 | -O0 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 6 | 16.07s | 85.88s | 0.19s | gcc | 4.6.3 | -O0 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 8 | 13.80s | 106.93s | 0.24s | gcc | 4.6.3 | -O0 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 1 | 19.56s | 19.38s | 0.07s | gcc | 4.6.3 | -O3 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 2 | 10.49s | 20.53s | 0.07s | gcc | 4.6.3 | -O3 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 4 | 5.67s | 21.57s | 0.07s | gcc | 4.6.3 | -O3 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 6 | 4.75s | 25.64s | 0.08s | gcc | 4.6.3 | -O3 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 8 | 3.80s | 28.97s | 0.10s | gcc | 4.6.3 | -O3 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 1 | 7.96s | 7.94s | 0.07s | icc | 12.1.3 20120212 | -Ofast | Ubuntu 12.04.2 LTS | 15.5 GB | ? | ? |
Intel Core i7-3770 @ 3.40GHz | 8 | 2 | 4.69s | 8.89s | 0.08s | icc | 12.1.3 20120212 | -Ofast | Ubuntu 12.04.2 LTS | 15.5 GB | ? | ? |
Intel Core i7-3770 @ 3.40GHz | 8 | 4 | 2.83s | 10.01s | 0.08s | icc | 12.1.3 20120212 | -Ofast | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 6 | 2.69s | 13.26s | 0.10s | icc | 12.1.3 20120212 | -Ofast | Ubuntu 12.04.2 LTS | 15.5 GB | | |
Intel Core i7-3770 @ 3.40GHz | 8 | 8 | 2.49s | 17.55s | 0.12s | icc | 12.1.3 20120212 | -Ofast | Ubuntu 12.04.2 LTS | 15.5 GB | ? | ? |
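
To compare rows of the table it can help to convert the "real" times into speedup (single-thread time divided by multi-thread time) and parallel efficiency (speedup divided by thread count). The following is a minimal sketch of that arithmetic. The helper names are hypothetical, and the sample values are copied from the gcc 4.4.5 -O3 rows for the AMD Phenom II X6 1090T.

```python
def speedup(t_serial, t_parallel):
    """Ratio of single-thread wall-clock time to multi-thread wall-clock time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, threads):
    """Speedup divided by thread count; 1.0 would be perfect linear scaling."""
    return speedup(t_serial, t_parallel) / threads


def scaling_table(real_times):
    """Return (threads, speedup, efficiency) tuples sorted by thread count.

    real_times maps a thread count to a "real" time in seconds and must
    contain an entry for 1 thread, which serves as the baseline.
    """
    base = real_times[1]
    return [
        (n, speedup(base, t), efficiency(base, t, n))
        for n, t in sorted(real_times.items())
    ]


# "real" times (seconds) from the table: AMD Phenom II X6 1090T, gcc 4.4.5, -O3
phenom_o3 = {1: 34.86, 2: 18.38, 4: 10.04, 6: 7.03}

for n, s, e in scaling_table(phenom_o3):
    print(f"{n} threads: speedup {s:.2f}, efficiency {e:.0%}")
```

The same function can be applied to any other CPU, compiler and flag combination in the table by swapping in that group's "real" times.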