OpenMP/Benchmarks

From GRASS-Wiki
Latest revision as of 09:43, 15 September 2013

The idea of this wiki page is to explore which compiler and compiler flag combinations are most useful for speeding up computations without degrading data quality.

For an overview of compiler flags, see HPC Compendium: Compilers

Neighborhood analysis

  • Performance test using OpenMP with different compilers and compiler options.

Code and an automated run-script can be found at:

https://trac.osgeo.org/grass/browser/sandbox/soeren/benchmarks/neighborhood_openmp


It's best to run each case 4 times, discard the first run, and average the remaining 3.
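The run-and-average procedure could be scripted roughly as follows. This is only a sketch, not the run-script from the sandbox directory: the function names `time_runs` and `mean_after_warmup` are made up for illustration, and `time_runs` assumes bash (it relies on the `time` keyword and `TIMEFORMAT`).

```shell
#!/usr/bin/env bash

# Hypothetical helper: average the given times (in seconds) after
# dropping the first one, which is treated as a warm-up run.
# Usage: mean_after_warmup T1 T2 T3 T4
mean_after_warmup() {
    shift   # discard the first measurement
    printf '%s\n' "$@" | awk '{ s += $1 } END { if (NR) printf "%.2f\n", s / NR }'
}

# Hypothetical helper (bash only): run a command 4 times and print the
# wall-clock ("real") time of each run in seconds, one per line.
time_runs() {
    local TIMEFORMAT=%R i
    for i in 1 2 3 4; do
        # the command's own output is discarded; only the timing is kept
        { time "$@" >/dev/null 2>&1; } 2>&1
    done
}
```

With both functions loaded, something like `mean_after_warmup $(time_runs ./neighbor 5000 5000 23)` would print the mean "real" time of runs 2 to 4.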

Example usage:

 unset OMP_NUM_THREADS
 time ./neighbor 5000 5000 23
 
 export OMP_NUM_THREADS=1
 time ./neighbor 5000 5000 23
 
 ...
 
 export OMP_NUM_THREADS=6
 time ./neighbor 5000 5000 23
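The repeated export/run pattern above can also be written as a loop. The following is a minimal sketch; `sweep_threads` is a hypothetical helper name, and the thread counts passed to it are whatever you choose to test.

```shell
# Hypothetical helper: run a command once per thread count, setting
# OMP_NUM_THREADS only in the environment of that command.
# Usage: sweep_threads "COUNTS" COMMAND [ARGS...]
sweep_threads() {
    counts=$1
    shift
    for n in $counts; do
        # label each run so the timings can be matched to a thread count
        printf 'OMP_NUM_THREADS=%s\n' "$n"
        OMP_NUM_THREADS=$n "$@"
    done
}
```

For example, `sweep_threads "1 2 4 6" ./neighbor 5000 5000 23` would run the benchmark with 1, 2, 4 and 6 threads in turn. Because the variable is set per command, the calling shell's own `OMP_NUM_THREADS` is left untouched.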

Results table

Test setup: 5000x5000 array with a window size of 23x23 cells

| CPU | Available cores | OMP_NUM_THREADS | Time "real" | Time "user" | Time "sys" | Compiler | Compiler version | Compiler flags | OS | System RAM | Data sum | Data mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AMD Phenom II X6 1090T | 6 | 1 | 129.17s | 124.94s | 0.75s | gcc | 4.4.5 | -O0 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
| AMD Phenom II X6 1090T | 6 | 2 | 64.64s | 127.89s | 0.31s | gcc | 4.4.5 | -O0 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
| AMD Phenom II X6 1090T | 6 | 4 | 37.26s | 145.96s | 0.52s | gcc | 4.4.5 | -O0 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
| AMD Phenom II X6 1090T | 6 | 6 | 25.17s | 147.70s | 0.49s | gcc | 4.4.5 | -O0 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
| AMD Phenom II X6 1090T | 6 | 1 | 34.86s | 34.67s | 0.19s | gcc | 4.4.5 | -O3 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
| AMD Phenom II X6 1090T | 6 | 2 | 18.38s | 35.71s | 0.13s | gcc | 4.4.5 | -O3 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
| AMD Phenom II X6 1090T | 6 | 4 | 10.04s | 38.40s | 0.32s | gcc | 4.4.5 | -O3 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
| AMD Phenom II X6 1090T | 6 | 6 | 7.03s | 37.69s | 0.29s | gcc | 4.4.5 | -O3 | Debian GNU/Linux 6.0.7 (squeeze) | 8.0 GB | | |
| Intel Core i7-3770 @ 3.40GHz | 8 | 1 | 64.08s | 63.90s | 0.07s | gcc | 4.6.3 | -O0 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
| Intel Core i7-3770 @ 3.40GHz | 8 | 2 | 32.64s | 64.69s | 0.21s | gcc | 4.6.3 | -O0 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
| Intel Core i7-3770 @ 3.40GHz | 8 | 4 | 17.54s | 68.18s | 0.20s | gcc | 4.6.3 | -O0 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
| Intel Core i7-3770 @ 3.40GHz | 8 | 6 | 16.07s | 85.88s | 0.19s | gcc | 4.6.3 | -O0 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
| Intel Core i7-3770 @ 3.40GHz | 8 | 8 | 13.80s | 106.93s | 0.24s | gcc | 4.6.3 | -O0 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
| Intel Core i7-3770 @ 3.40GHz | 8 | 1 | 19.56s | 19.38s | 0.07s | gcc | 4.6.3 | -O3 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
| Intel Core i7-3770 @ 3.40GHz | 8 | 2 | 10.49s | 20.53s | 0.07s | gcc | 4.6.3 | -O3 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
| Intel Core i7-3770 @ 3.40GHz | 8 | 4 | 5.67s | 21.57s | 0.07s | gcc | 4.6.3 | -O3 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
| Intel Core i7-3770 @ 3.40GHz | 8 | 6 | 4.75s | 25.64s | 0.08s | gcc | 4.6.3 | -O3 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
| Intel Core i7-3770 @ 3.40GHz | 8 | 8 | 3.80s | 28.97s | 0.10s | gcc | 4.6.3 | -O3 | Ubuntu 12.04.2 LTS | 15.5 GB | | |
| Intel Core i7-3770 @ 3.40GHz | 8 | 1 | 7.96s | 7.94s | 0.07s | icc | 12.1.3 20120212 | -Ofast | Ubuntu 12.04.2 LTS | 15.5 GB | ? | ? |
| Intel Core i7-3770 @ 3.40GHz | 8 | 2 | 4.69s | 8.89s | 0.08s | icc | 12.1.3 20120212 | -Ofast | Ubuntu 12.04.2 LTS | 15.5 GB | ? | ? |
| Intel Core i7-3770 @ 3.40GHz | 8 | 4 | 2.83s | 10.01s | 0.08s | icc | 12.1.3 20120212 | -Ofast | Ubuntu 12.04.2 LTS | 15.5 GB | | |
| Intel Core i7-3770 @ 3.40GHz | 8 | 6 | 2.69s | 13.26s | 0.10s | icc | 12.1.3 20120212 | -Ofast | Ubuntu 12.04.2 LTS | 15.5 GB | | |
| Intel Core i7-3770 @ 3.40GHz | 8 | 8 | 2.49s | 17.55s | 0.12s | icc | 12.1.3 20120212 | -Ofast | Ubuntu 12.04.2 LTS | 15.5 GB | ? | ? |
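
To compare rows, it can help to convert the "real" times into a speedup (single-thread time divided by N-thread time) and a parallel efficiency (speedup divided by N). The sketch below uses these standard definitions; the function names `speedup` and `efficiency` are hypothetical and not part of the benchmark code.

```shell
# Hypothetical helper: speedup = T1 / TN
# Usage: speedup T1 TN
speedup() {
    awk -v t1="$1" -v tn="$2" 'BEGIN { printf "%.2f\n", t1 / tn }'
}

# Hypothetical helper: parallel efficiency = T1 / (TN * N), in percent
# Usage: efficiency T1 TN N
efficiency() {
    awk -v t1="$1" -v tn="$2" -v n="$3" 'BEGIN { printf "%.0f%%\n", 100 * t1 / (tn * n) }'
}
```

Using the gcc -O3 rows for the i7-3770 (19.56s with 1 thread, 3.80s with 8 threads), `speedup 19.56 3.80` prints 5.15 and `efficiency 19.56 3.80 8` prints 64%. For the Phenom II with gcc -O3 (34.86s and 7.03s with 6 threads) the same calls give 4.96 and 83%.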