The following compilers are available on our platforms:
                    Intel    GNU        PGI
  C Compiler        icc      gcc        pgcc
  C++ Compiler      icpc     g++        pgc++
  Fortran Compiler  ifort    gfortran   pgfortran
For an overview of the installed compiler versions, please see our automatically updated SoftwareModulesList.

All C compilers support ANSI C and C99, each with a few different language options. Support for Fortran 77, Fortran 90, Fortran 95, and Fortran 2003 differs from one compiler to another. Please check the man pages to verify that your code can be compiled.

Please note that linking C++ files normally requires the C++ version of the compiler (for example icpc or g++ rather than icc or gcc) so that the correct libraries are linked. For serious problems with Intel's compilers, please refer to FurtherDocumentation.

Compiler Flags

Common options are:

  • -g to include information required for debugging
  • -pg to generate gprof-style sample-based profiling information during the run
  • -O0, -O1, -O2, -O3 to customize the optimization level from no ( -O0 ) to aggressive ( -O3 ) optimization
  • -I to set search path for header files
  • -L to set search path for libraries
Please note that aggressive optimization allows deviations from strict IEEE arithmetic. Because options like -mp have a severe performance impact, users have to balance speed against the accuracy their application requires. There are several further options for profiling, profile-guided optimization, data alignment, and so on. You can list all available compiler options with the option -help. Reading the man pages is a good idea, too.
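
Why a relaxed floating-point model can change results is easiest to see in a small C sketch. Floating-point addition is not associative, so an optimizer that is allowed to regroup a sum may produce a different value. The function names below are illustrative only; the volatile temporaries pin the grouping so that each function evaluates exactly as written.

```c
#include <stdio.h>

/* Sum three doubles grouped as (a + b) + c. */
double sum_left(double a, double b, double c)
{
    volatile double t = a + b;  /* volatile keeps this intermediate rounding step */
    return t + c;
}

/* Same three values, regrouped as a + (b + c). */
double sum_right(double a, double b, double c)
{
    volatile double t = b + c;
    return a + t;
}

/* Print both groupings so the difference is visible. */
void show_reassociation(void)
{
    double a = 1.0e20, b = -1.0e20, c = 1.0;
    printf("(a + b) + c = %g\n", sum_left(a, b, c));
    printf("a + (b + c) = %g\n", sum_right(a, b, c));
}
```

With a = 1e20, b = -1e20 and c = 1.0, sum_left returns 1 and sum_right returns 0, because 1.0 is smaller than the spacing between adjacent doubles near 1e20. A compiler that is free to reassociate may pick either grouping.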

Users benefit from a (nearly) identical set of optimization flags for the C, C++, and Fortran compilers. The following table lists only a few important compiler-dependent options. For more detailed information, refer to the man pages or use the option -help to list all options of the compiler.

GCC                 Open64          Intel            PGI        Pathscale      Description*
-fopenmp            -mp             -openmp          -mp        -mp            turn on OpenMP support
-ieee-fp            -fno-fast-math  -mp              -Kieee     -no-fast-math  use this flag to limit floating-point optimizations and maintain declared precision
-ffast-math         -ffast-math     -mp1             -Knoieee   -ffast-math    some floating-point optimizations are allowed, less performance impact than -mp
-Ofast              -Ofast          -fast            -fast      -Ofast         Maximize performance, implies a couple of other flags
                                    -fpe(1) -ftz(2)  -Ktrap...                 Controls the behavior of the processor when floating-point exceptions occur.
-mavx -msse4.2      -mavx -msse4.2  -msse4.2         -fastsse   -mavx          "generally optimal flags" for supporting SSE instructions
                    -ipa            -ipo             -Mipa      -ipa           interprocedural optimization (across files)
                                    -ip              -Mipa                     interprocedural optimization (within files)
                    -apo            -parallel        -Mconcur   -apo           Auto-parallelizer
-fprofile-generate                  -prof-gen        -Mpfi      -fb-create     Create instrumented code to generate profile in file
-fprofile-use                       -prof-use        -Mpfo      -fb-opt        Use profile data for optimization. - Leave all other optimization options
We cannot give general advice as to which option should be used; even -O0 sometimes leads to fast code. To gain maximum performance, please test the compilers and a few combinations of optimization flags. If in doubt, you can also contact ZIH and ask the staff for help.
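
As a small illustration of the OpenMP row in the table, the following C sketch compiles with or without the OpenMP flag. The function names are hypothetical. `_OPENMP` is the macro that OpenMP-aware compilers define when the flag is active; without the flag the pragma is ignored (possibly with a warning) and the loop runs serially.

```c
/* Returns 1 if this file was compiled with OpenMP enabled, else 0. */
int openmp_enabled(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}

/* Sums n values. With OpenMP the iterations are shared among threads and
   the partial sums are combined by the reduction clause. */
double parallel_sum(const double *x, int n)
{
    double total = 0.0;
    int i;
#pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; ++i) {
        total += x[i];
    }
    return total;
}
```

Building the same file once with and once without the flag, and comparing run times for a large array, is a quick way to check that the flag is actually taking effect.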

Vector Extensions

To build an executable for different node types (e.g. Sandy Bridge and Westmere), the option -msse4.2 -axavx (for Intel compilers) uses SSE4.2 as the default path and takes a different execution path if AVX is available. This increases the size of the program code (which might result in a poorer L1 instruction cache hit rate) but makes it possible to run the same program on different hardware types.

To optimize for the host architecture, the following flags can be used:
  GCC            Intel
  -march=native  -xHost

The following matrix shows some proper optimization flags for the different hardware in Taurus, as of 2020-04-08:
  Arch                GCC                 Intel Compiler
  Intel Sandy Bridge  -march=sandybridge  -xAVX
  Intel Haswell       -march=haswell      -xCORE-AVX2
  AMD Rome            -march=znver2       -march=core-avx2
  Intel Cascade Lake  -march=cascadelake  -xCOMMON-AVX512
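
One way to check which instruction set an architecture flag actually enabled is to look at the compiler's predefined macros. The sketch below assumes the macro names `__SSE4_2__`, `__AVX__`, `__AVX2__` and `__AVX512F__` as defined by GCC, Clang and the Intel compilers; the function name is made up for this example.

```c
/* Returns the name of the widest x86 vector extension the compiler was
   allowed to target when this file was compiled. This reflects the
   compile-time flags, not the CPU the program happens to run on. */
const char *compiled_vector_isa(void)
{
#if defined(__AVX512F__)
    return "AVX-512";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#else
    return "baseline";
#endif
}
```

Compiled with -march=haswell, for instance, GCC would be expected to report AVX2, while a build without any architecture flag typically reports the baseline.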

Compiler Optimization Hints

To achieve the best performance the compiler needs to exploit the parallelism in the code. Therefore it is sometimes necessary to provide the compiler with some hints. Some possible directives are (Fortran style):
  CDEC$ ivdep             ignore assumed vector dependences
  CDEC$ swp               try to software-pipeline
  CDEC$ noswp             disable software-pipelining
  CDEC$ loop count (n)    hint for optimization
  CDEC$ distribute point  split this large loop
  CDEC$ unroll (n)        unroll (n) times
  CDEC$ nounroll          do not unroll
  CDEC$ prefetch a        prefetch array a
  CDEC$ noprefetch a      do not prefetch array a
The compiler directives are the same for ifort and icc. The syntax for C/C++ is like #pragma ivdep, #pragma swp, and so on. For details, see FurtherDocumentation.
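
A minimal C sketch of the ivdep hint follows. The loop reads a[i + k] and writes a[i]; the compiler cannot prove that k is non-negative, so it has to assume a dependence and may refuse to vectorize. The macro IVDEP and the function name are hypothetical. The GCC and Clang spellings of the pragma are not taken from this page and are included only as an assumption so that the example is not tied to the Intel compiler.

```c
#include <assert.h>

/* Select an "ignore assumed dependences" pragma for the compiler in use.
   Assumption: these spellings cover Intel (classic), Clang and GCC. */
#if defined(__INTEL_COMPILER)
#  define IVDEP _Pragma("ivdep")
#elif defined(__clang__)
#  define IVDEP _Pragma("clang loop vectorize(assume_safety)")
#elif defined(__GNUC__)
#  define IVDEP _Pragma("GCC ivdep")
#else
#  define IVDEP
#endif

/* Computes a[i] = a[i + k] + b[i] for i in [0, n).
   Vectorizing is only safe for k >= 0 and when b does not overlap a;
   the caller has to guarantee both, since the hint disables the check. */
void shift_add(float *a, const float *b, int n, int k)
{
    int i;
    assert(k >= 0);  /* the hint would be wrong for negative k */
    IVDEP
    for (i = 0; i < n; ++i) {
        a[i] = a[i + k] + b[i];
    }
}
```

The hint is a promise from the programmer, not a check: if the promise is false, the vectorized loop can silently compute wrong results.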


(1) ifort only

(2) flushes denormalized numbers to zero. On Itanium 2, an underflow raises an underflow exception that needs to be handled in software, which takes about 1000 cycles!
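
To make the term "denormalized number" concrete, here is a small C sketch; both function names are invented for this example. It assumes default IEEE gradual underflow, that is, a build without any flush-to-zero option.

```c
#include <float.h>

/* Returns 1 if x is nonzero but smaller in magnitude than the smallest
   normal double, i.e. a denormalized (subnormal) number. */
int is_denormal(double x)
{
    double mag = x < 0.0 ? -x : x;
    return mag > 0.0 && mag < DBL_MIN;
}

/* Halves the smallest normal double. Under IEEE gradual underflow the
   result is a denormal; with flush-to-zero active it would be 0.0. */
double half_of_smallest_normal(void)
{
    volatile double tiny = DBL_MIN;  /* volatile forces a run-time division */
    return tiny / 2.0;
}
```

If is_denormal(half_of_smallest_normal()) returns 0 in a given build, denormals are most likely being flushed to zero there.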