
Intel Xeon Phi (Knights Landing)

This page is under construction!

The nodes taurusknl[1-32] are equipped with
  • Intel Xeon Phi processor: Intel Xeon Phi 7210, 64 cores (1.3 GHz)
  • 96 GB RAM DDR4
  • 16 GB MCDRAM
  • /scratch, /lustre/ssd, /projects, /home are mounted
Benchmarks, so far (single node):
  • HPL (Linpack): 1863.74 GFlops
  • SGEMM (single precision) MKL: 4314 GFlops
  • Stream (only 1.4 GiB memory used): 431 GB/s

Each core can run 4 hardware threads. A job can be started on these nodes with, e.g.:
srun -p knl -N 1 --mem=90000 -n 1 -c 64 a.out
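The thread arithmetic behind this can be sketched as follows. This is a minimal sketch, assuming an OpenMP code and a hypothetical choice of 2 threads per core. Whether the -c value counts cores or hardware threads on this partition is not stated here, so check the local SLURM configuration before raising it above 64.

```shell
# Hardware figures taken from the node description above.
cores=64
threads_per_core=4

# Upper bound on hardware threads of one node.
max_hw_threads=$((cores * threads_per_core))
echo "hardware threads per node: ${max_hw_threads}"

# Hypothetical choice for illustration: 2 OpenMP threads per core.
omp_per_core=2
OMP_NUM_THREADS=$((cores * omp_per_core))
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"
```

Many codes do not benefit from all 4 threads per core, so it is worth measuring 1, 2, and 4 threads per core separately.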

To get optimal performance, please recompile your code with the most recent Intel compiler and explicitly set the compiler flag -xMIC-AVX512.
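A compile line with this flag might look like the sketch below. The file names mycode.c and a.out are hypothetical, icc (the Intel C compiler driver) and the -O2 level are assumptions, and the module that provides the compiler is not named on this page. The script prints the command and runs it only if icc and the source file are actually present.

```shell
# Hypothetical file names, for illustration only.
src="mycode.c"
exe="a.out"

# -xMIC-AVX512 is the flag named above; icc and -O2 are assumptions.
compile_cmd="icc -O2 -xMIC-AVX512 -o ${exe} ${src}"
echo "${compile_cmd}"

# Compile only when an Intel compiler is in PATH and the source file exists.
if command -v icc >/dev/null 2>&1 && [ -f "${src}" ]; then
    ${compile_cmd}
fi
```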

MPI works now; we recommend using the latest Intel MPI version (intelmpi/2017.1.132). To use the Omni-Path fabric properly, make sure to use the "ofi" fabric provider, which is the new default set by the module file.

Most nodes have a fixed configuration for cluster mode (Quadrant) and memory mode (Cache). For testing purposes, we have configured a few nodes with different modes (other configurations are possible upon request):

Nodes             Cluster Mode   Memory Mode
taurusknl[1-28]   Quadrant       Cache
taurusknl29       Quadrant       Flat
taurusknl[30-32]  SNC4           Flat

These nodes have SLURM features set, so you can request them specifically with the SLURM parameter --constraint, where multiple values can be combined with the & operator, e.g. --constraint="SNC4&Flat". If you don't set a constraint, your job will preferably run on the nodes with Quadrant+Cache.
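Combining a constraint with the job command shown earlier could look like this sketch. It only composes and prints the srun line so it can be checked before submitting; the resource values are copied from the example above, and the variable names are illustrative.

```shell
# Pick one cluster mode and one memory mode from the table above.
cluster_mode="SNC4"
memory_mode="Flat"

# Features are joined with the & operator; quote the value so the
# shell does not interpret & as "run in background".
constraint="${cluster_mode}&${memory_mode}"

echo "srun -p knl -N 1 --mem=90000 -n 1 -c 64 --constraint=\"${constraint}\" a.out"
```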

Note that performance may suffer if your code is not NUMA-aware and does not make use of the Flat memory mode but runs on a node that has those modes set. In that case, you may want to use --constraint="Quadrant&Cache" to ensure your job does not run on an unfavorable node (which might otherwise happen if all other nodes are already allocated).
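One common way to make use of the Flat memory mode without changing the code is to steer allocations with numactl. The sketch below rests on several assumptions: numactl is installed on the nodes, the binary is ./a.out, and the MCDRAM appears as NUMA node 1, which is the usual layout in Quadrant+Flat mode but should be verified with numactl --hardware on the node itself. In SNC4+Flat mode the numbering is different.

```shell
# Assumed NUMA node of the MCDRAM in Quadrant+Flat mode; verify with
# `numactl --hardware` on the target node before relying on it.
mcdram_node=1

# --preferred falls back to DDR4 once the 16 GB MCDRAM is full;
# --membind would make allocations beyond that fail instead.
run_cmd="numactl --preferred=${mcdram_node} ./a.out"

# Print the full job line for inspection rather than submitting it.
echo "srun -p knl -N 1 --mem=90000 -n 1 -c 64 --constraint=\"Quadrant&Flat\" ${run_cmd}"
```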

KNL Best Practice Guide from PRACE