Mathematics Libraries¶

Much software relies heavily on math libraries, e.g., for linear algebra or FFT calculations. Writing portable and fast math functions is a really challenging task. You can try it for fun, but you really should avoid writing your own matrix-matrix multiplication. Thankfully, several high-quality math libraries are available on ZIH systems.

In the following, a few often-used interfaces/specifications and libraries are described. All libraries are available as modules.

BLAS, LAPACK and ScaLAPACK¶

Over the last decades, three de facto standard specifications for basic linear algebra routines have emerged: BLAS, LAPACK and ScaLAPACK.

The BLAS (Basic Linear Algebra Subprograms) specification contains routines for common linear algebra operations such as vector addition, matrix-vector multiplication, and dot product. BLAS routines can be understood as basic building blocks for advanced numerical algorithms.
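
To make the "building block" idea concrete, here is a minimal pure-Python sketch of what three such operations compute. It is an illustration of the semantics only, not an implementation you should use: the function names `axpy`, `dot` and `gemv` are borrowed from the BLAS naming scheme, but the signatures here are simplified assumptions (real BLAS routines such as `daxpy` or `dgemv` take additional arguments like strides and leading dimensions).

```python
# Reference semantics for three BLAS-style operations on plain Python lists.
# Illustration only: an optimized BLAS library is far faster than this.

def axpy(alpha, x, y):
    """Return alpha * x + y for two vectors of equal length (Level 1)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def dot(x, y):
    """Return the dot product of two vectors of equal length (Level 1)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(xi * yi for xi, yi in zip(x, y))


def gemv(alpha, a, x, beta, y):
    """Return alpha * A x + beta * y for a matrix A given as a list of rows (Level 2)."""
    if len(a) != len(y) or any(len(row) != len(x) for row in a):
        raise ValueError("dimensions of A, x and y do not match")
    return [alpha * dot(row, x) + beta * yi for row, yi in zip(a, y)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    y = [4.0, 5.0, 6.0]
    a = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [3.0, 0.0, 1.0]]
    print(axpy(2.0, x, y))          # scaled vector addition
    print(dot(x, y))                # dot product
    print(gemv(1.0, a, x, 0.0, y))  # matrix-vector product
```

Note how `gemv` is itself expressed through `dot`: higher-level routines are composed from lower-level ones, which is why a well-tuned BLAS pays off throughout a numerical code.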

The Linear Algebra PACKage (LAPACK) provides more sophisticated numerical algorithms, such as solving linear systems of equations, matrix factorization, and eigenvalue problems.
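
As a rough idea of what "solving a linear system" involves, the following sketch solves A x = b by Gaussian elimination with partial pivoting in pure Python. The function name `solve` and its interface are hypothetical and chosen for this example only; LAPACK's own drivers are blocked, built on BLAS calls and considerably more careful numerically, so treat this as a teaching aid rather than a substitute.

```python
# Dense linear solve by Gaussian elimination with partial pivoting.
# Illustration only: for real work, call a LAPACK implementation instead.

def solve(a, b):
    """Return x with A x = b for a square matrix A given as a list of rows."""
    n = len(a)
    if any(len(row) != n for row in a) or len(b) != n:
        raise ValueError("A must be square and match the length of b")

    # Work on an augmented copy [A | b] so the inputs stay untouched.
    m = [[float(v) for v in row] + [float(bi)] for row, bi in zip(a, b)]

    for k in range(n):
        # Partial pivoting: bring the largest entry of column k (rows k..n-1) to row k.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]

        # Eliminate the entries below the pivot.
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]

    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


if __name__ == "__main__":
    # 2x + y = 3 and x + 3y = 5
    print(solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))
```

The triple loop in the elimination step is exactly the kind of kernel that optimized libraries restructure into matrix-matrix operations to make good use of caches and vector units.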

The Scalable Linear Algebra PACKage (ScaLAPACK) takes the idea of high-performance linear algebra routines to parallel distributed memory machines. It offers functionality to solve dense and banded linear systems, least squares problems, eigenvalue problems, and singular value problems.

Many concrete implementations, often tuned and optimized for specific hardware architectures, have been developed over the last decades. The two hardware vendors Intel and AMD each offer their own math library: Intel MKL and AOCL, respectively. Both libraries are worth considering from a user's point of view, since they provide extensive math functionality, ranging from BLAS and LAPACK to random number generators and fast Fourier transforms, with consistent interfaces and the "promise" of being highly tuned, optimized and continuously developed further.

BLAS reference implementation in Fortran
LAPACK reference implementation
ScaLAPACK reference implementation
OpenBLAS
For GPU implementations, refer to the GPU section

AMD Optimizing CPU Libraries (AOCL)¶

AMD Optimizing CPU Libraries (AOCL) is a set of numerical libraries tuned specifically for the AMD EPYC processor family. AOCL offers linear algebra libraries (BLIS, libFLAME, ScaLAPACK, AOCL-Sparse), FFTW routines, the AMD Math Library (LibM), as well as the AMD Random Number Generator Library and the AMD Secure RNG Library.

Math Kernel Library (MKL)¶

The Intel Math Kernel Library (Intel MKL) provides extensively threaded math routines that are highly optimized for Intel CPUs. It contains routines for linear algebra, direct and iterative sparse solvers, random number generators and fast Fourier transforms (FFT).

Note

MKL comes in an OpenMP-parallel version. If you want to use it, make sure you know how to place your jobs. ¹

The available MKL modules can be queried as follows:

marie@login$ module avail imkl

Linking¶

For linker flag combinations, we highly recommend the MKL Link Line Advisor (please make sure that JavaScript is enabled for this page).

Libraries for GPUs¶

GPU implementations of math functions and routines are often much faster than CPU implementations. This also holds for basic routines from BLAS and LAPACK. You should consider using GPU implementations in order to obtain better performance.

There are several math libraries for Nvidia GPUs, e.g.

cuBLAS
cuSOLVER (reduced set of LAPACK routines)
cuSPARSE (sparse matrix library)
cuFFT

Nvidia provides a comprehensive overview and starting point.

MAGMA¶

The project Matrix Algebra on GPU and Multicore Architectures (MAGMA) aims to develop a dense linear algebra library similar to LAPACK but for heterogeneous/hybrid architectures, starting with current "Multicore+GPU" systems. MAGMA is available on ZIH systems in different versions. You can list the available modules using:

marie@login$ module spider magma
[...]
        magma/2.5.4-fosscuda-2019b
        magma/2.5.4

FFTW¶

FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, i.e., the discrete cosine/sine transforms or DCT/DST). Before using this library, please check out the functions of vendor-specific libraries such as AOCL or Intel MKL.
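
For orientation, the one-dimensional DFT itself can be written down in a few lines. The sketch below evaluates the definition directly in pure Python with O(n²) operations; the function names `dft` and `idft` are hypothetical, and dividing by n in the inverse is a choice made for this example (FFTW's own transforms are unnormalized, so check the library documentation for the exact conventions). An FFT library computes the same result in O(n log n).

```python
# Direct O(n^2) evaluation of the discrete Fourier transform.
# Illustration only: use FFTW or a vendor FFT for anything performance-relevant.
import cmath


def dft(x):
    """Forward DFT: X[k] = sum_j x[j] * exp(-2*pi*i*j*k/n)."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def idft(y):
    """Inverse DFT with 1/n normalization, so that idft(dft(x)) recovers x."""
    n = len(y)
    return [
        sum(y[k] * cmath.exp(2j * cmath.pi * j * k / n) for k in range(n)) / n
        for j in range(n)
    ]


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0]
    spectrum = dft(signal)
    print(spectrum)
    print(idft(spectrum))  # matches the input up to rounding errors
```

Comparing the output of such a reference against a library call on a small input is a cheap way to confirm that you have the sign and normalization conventions right before scaling up.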

¹ In [c't 18, 2010], Andreas Stiller proposes using GOMP_CPU_AFFINITY to control the mapping of threads to AMD cores. KMP_AFFINITY works only for Intel processors.