Many software applications rely heavily on math libraries, e.g., for linear algebra or FFT calculations. Writing portable and fast math functions is a challenging task. You can try it for fun, but you really should avoid writing your own matrix-matrix multiplication. Thankfully, there are several high-quality math libraries available on ZIH systems.
In the following, a few often-used interfaces/specifications and libraries are described. All libraries are available as modules.
BLAS, LAPACK and ScaLAPACK
Over the last decades, three de-facto standard specifications for basic linear algebra routines have emerged: BLAS, LAPACK, and ScaLAPACK.
The BLAS (Basic Linear Algebra Subprograms) specification contains routines for common linear algebra operations such as vector addition, matrix-vector multiplication, and dot product. BLAS routines can be understood as basic building blocks for advanced numerical algorithms.
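To make these building blocks concrete, the following plain-Python sketch spells out what three representative BLAS operations compute. It covers the level-1 routines AXPY (scaled vector addition, y ← αx + y) and DOT (dot product), and the level-2 routine GEMV (matrix-vector multiplication, y ← αAx + βy). The function names and signatures here are illustrative assumptions, not the BLAS interface. The Fortran routines (e.g. DAXPY, DDOT, DGEMV) additionally take vector strides and leading dimensions, assume column-major storage, and AXPY and GEMV overwrite y in place.

```python
# Naive reference sketches of three BLAS operations (illustration only).
# NOT the BLAS API: the real routines take strides, dimensions and leading
# dimensions, work in place, and are far faster in tuned libraries.


def axpy(alpha, x, y):
    """Level 1: return alpha * x + y (BLAS xAXPY overwrites y instead)."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def dot(x, y):
    """Level 1: return the dot product x^T y (BLAS xDOT)."""
    return sum(xi * yi for xi, yi in zip(x, y))


def gemv(alpha, a, x, beta, y):
    """Level 2: return alpha * A x + beta * y, with A given as a list of rows (BLAS xGEMV)."""
    return [alpha * dot(row, x) + beta * yi for row, yi in zip(a, y)]


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    print(axpy(2.0, [1.0, 1.0], [0.5, 0.5]))
    print(dot([1.0, 2.0], [3.0, 4.0]))
    print(gemv(1.0, a, [1.0, 1.0], 0.0, [0.0, 0.0]))
```

In real code, call a tuned implementation such as Intel MKL or AOCL instead; naive loops like these are exactly what the introduction advises against writing yourself.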
The Linear Algebra PACKage (LAPACK) provides more sophisticated numerical algorithms, such as solving linear systems of equations, matrix factorization, and eigenvalue problems.
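As an illustration of the kind of algorithm LAPACK provides, the following plain-Python sketch solves a linear system A x = b the way LAPACK's driver routine xGESV (e.g. DGESV) does in principle. It uses LU factorization with partial pivoting (A = P L U), applies the same row operations to b, and finishes with back substitution. This is a teaching sketch under simplifying assumptions: dense list-of-rows input, a single right-hand side, and an exact-zero pivot check that mirrors DGESV's INFO > 0 case. It is not LAPACK's blocked implementation, and the function name `lu_solve` is hypothetical.

```python
# Teaching sketch of Gaussian elimination with partial pivoting, the method
# behind LAPACK's xGESV. Use LAPACK (or MKL/AOCL) for real work.


def lu_solve(a, b):
    """Solve A x = b for a square matrix A (list of rows) and right-hand side b."""
    n = len(a)
    # Work on float copies so the caller's data stays untouched
    # (xGESV, in contrast, overwrites A with its LU factors and B with X).
    a = [list(map(float, row)) for row in a]
    b = [float(v) for v in b]

    for k in range(n):
        # Partial pivoting: pick the row with the largest |a[i][k]| for i >= k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            a[k], a[p] = a[p], a[k]
            b[k], b[p] = b[p], b[k]
        # Eliminate below the pivot; the multipliers (the L factor) are
        # stored in place, and the same row operations are applied to b.
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
            b[i] -= a[i][k] * b[k]

    # Back substitution with the upper triangular factor U.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x
```

For example, `lu_solve([[0, 1], [1, 0]], [2, 3])` needs a row interchange in the first step and returns `[3.0, 2.0]`.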
The Scalable Linear Algebra PACKage (ScaLAPACK) takes the idea of high-performance linear algebra routines to parallel distributed memory machines. It offers functionality to solve dense and banded linear systems, least squares problems, eigenvalue problems, and singular value problems.
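ScaLAPACK distributes matrices over a two-dimensional process grid in a block-cyclic layout. Each matrix dimension is cut into blocks of size nb, and these blocks are dealt out round-robin to the processes along that grid dimension. The following plain-Python sketch shows this mapping for one dimension. It assumes 0-based indices and a distribution that starts at process 0. ScaLAPACK's own TOOLS routines for this (INDXG2P, INDXG2L, and INDXL2G) use 1-based indices and a configurable source process. The function names below are hypothetical.

```python
# Sketch of the 1D block-cyclic mapping applied per matrix dimension.
# Assumptions: 0-based indices, distribution starts at process 0.


def global_to_local(ig, nb, nprocs):
    """Map global index ig to (owning process, local index) for block size nb."""
    block = ig // nb              # which global block ig falls into
    owner = block % nprocs        # blocks are dealt out round-robin
    local = (block // nprocs) * nb + ig % nb
    return owner, local


def local_to_global(il, proc, nb, nprocs):
    """Inverse mapping: local index il on process proc back to the global index."""
    return (il // nb) * nb * nprocs + proc * nb + il % nb


if __name__ == "__main__":
    # 12 global rows, block size 2, 3 process rows:
    # blocks 0..5 go to processes 0, 1, 2, 0, 1, 2.
    for ig in range(12):
        print(ig, global_to_local(ig, nb=2, nprocs=3))
```

The round-robin assignment is what keeps the load balanced in factorizations: as the active part of the matrix shrinks, it remains spread over all processes.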
Many concrete implementations, often tuned and optimized for specific hardware architectures, have been developed over the last decades. The two hardware vendors Intel and AMD each offer their own math library: Intel MKL and AOCL, respectively. Both libraries are worth considering from a user's point of view. They provide extensive math functionality, ranging from BLAS and LAPACK to random number generators and Fast Fourier Transforms, with consistent interfaces, and they promise to be highly tuned, optimized, and continuously developed further.
- BLAS reference implementation in Fortran
- LAPACK reference implementation
- ScaLAPACK reference implementation
- For GPU implementations, refer to the GPU section
AMD Optimizing CPU Libraries (AOCL)
AMD Optimizing CPU Libraries (AOCL) is a set of numerical libraries tuned specifically for the AMD EPYC processor family. AOCL offers linear algebra libraries (BLIS, libFLAME, ScaLAPACK, AOCL-Sparse), FFTW routines, the AMD Math Library (LibM), as well as the AMD Random Number Generator Library and the AMD Secure RNG Library.
Math Kernel Library (MKL)
The Intel Math Kernel Library (Intel MKL) provides extensively threaded math routines that are highly optimized for Intel CPUs. It contains routines for linear algebra, direct and iterative sparse solvers, random number generators, and Fast Fourier Transforms (FFT).
MKL comes in an OpenMP-parallel version. If you want to use it, make sure you know how to place your jobs (see footnote 1).
The available MKL modules can be queried as follows:

```console
marie@login$ module avail imkl
```
Libraries for GPUs
GPU implementations of math functions and routines are often much faster than CPU implementations. This also holds for basic routines from BLAS and LAPACK. Consider using GPU implementations to obtain better performance.
There are several math libraries for Nvidia GPUs. Nvidia provides a comprehensive overview and a good starting point.
The project Matrix Algebra on GPU and Multicore Architectures (MAGMA) aims to develop a dense linear algebra library similar to LAPACK, but for heterogeneous/hybrid architectures, starting with current "Multicore+GPU" systems.

MAGMA is available on ZIH systems in different versions. You can list the available modules using:

```console
marie@login$ module spider magma
```
FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, i.e., the discrete cosine/sine transforms or DCT/DST). Before using this library, please check out the corresponding functions of vendor-specific libraries such as AOCL or Intel MKL.
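FFTW computes unnormalized transforms. The forward transform (sign -1) is Y[k] = sum over j = 0..n-1 of X[j] * exp(-2*pi*i*j*k/n). The backward transform uses sign +1, so applying forward and then backward returns n times the input. The plain-Python sketch below spells out this definition as a naive O(n^2) DFT, which can be handy for cross-checking results on tiny inputs. It restates the definition only; it is not FFTW's algorithm, and the function name `dft` is hypothetical.

```python
import cmath


def dft(x, sign=-1):
    """Naive O(n^2) DFT: Y[k] = sum_t x[t] * exp(sign * 2*pi*i * t*k / n).

    sign=-1 matches FFTW's forward transform and sign=+1 its backward
    transform; as in FFTW, no normalization factor is applied.
    """
    n = len(x)
    return [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * t * k / n) for t in range(n))
        for k in range(n)
    ]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    y = dft(x)             # forward transform
    z = dft(y, sign=+1)    # backward transform: z equals n * x up to rounding
    print([round(v.real, 12) for v in z])
```

When comparing against other FFT libraries, check their sign and scaling conventions before comparing numbers.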
1: In [c't 18, 2010], Andreas Stiller proposes using GOMP_CPU_AFFINITY to allow the mapping of threads to AMD cores. KMP_AFFINITY works only for Intel processors.