Known Issues when Using MPI

This page lists known issues observed with MPI in general and with specific MPI implementations.

Mpirun on Partitions alpha and ml

Using mpirun on partitions alpha and ml leads to a wrong resource distribution when more than one node is involved. For example, a job may end up with a distribution such as SLURM_NTASKS_PER_NODE=15,1 even though --tasks-per-node=8 was specified. Unless you really know what you are doing (e.g., you pin ranks via a Perl script), avoid using mpirun.

Another issue arises when using the Intel toolchain: mpirun calls a different MPI, which caused an 8-9x slowdown of the PALM application compared to launching it with srun or to using the GCC-compiled version of the application (which uses the correct MPI).
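A skewed distribution like the one described above can be detected at the start of a job script. The following is a minimal sketch, not part of the original setup: the function names expand_tasks_per_node and check_distribution are hypothetical, and it assumes the per-node task counts are available in Slurm's compact list format (e.g. 8(x2),1), which Slurm documents for the variable SLURM_TASKS_PER_NODE. Which variable carries the list in your job is an assumption to verify.

```shell
#!/bin/sh
# Hypothetical helper: expand Slurm's compact per-node task list,
# e.g. "8(x2),1" becomes "8 8 1".
expand_tasks_per_node() {
    spec=$1
    out=""
    old_ifs=$IFS
    IFS=','
    for item in $spec; do
        case $item in
            *"(x"*")")
                # Entry of the form COUNT(xREPS)
                count=${item%%"("*}
                reps=${item#*"(x"}
                reps=${reps%")"}
                ;;
            *)
                # Plain entry: one node with COUNT tasks
                count=$item
                reps=1
                ;;
        esac
        while [ "$reps" -gt 0 ]; do
            out="$out$count "
            reps=$((reps - 1))
        done
    done
    IFS=$old_ifs
    echo "${out% }"
}

# Hypothetical helper: succeed only if every node got the expected task count.
check_distribution() {
    expected=$1
    for n in $(expand_tasks_per_node "$2"); do
        if [ "$n" != "$expected" ]; then
            return 1
        fi
    done
    return 0
}

# Only meaningful inside a Slurm job; assumes 8 tasks per node were requested.
if [ -n "${SLURM_TASKS_PER_NODE:-}" ]; then
    if ! check_distribution 8 "$SLURM_TASKS_PER_NODE"; then
        echo "Unexpected task distribution: $SLURM_TASKS_PER_NODE" >&2
    fi
fi
```

The check only reports the mismatch; whether to abort the job in that case is left to the job script.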

R Parallel Library on Multiple Nodes

Using the R parallel library on MPI clusters has shown problems when using more than a few compute nodes. The error messages indicate buggy interactions between R/Rmpi/OpenMPI and UCX. Disabling UCX has solved these problems in our experiments.

We invoked the R script successfully with the following command:

mpirun -mca btl_openib_allow_ib true --mca pml ^ucx --mca osc ^ucx -np 1 Rscript --vanilla the-script.R

where the arguments -mca btl_openib_allow_ib true --mca pml ^ucx --mca osc ^ucx disable usage of UCX.
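Open MPI also reads MCA parameters from environment variables named OMPI_MCA_<parameter>, so the same settings can be exported once instead of being repeated on every command line. This is a sketch of that alternative, assuming an Open MPI based installation; it was not part of the experiments described above.

```shell
#!/bin/sh
# Same MCA settings as on the mpirun command line, expressed as
# environment variables (Open MPI convention: OMPI_MCA_<parameter>).
export OMPI_MCA_btl_openib_allow_ib=true
export OMPI_MCA_pml='^ucx'   # exclude the UCX point-to-point layer
export OMPI_MCA_osc='^ucx'   # exclude the UCX one-sided component

# With these exported, the launch line would reduce to (not executed here):
#   mpirun -np 1 Rscript --vanilla the-script.R
```

Quoting ^ucx avoids surprises in shells that treat the caret specially.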

MPI Function MPI_Win_allocate

The function MPI_Win_allocate is a one-sided MPI call that allocates memory and returns a window object for RDMA operations (ref. man page).

Using MPI_Win_allocate rather than separate MPI_Alloc_mem + MPI_Win_create calls may allow the MPI implementation to optimize the memory allocation (see Using Advanced MPI).

It was observed, at least for the OpenMPI/4.0.5 module, that using MPI_Win_allocate instead of MPI_Alloc_mem in conjunction with MPI_Win_create leads to segmentation faults in the calling application. To be precise, the segfaults occurred on partition romeo when about 200 GB per node were allocated. In contrast, the segmentation faults vanished when the implementation was refactored to call the MPI_Alloc_mem + MPI_Win_create functions.