Known Issues when Using MPI¶
This page lists known issues observed with MPI and specific MPI implementations.
Mpirun on Partition ml¶
mpirun on partition ml leads to wrong resource distribution when more than one node is involved. This yields a strange task distribution that does not match the requested layout, even though e.g. --tasks-per-node=8 was specified. Unless you really know what you are doing (e.g. use rank pinning via Perl script), avoid using mpirun.
Another issue arises when using the Intel toolchain: mpirun calls a different MPI implementation, which caused an 8-9x slowdown of the PALM app compared to using srun or the GCC-compiled version of the app (which uses the correct MPI).
R Parallel Library on Multiple Nodes¶
Using the R parallel library on MPI clusters has shown problems when using more than a few compute nodes. The error messages indicate that there are buggy interactions of R/Rmpi/OpenMPI and UCX. Disabling UCX has solved these problems in our experiments.
We invoked the R script successfully with the following command:
mpirun -mca btl_openib_allow_ib true --mca pml ^ucx --mca osc ^ucx -np 1 Rscript --vanilla the-script.R
where the arguments
-mca btl_openib_allow_ib true --mca pml ^ucx --mca osc ^ucx disable the usage of UCX.
MPI_Win_allocate¶
MPI_Win_allocate is a one-sided MPI call that allocates memory and returns a window
object for RDMA operations (cf. the man page).
Using MPI_Win_allocate rather than separate MPI_Alloc_mem + MPI_Win_create calls may allow the MPI implementation to optimize the memory allocation (see Using Advanced MPI).
However, it was observed for at least the OpenMPI/4.0.5 module that using MPI_Win_allocate instead of MPI_Alloc_mem in conjunction with MPI_Win_create leads to segmentation faults in the calling application. To be precise, the segfaults occurred on partition romeo when about 200 GB per node were allocated. The segmentation faults vanished when the implementation was refactored to call MPI_Alloc_mem + MPI_Win_create instead.
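The two allocation patterns can be sketched as follows. This is a minimal illustration, not the affected application's code: the buffer size and displacement unit are arbitrary, and the program assumes an MPI installation (e.g. the OpenMPI module) and compilation with mpicc.

```c
/* Sketch of the two window-allocation patterns discussed above.
 * Build: mpicc -o win_demo win_demo.c
 * Run:   srun -n 2 ./win_demo   (illustrative; sizes are small here) */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    const MPI_Aint size = 1024 * sizeof(double); /* illustrative size */
    double *buf;
    MPI_Win win;

    /* Pattern 1: combined allocation via MPI_Win_allocate.
     * This is the variant that triggered segfaults with OpenMPI/4.0.5
     * at large allocations (~200 GB per node). */
    MPI_Win_allocate(size, sizeof(double), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &buf, &win);
    MPI_Win_free(&win);

    /* Pattern 2: separate allocation plus window creation.
     * This is the refactoring that avoided the segfaults. */
    MPI_Alloc_mem(size, MPI_INFO_NULL, &buf);
    MPI_Win_create(buf, size, sizeof(double), MPI_INFO_NULL,
                   MPI_COMM_WORLD, &win);
    MPI_Win_free(&win);
    MPI_Free_mem(buf);

    MPI_Finalize();
    return 0;
}
```

Note that with MPI_Win_create the application owns the buffer and must release it with MPI_Free_mem after freeing the window, whereas MPI_Win_allocate ties the buffer's lifetime to the window.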