I'm having trouble using MPI_Comm_split_type for in-node splits with the custom OMPI_COMM_TYPE_* split types. Everything works fine when I run with mpirun, but the same code doesn't work with srun. Is this supposed to work, or are there some limitations I'm not aware of? I'm trying with OpenMPI 4.0.3 and 4.0.5 on CentOS 7.7 with Slurm 19.05, stock hwloc 1.11 (I also tried compiling OpenMPI with hwloc 2.4.0), and pmix 3.1.5.
This is a simple test app:
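#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm split_comm;
    int split_rank = -1, split_size = -1;

    MPI_Init(&argc, &argv);

    /* Split MPI_COMM_WORLD into one communicator per NUMA domain
     * (OMPI_COMM_TYPE_NUMA is an Open MPI-specific split type). */
    MPI_Comm_split_type(MPI_COMM_WORLD, OMPI_COMM_TYPE_NUMA, 0, MPI_INFO_NULL, &split_comm);
    MPI_Comm_rank(split_comm, &split_rank);
    MPI_Comm_size(split_comm, &split_size);

    /* Note: the rank printed here is the rank within the split communicator. */
    fprintf(stderr, "rank %d >>> split size %d\n", split_rank, split_size);

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Comm_free(&split_comm);
    MPI_Finalize();
    return 0;
}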
On our EPYC 7742 system I get this with mpirun:
mpirun -np 128 ./splittest
rank 0 >>> split size 16
and this with srun:
srun -n 128 ./splittest
rank 0 >>> split size 1
Essentially, I get a split size of 1 whatever split type I use.