[Q] srun + OpenMPI MPI_Comm_split_type #8299

@angainor

Description

I'm having trouble using MPI_Comm_split_type for intra-node splits with the Open MPI specific OMPI_COMM_TYPE_* values. Everything works as expected when I launch with mpirun, but the same code doesn't work under srun. Is this supposed to work, or are there limitations I'm not aware of? I'm testing with Open MPI 4.0.3 and 4.0.5 on CentOS 7.7 with Slurm 19.05, the stock hwloc 1.11 (I also tried building Open MPI against hwloc 2.4.0), and PMIx 3.1.5.
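One thing that may matter here (an assumption on my part, not something I've confirmed for this cluster) is which PMI plugin srun is actually using, since Open MPI relies on PMIx under Slurm for process wire-up and topology information. The available plugins and an explicit PMIx launch can be checked like this:

```shell
# List the PMI plugins this Slurm build offers (e.g. pmi2, pmix, pmix_v3)
srun --mpi=list

# Launch explicitly with the PMIx plugin (assumes Slurm was built with PMIx support)
srun --mpi=pmix -n 128 ./splittest
```

If Slurm defaults to a non-PMIx plugin, behavior under srun can differ from an mpirun launch even when the application code is identical.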

This is a simple test app:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  MPI_Comm split_comm;
  int split_rank = -1, split_size = -1;

  MPI_Init(&argc, &argv);

  /* Split COMM_WORLD by NUMA domain (Open MPI specific split type). */
  MPI_Comm_split_type(MPI_COMM_WORLD, OMPI_COMM_TYPE_NUMA, 0, MPI_INFO_NULL, &split_comm);
  MPI_Comm_rank(split_comm, &split_rank);
  MPI_Comm_size(split_comm, &split_size);

  fprintf(stderr, "rank %d >>> split size %d\n", split_rank, split_size);

  MPI_Barrier(MPI_COMM_WORLD);
  MPI_Comm_free(&split_comm);
  MPI_Finalize();
  return 0;
}

On our EPYC 7742 system I get this with mpirun:

mpirun -np 128 ./splittest
rank 0 >>> split size 16

and this with srun:

srun -n 128 ./splittest
rank 0 >>> split size 1

Essentially, with srun I get a split size of 1 no matter which split type I use.
