13 MPI topic: Hybrid computing

While the MPI standard itself makes no mention of threads (the process being the primary unit of computation), the use of threads is allowed. Below we will discuss what provisions exist for doing so.

Using threads and other shared memory models in combination with MPI leads of course to the question of how race conditions are handled. Here is an example of code with a data race that pertains to MPI:

// assumed to be inside an omp parallel region
#pragma omp sections
{
#pragma omp section
  MPI_Send( x, /* to process 2 */ );
#pragma omp section
  MPI_Recv( x, /* from process 3 */ );
}

The MPI standard here puts the burden on the user: this code is not legal, and its behavior is not defined.

13.1 MPI support for threading


In hybrid execution, the main question is whether all threads are allowed to make MPI calls. To determine this, replace the MPI_Init call by

C:
int MPI_Init_thread(int *argc, char ***argv, int required, int *provided)

Fortran:
MPI_Init_thread(required, provided, ierror)
INTEGER, INTENT(IN) :: required
INTEGER, INTENT(OUT) :: provided
INTEGER, OPTIONAL, INTENT(OUT) :: ierror
Here the required and provided parameters can take the following (monotonically increasing) values:

  • MPI_THREAD_SINGLE : Only a single thread will execute.
  • MPI_THREAD_FUNNELED : The program may use multiple threads, but only the main thread will make MPI calls.

    The main thread is usually the one selected by the master directive, but technically it is the only thread that executes MPI_Init_thread. If you call this routine in a parallel region, the main thread may be different from the master.

  • MPI_THREAD_SERIALIZED : The program may use multiple threads, all of which may make MPI calls, but there will never be simultaneous MPI calls in more than one thread.
  • MPI_THREAD_MULTIPLE : Multiple threads may issue MPI calls, without restrictions.
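
For instance, a code that will issue MPI calls from several threads simultaneously can request the highest level and test what is actually provided. A minimal sketch; the monotonic ordering of the level constants justifies the comparison, and aborting on insufficient support is our choice here, not something the standard prescribes:

#include <stdio.h>
#include <mpi.h>

int main(int argc,char **argv) {
  int required = MPI_THREAD_MULTIPLE, provided;
  MPI_Init_thread(&argc,&argv,required,&provided);
  // 'provided' may be lower than 'required' if the MPI library
  // does not support the requested level
  if (provided<required) {
    fprintf(stderr,"insufficient thread support: %d\n",provided);
    MPI_Abort(MPI_COMM_WORLD,1);
  }
  MPI_Finalize();
  return 0;
}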

After the initialization call, you can query the actually provided support level with MPI_Query_thread:

C:
int MPI_Query_thread(int *provided)

Fortran:
MPI_Query_thread(provided, ierror)
INTEGER, INTENT(OUT) :: provided
INTEGER, OPTIONAL, INTENT(OUT) :: ierror
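
This can be useful in library code that did not itself initialize MPI. A minimal sketch; the routine name is ours, for illustration:

#include <mpi.h>

// sketch: a library routine discovering, after the fact, what
// threading level the calling program initialized MPI with
int library_can_use_threads(void) {
  int provided;
  MPI_Query_thread(&provided);
  return provided==MPI_THREAD_MULTIPLE;
}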

In case more than one thread performs communication, MPI_Is_thread_main can determine whether a thread is the main thread:

C:
int MPI_Is_thread_main(int *flag)

Fortran:
MPI_Is_thread_main(flag, ierror)
LOGICAL, INTENT(OUT) :: flag
INTEGER, OPTIONAL, INTENT(OUT) :: ierror
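
For instance, under MPI_THREAD_FUNNELED only the thread that initialized MPI may make MPI calls. A sketch of guarding a call inside a parallel region; the routine name and the broadcast are our illustration:

#include <mpi.h>
#include <omp.h>

// sketch: under MPI_THREAD_FUNNELED, only the main thread
// (the one that called MPI_Init_thread) may make MPI calls
void bcast_from_main_thread(int *value) {
#pragma omp parallel
  {
    int is_main;
    MPI_Is_thread_main(&is_main);
    if (is_main)
      MPI_Bcast(value,1,MPI_INT,0,MPI_COMM_WORLD);
    // the other threads wait here until the communication is done
#pragma omp barrier
  }
}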

MPL note: MPL always calls MPI_Init_thread, requesting the highest level MPI_THREAD_MULTIPLE.

enum mpl::threading_modes {
  mpl::threading_modes::single = MPI_THREAD_SINGLE,
  mpl::threading_modes::funneled = MPI_THREAD_FUNNELED,
  mpl::threading_modes::serialized = MPI_THREAD_SERIALIZED,
  mpl::threading_modes::multiple = MPI_THREAD_MULTIPLE
};
threading_modes mpl::environment::threading_mode ();
bool mpl::environment::is_thread_main ();
End of MPL note

The mvapich implementation of MPI does have the required threading support, but you need to set this environment variable:

export MV2_ENABLE_AFFINITY=0

Another solution is to run your code like this:

  ibrun tacc_affinity <my_multithreaded_mpi_executable>

Intel MPI uses an environment variable to turn on thread support:

I_MPI_LIBRARY_KIND=<value>

where the value is one of:

  • release : multi-threaded with global lock
  • release_mt : multi-threaded with per-object lock for thread-split

The mpiexec program usually propagates environment variables, so the value of OMP_NUM_THREADS that is set when you call mpiexec will be seen by each MPI process.
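
For instance (the process and thread counts, and the program name, are illustrative; launcher options vary between MPI implementations):

export OMP_NUM_THREADS=4
mpiexec -n 8 ./yourprogram

Each of the 8 MPI processes will then execute its OpenMP parallel regions with 4 threads.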

A few practical points:

  • It is possible to use blocking sends in threads, and let the threads block. This does away with the need for polling.
  • You can not send to a thread number: only processes are addressable. Use the MPI message tag to direct a message to a specific thread; see the sketch below.
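
The following sketch illustrates that last point; the file name, thread count, and data values are our invention. Process 0 sends one message per thread of process 1, and each thread of process 1 blocks until the message bearing its own thread number as tag arrives. This requires MPI_THREAD_MULTIPLE:

// tagsend.c : illustrative sketch, not from the book's examples
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc,char **argv) {
  int threading;
  MPI_Init_thread(&argc,&argv,MPI_THREAD_MULTIPLE,&threading);
  MPI_Comm comm = MPI_COMM_WORLD;
  int procno,nprocs;
  MPI_Comm_rank(comm,&procno);
  MPI_Comm_size(comm,&nprocs);
  if (nprocs<2 || threading<MPI_THREAD_MULTIPLE)
    MPI_Abort(comm,1); // need two processes and full thread support
  int nthreads = 4; // assumed thread count, known on both sides

  if (procno==0) {
    // the tag selects the receiving thread on process 1
    for (int t=0; t<nthreads; t++) {
      double data = (double)t;
      MPI_Send(&data,1,MPI_DOUBLE, /*target=*/1,/*tag=*/t, comm);
    }
  } else if (procno==1) {
    // each thread blocks until the message with its own tag arrives
#pragma omp parallel num_threads(nthreads)
    {
      int t = omp_get_thread_num();
      double data;
      MPI_Recv(&data,1,MPI_DOUBLE, /*source=*/0,/*tag=*/t,
               comm,MPI_STATUS_IGNORE);
      printf("thread %d received %e\n",t,data);
    }
  }
  MPI_Finalize();
  return 0;
}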

Exercise

Consider the 2D heat equation and explore the mix of MPI/OpenMP parallelism:

  • Give each node one MPI process that is fully multi-threaded.
  • Give each core an MPI process and don't use multi-threading.

Discuss theoretically why the former can give higher performance. Implement both schemes as special cases of the general hybrid case, and run tests to find the optimal mix.

// thread.c
#include <stdio.h>
#include <mpi.h>

int main(int argc,char **argv) {
  MPI_Comm comm;
  int threading,procno,nprocs;
  MPI_Init_thread(&argc,&argv,MPI_THREAD_MULTIPLE,&threading);
  comm = MPI_COMM_WORLD;
  MPI_Comm_rank(comm,&procno);
  MPI_Comm_size(comm,&nprocs);

  if (procno==0) {
    switch (threading) {
    case MPI_THREAD_MULTIPLE : printf("Glorious multithreaded MPI\n"); break;
    case MPI_THREAD_SERIALIZED : printf("No simultaneous MPI from threads\n"); break;
    case MPI_THREAD_FUNNELED : printf("MPI from main thread\n"); break;
    case MPI_THREAD_SINGLE : printf("no threading supported\n"); break;
    }
  }
  MPI_Finalize();
  return 0;
}
