OpenMP topic: Parallel regions


17 OpenMP topic: Parallel regions

The simplest way to create parallelism in OpenMP is to use the parallel pragma. A block preceded by the omp parallel pragma is called a parallel region; it is executed by a newly created team of threads. This is an instance of the SPMD model: all threads execute the same segment of code.

#pragma omp parallel
{
  // this is executed by a team of threads
}

It would be pointless to have the block be executed identically by all threads. One way to get meaningful parallel code is to use the function omp_get_thread_num to find out which thread you are, and execute work that is individual to that thread. There is also a function omp_get_num_threads to find out the total number of threads. Both these functions give a number relative to the current team; recall from figure 16.3 that new teams can be created recursively.

For instance, if your program computes

result = f(x)+g(x)+h(x)

you could parallelize this as

double result,fresult,gresult,hresult;
#pragma omp parallel
{ int num = omp_get_thread_num();
  if (num==0)      fresult = f(x); // each thread writes a different variable,
  else if (num==1) gresult = g(x); // so there is no race condition;
  else if (num==2) hresult = h(x); // note: this needs a team of at least 3 threads
}
result = fresult + gresult + hresult;

The first thing we want to do is create a team of threads. This is done with a parallel region. Here is a very simple example:

// hello.c
#pragma omp parallel
  {
    int t = omp_get_thread_num();
    printf("Hello world from %d!\n",t);
  }
or in Fortran
// hellocount.F90
!$omp parallel
  nthreads = omp_get_num_threads()
  mythread = omp_get_thread_num()
  write(*,'("Hello from",i3," out of",i3)') mythread,nthreads
!$omp end parallel
or in C++
// hello.cxx
#pragma omp parallel
  {
    int t = omp_get_thread_num();
    stringstream proctext;
    proctext << "Hello world from " << t << endl;
    cerr << proctext.str();
  }
(Note the use of stringstream: without it the output lines from the various threads may get mixed up.)

This code corresponds to the model we just discussed:

  • Immediately preceding the parallel block, one thread will be executing the code. In the main program this is the initial thread.
  • At the start of the block, a new team of threads is created, and the thread that was active before the block becomes the master thread of that team.
  • After the block, only the master thread is active.
  • Inside the block there is a team of threads: each thread in the team executes the body of the block, and it will have access to all variables of the surrounding environment. How many threads there are can be determined in a number of ways; we will get to that later.

Remark

In future versions of OpenMP, the master thread will be called the primary thread. In OpenMP-5.1 the master construct is deprecated, and masked (with added functionality) takes its place. In OpenMP-6.0, master will disappear from the specification altogether, including the proc_bind master value and the combined master constructs (master taskloop, etc.).

Exercise

Make a full program based on this fragment. Insert different print statements before, inside, and after the parallel region. Run this example. How many times is each print statement executed?

You see that the parallel directive

  • Is preceded by a special marker: the #pragma omp pragma in C/C++, and the !$OMP sentinel in Fortran;
  • Is followed by a single statement or a block in C/C++, or by a block in Fortran, delimited by an !$omp end directive.

Directives look like cpp (C preprocessor) directives, but they are actually handled by the compiler, not the preprocessor.

Exercise

Take the `hello world' program above, and modify it so that you get multiple messages on your screen, saying

  Hello from thread 0 out of 4!
  Hello from thread 1 out of 4!

and so on. (The messages may very well appear out of sequence.)

What happens if you set your number of threads larger than the available cores on your computer?

Exercise

What happens if you call omp_get_thread_num and omp_get_num_threads outside a parallel region?

See also the omp_get_thread_limit function, and the OMP_WAIT_POLICY environment variable with its values ACTIVE and PASSIVE.

17.1 Nested parallelism


What happens if you call a function from inside a parallel region, and that function itself contains a parallel region?

int main() {
  ...
#pragma omp parallel
  {
  ...
  func(...);
  ...
  }
} // end of main
void func(...) {
#pragma omp parallel
  {
  ...
  }
}

By default, the nested parallel region will have only one thread. To allow nested thread creation, set

OMP_NESTED=true
 or
omp_set_nested(1)

(Note that OMP_NESTED and omp_set_nested are deprecated as of OpenMP-5.0, in favor of the active-levels mechanism described next.)

For more fine-grained control, use the environment variable OMP_MAX_ACTIVE_LEVELS (default: 1) or the functions omp_set_max_active_levels and omp_get_max_active_levels:

OMP_MAX_ACTIVE_LEVELS=3
 or
void omp_set_max_active_levels(int);
int omp_get_max_active_levels(void);
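
For instance, the following sketch (our own illustration; the thread counts are chosen arbitrarily) allows two levels of parallelism, with each of two outer threads creating an inner team of three:

#include <stdio.h>
#include <omp.h>

int main() {
  omp_set_max_active_levels(2);       // allow two nested levels of parallelism
#pragma omp parallel num_threads(2)   // outer team: 2 threads
  {
    int outer = omp_get_thread_num();
#pragma omp parallel num_threads(3)   // each outer thread creates an inner team of 3
    printf("outer thread %d, inner thread %d\n",
           outer, omp_get_thread_num());
  }
  return 0;
}

This should print six lines, in an unpredictable order; the same team sizes could also be requested by setting OMP_NUM_THREADS=2,3 (see below).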

Exercise

Test nested parallelism by writing an OpenMP program as follows:

  1. Write a subprogram that contains a parallel region.
  2. Write a main program with a parallel region; call the subprogram both inside and outside the parallel region.
  3. Insert print statements

    1. in the main program outside the parallel region,
    2. in the parallel region in the main program,
    3. in the subprogram outside the parallel region,
    4. in the parallel region inside the subprogram.

Run your program and count how many print statements of each type you get.

Writing subprograms that are called in a parallel region illustrates the following point: directives are evaluated with respect to the dynamic scope of the parallel region, not just the lexical scope. In the following example:

#pragma omp parallel
{
  f();
}
void f() {
#pragma omp for
  for ( .... ) {
    ...
  }
}

the body of the function f falls in the dynamic scope of the parallel region, so the for loop will be parallelized. (A worksharing directive such as this, outside the lexical scope of any parallel region, is known as an orphaned directive.)

If the function may be called both from inside and outside parallel regions, you can test which is the case with omp_in_parallel.
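
A minimal sketch of such a test (our own illustration, not from the original code examples):

#include <stdio.h>
#include <omp.h>

void report(void) {
  if (omp_in_parallel())
    printf("thread %d calls this inside a parallel region\n",
           omp_get_thread_num());
  else
    printf("this is called sequentially\n");
}

int main() {
  report();            // the sequential message, printed once
#pragma omp parallel
  report();            // the parallel message, printed once per thread
  return 0;
}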

The amount of nested parallelism can be set:

OMP_NUM_THREADS=4,2

means that the initial parallel region will have four threads, and each nested parallel region inside it will have two.

The maximum number of active levels and the total number of threads can be set and queried as follows:

OMP_MAX_ACTIVE_LEVELS=123
 or
omp_set_max_active_levels( n )
n = omp_get_max_active_levels()

and

OMP_THREAD_LIMIT=123
 or
n = omp_get_thread_limit()

The current nesting state can be queried with:

omp_get_level()                      nesting depth of the current region
omp_get_active_level()               number of enclosing active parallel regions
omp_get_ancestor_thread_num(level)   thread number of this thread's ancestor at the given level
omp_get_team_size(level)             size of the thread team at the given level
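
As a sketch of how these query routines fit together (our own example, assuming nesting has been enabled as above):

#include <stdio.h>
#include <omp.h>

int main() {
  omp_set_max_active_levels(2);
#pragma omp parallel num_threads(2)
  {
#pragma omp parallel num_threads(2)
    {
#pragma omp single                  // one report per inner team
      printf("level=%d, active levels=%d, "
             "ancestor at level 1=%d, team size at level 1=%d\n",
             omp_get_level(), omp_get_active_level(),
             omp_get_ancestor_thread_num(1), omp_get_team_size(1));
    }
  }
  return 0;
}

Each of the two inner teams reports level 2, two active levels, a team size of 2 at level 1, and the thread number of its outer ancestor.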

17.2 Cancel parallel construct


Parallel execution (and execution of some other constructs) can be aborted with the cancel construct:

!$omp cancel construct [if (expr)]

where construct is parallel, sections, do (for in C/C++), or taskgroup. Cancellation is disabled by default: it has to be activated by setting the environment variable OMP_CANCELLATION to true.
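
The typical use case is a parallel search, where the first thread to find the result makes further work unnecessary. A sketch (our own illustration; remember that this only takes effect if OMP_CANCELLATION=true is set at runtime):

#include <stdio.h>
#include <omp.h>

int main() {
  int data[1000], found = -1;
  for (int i=0; i<1000; i++) data[i] = 3*i;
#pragma omp parallel
  {
#pragma omp for
    for (int i=0; i<1000; i++) {
      if (data[i]==2100) {             // we found what we were looking for
#pragma omp atomic write
        found = i;
#pragma omp cancel for                 // request cancellation of the loop
      }
#pragma omp cancellation point for     // other threads check for cancellation here
    }
  }
  printf("found at index %d\n", found);
  return 0;
}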

Back to Table of Contents