20 OpenMP topic: Controlling thread data

In a parallel region there are two types of data: private and shared. In this section we will see the various ways you can control what category your data falls under; for private data items we also discuss how their values relate to shared data.

20.1 Shared data


In a parallel region, any data declared outside it will be shared: any thread using a variable x will access the same memory location associated with that variable.

Example:

  int x = 5;
#pragma omp parallel
  {
    x = x+1;
    printf("shared: x is %d\n",x);
  }

All threads increment the same variable, so after the parallel region it will have a value of five plus the number of threads; or maybe less, because of the data races involved. See Eijkhout:IntroHPC for an explanation of the issues involved; see section 22.2.1 for a solution in OpenMP.
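As a preview, a minimal sketch of one possible remedy: making the update atomic, so that no increments get lost.

  int x = 5;
#pragma omp parallel
  {
#pragma omp atomic
    x = x+1; // the update itself is now free of races
  }
  // x is now exactly 5 plus the number of threads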

Sometimes this global update is what you want; in other cases the variable is intended only for intermediate results in a computation. In that case there are various ways of creating data that is local to a thread, and therefore invisible to other threads.

20.2 Private data


In the C/C++ language it is possible to declare variables inside a lexical scope; roughly: inside curly braces. This concept extends to OpenMP parallel regions and directives: any variable declared in a block following an OpenMP directive will be local to the executing thread.

Example:

  int x = 5;
#pragma omp parallel
  {
    int x; x = 3;
    printf("local: x is %d\n",x);
  }

After the parallel region the outer variable x will still have the value 5: there is no storage association between the private variable and the global one.

Fortran note

The Fortran language does not have this concept of scope, so you have to use a private clause on the parallel directive:

!$OMP parallel private(x)

The private clause declares data to have a separate copy in the memory of each thread. Such private variables are initialized as they would be in a main program, that is, they have undefined values. Any computed value goes away at the end of the parallel region. (However, see section 20.7 below.) Thus, you should not rely on any initial value, or on the value of the outer variable after the region.

  int x = 5;
#pragma omp parallel private(x)
  {
    x = x+1; // dangerous
    printf("private: x is %d\n",x);
  }
  printf("after: x is %d\n",x); // also dangerous

Data that is declared private with the private clause is put on a separate stack per thread. The OpenMP standard does not dictate the size of these stacks, but beware of stack overflow. A typical default is a few megabytes; you can control it with the environment variable OMP_STACKSIZE. Its value can be a plain number, or a number with a suffix:

123 456k 567K 678m 789M 246g 357G
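For example, to give each thread a stack of half a gigabyte (bash syntax; the value 512M is only an illustration):

$ export OMP_STACKSIZE=512M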

A normal Unix process also has a stack, but this is independent of the OpenMP stacks for private data. You can query or set the Unix stack size with ulimit:

$ ulimit -s
64000
$ ulimit -s 8192
$ ulimit -s
8192

The Unix stack can grow dynamically as space is needed. This does not hold for the OpenMP stacks: they are immediately allocated at their requested size. Thus it is important not to make them too large.

20.3 Data in dynamic scope


Functions that are called from a parallel region fall in the dynamic scope of that parallel region. The rules for variables in that function are as follows:

  • Any variables defined locally in the function are private.
  • static variables in C and save variables in Fortran are shared.
  • The function arguments inherit their status from the calling environment.
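A sketch of these rules in action; the function sub and its variables are made up for this illustration:

static int calls = 0;      // static storage: shared between all threads

void sub( double *x ) {    // x inherits its status from the call site
  double tmp = x[0] + 1;   // local variable: private to the executing thread
  calls += 1;              // unprotected update of shared data: a race
}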

20.4 Temporary variables in a loop


It is common to have a variable that is set and used in each loop iteration:

#pragma omp parallel for
for (i=0; i<N; i++) {
  x = i*h;
  s = sin(x); c = cos(x);
  a[i] = s+c;
  b[i] = s-c;
}

By the above rules, the variables x,s,c are all shared. However, the values they receive in one iteration are not used in the next, so effectively they behave as variables private to each iteration.

  • In both C and Fortran you can declare these variables private in the parallel for directive.
  • In C, you can also redefine the variables inside the loop body; both options are sketched below.
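A sketch of both options, reusing the variables of the loop above:

// option 1: declare the temporaries private in the directive
#pragma omp parallel for private(x,s,c)
for (i=0; i<N; i++) {
  x = i*h;
  s = sin(x); c = cos(x);
  a[i] = s+c; b[i] = s-c;
}

// option 2, C only: redeclare the temporaries inside the loop body
#pragma omp parallel for
for (int i=0; i<N; i++) {
  double x = i*h;
  double s = sin(x), c = cos(x);
  a[i] = s+c; b[i] = s-c;
}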

Sometimes, even if you forget to declare these temporaries as private, the code may still give the correct output. That is because the compiler can sometimes eliminate them from the loop body, since it detects that their values are not otherwise used.

20.5 Default


  • Loop variables in an omp for are private;
  • Local variables in the parallel region are private.

You can alter this default behaviour with the default clause:

#pragma omp parallel default(shared) private(x)
{ ... }
#pragma omp parallel default(private) shared(matrix)
{ ... }

and if you want to play it safe:

#pragma omp parallel default(none) private(x) shared(matrix)
{ ... }

  • default(shared): the variables from the outer scope are shared in the parallel region; any private variables need to be declared explicitly. This is the default behaviour.
  • default(private): the outer variables become private in the parallel region. They are not initialized; see the next option. Any shared variables in the parallel region need to be declared explicitly. This value is not available in C.
  • default(firstprivate): the outer variables are private in the parallel region, and initialized with their outer value. Any shared variables need to be declared explicitly. This value is not available in C.
  • default(none): this option is the safest, because it forces you to specify for each variable in the parallel region whether it's private or shared. Also, if your code behaves differently in parallel from sequential, there is probably a data race; specifying the status of every variable is a good way to debug this.
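For example, with default(none) forgetting to declare a variable becomes a compile-time error rather than a silent shared access; a sketch:

int x;
#pragma omp parallel default(none)
{
  x = 1; // compiler error: x does not appear in a data-sharing clause
}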

20.6 Array data


The rules for arrays are slightly different from those for scalar data:

  1. Statically allocated data, that is with a syntax like

    int array[100];
    integer,dimension(100) :: array
    

    can be shared or private, depending on the clause you use.

  2. Dynamically allocated data, that is, created with malloc or allocate, can only be shared.

Example of the first type: in the example file examples/omp/c/alloc2.c each thread gets a private copy of the array, properly initialized.

On the other hand, in examples/omp/c/alloc1.c each thread gets a private pointer, but all pointers point to the same object.
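The difference in a sketch (assumed code, not the actual snippets from those example files):

  int array[100];                                // static: the data can be privatized
  int *pointer = (int*) malloc(100*sizeof(int)); // dynamic: only the pointer can
#pragma omp parallel firstprivate(array,pointer)
  {
    array[0]   = omp_get_thread_num(); // private, initialized copy: no conflict
    pointer[0] = omp_get_thread_num(); // private pointer, shared target: data race
  }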

20.7 First and last private


Above, you saw that private variables are completely separate from any variables by the same name in the surrounding scope. However, there are two cases where you may want some storage association between a private variable and a global counterpart.

First of all, private variables are created with an undefined value. You can force their initialization with the firstprivate clause:

  int t=2;
#pragma omp parallel firstprivate(t)
  {
    t += f( omp_get_thread_num() );
    g(t);
  }

The variable t behaves like a private variable, except that it is initialized to the outside value.

Secondly, you may want a private value to be preserved to the environment outside the parallel region. This really only makes sense in one case: where you preserve a private variable from the last iteration of a parallel loop, or the last section in a sections construct. This is done with the lastprivate clause:

#pragma omp parallel for \
        lastprivate(tmp)
for (i=0; i<N; i++) {
  tmp = ......
  x[i] = .... tmp ....
}
..... tmp ....

20.8 Persistent data through threadprivate


Most data in OpenMP parallel regions is either inherited from the master thread and therefore shared, or temporary within the scope of the region and fully private. There is also a mechanism for thread-private data, which is not limited in lifetime to one parallel region. The threadprivate pragma is used to declare that each thread is to have a private copy of a variable:

#pragma omp threadprivate(var)

The variable needs to be:

  • a file-scope or static variable in C,
  • a static class member in C++, or
  • a program variable or common block in Fortran.

20.8.1 Thread private initialization


If each thread needs a different value in its threadprivate variable, the initialization needs to happen in a parallel region.

In the following example a team of 7 threads is created, all of which set their thread-private variable. Later, this variable is read by a larger team: the variables that have not been set are undefined, though often simply zero:

// threadprivate.c
#include <stdlib.h>
#include <stdio.h>
#include <omp.h>

static int tp;
// the threadprivate directive must follow the declaration, at file scope
#pragma omp threadprivate(tp)

int main(int argc,char **argv) {

#pragma omp parallel num_threads(7)
  tp = omp_get_thread_num();

#pragma omp parallel num_threads(9)
  printf("Thread %d has %d\n",omp_get_thread_num(),tp);

  return 0;
}

On the other hand, if the thread private data starts out identical in all threads, the copyin clause can be used:

#pragma omp threadprivate(private_var)


private_var = 1;
#pragma omp parallel copyin(private_var)
  private_var += omp_get_thread_num();

If one thread needs to set all thread private data to its value, the copyprivate clause on a single construct can be used:

#pragma omp parallel
{
  ...
#pragma omp single copyprivate(private_var)
  private_var = read_data();
  ...
}

20.8.2 Thread private example


The typical application for thread-private variables is in random number generators. A random number generator needs saved state, since it computes each next value from the current one. To make a parallel generator, each thread will create and initialize a private 'current value' variable. This will persist even when the execution is not in a parallel region; it gets updated only in a parallel region.
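A minimal sketch of this scheme, here using the POSIX rand_r generator with a thread-private seed; the seed values are arbitrary:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

static unsigned seed;
#pragma omp threadprivate(seed)

int main(void) {
#pragma omp parallel
  seed = 12345 + omp_get_thread_num(); // every thread initializes its own state

#pragma omp parallel
  {
    int r = rand_r(&seed); // rand_r updates the thread-private state in place
    printf("Thread %d drew %d\n",omp_get_thread_num(),r);
  }
  return 0;
}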

Exercise

Calculate the area of the Mandelbrot set by random sampling. Initialize the random number generator separately for each thread; then use a parallel loop to evaluate the points. Explore performance implications of the different loop scheduling strategies.

Fortran note

Named common blocks can be made thread-private with the syntax

!$OMP threadprivate( /blockname/ )

Threadprivate variables require OMP_DYNAMIC to be switched off.
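You can also enforce this in code (a sketch; omp_set_dynamic is the corresponding runtime routine):

omp_set_dynamic(0); // disable dynamic adjustment of the team size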

20.9 Allocators


OpenMP was initially designed for shared memory. With accelerators (see chapter OpenMP topic: Offloading), non-coherent memory was added to this. In the OpenMP 5 standard, the story is further complicated, to account for new memory types such as high-bandwidth memory and non-volatile memory.

There are several ways of using the OpenMP memory allocators.

  • First, as a directive on a static array:

    float A[N], B[N];
    #pragma omp allocate(A) \
        allocator(omp_large_cap_mem_alloc)
    
  • As a clause on private variables:

    #pragma omp task private(B) allocate(omp_const_mem_alloc: B)
    
  • With omp_alloc, using a (possibly user-defined) allocator; see the sketch below.
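A minimal sketch of the third option; the size N and the choice of allocator are assumptions for illustration:

#include <omp.h>
// allocate from high-bandwidth memory, if the implementation maps it there
double *x = (double*) omp_alloc( N*sizeof(double), omp_high_bw_mem_alloc );
// ... use x ...
omp_free( x, omp_high_bw_mem_alloc );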

Next, there are memory spaces. The binding between OpenMP identifiers and hardware is implementation defined.

20.9.1 Pre-defined types


Allocators: omp_default_mem_alloc, omp_large_cap_mem_alloc, omp_const_mem_alloc, omp_high_bw_mem_alloc, omp_low_lat_mem_alloc, omp_cgroup_mem_alloc, omp_pteam_mem_alloc, omp_thread_mem_alloc.

Memory spaces: omp_default_mem_space, omp_large_cap_mem_space, omp_const_mem_space, omp_high_bw_mem_space, omp_low_lat_mem_space.
