OpenMP topic: Tasks

23.1 : Task data
23.2 : Task synchronization
23.3 : Task dependencies
23.4 : More
23.4.1 : Scheduling points
23.4.2 : Task cancelling
23.5 : Examples
23.5.1 : Fibonacci
23.5.2 : Binomial coefficients
23.5.3 : Tree traversal
23.5.3.1 : Post-order traversal
23.5.3.2 : Pre-order traversal

23 OpenMP topic: Tasks

Tasks are a mechanism that OpenMP uses under the covers: if you specify something as being parallel, OpenMP will create a `block of work': a section of code plus the data environment in which it occurred. This block is set aside for execution at some later point.

Let's look at a simple example using the task directive.

x = f();                // the variable x gets a value
#pragma omp task        // a task is created with the current value of x
  { y = g(x); }
z = h();                // the variable z gets a value

The thread that executes this code segment creates a task, which will later be executed, probably by a different thread. The exact timing of the execution of the task is up to a task scheduler, which operates invisibly to the user.

The task mechanism allows you to do things that are hard or impossible with the loop and section constructs. For instance, a loop traversing a linked list can be implemented with tasks:

p = head_of_list();       // one thread traverses the list
while (!end_of_list(p)) {
#pragma omp task          // a task is created,
  process( p );           // one for each element
  p = next_element(p);    // the generating thread goes on without waiting
}                         // the tasks are executed while more are being generated

The way tasks and threads interact is different from the worksharing constructs you've seen so far. Typically, one thread will generate the tasks, adding them to a queue, from which all threads can take and execute them. This leads to the following idiom:

#pragma omp parallel
#pragma omp single
{
  ...
#pragma omp task
  { ... }
  ...
}
  1. A parallel region creates a team of threads;
  2. a single thread then creates the tasks, adding them to a queue that belongs to the team;
  3. and all the threads in that team (possibly including the one that generated the tasks) take tasks from the queue and execute them.

With tasks it becomes possible to parallelize processes that did not fit the earlier OpenMP constructs. For instance, if a certain operation needs to be applied to all elements of a linked list, you can have one thread go down the list, generating a task for each element of the list.
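
As an illustration, here is a minimal sketch of that idea: one thread walks a linked list inside a single region, creating a task per element. The node type and the process function are made up for this example; only the pragmas follow the idiom above.

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

// hypothetical list node and per-element operation, for illustration only
typedef struct node { int value; struct node *next; } node;
void process(node *p) {
  printf("element %d processed by thread %d\n", p->value, omp_get_thread_num());
}

int main(void) {
  // build a short list 0,1,...,9
  node *head = NULL;
  for (int i=9; i>=0; i--) {
    node *n = malloc(sizeof(node));
    n->value = i; n->next = head; head = n;
  }

#pragma omp parallel
#pragma omp single
  {
    for (node *p=head; p!=NULL; p=p->next) {
#pragma omp task firstprivate(p)   // each task captures the current pointer
      process(p);
    }
  } // implicit barrier: all tasks are finished here
  return 0;
}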

Another concept that was hard to parallelize earlier is the `while loop'. This does not fit the requirement for OpenMP parallel loops that the loop bound needs to be known before the loop executes.

Exercise

Use tasks to find the smallest factor of a large number (using $2999\cdot 3001$ as test case): generate a task for each trial factor. Start with this code:

  int N = 2999*3001;
  int factor=0;
#pragma omp parallel
#pragma omp single
  for (int f=2; f<4000; f++) {
    { // see if `f' is a factor
      if (N%f==0) { // found factor!
        factor = f;
      }
    }
    if (factor>0)
      break;
  }
  if (factor>0)
    printf("Found a factor: %d\n",factor);
  • Turn the factor finding block into a task.
  • Run your program a number of times:

    for i in `seq 1 1000` ; do ./taskfactor ; done | grep -v 2999
    

    Does it find the wrong factor? Why? Try to fix this.

  • Once a factor has been found, you should stop generating tasks. Tasks that should not have been generated, meaning tasks that test a candidate larger than the factor already found, should print out a message.

23.1 Task data


Treatment of data in a task is somewhat subtle. The basic problem is that a task gets created at one time, and executed at another. Thus, if shared data is accessed, does the task see the value at creation time or at execution time? In fact, both possibilities make sense depending on the application, so we need to discuss the rules that determine which possibility applies.

The first rule is that shared data is shared in the task, but private data becomes firstprivate in the task. In the first example:

int count = 100;
#pragma omp parallel
#pragma omp single
{
  while (count>0) {
#pragma omp task
    {
      int countcopy = count;
      if (count==50) {
        sleep(1);
        printf("%d,%d\n",count,countcopy);
      } // end if
    }   // end task
    count--;
  }     // end while
}       // end single

the variable count is declared outside the parallel region and is therefore shared. When the print statement is executed, all tasks will have been generated, and so count will be zero. Thus, the output will likely be 0,50.

In the second example:

#pragma omp parallel
#pragma omp single
{
  int count = 100;
  while (count>0) {
#pragma omp task
    {
      int countcopy = count;
      if (count==50) {
        sleep(1);
        printf("%d,%d\n",count,countcopy);
      } // end if
    }   // end task
    count--;
  }     // end while
}       // end single

the count variable is private to the thread creating the tasks, and so it will be firstprivate in the task, preserving the value that was current when the task was created. The output will therefore be 50,50.

23.2 Task synchronization


Even though the above segment looks like a linear set of statements, it is impossible to say when the code after the task directive will be executed. This means that the following code is incorrect:

  x = f();
#pragma omp task
  { y = g(x); }
  z = h(y);

Explanation: when the statement computing z is executed, the task computing y has only been scheduled; it has not necessarily been executed yet.

In order to have a guarantee that a task is finished, you need the taskwait directive. The following creates two tasks, which can be executed in parallel, and then waits for the results:

x = f();                  // the variable x gets a value
#pragma omp task          // two tasks are created
  { y1 = g1(x); }         // with the current value of x
#pragma omp task
  { y2 = g2(x); }
#pragma omp taskwait      // the thread waits until the tasks are finished
z = h(y1)+h(y2);          // the variable z is computed using the task results

The task pragma is followed by a structured block. Each time the structured block is encountered, a new task is generated. On the other hand, taskwait is a standalone directive; the code that follows is just code, it is not a structured block belonging to the directive.

Another aspect of the distinction between generating tasks and executing them: usually the tasks are generated by one thread, but executed by many threads. Thus, the typical idiom is:

#pragma omp parallel
#pragma omp single
{
  // code that generates tasks
}

This makes it possible to execute loops in parallel that do not have the right kind of iteration structure for an omp parallel for. As an example, you could traverse and process a linked list:

#pragma omp parallel
#pragma omp single
{
  while (!tail(p)) {
    p = p->next();
#pragma omp task
    process(p)
  }
#pragma omp taskwait
}

One task traverses the linked list creating an independent task for each element in the list. These tasks are then executed in parallel; their assignment to threads is done by the task scheduler.

You can indicate task dependencies in several ways:

  1. Using the taskwait directive you can explicitly indicate the join of the forked tasks. The instruction after the taskwait will therefore be dependent on the spawned tasks.
  2. The taskgroup directive, followed by a structured block, ensures completion of all tasks created in the block, even if recursively created.
  3. Each OpenMP task can have a depend clause, indicating the data dependencies of the task. By indicating what data is produced or absorbed by the tasks, the scheduler can construct the dependency graph for you.

Another mechanism for dealing with tasks is the taskgroup: a task group is a code block that can contain task directives; all these tasks need to be finished before any statement after the block is executed.

A task group is somewhat similar to having a taskwait directive after the block. The big difference is that the taskwait directive does not wait for tasks that are recursively generated, while a taskgroup does.
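
A minimal sketch of this difference, using a made-up recursive routine work that spawns one child task at each level: the taskgroup guarantees that the recursively created tasks have also finished.

#include <stdio.h>
#include <omp.h>

// hypothetical recursive routine: each invocation spawns one child task
void work(int level) {
  if (level>0) {
#pragma omp task
    work(level-1);            // recursively generated task
  }
  printf("level %d done by thread %d\n",level,omp_get_thread_num());
}

int main(void) {
#pragma omp parallel
#pragma omp single
  {
#pragma omp taskgroup
    {
#pragma omp task
      work(3);
    } // all tasks created in this block, even recursively, are finished here
    printf("taskgroup complete\n");
    // with only a taskwait after the task, the recursively
    // created tasks would not be waited for
  }
  return 0;
}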

23.3 Task dependencies


It is possible to put a partial ordering on tasks through use of the depend clause. For example, in

#pragma omp task
  x = f();
#pragma omp task
  y = g(x);

it is conceivable that the second task is executed before the first, possibly leading to an incorrect result. This is remedied by specifying:

#pragma omp task depend(out:x)
  x = f();
#pragma omp task depend(in:x)
  y = g(x);
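
Here is a minimal compilable sketch of that pattern; the functions f and g are placeholders.

#include <stdio.h>

// placeholder producer and consumer functions
int f(void)  { return 42; }
int g(int x) { return 2*x; }

int main(void) {
  int x=0, y=0;
#pragma omp parallel
#pragma omp single
  {
#pragma omp task depend(out:x)
    x = f();                  // producer: writes x
#pragma omp task depend(in:x)
    y = g(x);                 // consumer: scheduled only after the producer finishes
#pragma omp taskwait
    printf("y=%d\n",y);
  }
  return 0;
}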

Exercise

Consider the following code:

for i in [1:N]:
    x[0,i] = some_function_of(i)
    x[i,0] = some_function_of(i)


for i in [1:N]:
    for j in [1:N]:
        x[i,j] = x[i-1,j]+x[i,j-1]
  • Observe that the second loop nest is not amenable to OpenMP loop parallelism.
  • Can you think of a way to realize the computation with OpenMP loop parallelism? Hint: you need to rewrite the code so that the same operations are done in a different order.
  • Use tasks with dependencies to make this code parallel without any rewriting: the only change is to add OpenMP directives.

Task dependencies are used to indicate how two uses of one data item relate to each other. Since either use can be a read or a write, there are four types of dependencies.

  • [RaW (Read after Write)] The second task reads an item that the first task writes. The second task has to be executed after the first:

    ... omp task depend(OUT:x)
      foo(x)
    ... omp task depend( IN:x)
      foo(x)
    
  • [WaR (Write after Read)] The first task reads an item, and the second task overwrites it. The second task has to be executed after the first, to prevent the value from being overwritten before the first task has read it:

    ... omp task depend( IN:x)
      foo(x)
    ... omp task depend(OUT:x)
      foo(x)
    
  • [WaW (Write after Write)] Both tasks set the same variable. Since the variable can be used by an intermediate task, the two writes have to be executed in this order.

    ... omp task depend(OUT:x)
      foo(x)
    ... omp task depend(OUT:x)
      foo(x)
    
  • [RaR (Read after Read)] Both tasks read a variable. Since neither task has an `out' declaration, they can run in either order.

    ... omp task depend(IN:x)
      foo(x)
    ... omp task depend(IN:x)
      foo(x)
    

23.4 More


23.4.1 Scheduling points


Normally, a task stays tied to the thread that first executes it. However, at a task scheduling point the thread may switch to the execution of another task created by the same team.

  • There is a scheduling point after explicit task creation. This means that, in the above examples, the thread creating the tasks can also participate in executing them.
  • There is a scheduling point at taskwait and taskyield .

On the other hand, a task created with the untied clause on the task pragma is never tied to one thread. This means that after suspension at a scheduling point any thread can resume execution of the task. If you do this, beware that the value of a thread-id does not stay fixed. Also locks become a problem.
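
A small sketch of an untied task; whether the task actually moves to another thread after the scheduling point is up to the scheduler.

#include <stdio.h>
#include <omp.h>

int main(void) {
#pragma omp parallel
#pragma omp single
  {
#pragma omp task untied
    {
      int before = omp_get_thread_num();
#pragma omp taskyield            // scheduling point: an untied task may resume elsewhere
      int after  = omp_get_thread_num();
      printf("started on thread %d, resumed on thread %d\n",before,after);
    }
  }
  return 0;
}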

Example: if a thread is waiting for a lock, with a scheduling point it can suspend the task and work on another task.

while (!omp_test_lock(lock)) {
#pragma omp taskyield
}

23.4.2 Task cancelling


It is possible (as of OpenMP version 4) to cancel tasks. This is useful when tasks are used to perform a search: the task that finds the result first can cancel any outstanding search tasks.

The cancel directive takes as argument the surrounding construct (parallel, for, sections, taskgroup) in which the tasks are cancelled.
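
As a sketch (not the solution to the exercise below), here is a search in which the task that succeeds cancels the rest of its task group. The predicate search_here is made up, and cancellation has to be activated, for instance by setting OMP_CANCELLATION=true.

#include <stdio.h>

// made-up search predicate
int search_here(int i) { return i==123; }

int main(void) {
  int found = -1;
#pragma omp parallel
#pragma omp single
  {
#pragma omp taskgroup
    {
      for (int i=0; i<1000; i++) {
#pragma omp task firstprivate(i)
        {
#pragma omp cancellation point taskgroup   // abandon this task if the group was cancelled
          if (search_here(i)) {
            found = i;
#pragma omp cancel taskgroup               // cancel the remaining tasks in the group
          }
        }
      }
    } // end taskgroup
    printf("found: %d\n",found);
  }
  return 0;
}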

Exercise

Modify the factor finding example above to use task cancelling.

23.5 Examples


23.5.1 Fibonacci


As an example of the use of tasks, consider computing an array of Fibonacci values (see the example file examples/omp/c/taskgroup0.c). If you simply turn each calculation into a task, results will be unpredictable (confirm this!) since tasks can be executed in any sequence. To solve this, we put dependencies on the tasks (see examples/omp/c/taskgroup2.c).
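
Since the code snippets are only referenced by file name here, the following is a minimal sketch of the dependency version; the array size is illustrative.

#include <stdio.h>
#define N 25

int main(void) {
  long fibo[N];
  fibo[0] = fibo[1] = 1;
#pragma omp parallel
#pragma omp single
  for (int i=2; i<N; i++) {
#pragma omp task depend(in:fibo[i-1],fibo[i-2]) depend(out:fibo[i])
    fibo[i] = fibo[i-1] + fibo[i-2];   // i is firstprivate in the task, fibo is shared
  }
  // all tasks are finished at the end of the parallel region
  printf("F(%d) = %ld\n",N-1,fibo[N-1]);
  return 0;
}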

23.5.2 Binomial coefficients


Exercise

An array of binomial coefficients can be computed as follows (see the example file code/omp/c/binomial1.c). Put a single task group around the double loop, and use depend clauses to make the execution satisfy the proper dependencies.

23.5.3 Tree traversal


OpenMP tasks are a great way of handling trees.

23.5.3.1 Post-order traversal


In post-order tree traversal you visit the subtrees before visiting the root. This is the traversal that you use to find summary information about a tree, for instance the sum of all nodes, and the sums of nodes of all subtrees:

for all children $c$: compute the sum $s_c$

$s \leftarrow \sum_c s_c$

Another example is matrix factorization: \[ S = A_{33} - A_{31}A_{11}^{-1}A_{13} - A_{32}A_{22}^{-1}A_{23} \] where the two inverses $A_{11}^{-1},A_{22}^{-1}$ can be computed independently and recursively.
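
For instance, a task-parallel post-order sum over a binary tree could look like the following sketch; the node type and the tiny test tree are made up for the example.

#include <stdio.h>

// hypothetical binary tree node
typedef struct node { long value; struct node *left,*right; } node;

// post-order sum: sum both subtrees as tasks, then combine at the root
long tree_sum(node *t) {
  if (t==NULL) return 0;
  long sl=0, sr=0;
#pragma omp task shared(sl)
  sl = tree_sum(t->left);
#pragma omp task shared(sr)
  sr = tree_sum(t->right);
#pragma omp taskwait             // the subtree sums must be ready before combining
  return sl + sr + t->value;
}

int main(void) {
  // tiny example tree:  2 <- 1 -> 3
  node l = {2,NULL,NULL}, r = {3,NULL,NULL}, root = {1,&l,&r};
  long total;
#pragma omp parallel
#pragma omp single
  total = tree_sum(&root);
  printf("sum = %ld\n",total);   // prints 6
  return 0;
}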

23.5.3.2 Pre-order traversal


If a property needs to propagate from the root to all subtrees and nodes, you can use pre-order tree traversal:

update node value $s$
for all children $c$: update $c$ with the new value $s$
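
A corresponding sketch for pre-order propagation, reusing the node type from the post-order sketch above; the combine operation is a placeholder.

// pre-order: update the root first, then push the new value to the children
// (node as in the post-order sketch; combine is a placeholder operation)
long combine(long old, long s) { return old + s; }

void update_tree(node *t, long s) {
  if (t==NULL) return;
  t->value = combine(t->value, s);   // update this node with the propagated value
#pragma omp task
  update_tree(t->left, t->value);    // each child gets the new value
#pragma omp task
  update_tree(t->right, t->value);
  // no taskwait needed here: nothing in this call depends on the children finishing
}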
