
Tasks are a mechanism that OpenMP uses under the cover: if you specify something as being parallel, OpenMP will create a 'block of work': a section of code plus the data environment in which it occurred. This block is set aside for execution at some later point.

Let's look at a simple example using the task directive.

x = f();          // the variable x gets a value
#pragma omp task
{ y = g(x); }     // a task is created with the current value of x
z = h();          // the variable z gets a value

The thread that executes this code segment creates a task, which will later be executed, probably by a different thread. The exact timing of the execution of the task is up to a task scheduler, which operates invisibly to the user.

The task mechanism allows you to do things that are hard or impossible with the loop and section constructs. For instance, a loop traversing a linked list can be implemented with tasks:

p = head_of_list();        // one thread traverses the list
while (!end_of_list(p)) {
#pragma omp task
  process( p );            // a task is created for each element
  p = next_element(p);     // the generating thread goes on without waiting
}                          // the tasks are executed while more are being generated

The way tasks and threads interact is different from the worksharing constructs you've seen so far. Typically, one thread will generate the tasks, adding them to a queue, from which all threads can take and execute them. This leads to the following idiom:

#pragma omp parallel
#pragma omp single
{
  ...
#pragma omp task
  { ... }
  ...
}

1. A parallel region creates a team of threads;
2. a single thread then creates the tasks, adding them to a queue that belongs to the team;
3. and all the threads in that team (possibly including the one that generated the tasks) execute the tasks.

With tasks it becomes possible to parallelize processes that did not fit the earlier OpenMP constructs. For instance, if a certain operation needs to be applied to all elements of a linked list, you can have one thread go down the list, generating a task for each element of the list.

Another concept that was hard to parallelize earlier is the 'while loop'. This does not fit the requirement for OpenMP parallel loops that the loop bound needs to be known before the loop executes.

Exercise

Use tasks to find the smallest factor of a large number (using $2999\cdot 3001$ as test case): generate a task for each trial factor. Start with this code:

int factor=0;
#pragma omp parallel
#pragma omp single
for (int f=2; f<4000; f++) {
  { // see if f is a factor
    if (N%f==0) { // found factor!
      factor = f;
    }
  }
  if (factor>0)
    break;
}
if (factor>0)
  printf("Found a factor: %d\n",factor);

• Turn the factor finding block into a task.
• Run your program a number of times:

for i in `seq 1 1000` ; do ./taskfactor ; done | grep -v 2999


Does it find the wrong factor? Why? Try to fix this.

• Once a factor has been found, you should stop generating tasks. Tasks that should not have been generated, meaning that they test a candidate larger than the factor already found, should print out a message.

Treatment of data in a task is somewhat subtle. The basic problem is that a task gets created at one time, and executed at another. Thus, if shared data is accessed, does the task see the value at creation time or at execution time? In fact, both possibilities make sense depending on the application, so we need to discuss the rules that determine which possibility applies.

The first rule is that data that is shared in the surrounding context is shared in the task, while private data becomes firstprivate in the task: its value is captured when the task is created. In the first example:

int count = 100;
#pragma omp parallel
#pragma omp single
{
  while (count>0) {
    int countcopy = count;
    if (count==50) {
#pragma omp task
      {
        sleep(1);
        printf("%d,%d\n",count,countcopy);
      }
    } // end if
    count--;
  }   // end while
}     // end single

the variable count is declared outside the parallel region and is therefore shared, while countcopy is private to the generating thread and therefore firstprivate in the task. When the print statement is executed, all tasks will have been generated and the loop has finished, so count will be zero. Thus, the output will likely be 0,50.

In the second example:

#pragma omp parallel
#pragma omp single
{
  int count = 100;
  while (count>0) {
    int countcopy = count;
    if (count==50) {
#pragma omp task
      {
        sleep(1);
        printf("%d,%d\n",count,countcopy);
      }
    } // end if
    count--;
  }   // end while
}     // end single


the count variable is private to the thread creating the tasks, and so it will be firstprivate in the task, preserving the value that was current when the task was created.

Even though the above segment looks like a linear set of statements, it is impossible to say when the code after the task directive will be executed. This means that the following code is incorrect:

x = f();
#pragma omp task
{ y = g(x); }
z = h(y);


Explanation: when the statement computing z is executed, the task computing y has only been scheduled; it has not necessarily been executed yet.

In order to have a guarantee that a task is finished, you need the taskwait directive. The following creates two tasks, which can be executed in parallel, and then waits for the results:

x = f();             // the variable x gets a value
#pragma omp task
{ y1 = g1(x); }      // two tasks are created with the current value of x
#pragma omp task
{ y2 = g2(x); }
#pragma omp taskwait // the thread waits until the tasks are finished
z = h(y1)+h(y2);     // the variable z is computed using the task results

The task pragma is followed by a structured block. Each time the structured block is encountered, a new task is generated. On the other hand, taskwait is a standalone directive; the code that follows it is just code, not a structured block belonging to the directive.

Another aspect of the distinction between generating tasks and executing them: usually the tasks are generated by one thread, but executed by many threads. Thus, the typical idiom is:

#pragma omp parallel
#pragma omp single
{
  ...
}


This makes it possible to execute loops in parallel that do not have the right kind of iteration structure for an omp parallel for. As an example, you could traverse and process a linked list:

#pragma omp parallel
#pragma omp single
{
  while (!tail(p)) {
    p = p->next();
#pragma omp task
    process(p);
  }
#pragma omp taskwait
}


You can indicate task dependencies in several ways:

1. Using the taskwait directive you can explicitly indicate the join of the forked tasks. The instruction after the taskwait will therefore be dependent on the spawned tasks.
2. The taskgroup directive, followed by a structured block, ensures completion of all tasks created in the block, even if recursively created.
3. Each OpenMP task can have a depend clause, indicating the data dependencies of the task. By indicating what data is produced or absorbed by the tasks, the scheduler can construct the dependency graph for you.

Another mechanism for dealing with tasks is the taskgroup : a task group is a code block that can contain task directives; all these tasks need to be finished before any statement after the block is executed.

A task group is somewhat similar to having a taskwait directive after the block. The big difference is that the taskwait directive does not wait for tasks that are recursively generated, while a taskgroup does.

It is possible to put a partial ordering on tasks through use of the depend clause. For instance, in

#pragma omp task
x = f();
#pragma omp task
y = g(x);


it is conceivable that the second task is executed before the first, possibly leading to an incorrect result. This is remedied by specifying:

#pragma omp task depend(out:x)
x = f();
#pragma omp task depend(in:x)
y = g(x);


Exercise

Consider the following code:

for i in [1:N]:
x[0,i] = some_function_of(i)
x[i,0] = some_function_of(i)

for i in [1:N]:
for j in [1:N]:
x[i,j] = x[i-1,j]+x[i,j-1]

• Observe that the second loop nest is not amenable to OpenMP loop parallelism.
• Can you think of a way to realize the computation with OpenMP loop parallelism? Hint: you need to rewrite the code so that the same operations are done in a different order.
• Use tasks with dependencies to make this code parallel without any rewriting: the only change is to add OpenMP directives.

Task dependencies are used to indicate how two uses of one data item relate to each other. Since either use can be a read or a write, there are four types of dependencies.

• [RaW (Read after Write)] The first task writes an item, and the second task reads it. The second task has to be executed after the first, so that it reads the updated value:

#pragma omp task depend(out:x)
  foo(x)
#pragma omp task depend(in:x)
  foo(x)

• [WaR (Write after Read)] The first task reads an item, and the second task overwrites it. The second task has to be executed second to prevent overwriting the initial value:

#pragma omp task depend(in:x)
  foo(x)
#pragma omp task depend(out:x)
  foo(x)

• [WaW (Write after Write)] Both tasks set the same variable. Since the variable can be used by an intermediate task, the two writes have to be executed in this order.

#pragma omp task depend(out:x)
  foo(x)
#pragma omp task depend(out:x)
  foo(x)

• [RaR (Read after Read)] Both tasks read a variable. Since neither task has an 'out' declaration, they can run in either order:

#pragma omp task depend(in:x)
  foo(x)
#pragma omp task depend(in:x)
  foo(x)


## 23.4 More


### 23.4.1 Scheduling points


Normally, a task stays tied to the thread that first executes it. However, at a task scheduling point the thread may switch to the execution of another task created by the same team.

• There is a scheduling point after explicit task creation. This means that, in the above examples, the thread creating the tasks can also participate in executing them.

On the other hand, a task created with the untied clause on the task pragma is never tied to one thread. This means that after suspension at a scheduling point, any thread can resume execution of the task. If you use this, beware that the value of a thread id does not stay fixed. Locks also become a problem.

Example: if a thread is waiting for a lock, the taskyield directive creates a scheduling point at which it can suspend the task and work on another task:

while (!omp_test_lock(lock))
#pragma omp taskyield
  ;


If a task involves only a small amount of work, the scheduling overhead may negate any performance gain. There are two ways of executing the task code directly:

• The if clause will only create a deferred task if the test is true; otherwise the code is executed immediately:

#pragma omp task if (n>100)
f(n)

• The if clause only affects the task itself, so it may still lead to recursively generated tasks. On the other hand, final will execute the code immediately, and any recursively created tasks are also final, so they are executed immediately too:

#pragma omp task final(level<3)

It is possible (in OpenMP version 4) to cancel tasks. This is useful when tasks are used to perform a search: the task that finds the result first can cancel any outstanding search tasks. See section 17.2 for details.

Exercise

Modify the factor finding example to use cancel .

## 23.5 Examples


### 23.5.1 Fibonacci


As an example of the use of tasks, consider computing an array of Fibonacci values: \cverbatimsnippet[examples/omp/c/taskgroup0.c]{fiboarray} If you simply turn each calculation into a task, results will be unpredictable (confirm this!) since tasks can be executed in any sequence. To solve this, we put dependencies on the tasks: \cverbatimsnippet[examples/omp/c/taskgroup2.c]{fibotaskdepend}

### 23.5.2 Binomial coefficients


Exercise

An array of binomial coefficients can be computed as follows: \cverbatimsnippet[code/omp/c/binomial1.c]{binomialarray} Put a single task group around the double loop, and use depend clauses to make the execution satisfy the proper dependencies.