User's guide

Compiler Directives and Assertions [C]

are performed by 20 threads, the first thread executes iteration 1, iteration 21, iteration 41, and so forth. This scheduling leads to better load balancing for triangular loops, in which the amount of work per iteration grows or shrinks steadily with the loop index. For example:

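void interleave_example(const double X[100][100],
                        const double Y[100], double Z[100], const int N)
{
#pragma mta interleave schedule
    for (int i = 0; i < N; i++) {
        /* Iteration i runs the inner loop i times, so the loop is
           triangular: later iterations do more work than earlier ones. */
        double sum = 0.0;
        for (int j = 0; j < i; j++) {
            sum += X[i][j] * Y[j];
        }
        Z[i] = sum;
    }
}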

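Here, a block schedule balances the load poorly: the first threads receive the short, low-numbered iterations and finish well before the last threads, which receive the long ones. With an interleaved schedule, each thread receives a mix of short and long iterations, so the work is much better balanced.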
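To make the load-balancing argument concrete, the following C sketch computes the work each thread receives for the triangular loop above, taking the cost of iteration i to be its i inner-loop trips. It assumes that a block schedule hands each thread one contiguous chunk of ceil(N / threads) iterations; the helper names `block_work`, `interleave_work`, and `print_schedules` are hypothetical, and iterations are numbered from 0 as in the example rather than from 1 as in the prose. It models only which iterations each thread gets, not how the compiler implements either schedule.

```c
#include <assert.h>
#include <stdio.h>

/* Cost of iteration i in the triangular loop: the inner loop runs i times. */
static long iteration_cost(int i)
{
    return (long)i;
}

/* Total work for thread t (0-based) under an ASSUMED block schedule:
   thread t receives one contiguous chunk of ceil(n / nthreads) iterations. */
long block_work(int n, int nthreads, int t)
{
    assert(n >= 0 && nthreads > 0 && t >= 0 && t < nthreads);
    int chunk = (n + nthreads - 1) / nthreads;
    int lo = t * chunk;
    int hi = (lo + chunk < n) ? lo + chunk : n;
    long work = 0;
    for (int i = lo; i < hi; i++)
        work += iteration_cost(i);
    return work;
}

/* Total work for thread t under an interleaved schedule:
   thread t receives iterations t, t + nthreads, t + 2 * nthreads, ... */
long interleave_work(int n, int nthreads, int t)
{
    assert(n >= 0 && nthreads > 0 && t >= 0 && t < nthreads);
    long work = 0;
    for (int i = t; i < n; i += nthreads)
        work += iteration_cost(i);
    return work;
}

/* Print the per-thread work for both schedules side by side. */
void print_schedules(int n, int nthreads)
{
    for (int t = 0; t < nthreads; t++)
        printf("thread %2d: block %5ld  interleave %5ld\n",
               t, block_work(n, nthreads, t), interleave_work(n, nthreads, t));
}
```

With N = 100 and 20 threads, the block schedule gives thread 0 ten units of work (iterations 0 to 4) and thread 19 485 units (iterations 95 to 99), whereas the interleaved schedule gives thread 0 200 units and thread 19 295 units.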

#pragma mta dynamic schedule

At execution time, threads are assigned one iteration at a time through a shared counter. After completing an assigned iteration, each thread obtains its next iteration by accessing the counter. The number of iterations executed by each thread therefore depends on the execution time of the particular iterations assigned to it: one thread may happen to receive all the long-running iterations, and thus might execute fewer iterations than any other thread. This method is preferred when the execution time of individual iterations may vary greatly, although the overhead of accessing the shared counter for every iteration makes it less desirable for general use.
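The behavior described above can be modeled without real threads. The following C sketch is a sequential simulation, not the runtime's implementation: `cost[i]` is a hypothetical execution time for iteration i, and whichever simulated thread becomes idle first takes the next value from a shared counter (`next`), standing in for the real counter access. The names `simulate_dynamic` and `MAX_THREADS` are illustrative only.

```c
#include <assert.h>

#define MAX_THREADS 64 /* hypothetical upper bound for the simulation */

/* Sequential model of dynamic scheduling. Each simulated thread is busy
   until busy_until[t]; the thread that becomes idle first takes the next
   iteration from the shared counter. Stores the number of iterations each
   thread executed in count[] and returns the time the last thread finishes. */
long simulate_dynamic(const long cost[], int n, int nthreads, int count[])
{
    assert(n >= 0 && nthreads > 0 && nthreads <= MAX_THREADS);
    long busy_until[MAX_THREADS] = {0};
    int next = 0; /* models the shared iteration counter */

    for (int t = 0; t < nthreads; t++)
        count[t] = 0;

    while (next < n) {
        /* The thread that becomes idle first (lowest index on a tie)
           is the next one to access the counter. */
        int idle = 0;
        for (int t = 1; t < nthreads; t++)
            if (busy_until[t] < busy_until[idle])
                idle = t;

        int i = next++; /* claim one iteration from the counter */
        busy_until[idle] += cost[i];
        count[idle]++;
    }

    long finish = 0;
    for (int t = 0; t < nthreads; t++)
        if (busy_until[t] > finish)
            finish = busy_until[t];
    return finish;
}
```

With costs {10, 1, 1, 1, 1, 1, 1, 1, 1, 1} and two threads, the thread that draws the long first iteration executes only that one, while the other thread executes the remaining nine; both finish by time 10.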

#pragma mta use n streams

This directive indicates that the compiler should request at least n threads per processor for the next loop. When multiple loops are contained in the same parallel region, the largest n is used. In the absence of this directive, the compiler determines the number of threads needed to saturate the processor. This directive affects the next loop only.
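The directive is placed immediately before the loop it applies to. A minimal placement sketch, assuming the directive syntax shown above with n replaced by a literal count; the function `scale_array` and the value 40 are hypothetical choices for illustration. Because C requires unrecognized pragmas to be ignored, the same source also compiles with a non-XMT compiler, where the directive simply has no effect.

```c
#include <assert.h>

/* Hypothetical example: on a Cray XMT compiler, the directive requests at
   least 40 threads (streams) per processor for the loop that immediately
   follows. Other C compilers ignore the unrecognized pragma. */
void scale_array(double a[], int n, double s)
{
    assert(n >= 0);
#pragma mta use 40 streams
    for (int i = 0; i < n; i++)
        a[i] *= s;
}
```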

S–2479–20 123