User's Guide

Compiler Directives and Assertions [C]
are performed by 20 threads, the first thread executes iteration 1,
iteration 21, iteration 41, and so forth. This scheduling leads to better
load balancing for triangular loops. For example:
void interleave_example(const double X[100][100],
                        const double Y[100], double Z[100], const int N)
{
#pragma mta interleave schedule
    for (int i = 0; i < N; i++) {
        double sum = 0.0;
        for (int j = 0; j < i; j++) {
            sum += X[i][j] * Y[j];
        }
        Z[i] = sum;
    }
}
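Here, the inner loop runs i times, so the cost of an outer iteration
grows with i. A block schedule therefore results in poor load balancing:
the threads given the low-numbered iterations finish well before the
threads given the high-numbered iterations. With an interleaved
schedule, the work is much better balanced.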
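The imbalance can be estimated by counting inner-loop trips per thread. The following is a minimal sketch, not part of the compiler or runtime: the helper names block_work and interleave_work are hypothetical, and the block schedule is assumed to hand each thread one contiguous, equal-sized chunk of outer iterations.

```c
#include <assert.h>

/* Inner-loop trips executed by thread t for the triangular loop above,
   where outer iteration i costs i inner iterations.
   Both helpers are hypothetical and only model the two schedules. */

/* Assumed block schedule: thread t gets one contiguous chunk. */
long block_work(int n, int nthreads, int t)
{
    assert(nthreads > 0 && t >= 0 && t < nthreads);
    int chunk = (n + nthreads - 1) / nthreads;   /* ceiling division */
    int lo = t * chunk;
    int hi = (lo + chunk < n) ? lo + chunk : n;
    long work = 0;
    for (int i = lo; i < hi; i++)
        work += i;
    return work;
}

/* Interleaved schedule: thread t gets iterations t, t + nthreads, ... */
long interleave_work(int n, int nthreads, int t)
{
    assert(nthreads > 0 && t >= 0 && t < nthreads);
    long work = 0;
    for (int i = t; i < n; i += nthreads)
        work += i;
    return work;
}
```

Under these assumptions, with N = 100 and 20 threads, the block schedule gives the first thread 10 inner iterations and the last thread 485, while the interleaved schedule gives them 200 and 295.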
#pragma mta dynamic schedule
At execution time, threads are assigned one iteration at a time
through the use of a shared counter. After completing an assigned
iteration, each thread receives its next iteration by accessing the
counter. The number of iterations executed by each thread depends
on the execution time of the particular iterations assigned to the
thread. One thread may happen to receive all the long-running
iterations, and thus might execute fewer iterations than any other
thread. This method is preferred when the execution time for
individual iterations may vary greatly, although its overhead makes
it less desirable for general use.
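The behavior described above can be illustrated with a small sequential simulation. This is a sketch under stated assumptions, not the runtime's implementation: simulate_dynamic, MAX_THREADS, and the cost array are hypothetical, iteration costs are known in advance, and access to the counter is treated as free.

```c
#include <assert.h>

#define MAX_THREADS 64   /* arbitrary limit for this sketch */

/* Simulate a dynamic schedule: whenever a thread becomes idle, it takes
   the next iteration from a shared counter. cost[i] is the assumed run
   time of iteration i. On return, count[t] holds the number of
   iterations thread t executed. Returns the time the last thread ends. */
long simulate_dynamic(const long cost[], int n, int nthreads, int count[])
{
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    long busy_until[MAX_THREADS] = {0};
    for (int t = 0; t < nthreads; t++)
        count[t] = 0;

    int counter = 0;                 /* the shared iteration counter */
    while (counter < n) {
        /* The thread that becomes idle first reaches the counter first;
           ties go to the lowest-numbered thread. */
        int t = 0;
        for (int u = 1; u < nthreads; u++)
            if (busy_until[u] < busy_until[t])
                t = u;
        int i = counter++;           /* claim the next iteration */
        busy_until[t] += cost[i];
        count[t]++;
    }

    long finish = 0;
    for (int t = 0; t < nthreads; t++)
        if (busy_until[t] > finish)
            finish = busy_until[t];
    return finish;
}
```

For example, with two threads and eleven iterations whose costs are 10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, the simulation assigns one iteration to the first thread and ten to the second, and both finish at time 10. The thread that received the long-running iteration executed the fewest iterations, yet the total work is evenly divided.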
#pragma mta use n streams
This directive indicates that the compiler should request at least n
threads per processor for the next loop. When multiple loops are
contained in the same parallel region, the largest n is used. In the
absence of a directive, the compiler determines the number of threads
needed to saturate the processor. This directive affects the next loop
only.
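As an illustration of placement, the sketch below puts a separate request in front of each of two loops. The function name scale_then_sum and the values 40 and 100 are arbitrary choices for this example, not recommendations. Compilers that do not recognize the directive are expected to ignore it, possibly with a warning.

```c
#include <assert.h>
#include <stddef.h>

/* Scale a vector in place, then return the sum of its elements.
   Hypothetical example: each directive applies only to the loop that
   immediately follows it. */
double scale_then_sum(double *v, size_t n, double a)
{
    assert(v != NULL || n == 0);
    double total = 0.0;

#pragma mta use 40 streams
    for (size_t i = 0; i < n; i++)
        v[i] *= a;

    /* If the compiler places both loops in the same parallel region,
       the larger request (100) is the one used, per the text above. */
#pragma mta use 100 streams
    for (size_t i = 0; i < n; i++)
        total += v[i];

    return total;
}
```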