Cray XMT™ Programming Environment User’s Guide
num_streams is the number of streams the compiler requests for each
processor. For loop future parallel loops, the directive limits the
number of futures created to c. The directive is ignored for explicitly
serial loops and cannot be used on a loop that also uses the
use n streams directive. This directive is useful for managing nested
parallelism in applications that have multiple parallel loops running
concurrently, and for reducing or preventing contention for resources.
For more information on using this pragma, see Limiting Loop
Parallelism in Cray XMT Applications in the CrayDoc Knowledge
Base at http://docs.cray.com/kbase.
#pragma mta max n processors
The max n processors pragma limits the number of processors
used by a multiprocessor parallel loop. This is useful for load
balancing in applications that have multiple parallel loops running
concurrently. For more information on using this pragma, see
Limiting Loop Parallelism in Cray XMT Applications in the CrayDoc
Knowledge Base at http://docs.cray.com/kbase.
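As an illustration of where the directive is placed, the following is a
minimal sketch rather than an example taken from this guide. The function
name scale_array and the limit of 4 processors are assumptions chosen for
illustration, and the sketch assumes the compiler would otherwise run the
loop as a multiprocessor parallel loop. Compilers other than the Cray XMT
compiler ignore the unrecognized pragma and run the loop serially.

```c
#include <stddef.h>

/* Hypothetical helper: multiplies every element of data by factor. */
void scale_array(double *data, size_t count, double factor)
{
    /* Request that the loop that follows use no more than 4 processors.
       The value 4 is arbitrary and only illustrates the syntax. */
#pragma mta max 4 processors
    for (size_t i = 0; i < count; i++) {
        data[i] *= factor;
    }
}
```

The pragma appears immediately before the loop it applies to; the loop
body itself is unchanged.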
#pragma mta max n streams per processor [may merge]
This directive sets a limit of n on the number of streams per
processor that will execute a parallel loop. This limit applies to
an entire parallel region. Thus, by default, the compiler will not
combine loops with different maximum stream specifications into
the same region. This includes cases where one loop has a specified
maximum and the other loop does not. However, if you add the
optional may merge parameter, the compiler will ignore maximum
stream specifications when deciding how to construct parallel regions
(i.e., loops that would have been placed in the same region with no
max streams pragma will still be placed in the same region if max
streams pragmas with may merge are added). You can view how
parallel regions are constructed in the canal report (see the Cray
XMT Performance Tools User's Guide). For example, consider the
following two loops:
for (int i = 0; i < size_foobar; i++) {
    bar[i] = size_foobar - i;
}
for (int i = 0; i < size_foobar; i++) {
    foo[i] += bar[i] / 2;
}
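One way the pragma might be attached to these loops is sketched below.
This is an illustrative assumption, not the guide's own listing: the
wrapper function fill_and_accumulate and the limit of 64 streams are
hypothetical. Only the first loop carries a maximum, and because may merge
is given, the compiler is free to place both loops in the same parallel
region as it would have done with no max streams pragma at all.

```c
/* Hypothetical wrapper around the two loops shown above. */
void fill_and_accumulate(int *foo, int *bar, int size_foobar)
{
    /* Limit this loop to 64 streams per processor (arbitrary value).
       may merge tells the compiler to ignore this limit when deciding
       how to construct parallel regions. */
#pragma mta max 64 streams per processor may merge
    for (int i = 0; i < size_foobar; i++) {
        bar[i] = size_foobar - i;
    }
    /* No maximum is specified for this loop. */
    for (int i = 0; i < size_foobar; i++) {
        foo[i] += bar[i] / 2;
    }
}
```

Whether the loops actually end up in one region can be checked in the
canal report.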