Technical data
#pragma Directives [3]
by splitting an inner loop into a set of smaller loops, each of which allocates no
more than six stream buffers, thus avoiding stream buffer thrashing. The stream
buffer feature reduces memory latency and increases memory bandwidth by
prefetching for long, small-strided sequences of memory references.
The split directive has the following format:
#pragma _CRI split
The split directive merely asserts that the loop can profit from splitting. It will
not cause the compiler to generate incorrect code.
The compiler splits the loop only if it is safe to do so. Generally, a loop is safe
to split under the same conditions under which it is vectorizable. The compiler
splits only inner loops, and it may decline to split some loops that contain
conditional code.
The split directive also causes the original loop to be stripmined. This is done
to increase the potential for cache hits between the resultant smaller loops.
Loop splitting can reduce the execution time of a loop by as much as 40%.
Candidates for loop splitting can have trip counts as low as 40. They must also
contain more than six different memory references with strides less than 16.
Note that there is a slight potential for increasing the execution time of certain
loops. Loop splitting also increases compile time, especially when loop unrolling
is also enabled.
For example:
#pragma _CRI split
for (i = 0; i < 1000; i++) {
    a[i] = b[i] * c[i];
    t = d[i] + a[i];
    e[i] = f[i] + t * g[i];
    h[i] += e[i];
}
First, the compiler generates the following loop (notice the expansion of the
scalar temporary t into the compiler temporary array ta):
S–2179–36 95










