Technical data

#pragma Directives [3]

by splitting an inner loop into a set of smaller loops, each of which allocates no more than six stream buffers, thus avoiding stream buffer thrashing. The stream buffer feature reduces memory latency and increases memory bandwidth by prefetching for long, small-strided sequences of memory references.

The split directive has the following format:

#pragma _CRI split

The split directive merely asserts that the loop can profit by splitting. It will not cause incorrect code.

The compiler splits the loop only if it is safe. Generally, a loop is safe to split under the same conditions that a loop is vectorizable. The compiler only splits inner loops. The compiler may not split some loops with conditional code.

The split directive also causes the original loop to be stripmined. This is done to increase the potential for cache hits between the resultant smaller loops.
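
The combined effect of splitting and stripmining can be sketched by hand in C. The following is an illustration only, not the code the compiler generates: the function names, the array names, and the strip length of 256 (STRIP) are assumptions made for this sketch. The fused loop references eight arrays; the split form processes one strip at a time with two smaller loops that reference five arrays each, and the scalar temporary is expanded into a strip-sized array so that it can carry values from the first loop to the second.

```c
#include <assert.h>
#include <stddef.h>

#define STRIP 256   /* assumed strip length, chosen only for this sketch */

/* Original form: a single loop that references eight arrays. */
void fused_loop(size_t n, double *x, const double *p, const double *q,
                const double *r, double *y, const double *s,
                const double *u, double *z)
{
    for (size_t i = 0; i < n; i++) {
        double t;
        x[i] = p[i] + q[i];
        t = x[i] * r[i];
        y[i] = t + s[i] * u[i];
        z[i] += y[i];
    }
}

/* Hand-split, stripmined form: each inner loop references five arrays. */
void split_loop(size_t n, double *x, const double *p, const double *q,
                const double *r, double *y, const double *s,
                const double *u, double *z)
{
    double tmp[STRIP];  /* scalar t expanded into a strip-sized temporary */

    for (size_t base = 0; base < n; base += STRIP) {
        size_t len = (n - base < STRIP) ? n - base : STRIP;

        /* First smaller loop: x, p, q, r, tmp */
        for (size_t j = 0; j < len; j++) {
            x[base + j] = p[base + j] + q[base + j];
            tmp[j] = x[base + j] * r[base + j];
        }
        /* Second smaller loop: tmp, s, u, y, z */
        for (size_t j = 0; j < len; j++) {
            y[base + j] = tmp[j] + s[base + j] * u[base + j];
            z[base + j] += y[base + j];
        }
    }
}

/* Returns 1 if both forms produce identical results on small integer data. */
int split_matches_fused(void)
{
    enum { N = 1000 };
    static double p[N], q[N], r[N], s[N], u[N];
    static double x1[N], y1[N], z1[N], x2[N], y2[N], z2[N];

    assert(N > STRIP);  /* make sure more than one strip is exercised */

    for (size_t i = 0; i < N; i++) {
        /* Small integer values keep every result exact in double. */
        p[i] = (double)(i % 7);
        q[i] = (double)(i % 5);
        r[i] = (double)(i % 3);
        s[i] = (double)(i % 11);
        u[i] = (double)(i % 13);
        x1[i] = x2[i] = 0.0;
        y1[i] = y2[i] = 0.0;
        z1[i] = z2[i] = 1.0;
    }

    fused_loop(N, x1, p, q, r, y1, s, u, z1);
    split_loop(N, x2, p, q, r, y2, s, u, z2);

    for (size_t i = 0; i < N; i++) {
        if (x1[i] != x2[i] || y1[i] != y2[i] || z1[i] != z2[i])
            return 0;
    }
    return 1;
}
```

Because each strip is short, the elements written by the first smaller loop are likely to still be in cache when the second smaller loop reads them, which is the stated purpose of stripmining. The sketch assumes that the arrays do not overlap.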

Loop splitting can reduce the execution time of a loop by as much as 40%. Candidates for loop splitting can have trip counts as low as 40. They must also contain more than six different memory references with strides less than 16.

Note that there is a slight potential for increasing the execution time of certain loops. Loop splitting also increases compile time, especially when loop unrolling is also enabled.

For example:

#pragma _CRI split
for (i = 0; i < 1000; i++) {
    a[i] = b[i] * c[i];
    t = d[i] + a[i];
    e[i] = f[i] + t * g[i];
    h[i] += e[i];
}

First, the compiler generates the following loop (notice the expansion of the scalar temporary t into the compiler temporary array ta):

S–2179–36