Technical data
Breaking Data Dependencies
      DO I = 1, NUM_THREADS
         SUM = SUM + PARTIAL_SUM(I)
      END DO
The outer K loop can be run in parallel. In this method, each partial sum is
accumulated over a contiguous piece of the array, resulting in good cache
utilization and performance.
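The manual partial-sum transformation can be sketched in Fortran 90 free form as follows. This is a minimal illustration, not the guide's own code: it assumes OpenMP as a stand-in for the DOACROSS directive, and the module name `partial_sums`, the function `blocked_sum`, and its `nblocks` argument are hypothetical. The local array `partial` plays the role of the extra dimension, with one slot per block.

```fortran
! Hypothetical module: partial_sums, blocked_sum and nblocks are illustrative
! names. Each block K sums one contiguous chunk of A into PARTIAL(K); the
! partial results are then combined serially.
module partial_sums
  implicit none
contains
  function blocked_sum(a, nblocks) result(total)
    real, intent(in)    :: a(:)
    integer, intent(in) :: nblocks       ! assumed >= 1
    real                :: total
    real    :: partial(nblocks)          ! the "extra dimension"
    integer :: k, i, lo, hi, n, chunk

    n = size(a)
    chunk = (n + nblocks - 1) / nblocks  ! ceiling(n / nblocks)

    ! Assumption: OpenMP stands in for C$DOACROSS. Compiled without OpenMP,
    ! the directive is treated as a comment and the loop runs serially.
    !$omp parallel do private(i, lo, hi)
    do k = 1, nblocks
      lo = (k - 1) * chunk + 1
      hi = min(k * chunk, n)
      partial(k) = 0.0
      do i = lo, hi                      ! contiguous slice of A
        partial(k) = partial(k) + a(i)
      end do
    end do
    !$omp end parallel do

    ! Combine the partial results, like the DO I = 1, NUM_THREADS loop.
    total = 0.0
    do k = 1, nblocks
      total = total + partial(k)
    end do
  end function blocked_sum
end module partial_sums
```

Because the partial sums are combined in block order after the parallel loop, the result depends only on `nblocks`, not on how many threads happen to execute the blocks.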
This is an important and common transformation, and so automatic support
is provided by the REDUCTION clause:
      SUM = 0.0
C$DOACROSS LOCAL (I), REDUCTION (SUM)
      DO 10 I = 1, N
         SUM = SUM + A(I)
   10 CONTINUE
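For readers on compilers without `C$DOACROSS`, a sketch of the same sum reduction using OpenMP's `reduction(+:...)` clause, which is assumed here as a portable substitute; the module and function names are hypothetical.

```fortran
! Assumption: OpenMP's REDUCTION clause stands in for
! C$DOACROSS ... REDUCTION (SUM). Names are illustrative only.
module omp_reduction_demo
  implicit none
contains
  function reduce_sum(a) result(s)
    real, intent(in) :: a(:)
    real    :: s
    real    :: acc
    integer :: i
    acc = 0.0
    ! Each thread gets a private copy of ACC initialized to 0.0; the
    ! private copies are added into ACC when the loop finishes.
    !$omp parallel do reduction(+:acc)
    do i = 1, size(a)
      acc = acc + a(i)
    end do
    !$omp end parallel do
    s = acc
  end function reduce_sum
end module omp_reduction_demo
```

OpenMP does not specify the order in which the private copies are combined, so, as with the DOACROSS version, the floating-point result may vary with the number of threads.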
This has essentially the same meaning as the much longer and more
confusing code above. It is an important example to study because the idea
of adding an extra dimension to an array to permit parallel computation,
and then combining the partial results, is a key technique for breaking
data dependencies. This idea recurs in many contexts and disguises.
Note that reduction transformations such as this are not strictly equivalent
to the original loop. Because floating-point arithmetic has limited precision,
summing the values in a different order, as was done here, accumulates
round-off error differently, so the final answer is likely to differ slightly
from that of the original loop. Most of the time the difference is
irrelevant, but it can be significant, so some caution is in order.
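A small illustration of the order dependence, assuming default `REAL` is IEEE single precision (24-bit significand). Near 1.0e8, adjacent single-precision values are 8 apart, so adding 1.0 to 1.0e8 rounds straight back to 1.0e8. The module and function names below are hypothetical.

```fortran
! Illustrative sketch: the same values summed serially and as two partial
! sums. Assumes default REAL is IEEE single precision.
module order_demo
  implicit none
contains
  function serial_sum(a) result(s)
    real, intent(in) :: a(:)
    real    :: s
    integer :: i
    s = 0.0
    do i = 1, size(a)
      s = s + a(i)            ! left-to-right, like the original loop
    end do
  end function serial_sum

  function two_part_sum(a) result(s)
    real, intent(in) :: a(:)
    real    :: s
    integer :: half
    half = size(a) / 2
    ! Two partial sums, combined at the end (the reduction's order).
    s = serial_sum(a(1:half)) + serial_sum(a(half+1:))
  end function two_part_sum
end module order_demo
```

With `a(1) = 1.0e8` and `a(2:16) = 1.0`, `serial_sum` returns 1.0e8 because every added 1.0 is rounded away, while `two_part_sum` returns 100000008.0: the eight ones in the second half sum exactly to 8.0 before meeting the large value.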
This example is a sum reduction because the operator is plus (+). The Fortran
compiler supports three other types of reduction operations:
1. product: p = p*a(i)
2. min: m = min(m,a(i))
3. max: m = max(m,a(i))
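The product, min, and max reductions can be sketched with OpenMP's `*`, `min`, and `max` reduction operators, assumed here as stand-ins for the SGI forms listed above; the module and subroutine names are hypothetical.

```fortran
! Assumption: OpenMP reduction operators stand in for the product, min and
! max reductions. Names are illustrative only.
module more_reductions
  implicit none
contains
  subroutine reduce_all(a, p, mn, mx)
    real, intent(in)  :: a(:)
    real, intent(out) :: p, mn, mx
    real    :: lp, lmn, lmx
    integer :: i
    lp  = 1.0            ! identity for product
    lmn = huge(1.0)      ! identity for min
    lmx = -huge(1.0)     ! identity for max
    !$omp parallel do reduction(*:lp) reduction(min:lmn) reduction(max:lmx)
    do i = 1, size(a)
      lp  = lp * a(i)
      lmn = min(lmn, a(i))
      lmx = max(lmx, a(i))
    end do
    !$omp end parallel do
    p = lp; mn = lmn; mx = lmx
  end subroutine reduce_all
end module more_reductions
```

Min and max are insensitive to evaluation order, but a product, like a sum, can round differently when its factors are regrouped.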










