Technical data
Work Quantum
Example 1: Loop Interchange
      DO K = 1, N
        DO I = 1, N
          DO J = 1, N
            A(I,J) = A(I,J) + B(I,K) * C(K,J)
          END DO
        END DO
      END DO
Here you have two choices: parallelize the J loop or the I loop. You cannot
parallelize the K loop because different iterations of the K loop all read
and write the same elements of A(I,J). Try to parallelize the outermost DO
loop possible, because it encloses the most work. In this example, that is the
I loop. Because neither parallelizable loop is the outermost one, use the
technique called loop interchange: reorder the loops so that one of the
parallelizable loops becomes outermost.
Thus, loop interchange would produce:
C$DOACROSS LOCAL(I, J, K)
      DO I = 1, N
        DO K = 1, N
          DO J = 1, N
            A(I,J) = A(I,J) + B(I,K) * C(K,J)
          END DO
        END DO
      END DO
Now the parallelized loop encloses more work and will show better
performance. In practice, relatively few loops can be reordered in this way.
Occasionally, however, several loops in a nest are candidates for
parallelization. In such a case, it is usually best to parallelize
the outermost one.
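One way to gain confidence that an interchange is safe is to run both loop
orders on small test data and compare the results. The program below is only
a sketch under stated assumptions: the program name INTCHG, the size N = 4,
and the fill values for B and C are hypothetical and chosen for illustration.
A compiler that does not recognize the C$DOACROSS line treats it as a comment
and runs the nest serially.

```fortran
      PROGRAM INTCHG
C     Hypothetical check: compare the original K-I-J nest with the
C     interchanged I-K-J nest on small test matrices.
      INTEGER N
      PARAMETER (N = 4)
      REAL A1(N,N), A2(N,N), B(N,N), C(N,N)
      INTEGER I, J, K
      LOGICAL SAME

C     Fill B and C with arbitrary test values; clear both results.
      DO J = 1, N
        DO I = 1, N
          B(I,J) = REAL(I + J)
          C(I,J) = REAL(I - J)
          A1(I,J) = 0.0
          A2(I,J) = 0.0
        END DO
      END DO

C     Original order: K is outermost and cannot be parallelized.
      DO K = 1, N
        DO I = 1, N
          DO J = 1, N
            A1(I,J) = A1(I,J) + B(I,K) * C(K,J)
          END DO
        END DO
      END DO

C     Interchanged order: I is outermost. Each I iteration updates
C     only row I of A2, so the iterations are independent.
C$DOACROSS LOCAL(I, J, K)
      DO I = 1, N
        DO K = 1, N
          DO J = 1, N
            A2(I,J) = A2(I,J) + B(I,K) * C(K,J)
          END DO
        END DO
      END DO

C     Compare the two results element by element.
      SAME = .TRUE.
      DO J = 1, N
        DO I = 1, N
          IF (A1(I,J) .NE. A2(I,J)) SAME = .FALSE.
        END DO
      END DO
      IF (SAME) THEN
        PRINT *, 'results match'
      ELSE
        PRINT *, 'results differ'
      END IF
      END
```

In both orders each element A(I,J) accumulates its terms in increasing K, so
the two results should agree exactly and the program should report a match.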
Occasionally, the only loop available to be parallelized has a fairly small
amount of work. In that case, it may be worthwhile to force the loop to run
without parallelism, or to select between a parallel version and a serial
version on the basis of the length of the loop.
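Selecting between a parallel and a serial version by loop length can be
sketched with the IF clause of C$DOACROSS, which runs the loop in parallel
only when its condition is true. The example below is illustrative only: the
names CONDPR and ADDSCL and the threshold of 1000 iterations are assumptions,
and a suitable threshold has to be found by measurement on the target machine.

```fortran
      PROGRAM CONDPR
C     Hypothetical driver for the conditional-parallelism sketch.
      INTEGER N
      PARAMETER (N = 8)
      REAL A(N), B(N)
      INTEGER I
      DO I = 1, N
        A(I) = 1.0
        B(I) = REAL(I)
      END DO
      CALL ADDSCL(A, B, 2.0, N)
      PRINT *, A(1), A(N)
      END

      SUBROUTINE ADDSCL(A, B, X, N)
C     Adds X*B to A. The loop runs in parallel only when N reaches
C     the assumed threshold; otherwise it runs serially.
      INTEGER N, I
      REAL A(N), B(N), X
C$DOACROSS IF (N .GE. 1000), LOCAL(I)
      DO I = 1, N
        A(I) = A(I) + X * B(I)
      END DO
      END
```

With N = 8 the condition is false, so the loop runs serially; A(1) becomes
3.0 and A(N) becomes 17.0 either way, because the IF clause changes only how
the loop is executed, not what it computes.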










