Work Quantum

Example 1: Loop Interchange

      DO K = 1, N
         DO I = 1, N
            DO J = 1, N
               A(I,J) = A(I,J) + B(I,K) * C(K,J)
            END DO
         END DO
      END DO

Here you have two choices: parallelize the J loop or the I loop. You cannot
parallelize the K loop, because different iterations of the K loop all try to
read and write the same elements of A(I,J). Try to parallelize the outermost DO
loop possible, because it encloses the most work. In this example, that is the
I loop. Although neither parallelizable loop is outermost in the original nest,
you can reorder the loops to make one of them outermost. This technique is
called loop interchange.

Thus, loop interchange would produce:

C$DOACROSS LOCAL(I, J, K)
      DO I = 1, N
         DO K = 1, N
            DO J = 1, N
               A(I,J) = A(I,J) + B(I,K) * C(K,J)
            END DO
         END DO
      END DO

Now the parallelized loop encloses more work and should perform better,
because the cost of starting the parallel loop is paid once rather than once
per iteration of an enclosing loop. In practice, relatively few loops can be
reordered in this way. However, it does occasionally happen that several loops
in a nest are candidates for parallelization. In such a case, it is usually
best to parallelize the outermost one.
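
As a rough check that the interchange preserves the result, the sketch below runs both loop orders on the same data and compares the outputs. It is only an illustration under stated assumptions: it uses free-form Fortran and an OpenMP directive in place of C$DOACROSS, and the program name, the array names A_REF and A_NEW, and the size N = 64 are made up for this example. Without an OpenMP-enabling compiler option, the directive is treated as a comment and the loop runs serially.

```fortran
program interchange_check
  implicit none
  ! Hypothetical problem size, chosen only for illustration.
  integer, parameter :: n = 64
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: a_ref(n,n), a_new(n,n), b(n,n), c(n,n)
  integer :: i, j, k

  call random_number(b)
  call random_number(c)
  a_ref = 0.0_dp
  a_new = 0.0_dp

  ! Original order: K outermost. Every K iteration updates all of A,
  ! so the K loop cannot be run in parallel as written.
  do k = 1, n
    do i = 1, n
      do j = 1, n
        a_ref(i,j) = a_ref(i,j) + b(i,k) * c(k,j)
      end do
    end do
  end do

  ! Interchanged order: I outermost. Each I iteration touches only
  ! row I of A, so iterations are independent of one another.
  ! Assumption: OpenMP stands in for the C$DOACROSS directive.
  !$omp parallel do private(j, k)
  do i = 1, n
    do k = 1, n
      do j = 1, n
        a_new(i,j) = a_new(i,j) + b(i,k) * c(k,j)
      end do
    end do
  end do
  !$omp end parallel do

  ! Both orders sum over K in the same sequence for each A(I,J),
  ! so any difference should be zero or at rounding level.
  print *, 'max abs difference:', maxval(abs(a_ref - a_new))
end program interchange_check
```

The PRIVATE clause plays the role of LOCAL in the listing above: each thread gets its own copies of J and K, while the parallel loop index I is private automatically.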

Occasionally, the only loop available to be parallelized contains a fairly
small amount of work. In that case the overhead of running in parallel can
outweigh the gain, so it may be worthwhile to force certain loops to run
serially, or to select between a parallel version and a serial version based
on the number of iterations in the loop.
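
One way to select between serial and parallel execution at run time is sketched below. This is an illustration, not the method prescribed by this guide: it uses the OpenMP IF clause in free-form Fortran, and the routine name SCALE_VECTOR and the threshold MIN_PARALLEL_N are hypothetical. A sensible threshold depends on the machine and the loop body and has to be found by measurement.

```fortran
program conditional_parallel
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Hypothetical cutoff below which the loop is assumed too short
  ! to repay the cost of starting parallel execution.
  integer, parameter :: min_parallel_n = 10000
  real(dp) :: small(100), large(100000)

  small = 1.0_dp
  large = 1.0_dp

  call scale_vector(small, size(small), 2.0_dp)  ! below cutoff: runs serially
  call scale_vector(large, size(large), 2.0_dp)  ! above cutoff: may run in parallel

  print *, small(1), large(1)

contains

  subroutine scale_vector(x, n, s)
    integer, intent(in) :: n
    real(dp), intent(inout) :: x(n)
    real(dp), intent(in) :: s
    integer :: i

    ! The IF clause is evaluated at run time. When it is false,
    ! the loop is executed by a single thread.
    !$omp parallel do if(n > min_parallel_n)
    do i = 1, n
      x(i) = s * x(i)
    end do
    !$omp end parallel do
  end subroutine scale_vector

end program conditional_parallel
```

Keeping a single copy of the loop and letting the directive decide avoids maintaining separate serial and parallel versions of the same code.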