Chapter 5: Fortran Enhancements for Multiprocessors
Example 2: Conditional Parallelism
      J = (N/4) * 4
      DO I = J+1, N
         A(I) = A(I) + X*B(I)
      END DO
      DO I = 1, J, 4
         A(I) = A(I) + X*B(I)
         A(I+1) = A(I+1) + X*B(I+1)
         A(I+2) = A(I+2) + X*B(I+2)
         A(I+3) = A(I+3) + X*B(I+3)
      END DO
Here you are using loop unrolling of order four to improve speed. The
first loop handles the N - J elements left over after the unrolled loop;
it always runs fewer than four iterations, so it does not do enough work
to justify running it in parallel. The second loop is worth parallelizing
if N is big enough. To overcome the parallel loop overhead, N needs to be
around 50.
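To make the split concrete, the following is a minimal serial sketch that runs the unrolled form next to a plain one-element-per-iteration loop and compares the results. The program name UNROLL, the size N = 10, and the initial values are assumptions chosen for illustration, not part of the original example. Because both versions perform the same arithmetic on each element, the reported difference should be zero.

```fortran
C     Hypothetical check program; N = 10 is an assumed test size.
      PROGRAM UNROLL
      INTEGER N
      PARAMETER (N = 10)
      REAL A(N), REF(N), B(N), X, DIFF
      INTEGER I, J
      X = 2.0
      DO I = 1, N
         A(I) = REAL(I)
         REF(I) = REAL(I)
         B(I) = 0.5 * REAL(I)
      END DO
C     Reference: one element per iteration.
      DO I = 1, N
         REF(I) = REF(I) + X*B(I)
      END DO
C     J is the largest multiple of 4 not exceeding N (8 here).
      J = (N/4) * 4
C     Remainder loop: N - J iterations (elements 9 and 10 here).
      DO I = J+1, N
         A(I) = A(I) + X*B(I)
      END DO
C     Unrolled loop: J/4 iterations (I = 1 and I = 5 here).
      DO I = 1, J, 4
         A(I) = A(I) + X*B(I)
         A(I+1) = A(I+1) + X*B(I+1)
         A(I+2) = A(I+2) + X*B(I+2)
         A(I+3) = A(I+3) + X*B(I+3)
      END DO
C     Largest absolute difference from the reference result.
      DIFF = 0.0
      DO I = 1, N
         DIFF = MAX(DIFF, ABS(A(I) - REF(I)))
      END DO
      PRINT *, 'J =', J, ' MAX DIFF =', DIFF
      END
```

With N = 10 the remainder loop covers two elements and the unrolled loop covers the other eight in two iterations, which shows why only the second loop can ever carry enough work to be worth running in parallel.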
An optimized version would use the IF clause on the DOACROSS directive:
      J = (N/4) * 4
      DO I = J+1, N
         A(I) = A(I) + X*B(I)
      END DO
C$DOACROSS IF (J.GE.50), LOCAL(I)
      DO I = 1, J, 4
         A(I) = A(I) + X*B(I)
         A(I+1) = A(I+1) + X*B(I+1)
         A(I+2) = A(I+2) + X*B(I+2)
         A(I+3) = A(I+3) + X*B(I+3)
      END DO
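As a rough sketch of how the conditional form might be packaged, the fragment can be wrapped in a subroutine and called with sizes on either side of the threshold. The names AXPY4 and DRIVER, the array size, and the initial values are hypothetical; the threshold of 50 is simply the approximate figure given above and would need tuning on a real machine. The directive begins with C in column 1, so a compiler that does not recognize it treats the line as a comment and runs the loop serially.

```fortran
C     Sketch only: AXPY4 is a hypothetical name, and the threshold
C     of 50 is the approximate figure quoted in the text.
      SUBROUTINE AXPY4(N, X, A, B)
      INTEGER N
      REAL X, A(N), B(N)
      INTEGER I, J
      J = (N/4) * 4
C     Remainder loop: at most three iterations, always serial.
      DO I = J+1, N
         A(I) = A(I) + X*B(I)
      END DO
C     The unrolled loop runs in parallel only when J is at least 50.
C$DOACROSS IF (J.GE.50), LOCAL(I)
      DO I = 1, J, 4
         A(I) = A(I) + X*B(I)
         A(I+1) = A(I+1) + X*B(I+1)
         A(I+2) = A(I+2) + X*B(I+2)
         A(I+3) = A(I+3) + X*B(I+3)
      END DO
      END

      PROGRAM DRIVER
      INTEGER NMAX
      PARAMETER (NMAX = 1000)
      REAL A(NMAX), B(NMAX)
      INTEGER I
      DO I = 1, NMAX
         A(I) = 1.0
         B(I) = REAL(I)
      END DO
C     N = 10 gives J = 8: the IF clause is false, so serial.
      CALL AXPY4(10, 2.0, A, B)
C     N = 1000 gives J = 1000: the IF clause is true.
      CALL AXPY4(NMAX, 2.0, A, B)
      PRINT *, A(1), A(10), A(NMAX)
      END
```

Testing J rather than N in the IF clause ties the decision to the amount of work the parallel loop actually performs, since the remainder loop has already consumed the leftover elements.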