
Parallel Programming Exercise
both FORCE(I,1) and FORCE(J,1). There is no certainty that the I of one
iteration will never equal the J of another, so you cannot directly
parallelize the outer loop. The uses of FORCE look similar to sum reductions
but are not quite the same. A likely fix is to use a technique similar to
sum reduction.
In analyzing this, notice that the inner loop runs from 1 up to I-1.
Therefore, J is always less than I, so FORCE(J,*) never refers to the same
element as FORCE(I,*), and each iteration of the inner loop touches a
different FORCE(J,*). Thus the various FORCE(J,*) references would not cause
a problem if you were parallelizing the inner loop.
Further, the FORCE(I,*) references are simply sum reductions with respect
to the inner loop (see "Debugging Parallel Fortran" on page 110, Example 4,
for information on modifying this loop with a reduction transformation). It
appears you can parallelize the inner loop. This is a valuable fallback
position should you be unable to parallelize the outer loop.
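The fallback described above can be sketched as a small stand-alone program. This is an illustration under stated assumptions, not the guide's own listing: the loop body (a distance test followed by updates of FORCE(I,1) and FORCE(J,1)) is assumed from the description, the accumulator FSUM and the four-atom test data are hypothetical, and the REDUCTION clause is written as I understand the C$DOACROSS syntax, so check it against your compiler's documentation. A compiler that does not recognize the directive treats it as a comment and runs the loop serially.

```fortran
      PROGRAM INNERP
      IMPLICIT NONE
      INTEGER N
      PARAMETER (N = 4)
      DOUBLE PRECISION ATOMS(N,3), FORCE(N,3), WEIGHT(N)
      DOUBLE PRECISION DIST_SQ(3), TOTAL_DIST_SQ, THRESHOLD_SQ
      DOUBLE PRECISION FSUM
      INTEGER I, J, K
C     Hypothetical test data: four atoms on the X axis.
      DO I = 1, N
         DO K = 1, 3
            ATOMS(I,K) = 0.0D0
            FORCE(I,K) = 0.0D0
         END DO
         WEIGHT(I) = DBLE(I)
      END DO
      ATOMS(2,1) = 1.0D0
      ATOMS(3,1) = 2.0D0
      ATOMS(4,1) = 10.0D0
      THRESHOLD_SQ = 1.5D0 ** 2
C     The outer loop stays serial; only the inner loop is parallel.
      DO I = 1, N
C        FSUM collects the FORCE(I,1) contributions for this I.
         FSUM = 0.0D0
C$DOACROSS LOCAL(J,K,DIST_SQ,TOTAL_DIST_SQ), REDUCTION(FSUM)
         DO J = 1, I-1
            DO K = 1, 3
               DIST_SQ(K) = (ATOMS(I,K) - ATOMS(J,K)) ** 2
            END DO
            TOTAL_DIST_SQ = DIST_SQ(1) + DIST_SQ(2) + DIST_SQ(3)
            IF (TOTAL_DIST_SQ .LE. THRESHOLD_SQ) THEN
C              Assumed update: a sum reduction with respect to J.
               FSUM = FSUM + WEIGHT(I)
C              J < I and each J occurs once per inner loop, so no
C              two inner iterations touch the same FORCE(J,1).
               FORCE(J,1) = FORCE(J,1) + WEIGHT(J)
            END IF
         END DO
C        Fold the reduced sum back in after the inner loop ends.
         FORCE(I,1) = FORCE(I,1) + FSUM
      END DO
      DO I = 1, N
         PRINT *, I, FORCE(I,1)
      END DO
      END
```

Because the inner loop has only I-1 iterations, the parallel work per outer iteration is small for low values of I, which is one reason this remains a fallback rather than the preferred solution.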
But the idea is still to parallelize the outer loop. Perhaps sum reductions
might do the trick. However, remember round-off error: accumulating
partial sums can give different answers from the original because the
precision of computer arithmetic is limited. Depending on your requirements,
sum reduction may not be the answer. The problem seems to center around
FORCE, so try pulling those statements entirely out of the loop.
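The round-off concern can be seen in a small experiment. The sketch below is not from the original exercise; the program name, array contents, and the split into two partial sums are all hypothetical, chosen so that the small terms are lost when added one at a time to a large term but survive when they are first added to each other.

```fortran
      PROGRAM RNDOFF
      IMPLICIT NONE
      INTEGER N
      PARAMETER (N = 4)
      DOUBLE PRECISION X(N), SERIAL, PART1, PART2, PARTIAL
      INTEGER I
C     One large term followed by terms too small to register
C     individually when added to it.
      X(1) = 1.0D0
      DO I = 2, N
         X(I) = 1.0D-16
      END DO
C     Original order: a single running sum.
      SERIAL = 0.0D0
      DO I = 1, N
         SERIAL = SERIAL + X(I)
      END DO
C     Reduction order: two partial sums, as two threads might
C     form them, combined at the end.
      PART1 = 0.0D0
      DO I = 1, N/2
         PART1 = PART1 + X(I)
      END DO
      PART2 = 0.0D0
      DO I = N/2 + 1, N
         PART2 = PART2 + X(I)
      END DO
      PARTIAL = PART1 + PART2
      PRINT *, 'serial     = ', SERIAL
      PRINT *, 'partial    = ', PARTIAL
      PRINT *, 'difference = ', PARTIAL - SERIAL
      END
```

With IEEE double-precision arithmetic the two totals would be expected to differ in the last bit, although optimization settings or extended-precision registers may change what a particular compiler prints. Whether a discrepancy of that size matters depends on the application.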
Step 4: Rewrite
Rewrite the loop as follows; changes are noted in bold.
      SUBROUTINE CALC(NUM_ATOMS,ATOMS,FORCE,THRESHOLD,WEIGHT)
      IMPLICIT NONE
      INTEGER MAX_ATOMS
      PARAMETER(MAX_ATOMS = 1000)
      INTEGER NUM_ATOMS
      DOUBLE PRECISION ATOMS(MAX_ATOMS,3), FORCE(MAX_ATOMS,3)
      DOUBLE PRECISION THRESHOLD, WEIGHT(MAX_ATOMS)
      LOGICAL FLAGS(MAX_ATOMS,MAX_ATOMS)
      DOUBLE PRECISION DIST_SQ(3), TOTAL_DIST_SQ
      DOUBLE PRECISION THRESHOLD_SQ
      INTEGER I, J
      THRESHOLD_SQ = THRESHOLD ** 2
C$DOACROSS LOCAL(I,J,DIST_SQ,TOTAL_DIST_SQ)