HP Fortran Programmer Guide (766160-001, March 2014)
NOTE: A subroutine (but not a function) is always expected to have side effects. If you apply this
directive to a subroutine call, the optimizer assumes that the call has no effect on program results
and can eliminate the call to improve performance.
Indeterminate iteration counts
If the compiler finds that a loop's iteration count cannot be determined at runtime before the
loop starts to execute, the compiler will not parallelize the loop. The reason for this precaution
is that the runtime code must know the iteration count in order to determine how many iterations
to distribute to the executing processors.
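As a rough illustration of why the count must be known up front, the following sketch computes a loop's iteration count from its bounds and splits it into contiguous chunks. The count formula is the one the Fortran standard defines for a DO loop. The block distribution, the processor count NPROC, and all variable names are assumptions made for this example (with a positive step); they are not a description of how the HP runtime actually schedules iterations.

```fortran
PROGRAM TRIP_COUNT_SKETCH
  IMPLICIT NONE
  INTEGER :: FIRST, LAST, STEP, NTRIPS
  INTEGER :: NPROC, P, CHUNK, LO, HI

  FIRST = 1
  LAST  = 100
  STEP  = 1          ! assumed positive in this sketch
  NPROC = 4          ! hypothetical number of processors

  ! Iteration count of DO I = FIRST, LAST, STEP, known before the loop starts.
  NTRIPS = MAX((LAST - FIRST + STEP) / STEP, 0)

  ! Hypothetical block distribution: contiguous chunks, rounded up.
  CHUNK = (NTRIPS + NPROC - 1) / NPROC

  DO P = 0, NPROC - 1
    LO = FIRST + P * CHUNK * STEP
    HI = MIN(FIRST + ((P + 1) * CHUNK - 1) * STEP, LAST)
    PRINT *, 'processor', P, 'would run I =', LO, 'to', HI
  END DO
END PROGRAM TRIP_COUNT_SKETCH
```

Without a value for NTRIPS there is nothing to divide, which is why a loop whose count only becomes known while it runs cannot be split this way.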
The following conditions can prevent a runtime count:
• The loop is a DO-forever construct.
• An EXIT statement appears in the loop.
• The loop contains a conditional GO TO statement that exits from the loop.
• The loop modifies either the loop-control or loop-limit variable.
• The loop is a DO WHILE construct and the condition being tested is defined within the loop.
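The contrast between a countable loop and one of the conditions listed above can be sketched as follows. The program name, array contents, and variable names are invented for this example; whether a given compiler version parallelizes the first loop also depends on the other requirements described in this chapter.

```fortran
PROGRAM ITERATION_COUNTS
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 8
  REAL :: A(N)
  INTEGER :: I

  A = (/ 1.0, 2.0, 3.0, -1.0, 5.0, 6.0, 7.0, 8.0 /)

  ! Countable: the iteration count (N) is fixed before the loop starts.
  DO I = 1, N
    A(I) = A(I) * 2.0
  END DO

  ! Not countable: the EXIT depends on data examined inside the loop,
  ! so the number of iterations is unknown until the loop has run.
  DO I = 1, N
    IF (A(I) < 0.0) EXIT
    A(I) = A(I) + 1.0
  END DO

  PRINT *, 'second loop stopped at I =', I
END PROGRAM ITERATION_COUNTS
```

With the data shown, the second loop stops at the fourth element, but that fact is only discoverable by executing the earlier iterations in order.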
Data dependences
When a loop is parallelized, the iterations are executed independently on different processors,
and the order of execution will differ from the serial order when executing on a single processor.
This difference is not a problem if the iterations can occur in any order with no effect on the results.
Consider the following loop:
DO I = 1, 5
  A(I) = A(I) * B(I)
END DO
In this example, the array A will always end up with the same data regardless of whether the order
of execution is 1-2-3-4-5, 5-4-3-2-1, 3-1-4-5-2, or any other order. The independence of each
iteration from the others makes the loop an eligible candidate for parallel execution.
Such is not the case in the following:
DO I = 2, 5
  A(I) = A(I-1) * B(I)
END DO
In this loop, the order of execution does matter. The data used in iteration I is dependent upon the
data that was produced in the previous iteration (I-1). The array A would end up with very
different data if the order of execution were any other than 2-3-4-5. The data dependence in this
loop thus makes it ineligible for parallelization.
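One way to see the difference between the two loops is to run each in serial order and in reversed order and compare the results. The sketch below does this in ordinary serial Fortran; the array names and initial values are chosen for illustration only, and reversing the loop merely stands in for the reordering that parallel execution can cause.

```fortran
PROGRAM ORDER_DEPENDENCE
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 5
  REAL :: B(N), A_FWD(N), A_REV(N), C_FWD(N), C_REV(N)
  INTEGER :: I

  B = 2.0
  A_FWD = 1.0
  A_REV = 1.0
  C_FWD = 1.0
  C_REV = 1.0

  ! Independent iterations: each element depends only on itself.
  DO I = 1, N
    A_FWD(I) = A_FWD(I) * B(I)
  END DO
  DO I = N, 1, -1
    A_REV(I) = A_REV(I) * B(I)
  END DO

  ! Dependent iterations: element I needs the new value of element I-1.
  DO I = 2, N
    C_FWD(I) = C_FWD(I-1) * B(I)
  END DO
  DO I = N, 2, -1
    C_REV(I) = C_REV(I-1) * B(I)
  END DO

  PRINT *, 'independent loop, same result in both orders: ', ALL(A_FWD == A_REV)
  PRINT *, 'dependent loop, same result in both orders:   ', ALL(C_FWD == C_REV)
END PROGRAM ORDER_DEPENDENCE
```

The first comparison is true and the second is false: in reversed order the dependent loop reads A(I-1) before that element has been updated.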
Not all data dependences inhibit parallelization. The following paragraphs discuss some of the
exceptions.
Nested loops and matrices
Some nested loops that operate on matrices may have a data
dependence in the inner loop only, allowing the outer loop to be parallelized. Consider the
following:
DO I = 1, 10
  DO J = 2, 100
    A(J,I) = A(J-1,I) + 1
  END DO
END DO
The data dependence in this nested loop occurs in the inner (J) loop: each row access of
A(J,I) depends upon the preceding row (J-1) having been assigned in the previous iteration.
If the iterations of the J loop were to execute in any other order than the one in which they would
execute on a single processor, the matrix would be assigned different values. The inner loop,
therefore, must not be parallelized.
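The claim that only the inner loop carries the dependence can be checked with a small serial experiment: reorder the outer (I) loop, then the inner (J) loop, and compare each result with the original order. This is a sketch for illustration; the array names and the zero initial values are assumptions, and reversing a loop is only a stand-in for the reordering that parallel execution may introduce.

```fortran
PROGRAM NESTED_DEPENDENCE
  IMPLICIT NONE
  INTEGER, PARAMETER :: NROW = 100, NCOL = 10
  REAL :: SERIAL(NROW,NCOL), OUTER_REV(NROW,NCOL), INNER_REV(NROW,NCOL)
  INTEGER :: I, J

  SERIAL    = 0.0
  OUTER_REV = 0.0
  INNER_REV = 0.0

  ! Reference: the loop nest in its original serial order.
  DO I = 1, NCOL
    DO J = 2, NROW
      SERIAL(J,I) = SERIAL(J-1,I) + 1
    END DO
  END DO

  ! Outer loop reordered: each column is still filled top to bottom.
  DO I = NCOL, 1, -1
    DO J = 2, NROW
      OUTER_REV(J,I) = OUTER_REV(J-1,I) + 1
    END DO
  END DO

  ! Inner loop reordered: row J is computed before row J-1 is ready.
  DO I = 1, NCOL
    DO J = NROW, 2, -1
      INNER_REV(J,I) = INNER_REV(J-1,I) + 1
    END DO
  END DO

  PRINT *, 'outer loop reordered, same result: ', ALL(SERIAL == OUTER_REV)
  PRINT *, 'inner loop reordered, same result: ', ALL(SERIAL == INNER_REV)
END PROGRAM NESTED_DEPENDENCE
```

The first comparison is true because no column depends on any other column; the second is false because each row within a column depends on the row above it.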