HP Fortran Programmer's Guide (March 2010)
Performance and optimization
Parallelizing HP Fortran programs
Chapter 6
In this loop, the order of execution does matter. The data used in iteration I depends on
the data that was produced in the previous iteration (I-1). The array A would end up with
different data if the iterations executed in any order other than 2-3-4-5. The data dependence
in this loop thus makes it ineligible for parallelization.
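The effect of such a dependence can be checked directly. The following is a minimal sketch, assuming a loop of the form A(I) = A(I-1) + 1 over I = 2 to 5; the loop body, the initial values, and the names A_FWD and A_REV are illustrative assumptions, not code from this guide. The reversed loop stands in for one of the many orders a parallel schedule might produce.

```fortran
PROGRAM LOOP_CARRIED
  IMPLICIT NONE
  INTEGER :: A_FWD(5), A_REV(5), I

  ! Same starting data for both runs (assumed values).
  A_FWD = (/ 1, 0, 0, 0, 0 /)
  A_REV = A_FWD

  ! Serial order 2-3-4-5: each iteration reads the element
  ! written by the iteration before it.
  DO I = 2, 5
    A_FWD(I) = A_FWD(I-1) + 1
  END DO

  ! Order 5-4-3-2: each iteration reads an element that has
  ! not yet been updated.
  DO I = 5, 2, -1
    A_REV(I) = A_REV(I-1) + 1
  END DO

  PRINT '(A,5I3)', 'order 2-3-4-5:', A_FWD
  PRINT '(A,5I3)', 'order 5-4-3-2:', A_REV
END PROGRAM LOOP_CARRIED
```

With these starting values, the serial order leaves 1 2 3 4 5 in the array, while the reversed order leaves 1 2 1 1 1.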
Not all data dependences inhibit parallelization. The following paragraphs discuss some of
the exceptions.
Nested loops and matrices
Some nested loops that operate on matrices may have a data
dependence in the inner loop only, allowing the outer loop to be parallelized. Consider the
following:
      DO I = 1, 10
        DO J = 2, 100
          A(J,I) = A(J-1,I) + 1
        END DO
      END DO
The data dependence in this nested loop occurs in the inner (J) loop: each element A(J,I)
depends on the element in the preceding row of the same column, A(J-1,I), which is assigned
in the previous iteration of the J loop. If the iterations of the J loop were to execute in
any order other than the one in which they would execute on a single processor, the matrix
would be assigned different values. The inner loop, therefore, must not be parallelized.
But no such data dependence appears in the outer loop: each column access is independent of
every other column access. Consequently, the compiler can safely distribute the processing of
entire columns of the matrix across different processors; the data assignments will be the
same regardless of the order in which the columns are processed, so long as the rows within
each column are processed in serial order.
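This independence of the columns can be checked with a small program. The following is a sketch only: the second array B, the zero initial values, and the descending column order (which stands in for an arbitrary assignment of columns to processors) are assumptions introduced for illustration.

```fortran
PROGRAM OUTER_PARALLEL
  IMPLICIT NONE
  INTEGER :: A(100,10), B(100,10), I, J

  ! Identical starting data for both runs (assumed values).
  A = 0
  B = 0

  ! Columns in ascending order, as on a single processor.
  DO I = 1, 10
    DO J = 2, 100
      A(J,I) = A(J-1,I) + 1
    END DO
  END DO

  ! Columns in descending order; the rows within each column
  ! are still processed serially.
  DO I = 10, 1, -1
    DO J = 2, 100
      B(J,I) = B(J-1,I) + 1
    END DO
  END DO

  IF (ALL(A == B)) THEN
    PRINT '(A)', 'column order does not affect the result'
  ELSE
    PRINT '(A)', 'column order changed the result'
  END IF
END PROGRAM OUTER_PARALLEL
```

Reordering the I loop leaves the result unchanged because no iteration of the outer loop reads an element that another outer iteration writes.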
Assumed dependences
When analyzing a loop, the compiler may err on the safe side and
assume that what looks like a data dependence really is one and so not parallelize the loop.
Consider the following:
      DO I = 101, 200
        A(I) = A(I-K)
      END DO
The compiler will assume that a data dependence exists in this loop because it appears that
data that has been defined in a previous iteration is being used in a later iteration. On this
assumption, the compiler will not parallelize the loop.
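The compiler's caution is reasonable because whether the dependence is real depends on the value of K, which is typically not known until run time. The following sketch uses a hypothetical helper, ORDER_INDEPENDENT, that runs the loop in ascending and descending order for a given K and compares the results; the helper, its name, and the initial values are assumptions made for illustration. Comparing two orders illustrates the point but does not prove independence in general.

```fortran
PROGRAM ASSUMED_DEP
  IMPLICIT NONE
  LOGICAL :: SAME_1, SAME_100

  SAME_1   = ORDER_INDEPENDENT(1)
  SAME_100 = ORDER_INDEPENDENT(100)

  PRINT '(A,L2)', 'K =   1, same result in both orders:', SAME_1
  PRINT '(A,L2)', 'K = 100, same result in both orders:', SAME_100

CONTAINS

  ! Hypothetical helper: .TRUE. if ascending and descending
  ! execution of the loop leave the array identical.
  ! Assumes 1 <= K <= 100 so that I-K stays within bounds.
  LOGICAL FUNCTION ORDER_INDEPENDENT(K)
    INTEGER, INTENT(IN) :: K
    INTEGER :: FWD(200), REV(200), I

    ! Distinct starting values make any difference visible.
    FWD = (/ (I, I = 1, 200) /)
    REV = FWD

    DO I = 101, 200
      FWD(I) = FWD(I-K)
    END DO

    DO I = 200, 101, -1
      REV(I) = REV(I-K)
    END DO

    ORDER_INDEPENDENT = ALL(FWD == REV)
  END FUNCTION ORDER_INDEPENDENT
END PROGRAM ASSUMED_DEP
```

For K = 1 the two orders disagree, because each iteration reads an element written by another iteration. For K = 100 they agree, because the loop reads only A(1) through A(100) and writes only A(101) through A(200).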
However, if the value of K is 100, the dependence is assumed rather than real: A(I-K) then
refers to elements A(1) through A(100), which are defined before the loop and are never
assigned within it. If this is in fact the case, the programmer can insert one of the
following directives immediately before the loop, forcing the compiler to ignore any assumed
dependences when analyzing the loop for parallelization:
• DIR$ IVDEP