HP Fortran Programmer's Guide (B3908-90031; September 2011)
Performance and optimization
Parallelizing HP Fortran programs
In this loop, the order of execution does matter. The data used in iteration I is dependent upon the data that
was produced in the previous iteration (I-1). The array A would end up with very different data if the order
of execution were any other than 2-3-4-5. The data dependence in this loop thus makes it ineligible for
parallelization.
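The effect of execution order can be checked with a small test program. The following is only a sketch: it assumes a recurrence of the form A(I) = A(I-1) + 1 over I = 2 to 5, which may differ from the loop under discussion, and the array names A_FWD and A_REV are invented for the illustration.

```fortran
program recurrence_order
  implicit none
  integer :: a_fwd(5), a_rev(5)
  integer :: i

  a_fwd = 0
  a_rev = 0

  ! Serial order 2-3-4-5: each iteration reads the element that the
  ! previous iteration has just assigned.
  do i = 2, 5
     a_fwd(i) = a_fwd(i-1) + 1
  end do

  ! Reverse order 5-4-3-2: each iteration reads an element that has
  ! not yet been updated, so the recurrence is broken.
  do i = 5, 2, -1
     a_rev(i) = a_rev(i-1) + 1
  end do

  ! a_fwd ends as 0, 1, 2, 3, 4; a_rev ends as 0, 1, 1, 1, 1
  print *, 'forward:', a_fwd
  print *, 'reverse:', a_rev
end program recurrence_order
```

A parallel run gives no guarantee about which iteration executes first, so any order, including the reverse order shown here, has to be assumed possible.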
Not all data dependences inhibit parallelization. The following paragraphs discuss some of the exceptions.
Nested loops and matrices
Some nested loops that operate on matrices may have a data dependence in the inner loop only, allowing the
outer loop to be parallelized. Consider the following:
   DO I = 1, 10
      DO J = 2, 100
         A(J,I) = A(J-1,I) + 1
      END DO
   END DO
The data dependence in this nested loop occurs in the inner (J) loop: each assignment to row J of column I
reads row J-1 of the same column, which was assigned in the previous iteration. If the iterations of the J loop
were to execute in any order other than the one in which they would execute on a single processor, the
matrix would be assigned different values. The inner loop, therefore, must not be parallelized.
No such data dependence appears in the outer loop, however: each column is processed independently of every
other column. Consequently, the compiler can safely assign entire columns of the matrix to different
processors; the values assigned will be the same regardless of the order in which the columns are
processed, so long as the rows within each column are processed in serial order.
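This claim can be checked serially by running the same nested loop with the columns in a different order, and then with the rows in a different order. The program below is a minimal sketch: the array names, the initial values, and the use of reversed loops to stand in for an arbitrary parallel schedule are assumptions made for the illustration.

```fortran
program column_order
  implicit none
  integer, parameter :: nrows = 100, ncols = 10
  integer :: a_fwd(nrows, ncols), a_rev(nrows, ncols), a_bad(nrows, ncols)
  integer :: i, j

  ! Give row 1 of each column a distinct starting value.
  a_fwd = 0
  do i = 1, ncols
     a_fwd(1, i) = 10 * i
  end do
  a_rev = a_fwd
  a_bad = a_fwd

  ! Reference: columns and rows both in serial order.
  do i = 1, ncols
     do j = 2, nrows
        a_fwd(j, i) = a_fwd(j-1, i) + 1
     end do
  end do

  ! Columns in reverse order, rows still in serial order.
  do i = ncols, 1, -1
     do j = 2, nrows
        a_rev(j, i) = a_rev(j-1, i) + 1
     end do
  end do

  ! Columns in serial order, rows in reverse order.
  do i = 1, ncols
     do j = nrows, 2, -1
        a_bad(j, i) = a_bad(j-1, i) + 1
     end do
  end do

  print *, 'same result with columns reordered:', all(a_fwd == a_rev)  ! T
  print *, 'same result with rows reordered:   ', all(a_fwd == a_bad)  ! F
end program column_order
```

Reordering the columns leaves the matrix unchanged, while reordering the rows does not. That is the property that makes the outer loop safe to parallelize and the inner loop unsafe.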
Assumed dependences
When analyzing a loop, the compiler may take the conservative approach: if it cannot prove that an apparent
data dependence is harmless, it treats the dependence as real and does not parallelize the loop. Consider the
following:
   DO I = 101, 200
      A(I) = A(I-K)
   END DO
The compiler will assume that a data dependence exists in this loop because it appears that data that has been
defined in a previous iteration is being used in a later iteration. On this assumption, the compiler will not
parallelize the loop.
However, if the value of K is 100, the dependence is assumed rather than real: the loop reads only A(1)
through A(100) and assigns only A(101) through A(200), so every value it reads was defined outside the loop.
If this is in fact the case, the programmer can insert one of the following directives immediately before the
loop, forcing the compiler to ignore any assumed dependences when analyzing the loop for parallelization:
• DIR$ IVDEP
• FPP$ NODEPCHK
• VD$ NODEPCHK
For more information about these directives, see “Compatibility directives” on page 215.
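The sketch below shows where such a directive would be placed, and checks that the loop really is independent of execution order when K is 100. It assumes free source form with the directive written behind a leading ! comment character; the exact spelling HP Fortran accepts should be confirmed against the compatibility directives section. The array names and the reversed loop used as a stand-in for a parallel schedule are also invented for the illustration.

```fortran
program assumed_dependence
  implicit none
  integer :: a_fwd(200), a_rev(200)
  integer :: i, k

  k = 100
  do i = 1, 200
     a_fwd(i) = i
  end do
  a_rev = a_fwd

  ! With k = 100 the loop reads a(1:100) and assigns a(101:200), so the
  ! apparent dependence is not real. The directive line below is an
  ! assumed spelling; a compiler that does not recognize it treats it
  ! as an ordinary comment.
!DIR$ IVDEP
  do i = 101, 200
     a_fwd(i) = a_fwd(i-k)
  end do

  ! Same loop in reverse order, standing in for a different schedule.
  do i = 200, 101, -1
     a_rev(i) = a_rev(i-k)
  end do

  print *, 'same result in either order:', all(a_fwd == a_rev)  ! T
end program assumed_dependence
```

The directive is a promise from the programmer, not a check by the compiler. If K were 1, for example, the dependence would be real, and asserting otherwise could produce wrong results in a parallel run.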