HP MLIB User's Guide Vol. 2 7th Ed.

Chapter 8 Introduction to LAPACK 645
Parallel processing
• Call any parallelized LAPACK subprogram. Let it use parallelism internally
  if it determines that it is appropriate to do so, based on such factors as
  problem size, system configuration, and user environment.
• Call LAPACK subprograms in a parallelized loop or region. LAPACK
  supports nested parallelism, where the outer parallelism is implemented
  through OpenMP while the inner parallelism is implemented with LAPACK
  SMP parallelism. To use this mechanism, you must be familiar with the
  techniques of parallel processing. Refer to the Parallel Programming Guide
  for HP-UX Systems for details.
• Use the MPI explicit parallel model. Refer to the HP MPI User’s Guide and
  the MPI(1) man page for details.
LAPACK subprograms are reentrant, meaning that they can be called several
times in parallel to do independent computations without one call interfering
with another. You can use this feature to call LAPACK subprograms in a
parallelized loop or region.
The compiler does not automatically parallelize loops containing a function
reference or subroutine call. You can parallelize such a loop explicitly with
an OpenMP PARALLEL DO directive, provided that the calls made in different
iterations are independent of one another.
For example, the following Fortran code makes parallel calls to subprogram
DAXPY:
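      NTHREADS = 4
C$OMP PARALLEL DO NUM_THREADS(NTHREADS)
      DO J = I+1, N
         CALL DAXPY(N-I, A(I,J), A(I+1,I), 1, A(I+1,J), 1)
      ENDDO
C$OMP END PARALLEL DO
Each iteration updates a different column J of A and reads column I, which
no iteration writes, so the concurrent calls do not interfere.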
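The same reentrancy property lets you run a batch of independent solves in a parallelized loop. The sketch below assumes NB independent systems stored as slices of the 3-D array A and the 2-D array B; each iteration passes only its own slices, so no two concurrent calls touch the same memory. To keep the sketch buildable without a LAPACK link line, the hypothetical routine GESOLVE (Gaussian elimination without pivoting, assuming no zero pivots) stands in for DGESV; the module and routine names are illustrative only. With the real routine you would call DGESV(N, 1, A(1,1,K), N, IPIV, B(1,K), N, INFO) and make IPIV and INFO PRIVATE so that each thread has its own pivot array and status flag.

```fortran
!     Sketch: solve NB independent systems A(:,:,k) x = B(:,k) in a
!     parallel loop.  GESOLVE is a hypothetical stand-in for DGESV
!     (no pivoting), used only so the sketch builds without LAPACK.
      module batch_solve
      implicit none
      contains

      subroutine gesolve(n, a, b)
!     Forward elimination and back substitution; b is overwritten
!     with the solution.  Assumes no zero pivots are encountered.
      integer, intent(in) :: n
      double precision, intent(inout) :: a(n,n), b(n)
      integer :: i, k
      double precision :: f
      do k = 1, n - 1
         do i = k + 1, n
            f = a(i,k) / a(k,k)
            a(i,k+1:n) = a(i,k+1:n) - f * a(k,k+1:n)
            b(i) = b(i) - f * b(k)
         end do
      end do
      do i = n, 1, -1
         b(i) = (b(i) - sum(a(i,i+1:n) * b(i+1:n))) / a(i,i)
      end do
      end subroutine gesolve

      subroutine solve_batch(n, nb, a, b)
!     Iteration k reads and writes only a(:,:,k) and b(:,k), so the
!     calls are independent and may run on different threads.
      integer, intent(in) :: n, nb
      double precision, intent(inout) :: a(n,n,nb), b(n,nb)
      integer :: k
!$omp parallel do
      do k = 1, nb
         call gesolve(n, a(:,:,k), b(:,k))
      end do
!$omp end parallel do
      end subroutine solve_batch

      end module batch_solve
```

The loop index K is private by default and GESOLVE works only on its arguments and local scalars, so the threads share nothing but the read-only size N.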
While optimizing a parallel program, you may want to make parallel calls to a
LAPACK subprogram to execute independent operations whose call statements
are not in a loop. OpenMP provides the PARALLEL and END PARALLEL
directives, which define a block of code to be executed by multiple threads
in parallel.
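One way to issue such calls is the OpenMP SECTIONS construct inside a parallel region (written here in its combined PARALLEL SECTIONS form): each SECTION block runs once, on whichever thread picks it up. This is a minimal sketch under stated assumptions: the module TWO_UPDATES, the routine UPDATE_BOTH, and the stand-in AXPY are hypothetical names, and AXPY only mimics DAXPY(N, ALPHA, X, 1, Y, 1) so that the sketch builds without a BLAS library. In real code each section would call the library subprogram directly.

```fortran
!     Sketch: two independent updates issued from one parallel
!     region, one per SECTION.  AXPY is a hypothetical stand-in for
!     DAXPY(N, ALPHA, X, 1, Y, 1), so no BLAS link is needed.
      module two_updates
      implicit none
      contains

      subroutine axpy(n, alpha, x, y)
!     y <- alpha*x + y
      integer, intent(in) :: n
      double precision, intent(in) :: alpha, x(n)
      double precision, intent(inout) :: y(n)
      y = y + alpha * x
      end subroutine axpy

      subroutine update_both(n, x, y1, y2)
!     y1 and y2 are distinct arrays and x is only read, so the two
!     calls cannot interfere and may run on different threads.
      integer, intent(in) :: n
      double precision, intent(in) :: x(n)
      double precision, intent(inout) :: y1(n), y2(n)
!$omp parallel sections
!$omp section
      call axpy(n, 2d0, x, y1)
!$omp section
      call axpy(n, -1d0, x, y2)
!$omp end parallel sections
      end subroutine update_both

      end module two_updates
```

If the program is built without OpenMP support, the directives are treated as comments and the two calls simply run one after the other, giving the same result.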
OpenMP-based nested parallelism
Nested parallelism can be achieved when calling LAPACK parallelized
subprograms from an OpenMP parallel region. Consider the following code
running on an HP platform with at least four processors:
      ...
      call omp_set_nested(.true.)
c$omp parallel NUM_THREADS(2) private(myid)
      myid = omp_get_thread_num()
      if (myid .eq. 0) then
         call dgemm('n', 'n', m, m, m, alpha, a, lda, b, ldb,
     &              beta, c, ldc)