HP MLIB User's Guide Vol. 2 7th Ed.
Parallel processing
      else
        call dgemm('n', 'n', m, m, m, alpha, d, ldd, e, lde, beta,
     &             f, ldf)
      endif
c$omp end parallel
      call omp_set_nested(.false.)
...
With MLIB_NUMBER_OF_THREADS set to 1, the code would run two-way
parallel: one OpenMP thread for C = αAB + βC
and another for F = αDE + βF.
Setting MLIB_NUMBER_OF_THREADS to 2 would allow nested parallelism
and run the code four-way parallel.
If a parallel LAPACK subprogram is called from a parallelized loop or region,
LAPACK automatically avoids over-subscribing the CPUs. The number of threads
spawned by each call to a parallelized LAPACK subroutine in a nested parallel
region is limited by:
• MLIB_NUMBER_OF_THREADS
• The number of threads still available in the system
• A fixed upper limit of four threads
Specifically, each call spawns at most:
MIN (MLIB_NUMBER_OF_THREADS, threads still available, 4)
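The limit above can be sketched as a small Fortran function. This is an illustration only: the module thread_limit and the function nested_thread_limit are hypothetical names, not MLIB routines, and the figure of 8 threads still available is an assumed value chosen for the example.

```fortran
! Illustrative sketch only: nested_thread_limit is a hypothetical helper,
! not part of MLIB. It evaluates the documented bound
!   MIN (MLIB_NUMBER_OF_THREADS, threads still available, 4)
module thread_limit
  implicit none
contains
  integer function nested_thread_limit(mlib_threads, avail) result(n)
    integer, intent(in) :: mlib_threads  ! value of MLIB_NUMBER_OF_THREADS
    integer, intent(in) :: avail         ! threads still available (assumed known)
    n = min(mlib_threads, avail, 4)
  end function nested_thread_limit
end module thread_limit

program nested_limit_demo
  use thread_limit
  implicit none
  ! MLIB_NUMBER_OF_THREADS = 1: one thread per call
  print '(a,i0)', 'threads per call: ', nested_thread_limit(1, 8)
  ! MLIB_NUMBER_OF_THREADS = 2: two threads per call
  print '(a,i0)', 'threads per call: ', nested_thread_limit(2, 8)
  ! A larger setting is still capped at four
  print '(a,i0)', 'threads per call: ', nested_thread_limit(16, 8)
end program nested_limit_demo
```

With two OpenMP threads each making one call, the first two cases correspond to the two-way and four-way parallel runs described earlier; the third shows that the cap of four applies even when both other limits are larger.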
Message passing-based nested parallelism
Nested parallelism can be achieved when calling LAPACK parallelized
subprograms from an MPI process. Consider the following code:
...
      call mpi_init(ierr)
      call mpi_comm_rank(MPI_COMM_WORLD, myid, ierr)
      if (myid.eq.0) then
        call dgemm('n', 'n', m, m, m, alpha, a, lda, b, ldb,
     &             beta, c, ldc)
      else
        call dgemm('n', 'n', m, m, m, alpha, d, ldd, e, lde, beta,
     &             f, ldf)
      endif
...
Here the first DGEMM call computes C = αAB + βC and the second computes
F = αDE + βF.