HP MLIB User's Guide Vol. 1 7th Ed.

Chapter 1 Introduction to VECLIB 21
Parallel processing
endif
c$omp end parallel
call omp_set_nested(.false.)
...
With MLIB_NUMBER_OF_THREADS set to 1, the code would run two-way
parallel: one OpenMP thread for C = αAB + βC
and another for F = αDE + βF.
Setting MLIB_NUMBER_OF_THREADS to 2 would allow nested parallelism
and run the code four-way parallel.
If a parallel VECLIB subprogram is called from a parallelized loop or region,
VECLIB will automatically avoid over-subscription of the CPUs. The number of
threads spawned by each call to a parallelized VECLIB subroutine in a nested
parallel region is limited by:
• MLIB_NUMBER_OF_THREADS
• The number of threads still available in the system
It will never be larger than four. Specifically:
MIN (MLIB_NUMBER_OF_THREADS, threads still available, 4)
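The limit above can be sketched as a small helper function. This is an illustration only: the function name nthlim and its arguments are hypothetical and are not part of VECLIB, and the number of threads still available is assumed to be known to the caller.

```fortran
c     Hypothetical helper, not a VECLIB routine. It evaluates
c     MIN (MLIB_NUMBER_OF_THREADS, threads still available, 4).
      integer function nthlim(mlibnt, navail)
      implicit none
c     mlibnt: value of MLIB_NUMBER_OF_THREADS
c     navail: number of threads still available in the system
      integer mlibnt, navail
      nthlim = min(mlibnt, navail, 4)
      return
      end
```

For example, with MLIB_NUMBER_OF_THREADS set to 8 and 16 threads still available, the limit is 4; with only 2 threads still available, it is 2.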
Message passing-based nested parallelism
Nested parallelism can be achieved when calling VECLIB parallelized
subprograms from an MPI process. (See “Parallelized subprograms in VECLIB”
on page 1104.) Consider the following code:
...
      call mpi_init(ierr)
      call mpi_comm_rank(MPI_COMM_WORLD, myid, ierr)
      if (myid.eq.0) then
         call dgemm('n', 'n', m, m, m, alpha, a, lda, b, ldb, beta,
     &              c, ldc)
      else
         call dgemm('n', 'n', m, m, m, alpha, d, ldd, e, lde, beta,
     &              f, ldf)
      endif
...
Here the two DGEMM calls compute C = αAB + βC and F = αDE + βF, respectively.