HP MLIB User's Guide Vol. 1 7th Ed.

Chapter 1 Introduction to VECLIB 21

Parallel processing

endif

c$omp end parallel

call omp_set_nested(.false.)

...

With MLIB_NUMBER_OF_THREADS set to 1, the code would run two-way
parallel: one OpenMP thread for C = αAB + βC and another for
F = αDE + βF. Setting MLIB_NUMBER_OF_THREADS to 2 would allow nested
parallelism and run the code four-way parallel.

If a parallel VECLIB subprogram is called from a parallelized loop or region,
VECLIB automatically avoids over-subscribing the CPUs. The number of
threads spawned by each call to a parallelized VECLIB subroutine in a nested
parallel region is limited by:

• MLIB_NUMBER_OF_THREADS

• The number of threads still available in the system

• A fixed upper bound of four threads

Specifically, the number of threads is:

MIN (MLIB_NUMBER_OF_THREADS, threads still available, 4)
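The rule above can be sketched as a small helper. This is an illustration only: the module name veclib_thread_limit, the function nested_thread_limit, and its two arguments are hypothetical and are not part of MLIB, which applies the limit internally. The caller is assumed to supply the value of MLIB_NUMBER_OF_THREADS and the count of threads still available.

```fortran
      module veclib_thread_limit
      implicit none
      contains
      ! Illustrative helper, not an MLIB routine.
      ! mlib_threads: value of MLIB_NUMBER_OF_THREADS (assumed input)
      ! avail:        threads still available in the system (assumed)
      integer function nested_thread_limit(mlib_threads, avail)
      integer, intent(in) :: mlib_threads, avail
      ! Apply the documented rule: the smallest of the three bounds.
      nested_thread_limit = min(mlib_threads, avail, 4)
      end function nested_thread_limit
      end module veclib_thread_limit
```

For example, with MLIB_NUMBER_OF_THREADS set to 8 and 3 threads still available, the helper returns 3. With 16 requested and 16 available, it returns 4, because the fixed upper bound applies.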

Message passing-based nested parallelism

Nested parallelism can be achieved when calling VECLIB parallelized

subprograms from an MPI process. (See “Parallelized subprograms in VECLIB”

on page 1104.) Consider the following code:

...

      call mpi_init(ierr)
      call mpi_comm_rank(MPI_COMM_WORLD, myid, ierr)
      if (myid.eq.0) then
         call dgemm('n', 'n', m, m, m, alpha, a, lda, b, ldb, beta,
     &              c, ldc)
      else
         call dgemm('n', 'n', m, m, m, alpha, d, ldd, e, lde, beta,
     &              f, ldf)
      endif

...

In this example, process 0 computes C = αAB + βC and every other process
computes F = αDE + βF.