HP MLIB User's Guide Vol. 2 7th Ed.


Parallel processing

      else
         call dgemm('n', 'n', m, m, m, alpha, d, ldd, e, lde, beta,
     &              f, ldf)
      endif
c$omp end parallel
      call omp_set_nested(.false.)
      ...

With MLIB_NUMBER_OF_THREADS set to 1, the code runs two-way parallel: one OpenMP thread computes C = αAB + βC and the other computes F = αDE + βF. Setting MLIB_NUMBER_OF_THREADS to 2 allows nested parallelism, and the code runs four-way parallel.

If a parallel LAPACK subprogram is called from a parallelized loop or region, LAPACK automatically avoids over-subscription of the CPUs. The number of threads spawned by each call to a parallelized LAPACK subroutine in a nested parallel region is limited by:

• MLIB_NUMBER_OF_THREADS

• The number of threads still available in the system

• A fixed upper bound of four threads

Specifically:

MIN (MLIB_NUMBER_OF_THREADS, threads still available, 4)
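As a rough illustration of the limit above, the following free-form Fortran sketch evaluates the MIN expression for a few sample inputs. The module name nested_limit, the function name nested_thread_limit, and the sample values are assumptions made for illustration only. They are not part of MLIB, which applies this limit internally.

```fortran
! Hypothetical helper, not an MLIB routine: evaluates
! MIN (MLIB_NUMBER_OF_THREADS, threads still available, 4).
module nested_limit
  implicit none
contains
  integer function nested_thread_limit(mlib_threads, available)
    ! mlib_threads : value of MLIB_NUMBER_OF_THREADS
    ! available    : threads still available in the system
    integer, intent(in) :: mlib_threads, available
    integer, parameter :: hard_cap = 4   ! upper bound stated in the text
    nested_thread_limit = min(mlib_threads, available, hard_cap)
  end function nested_thread_limit
end module nested_limit

program demo
  use nested_limit
  implicit none
  ! Sample values are illustrative assumptions, not measurements.
  print *, nested_thread_limit(2, 6)   ! limited by MLIB_NUMBER_OF_THREADS: 2
  print *, nested_thread_limit(8, 6)   ! limited by the fixed bound: 4
  print *, nested_thread_limit(8, 3)   ! limited by available threads: 3
end program demo
```

In the two-thread OpenMP example, a per-call limit of 2 corresponds to the four-way parallel run described earlier: two OpenMP threads, each calling a DGEMM that may use two threads.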

Message passing-based nested parallelism

Nested parallelism can also be achieved by calling parallelized LAPACK subprograms from an MPI process. Consider the following code:

      ...
      call mpi_init(ierr)
      call mpi_comm_rank(MPI_COMM_WORLD, myid, ierr)
      if (myid.eq.0) then
         call dgemm('n', 'n', m, m, m, alpha, a, lda, b, ldb,
     &              beta, c, ldc)
      else
         call dgemm('n', 'n', m, m, m, alpha, d, ldd, e, lde, beta,
     &              f, ldf)
      endif
      ...

In this example, the MPI process with rank 0 computes C = αAB + βC, and every other process computes F = αDE + βF.