5–Using QLogic MPI

Performance Tuning

5-22 IB6054601-00 H

Use the taskset utility with mpirun to specify the mapping of MPI processes to logical processors. This combination makes the best use of available memory bandwidth or cache locality when running on dual-core Symmetric MultiProcessing (SMP) cluster nodes.

The following example uses the Multi-Grid (MG) benchmark from the NASA Advanced Supercomputing (NAS) Parallel Benchmarks and the -c option to taskset:

$ mpirun -np 4 -ppn 2 -m $hosts taskset -c 0,2 bin/mg.B.4

$ mpirun -np 4 -ppn 2 -m $hosts taskset -c 1,3 bin/mg.B.4

The first command forces the programs to run on CPUs (or cores) 0 and 2. The second command forces the programs to run on CPUs 1 and 3. See the taskset man page for more information on usage.
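
The two CPU lists above each select every second logical processor, starting at 0 or at 1. As a minimal sketch, a hypothetical helper named cpu_list (not part of taskset or QLogic MPI) can generate such a list. It assumes logical processors are numbered consecutively from 0; how those numbers map to sockets and cores is system-specific.

```shell
# cpu_list START STRIDE COUNT
# Hypothetical helper (not part of taskset or mpirun): prints a
# comma-separated list of COUNT CPU numbers, beginning at START and
# stepping by STRIDE, in the form accepted by "taskset -c".
cpu_list() {
    start=$1
    stride=$2
    count=$3
    list=""
    i=0
    while [ "$i" -lt "$count" ]; do
        cpu=$((start + i * stride))
        if [ -z "$list" ]; then
            list=$cpu
        else
            list="$list,$cpu"
        fi
        i=$((i + 1))
    done
    echo "$list"
}

cpu_list 0 2 2    # prints 0,2
cpu_list 1 2 2    # prints 1,3
```

The result could then be passed to taskset in the mpirun commands shown above, for example as taskset -c "$(cpu_list 0 2 2)".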

To turn off CPU affinity, set the environment variable IPATH_NO_CPUAFFINITY. This environment variable is propagated to node programs by mpirun.
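
A minimal sketch of disabling CPU affinity for a run follows. The text above only says that the variable must be set, so the value 1 is an assumption chosen for illustration. The mpirun line reuses the $hosts variable and benchmark from the taskset example and is left as a comment so that the snippet can be tried without a cluster.

```shell
# Assumption: the value 1 is illustrative; the guide only states that
# IPATH_NO_CPUAFFINITY must be set.
export IPATH_NO_CPUAFFINITY=1

# Confirm that the variable is exported, so that programs started from
# this shell (including mpirun) see it in their environment.
env | grep '^IPATH_NO_CPUAFFINITY='

# Then launch as usual, without taskset, for example:
#   mpirun -np 4 -ppn 2 -m $hosts bin/mg.B.4
```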

mpirun Tunable Options

Some mpirun options can be adjusted to optimize communication. The most important one is:

-long-len, -L [default: 64000]

This option sets the message length at which the rendezvous protocol is used instead of the eager protocol. The default value for -L was chosen for optimal unidirectional communication, and applications with this kind of traffic pattern benefit from the higher default. Other values for -L are appropriate for different communication patterns and data sizes. For example, applications that have bidirectional traffic patterns may benefit from a lower value. Experimentation is recommended.
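
Because the best value depends on the traffic pattern, one way to experiment is to prepare a series of runs that differ only in -L. The sketch below only prints the candidate command lines. The helper name long_len_sweep and the values tried are hypothetical, and the process counts, $hosts variable, and benchmark binary are borrowed from the taskset example earlier in this section.

```shell
# long_len_sweep VALUE...
# Hypothetical helper: prints one mpirun command line per -L value so
# that the runs can be reviewed before they are executed and timed.
long_len_sweep() {
    for len in "$@"; do
        # \$hosts is printed literally; it is expanded only when the
        # printed command is actually run.
        echo "mpirun -np 4 -ppn 2 -m \$hosts -L $len bin/mg.B.4"
    done
}

# Try the default (64000) and two lower values.
long_len_sweep 16000 32000 64000
```

Comparing the timings reported by the application for each run indicates which threshold suits its communication pattern.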

Two other useful options are:

-long-len-shmem, -s [default: 16000]

This option sets the message length at which the rendezvous protocol is used instead of the eager protocol for intra-node communications, that is, for messages going through shared memory. The InfiniPath rendezvous messaging protocol uses a two-way handshake (with MPI synchronous send semantics) and receive-side DMA.
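
Taken together, -L and -s act as two length thresholds, one for traffic between nodes and one for shared-memory traffic. The sketch below illustrates the effect with a hypothetical helper; it is not QLogic MPI code. It assumes that -L governs only inter-node messages and that a message strictly longer than the threshold uses the rendezvous protocol, two details this section does not state.

```shell
# protocol_for LENGTH SCOPE
# Hypothetical illustration, not QLogic MPI code. SCOPE is "inter"
# (between nodes, -L) or "intra" (shared memory, -s).
# Assumption: messages strictly longer than the threshold use
# rendezvous; the exact boundary behavior is not documented here.
LONG_LEN=64000         # default for -long-len, -L
LONG_LEN_SHMEM=16000   # default for -long-len-shmem, -s

protocol_for() {
    length=$1
    scope=$2
    case $scope in
        intra) threshold=$LONG_LEN_SHMEM ;;
        *)     threshold=$LONG_LEN ;;
    esac
    if [ "$length" -gt "$threshold" ]; then
        echo rendezvous
    else
        echo eager
    fi
}

protocol_for 20000 inter   # prints eager
protocol_for 20000 intra   # prints rendezvous
```

With the defaults, a message of the same length can therefore take different protocols depending on whether the peer is on the same node.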

-rndv-window-size, -W [default: 262144]

When sending a large message using the rendezvous protocol, QLogic MPI splits it into a number of fragments at the source and recombines them at the destination. Each fragment is sent as a single rendezvous stage. This option specifies the maximum length of each fragment. The default is 262144 bytes.
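
As a worked example of the arithmetic, the smallest number of fragments for a message is its length divided by the window size, rounded up. The helper below is a hypothetical sketch; it assumes every fragment except possibly the last is filled to the maximum length, which the text above does not guarantee.

```shell
# rndv_fragments MESSAGE_BYTES [WINDOW_BYTES]
# Hypothetical helper: prints the smallest number of fragments needed
# when no fragment may exceed WINDOW_BYTES (default 262144, the -W default).
rndv_fragments() {
    msg=$1
    window=${2:-262144}
    # Integer ceiling division: (msg + window - 1) / window
    echo $(( (msg + window - 1) / window ))
}

rndv_fragments 1048576          # 1 MiB, default window: prints 4
rndv_fragments 1048576 65536    # same message with -W 65536: prints 16
rndv_fragments 300000           # prints 2
```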

For more information on tunable options, type:

$ mpirun -h