Message latency and bandwidth
Latency is the time between the initiation of the data transfer in the
sending process and the arrival of the first byte in the receiving process.
Latency often depends on the length of the messages being sent. An
application's messaging behavior can vary greatly depending on whether
it sends a large number of small messages or a few large ones.
Message bandwidth is the reciprocal of the time needed to transfer a
byte. Bandwidth is normally expressed in megabytes per second.
Bandwidth becomes important when message sizes are large.
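A simple way to observe both quantities on a given system is a two-rank
ping-pong timing loop. The program below is a minimal sketch, not part of
HP-MPI; its message size and iteration count are arbitrary choices, and it
must be run with at least two ranks. With a small message the one-way time
approximates latency; with a large one, the derived rate approximates
bandwidth.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NBYTES (1 << 20)  /* arbitrary message size: 1 MB */
    #define NITERS 100        /* arbitrary repetition count */

    int main(int argc, char *argv[])
    {
        int rank, i;
        char *buf = malloc(NBYTES);
        double start, oneway;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        start = MPI_Wtime();
        for (i = 0; i < NITERS; i++) {
            if (rank == 0) {         /* send first, then wait for echo */
                MPI_Send(buf, NBYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, NBYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {  /* echo each message back */
                MPI_Recv(buf, NBYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, NBYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
            }
        }
        oneway = (MPI_Wtime() - start) / (2.0 * NITERS);

        if (rank == 0)
            printf("one-way time %g s, rate %g MB/s\n",
                   oneway, NBYTES / oneway / 1.0e6);

        free(buf);
        MPI_Finalize();
        return 0;
    }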
To improve latency, bandwidth, or both:
• Reduce the number of process communications by designing
applications that have coarse-grained parallelism.
• Use derived, contiguous data types for dense data structures to
eliminate unnecessary byte-copy operations in certain cases. Use
derived data types instead of MPI_Pack and MPI_Unpack if possible.
HP-MPI optimizes noncontiguous transfers of derived data types.
(A column-send sketch follows the example at the end of this list.)
• Use collective operations whenever possible. This eliminates the
overhead of calling MPI_Send and MPI_Recv each time one process
communicates with others. Also, use the HP-MPI collectives rather
than writing your own. (A broadcast sketch follows the example at
the end of this list.)
• Specify the source process rank whenever possible when calling
MPI routines. Using MPI_ANY_SOURCE may increase latency.
• Double-word align data buffers if possible. This improves byte-copy
performance between sending and receiving processes because of
double-word loads and stores. (An aligned-allocation sketch follows
the example at the end of this list.)
• Use MPI_Recv_init and MPI_Startall instead of a loop of
MPI_Irecv calls in cases where requests may not complete
immediately. (A persistent-request rewrite follows the example
below.)
For example, suppose you write an application with the following
code section:
    /* post one receive from every other rank, then wait for all */
    j = 0;
    for (i = 0; i < size; i++) {
        if (i == rank) continue;
        MPI_Irecv(buf[i], count, dtype, i, 0, comm, &requests[j++]);
    }
    MPI_Waitall(size - 1, requests, statuses);
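If some of these receives cannot complete immediately, each later
MPI_Irecv call may have to progress the requests already pending. The
sketch below applies the persistent-request recommendation to the same
example: it posts every receive once with MPI_Recv_init and then starts
them all together. The variable names match the example above; the
MPI_Request_free loop is part of the sketch, not the original example.

    j = 0;
    for (i = 0; i < size; i++) {
        if (i == rank) continue;
        MPI_Recv_init(buf[i], count, dtype, i, 0, comm, &requests[j++]);
    }
    MPI_Startall(size - 1, requests);
    MPI_Waitall(size - 1, requests, statuses);
    /* Persistent requests stay allocated after MPI_Waitall;
       free them when the pattern is no longer needed. */
    for (i = 0; i < size - 1; i++)
        MPI_Request_free(&requests[i]);

When the same receive pattern repeats, only the MPI_Startall and
MPI_Waitall calls repeat; the setup loop runs once, which is where the
savings come from.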
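To illustrate the derived data type recommendation, the sketch below
sends one column of a row-major matrix with MPI_Type_vector instead of
copying it into a scratch buffer with MPI_Pack. The function and its
parameters are illustrative, not part of HP-MPI.

    #include <mpi.h>

    /* Send column `col` of a row-major nrows-by-ncols matrix of
       doubles without an intermediate packing copy. */
    void send_column(double *a, int nrows, int ncols, int col,
                     int dest, MPI_Comm comm)
    {
        MPI_Datatype column;

        /* nrows blocks of one double, spaced ncols elements apart */
        MPI_Type_vector(nrows, 1, ncols, MPI_DOUBLE, &column);
        MPI_Type_commit(&column);
        MPI_Send(&a[col], 1, column, dest, 0, comm);
        MPI_Type_free(&column);
    }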
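To illustrate the collective-operations recommendation, the sketch below
shows the two alternatives for a root rank that distributes the same
buffer to every other rank; the variable names are illustrative.

    /* Point-to-point version: a send loop at the root and a
       matching receive on every other rank. */
    if (rank == root) {
        for (i = 0; i < size; i++)
            if (i != root)
                MPI_Send(buf, count, dtype, i, 0, comm);
    } else {
        MPI_Recv(buf, count, dtype, root, 0, comm, MPI_STATUS_IGNORE);
    }

    /* Collective version: the same data movement in one call,
       leaving the choice of algorithm to HP-MPI. */
    MPI_Bcast(buf, count, dtype, root, comm);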
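To illustrate the alignment recommendation: malloc on most systems
already returns storage aligned for any type, so misalignment usually
arises when a transfer buffer sits at an odd offset inside a larger
allocation. On systems that provide the POSIX posix_memalign routine,
you can request a double-word boundary explicitly; a minimal sketch,
with nbytes as an illustrative size:

    #include <stdlib.h>

    void *buf = NULL;

    /* Request nbytes of storage on an 8-byte boundary.
       posix_memalign returns nonzero on failure. */
    if (posix_memalign(&buf, 8, nbytes) != 0) {
        /* handle the allocation failure */
    }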