HP-MPI User's Guide (11th Edition)

Tuning
MPI routine selection
To achieve the lowest message latencies and highest message
bandwidths for point-to-point synchronous communications, use the MPI
blocking routines MPI_Send and MPI_Recv. For asynchronous
communications, use the MPI nonblocking routines MPI_Isend and
MPI_Irecv.
When using blocking routines, try to avoid leaving nonblocking requests
pending. MPI must make progress on nonblocking messages, so each call
to a blocking receive also has to advance any pending requests, which
can occasionally lower application performance.
For tasks that require collective operations, use the appropriate MPI
collective routine. HP-MPI takes advantage of shared memory to
perform efficient data movement and maximize your application’s
communication performance.