
Tuning
Message latency and bandwidth
}
MPI_Waitall(size-1, requests, statuses);
Suppose that one of the iterations through MPI_Irecv does not
complete before the next iteration of the loop. In this case, HP-MPI
tries to progress both requests. This progression effort could continue
to grow if succeeding iterations also do not complete immediately,
resulting in a higher latency.
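For reference, a minimal sketch of the MPI_Irecv loop described above follows, assuming the same buf, count, dtype, comm, requests, and statuses variables as the rewritten example below:
j = 0;
for (i=0; i<size; i++) {
    if (i==rank) continue;
    /* Each MPI_Irecv posts a new nonblocking receive; while posting
       it, HP-MPI also tries to progress any earlier requests that
       have not yet completed. */
    MPI_Irecv(buf[i], count, dtype, i, 0, comm,
              &requests[j++]);
}
MPI_Waitall(size-1, requests, statuses);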
However, you could rewrite the code section as follows:
j = 0;
for (i=0; i<size; i++) {
    if (i==rank) continue;
    MPI_Recv_init(buf[i], count, dtype, i, 0, comm,
                  &requests[j++]);
}
MPI_Startall(size-1, requests);
MPI_Waitall(size-1, requests, statuses);
In this case, the receive requests set up by the MPI_Recv_init loop
are progressed only once, when MPI_Startall is called. This approach
avoids the repeated progression overhead incurred with MPI_Irecv and
can reduce application latency.
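For completeness, the following self-contained sketch shows the persistent-request pattern end to end. The surrounding program, the COUNT message length, and the send loop are illustrative assumptions rather than part of the example above; the MPI_Request_free calls reflect the MPI requirement that persistent requests be freed explicitly when they are no longer needed.
#include <stdlib.h>
#include <mpi.h>

#define COUNT 1024                      /* assumed message length */

int main(int argc, char *argv[])
{
    int rank, size, i, j;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* One receive buffer per rank, plus request and status arrays
       for the size-1 peers. */
    double      **buf     = malloc(size * sizeof(double *));
    MPI_Request *requests = malloc((size - 1) * sizeof(MPI_Request));
    MPI_Status  *statuses = malloc((size - 1) * sizeof(MPI_Status));

    for (i = 0; i < size; i++)
        buf[i] = malloc(COUNT * sizeof(double));

    /* Set up one persistent receive per peer; nothing is progressed
       until MPI_Startall is called. */
    j = 0;
    for (i = 0; i < size; i++) {
        if (i == rank) continue;
        MPI_Recv_init(buf[i], COUNT, MPI_DOUBLE, i, 0,
                      MPI_COMM_WORLD, &requests[j++]);
    }

    /* Start all receives in one call, then send one message to each
       peer so the receives can complete. */
    MPI_Startall(size - 1, requests);

    for (i = 0; i < size; i++) {
        if (i == rank) continue;
        MPI_Send(buf[rank], COUNT, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);
    }

    MPI_Waitall(size - 1, requests, statuses);

    /* Persistent requests must be freed explicitly. */
    for (j = 0; j < size - 1; j++)
        MPI_Request_free(&requests[j]);

    for (i = 0; i < size; i++)
        free(buf[i]);
    free(buf);
    free(requests);
    free(statuses);

    MPI_Finalize();
    return 0;
}
Because each rank posts all of its receives before issuing any sends, every blocking send finds a matching receive and the exchange completes regardless of message size.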