than the largest communicator the rank can join, it returns MPI_COMM_NULL. However,
extensive communication failures, such as a failed switch, can make such knowledge
unattainable to a rank and result in splitting the communicator.
If the communicator returned by rank A contains rank B, then either the communicators
returned by ranks A and B will be identical, or rank B will return MPI_COMM_NULL and
any attempt by rank A to communicate with rank B will immediately return
MPI_ERR_EXITED. Therefore, any legal use of a communicator returned by
MPI_Comm_dup() should not result in a deadlock. Members of the resulting
communicator either agree to membership or are unreachable to all members. Any
attempt to communicate with unreachable members results in a failure.
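As a minimal sketch (not text from this note: the peer rank rank_b, the message
buffer, and the cleanup_and_exit() helper are illustrative), a rank might handle
both outcomes as follows, assuming the communicator's error handler is set to
MPI_ERRORS_RETURN so errors are returned rather than aborting:

MPI_Comm new_comm;
int err, class;

err = MPI_Comm_dup(comm, &new_comm);
if (err == MPI_SUCCESS && new_comm == MPI_COMM_NULL) {
    /* This rank could not join the surviving communicator. */
    cleanup_and_exit();
}
err = MPI_Send(buf, count, MPI_INT, rank_b, tag, new_comm);
if (err != MPI_SUCCESS) {
    MPI_Error_class(err, &class);
    if (class == MPI_ERR_EXITED) {
        /* rank_b returned MPI_COMM_NULL from its own MPI_Comm_dup()
         * and is unreachable through new_comm. */
    }
}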
Interruptible Collectives
When a failure (host, process, or interconnect) that affects a collective operation occurs,
at least one rank calling the collective returns with an error. The application must
initiate a recovery from those ranks by calling MPI_Comm_dup() on the communicator
used by the failed collective. This ensures that all other ranks within the collective also
exit the collective. Some ranks might exit successfully from a collective call while other
ranks do not. Ranks that exit with MPI_SUCCESS have successfully completed
their role in the operation, and their output buffers are correctly set. The return value
of MPI_SUCCESS does not indicate that all ranks have successfully completed their
role in the operation.
After a failure, one or more ranks must call MPI_Comm_dup(). All future
communication on that communicator results in failure for all ranks until each rank
has called MPI_Comm_dup() on the communicator. After all ranks have called
MPI_Comm_dup(), the parent communicator can be used for point-to-point
communication. MPI_Comm_dup() can be called successfully even after a failure.
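As a hedged sketch (the repair_on_exit() wrapper name is an assumption;
cleanup_and_exit() is the same illustrative helper used in the example below),
each rank can funnel this recovery through a small helper that mirrors the
free-and-replace pattern of the example, again assuming MPI_ERRORS_RETURN:

int repair_on_exit(int err, MPI_Comm *comm)
{
    int class;
    MPI_Comm new_comm;

    if (err == MPI_SUCCESS)
        return MPI_SUCCESS;
    MPI_Error_class(err, &class);
    if (class == MPI_ERR_EXITED) {
        /* Once any rank calls MPI_Comm_dup(), all further traffic on
         * this communicator fails for the remaining ranks, so every
         * rank eventually reaches this path. */
        if (MPI_Comm_dup(*comm, &new_comm) != MPI_SUCCESS)
            cleanup_and_exit();
        MPI_Comm_free(comm);
        *comm = new_comm;   /* the duplicate replaces the failed communicator */
    }
    return err;
}

A call such as repair_on_exit(MPI_Bcast(buffer, len, type, root, commA), &commA)
then applies the same recovery at every call site.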
Because the results of a collective call can vary by rank, ensure that an application is
written to avoid deadlocks. For example, using multiple communicators can be very
difficult, as the following code demonstrates:
...
err = MPI_Bcast(buffer, len, type, root, commA);
if (err) {
    MPI_Error_class(err, &class);
    if (class == MPI_ERR_EXITED) {
        err = MPI_Comm_dup(commA, &new_commA);
        if (err != MPI_SUCCESS) {
            cleanup_and_exit();
        }
        /* MPI_Comm_free() takes the communicator by address */
        MPI_Comm_free(&commA);
        commA = new_commA;
    }
}
err = MPI_Sendrecv_replace(buffer2, len2, type2, dest, tag1, src, tag2,
                           commB, &status);
if (err) {
....
...
In this case, some ranks exit successfully from the MPI_Bcast() and move on to the
MPI_Sendrecv_replace() operation on a different communicator. The ranks that
call MPI_Comm_dup() cause only operations on commA to fail. Some ranks cannot