than the largest communicator the rank can join, it returns MPI_COMM_NULL. However,
extensive communication failures, such as a failed switch, can make such knowledge
unattainable to a rank and result in splitting the communicator.
If the communicator returned by rank A contains rank B, then either the communicators
returned by ranks A and B will be identical, or rank B will return MPI_COMM_NULL and
any attempt by rank A to communicate with rank B will immediately return
MPI_ERR_EXITED. Therefore, any legal use of a communicator returned by
MPI_Comm_dup() should not result in a deadlock. Members of the resulting
communicator either agree to membership or are unreachable to all members. Any
attempt to communicate with unreachable members results in a failure.
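As a minimal sketch (not text from this note: the peer rank rank_b, the message
buffer, and the cleanup_and_exit() helper are illustrative), a rank might handle
both outcomes as follows, assuming the communicator's error handler is set to
MPI_ERRORS_RETURN so errors are returned rather than aborting:

MPI_Comm new_comm;
int err, class;

err = MPI_Comm_dup(comm, &new_comm);
if (err == MPI_SUCCESS && new_comm == MPI_COMM_NULL) {
    /* This rank could not join the surviving communicator. */
    cleanup_and_exit();
}
err = MPI_Send(buf, count, MPI_INT, rank_b, tag, new_comm);
if (err != MPI_SUCCESS) {
    MPI_Error_class(err, &class);
    if (class == MPI_ERR_EXITED) {
        /* rank_b returned MPI_COMM_NULL from its own MPI_Comm_dup()
         * and is unreachable through new_comm. */
    }
}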
Interruptible Collectives
When a failure (host, process, or interconnect) that affects a collective operation occurs,
at least one rank calling the collective returns with an error. The application must
initiate a recovery from those ranks by calling MPI_Comm_dup() on the communicator
used by the failed collective. This ensures that all other ranks within the collective also
exit the collective. Some ranks might exit successfully from a collective call while other
ranks do not. Ranks that exit with MPI_SUCCESS have successfully completed
their role in the operation, and their output buffers are correctly set. The return value
of MPI_SUCCESS does not indicate that all ranks have successfully completed their
role in the operation.
After a failure, one or more ranks must call MPI_Comm_dup(). All future
communication on that communicator results in failure for all ranks until each rank
has called MPI_Comm_dup() on the communicator. After all ranks have called
MPI_Comm_dup(), the parent communicator can be used for point-to-point
communication. MPI_Comm_dup() can be called successfully even after a failure.
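As a hedged sketch (the repair_on_exit() wrapper name is an assumption;
cleanup_and_exit() is the same illustrative helper used in the example below),
each rank can funnel this recovery through a small helper that mirrors the
free-and-replace pattern of the example, again assuming MPI_ERRORS_RETURN:

int repair_on_exit(int err, MPI_Comm *comm)
{
    int class;
    MPI_Comm new_comm;

    if (err == MPI_SUCCESS)
        return MPI_SUCCESS;
    MPI_Error_class(err, &class);
    if (class == MPI_ERR_EXITED) {
        /* Once any rank calls MPI_Comm_dup(), all further traffic on
         * this communicator fails for the remaining ranks, so every
         * rank eventually reaches this path. */
        if (MPI_Comm_dup(*comm, &new_comm) != MPI_SUCCESS)
            cleanup_and_exit();
        MPI_Comm_free(comm);
        *comm = new_comm;   /* the duplicate replaces the failed communicator */
    }
    return err;
}

A call such as repair_on_exit(MPI_Bcast(buffer, len, type, root, commA), &commA)
then applies the same recovery at every call site.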
Because the results of a collective call can vary by rank, ensure that an application is
written to avoid deadlocks. For example, using multiple communicators can be very
difficult, as the following code demonstrates:
...
err = MPI_Bcast(buffer, len, type, root, commA);
if (err) {
    MPI_Error_class(err, &class);
    if (class == MPI_ERR_EXITED) {
        err = MPI_Comm_dup(commA, &new_commA);
        if (err != MPI_SUCCESS) {
            cleanup_and_exit();
        }
        /* MPI_Comm_free() takes the communicator by address */
        MPI_Comm_free(&commA);
        commA = new_commA;
    }
}
err = MPI_Sendrecv_replace(buffer2, len2, type2, dest, tag1, src, tag2,
                           commB, &status);
if (err) {
....
...
In this case, some ranks exit successfully from the MPI_Bcast() and move on to the
MPI_Sendrecv_replace() operation on a different communicator. The ranks that
call MPI_Comm_dup() cause only operations on commA to fail. Some ranks cannot