HP-MPI Version 2.3.1 for Linux Release Note

3.8.4 Using MPI_Comm_disconnect
In high availability mode, MPI_Comm_disconnect is collective only across the local
group of the calling process. This enables a process group to independently break a
connection to the remote group in an intercommunicator without synchronizing with
those processes. Unreceived messages on the remote side are buffered and can still be
received until the remote side calls MPI_Comm_disconnect.
Receive calls that cannot be satisfied by a buffered message fail on the remote processes
after the local processes have called MPI_Comm_disconnect. Send calls on either side
of the intercommunicator fail after either side has called MPI_Comm_disconnect.
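The rules above can be walked through with a small toy model. This is only a sketch of the stated semantics, not HP-MPI code: the names side_t, model_send, model_recv, and demo_disconnect are hypothetical, and a receive that would simply keep waiting in a real program is modeled as RECV_WOULD_WAIT.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one side of an intercommunicator (hypothetical, not an HP-MPI type). */
typedef struct {
    bool disconnected;  /* this side has called MPI_Comm_disconnect */
    int  buffered;      /* messages that arrived here but are not yet received */
} side_t;

typedef enum { RECV_OK, RECV_WOULD_WAIT, RECV_FAILED } recv_result_t;

/* A send fails once either side has disconnected. */
static bool model_send(const side_t *from, side_t *to)
{
    if (from->disconnected || to->disconnected)
        return false;
    to->buffered++;
    return true;
}

/* Buffered messages stay receivable until the receiver itself disconnects.
   With nothing buffered, the receive fails once the peer has disconnected. */
static recv_result_t model_recv(side_t *self, const side_t *peer)
{
    if (self->disconnected)
        return RECV_FAILED;
    if (self->buffered > 0) {
        self->buffered--;
        return RECV_OK;
    }
    return peer->disconnected ? RECV_FAILED : RECV_WOULD_WAIT;
}

/* Scenario: the local group sends two messages, then disconnects on its own. */
static void demo_disconnect(void)
{
    side_t local = { false, 0 }, remote = { false, 0 };

    assert(model_send(&local, &remote));
    assert(model_send(&local, &remote));
    local.disconnected = true;                           /* local MPI_Comm_disconnect */

    assert(model_recv(&remote, &local) == RECV_OK);      /* buffered message 1 */
    assert(model_recv(&remote, &local) == RECV_OK);      /* buffered message 2 */
    assert(model_recv(&remote, &local) == RECV_FAILED);  /* nothing left to match */
    assert(!model_send(&remote, &local));                /* sends now fail on both sides */
}
```

In the scenario, the remote group drains both buffered messages after the local disconnect, and only the third receive and the later send fail.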
3.8.5 Instrumentation and High Availability Mode
HP-MPI lightweight instrumentation is now supported when using -ha and singletons.
If some ranks terminate during or before MPI_Finalize(), the lowest rank ID in
MPI_COMM_WORLD produces the instrumentation output file on behalf of the
application, and instrumentation data for the exited ranks is not included. For
other enhancements to instrumentation in this release, see “Expanded Lightweight
Instrumentation” (page 23).
The use of -ha and -i is available only on HP hardware. Usage on third-party hardware
results in an error message.
3.8.6 Failure Recovery (-ha:recover)
Fault-Tolerant MPI_Comm_dup() That Excludes Failed Ranks
When -ha:recover is used, MPI_Comm_dup() enables an application to recover
from errors.
IMPORTANT: The MPI_Comm_dup() function is not standard compliant because a
call to MPI_Comm_dup() always causes all outstanding communications on the
communicator to terminate with failures, regardless of the presence or absence of errors.
When one or more pairs of ranks within a communicator are unable to communicate
because a rank has exited or the communication layers have returned errors, a call to
MPI_Comm_dup attempts to return the largest communicator containing ranks that
were fully interconnected at some point during the MPI_Comm_dup call. Because new
errors can occur at any time, the returned communicator might not be completely error
free. However, two ranks in the original communicator that were unable to
communicate with each other before the call are not both included in the same
communicator generated by MPI_Comm_dup.
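To make "largest communicator containing ranks that were fully interconnected" concrete, the following toy model searches a connectivity matrix by brute force. It is a sketch under stated assumptions, not the algorithm HP-MPI uses: NRANKS, the ok matrix, and all function names are hypothetical, and when several subsets tie in size the model simply keeps the first one it finds.

```c
#include <assert.h>
#include <stdbool.h>

#define NRANKS 5  /* illustrative communicator size (assumption) */

/* True when every pair of ranks selected in `mask` can communicate both ways. */
static bool fully_interconnected(unsigned mask, bool ok[NRANKS][NRANKS])
{
    for (int i = 0; i < NRANKS; i++) {
        if (!(mask & (1u << i)))
            continue;
        for (int j = i + 1; j < NRANKS; j++) {
            if ((mask & (1u << j)) && !(ok[i][j] && ok[j][i]))
                return false;
        }
    }
    return true;
}

/* Number of ranks selected in `mask`. */
static int count_bits(unsigned mask)
{
    int n = 0;
    while (mask) {
        n += (int)(mask & 1u);
        mask >>= 1;
    }
    return n;
}

/* Largest fully interconnected subset containing rank `me`, as a bitmask.
   Brute force over all subsets; ties keep the first subset found. */
static unsigned largest_group_for(int me, bool ok[NRANKS][NRANKS])
{
    assert(me >= 0 && me < NRANKS);
    unsigned best = 1u << me;
    for (unsigned mask = 0; mask < (1u << NRANKS); mask++) {
        if (!(mask & (1u << me)))
            continue;
        if (count_bits(mask) > count_bits(best) && fully_interconnected(mask, ok))
            best = mask;
    }
    return best;
}

/* Scenario: rank 2 has exited, so no rank can communicate with it. */
static unsigned demo_rank_exit(void)
{
    bool ok[NRANKS][NRANKS];
    for (int i = 0; i < NRANKS; i++)
        for (int j = 0; j < NRANKS; j++)
            ok[i][j] = true;
    for (int i = 0; i < NRANKS; i++)
        ok[2][i] = ok[i][2] = false;
    return largest_group_for(0, ok);  /* ranks {0, 1, 3, 4}: bitmask 0x1B */
}
```

If only the link between ranks 1 and 3 fails instead, the model never places both ranks in the same result, which mirrors the exclusion described above.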
Communication failures can partition ranks into two groups, A and B, so that no rank
in group A can communicate with any rank in group B, and vice versa. A call to
MPI_Comm_dup() can behave similarly to a call to MPI_Comm_split(), returning
different legal communicators to different callers. When a larger communicator exists