HP-MPI Version 2.3.1 for Linux Release Note
Table Of Contents
- HP-MPI V2.3.1 for Linux Release Note
- Table of Contents
- 1 Information About This Release
- 2 New or Changed Features in V2.3.1
- 3 New or Changed Features in V2.3
- 3.1 Options Supported Only on HP Hardware
- 3.2 System Check
- 3.3 Default Message Size Changed For -ndd
- 3.4 MPICH2 Compatibility
- 3.5 Support for Large Messages
- 3.6 Redundant License Servers
- 3.7 License Release/Regain on Suspend/Resume
- 3.8 Expanded Functionality for -ha
- 3.8.1 Support for High Availability on InfiniBand Verbs
- 3.8.2 Highly Available Infrastructure (-ha:infra)
- 3.8.3 Using MPI_Comm_connect and MPI_Comm_accept
- 3.8.4 Using MPI_Comm_disconnect
- 3.8.5 Instrumentation and High Availability Mode
- 3.8.6 Failure Recovery (-ha:recover)
- 3.8.7 Network High Availability (-ha:net)
- 3.8.8 Failure Detection (-ha:detect)
- 3.8.9 Clarification of the Functionality of Completion Routines in High Availability Mode
- 3.9 Enhanced InfiniBand Support for Dynamic Processes
- 3.10 Singleton Launching
- 3.11 Using the -stdio=files Option
- 3.12 Using the -stdio=none Option
- 3.13 Expanded Lightweight Instrumentation
- 3.14 The api option to MPI_INSTR
- 3.15 New mpirun option -xrc
- 4 Known Issues and Workarounds
- 4.1 Running on iWarp Hardware
- 4.2 Running with Chelsio uDAPL
- 4.3 Mapping Ranks to a CPU
- 4.4 OFED Firmware
- 4.5 Spawn on Remote Nodes
- 4.6 Default Interconnect for -ha Option
- 4.7 Linking Without Compiler Wrappers
- 4.8 Locating the Instrumentation Output File
- 4.9 Using the ScaLAPACK Library
- 4.10 Increasing Shared Memory Segment Size
- 4.11 Using MPI_FLUSH_FCACHE
- 4.12 Using MPI_REMSH
- 4.13 Increasing Pinned Memory
- 4.14 Disabling Fork Safety
- 4.15 Using Fork with OFED
- 4.16 Memory Pinning with OFED 1.2
- 4.17 Upgrading to OFED 1.2
- 4.18 Increasing the nofile Limit
- 4.19 Using appfiles on HP XC Quadrics
- 4.20 Using MPI_Bcast on Quadrics
- 4.21 MPI_Issend Call Limitation on Myrinet MX
- 4.22 Terminating Shells
- 4.23 Disabling Interval Timer Conflicts
- 4.24 libpthread Dependency
- 4.25 Fortran Calls Wrappers
- 4.26 Bindings for C++ and Fortran 90
- 4.27 Using HP Caliper
- 4.28 Using -tv
- 4.29 Extended Collectives with Lightweight Instrumentation
- 4.30 Using -ha with Diagnostic Library
- 4.31 Using MPICH with Diagnostic Library
- 4.32 Using -ha with MPICH
- 4.33 Using MPI-2 with Diagnostic Library
- 4.34 Quadrics Memory Leak
- 5 Installation Information
- 6 Licensing Information
- 7 Additional Product Information
return from the MPI_Sendrecv_replace() call on commB if their partners are also
members of commA and are in the MPI_Comm_dup() call on commA. This
demonstrates the importance of exercising care when working with multiple
communicators. In this example, if the intersection of commA and commB is
MPI_COMM_SELF, it is simpler to write an application that does not deadlock
during failure.
The -ha:recover option is available only on HP hardware; usage on third-party
hardware results in an error message. On third-party systems, a failed
communicator can continue to be used for point-to-point communication, but no
recovery mechanism is available.
3.8.7 Network High Availability (-ha:net)
The net option to -ha enables network high availability, which attempts to
insulate an application from errors in the network. In this release, -ha:net
is significant only on IBV with OFED 1.2 or later, where Automatic Path
Migration is used. This option has no effect on TCP connections.
The -ha:net option is available only on HP hardware; usage on third-party
hardware results in an error message.
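As a quick usage sketch, the option is passed to mpirun at launch; the process
count and application name below are placeholders:

```shell
# Enable network high availability (Automatic Path Migration on IBV,
# OFED 1.2 or later); ./my_app stands in for the real application.
mpirun -ha:net -np 8 ./my_app
```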
3.8.8 Failure Detection (-ha:detect)
When using the -ha:detect option, a communication failure is detected and
prevented from interfering with the application's ability to communicate with
other processes that have not been affected by the failure. In addition to
specifying -ha:detect, the error handler must be set to MPI_ERRORS_RETURN
using the MPI_Comm_set_errhandler function. When an error is detected in a
communication, the error class MPI_ERR_EXITED is returned for the affected
communication. Shared memory is not used for communication between processes
in this mode.
Only IBV and TCP are supported. This mode cannot be used with the diagnostic library.
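The following sketch shows the error-handler setup described above. It assumes
an HP-MPI installation launched with -ha:detect; the ranks, tag, and message
contents are illustrative, and MPI_ERR_EXITED is the HP-MPI error class named
in this release note:

```c
/* Sketch: surfacing communication failures with -ha:detect.
 * Launch with, e.g.:  mpirun -ha:detect -np 2 ./a.out  */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, err;
    char buf[64] = "hello";
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* -ha:detect requires MPI_ERRORS_RETURN so that failures are
     * reported as return codes instead of aborting the job. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    if (rank == 0) {
        err = MPI_Recv(buf, sizeof(buf), MPI_CHAR, 1, 0,
                       MPI_COMM_WORLD, &status);
        if (err != MPI_SUCCESS) {
            int eclass;
            MPI_Error_class(err, &eclass);
            if (eclass == MPI_ERR_EXITED)   /* peer or path failed */
                fprintf(stderr, "rank 0: partner failed\n");
        }
    } else if (rank == 1) {
        MPI_Send(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```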
3.8.9 Clarification of the Functionality of Completion Routines in High Availability Mode
Requests that cannot be completed because of network or process failures result
in the creation or completion functions returning the error class
MPI_ERR_EXITED.
When waiting or testing multiple requests using MPI_Testany(), MPI_Testsome(),
MPI_Waitany() or MPI_Waitsome(), a request that cannot be completed because
of network or process failures is considered a completed request and these routines
return with the flag or outcount argument set to non-zero. If some requests completed
successfully and some requests completed because of network or process failure, the
return value of the routine is MPI_ERR_IN_STATUS. The status array elements contain
MPI_ERR_EXITED for those requests that completed because of network or process
failure.
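The multiple-completion semantics above can be sketched as follows. The
requests array is assumed to have been filled by earlier nonblocking calls in
a job launched with -ha; NREQ and the error handling shown are illustrative:

```c
/* Sketch: distinguishing failed from successful completions when
 * waiting on multiple requests in high availability mode. */
#include <mpi.h>
#include <stdio.h>

#define NREQ 4

void drain_requests(MPI_Request requests[NREQ])
{
    int indices[NREQ];
    MPI_Status statuses[NREQ];
    int outcount, err, i;

    err = MPI_Waitsome(NREQ, requests, &outcount, indices, statuses);

    if (err == MPI_ERR_IN_STATUS) {
        /* Mixed outcome: some requests succeeded, others "completed"
         * only because of a network or process failure. */
        for (i = 0; i < outcount; i++) {
            if (statuses[i].MPI_ERROR == MPI_ERR_EXITED)
                fprintf(stderr, "request %d failed: peer exited "
                        "or network path lost\n", indices[i]);
        }
    }
}
```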