HP-MPI Version 2.2.7 for Linux Release Note
striped over all the connections. When one of the connections is broken, a warning is issued, but
HP-MPI continues to use the rest of the healthy connections to transfer messages. If all the
connections are broken, HP-MPI issues an error message.
1.2.7.9 InfiniBand port failover support
A multi-port InfiniBand channel adapter can use Automatic Path Migration (APM) to provide
network high availability. APM is defined by the InfiniBand Architecture Specification and
enables HP-MPI to recover from network failures by specifying and using alternate
paths in the network. The InfiniBand subnet manager defines one of the server links as primary
and one as redundant (alternate). When the primary link fails, the channel adapter automatically
redirects traffic to the redundant path. This support is provided
by the InfiniBand driver available in OFED 1.2 and later releases. Redirection and reissue of
communications are transparent to applications running on the cluster.
For this release, the user has to explicitly enable APM by setting the environment variable
MPI_HA_NW_PORT_FAILOVER=1, as in the following example:
% /opt/hpmpi/bin/mpirun -np 4 -prot -e MPI_HA_NW_PORT_FAILOVER=1 \
    -hostlist nodea,nodeb,nodec,noded /my/dir/hello_world
Figure 1-1 IB Port Failover
[Diagram labels: Primary Path (1), Alternate Path (2)]
When the MPI_HA_NW_PORT_FAILOVER environment variable is set, HP-MPI identifies and
specifies the primary (1) and the alternate (2) paths (if available) when it sets up the
communication channels between the ranks. See Figure 1-1. MPI_HA_NW_PORT_FAILOVER also
requests the InfiniBand driver to load the alternate path (2) for a potential path migration if a
network failure occurs. When a network failure occurs, the InfiniBand driver automatically
transitions to the alternate path (2), notifies HP-MPI of the path migration, and continues the
network communication on the alternate path (2). At this point, HP-MPI also reloads the original
primary path (1) as the new alternate path. If this new alternate path (1) is restored, the
InfiniBand driver can automatically migrate back to it if the new primary path (2) later
fails. However, if the new alternate path (1) is not restored, or if alternate paths are
unavailable on the same card, future failures force HP-MPI to attempt to fail over to alternate
cards, if available. All of these operations are transparent to the application that uses
HP-MPI.
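
The role swap between the two paths can be easier to follow as a small state model. The following Python sketch is illustrative only: the Connection class, its methods, and the returned strings are hypothetical names chosen for this example, and it does not reflect how HP-MPI or the InfiniBand driver is implemented. It assumes a single card with ports 1 and 2, as in Figure 1-1.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Connection:
    """Toy model of one rank-to-rank channel with APM enabled (hypothetical)."""
    primary: int = 1                 # port currently carrying traffic
    alternate: Optional[int] = 2     # port loaded as the migration target
    healthy: Set[int] = field(default_factory=lambda: {1, 2})

    def restore(self, port: int) -> None:
        """Mark a previously failed port as usable again."""
        self.healthy.add(port)

    def fail(self, port: int) -> str:
        """Mark a port as failed and report what the model does next."""
        self.healthy.discard(port)
        if port != self.primary:
            # Only the standby path went down; traffic is unaffected.
            return "alternate lost"
        if self.alternate is not None and self.alternate in self.healthy:
            # Migrate to the alternate path; the old primary is reloaded
            # as the new alternate, even though it is currently down.
            self.primary, self.alternate = self.alternate, self.primary
            return "migrated"
        # No usable alternate path on this card.
        return "card failover"


conn = Connection()
print(conn.fail(1))   # path 1 fails, traffic moves to path 2
conn.restore(1)       # path 1 comes back and can serve as the alternate
print(conn.fail(2))   # path 2 fails, traffic moves back to path 1
```

In this model, a second failure without an intervening restore() returns "card failover", which corresponds to the case where the new alternate path has not been restored.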
If the environment has multiple cards, with multiple ports per card, and has APM enabled,
HP-MPI gives InfiniBand port failover priority over card failover.
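
The priority rule can be sketched as a selection function. This is a simplified illustration under stated assumptions, not HP-MPI code: the function name next_path and the (card, port) tuple representation are hypothetical, and the real selection is made inside HP-MPI and the InfiniBand driver.

```python
from typing import List, Optional, Tuple

Path = Tuple[int, int]  # (card index, port number), a hypothetical representation


def next_path(current: Path, healthy: List[Path]) -> Optional[Path]:
    """Pick the path to use after `current` fails.

    A healthy port on the same card is preferred (port failover).
    Only when none exists is a port on another card chosen (card failover).
    Returns None when no healthy path is left.
    """
    card, _ = current
    same_card = [p for p in healthy if p[0] == card and p != current]
    if same_card:
        return same_card[0]
    other_card = [p for p in healthy if p[0] != card]
    return other_card[0] if other_card else None


# Two cards with two ports each; port 1 on card 0 has just failed.
print(next_path((0, 1), [(0, 2), (1, 1), (1, 2)]))  # stays on card 0
print(next_path((0, 1), [(1, 1), (1, 2)]))          # moves to card 1
```

The second call shows the fallback: with no healthy port left on card 0, the sketch selects a port on card 1.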