Serviceguard Network Manager: Inbound Failure Detection, March 2007
The primary and standby NICs, lan0 and lan1, send polling packets to each other through the
cascading cable (see Figure 2). Assume that the cascading cable gets disconnected. Now each NIC
can send but cannot receive any poll packets. There is no other incoming traffic that can increment the
inbound statistics at the time. Using the default setting of INOUT, Serviceguard Network Manager
detects the failure. However, Serviceguard Network Manager takes no action and logs no errors.
Routers direct clients to network interfaces that are connected and give them access to applications. If
Serviceguard Network Manager is configured to immediately fail over when the cable is
disconnected, both primary and standby NICs fail. If this is the only heartbeat network, both nodes
would experience a transfer-of-control resulting in loss of the cluster. Refer to the Serviceguard Network
Manager manual for more information.
The enhanced Serviceguard Network Manager polling mechanism determines whether to fail over,
and helps avoid NIC failure and similar problems. This “full polling” mechanism sends poll packets to
all network interfaces on the same bridged network in the cluster and waits for any response. Full
polling is described in more detail in the following section, “Algorithm for inbound failure detection
method.”
Figure 3. An illustration of what happens when the cascading cable is broken if the INONLY_OR_INOUT setting for
failure detection is applied
Algorithm for inbound failure detection method
With the INONLY_OR_INOUT network failure detection setting, Serviceguard Network Manager
monitors the status of NICs by sending polling messages, just as it does with the existing default
method. This polling mechanism generates reliable traffic, which Serviceguard Network Manager
uses to track inbound and outbound statistics. If the statistics stop incrementing for a period of time,
Serviceguard Network Manager determines what actions to take next.
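The idea of watching a counter until it stops incrementing can be sketched as follows. This is only an
illustration, not Serviceguard code: the class name CounterWatch, the stall_limit threshold, and the
sample values are all hypothetical, and the real wait period is not specified here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CounterWatch:
    """Tracks one packet counter and how long it has gone unchanged.

    Hypothetical helper for illustration; not part of Serviceguard.
    """
    stall_limit: float                  # assumed threshold, in seconds
    last_value: Optional[int] = None    # last counter value seen
    last_change: float = 0.0            # time at which the value last changed

    def update(self, value: int, now: float) -> bool:
        """Record a sample and return True if the counter has stalled."""
        if self.last_value is None or value != self.last_value:
            # First sample, or the counter moved: restart the stall clock.
            self.last_value = value
            self.last_change = now
        return (now - self.last_change) >= self.stall_limit


# Inbound counter sampled at t = 0, 2, 6, and 12 seconds (made-up values).
inbound = CounterWatch(stall_limit=10.0)
samples = [(0.0, 100), (2.0, 140), (6.0, 140), (12.0, 140)]
results = [inbound.update(count, now) for now, count in samples]
print(results)
```

In this sketch the counter last changes at t = 2, so only the sample at t = 12 reports a stall. A
monitor would keep one such watch for inbound statistics and another for outbound statistics.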
If the user sets NETWORK_FAILURE_DETECTION to INOUT, the existing method is used. With
inbound-only failures, Serviceguard Network Manager does not recognize the NIC as failed, and it
does not begin a failover.
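A minimal sketch of the INOUT decision, assuming that this setting treats a NIC as failed only when
both inbound and outbound statistics have stopped incrementing. The function name and its boolean
inputs are hypothetical and are not part of any Serviceguard interface.

```python
def inout_marks_nic_failed(inbound_stalled: bool, outbound_stalled: bool) -> bool:
    """Return True if the NIC would be treated as failed under INOUT.

    Assumption: INOUT requires both counters to have stalled, so an
    inbound-only stall is not treated as a NIC failure.
    """
    return inbound_stalled and outbound_stalled


# Inbound-only failure: the NIC can still send, so no failover begins.
print(inout_marks_nic_failed(inbound_stalled=True, outbound_stalled=False))
# Both directions stalled: the NIC is treated as failed.
print(inout_marks_nic_failed(inbound_stalled=True, outbound_stalled=True))
```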
If the user sets NETWORK_FAILURE_DETECTION to INONLY_OR_INOUT, a different method is used to
determine whether or not to mark a network interface as down. When Serviceguard Network
Manager detects that the inbound traffic of a NIC has stopped, it does the following:
1. Serviceguard Network Manager waits for a predetermined amount of time, based on the NIC
type.