Managing Serviceguard Nineteenth Edition, Reprinted June 2011

Monitoring LAN Interfaces and Detecting Failure: Link Level

At regular intervals, determined by the NETWORK_POLLING_INTERVAL (see “Cluster Configuration Parameters” (page 105)), Serviceguard polls all the network interface cards specified in the cluster configuration file. Network failures are detected within each single node in the following manner.

One interface on the node is assigned to be the poller. The poller will poll the other primary and standby interfaces in the same bridged net on that node to see whether they are still healthy. Normally, the poller is a standby interface; if there are no standby interfaces in a bridged net, the primary interface is assigned the polling task. (Bridged nets are explained under “Redundant Network Components” (page 27) in Chapter 2.)

The polling interface sends LAN packets to all other interfaces in the node that are on the same bridged net and receives packets back from them.
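
The poller assignment described above can be sketched as follows. This is only an illustration of the stated rule, not Serviceguard code: the `Interface` class, the `choose_poller` and `poll_targets` helpers, and the interface names are hypothetical, and picking the first listed standby when there are several is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Interface:
    name: str  # hypothetical interface name, e.g. "lan0"
    role: str  # "primary" or "standby"


def choose_poller(bridged_net: List[Interface]) -> Interface:
    """Prefer a standby interface; fall back to a primary if none exists."""
    for nic in bridged_net:
        if nic.role == "standby":
            return nic  # assumption: the first listed standby is chosen
    for nic in bridged_net:
        if nic.role == "primary":
            return nic
    raise ValueError("bridged net has no primary or standby interface")


def poll_targets(bridged_net: List[Interface], poller: Interface) -> List[Interface]:
    """Every other interface on the same bridged net is polled."""
    return [nic for nic in bridged_net if nic is not poller]


net = [Interface("lan0", "primary"), Interface("lan1", "standby")]
poller = choose_poller(net)
targets = poll_targets(net, poller)
```

With one primary and one standby, the standby polls the primary; with a lone primary, that interface is the poller and has no other interface to poll.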

Whenever a LAN driver reports an error, Serviceguard immediately declares that the card is bad and performs a local switch, if applicable. For example, when the card fails to send, Serviceguard immediately receives an error notification and marks the card as down. See “Reporting Link-Level and IP-Level Failures” (page 73).

Serviceguard Network Manager also looks at the numerical counts of packets sent and received on an interface to determine if a card is having a problem. There are two ways Serviceguard can handle the counts of packets sent and received. In the cluster configuration file, choose one of the following values for the NETWORK_FAILURE_DETECTION parameter:

NOTE: For a full discussion, see the white paper Serviceguard Network Manager: Inbound Failure Detection Enhancement at http://www.hp.com/go/hpux-serviceguard-docs.

• INOUT: When both the inbound and outbound counts stop incrementing for a certain amount of time, Serviceguard will declare the card as bad. (Serviceguard calculates the time depending on the type of LAN card.) Serviceguard will not declare the card as bad if only the inbound or only the outbound count stops incrementing. Both must stop. This is the default.

• INONLY_OR_INOUT: This option will also declare the card as bad if both inbound and outbound counts stop incrementing. However, it will also declare it as bad if only the inbound count stops.

This option is not suitable for all environments. Before choosing it, be sure these conditions are met:

◦ All bridged nets in the cluster should have more than two interfaces each.
◦ Each primary interface should have at least one standby interface, and the standby interface should be connected to a standby switch.
◦ The primary switch should be directly connected to its standby.
◦ There should be no single point of failure anywhere on any bridged net.

NOTE: You can change the value of the NETWORK_FAILURE_DETECTION parameter while the cluster is up and running.
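
The difference between the two NETWORK_FAILURE_DETECTION values can be expressed as a small decision function. This is a minimal sketch of the rule as stated above, not the product's implementation: the function name `card_is_bad` and its boolean inputs are hypothetical, and each input is assumed to mean that the corresponding count has stopped incrementing for the time Serviceguard calculates for that type of LAN card.

```python
def card_is_bad(mode: str, inbound_stalled: bool, outbound_stalled: bool) -> bool:
    """Return True if the card would be declared bad under the given mode."""
    both_stalled = inbound_stalled and outbound_stalled
    if mode == "INOUT":
        # Default: both counts must stop incrementing.
        return both_stalled
    if mode == "INONLY_OR_INOUT":
        # Both counts stopped, or only the inbound count stopped.
        inbound_only = inbound_stalled and not outbound_stalled
        return both_stalled or inbound_only
    raise ValueError(f"unknown NETWORK_FAILURE_DETECTION value: {mode}")


# The same symptom (inbound count stalled, outbound still moving)
# is judged differently by the two modes.
inout_verdict = card_is_bad("INOUT", True, False)
inonly_verdict = card_is_bad("INONLY_OR_INOUT", True, False)
```

Under either value, a card whose outbound count alone stops incrementing is not declared bad.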

Local Switching

A local network switch involves the detection of a local network interface failure and a failover to the local backup LAN card (also known as the standby LAN card). The backup LAN card must not have any IP addresses configured.

In the case of a local network switch, TCP/IP connections are not lost for Ethernet, but IEEE 802.3 connections will be lost. For IPv4, Ethernet uses ARP, and HP-UX sends out an unsolicited ARP to notify remote systems of the address mapping between MAC (link-level) addresses and IP-level addresses. IEEE 802.3 does not have the rearp function.
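
A local switch can be modeled roughly as below. This is a hedged sketch only: the `LanCard` class, the `local_switch` function, and the sample addresses are hypothetical, and treating the switch as moving the failed card's IP addresses to the standby is an assumption drawn from the description above. The returned pairs stand in for the IP-to-MAC mappings that an unsolicited ARP would announce; no real packets are sent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class LanCard:
    name: str  # hypothetical interface name
    mac: str   # link-level address
    ip_addresses: List[str] = field(default_factory=list)


def local_switch(failed: LanCard, standby: LanCard) -> List[Tuple[str, str]]:
    """Move IP addresses to the standby and return the new (IP, MAC) mappings."""
    if standby.ip_addresses:
        # The backup LAN card must not have any IP addresses configured.
        raise ValueError("standby LAN card already has IP addresses configured")
    standby.ip_addresses = failed.ip_addresses
    failed.ip_addresses = []
    # Each pair is what remote systems would need to learn via ARP.
    return [(ip, standby.mac) for ip in standby.ip_addresses]


primary = LanCard("lan0", "00:00:00:00:00:01", ["192.0.2.10"])
backup = LanCard("lan1", "00:00:00:00:00:02")
announcements = local_switch(primary, backup)
```

After the call, the IP address is paired with the standby card's MAC address, which is the mapping remote systems must pick up for existing TCP/IP connections to continue.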
