Managing Serviceguard 12th Edition, March 2006

Understanding Serviceguard Software Components
Responses to Failures
Responses to Hardware Failures
If a serious system problem occurs, such as a system panic or physical
disruption of the SPU's circuits, Serviceguard recognizes a node failure
and transfers the failover packages currently running on that node to an
adoptive node elsewhere in the cluster. (System multi-node and
multi-node packages do not fail over.)
The new location for each failover package is determined by that
package's configuration file, which lists primary and alternate nodes for
the package. Transferring a package to another node does not transfer
the program counter: processes in a transferred package restart from
the beginning. For an application to restart quickly after a failure,
it must be “crash-tolerant”; that is, all processes in the package must
be written so that they can detect such a restart. This is the same
application design required for restart after a normal system crash.
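
As a minimal sketch of what detecting such a restart could look like,
the Python fragment below keeps a small status marker on storage that
is assumed to move with the package. The function names, the marker
file, and its JSON format are hypothetical illustrations and are not
part of Serviceguard.

```python
import json
import os
import tempfile


def write_state(state_path, status):
    """Record the service status atomically (hypothetical marker file)."""
    directory = os.path.dirname(state_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"status": status}, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace swaps the file in one step, so a crash during the write
    # never leaves a half-written marker behind.
    os.replace(tmp_path, state_path)


def start_service(state_path):
    """Return 'recovered' if the previous run ended without a clean stop."""
    previous = None
    if os.path.exists(state_path):
        with open(state_path) as f:
            previous = json.load(f)
    # A leftover "running" marker means the last instance never shut down
    # cleanly, for example because its node failed.
    crashed = previous is not None and previous.get("status") == "running"
    write_state(state_path, "running")
    return "recovered" if crashed else "clean"


def stop_service(state_path):
    """Mark a normal shutdown so the next start is treated as clean."""
    write_state(state_path, "stopped")
```

An application built this way would run its recovery steps (replaying
a log, discarding partial work) whenever start_service reports
"recovered", whether the restart follows a package transfer or a
local system crash.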
If a LAN interface fails, Serviceguard switches locally to a standby
LAN interface if one exists. If a heartbeat LAN interface fails and no
standby or redundant heartbeat is configured, the node fails with a TOC
(Transfer of Control). If a monitored data LAN interface fails without
a standby, the node fails with a TOC only if NODE_FAILFAST_ENABLED
(described further in “Package Configuration Planning” on page 164) is
set to YES for the package. Otherwise, any packages using that LAN
interface are halted and moved to another node if possible.
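
The LAN failure rules above can be summarized as a small decision
function. This is an illustrative model only, not Serviceguard code:
the function name, its parameters, and the Response values are
hypothetical. The outcome for a failed heartbeat interface when a
redundant heartbeat exists is inferred, since the text only states
when the TOC occurs.

```python
from enum import Enum


class Response(Enum):
    LOCAL_SWITCH = "switch locally to the standby LAN interface"
    NODE_TOC = "node fails with a TOC"
    HALT_AND_MOVE = "halt packages and move them to another node if possible"
    NODE_STAYS_UP = "no TOC; heartbeat continues on the redundant path"


def lan_failure_response(role, has_standby,
                         redundant_heartbeat=False,
                         node_failfast_enabled=False):
    """Model the documented response to a LAN interface failure.

    role is "heartbeat" or "data" (a monitored data LAN interface).
    node_failfast_enabled stands for NODE_FAILFAST_ENABLED=YES on the
    package that uses the failed interface.
    """
    if role not in ("heartbeat", "data"):
        raise ValueError("role must be 'heartbeat' or 'data'")
    # A standby interface always takes over first.
    if has_standby:
        return Response.LOCAL_SWITCH
    if role == "heartbeat":
        # Inferred case: the text describes a TOC only when neither a
        # standby nor a redundant heartbeat is configured.
        if redundant_heartbeat:
            return Response.NODE_STAYS_UP
        return Response.NODE_TOC
    # Monitored data LAN interface with no standby.
    if node_failfast_enabled:
        return Response.NODE_TOC
    return Response.HALT_AND_MOVE
```

For example, lan_failure_response("data", has_standby=False) yields
Response.HALT_AND_MOVE, while the same call with
node_failfast_enabled=True yields Response.NODE_TOC.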
Disk protection is provided by separate products, such as Mirrordisk/UX
in LVM or VERITAS mirroring in VxVM and CVM. In addition,
separately available EMS disk monitors allow you to notify operations
personnel when a specific failure, such as a lock disk failure, takes place.
Refer to the manual Using High Availability Monitors (HP part number
B5736-90042) for additional information.
Serviceguard does not respond directly to power failures, although a loss
of power to an individual cluster component may appear to Serviceguard
like the failure of that component, and will result in the appropriate
switching behavior. Power protection is provided by HP-supported
uninterruptible power supplies (UPS), such as HP PowerTrust.