Managing Serviceguard Eighteenth Edition, September 2010

SystemB recognizes that it has failed to get the cluster lock and so cannot re-form the cluster. To release all resources related to Package2 (such as exclusive access to volume group vg02 and the Package2 IP address) as quickly as possible, SystemB halts (system reset).

NOTE: If AUTOSTART_CMCLD in /etc/rc.config.d/cmcluster ($SGAUTOSTART) is set to zero, the node will not attempt to join the cluster when it comes back up.
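
As a rough illustration of the NOTE above, the following Python sketch checks whether AUTOSTART_CMCLD is set to zero in the text of a shell-style configuration file. Only the variable name and the file path come from this manual; the helper name autostart_disabled and the sample contents are hypothetical, and the parsing assumes simple NAME=value lines.

```python
def autostart_disabled(config_text):
    """Return True if AUTOSTART_CMCLD is set to 0, False if it is set to
    anything else, and None if the variable does not appear at all.

    Assumption: the file uses simple shell-style NAME=value lines, with
    '#' starting a comment. The last assignment in the file wins.
    """
    value = None
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line.startswith("AUTOSTART_CMCLD="):
            value = line.split("=", 1)[1].strip().strip('"')
    if value is None:
        return None
    return value == "0"


# Hypothetical file contents, for illustration only.
sample = "# cluster autostart setting\nAUTOSTART_CMCLD=0\n"
if autostart_disabled(sample):
    print("Node will not attempt to join the cluster when it comes back up")
```

On a real node, the text would be read from /etc/rc.config.d/cmcluster instead of the sample string.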

For more information on cluster failover, see the white paper Optimizing Failover Time in a Serviceguard Environment (version A.11.19 and later) at www.hp.com/go/hpux-serviceguard-docs. For troubleshooting information, see “Cluster Re-formations Caused by MEMBER_TIMEOUT Being Set too Low” (page 415).

Responses to Hardware Failures

If a serious system problem occurs, such as a system panic or physical disruption of the SPU's circuits, Serviceguard recognizes a node failure and transfers the failover packages currently running on that node to an adoptive node elsewhere in the cluster. (System multi-node and multi-node packages do not fail over.)

The new location for each failover package is determined by that package's configuration file, which lists primary and alternate nodes for the package. Transferring a package to another node does not transfer the program counter: processes in a transferred package restart from the beginning. For an application to restart quickly after a failure, it must be “crash-tolerant”; that is, all processes in the package must be written so that they can detect such a restart. This is the same application design required for restart after a normal system crash.
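
One common way to make a process able to detect a restart is to record its progress in a checkpoint file and read that file on startup. The Python sketch below is a minimal illustration of that idea, not a Serviceguard interface: the function names, the JSON checkpoint format, and the assumption that the checkpoint lives on storage the adoptive node can reach are all hypothetical.

```python
import json
import os


def load_checkpoint(path):
    """Return the saved state, or None if this is a clean first start."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return None


def save_checkpoint(path, state):
    """Write state to a temporary file, then rename it into place, so a
    crash never leaves a half-written checkpoint behind."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def run(path, items, process):
    """Process items in order, resuming after the last completed one.

    Returns True if an earlier, interrupted run was detected.
    If a crash happens after process() but before the checkpoint is
    saved, that item is processed again, so process() should be
    idempotent.
    """
    state = load_checkpoint(path)
    restarted = state is not None
    next_index = state["next_index"] if restarted else 0
    for i in range(next_index, len(items)):
        process(items[i])
        save_checkpoint(path, {"next_index": i + 1})
    return restarted
```

Because the program counter is not transferred, the restarted process always begins at the top of run(); the checkpoint is what lets it skip work that was already completed.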

In the event of a LAN interface failure, Serviceguard switches locally to a standby LAN interface if one exists. If a heartbeat LAN interface fails and no standby or redundant heartbeat is configured, the node fails with a system reset. If a monitored data LAN interface fails without a standby, the node fails with a system reset only if node_fail_fast_enabled (page 290) is set to YES for the package. Otherwise, any packages using that LAN interface are halted and moved to another node if possible (unless the LAN recovers immediately; see “When a Service, Subnet, or Monitored Resource Fails, or a Dependency is Not Met” (page 85)).
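
The LAN failure rules in the preceding paragraph can be summarized as a small decision function. The Python sketch below is only a model of that paragraph: the names Response and lan_failure_response are hypothetical, and it ignores the case in which the LAN recovers immediately.

```python
from enum import Enum


class Response(Enum):
    LOCAL_SWITCH = "switch locally to the standby LAN interface"
    NODE_RESET = "node fails with a system reset"
    HALT_AND_MOVE = "halt affected packages and move them if possible"


def lan_failure_response(is_heartbeat, has_standby,
                         has_redundant_heartbeat=False,
                         node_fail_fast_enabled=False):
    """Model the response to a failed LAN interface.

    Returns None for a heartbeat interface that has no standby but does
    have a redundant heartbeat, because the paragraph does not describe
    that outcome.
    """
    if has_standby:
        return Response.LOCAL_SWITCH
    if is_heartbeat:
        if has_redundant_heartbeat:
            return None  # not covered by the paragraph being modeled
        return Response.NODE_RESET
    # Monitored data LAN interface without a standby.
    if node_fail_fast_enabled:
        return Response.NODE_RESET
    return Response.HALT_AND_MOVE


# Data LAN fails, no standby, node_fail_fast_enabled left at NO.
print(lan_failure_response(is_heartbeat=False, has_standby=False).value)
```

Checking for a standby first mirrors the order of the text: a local switch is attempted whenever a standby exists, and the reset and halt rules apply only when it does not.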

Disk protection is provided by separate products, such as Mirrordisk/UX in LVM or Veritas mirroring in VxVM and related products. In addition, separately available EMS disk monitors allow you to notify operations personnel when a specific failure, such as a lock disk failure, takes place. Refer to the manual Using High Availability Monitors, which you can find at the address given in the preface to this manual.

Serviceguard does not respond directly to power failures, although a loss of power to an individual cluster component may appear to Serviceguard like the failure of that

118 Understanding Serviceguard Software Components