Managing Serviceguard 12th Edition, March 2006

Understanding Serviceguard Software Components
Responses to Failures
Serviceguard responds to different kinds of failures in specific ways. For
most hardware failures, the response is not user-configurable, but for
package and service failures, you can choose the system's response,
within limits.
Transfer of Control (TOC) When a Node Fails
The most dramatic response to a failure in a Serviceguard cluster is an
HP-UX TOC (Transfer of Control), which is an immediate halt of the
SPU without a graceful shutdown. This TOC is done to protect the
integrity of your data.
A TOC occurs if a cluster node cannot communicate with the majority of
cluster members for the predetermined time, or if there is a kernel hang,
a kernel spin, a runaway real-time process, or a failure of the
Serviceguard cluster daemon, cmcld. During this event, a system dump is
performed and the following message is sent to the console:
Serviceguard: Unable to maintain contact with cmcld daemon.
Performing TOC to ensure data integrity.
A TOC is also initiated by Serviceguard itself under specific
circumstances. If the service failfast parameter is enabled in the package
configuration file, the entire node will fail with a TOC whenever there is
a failure of that specific service. If NODE_FAIL_FAST_ENABLED is set to
YES in the package configuration file, the entire node will fail with a TOC
whenever there is a timeout or a failure causing the package control
script to exit with a value other than 0 or 1. In addition, a node-level
failure may also be caused by events independent of a package and its
services. Loss of the heartbeat or loss of the cluster daemon (cmcld) or
other critical daemons will cause a node to fail even when its packages
and their services are functioning.
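
The conditions described above can be summarized as a small decision function. The following Python sketch is illustrative only and is not Serviceguard code: the event names, the PackageConfig class, and the node_should_toc function are hypothetical, and the field service_fail_fast_enabled merely stands in for the service failfast parameter mentioned in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PackageConfig:
    # Mirrors NODE_FAIL_FAST_ENABLED in the package configuration file.
    node_fail_fast_enabled: bool = False
    # Hypothetical field standing in for the "service failfast parameter".
    service_fail_fast_enabled: bool = False


# Node-level events that lead to a TOC regardless of package settings
# (event names are assumptions made for this sketch).
NODE_LEVEL_EVENTS = {
    "heartbeat_lost",
    "cmcld_failed",
    "kernel_hang",
    "kernel_spin",
    "runaway_realtime_process",
}


def node_should_toc(event: str, cfg: PackageConfig,
                    script_exit_code: Optional[int] = None) -> bool:
    """Return True if the described rules would halt the node with a TOC."""
    if event in NODE_LEVEL_EVENTS:
        return True
    if event == "service_failed":
        # Only a service with failfast enabled brings down the whole node.
        return cfg.service_fail_fast_enabled
    if event == "control_script_timeout":
        return cfg.node_fail_fast_enabled
    if event == "control_script_exit":
        # Exit values 0 and 1 do not trigger a TOC.
        return cfg.node_fail_fast_enabled and script_exit_code not in (0, 1)
    return False
```

For example, with NODE_FAIL_FAST_ENABLED set to YES, a control script that exits with a value of 2 would halt the node in this model, while an exit value of 1 would not.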
In very few cases, an attempt is first made to reboot the system before
the TOC. If the reboot is able to complete before the safety timer expires,
then the TOC will not take place. In either case, packages are able to
move quickly to another node.