Managing Serviceguard 12th Edition, March 2006

Chapter 3: Understanding Serviceguard Software Components
How the Cluster Manager Works
The cluster manager is used to initialize a cluster, to monitor the
health of the cluster, to detect node failure, and to regulate the
re-formation of the cluster when a node joins or leaves it. The cluster
manager operates as a daemon process that runs on
each node. During cluster startup and re-formation activities, one node is
selected to act as the cluster coordinator. Although all nodes perform
some cluster management functions, the cluster coordinator is the
central point for inter-node communication.
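The coordinator role can be illustrated with a short Python sketch. The manual does not say how the coordinator is chosen, so the selection rule below (lowest numeric node ID wins) is purely hypothetical. The `Node` type, its `node_id` field, and the `select_coordinator` and `reform` helpers are also hypothetical. The sketch shows one point: a deterministic rule lets every member agree on the same coordinator after each re-formation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    node_id: int  # hypothetical numeric ID, not a documented Serviceguard field


def select_coordinator(members):
    """Pick a coordinator from the current members.

    Hypothetical rule: the lowest node_id wins. Because the rule is
    deterministic, every node evaluating it over the same membership
    reaches the same answer.
    """
    if not members:
        raise ValueError("cannot select a coordinator for an empty cluster")
    return min(members, key=lambda n: n.node_id)


def reform(members, joining=(), leaving=()):
    """Return the new membership and its coordinator after nodes join or leave."""
    leaving_names = {n.name for n in leaving}
    new_members = {n for n in members if n.name not in leaving_names} | set(joining)
    return new_members, select_coordinator(new_members)


a, b, c = Node("nodeA", 1), Node("nodeB", 2), Node("nodeC", 3)
members, coordinator = reform({a, b, c})
print(coordinator.name)  # nodeA
members, coordinator = reform(members, leaving=[a])
print(coordinator.name)  # nodeB
```

Any deterministic rule would serve for this illustration. Serviceguard's actual selection logic may differ.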
Configuration of the Cluster
The system administrator sets up cluster configuration parameters and
does an initial cluster startup; thereafter, the cluster regulates itself
without manual intervention in normal operation. Configuration
parameters for the cluster include the cluster name and nodes,
networking parameters for the cluster heartbeat, cluster lock
information, and timing parameters (discussed in the chapter “Planning
and Documenting an HA Cluster” on page 131). Cluster parameters are
entered using Serviceguard Manager or by editing the cluster ASCII
configuration file (see Chapter 5, “Building an HA Cluster
Configuration,” on page 187). The parameters you enter are used to build
a binary configuration file, which is propagated to all nodes in the
cluster. This binary cluster configuration file must be identical on all
nodes in the cluster.
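The following Python sketch illustrates why the binary file must match everywhere. It parses a simplified ASCII file, encodes it, and compares SHA-256 digests of each node's copy. The parameter names in the sample (`CLUSTER_NAME`, `NODE_NAME`, `HEARTBEAT_INTERVAL`) are modeled on the cluster ASCII file. The values, the `KEY value` parsing rules, and the hypothetical `build_binary` encoding are all assumptions. This is not Serviceguard's actual binary format or distribution mechanism.

```python
import hashlib


def parse_ascii_config(text):
    """Parse simplified 'KEY value' lines into an ordered list of pairs.

    Assumptions: '#' starts a comment, each non-blank line holds one
    parameter, and repeated keys (one NODE_NAME per node) keep their order.
    """
    params = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        params.append((parts[0], value))
    return params


def build_binary(params):
    """Encode parameters as bytes. A stand-in, NOT Serviceguard's binary format."""
    return "\n".join(f"{key}={value}" for key, value in params).encode("utf-8")


def configs_consistent(binaries_by_node):
    """True if at least one copy exists and every node's copy is byte-identical."""
    digests = {hashlib.sha256(blob).hexdigest() for blob in binaries_by_node.values()}
    return len(digests) == 1


ascii_text = """
CLUSTER_NAME        cluster1    # hypothetical values throughout
NODE_NAME           nodeA
NODE_NAME           nodeB
HEARTBEAT_INTERVAL  1000000
"""
binary = build_binary(parse_ascii_config(ascii_text))
copies = {"nodeA": binary, "nodeB": binary}
print(configs_consistent(copies))  # True

# nodeB ends up with a copy built from a different ASCII file.
copies["nodeB"] = build_binary(parse_ascii_config(ascii_text + "NODE_NAME nodeC\n"))
print(configs_consistent(copies))  # False
```

Comparing digests rather than whole files keeps the check cheap when copies live on different hosts. Only the short hash would need to travel between nodes.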
Heartbeat Messages
Central to the operation of the cluster manager is the sending and
receiving of heartbeat messages among the nodes in the cluster. Each
node in the cluster exchanges heartbeat messages with the cluster
coordinator over each monitored TCP/IP network or RS232 serial line
configured as a heartbeat device. (LAN monitoring is discussed further
in the section “Monitoring LAN Interfaces and Detecting Failure” on
page 103.)
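The exchange between each node and the coordinator can be sketched as a coordinator-side table keyed by node and heartbeat network. This is a hypothetical Python illustration. The `HeartbeatTable` class, the network names `lan0` and `lan1`, the timeout value, and the rule that a heartbeat on any one network is enough to treat a node as reachable are all assumptions, not Serviceguard internals. An injectable clock makes the behavior easy to demonstrate.

```python
import time


class HeartbeatTable:
    """Coordinator-side record of heartbeats, one entry per (node, network).

    Assumption: a node counts as reachable if a heartbeat arrived on ANY of
    its heartbeat networks within timeout_s seconds.
    """

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen = {}  # (node, network) -> time of last heartbeat

    def record(self, node, network):
        """Note a heartbeat from `node` arriving on `network`."""
        self.last_seen[(node, network)] = self.clock()

    def unreachable(self, nodes):
        """Return, sorted, the nodes with no recent heartbeat on any network."""
        now = self.clock()
        alive = {
            node
            for (node, _network), seen in self.last_seen.items()
            if now - seen <= self.timeout_s
        }
        return sorted(set(nodes) - alive)


fake_now = [0.0]
table = HeartbeatTable(timeout_s=2.0, clock=lambda: fake_now[0])
table.record("nodeB", "lan0")
table.record("nodeC", "lan1")
fake_now[0] = 1.5
table.record("nodeB", "lan1")  # nodeB is still heard on a second network
fake_now[0] = 3.0
print(table.unreachable(["nodeB", "nodeC"]))  # ['nodeC']
```

A node reported by `unreachable` has gone silent on every heartbeat path for longer than the timeout. That corresponds to the condition that triggers re-formation in the paragraph that follows.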
If a cluster node does not receive heartbeat messages from all other
cluster nodes within the prescribed time, a cluster re-formation is
initiated. At the end of the re-formation, if a new set of nodes form a