Managing Serviceguard A.11.20, March 2013
Although all nodes perform some cluster management functions, the cluster coordinator is the
central point for inter-node communication.
Configuring the Cluster
The system administrator sets up cluster configuration parameters and does an initial cluster startup;
thereafter, the cluster regulates itself without manual intervention in normal operation. Configuration
parameters for the cluster include the cluster name and nodes, networking parameters for the cluster
heartbeat, cluster lock information, and timing parameters (discussed in the chapter “Planning and
Documenting an HA Cluster” (page 97)). You can set cluster parameters using Serviceguard
Manager or by editing the cluster configuration file (see Chapter 5: “Building an HA Cluster
Configuration” (page 163)). The parameters you enter are used to build a binary configuration file
that is propagated to all nodes in the cluster. This binary cluster configuration file must be
identical on every node in the cluster.
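The requirement that every node hold the same configuration can be illustrated with a simple consistency check. The following Python sketch is hypothetical and does not read Serviceguard's actual binary file, whose format is not described here. Instead it models each node's copy as a dictionary of parameter names and values, reduces each copy to a SHA-256 digest, and reports whether all the digests match. The function names config_digest and configs_consistent, the node names, and the parameter values are illustrative assumptions.

```python
import hashlib
import json


def config_digest(params: dict) -> str:
    """Return a SHA-256 hex digest of one node's copy of the parameters.

    json.dumps(sort_keys=True) yields a canonical text form, so two copies
    with the same contents produce the same digest regardless of key order.
    """
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def configs_consistent(per_node: dict) -> bool:
    """Return True if every node holds an identical configuration."""
    digests = {config_digest(cfg) for cfg in per_node.values()}
    return len(digests) <= 1


if __name__ == "__main__":
    # Illustrative values only; node names and settings are hypothetical.
    base = {
        "CLUSTER_NAME": "cluster1",
        "NODE_NAME": ["node1", "node2"],
        "MEMBER_TIMEOUT": 14000000,
    }
    copies = {"node1": dict(base), "node2": dict(base)}
    print(configs_consistent(copies))  # True
    copies["node2"]["MEMBER_TIMEOUT"] = 25000000
    print(configs_consistent(copies))  # False
```

Hashing a canonical form rather than comparing raw files keeps the check independent of parameter ordering; a real tool would compare the distributed binary file itself.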
Heartbeat Messages
Central to the operation of the cluster manager is the sending and receiving of heartbeat messages
among the nodes in the cluster. Each node in the cluster exchanges UDP heartbeat messages with
every other node over each monitored IP network configured as a heartbeat device. (LAN monitoring
is discussed later, in the section “Monitoring LAN Interfaces and Detecting Failure: Link Level”
(page 70).)
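The full-mesh pattern described above, in which every node exchanges heartbeats with every other node over each heartbeat network, can be enumerated in a short Python sketch. It assumes a hypothetical input that maps each node to the set of heartbeat networks it is configured on; the function name heartbeat_flows and the network labels are illustrative, not Serviceguard identifiers. With n nodes that all share k heartbeat networks, the sketch yields n * (n - 1) * k directed flows.

```python
from itertools import permutations


def heartbeat_flows(node_nets: dict) -> list:
    """List (sender, receiver, network) triples for the expected heartbeats.

    node_nets maps each node name to the set of heartbeat networks it is on.
    A heartbeat can only travel over a network that both nodes share.
    """
    flows = []
    for sender, receiver in permutations(sorted(node_nets), 2):
        for net in sorted(node_nets[sender] & node_nets[receiver]):
            flows.append((sender, receiver, net))
    return flows


if __name__ == "__main__":
    nets = {
        "node1": {"lan0", "lan1"},
        "node2": {"lan0", "lan1"},
        "node3": {"lan0", "lan1"},
    }
    flows = heartbeat_flows(nets)
    print(len(flows))  # 12 = 3 nodes * 2 peers each * 2 networks
    print(flows[0])    # ('node1', 'node2', 'lan0')
```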
If a cluster node does not receive heartbeat messages from all other cluster nodes within the
prescribed time, a cluster re-formation is initiated; see “What Happens when a Node Times Out”
(page 93). At the end of the re-formation, information about the new cluster membership is passed
to the package coordinator (described further in this chapter, in “How the Package Manager
Works” (page 50)). Failover packages that were running on nodes that are no longer in the new
cluster are transferred to their adoptive nodes.
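The sequence above (a missed heartbeat deadline, a new membership, and failover packages moving to adoptive nodes) can be sketched as a simplified, single-observer model. The Python below assumes one node's record of when it last heard from each peer, a fixed timeout in seconds, and a rule that picks the first surviving node in each package's node list as the adoptive node. Actual re-formation involves coordination among the surviving nodes and per-package failover settings, neither of which is modeled here; surviving_members, reassign_packages, and all node and package names are hypothetical.

```python
def surviving_members(last_heard: dict, now: float, timeout: float) -> set:
    """Nodes whose most recent heartbeat arrived within the timeout window."""
    return {node for node, t in last_heard.items() if now - t <= timeout}


def reassign_packages(placement: dict, node_order: dict, members: set) -> dict:
    """Move packages off departed nodes to the first surviving node in order.

    placement maps package -> current node; node_order maps package -> its
    ordered list of eligible nodes. A package with no surviving eligible
    node maps to None (it cannot be restarted anywhere).
    """
    new_placement = {}
    for pkg, node in placement.items():
        if node in members:
            new_placement[pkg] = node  # its node survived; leave it in place
        else:
            new_placement[pkg] = next(
                (n for n in node_order[pkg] if n in members), None
            )
    return new_placement


if __name__ == "__main__":
    heard = {"node1": 100.0, "node2": 99.5, "node3": 80.0}
    members = surviving_members(heard, now=100.0, timeout=14.0)
    print(sorted(members))  # ['node1', 'node2']
    placement = {"pkgA": "node3", "pkgB": "node1"}
    order = {"pkgA": ["node3", "node2", "node1"], "pkgB": ["node1", "node2"]}
    print(reassign_packages(placement, order, members))
    # {'pkgA': 'node2', 'pkgB': 'node1'}
```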
If heartbeat and data are sent over the same LAN subnet, data congestion may cause Serviceguard
to miss heartbeats and initiate a cluster re-formation that would not otherwise have been needed.
For this reason, HP recommends that you dedicate a LAN to the heartbeat in addition to configuring
heartbeat over the data network.
NOTE: You can no longer run the heartbeat on a serial (RS232) line or an FDDI or Token Ring
network.
Each node sends heartbeat messages at a rate that Serviceguard calculates from the value of the
MEMBER_TIMEOUT parameter. You set this parameter in the cluster configuration file, which you
create as part of configuring the cluster.
IMPORTANT: When multiple heartbeats are configured, heartbeats are sent in parallel;
Serviceguard must receive at least one heartbeat to establish the health of a node. HP recommends
that you configure all subnets that connect cluster nodes as heartbeat networks; this increases
protection against multiple faults at no additional cost.
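The rule that one heartbeat is enough to establish a node's health can be sketched as follows. This is an illustrative model, not Serviceguard code: it assumes a hypothetical per-network record of when a peer's last heartbeat arrived (None if nothing has arrived) and a timeout expressed in seconds. Each additional heartbeat network adds one more entry that can satisfy the check, which is why configuring every connecting subnet as a heartbeat network improves tolerance of multiple faults.

```python
from typing import Dict, Optional


def node_is_healthy(
    arrivals: Dict[str, Optional[float]], now: float, timeout: float
) -> bool:
    """Return True if at least one heartbeat network delivered in time.

    arrivals maps each heartbeat network to the time of the last heartbeat
    received from the peer on that network, or None if none has arrived.
    Because heartbeats are sent in parallel, one recent arrival is enough.
    """
    return any(
        t is not None and now - t <= timeout for t in arrivals.values()
    )


if __name__ == "__main__":
    # lan0 has failed, but lan1 still carries heartbeats: node stays healthy.
    print(node_is_healthy({"lan0": None, "lan1": 99.0}, now=100.0, timeout=14.0))  # True
    # Both networks silent for longer than the timeout: node is not healthy.
    print(node_is_healthy({"lan0": 70.0, "lan1": 75.0}, now=100.0, timeout=14.0))  # False
```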
Heartbeat IP addresses are usually on the same subnet on each node, but it is possible to configure
a cluster that spans subnets; see “Cross-Subnet Configurations” (page 29).
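Whether a set of heartbeat addresses falls on one subnet or several can be checked with Python's standard ipaddress module. The sketch below is a hypothetical helper, not part of Serviceguard: it assumes one heartbeat address per node and that each address is written with its prefix length (for example "192.168.1.10/24") so the subnet can be computed. It only detects that a configuration spans subnets; it does not check the additional requirements described in “Cross-Subnet Configurations”.

```python
import ipaddress


def heartbeat_subnets(heartbeat_ips: dict) -> set:
    """Return the set of subnets that the given heartbeat addresses fall in.

    Each value is an address with prefix length, e.g. "192.168.1.10/24".
    """
    return {ipaddress.ip_interface(addr).network for addr in heartbeat_ips.values()}


def is_cross_subnet(heartbeat_ips: dict) -> bool:
    """True if the nodes' heartbeat addresses are not all on one subnet."""
    return len(heartbeat_subnets(heartbeat_ips)) > 1


if __name__ == "__main__":
    same = {"node1": "192.168.1.10/24", "node2": "192.168.1.11/24"}
    split = {"node1": "192.168.1.10/24", "node2": "192.168.2.11/24"}
    print(is_cross_subnet(same))   # False
    print(is_cross_subnet(split))  # True
```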
For more information about heartbeat requirements, see the entry for HEARTBEAT_IP, under
“Cluster Configuration Parameters”. For timeout requirements and recommendations, see the
MEMBER_TIMEOUT parameter description in the same section. For troubleshooting information,
see “Cluster Re-formations Caused by MEMBER_TIMEOUT Being Set too Low” (page 340). See also
“Cluster Daemon: cmcld” (page 41).
How the Cluster Manager Works 45