Arbitration For Data Integrity in Serviceguard Clusters, July 2007

How Serviceguard Uses Arbitration
Serviceguard employs a lock disk, a quorum server, or arbitrator nodes to
provide definitive arbitration to prevent split-brain conditions. This
section describes how the software handles cluster formation and
re-formation and supplies arbitration when necessary.
Cluster Startup
The cluster manager is used to initialize a cluster, to monitor the
health of the cluster, to recognize node failure if it should occur, and to
regulate the re-formation of the cluster when a node joins or leaves the
cluster. The cluster manager operates as a daemon process that runs on
each node. During cluster startup and re-formation activities, one node is
selected to act as the cluster coordinator. Although all nodes perform
some cluster management functions, the cluster coordinator is the
central point for inter-node communication.
Startup and Re-Formation
The cluster can start when the cmruncl command is issued from the
command line. All nodes in the cluster must be present for cluster
startup to complete. If not all nodes are present, the cluster must be
started by issuing commands that name the specific group of nodes to
start. This requirement keeps two separate groups of nodes from each
forming a cluster, which would be a split-brain situation.
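The startup rule above can be sketched as a small decision function. This
is only an illustration, not part of Serviceguard: the function name
startup_command and the node names are hypothetical, and the sketch assumes
that the -n option of cmruncl names the nodes to start and that -v requests
verbose output (check the cmruncl man page for your release). Choosing a
partial node group remains an operator decision; the function only shows
the shape of the resulting command.

```python
def startup_command(configured_nodes, present_nodes):
    """Return an argv list for starting the cluster (illustrative only).

    configured_nodes: every node named in the cluster configuration.
    present_nodes:    the nodes the operator knows to be up and reachable.
    """
    configured = list(configured_nodes)
    available = set(present_nodes)
    # Keep configuration order and ignore nodes that are not cluster members.
    present = [node for node in configured if node in available]

    if not present:
        raise ValueError("no configured node is available to start")

    argv = ["cmruncl", "-v"]  # assumption: -v is the verbose flag
    if len(present) == len(configured):
        # Every configured node is present, so a plain cmruncl can complete.
        return argv

    # Partial startup: the node group must be named explicitly.
    for node in present:
        argv += ["-n", node]  # assumption: -n names a node to start
    return argv


if __name__ == "__main__":
    # Hypothetical three-node cluster with one node down.
    print(" ".join(startup_command(["node1", "node2", "node3"],
                                   ["node1", "node3"])))
```

Before running a partial startup like this, the operator should be certain
that the omitted nodes are not already running as a cluster, since that is
exactly the split-brain case the all-nodes requirement guards against.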
Cluster re-formation occurs any time a node joins or leaves a running
cluster. This can follow the reboot of an individual node, or the
failure of all nodes in a cluster, as when an extended power failure
brings down all SPUs.
Automatic cluster startup takes place if the flag AUTOSTART_CMCLD is
set to 1 in the /etc/rc.config.d/cmcluster file. When any node
reboots with this parameter set to 1, it rejoins an existing cluster or,
if none exists, attempts to form a new cluster. As with the cmruncl
command, automatic initial startup requires all nodes in the cluster to be
present. If not all nodes are present, the cluster must be started
manually with commands that specify the group of nodes to start.
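The automatic startup behavior can be summarized in a short sketch. It is a
model of the rules stated above, not Serviceguard code: the function names
autostart_enabled and reboot_action and the returned strings are
hypothetical, and the parser assumes the file uses plain shell-style
NAME=value assignments.

```python
def autostart_enabled(config_text):
    """Return True if AUTOSTART_CMCLD is set to 1 in the given file text."""
    enabled = False
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if line.startswith("AUTOSTART_CMCLD="):
            value = line.split("=", 1)[1].strip().strip('"')
            # Assumption: as in a sourced shell file, the last assignment wins.
            enabled = (value == "1")
    return enabled


def reboot_action(autostart, cluster_running, all_nodes_present):
    """Describe what a rebooting node does under the rules above."""
    if not autostart:
        return "no automatic startup"
    if cluster_running:
        return "rejoin existing cluster"
    if all_nodes_present:
        return "form new cluster"
    # Initial automatic startup cannot complete without every node.
    return "manual startup required"


if __name__ == "__main__":
    sample = "# Serviceguard startup flag\nAUTOSTART_CMCLD=1\n"
    print(reboot_action(autostart_enabled(sample),
                        cluster_running=False,
                        all_nodes_present=False))
```

The last branch is the case the text calls out: with the flag set but some
nodes missing and no cluster running, the node cannot form the cluster on
its own, and an operator has to start it with explicit commands.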