Managing Serviceguard Eighteenth Edition, September 2010

which is a permanent modification of the configuration files. Re-formation of the cluster
occurs under the following conditions (not a complete list):
•   An SPU or network failure was detected on an active node.
•   An inactive node wants to join the cluster. The cluster manager daemon has been
    started on that node.
•   A node has been added to or deleted from the cluster configuration.
•   The system administrator halted a node.
•   A node halts because of a package failure.
•   A node halts because of a service failure.
•   Heavy network traffic prohibited the heartbeat signal from being received by the
    cluster.
•   The heartbeat network failed, and another network is not configured to carry
    heartbeat.
Typically, re-formation results in a cluster with a different composition. The new cluster
may contain fewer or more nodes than the previous incarnation of the cluster.
Cluster Quorum to Prevent Split-Brain Syndrome
In general, the algorithm for cluster re-formation requires a cluster quorum of a strict
majority (that is, more than 50%) of the nodes previously running. If both halves (exactly
50%) of a previously running cluster were allowed to re-form, there would be a
split-brain situation in which two instances of the same cluster were running. In a
split-brain scenario, different incarnations of an application could end up simultaneously
accessing the same disks. One incarnation might well be initiating recovery activity
while the other is modifying the state of the disks. Serviceguard’s quorum requirement
is designed to prevent a split-brain situation.
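The strict-majority rule can be sketched as a simple predicate. This is an illustration only, not Serviceguard's actual implementation; the function name and arguments are hypothetical:

```python
def has_quorum(surviving_nodes: int, previous_nodes: int) -> bool:
    """Strict majority: more than 50% of the previously running nodes.

    Exactly 50% is NOT a quorum on its own; that case requires a
    tie-breaker (the cluster lock).
    """
    return 2 * surviving_nodes > previous_nodes

# A 4-node cluster splits 2/2: neither half has quorum by itself,
# so two instances of the cluster can never re-form simultaneously.
print(has_quorum(2, 4))  # False
print(has_quorum(3, 4))  # True
```

Using `2 * surviving_nodes > previous_nodes` rather than a floating-point comparison keeps the 50% boundary exact for any cluster size.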
Cluster Lock
Although a cluster quorum of more than 50% is generally required, exactly 50% of the
previously running nodes may re-form as a new cluster provided that the other 50% of
the previously running nodes do not also re-form. This is guaranteed by the use of a
tie-breaker to choose between the two equal-sized node groups, allowing one group
to form the cluster and forcing the other group to shut down. This tie-breaker is known
as a cluster lock. The cluster lock is implemented by means of a lock disk, lock LUN,
or a Quorum Server.
The cluster lock is used as a tie-breaker only for situations in which a running cluster
fails and, as Serviceguard attempts to form a new cluster, the cluster is split into two
sub-clusters of equal size. Each sub-cluster will attempt to acquire the cluster lock. The
sub-cluster that acquires the cluster lock will form the new cluster, preventing the
possibility of two sub-clusters running at the same time. If the two sub-clusters are of
unequal size, the sub-cluster with greater than 50% of the nodes will form the new
cluster, and the cluster lock is not used.
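The tie-breaker decision described above can be sketched as follows. The names are hypothetical; in practice the arbitration is performed by the lock disk, lock LUN, or Quorum Server:

```python
def may_form_cluster(sub_cluster_size: int, previous_nodes: int,
                     acquired_cluster_lock: bool) -> bool:
    """Decide whether a sub-cluster may re-form after a cluster split.

    - More than 50% of the previously running nodes: quorum alone
      suffices, and the cluster lock is not consulted.
    - Exactly 50%: only the sub-cluster that wins the cluster lock
      may form the new cluster; the other must shut down.
    - Less than 50%: the sub-cluster may never form the cluster.
    """
    if 2 * sub_cluster_size > previous_nodes:
        return True                      # clear majority
    if 2 * sub_cluster_size == previous_nodes:
        return acquired_cluster_lock     # tie broken by the cluster lock
    return False                         # minority partition halts

# 4-node cluster split 2/2: exactly one side wins the lock.
print(may_form_cluster(2, 4, acquired_cluster_lock=True))   # True
print(may_form_cluster(2, 4, acquired_cluster_lock=False))  # False
```

Because at most one sub-cluster can hold the lock, the two 50% halves can never both return `True`, which is precisely the split-brain guarantee.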
62 Understanding Serviceguard Software Components