Arbitration For Data Integrity in Serviceguard Clusters, July 2007
How Serviceguard Uses Arbitration
Dynamic Cluster Re-Formation
A dynamic re-formation is a temporary change in cluster membership
that takes place as nodes join or leave a running cluster. Re-formation
differs from reconfiguration, which is a permanent modification of the
configuration files. Re-formation of the cluster occurs under the following
conditions (not a complete list):
• An SPU or network failure is detected on an active node.
• An inactive node wants to join the cluster. The cluster manager
daemon has been started on that node.
• The system administrator halts a node.
• A node halts because of a package failure.
• A node halts because of a service failure.
• Heavy network traffic prevents the heartbeat signal from being
received by the cluster.
• The heartbeat network fails, and no other network is configured
to carry heartbeat.
Typically, re-formation results in a cluster with a different composition.
The new cluster may contain fewer or more nodes than the previous
incarnation of the cluster.
Cluster Quorum and Cluster Locking
Recall that the algorithm for cluster re-formation requires a cluster
quorum of a strict majority (that is, more than 50%) of the nodes
previously running. If both halves of a previously running cluster (each
with exactly 50% of the nodes) were allowed to re-form, the result would
be a split-brain situation in which two instances of the same cluster
were running.
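
The quorum rule can be illustrated with a short sketch. This is not Serviceguard code; the function name quorum_status and its return labels are assumptions chosen for illustration. It classifies a group of surviving nodes by comparing the group size with the number of nodes previously running.

```python
def quorum_status(group_size: int, previous_size: int) -> str:
    """Classify a group of nodes against the previously running cluster.

    Hypothetical helper for illustration only; not part of Serviceguard.
    """
    if previous_size <= 0 or not 0 <= group_size <= previous_size:
        raise ValueError("group_size must be between 0 and previous_size")
    # Compare 2 * group_size with previous_size to avoid fractions.
    if 2 * group_size > previous_size:
        return "majority"  # more than 50%: quorum is met
    if 2 * group_size == previous_size:
        return "tie"       # exactly 50%: quorum alone cannot decide
    return "minority"      # less than 50%: quorum is not met


# A four-node cluster that splits into groups of 3, 2, or 1 nodes.
for size in (3, 2, 1):
    print(size, "of 4 nodes:", quorum_status(size, 4))
```

The "tie" result is the case that would lead to split-brain if each half were allowed to re-form on its own.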
Cluster Lock
Although a cluster quorum of more than 50% is generally required,
Serviceguard allows exactly 50% of the previously running nodes to
re-form as a new cluster provided that the other 50% of the previously
running nodes do not also re-form. This is guaranteed by the use of an
arbiter or tie-breaker to choose between the two equal-sized node groups,
allowing one group to form the cluster and forcing the other group to
shut down. This type of arbitration is known as a cluster lock.
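
As a rough sketch of the tie-breaking idea, the following code models an arbiter that grants the lock to whichever group claims it first. The class ClusterLock, the function may_reform, and the group labels are hypothetical names introduced here; the sketch assumes a simple first-claimant-wins policy and does not model how Serviceguard actually implements a cluster lock.

```python
import threading
from typing import Optional


class ClusterLock:
    """Hypothetical arbiter: the first group to claim the lock keeps it."""

    def __init__(self) -> None:
        self._mutex = threading.Lock()
        self._owner: Optional[str] = None

    def try_claim(self, group: str) -> bool:
        # The mutex makes the check-and-set atomic, so two groups
        # can never both become the owner.
        with self._mutex:
            if self._owner is None:
                self._owner = group
            return self._owner == group


def may_reform(group: str, group_size: int, previous_size: int,
               lock: ClusterLock) -> bool:
    """Return True if this group is allowed to re-form the cluster."""
    if 2 * group_size > previous_size:
        return True                   # strict majority: no arbiter needed
    if 2 * group_size == previous_size:
        return lock.try_claim(group)  # exactly 50%: the arbiter decides
    return False                      # less than 50%: must not re-form


# A four-node cluster splits into two groups of two nodes each.
lock = ClusterLock()
print("group A re-forms:", may_reform("A", 2, 4, lock))
print("group B re-forms:", may_reform("B", 2, 4, lock))
```

In this sketch, only the first of the two equal-sized groups to reach the arbiter is allowed to re-form; the other is refused and would have to shut down.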