Arbitration For Data Integrity in Serviceguard Clusters, July 2007
How Serviceguard Uses Arbitration
The cluster lock is used as a tie-breaker only when a running cluster
fails and, as Serviceguard attempts to form a new cluster, the cluster is
split into two sub-clusters of equal size. Each sub-cluster attempts to
acquire the cluster lock, and the sub-cluster that succeeds forms the new
cluster, preventing two sub-clusters from running at the same time. If
the two sub-clusters are of unequal size, the sub-cluster with greater
than 50% of the nodes forms the new cluster, and the cluster lock is not
used.
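The decision rule above can be sketched in a few lines of Python. This is an illustrative model only, not Serviceguard's actual implementation; the function name and parameters are hypothetical.

```python
# Hypothetical model of the quorum decision a sub-cluster makes during
# cluster re-formation: a clear majority wins outright, an exact 50/50
# split is resolved by the cluster lock, and a minority must halt.

def may_form_cluster(sub_cluster_size, total_nodes, acquires_lock):
    """Return True if this sub-cluster may form the new cluster.

    acquires_lock: callable returning True if this sub-cluster wins the
    race for the cluster lock (consulted only on an exact 50/50 split).
    """
    if 2 * sub_cluster_size > total_nodes:
        return True               # majority: no tie-breaker needed
    if 2 * sub_cluster_size == total_nodes:
        return acquires_lock()    # equal split: cluster lock breaks the tie
    return False                  # minority: this sub-cluster must halt
```

For example, in a four-node cluster split 2–2, only the sub-cluster that wins the lock race returns True; a 3–1 split never consults the lock.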
If you have a two-node cluster, you must configure a cluster lock. If
communication is lost between the two nodes, the node that obtains the
cluster lock takes over the cluster and the other node undergoes a forced
halt. Without a cluster lock, a failure of either node would force an
immediate system halt of the other node, and the cluster as a whole would
halt.
If the cluster lock itself fails or is unavailable during an attempt to
acquire it, the cluster halts. You can avoid this problem by configuring
the cluster's hardware so that no single event can cause the loss of the
cluster lock together with a failure in another cluster component.
No Cluster Lock
Normally, you should not configure a cluster of three or fewer nodes
without a cluster lock. In two-node clusters, a cluster lock is required.
You may consider omitting the cluster lock in configurations of three or
more nodes, but bear in mind that any cluster may later require
tie-breaking. For example, if one node in a three-node cluster is removed
for maintenance, the cluster re-forms as a two-node cluster. If a
tie-breaking scenario then arises because of a node or communication
failure, the entire cluster becomes unavailable.
In a cluster with four or more nodes, you may not need a cluster lock
since the chance of the cluster being split into two halves of equal size is
very small. However, be sure to configure your cluster so that exactly
half the nodes cannot fail at one time. For example, make sure there is
no potential single point of failure, such as a single LAN connecting
equal numbers of nodes, and use multiple power circuits with less than
half of the nodes on any single power circuit.
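The power-circuit guideline above can be expressed as a simple check. The function below is an illustrative sketch, not a Serviceguard tool; it verifies that no single circuit carries half or more of the cluster's nodes, so the loss of one circuit can never take down exactly half (or more) of the cluster at once.

```python
# Illustrative check (hypothetical, not part of Serviceguard): verify
# that fewer than half of the cluster's nodes sit on any single power
# circuit, per the configuration guideline in the text.

from collections import Counter

def power_layout_ok(node_to_circuit):
    """node_to_circuit maps each node name to its power circuit.

    Returns True if every circuit carries strictly less than half of
    the cluster's nodes.
    """
    total = len(node_to_circuit)
    per_circuit = Counter(node_to_circuit.values())
    return all(2 * count < total for count in per_circuit.values())
```

For a five-node cluster spread 2-2-1 across three circuits the check passes; a four-node cluster with two nodes on one circuit fails it, because losing that circuit would drop exactly half the nodes.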
Cluster lock disks are not allowed in clusters of more than four nodes. A
quorum server or arbitrator nodes may be used with larger clusters, and
this kind of arbitration is required for extended-distance clusters and
MetroCluster configurations to provide disaster tolerance.