Arbitration For Data Integrity in Serviceguard Clusters, July 2007
Arbitration in Disaster-Tolerant Clusters
Note that if the heartbeat is lost while the first lock disk is located
in the first data center, the nodes in that data center will normally
obtain the lock first because they are closest to the disk. In this
scenario, the first data center re-forms the cluster.
3. If a node in one data center obtains the first lock disk but the disk
link is not viable because the other data center is down, the first
data center cannot obtain the second lock disk. Because that lock was
not refused, however, the first data center is still allowed to
re-form the cluster. This is the expected behavior when there is a
disaster.
4. If there is a loss of both heartbeat and disk link, there is a danger of
split brain because each sub-cluster, attempting to acquire both lock
disks, is able to obtain the lock in its own data center, and is not
refused the other lock. It is important to minimize or eliminate this
slight danger by ensuring that data and heartbeat links are
separately routed between data centers.
NOTE A dual lock disk configuration does not provide a redundant cluster lock.
In fact, the dual lock is a compound lock, and both disks have to
participate in the protocol of lock acquisition by the two equal-sized sets
of nodes. Even when mirrored LVM is used via MirrorDisk/UX, the lock
disk area is not mirrored.
At cluster formation time, a set of nodes must gain access to one disk,
and must either gain access to the other disk or not be denied access to it.
(“Not being denied” occurs when a disk is not accessible to a set of nodes.)
The group of nodes that gains access to at least one disk and is not
denied access by any disk is allowed to form the new cluster.
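
The formation rule above can be expressed as a small decision function.
The following Python sketch is illustrative only and is not Serviceguard
code: the names LockResult and may_form_cluster are hypothetical, and
the three outcomes per disk (acquired, denied, unreachable) are an
assumed model of the behavior described in the text. It covers only the
rule as stated here and does not model a failed lock disk.

```python
from enum import Enum


class LockResult(Enum):
    """Assumed outcome of one lock attempt by a group of nodes (hypothetical model)."""
    ACQUIRED = "acquired"        # the group gained access to the lock disk
    DENIED = "denied"            # the disk was reachable but refused the lock
    UNREACHABLE = "unreachable"  # the disk was not accessible, so nothing was refused


def may_form_cluster(results):
    """Return True if a group with these per-disk results may form the cluster.

    Rule from the text: the group must gain access to at least one disk
    and must not be denied access by any disk.
    """
    results = list(results)
    acquired_any = any(r is LockResult.ACQUIRED for r in results)
    denied_any = any(r is LockResult.DENIED for r in results)
    return acquired_any and not denied_any


# Disaster case: the other data center is down, so its disk is unreachable.
survivor = [LockResult.ACQUIRED, LockResult.UNREACHABLE]

# Loss of both heartbeat and disk link: each sub-cluster sees the same thing.
sub_cluster_a = [LockResult.ACQUIRED, LockResult.UNREACHABLE]
sub_cluster_b = [LockResult.UNREACHABLE, LockResult.ACQUIRED]

# Heartbeat lost but disk link intact: the slower group is refused a lock.
loser = [LockResult.DENIED, LockResult.ACQUIRED]
```

In this model the survivor of a disaster passes the check, but so do
both sub-clusters when heartbeat and disk link are lost together, which
is the split-brain exposure described in item 4. The group that is
refused a lock fails the check.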
If one of the dual lock disks fails, Serviceguard will detect this when it
carries out periodic checking, and it will write a message to the syslog
file. After the loss of one of the lock disks, if the failure of a cluster node
results in the need for arbitration, the cluster will go down.