Installation guide

2.1.1. Quorum Disks
A quorum disk or partition is a section of a disk that's set up for use with components of the cluster
project. It has a couple of purposes. Again, I'll explain with an example.
Suppose you have nodes A and B, and node A fails to get several of the cluster manager's "heartbeat"
packets from node B. Node A doesn't know why it hasn't received the packets, but there are several
possibilities: node B has failed, the network switch or hub has failed, node A's network adapter
has failed, or node B was simply too busy to send the packets. That last case can happen if
your cluster is extremely large, your systems are extremely busy, or your network is flaky.
Node A doesn't know which is the case, and it doesn't know whether the problem lies within itself or
with node B. This is especially problematic in a two-node cluster because both nodes, out of touch
with one another, can try to fence the other.
So before fencing a node, it would be nice to have another way to check if the other node is really
alive, even though we can't seem to contact it. A quorum disk gives you the ability to do just that.
Before fencing a node that's out of touch, the cluster software can check whether the node is still
alive based on whether it has written data to the quorum partition.
In the case of two-node systems, the quorum disk also acts as a tie-breaker. If a node has access to
the quorum disk and the network, that counts as two votes.
A node that has lost contact with the network or the quorum disk has lost a vote, and therefore may
safely be fenced.
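
The vote counting described above can be illustrated with a small Python sketch. This is a
simplified model of the two-node case only, not CMAN's actual vote accounting; the function
names node_votes and may_be_fenced are hypothetical and chosen for illustration.

```python
def node_votes(network_ok: bool, quorum_disk_ok: bool) -> int:
    """Count votes in a simplified two-node model.

    Assumption drawn from the text: one vote for network access and
    one vote for quorum disk access. Not CMAN's real implementation.
    """
    return int(network_ok) + int(quorum_disk_ok)


def may_be_fenced(network_ok: bool, quorum_disk_ok: bool) -> bool:
    """A node that has lost either vote is a candidate for fencing."""
    return node_votes(network_ok, quorum_disk_ok) < 2


if __name__ == "__main__":
    # Walk through the healthy case and the two single-failure cases.
    for net, disk in [(True, True), (False, True), (True, False)]:
        print(net, disk, node_votes(net, disk), may_be_fenced(net, disk))
```

In this model, a node that keeps both the network and the quorum disk holds two votes and
survives, while a node that has lost either one holds fewer than two and may be fenced.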
Further information about configuring quorum disk parameters is provided in the chapters on Conga
and ccs administration in the Cluster Administration manual.
2.1.2. Tie-breakers
Tie-breakers are additional heuristics that allow a cluster partition to decide whether or not it is
quorate in the event of an even split, prior to fencing. A typical tie-breaker construct is an IP
tie-breaker, sometimes called a ping node.
With such a tie-breaker, nodes monitor not only each other but also an upstream router that is on the
same path as cluster communications. If the two nodes lose contact with each other, the one that
wins is the one that can still ping the upstream router. Of course, there are cases, such as a
switch loop, where it is possible for two nodes to see the upstream router but not each other, causing
what is called a split brain. That is why, even when using tie-breakers, it is important to ensure that
fencing is configured correctly.
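
The IP tie-breaker decision can be sketched in Python as follows. This is a minimal
illustration of the logic described above, assuming a two-node cluster whose nodes have
already lost contact with each other; the function name ip_tiebreaker_winner is hypothetical,
and the reachability results are passed in rather than measured.

```python
from typing import Optional


def ip_tiebreaker_winner(a_sees_router: bool, b_sees_router: bool) -> Optional[str]:
    """Pick the winner after nodes A and B lose contact with each other.

    Returns "A" or "B" when exactly one node can still reach the
    upstream router. Returns None when the tie-breaker cannot decide:
    either both nodes see the router (a possible split brain) or
    neither does.
    """
    if a_sees_router and not b_sees_router:
        return "A"
    if b_sees_router and not a_sees_router:
        return "B"
    return None


if __name__ == "__main__":
    print(ip_tiebreaker_winner(True, False))  # only A reaches the router
    print(ip_tiebreaker_winner(True, True))   # undecided: both reach it
```

The None result covers the case in which both nodes still reach the router, which is the
situation where correctly configured fencing remains essential.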
Other types of tie-breakers use a shared partition, often called a quorum disk, to provide
additional details. clumanager 1.2.x (Red Hat Cluster Suite 3) had a disk tie-breaker that allowed
operation if the network went down, as long as both nodes were still communicating over the shared
partition.
More complex tie-breaker schemes exist, such as QDisk (part of linux-cluster). QDisk allows arbitrary
heuristics to be specified. These allow each node to determine its own fitness for participation in the
cluster. It is often used as a simple IP tie-breaker, however. See the qdisk(5) manual page for more
information.
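
The idea of a node judging its own fitness from a set of heuristics can be sketched in Python.
This is a toy model only: the scoring rule, the min_score threshold, and the names
fitness_score and is_fit are assumptions made for illustration, and plain Python callables
stand in for whatever checks a node would actually run. The qdisk(5) manual page remains the
authority on how QDisk is configured and how it evaluates heuristics.

```python
from typing import Callable, List, Tuple

# A heuristic is modelled as (check, score): a callable that returns
# True on success, plus the score it contributes when it succeeds.
Heuristic = Tuple[Callable[[], bool], int]


def fitness_score(heuristics: List[Heuristic]) -> int:
    """Sum the scores of all heuristics whose check succeeds."""
    return sum(score for check, score in heuristics if check())


def is_fit(heuristics: List[Heuristic], min_score: int) -> bool:
    """Treat the node as fit if its total score reaches min_score."""
    return fitness_score(heuristics) >= min_score


if __name__ == "__main__":
    # Hypothetical checks: one passes, one fails.
    checks: List[Heuristic] = [(lambda: True, 1), (lambda: False, 2)]
    print(fitness_score(checks), is_fit(checks, 1), is_fit(checks, 2))
```

A single check that tests whether an upstream router is reachable would reduce this model to
the simple IP tie-breaker described earlier.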
CMAN has no internal tie-breakers for various reasons. However, tie-breakers can be implemented
using the API. This API allows quorum device registration and updating. For an example, look at the
QDisk source code.
You might need a tie-breaker if you: