Managing Serviceguard 14th Edition, June 2007
Understanding Serviceguard Software Components
How the Cluster Manager Works
If one of the dual lock disks fails, Serviceguard will detect this when it
carries out periodic checking, and it will write a message to the syslog
file. After the loss of one of the lock disks, the failure of a cluster node
could cause the cluster to go down if the remaining node(s) cannot access
the surviving cluster lock disk.
Use of the Quorum Server as the Cluster Lock
A quorum server can be used in clusters of any size. The quorum server
process runs on a machine outside of the cluster for which it is providing
quorum services. The quorum server listens for connection requests from
the Serviceguard nodes on a known port. The server maintains a special
area in memory for each cluster, and when a node obtains the cluster
lock, this area is marked so that other nodes will recognize the lock as
“taken.” If communications are lost between two equal-sized groups of
nodes, the group that obtains the lock from the quorum server will take
over the cluster and the other nodes will perform a system reset. Without
a cluster lock, a failure of either group of nodes will cause the other
group, and therefore the cluster, to halt. Note also that if the quorum
server is not available during an attempt to access it, the cluster will
halt.
The operation of the quorum server is shown in Figure 3-3. When there
is a loss of communication between node 1 and node 2, the quorum server
chooses one node (in this example, node 2) to continue running in the
cluster. The other node halts.
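
The arbitration described above can be illustrated with a small simulation. This is a minimal sketch only, not Serviceguard or Quorum Server code: the class QuorumServerSketch, the function resolve_split, and the action strings "continue", "reset", and "halt" are hypothetical names chosen for this example. It assumes that the first group to reach the server obtains the lock, and it models an unavailable quorum server by passing None.

```python
import threading


class QuorumServerSketch:
    """Toy model of a quorum server: one in-memory lock area per cluster."""

    def __init__(self):
        self._owner = {}                 # cluster name -> group holding the lock
        self._mutex = threading.Lock()   # serialize concurrent lock requests

    def request_lock(self, cluster, group):
        """Return True if `group` obtains (or already holds) the cluster lock."""
        with self._mutex:
            # The first requester marks the area as taken; later requesters
            # see the existing owner and are refused.
            owner = self._owner.setdefault(cluster, group)
            return owner == group


def resolve_split(server, cluster, groups):
    """Decide what each equal-sized group does after losing contact.

    `groups` is listed in the order the requests reach the server.
    `server` is None when the quorum server cannot be reached.
    """
    actions = {}
    for group in groups:
        if server is None:
            actions[group] = "halt"       # no quorum server: cluster halts
        elif server.request_lock(cluster, group):
            actions[group] = "continue"   # lock obtained: take over the cluster
        else:
            actions[group] = "reset"      # lock already taken: system reset
    return actions


# Two-node example: the request from node 2 arrives first.
server = QuorumServerSketch()
print(resolve_split(server, "cluster1", [("node2",), ("node1",)]))
# {('node2',): 'continue', ('node1',): 'reset'}
```

In this sketch the order of the groups list stands in for network timing, which is what decides the winner when both groups request the lock at nearly the same moment.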