Managing Serviceguard Eighteenth Edition, September 2010

in two separate data centers, a single lock disk would be a single point of failure should
the data center it resides in suffer a catastrophic failure.
In these two cases only, a dual cluster lock, with two separately powered cluster disks,
should be used to eliminate the lock disk as a single point of failure.
NOTE: You must use Fibre Channel connections for a dual cluster lock; you can no
longer implement it in a parallel SCSI configuration.
For a dual cluster lock, the disks must not share either a power circuit or a node chassis
with one another. In this case, if there is a power failure affecting one node and disk,
the other node and disk remain available, so cluster re-formation can take place on the
remaining node. For a campus cluster, there should be one lock disk in each of the data
centers, and all nodes must have access to both lock disks. In the event of a failure of
one of the data centers, the nodes in the remaining data center will be able to acquire
their local lock disk, allowing them to successfully re-form a new cluster.
NOTE: A dual lock disk does not provide a redundant cluster lock. In fact, the dual lock is
a compound lock. This means that two disks must be available at cluster formation time
rather than the one that is needed for a single lock disk. Thus, the only recommended
usage of the dual cluster lock is when the single cluster lock cannot be isolated at the
time of a failure from exactly one half of the cluster nodes.
If one of the dual lock disks fails, Serviceguard will detect this when it carries out
periodic checking, and it will write a message to the syslog file. After the loss of one
of the lock disks, the failure of a cluster node could cause the cluster to go down if the
remaining node(s) cannot access the surviving cluster lock disk.
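
The compound behavior of the dual cluster lock can be illustrated with a small sketch. This is only a toy model of the rules stated above, not Serviceguard code: the function names and the disk names lock_dc1 and lock_dc2 are hypothetical, and the model assumes that survivors of a failure need access to at least one lock disk that is still up.

```python
def can_form_cluster(lock_disk_up):
    """Compound lock: every configured lock disk must be available at formation time."""
    return all(lock_disk_up.values())


def survivors_can_reform(reachable_disks, lock_disk_up):
    """After a failure, survivors need at least one lock disk that is up and reachable."""
    return any(lock_disk_up.get(disk, False) for disk in reachable_disks)


# Campus cluster with one lock disk per data center (hypothetical names).
disks = {"lock_dc1": True, "lock_dc2": True}
formed = can_form_cluster(disks)  # both disks are up, so formation is allowed

# Data center 1 fails and takes its lock disk with it.
disks["lock_dc1"] = False
dc2_reforms = survivors_can_reform(["lock_dc1", "lock_dc2"], disks)

# A survivor that can only reach the failed disk cannot re-form the cluster.
isolated_reforms = survivors_can_reform(["lock_dc1"], disks)
```

In this model the dual lock tightens the requirement at formation time (both disks are needed) while letting the nodes in the surviving data center continue with their local disk.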
Use of the Quorum Server as the Cluster Lock
A Quorum Server can be used in clusters of any size. The Quorum Server process runs
on a machine outside of the cluster for which it is providing quorum services. The Quorum
Server listens for connection requests from the Serviceguard nodes on a known port.
The server maintains a special area in memory for each cluster, and when a node obtains
the cluster lock, this area is marked so that other nodes will recognize the lock as
“taken.”
If communications are lost between two equal-sized groups of nodes, the group that
obtains the lock from the Quorum Server will take over the cluster and the other nodes
will perform a system reset. Without a cluster lock, a failure of either group of nodes
will cause the other group, and therefore the cluster, to halt. Note also that if the Quorum
Server is not available when its arbitration services are needed, the cluster will halt.
The operation of the Quorum Server is shown in Figure 3-3. When there is a loss of
communication between node 1 and node 2, the Quorum Server chooses one node (in
this example, node 2) to continue running in the cluster. The other node halts.
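
The arbitration described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Quorum Server implementation: the class QuorumServerSketch, the function arbitrate, and the cluster name "cluster1" are all hypothetical, and the model simply grants the lock to the first group whose request arrives.

```python
class QuorumServerSketch:
    """Toy in-memory model of Quorum Server arbitration (hypothetical, for illustration)."""

    def __init__(self):
        # One "special area" per cluster: cluster name -> group that holds the lock.
        self._lock_holder = {}

    def request_lock(self, cluster, group):
        """Return True if `group` obtains (or already holds) the lock for `cluster`."""
        holder = self._lock_holder.setdefault(cluster, group)
        return holder == group


def arbitrate(server, cluster, groups):
    """Map each group to its outcome; requests are processed in arrival order."""
    if server is None:
        # Quorum Server unavailable when arbitration is needed: the cluster halts.
        return {group: "halt" for group in groups}
    return {
        group: "re-form cluster" if server.request_lock(cluster, group) else "system reset"
        for group in groups
    }


qs = QuorumServerSketch()
# Node 2's request happens to arrive first, as in the example above.
outcome = arbitrate(qs, "cluster1", ["node2", "node1"])
no_server = arbitrate(None, "cluster1", ["node1", "node2"])
```

Here outcome maps node2 to "re-form cluster" and node1 to "system reset", while no_server maps both nodes to "halt", mirroring the case in which the Quorum Server cannot be reached.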
How the Cluster Manager Works 65