Managing Serviceguard Nineteenth Edition, Reprinted June 2011

Dual Lock Disk
If you are using disks that are internally mounted in the same cabinet as the cluster nodes, then a
single lock disk would be a single point of failure, since the loss of power to the node that has the
lock disk in its cabinet would also render the cluster lock unavailable. Similarly, in a campus cluster,
where the cluster contains nodes running in two separate data centers, a single lock disk would
be a single point of failure should the data center it resides in suffer a catastrophic failure.
In these two cases only, you should use a dual cluster lock, with two separately powered lock disks, to eliminate the lock disk as a single point of failure.
NOTE: You must use Fibre Channel connections for a dual cluster lock; you can no longer
implement it in a parallel SCSI configuration.
For a dual cluster lock, the disks must not share either a power circuit or a node chassis with one
another. In this case, if there is a power failure affecting one node and disk, the other node and
disk remain available, so cluster re-formation can take place on the remaining node. For a campus
cluster, there should be one lock disk in each of the data centers, and all nodes must have access
to both lock disks. In the event of a failure of one of the data centers, the nodes in the remaining
data center will be able to acquire their local lock disk, allowing them to successfully reform a new
cluster.
NOTE: A dual lock disk does not provide a redundant cluster lock; rather, the dual lock is a
compound lock. This means that two disks, rather than the one needed for a single lock disk, must
be available at cluster formation time. Thus, the dual cluster lock is recommended only when a
single cluster lock cannot be isolated, at the time of a failure, from exactly half of the cluster
nodes.
If one of the dual lock disks fails, Serviceguard will detect this when it carries out periodic checking,
and it will write a message to the syslog file. After the loss of one of the lock disks, the failure
of a cluster node could cause the cluster to go down if the remaining node(s) cannot access the
surviving cluster lock disk.
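A dual cluster lock is defined in the cluster configuration ASCII file. The fragment below is a sketch only: the parameter names follow the Serviceguard configuration format, while the volume group names, node names, and device file paths are illustrative assumptions for a two-data-center campus cluster.

```
# Cluster-wide parameters: one lock volume group per data center
# (volume group names are examples)
FIRST_CLUSTER_LOCK_VG     /dev/vglock1
SECOND_CLUSTER_LOCK_VG    /dev/vglock2

# Every node must have access to both lock disks
# (node names and device paths are examples)
NODE_NAME                 node1
  FIRST_CLUSTER_LOCK_PV   /dev/dsk/c1t2d0
  SECOND_CLUSTER_LOCK_PV  /dev/dsk/c2t2d0

NODE_NAME                 node2
  FIRST_CLUSTER_LOCK_PV   /dev/dsk/c1t2d0
  SECOND_CLUSTER_LOCK_PV  /dev/dsk/c2t2d0
```

As with any change to the cluster configuration, the edited file would then be verified and distributed with cmcheckconf and cmapplyconf.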
Use of the Quorum Server as the Cluster Lock
A Quorum Server can be used in clusters of any size. The quorum server process runs on a machine
outside of the cluster for which it is providing quorum services. The quorum server listens for
connection requests from the Serviceguard nodes on a known port. The server maintains a special
area in memory for each cluster, and when a node obtains the cluster lock, this area is marked so
that other nodes will recognize the lock as “taken.”
If communications are lost between two equal-sized groups of nodes, the group that obtains the
lock from the Quorum Server will take over the cluster and the other nodes will perform a system
reset. Without a cluster lock, a failure of either group of nodes will cause the other group, and
therefore the cluster, to halt. Note also that if the Quorum Server is not available when its arbitration
services are needed, the cluster will halt.
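A Quorum Server is specified in the cluster configuration ASCII file in place of the lock disk parameters. The fragment below is a sketch: the parameter names follow the Serviceguard configuration format, while the hostname and timing values are illustrative assumptions.

```
# Hostname (or IP address) of the system running the quorum server
# process; qshost1 is an example name
QS_HOST                 qshost1

# How often Serviceguard checks that the quorum server is reachable,
# in microseconds (example value)
QS_POLLING_INTERVAL     120000000

# Optional extra time allowed for the quorum server to respond,
# in microseconds (example value)
QS_TIMEOUT_EXTENSION    2000000
```

Because the cluster halts if the Quorum Server is unreachable when arbitration is needed, the Quorum Server host should not depend on any single point of failure it shares with the cluster it arbitrates.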
The operation of the Quorum Server is shown in Figure 12. When there is a loss of communication
between node 1 and node 2, the Quorum Server chooses one node (in this example, node 2) to
continue running in the cluster. The other node halts.
46 Understanding Serviceguard Software Components