Managing Serviceguard Nineteenth Edition, Reprinted June 2011

Dual Lock Disk
If you are using disks that are internally mounted in the same cabinet as the cluster nodes, then a
single lock disk would be a single point of failure, since the loss of power to the node that has the
lock disk in its cabinet would also render the cluster lock unavailable. Similarly, in a campus cluster,
where the cluster contains nodes running in two separate data centers, a single lock disk would
be a single point of failure should the data center it resides in suffer a catastrophic failure.
In these two cases only, you should use a dual cluster lock, with two separately powered lock disks, to eliminate the lock disk as a single point of failure.
NOTE: You must use Fibre Channel connections for a dual cluster lock; you can no longer
implement it in a parallel SCSI configuration.
For a dual cluster lock, the disks must not share either a power circuit or a node chassis with one
another. In this case, if there is a power failure affecting one node and disk, the other node and
disk remain available, so cluster re-formation can take place on the remaining node. For a campus
cluster, there should be one lock disk in each of the data centers, and all nodes must have access
to both lock disks. In the event of a failure of one of the data centers, the nodes in the remaining
data center will be able to acquire their local lock disk, allowing them to successfully reform a new
cluster.
NOTE: A dual lock disk does not provide a redundant cluster lock; rather, the dual lock is a
compound lock. This means that two disks, rather than the one needed for a single lock disk, must
be available at cluster formation time. Thus, the dual cluster lock is recommended only when a
single cluster lock cannot be isolated, at the time of a failure, from exactly half of the cluster
nodes.
If one of the dual lock disks fails, Serviceguard will detect this when it carries out periodic checking,
and it will write a message to the syslog file. After the loss of one of the lock disks, the failure
of a cluster node could cause the cluster to go down if the remaining node(s) cannot access the
surviving cluster lock disk.
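A dual cluster lock is defined in the cluster configuration ASCII file. The fragment below is a sketch only: the parameter names follow the Serviceguard configuration format, while the volume group names, node names, and device file paths are illustrative assumptions for a two-data-center campus cluster.

```
# Cluster-wide parameters: one lock volume group per data center
# (volume group names are examples)
FIRST_CLUSTER_LOCK_VG     /dev/vglock1
SECOND_CLUSTER_LOCK_VG    /dev/vglock2

# Every node must have access to both lock disks
# (node names and device paths are examples)
NODE_NAME                 node1
  FIRST_CLUSTER_LOCK_PV   /dev/dsk/c1t2d0
  SECOND_CLUSTER_LOCK_PV  /dev/dsk/c2t2d0

NODE_NAME                 node2
  FIRST_CLUSTER_LOCK_PV   /dev/dsk/c1t2d0
  SECOND_CLUSTER_LOCK_PV  /dev/dsk/c2t2d0
```

As with any change to the cluster configuration, the edited file would then be verified and distributed with cmcheckconf and cmapplyconf.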
Use of the Quorum Server as the Cluster Lock
A Quorum Server can be used in clusters of any size. The quorum server process runs on a machine
outside of the cluster for which it is providing quorum services. The quorum server listens for
connection requests from the Serviceguard nodes on a known port. The server maintains a special
area in memory for each cluster, and when a node obtains the cluster lock, this area is marked so
that other nodes will recognize the lock as “taken.”
If communications are lost between two equal-sized groups of nodes, the group that obtains the
lock from the Quorum Server will take over the cluster and the other nodes will perform a system
reset. Without a cluster lock, a failure of either group of nodes will cause the other group, and
therefore the cluster, to halt. Note also that if the Quorum Server is not available when its arbitration
services are needed, the cluster will halt.
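A Quorum Server is specified in the cluster configuration ASCII file in place of the lock disk parameters. The fragment below is a sketch: the parameter names follow the Serviceguard configuration format, while the hostname and timing values are illustrative assumptions.

```
# Hostname (or IP address) of the system running the quorum server
# process; qshost1 is an example name
QS_HOST                 qshost1

# How often Serviceguard checks that the quorum server is reachable,
# in microseconds (example value)
QS_POLLING_INTERVAL     120000000

# Optional extra time allowed for the quorum server to respond,
# in microseconds (example value)
QS_TIMEOUT_EXTENSION    2000000
```

Because the cluster halts if the Quorum Server is unreachable when arbitration is needed, the Quorum Server host should not depend on any single point of failure it shares with the cluster it arbitrates.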
The operation of the Quorum Server is shown in Figure 12. When there is a loss of communication
between node 1 and node 2, the Quorum Server chooses one node (in this example, node 2) to
continue running in the cluster. The other node halts.
46 Understanding Serviceguard Software Components