User's Guide
Arbitrator Node Configuration Rules
Although you can use one arbitrator, two arbitrators provide greater flexibility for taking
systems down during planned outages and better protection against multiple points of failure.
Using two arbitrators:
• Provides local failover capability to applications running on the arbitrator.
• Protects against multiple points of failure (MPOF).
• Provides for planned downtime on a single system anywhere in the cluster.
If you use a single arbitrator system, you must follow special procedures during planned downtime
to remain protected. Systems must be taken down in pairs, one from each of the data centers, so
that the Serviceguard quorum is maintained after a node failure. If the arbitrator itself must be
taken down, disaster recovery capability is at risk if any of the other systems fails.
Arbitrator systems can be used to perform important and useful work such as:
• Hosting mission-critical applications not protected by disaster recovery software
• Running monitoring and management tools such as IT/Operations or Network Node Manager
• Running backup applications such as Omniback
• Acting as application servers
Disk Array Data Replication Configuration Rules
Each disk array must be configured with redundant links for data replication. To prevent a single
point of failure (SPOF), there must be at least two physical boards in each disk array for the data
replication links. Each board usually has multiple ports; however, the redundant data replication
link must be connected to a port on a different physical board than the one carrying the primary
data replication link.
For Continuous Access P9000 and XP, when using bi-directional configurations, where data center
A backs up data center B and data center B backs up data center A, you must have at least four
Continuous Access links, two in each direction. Four Continuous Access links are also required in
uni-directional configurations to allow failback.
Calculating a Cluster Quorum
When a cluster initially forms, all systems must be available to form the cluster (100% Quorum
requirement).
A quorum is dynamic and is recomputed after each system failure. For instance, if you start out
with an 8-node cluster and two systems fail, that leaves 6 out of 8 surviving nodes, or a 75%
quorum. The cluster size is reset to 6 nodes. If two more nodes fail, leaving 4 out of 6, quorum is
67%.
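The dynamic quorum arithmetic above can be sketched as follows. This is a hypothetical helper for illustration only, not part of Serviceguard:

```python
def recompute_quorum(cluster_size: int, failed: int) -> float:
    """Return the surviving quorum percentage after `failed` nodes
    drop out of a cluster of `cluster_size` members."""
    surviving = cluster_size - failed
    return 100.0 * surviving / cluster_size

# 8-node cluster loses 2 nodes: 6 of 8 survive (75% quorum);
# the cluster size is then reset to 6.
q1 = recompute_quorum(8, 2)   # 75.0

# 2 more nodes fail: 4 of 6 survive (about 67% quorum).
q2 = recompute_quorum(6, 2)   # 66.7 (rounded)
```

Note that the denominator is the cluster size at the time of the failure, not the original size, which is why quorum is recomputed after each failure.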
Each time the cluster re-forms, more than 50% of the previous membership must be present. With
Serviceguard, a cluster lock disk or a Quorum Server serves as the tie-breaker when quorum is
exactly 50%. In a Metrocluster configuration, however, only a Quorum Server is supported; a cluster
lock disk is not. Therefore, at exactly 50% quorum the nodes must have access to the Quorum Server,
otherwise all nodes will halt.
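The re-formation rule can be summarized in a short decision sketch. This is an illustrative model of the rule as stated above, not Serviceguard's actual implementation:

```python
def cluster_survives(surviving: int, previous_size: int,
                     quorum_server_reachable: bool) -> bool:
    """Decide whether a surviving partition may re-form the cluster.

    More than 50% quorum re-forms unconditionally; exactly 50% needs
    the Quorum Server as tie-breaker (a Metrocluster configuration
    does not support a cluster lock disk); less than 50% always halts.
    """
    quorum = surviving / previous_size
    if quorum > 0.5:
        return True
    if quorum == 0.5:
        return quorum_server_reachable
    return False

# 6 of 8 nodes survive: quorum > 50%, no tie-breaker needed.
# 4 of 8 nodes survive: exactly 50%, the Quorum Server decides.
```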
Example Failover Scenarios with One Arbitrator
Taking a node off-line for planned maintenance is treated the same as a node failure in these
scenarios. Study these scenarios to make sure you do not put your cluster at risk during planned
maintenance.
Designing a Disaster Recovery Architecture for use with Metrocluster Products 29