Cascading Failover in a Continentalclusters, December 2005
Primary Cluster Package Setup
Cascading failover uses a Continentalclusters configuration in which the primary cluster is configured as a
metropolitan cluster. There are two differences from the normal Continentalclusters
configuration when customizing the environment file:
1. Instead of the CLUSTER_TYPE variable being set to “continental”, it should be set to “metro.”
2. The DEVICE_GROUP variable should be set to the disk array device group name defined on
the primary and secondary disk arrays, depending on which disk array is directly connected to
the node.
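As a minimal sketch, the relevant lines of a primary cluster package environment file might look like the following. The device group name appdg is a placeholder for illustration, not a name from this document:

```shell
# Hypothetical excerpt from a primary cluster package environment file.
# CLUSTER_TYPE and DEVICE_GROUP are the two variables that differ from a
# normal Continentalclusters configuration; "appdg" is an example name.
CLUSTER_TYPE="metro"      # "metro" instead of the usual "continental"
DEVICE_GROUP="appdg"      # device group defined on the disk array
                          # directly connected to this node
```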
Before executing the cmapplyconf command on the primary cluster, make sure to split the data
replication links between the primary disk array and the secondary disk array for the disks
associated with the application package.
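The document does not name a specific replication technology. Assuming the device group is replicated with Continuous Access managed by Raid Manager (HORCM), splitting the replication link for the package's disks might look like the following sketch, where appdg is again a placeholder group name:

```shell
# Assumption: Raid Manager (HORCM) manages the replication pairs for
# the device group; "appdg" is an example device group name.
pairsplit -g appdg        # suspend replication for the device group
pairdisplay -g appdg      # verify the pairs now show a suspended state
```

Run this before cmapplyconf on the primary cluster, and verify the suspended state before proceeding.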
Recovery Cluster Package Setup
There is a slight difference from the normal Continentalclusters configuration when customizing the
recovery package environment file: the DEVICE_GROUP variable should be set to the disk array
device group name defined on the recovery disk array.
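A corresponding sketch of the recovery package environment file follows; the name appdg_rec is a hypothetical example of a device group defined on the recovery disk array:

```shell
# Hypothetical excerpt from a recovery package environment file.
# "appdg_rec" is an example name for the device group defined on the
# recovery disk array.
DEVICE_GROUP="appdg_rec"
```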
Just as with the primary cluster setup, before running the cmapplyconf command for the recovery cluster,
make sure to split the data replication links between the secondary disk array and the recovery disk
array for the disks associated with the application package.
Steps for Failure and Recovery Scenarios
This section describes the procedures for the following failover and failback scenarios:
• Failure of Primary Site within Primary Cluster
• Failback from the Secondary Site to the Primary Site
• Failure of Secondary Site within the Primary Cluster
• Failover from Primary Cluster to Recovery Cluster
• Failback from the Recovery Cluster to the Secondary Site within the Primary Cluster
• Failback from the Recovery Site Directly to the Primary Site in the Primary Cluster
Failure of Primary Site within Primary Cluster
When a failure occurs at the primary site, such as the hosts going down or the whole site going down, the
application package automatically fails over to the secondary site within the primary cluster. Until
the problems at the primary site are fixed and data replication is reestablished, there is no remote
data protection for the package at the secondary site. Depending on the type of failure and how
quickly the primary site is back online, a data refresh to the recovery site may still be needed. This scenario is
illustrated in Figure 5.