Cascading Failover in a Continentalclusters, December 2005
Primary Cluster Package Setup
Cascading failover uses a Continentalclusters configuration in which the primary cluster is configured as a
metropolitan cluster. There are two differences from the normal Continentalclusters
configuration when customizing the environment file:
1. Instead of the CLUSTER_TYPE variable being set to “continental”, it should be set to “metro.”
2. The DEVICE_GROUP variable should be set to the disk array device group name defined on
the primary and secondary disk arrays, depending on which disk array is directly connected to
the node.
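As a minimal sketch, the relevant lines of a primary cluster package environment file might look like the following. The device group name appdg is a placeholder for illustration, not a name from this document:

```shell
# Hypothetical excerpt from a primary cluster package environment file.
# CLUSTER_TYPE and DEVICE_GROUP are the two variables that differ from a
# normal Continentalclusters configuration; "appdg" is an example name.
CLUSTER_TYPE="metro"      # "metro" instead of the usual "continental"
DEVICE_GROUP="appdg"      # device group defined on the disk array
                          # directly connected to this node
```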
Before executing the cmapplyconf command on the primary cluster, make sure to split the data
replication links between the primary disk array and the secondary disk array for the disks
associated with the application package.
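The document does not name a specific replication technology. Assuming the device group is replicated with Continuous Access managed by Raid Manager (HORCM), splitting the replication link for the package's disks might look like the following sketch, where appdg is again a placeholder group name:

```shell
# Assumption: Raid Manager (HORCM) manages the replication pairs for
# the device group; "appdg" is an example device group name.
pairsplit -g appdg        # suspend replication for the device group
pairdisplay -g appdg      # verify the pairs now show a suspended state
```

Run this before cmapplyconf on the primary cluster, and verify the suspended state before proceeding.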
Recovery Cluster Package Setup
There is a slight difference from the normal Continentalclusters configuration when customizing the
recovery package environment file: the DEVICE_GROUP variable should be set to the disk array
device group name defined on the recovery disk array.
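A corresponding sketch of the recovery package environment file follows; the name appdg_rec is a hypothetical example of a device group defined on the recovery disk array:

```shell
# Hypothetical excerpt from a recovery package environment file.
# "appdg_rec" is an example name for the device group defined on the
# recovery disk array.
DEVICE_GROUP="appdg_rec"
```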
Just as with the primary cluster setup, before running the cmapplyconf command for the recovery cluster,
make sure to split the data replication links between the secondary disk array and the recovery disk
array for the disks associated with the application package.
Steps for Failure and Recovery Scenarios
This section describes the procedures for the following failover and failback scenarios:
• Failure of Primary Site within Primary Cluster
• Failback from the Secondary Site to the Primary Site
• Failure of Secondary Site within the Primary Cluster
• Failover from Primary Cluster to Recovery Cluster
• Failback from the Recovery Cluster to the Secondary Site within the Primary Cluster
• Failback from the Recovery Site Directly to the Primary Site in the Primary Cluster
Failure of Primary Site within Primary Cluster
When a failure occurs at the primary site, such as the hosts going down or the whole site going down, the
application package automatically fails over to the secondary site within the primary cluster. Until
the problems at the primary site are fixed and data replication is reestablished, there is no remote
data protection for the package at the secondary site. Depending on the type of failure and how
quickly the primary site is back online, a data refresh to the recovery site may still be needed. This scenario is
illustrated in Figure 5.