4 Disaster recovery rehearsal in Continentalclusters

Introduction to DR rehearsal
For a successful recovery in a Continentalclusters environment, the configurations on all systems
in both the primary and recovery clusters must be kept synchronized. After the initial
Continentalclusters setup, the configurations usually change over time. It is the operator’s
responsibility to ensure that any change made on the primary cluster nodes is also made on
the recovery cluster nodes. However, an operator might forget to update every node in the
recovery cluster, or might update a node incorrectly. Either mistake leaves a stale or incorrect
configuration on the recovery cluster, which can prevent a recovery on those nodes. For example,
if a Metrocluster environment file is changed on the primary cluster nodes but not on the recovery
cluster nodes, recovery attempts can fail. The DR rehearsal feature provided for
Continentalclusters “rehearses” the recovery without impacting the availability of the primary
packages. The DR rehearsal detects configuration discrepancies at the recovery cluster and thereby
helps to improve the “DR preparedness” of the recovery cluster.
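The kind of discrepancy described above can be illustrated with a minimal Python sketch. It is not part of Continentalclusters and is not how DR rehearsal itself works; it only shows, under stated assumptions, how a stale copy of a configuration file could be spotted by comparing checksums. The function names, the directory layout, and the idea that both copies are reachable from one host are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_stale_files(primary_dir: Path, recovery_dir: Path, names: list[str]) -> list[str]:
    """List file names whose recovery copy is missing or differs from the primary copy.

    Assumes every name exists under primary_dir (hypothetical layout).
    """
    stale = []
    for name in names:
        primary = primary_dir / name
        recovery = recovery_dir / name
        # A missing or different recovery copy counts as a stale configuration.
        if not recovery.exists() or file_digest(primary) != file_digest(recovery):
            stale.append(name)
    return stale
```

A check like this only finds files that differ. A rehearsal goes further, because it actually starts a package on the recovery cluster and so exercises the configuration end to end.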
DR rehearsal is supported in Continentalclusters using Continuous Access P9000 or XP data
replication, Continentalclusters using EMC SRDF data replication, and Continentalclusters in a 3DC DR
Solution. The feature provides a reliable way of rehearsing recovery: it identifies errors
at the recovery cluster that would have prevented a recovery, and it includes a built-in protection
mechanism that enforces mutual exclusion between recovery and rehearsal.
For rehearsal, Continentalclusters allows recovery groups to be configured with a Serviceguard
package known as the rehearsal package, which is configured with the recovery package’s volume
group and filesystem directory. The rehearsal package is specified as part of the recovery group
definition. A rehearsal is started for a recovery group by running the cmrecovercl command with
the options -r -g <recovery group>.
During rehearsal, the availability of the primary package at the primary cluster is not impacted.
Clients connected to the primary package are unaware of the rehearsal in progress
at the recovery cluster.
DR rehearsal is allowed only when the recovery group is in maintenance mode, so that a recovery
attempt is prevented while the rehearsal is in progress. Because the rehearsal package and the
recovery package share the same volume group and filesystem directory, allowing one of them to
start while the other is running can cause resource collisions and compromise data integrity.
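The mutual exclusion rule can be modeled in a few lines of Python. This is a sketch of the rule as stated in this section, not of the product's implementation; the class, its attributes, and the error messages are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecoveryGroup:
    """Hypothetical model of a recovery group's state at the recovery cluster."""
    name: str
    in_maintenance_mode: bool = False
    rehearsal_running: bool = False
    recovery_running: bool = False

    def start_rehearsal(self) -> None:
        # Rehearsal is allowed only while the group is in maintenance mode.
        if not self.in_maintenance_mode:
            raise RuntimeError(f"{self.name}: rehearsal requires maintenance mode")
        # The two packages share a volume group, so they must never overlap.
        if self.recovery_running:
            raise RuntimeError(f"{self.name}: recovery package is running")
        self.rehearsal_running = True

    def start_recovery(self) -> None:
        # Maintenance mode blocks recovery, so recovery cannot start mid-rehearsal.
        if self.in_maintenance_mode:
            raise RuntimeError(f"{self.name}: recovery is disabled in maintenance mode")
        self.recovery_running = True
```

In this model, a rehearsal can only start after the group enters maintenance mode, and every recovery attempt is rejected for as long as the group stays in that mode.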
Note: Replication between the primary and recovery clusters must be suspended during rehearsal.
Therefore, during rehearsal, production changes made to the primary mirror copy are not protected.
The rehearsal package started on the recovery cluster is highly available within that cluster. For
example, if a node fails or a machine reboots at the recovery cluster, the rehearsal package,
like any other Serviceguard failover package, automatically fails over to an alternate node in the cluster.
Figure 1 illustrates DR rehearsal in a Continentalclusters environment.