Cascading Failover in a Continentalclusters, December 2005

Introduction
Overview
Cascading failover is the ability of an application to fail over from a primary location to a secondary
location, and then to fail over to a recovery location. The primary and secondary sites together contain a
metropolitan cluster built with the HP Metrocluster solution, and the recovery location contains a standard
Serviceguard cluster. Continentalclusters provides “push-button” recovery between Serviceguard
clusters. Data replication also follows the cascading model: data is replicated synchronously from the
primary disk array to the secondary disk array in the Metrocluster, and is periodically replicated by
manual operation, using storage data replication technology, to the third disk array in the Serviceguard
recovery cluster.
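The cascade described above can be sketched as a small Python model. This is an illustration only, not part of any HP product or API: the location names, the ReplicationLink type, and both helper functions are hypothetical, and the mode labels simply restate the synchronous and manual periodic replication described in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationLink:
    """One hop in the replication chain (hypothetical illustrative type)."""
    source: str
    target: str
    mode: str  # "synchronous" or "manual-periodic"


# Failover order from the text: primary, then secondary, then recovery.
FAILOVER_ORDER = ["primary", "secondary", "recovery"]

# Replication follows the same cascade, one hop at a time.
REPLICATION_CHAIN = [
    ReplicationLink("primary", "secondary", "synchronous"),
    ReplicationLink("secondary", "recovery", "manual-periodic"),
]


def next_failover_target(current):
    """Return the next location in the cascade, or None at the end of it."""
    index = FAILOVER_ORDER.index(current)
    if index + 1 < len(FAILOVER_ORDER):
        return FAILOVER_ORDER[index + 1]
    return None


def replication_path(source, target):
    """Return the links that data crosses on its way from source to target."""
    if source == target:
        return []
    path = []
    location = source
    for link in REPLICATION_CHAIN:
        if link.source == location:
            path.append(link)
            location = link.target
            if location == target:
                return path
    raise ValueError(f"no replication path from {source} to {target}")


if __name__ == "__main__":
    print(next_failover_target("primary"))
    for link in replication_path("primary", "recovery"):
        print(link.source, "->", link.target, "(" + link.mode + ")")
```

The model makes one property of the cascade explicit: data reaches the recovery array only by way of the secondary array, so the recovery copy is only as current as the most recent manual replication.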
Continentalclusters with cascading failover uses three main data centers distributed between a
metropolitan cluster, which serves as a primary cluster, and a standard cluster, which serves as a
recovery cluster.
In the primary cluster, there are two disk arrays, either of which can have the source volumes for a
particular application. Throughout this document, the term primary disk array refers to the disk array
that holds the volumes that are being replicated to the remote disk array for a particular application,
and the data center where this disk array is located is called the primary site. The term secondary disk
array refers to the disk array that holds the volumes that the data is being replicated to using the
storage specific replication technology for a particular application, and the data center where the
secondary disk array for that application is located is known as the secondary site. Thus, primary and
secondary sites are roles that can be played by either disk array in the primary cluster. However,
once the data replication link from the secondary disk array to the recovery disk array has been
defined, the primary and secondary sites are fixed.
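The per-application nature of these roles, and the point at which they become fixed, can be illustrated with a minimal Python sketch. The class, its method names, and the array labels are hypothetical and exist only for illustration. The sketch assumes that roles may be exchanged freely until the link to the recovery array is defined.

```python
class ApplicationReplication:
    """Tracks which array plays which role for one application (hypothetical)."""

    def __init__(self, app, primary_array, secondary_array):
        if primary_array == secondary_array:
            raise ValueError("primary and secondary must be different arrays")
        self.app = app
        self.primary_array = primary_array
        self.secondary_array = secondary_array
        self.recovery_array = None

    def swap_roles(self):
        """Exchange the primary and secondary roles, if they are not yet fixed."""
        if self.recovery_array is not None:
            raise RuntimeError(
                "roles are fixed once the secondary-to-recovery link is defined"
            )
        self.primary_array, self.secondary_array = (
            self.secondary_array,
            self.primary_array,
        )

    def define_recovery_link(self, recovery_array):
        """Define the link from the secondary array to the recovery array."""
        self.recovery_array = recovery_array


if __name__ == "__main__":
    # Two applications in the same primary cluster can use opposite roles.
    app_a = ApplicationReplication("app_a", "array_1", "array_2")
    app_b = ApplicationReplication("app_b", "array_2", "array_1")
    app_a.define_recovery_link("array_3")
    print(app_a.primary_array, app_b.primary_array)
```

Keeping the roles on the application object rather than on the array reflects the text: an array is not inherently primary or secondary, it is primary or secondary for a particular application.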
The recovery disk array holds a remote replicated copy of the data in the recovery cluster. The data
center that houses the recovery disk array is called the recovery site. The data is replicated from the
secondary disk array to the recovery disk array through manual operations or custom-made scripts.
The basic design of the cascading failover solution is shown in Figure 1. The primary cluster, shown
on the left, is configured as a Metrocluster with three data centers physically located at three different
sites: two main sites (the primary and secondary sites) and an arbitrator site (a third location), which is
not shown in the figure. The primary and secondary site roles are relative to the application, given
that data replication is possible from both disk arrays in the primary cluster to the disk array in the
recovery cluster. A fourth data center (the recovery site) is used for the recovery cluster, which is a
standard Serviceguard configuration. The primary and recovery clusters are configured together as a
Continentalclusters.