33 Disaster Recovery with Dell PS Series SANs and VMware vSphere Site Recovery Manager | TR1073

10 Failback

Failback is the process that brings the recovered VMs at the DR site back to the original protected site after a

full recovery plan has been run. There can be multiple reasons for enacting the full recovery plan and moving

production VMs from the protected site to the recovery site: a power outage, an equipment outage,

a planned migration, or a true disaster. In each of these cases, careful consideration must be given to bringing

the existing environment back onto the original protected site.

Regardless of the reason that the recovery site is now servicing production VMs, there are two basic

scenarios for failback: either the original SAN on the protected site is still in service and holds some subset of

the production data from before the failover, or the SAN is effectively new because the hardware was replaced

or the array was re-initialized. There may even be cases where both approaches are used,

depending on the reason for failover. Site Recovery Manager (SRM) provides the ability to fail back to the original

protected site using a process called reprotect. Reprotect is only available when the original protected site

and its associated data are still available.
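
The choice between the two scenarios can be expressed as a simple check. The following Python sketch is illustrative only: the OriginalSiteState class, its fields, and the scenario labels are hypothetical names introduced here, not SRM or PS Series objects, and the sketch assumes the administrator has already determined the state of site A by other means.

```python
from dataclasses import dataclass


@dataclass
class OriginalSiteState:
    """Hypothetical summary of the original protected site after a failover."""
    site_available: bool   # the original SAN is still in service
    data_available: bool   # the SAN still holds pre-failover volume data


def failback_approach(state: OriginalSiteState) -> str:
    """Return a label for the failback scenario that applies (labels are illustrative)."""
    # Reprotect requires both the original site and its data to be available.
    if state.site_available and state.data_available:
        return "reprotect"
    # Otherwise the SAN is treated as new or re-initialized hardware.
    return "new-or-reinitialized-san"


if __name__ == "__main__":
    print(failback_approach(OriginalSiteState(site_available=True, data_available=True)))
    print(failback_approach(OriginalSiteState(site_available=True, data_available=False)))
```

When both scenarios are in play, the same check could be applied separately to each part of the environment rather than to the site as a whole.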

With careful planning, bringing the recovery site virtual environment back into production on the original

protected site can happen with very little downtime.

During planning, the roles of protected site and recovery site may change. This section uses site A to denote the

original protected site, which held the production data, and site B to denote the original recovery site, which

was the failover destination.

10.1 Recovery scenario 1: Reprotect and failback

In the first scenario, the original protected site A still has a functioning SAN with some subset of data.

The failover could have been triggered by a planned hardware outage or an unplanned power failure, but

not by anything affecting the underlying server and storage environment. While disaster recovery failovers are seldom

planned, the failback process can be planned and controlled to ensure that there is no loss of data and

minimal disruption.

In a controlled failback, the administrator has the time and ability to schedule downtime and prepare for

failing back from site B to site A. Administrators can take their time devising a strategy to migrate back to

site A with all of the data that has been written since the failover occurred. Because failback is done at the

volume and datastore layer, administrators need to make sure that all of the VMs that reside on a volume are

shut down to ensure data consistency. Also, if any VMs span multiple volumes, all of those volumes

need to be failed back at the same time to guarantee that the VMs operate correctly back at site A.
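
The volume-level constraints above can be checked before scheduling downtime. The following Python sketch is a planning aid only, under stated assumptions: the failback_groups function and the VM-to-volume inventory are hypothetical and would have to be populated from the administrator's own records, since the sketch does not query SRM, vCenter, or the array. It groups volumes that share a VM, directly or through a chain of other VMs, so that each group lists the volumes to fail back together and the VMs to shut down first.

```python
from collections import defaultdict


def failback_groups(vm_volumes: dict[str, set[str]]) -> list[dict[str, list[str]]]:
    """Group volumes that must fail back together, with the VMs on them.

    vm_volumes maps a VM name to the set of volumes that hold its files.
    Volumes used by the same VM are merged with a union-find structure.
    """
    parent: dict[str, str] = {}

    def find(vol: str) -> str:
        parent.setdefault(vol, vol)
        while parent[vol] != vol:
            parent[vol] = parent[parent[vol]]  # path halving
            vol = parent[vol]
        return vol

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link every volume of a VM to that VM's first volume.
    for volumes in vm_volumes.values():
        ordered = sorted(volumes)
        for vol in ordered:
            find(vol)
        for other in ordered[1:]:
            union(ordered[0], other)

    # Collect the VMs and volumes that belong to each merged group.
    groups = defaultdict(lambda: {"volumes": set(), "vms": set()})
    for vm, volumes in vm_volumes.items():
        if not volumes:
            continue  # a VM with no recorded volumes imposes no constraint
        root = find(next(iter(volumes)))
        groups[root]["vms"].add(vm)
        groups[root]["volumes"].update(volumes)

    return sorted(
        ({"volumes": sorted(g["volumes"]), "vms": sorted(g["vms"])} for g in groups.values()),
        key=lambda g: g["volumes"],
    )


if __name__ == "__main__":
    # Hypothetical inventory: db01 spans vol1 and vol2, so both fail back together.
    inventory = {"web01": {"vol1"}, "db01": {"vol1", "vol2"}, "file01": {"vol3"}}
    for group in failback_groups(inventory):
        print("Fail back together:", group["volumes"], "| shut down first:", group["vms"])
```

In this example inventory, web01 must be shut down along with db01 even though it uses only vol1, because vol1 has to fail back in the same operation as vol2.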

10.1.1 Reprotect

SRM provides the ability to fail back to the original protected site using a process called reprotect. Reprotect

automates the process of re-establishing replication from the array at site B (the recovery site) back

to the array at site A (the protected site). Reprotect does not itself fail back; it configures everything so that you

can test going back to the original protected site A and, if testing proves successful, then perform a planned

migration. During the reprotect, fast failback can shorten the time needed for the replication to sync back. At a

high level, the process is as follows:

1. Demote the protected site A volume to an inbound replica set.