2.1 A TRADITIONAL DR SCENARIO
Failing over business operations to recover from a disaster involves several steps that are manual,
lengthy, and complex. Custom scripts are often written and used to simplify some of these processes,
but the processes themselves still determine the real RTO that any DR solution can deliver.
Consider this simplified outline of the flow of a traditional disaster recovery scenario:
1. A disaster recovery solution was previously implemented, and replication has been occurring.
2. A disaster occurs that requires a failover to the DR site. This might be a power outage too long for the
business to withstand without failing over, or a more severe disaster that causes the loss of data and/or
equipment at the primary site.
3. The DR team takes the necessary steps to confirm the disaster and decides to fail over business operations
to the DR site.
4. Assuming that data replication has been healthy, that the DR site is in a good state, and that prior testing
has confirmed these facts, then (a sketch of these steps follows this list):
a. The replicated storage must be presented to the ESX hosts at the DR site.
b. The ESX hosts must be attached to the storage.
c. The virtual machines must be added to the inventory of the ESX hosts.
d. If the DR site is on a different network segment than the primary site, then each virtual machine
might need to be reconfigured for the new network.
e. The environment must be brought up properly, with certain systems and services made available in
the proper order.
5. After the DR environment is ready, business can continue in whatever capacity is supported by the
equipment at the DR site.
6. At some point, the primary site will be made available again, or lost equipment will be replaced.
7. Changes that were applied to the data while the DR site was supporting the business must be
replicated back to the primary site. Replication must be reversed to accomplish this; a second sketch
after the list illustrates reversing the direction.
8. The processes performed in step 4 must now be performed again, this time within a controlled
outage window, to fail the environment back to the primary site. Depending on how soon after the
original disaster the DR team was able to engage, this process might take nearly as long as
recovering from the disaster itself.
9. After the primary environment is recovered, replication must be established in the original direction from
the primary site to the DR site.
10. Testing is done again to make sure that the environment is ready for a future disaster. Any time testing
is performed, the process described in step 4 must be completed.
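
To make the manual nature of step 4 concrete, the following Python sketch models the failover tasks as a
simple runbook. Every helper function in it is a hypothetical placeholder that only prints what it would do,
not a real product API, and the host names, LUN names, and virtual machine names are assumptions used only
for illustration.

# Hypothetical failover runbook for step 4. Each placeholder helper stands in
# for a storage-array, ESX-host, or vCenter operation that an administrator or
# custom script would have to perform.

ESX_HOSTS = ["dr-esx01", "dr-esx02"]                  # assumed DR-site hosts
REPLICATED_LUNS = ["lun_vmfs_db", "lun_vmfs_app"]     # assumed replicated LUNs
BOOT_ORDER = ["ad01", "db01", "app01", "web01"]       # assumed dependency order
DR_PORTGROUP = "VM Network DR"                        # assumed DR network segment


def present_lun_to_hosts(lun, hosts):
    print(f"presenting {lun} to {hosts}")             # placeholder: step 4a


def rescan_storage(host):
    print(f"rescanning storage on {host}")            # placeholder: step 4b


def find_vmx_files(luns):
    # Placeholder: would browse the recovered VMFS datastores for .vmx files.
    return [f"[{luns[0]}] {vm}/{vm}.vmx" for vm in BOOT_ORDER]


def register_vm(vmx_path, host):
    print(f"registering {vmx_path} on {host}")        # placeholder: step 4c


def set_portgroup(vm, portgroup):
    print(f"moving {vm} to {portgroup}")              # placeholder: step 4d


def power_on(vm):
    print(f"powering on {vm}")                        # placeholder: step 4e


def fail_over_to_dr_site():
    # 4a/4b: present the replicated LUNs and rescan so the ESX hosts see them.
    for lun in REPLICATED_LUNS:
        present_lun_to_hosts(lun, ESX_HOSTS)
    for host in ESX_HOSTS:
        rescan_storage(host)

    # 4c: add the virtual machines to the inventory of the DR hosts.
    for vmx_path in find_vmx_files(REPLICATED_LUNS):
        register_vm(vmx_path, ESX_HOSTS[0])

    # 4d: re-map each VM's network if the DR site uses a different segment.
    for vm in BOOT_ORDER:
        set_portgroup(vm, DR_PORTGROUP)

    # 4e: bring systems up in dependency order.
    for vm in BOOT_ORDER:
        power_on(vm)


if __name__ == "__main__":
    fail_over_to_dr_site()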
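
Steps 7 and 9 run the same replication relationships in opposite directions. The short sketch below, again
using only hypothetical placeholder functions and assumed volume names, illustrates the idea; on NetApp
storage the resynchronization would be performed with SnapMirror, but no real command syntax is shown here.

# Hypothetical sketch of reversing replication after a failover (step 7) and
# restoring the original direction after failback (step 9).

MIRRORS = [
    # (primary volume, DR volume) -- assumed names
    ("primary:vol_db", "dr:vol_db"),
    ("primary:vol_app", "dr:vol_app"),
]


def resync_mirror(source, destination):
    # Placeholder: stands in for the array-side resynchronization operation.
    print(f"resyncing {destination} from {source}")


def reverse_replication():
    # Step 7: replicate changes made at the DR site back to the primary site.
    for primary, dr in MIRRORS:
        resync_mirror(source=dr, destination=primary)


def restore_original_direction():
    # Step 9: after failback, re-establish primary-to-DR replication.
    for primary, dr in MIRRORS:
        resync_mirror(source=primary, destination=dr)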
As mentioned earlier, a DR process can be lengthy, complex, and prone to human error. These factors carry
risk, which is amplified by the fact that the process will need to be performed again to recover operations
back to the primary site when it is made available. A DR solution is an important insurance policy for any
business. Periodic testing of the DR plan is a must if the solution is to be relied on. Because of physical
environment limitations and the difficulty of performing DR testing, most environments can test only a few
times a year at most, and some cannot test in a realistic manner at all.
3 BENEFITS OF IMPLEMENTING SRM WITH NETAPP
Implementing a virtualized environment using VMware vCenter Site Recovery Manager on NetApp storage
provides the infrastructure for unique opportunities to implement real, working DR processes that are quick
and easy to test, consume little additional storage, and significantly reduce RTO and RPO.
3.1 VMWARE VCENTER SITE RECOVERY MANAGER
One of the most time-consuming parts of DR failover in a VMware environment is the execution of the steps
necessary to connect, register, reconfigure, and power up virtual machines at the DR site.
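
As an illustration of why this per-VM work is time consuming at scale, the following sketch uses the
open-source pyVmomi bindings for the vSphere API to register and power on a handful of replicated virtual
machines at the DR site. The vCenter address, credentials, datastore paths, and inventory layout are
assumptions, and the sketch skips error handling and the network reconfiguration step; it is meant only to
show the kind of scripting effort that SRM automates.

# Sketch (not production code): registering and powering on replicated VMs at
# the DR site with pyVmomi. Host name, credentials, datastore, and inventory
# layout are all assumptions.
import ssl
import time

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ssl_ctx = ssl._create_unverified_context()            # lab only: skip cert checks
si = SmartConnect(host="dr-vcenter.example.com",      # assumed DR-site vCenter
                  user="administrator@vsphere.local",
                  pwd="********",
                  sslContext=ssl_ctx)
content = si.RetrieveContent()

datacenter = content.rootFolder.childEntity[0]        # assumes first child is a datacenter
vm_folder = datacenter.vmFolder
cluster = datacenter.hostFolder.childEntity[0]        # assumes a single cluster
resource_pool = cluster.resourcePool

# Replicated .vmx files on the recovered datastore (assumed paths).
vmx_paths = [
    "[dr_datastore] db01/db01.vmx",
    "[dr_datastore] app01/app01.vmx",
]


def wait_for(task):
    # Poll a vCenter task until it finishes and return its result.
    while task.info.state not in (vim.TaskInfo.State.success,
                                  vim.TaskInfo.State.error):
        time.sleep(1)
    return task.info.result


for vmx in vmx_paths:
    # Register the VM into the DR inventory, then power it on. A real runbook
    # would also re-map the VM's port group (step 4d above) before power-on.
    vm = wait_for(vm_folder.RegisterVM_Task(path=vmx, asTemplate=False,
                                            pool=resource_pool))
    wait_for(vm.PowerOnVM_Task())

Disconnect(si)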
VMware has solved these problems with the introduction of VMware vCenter Site Recovery Manager. SRM
enables two separate VMware environments, the primary and the DR (or paired) sites, to communicate with