2.1 A TRADITIONAL DR SCENARIO
Failing over business operations to recover from a disaster involves several steps that are manual,
lengthy, and complex. Custom scripts are often written and used to simplify some of these processes,
but the processes themselves still determine the real RTO that any DR solution can deliver.
Consider this simplified outline of the flow of a traditional disaster recovery scenario:
1. A disaster recovery solution was previously implemented, and replication has been occurring.
2. A disaster occurs that requires a failover to the DR site. This might be a power outage too long for the
business to withstand without failing over, or a more severe disaster that causes the loss of data and/or
equipment at the primary site.
3. The DR team takes the necessary steps to confirm the disaster and decides to fail over business operations
to the DR site.
4. Assuming that data replication has been healthy, that the DR site is in a good state, and that prior testing
has confirmed these facts, then (a sketch of these steps follows this list):
a. The replicated storage must be presented to the ESX hosts at the DR site.
b. The ESX hosts must be attached to the storage.
c. The virtual machines must be added to the inventory of the ESX hosts.
d. If the DR site is on a different network segment than the primary site, then each virtual machine
might need to be reconfigured for the new network.
e. The environment must be brought up properly, with certain systems and services made available in
the proper order.
5. After the DR environment is ready, business can continue in whatever capacity is supported by the
equipment at the DR site.
6. At some point, the primary site will be made available again, or lost equipment will be replaced.
7. Changes that were applied to the data while the DR site was supporting the business must be
replicated back to the primary site. Replication must be reversed to accomplish this; a second sketch
after the list illustrates reversing the direction.
8. The processes performed in step 4 must now be performed again, this time within a controlled
outage window, to fail the environment back to the primary site. Depending on how soon after the
original disaster the DR team was able to engage, this process might take nearly as long as
recovering from the disaster itself.
9. After the primary environment is recovered, replication must be established in the original direction from
the primary site to the DR site.
10. Testing is done again to make sure that the environment is ready for a future disaster. Any time testing
is performed, the process described in step 4 must be completed.
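
To make the manual nature of step 4 concrete, the following Python sketch models the failover tasks as a
simple runbook. Every helper function in it is a hypothetical placeholder that only prints what it would do,
not a real product API, and the host names, LUN names, and virtual machine names are assumptions used only
for illustration.

# Hypothetical failover runbook for step 4. Each placeholder helper stands in
# for a storage-array, ESX-host, or vCenter operation that an administrator or
# custom script would have to perform.

ESX_HOSTS = ["dr-esx01", "dr-esx02"]                  # assumed DR-site hosts
REPLICATED_LUNS = ["lun_vmfs_db", "lun_vmfs_app"]     # assumed replicated LUNs
BOOT_ORDER = ["ad01", "db01", "app01", "web01"]       # assumed dependency order
DR_PORTGROUP = "VM Network DR"                        # assumed DR network segment


def present_lun_to_hosts(lun, hosts):
    print(f"presenting {lun} to {hosts}")             # placeholder: step 4a


def rescan_storage(host):
    print(f"rescanning storage on {host}")            # placeholder: step 4b


def find_vmx_files(luns):
    # Placeholder: would browse the recovered VMFS datastores for .vmx files.
    return [f"[{luns[0]}] {vm}/{vm}.vmx" for vm in BOOT_ORDER]


def register_vm(vmx_path, host):
    print(f"registering {vmx_path} on {host}")        # placeholder: step 4c


def set_portgroup(vm, portgroup):
    print(f"moving {vm} to {portgroup}")              # placeholder: step 4d


def power_on(vm):
    print(f"powering on {vm}")                        # placeholder: step 4e


def fail_over_to_dr_site():
    # 4a/4b: present the replicated LUNs and rescan so the ESX hosts see them.
    for lun in REPLICATED_LUNS:
        present_lun_to_hosts(lun, ESX_HOSTS)
    for host in ESX_HOSTS:
        rescan_storage(host)

    # 4c: add the virtual machines to the inventory of the DR hosts.
    for vmx_path in find_vmx_files(REPLICATED_LUNS):
        register_vm(vmx_path, ESX_HOSTS[0])

    # 4d: re-map each VM's network if the DR site uses a different segment.
    for vm in BOOT_ORDER:
        set_portgroup(vm, DR_PORTGROUP)

    # 4e: bring systems up in dependency order.
    for vm in BOOT_ORDER:
        power_on(vm)


if __name__ == "__main__":
    fail_over_to_dr_site()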
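
Steps 7 and 9 run the same replication relationships in opposite directions. The short sketch below, again
using only hypothetical placeholder functions and assumed volume names, illustrates the idea; on NetApp
storage the resynchronization would be performed with SnapMirror, but no real command syntax is shown here.

# Hypothetical sketch of reversing replication after a failover (step 7) and
# restoring the original direction after failback (step 9).

MIRRORS = [
    # (primary volume, DR volume) -- assumed names
    ("primary:vol_db", "dr:vol_db"),
    ("primary:vol_app", "dr:vol_app"),
]


def resync_mirror(source, destination):
    # Placeholder: stands in for the array-side resynchronization operation.
    print(f"resyncing {destination} from {source}")


def reverse_replication():
    # Step 7: replicate changes made at the DR site back to the primary site.
    for primary, dr in MIRRORS:
        resync_mirror(source=dr, destination=primary)


def restore_original_direction():
    # Step 9: after failback, re-establish primary-to-DR replication.
    for primary, dr in MIRRORS:
        resync_mirror(source=primary, destination=dr)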
As mentioned earlier, a DR process can be lengthy, complex, and prone to human error. These factors carry
risk, which is amplified by the fact that the process will need to be performed again to recover operations
back to the primary site when it is made available. A DR solution is an important insurance policy for any
business. Periodic testing of the DR plan is a must if the solution is to be relied on. Because of physical
environment limitations and the difficulty of performing DR testing, most environments can test only a few
times a year at most, and some cannot test in a realistic manner at all.
3 BENEFITS OF IMPLEMENTING SRM WITH NETAPP
Implementing a virtualized environment using VMware vCenter Site Recovery Manager on NetApp storage
provides the infrastructure for unique opportunities to implement real, working DR processes that are quick
and easy to test, consume little additional storage, and significantly reduce RTO and RPO.
3.1 VMWARE VCENTER SITE RECOVERY MANAGER
One of the most time-consuming parts of DR failover in a VMware environment is the execution of the steps
necessary to connect, register, reconfigure, and power up virtual machines at the DR site.
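
As an illustration of why this per-VM work is time consuming at scale, the following sketch uses the
open-source pyVmomi bindings for the vSphere API to register and power on a handful of replicated virtual
machines at the DR site. The vCenter address, credentials, datastore paths, and inventory layout are
assumptions, and the sketch skips error handling and the network reconfiguration step; it is meant only to
show the kind of scripting effort that SRM automates.

# Sketch (not production code): registering and powering on replicated VMs at
# the DR site with pyVmomi. Host name, credentials, datastore, and inventory
# layout are all assumptions.
import ssl
import time

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ssl_ctx = ssl._create_unverified_context()            # lab only: skip cert checks
si = SmartConnect(host="dr-vcenter.example.com",      # assumed DR-site vCenter
                  user="administrator@vsphere.local",
                  pwd="********",
                  sslContext=ssl_ctx)
content = si.RetrieveContent()

datacenter = content.rootFolder.childEntity[0]        # assumes first child is a datacenter
vm_folder = datacenter.vmFolder
cluster = datacenter.hostFolder.childEntity[0]        # assumes a single cluster
resource_pool = cluster.resourcePool

# Replicated .vmx files on the recovered datastore (assumed paths).
vmx_paths = [
    "[dr_datastore] db01/db01.vmx",
    "[dr_datastore] app01/app01.vmx",
]


def wait_for(task):
    # Poll a vCenter task until it finishes and return its result.
    while task.info.state not in (vim.TaskInfo.State.success,
                                  vim.TaskInfo.State.error):
        time.sleep(1)
    return task.info.result


for vmx in vmx_paths:
    # Register the VM into the DR inventory, then power it on. A real runbook
    # would also re-map the VM's port group (step 4d above) before power-on.
    vm = wait_for(vm_folder.RegisterVM_Task(path=vmx, asTemplate=False,
                                            pool=resource_pool))
    wait_for(vm.PowerOnVM_Task())

Disconnect(si)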
VMware has solved these problems with the introduction of VMware vCenter Site Recovery Manager. SRM
enables two separate VMware environments, the primary and the DR (or paired) sites, to communicate with