HP Serviceguard Extended Distance Cluster for Linux A.12.00.00 Deployment Guide, March 2014
1 Disaster Recovery in a Serviceguard Cluster
This chapter introduces a variety of Hewlett-Packard high availability cluster technologies that
provide disaster recovery for your mission-critical applications. It is assumed that you are already
familiar with Serviceguard high availability concepts and configurations.
1.1 Evaluating the Need for a Disaster Recovery Solution
Disaster recovery is the ability to restore applications and data within a reasonable period of time
after a disaster. Most people think of fire, flood, and earthquake as disasters, but a disaster can
be any event that unexpectedly interrupts service or corrupts data in an entire data center: the
backhoe that digs too deep and severs a network connection, or an act of sabotage.
Disaster recovery architectures protect against unplanned downtime due to disasters by
geographically distributing the nodes in a cluster so that a disaster at one site does not disable
the entire cluster. To evaluate your need for a disaster recovery solution, you need to weigh:
• Risk of disaster. Areas prone to tornadoes, floods, or earthquakes may require a disaster
recovery solution. Some industries need to consider risks other than natural disasters or
accidents, such as terrorist activity or sabotage.
The type of disaster to which your business is prone, whether it is due to geographical location
or the nature of the business, will determine the type of disaster recovery you choose. For
example, if you live in a region prone to big earthquakes, you are not likely to put your
alternate or backup nodes in the same city as your primary nodes, because that sort of disaster
affects a large area.
The frequency of the disaster also plays an important role in determining whether to invest in
a rapid disaster recovery solution. For example, you would be more likely to protect against
hurricanes that occur seasonally than against a dormant volcano.
• Vulnerability of the business. How long can your business afford to be down? Some parts of
a business may be able to endure a one- or two-day recovery time, while others need to recover
in a matter of minutes. Some parts of a business only need local protection from single outages,
such as a node failure. Other parts of a business may need both local protection and protection
in case of site failure.
It is important to consider the role applications play in your business. For example, you may
target the assembly line production servers as most in need of quick recovery. But if the most
likely disaster in your area is an earthquake, it would render the assembly line inoperable as
well as the computers. In this case disaster recovery would be moot, and local failover is
probably the more appropriate level of protection.
On the other hand, you may have an order processing center that is prone to floods in the
winter. The business loses thousands of dollars a minute while the order processing servers
are down. A disaster recovery architecture is appropriate protection in this situation.
The decision to implement a disaster recovery solution depends on the balance between the risk of
disaster and the vulnerability of your business if a disaster occurs. The following pages give a
high-level view of a variety of disaster recovery solutions and sketch the general guidelines that
you must follow in developing a disaster recovery computing environment.
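The balance between risk and vulnerability described above can be made concrete with a rough
expected-loss estimate. The following Python sketch is illustrative only and is not part of
Serviceguard: the class name, the function names, and every number in it (event frequency,
outage length, loss per minute, and yearly solution cost) are hypothetical assumptions chosen
to mirror the flood and dormant-volcano examples in this section.

```python
from dataclasses import dataclass


@dataclass
class DisasterScenario:
    """Hypothetical description of one kind of disaster at one site."""
    name: str
    events_per_year: float   # assumed frequency of the disaster
    outage_minutes: float    # assumed downtime per event without disaster recovery
    loss_per_minute: float   # assumed business loss per minute of downtime


def expected_annual_loss(scenario: DisasterScenario) -> float:
    """Risk (frequency) multiplied by vulnerability (downtime cost per event)."""
    return (scenario.events_per_year
            * scenario.outage_minutes
            * scenario.loss_per_minute)


def worth_protecting(scenario: DisasterScenario, annual_solution_cost: float) -> bool:
    """True when the expected yearly loss exceeds the yearly cost of the solution."""
    return expected_annual_loss(scenario) > annual_solution_cost


# Hypothetical inputs: a flood every other winter versus a dormant volcano,
# each causing a two-day (2880-minute) outage at 2000 per minute.
flood = DisasterScenario("winter flood", 0.5, 2880, 2000)
volcano = DisasterScenario("dormant volcano", 0.001, 2880, 2000)
ANNUAL_SOLUTION_COST = 500_000  # assumed yearly cost of a disaster recovery solution

if __name__ == "__main__":
    for scenario in (flood, volcano):
        loss = expected_annual_loss(scenario)
        decision = worth_protecting(scenario, ANNUAL_SOLUTION_COST)
        print(f"{scenario.name}: expected annual loss {loss:,.0f}, protect: {decision}")
```

An estimate of this kind only ranks scenarios. It does not capture cases such as the
assembly line example, where the disaster also disables the business process that the
servers support.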
1.2 What is a Disaster Recovery Architecture?
In a Serviceguard cluster configuration, high availability is achieved by using redundant hardware
to eliminate single points of failure. This protects the cluster against hardware faults, such as the
node failure in Figure 1.