Managing Serviceguard 13th Edition, February 2007
Planning and Documenting an HA Cluster
Cluster Configuration Planning
A cluster should be designed to provide the quickest possible recovery
from failures. The actual time required to recover from a failure depends
on several factors:
• The length of the cluster heartbeat interval and node timeout.
See the parameter descriptions for HEARTBEAT_INTERVAL and
NODE_TIMEOUT under “Cluster Configuration Parameters” on
page 157 for recommendations.
• The design of the run and halt instructions in the package control
script. They should be written for fast execution.
• The availability of raw disk access. Applications that use raw disk
access should be designed with crash recovery services.
• The application and database recovery time. They should be
designed for the shortest recovery time.
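The two timing parameters named in the first item can be read out of a
cluster configuration file and compared. The following is a minimal sketch
in Python, not part of Serviceguard. The function name read_timing and the
sample values are hypothetical, and the sketch assumes one "NAME value"
pair per line with both values given in microseconds. Check the parameter
descriptions for the actual units and recommended settings.

```python
def read_timing(config_text):
    """Return HEARTBEAT_INTERVAL and NODE_TIMEOUT found in config_text.

    Assumes one 'NAME value' pair per line and '#' comments.
    """
    wanted = {"HEARTBEAT_INTERVAL", "NODE_TIMEOUT"}
    found = {}
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        parts = line.split()
        if len(parts) == 2 and parts[0] in wanted:
            found[parts[0]] = int(parts[1])
    return found


# Hypothetical sample values, assumed to be in microseconds.
SAMPLE = """
# cluster timing (illustrative values only)
HEARTBEAT_INTERVAL 1000000
NODE_TIMEOUT 2000000
"""

timing = read_timing(SAMPLE)
heartbeat_s = timing["HEARTBEAT_INTERVAL"] / 1_000_000
timeout_s = timing["NODE_TIMEOUT"] / 1_000_000
# Number of heartbeat intervals that fit in one node timeout.
intervals_per_timeout = timing["NODE_TIMEOUT"] / timing["HEARTBEAT_INTERVAL"]
print(f"heartbeat every {heartbeat_s} s, node timeout after {timeout_s} s")
print(f"{intervals_per_timeout} heartbeat intervals per timeout")
```

A longer node timeout tolerates more missed heartbeats but delays the point
at which a failed node is detected, so it lengthens recovery time.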
In addition, you must provide consistency across the cluster so that:
• User names are the same on all nodes.
• UIDs are the same on all nodes.
• GIDs are the same on all nodes.
• Applications in the system area are the same on all nodes.
• System time is consistent across the cluster.
• Files that could be used by more than one node, such as /usr files,
are the same on all nodes.
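The user name, UID, and GID requirements above can be checked mechanically
once a copy of each node's passwd file has been collected. The following is
a minimal sketch in Python, assuming the standard colon-separated passwd
format. The node names, the sample entries, and the function names
parse_passwd and find_mismatches are hypothetical, and the step that
gathers the files from each node is left out. Only the UID and primary GID
recorded in the passwd entries are compared.

```python
def parse_passwd(text):
    """Map user name -> (uid, gid) from passwd-format text."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue  # skip malformed lines
        users[fields[0]] = (int(fields[2]), int(fields[3]))
    return users


def find_mismatches(passwd_by_node):
    """Return {user: {node: (uid, gid) or None}} for inconsistent users.

    A user is inconsistent if it is missing on some node or if its
    (uid, gid) pair differs between nodes.
    """
    parsed = {node: parse_passwd(text) for node, text in passwd_by_node.items()}
    all_users = set()
    for users in parsed.values():
        all_users.update(users)
    mismatches = {}
    for user in sorted(all_users):
        seen = {node: users.get(user) for node, users in parsed.items()}
        if len(set(seen.values())) > 1:
            mismatches[user] = seen
    return mismatches


# Hypothetical passwd contents for two nodes.
NODES = {
    "node1": (
        "root:x:0:3::/:/sbin/sh\n"
        "oracle:x:201:101::/home/oracle:/usr/bin/sh\n"
    ),
    "node2": (
        "root:x:0:3::/:/sbin/sh\n"
        "oracle:x:202:101::/home/oracle:/usr/bin/sh\n"
        "appadm:x:300:20::/home/appadm:/usr/bin/sh\n"
    ),
}

result = find_mismatches(NODES)
for user, seen in result.items():
    print(user, seen)
```

In this sample, oracle is reported because its UID differs between the two
nodes, and appadm is reported because it exists on only one node. An empty
result means the passwd entries agree on every node that was compared.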
The Serviceguard Extension for Faster Failover is a purchased product
that can optimize failover time for certain two-node clusters. The clusters
must be configured to meet certain requirements. When installed, the
product is enabled by a parameter in the cluster configuration file.
Release Notes for the product are posted at http://docs.hp.com ->
high availability.