Managing Serviceguard 13th Edition, February 2007

Planning and Documenting an HA Cluster
Cluster Configuration Planning
A cluster should be designed to provide the quickest possible recovery
from failures. The actual time required to recover from a failure depends
on several factors:
• The length of the cluster heartbeat interval and node timeout.
  See the parameter descriptions for HEARTBEAT_INTERVAL and
  NODE_TIMEOUT under “Cluster Configuration Parameters” on
  page 157 for recommendations.
• The design of the run and halt instructions in the package control
  script. They should be written for fast execution.
• The availability of raw disk access. Applications that use raw disk
  access should be designed with crash recovery services.
• The application and database recovery time. They should be
  designed for the shortest recovery time.
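As a rough illustration of the first factor, the sketch below reads the two timing parameters from the text of a cluster configuration file and checks how they relate. It is not a Serviceguard tool. The helper names, the sample values, and the assumption that NODE_TIMEOUT should be at least twice HEARTBEAT_INTERVAL are illustrative only; confirm the actual limits and recommendations in the parameter descriptions before relying on them.

```python
# Hypothetical excerpt from a cluster configuration file.
# Assumed format: "NAME value" per line, '#' starts a comment,
# timing values given in microseconds. The numbers are examples,
# not recommendations.
SAMPLE_CONFIG = """\
# timing parameters
CLUSTER_NAME        cluster1
HEARTBEAT_INTERVAL  1000000
NODE_TIMEOUT        2000000
"""

TIMING_PARAMETERS = ("HEARTBEAT_INTERVAL", "NODE_TIMEOUT")


def read_timing_parameters(config_text):
    """Return {name: int} for the timing parameters found in config_text."""
    values = {}
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split()
        if len(parts) == 2 and parts[0] in TIMING_PARAMETERS:
            values[parts[0]] = int(parts[1])
    return values


def check_timing(values, min_ratio=2):
    """Return a list of problems; an empty list means the check passed.

    min_ratio is an assumption of this sketch: NODE_TIMEOUT is expected
    to be at least min_ratio times HEARTBEAT_INTERVAL.
    """
    problems = [f"{name} is not set"
                for name in TIMING_PARAMETERS if name not in values]
    if problems:
        return problems
    heartbeat = values["HEARTBEAT_INTERVAL"]
    timeout = values["NODE_TIMEOUT"]
    if timeout < min_ratio * heartbeat:
        problems.append(
            f"NODE_TIMEOUT ({timeout}) is less than "
            f"{min_ratio} x HEARTBEAT_INTERVAL ({heartbeat})"
        )
    return problems


if __name__ == "__main__":
    for problem in check_timing(read_timing_parameters(SAMPLE_CONFIG)):
        print("WARNING:", problem)
```

Shorter values generally mean faster failure detection, but they also raise the risk of unnecessary cluster reformation on a busy network, so a check like this is only a sanity test, not a tuning guide.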
In addition, you must provide consistency across the cluster so that:
• User names are the same on all nodes.
• UIDs are the same on all nodes.
• GIDs are the same on all nodes.
• Applications in the system area are the same on all nodes.
• System time is consistent across the cluster.
• Files that could be used by more than one node, such as /usr files,
  must be the same on all nodes.
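The user name, UID, and GID requirements above can be spot-checked with a small script. The following is a minimal sketch, assuming you have already collected the contents of /etc/passwd (or /etc/group) from each node as text. The function names, node names, and sample entries are hypothetical, and the script only reports differences; it does not change anything on the nodes.

```python
# Hypothetical /etc/passwd contents gathered from two nodes.
NODE1_PASSWD = """\
root:x:0:3::/:/sbin/sh
oracle:x:201:101::/home/oracle:/usr/bin/sh
"""
NODE2_PASSWD = """\
root:x:0:3::/:/sbin/sh
oracle:x:202:101::/home/oracle:/usr/bin/sh
appadm:x:210:101::/home/appadm:/usr/bin/sh
"""


def parse_id_map(text):
    """Map name -> numeric ID from passwd- or group-style text.

    In both formats the name is field 1 and the numeric ID
    (UID for passwd, GID for group) is field 3.
    """
    ids = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 3 or not fields[2].isdigit():
            continue  # skip malformed or non-numeric entries
        ids[fields[0]] = int(fields[2])
    return ids


def find_mismatches(per_node):
    """Return {name: {node: id_or_None}} for names whose ID differs
    between nodes or that are missing on at least one node."""
    all_names = set()
    for ids in per_node.values():
        all_names.update(ids)
    mismatches = {}
    for name in sorted(all_names):
        seen = {node: ids.get(name) for node, ids in per_node.items()}
        if len(set(seen.values())) > 1:
            mismatches[name] = seen
    return mismatches


if __name__ == "__main__":
    per_node = {
        "node1": parse_id_map(NODE1_PASSWD),
        "node2": parse_id_map(NODE2_PASSWD),
    }
    for name, seen in find_mismatches(per_node).items():
        print(name, seen)
```

With the sample data, the script flags oracle (different UIDs on the two nodes) and appadm (missing on node1), while root is consistent and is not reported. Running the same functions on /etc/group text checks GIDs.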
The Serviceguard Extension for Faster Failover is a purchased product
that can optimize failover time for certain two-node clusters. The clusters
must be configured to meet certain requirements. When installed, the
product is enabled by a parameter in the cluster configuration file.
Release Notes for the product are posted at http://docs.hp.com ->
high availability.