Managing Serviceguard 13th Edition, February 2007

Planning and Documenting an HA Cluster

Cluster Configuration Planning

A cluster should be designed to provide the quickest possible recovery from failures. The actual time required to recover from a failure depends on several factors:

• The length of the cluster heartbeat interval and node timeout. See the parameter descriptions for HEARTBEAT_INTERVAL and NODE_TIMEOUT under “Cluster Configuration Parameters” on page 157 for recommendations.

• The design of the run and halt instructions in the package control script. They should be written for fast execution.

• The availability of raw disk access. Applications that use raw disk access should be designed with crash recovery services.

• The application and database recovery time. Both should be designed for the shortest possible recovery time.

In addition, you must provide consistency across the cluster so that:

• User names are the same on all nodes.

• UIDs are the same on all nodes.

• GIDs are the same on all nodes.

• Applications in the system area are the same on all nodes.

• System time is consistent across the cluster.

• Files that could be used by more than one node, such as /usr files, are the same on all nodes.
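
The user name, UID, and GID requirements above can be checked mechanically before the cluster is configured. The following is a minimal sketch, not a Serviceguard tool: it assumes you have already collected the contents of /etc/passwd from each node by some means of your own, and it compares the user name, UID, and primary GID recorded on each node. The function names, node names, and sample entries are hypothetical and are used only for illustration.

```python
def parse_passwd(text):
    """Return {user_name: (uid, gid)} from passwd-format text."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue  # not a regular passwd entry
        try:
            entries[fields[0]] = (int(fields[2]), int(fields[3]))
        except ValueError:
            continue  # non-numeric UID or GID field; skip the entry
    return entries


def find_inconsistencies(passwd_by_node):
    """Compare passwd contents collected from several nodes.

    passwd_by_node maps a node name to that node's passwd text.
    Returns {user: {node: (uid, gid) or None}} for every user whose
    entry is missing on some node or differs between nodes.
    """
    parsed = {node: parse_passwd(text) for node, text in passwd_by_node.items()}
    all_users = set()
    for entries in parsed.values():
        all_users.update(entries)

    problems = {}
    for user in sorted(all_users):
        seen = {node: entries.get(user) for node, entries in parsed.items()}
        if len(set(seen.values())) > 1:
            problems[user] = seen
    return problems


if __name__ == "__main__":
    # Hypothetical sample data: the oracle UID differs between the nodes.
    sample = {
        "node1": "root:x:0:3::/:/sbin/sh\noracle:x:201:101::/home/oracle:/usr/bin/sh\n",
        "node2": "root:x:0:3::/:/sbin/sh\noracle:x:202:101::/home/oracle:/usr/bin/sh\n",
    }
    for user, seen in find_inconsistencies(sample).items():
        print(user, seen)
```

A user reported by find_inconsistencies either is missing from at least one node (shown as None) or has a different UID or primary GID there. The same approach could be applied to /etc/group to compare group names and GIDs.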

The Serviceguard Extension for Faster Failover is a purchased product that can optimize failover time for certain two-node clusters. The clusters must be configured to meet certain requirements. When installed, the product is enabled by a parameter in the cluster configuration file.

Release Notes for the product are posted at http://docs.hp.com -> high availability.