Managing Serviceguard 12th Edition, March 2006

Planning and Documenting an HA Cluster

Cluster Configuration Planning

Chapter 4

A cluster should be designed to provide the quickest possible recovery from failures. The actual time required to recover from a failure depends on several factors:

• The length of the cluster heartbeat interval and node timeout. Both values are specified in microseconds. Each should be set as short as practical, but the heartbeat interval should not be shorter than 1000000 (one second) and the node timeout should not be shorter than 2000000 (two seconds). The recommended value for the heartbeat interval is 1000000 (one second), and the recommended value for the node timeout is in the range of 5000000 to 8000000 (five to eight seconds).

• The design of the run and halt instructions in the package control script. They should be written for fast execution.

• The availability of raw disk access. Applications that use raw disk access should be designed with crash recovery services.

• The application and database recovery time. Applications and databases should be designed for the shortest possible recovery time.
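
The timing limits in the first bullet above can be checked mechanically before a configuration is applied. The following is a minimal Python sketch, not part of Serviceguard: the function name check_timing and the constant names are hypothetical, and the only values it uses are the minimums and recommendations quoted in the text, in microseconds.

```python
# Limits and recommendations quoted in the text, in microseconds.
MIN_HEARTBEAT_US = 1_000_000
MIN_NODE_TIMEOUT_US = 2_000_000
RECOMMENDED_HEARTBEAT_US = 1_000_000
RECOMMENDED_NODE_TIMEOUT_US = (5_000_000, 8_000_000)


def check_timing(heartbeat_us, node_timeout_us):
    """Return a list of warnings for the given values (hypothetical helper)."""
    warnings = []

    # Heartbeat interval: flag values below the minimum first,
    # then values that merely differ from the recommendation.
    if heartbeat_us < MIN_HEARTBEAT_US:
        warnings.append("heartbeat interval is below the 1000000 minimum")
    elif heartbeat_us != RECOMMENDED_HEARTBEAT_US:
        warnings.append("heartbeat interval differs from the recommended 1000000")

    # Node timeout: same two-level check against minimum and recommended range.
    if node_timeout_us < MIN_NODE_TIMEOUT_US:
        warnings.append("node timeout is below the 2000000 minimum")
    else:
        low, high = RECOMMENDED_NODE_TIMEOUT_US
        if not low <= node_timeout_us <= high:
            warnings.append(
                "node timeout is outside the recommended 5000000 to 8000000 range"
            )
    return warnings


if __name__ == "__main__":
    for message in check_timing(1_000_000, 2_000_000):
        print(message)
```

An empty list means both values meet the minimums and match the recommendations. A node timeout of 2000000 is permitted but still produces a warning, because it falls below the recommended range.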

In addition, you must provide consistency across the cluster so that:

• User names are the same on all nodes.

• UIDs are the same on all nodes.

• GIDs are the same on all nodes.

• Applications in the system area are the same on all nodes.

• System time is consistent across the cluster.

• Files that could be used by more than one node, such as /usr files, are the same on all nodes.
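
The user name, UID, and GID requirements above can be spot-checked by comparing the password files of the nodes. The following is a minimal Python sketch under stated assumptions: it expects the text of each node's passwd-format file to have been collected already, the node names and the functions parse_passwd and find_mismatches are hypothetical, and it compares only the primary GID recorded in each user entry. A similar comparison of the group files would be needed to cover group definitions.

```python
def parse_passwd(text):
    """Map user name -> (uid, gid) from passwd-format text."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue
        try:
            # Field 2 is the UID and field 3 is the primary GID.
            users[fields[0]] = (int(fields[2]), int(fields[3]))
        except ValueError:
            # Skip entries without numeric IDs.
            continue
    return users


def find_mismatches(passwd_by_node):
    """Return sorted user names whose presence, UID, or GID differs between nodes."""
    parsed = {node: parse_passwd(text) for node, text in passwd_by_node.items()}
    all_users = set()
    for users in parsed.values():
        all_users.update(users)

    mismatched = []
    for name in sorted(all_users):
        # A user missing on a node shows up as None in this set.
        entries = {users.get(name) for users in parsed.values()}
        if len(entries) > 1:
            mismatched.append(name)
    return mismatched


if __name__ == "__main__":
    sample = {
        "node1": "root:x:0:0::/:/sbin/sh\noracle:x:101:200::/home/oracle:/bin/sh\n",
        "node2": "root:x:0:0::/:/sbin/sh\noracle:x:102:200::/home/oracle:/bin/sh\n",
    }
    print(find_mismatches(sample))
```

A user that exists on one node but not another is reported as a mismatch, as is a user whose UID or primary GID differs between nodes.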

The Serviceguard Extension for Faster Failover is a purchased product that can optimize failover time for certain two-node clusters. The clusters must be configured to meet certain requirements. When installed, the product is enabled by a parameter in the cluster configuration file. Release Notes for the product are posted at http://docs.hp.com -> high availability.