Managing Serviceguard 14th Edition, June 2007
Planning and Documenting an HA Cluster
Cluster Configuration Planning
Chapter 4, page 160
NODE_TIMEOUT The time, in microseconds, after which a node may
decide that another node has become unavailable and
initiate cluster reformation.
Maximum value: 60,000,000 microseconds (60
seconds).
Minimum value: 2 * HEARTBEAT_INTERVAL
Default value: 2,000,000 microseconds (2 seconds).
Recommendations: You need to decide whether it's
more important for your installation to have fewer
cluster reformations, or faster reformations:
• To ensure the fastest cluster reformations, use the
default value. But keep in mind that this setting
can lead to reformations that are caused by
short-lived system hangs or network load spikes.
• For fewer reformations, use a setting in the range
of 5,000,000 to 8,000,000 microseconds (5 to 8
seconds). But keep in mind that this will lead to
slower reformations than the default value.
• The maximum recommended value is 30,000,000
microseconds (30 seconds).
Remember that a cluster reformation may result in a
system reset on one of the cluster nodes. For further
discussion, see “What Happens when a Node Times
Out” on page 125.
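
The limits and recommendations above can be checked mechanically before a
value is placed in the cluster configuration. The following is a minimal
sketch, not a Serviceguard tool: the function name check_node_timeout and its
return format are hypothetical, and the HEARTBEAT_INTERVAL of 1,000,000
microseconds used in the usage lines is only an assumed example value. The
numeric limits are the ones quoted in this entry.

```python
# Limits quoted in the NODE_TIMEOUT entry (all values in microseconds).
MAX_NODE_TIMEOUT = 60_000_000          # hard maximum (60 seconds)
RECOMMENDED_MAX_TIMEOUT = 30_000_000   # maximum recommended value (30 seconds)
DEFAULT_NODE_TIMEOUT = 2_000_000       # default (2 seconds)


def check_node_timeout(node_timeout, heartbeat_interval):
    """Return (errors, warnings) for a proposed NODE_TIMEOUT.

    Hypothetical helper for illustration only; it is not part of
    Serviceguard. Both arguments are in microseconds.
    """
    errors = []
    warnings = []

    # Minimum value: 2 * HEARTBEAT_INTERVAL.
    minimum = 2 * heartbeat_interval
    if node_timeout < minimum:
        errors.append(
            f"NODE_TIMEOUT {node_timeout} is below the minimum "
            f"2 * HEARTBEAT_INTERVAL = {minimum}"
        )

    # Maximum value: 60,000,000; maximum recommended value: 30,000,000.
    if node_timeout > MAX_NODE_TIMEOUT:
        errors.append(
            f"NODE_TIMEOUT {node_timeout} exceeds the maximum "
            f"{MAX_NODE_TIMEOUT}"
        )
    elif node_timeout > RECOMMENDED_MAX_TIMEOUT:
        warnings.append(
            f"NODE_TIMEOUT {node_timeout} exceeds the maximum recommended "
            f"value {RECOMMENDED_MAX_TIMEOUT}"
        )

    return errors, warnings


if __name__ == "__main__":
    # Assumed example HEARTBEAT_INTERVAL; substitute your cluster's value.
    heartbeat = 1_000_000
    for candidate in (DEFAULT_NODE_TIMEOUT, 8_000_000, 45_000_000, 1_500_000):
        errs, warns = check_node_timeout(candidate, heartbeat)
        print(candidate, "errors:", errs, "warnings:", warns)
```

A check of this kind only confirms that a value falls within the documented
limits. Choosing between the default and a setting in the 5,000,000 to
8,000,000 microsecond range still depends on whether fewer reformations or
faster reformations matter more for your installation.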
There are more complex cases that require you to make
a trade-off between fewer failovers and faster failovers.
For example, a network event such as a broadcast
storm may cause kernel interrupts to be turned off on
some or all nodes while the packets are being
processed, preventing the nodes from sending and
processing heartbeat messages. This in turn could
prevent the kernel’s safety timer from being reset,
causing the node to halt. (See “Cluster Daemon: cmcld”
on page 56 for more information about the safety
timer.)
Can be changed while the cluster is running.
AUTO_START_TIMEOUT