Managing Serviceguard Fifteenth Edition, reprinted May 2008

Planning and Documenting an HA Cluster
Cluster Configuration Planning
Chapter 4, page 166
There are more complex cases that require you to make
a trade-off between fewer failovers and faster failovers.
For example, a network event such as a broadcast
storm may cause kernel interrupts to be turned off on
some or all nodes while the packets are being
processed, preventing the nodes from sending and
processing heartbeat messages. This in turn could
prevent the kernel’s safety timer from being reset,
causing a system reset. (See “Cluster Daemon: cmcld”
on page 60 for more information about the safety
timer.)
Can be changed while the cluster is running.
AUTO_START_TIMEOUT
The amount of time a node waits before it stops trying
to join a cluster during automatic cluster startup. All
nodes wait this amount of time for other nodes to begin
startup before the cluster completes the operation. The
time should be selected based on the slowest boot time
in the cluster. Enter a value equal to the boot time of
the slowest booting node minus the boot time of the
fastest booting node, plus 600 seconds (ten minutes).
Default is 600,000,000 microseconds (600 seconds).
Can be changed while the cluster is running.
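
The AUTO_START_TIMEOUT calculation above can be sketched in a few
lines of Python. This is only an illustration: the node names and
boot times are made-up values, the helper name is hypothetical, and
the printed line assumes the configuration file takes the parameter
name followed by a value in microseconds.

```python
# Sketch: derive an AUTO_START_TIMEOUT value from measured boot times.
# Node names and boot times are illustrative, not real measurements.

MICROSECONDS_PER_SECOND = 1_000_000
MARGIN_SECONDS = 600  # the ten-minute margin given in the text


def auto_start_timeout_us(boot_times_seconds):
    """Return (slowest - fastest + 600 s), expressed in microseconds."""
    times = list(boot_times_seconds)
    if not times:
        raise ValueError("at least one boot time is required")
    spread = max(times) - min(times)
    return (spread + MARGIN_SECONDS) * MICROSECONDS_PER_SECOND


# Hypothetical boot times, in seconds, for a three-node cluster.
boot_times = {"node1": 240, "node2": 420, "node3": 300}
timeout_us = auto_start_timeout_us(boot_times.values())

# Assumed layout: parameter name, then the value in microseconds.
print(f"AUTO_START_TIMEOUT {timeout_us}")
```

With these sample figures the spread between the slowest and fastest
node is 180 seconds, so the result is 780 seconds, or 780,000,000
microseconds. If all nodes boot in the same time, the result equals
the 600,000,000 microsecond default.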
NETWORK_POLLING_INTERVAL
The interval at which the networks configured for
Serviceguard are checked. Default is 2,000,000
microseconds (2 seconds) in the cluster configuration
file. Thus every 2 seconds, the network manager polls
each network interface to make sure it can still send
and receive information. Using the default is highly
recommended. Changing this value can affect how
quickly a network failure is detected. The minimum
value is 1,000,000 microseconds (1 second). The
maximum recommended value is 15 seconds
(15,000,000 microseconds), and the maximum
supported value is 30 seconds (30,000,000
microseconds).
Can be changed while the cluster is running.
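
The limits stated for NETWORK_POLLING_INTERVAL can be checked before
a value is placed in the configuration file. The following Python
sketch is not part of Serviceguard; the function name and the
returned labels are hypothetical, and only the numeric limits come
from the description above.

```python
# Sketch: classify a candidate NETWORK_POLLING_INTERVAL value, given
# in microseconds, against the limits stated in the text.

MIN_US = 1_000_000               # minimum value: 1 second
DEFAULT_US = 2_000_000           # default: 2 seconds
MAX_RECOMMENDED_US = 15_000_000  # maximum recommended: 15 seconds
MAX_SUPPORTED_US = 30_000_000    # maximum supported: 30 seconds


def check_polling_interval(value_us):
    """Return a short label describing where value_us falls."""
    if value_us < MIN_US:
        return "below minimum"
    if value_us > MAX_SUPPORTED_US:
        return "above supported maximum"
    if value_us > MAX_RECOMMENDED_US:
        return "supported but not recommended"
    if value_us == DEFAULT_US:
        return "default"
    return "within recommended range"


# Try a few candidate values.
for candidate in (500_000, 2_000_000, 5_000_000, 20_000_000, 45_000_000):
    print(candidate, check_polling_interval(candidate))
```

A longer interval reduces polling overhead but delays detection of a
network failure, which is why values above 15 seconds are flagged
separately from values that are not supported at all.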