Managing Serviceguard Fifteenth Edition, reprinted May 2008


Planning and Documenting an HA Cluster

Cluster Configuration Planning

Chapter 4

There are more complex cases that require you to make a trade-off
between fewer failovers and faster failovers. For example, a network
event such as a broadcast storm may cause kernel interrupts to be
turned off on some or all nodes while the packets are being processed,
preventing the nodes from sending and processing heartbeat messages.
This in turn could prevent the kernel's safety timer from being reset,
causing a system reset. (See "Cluster Daemon: cmcld" on page 60 for
more information about the safety timer.)

Can be changed while the cluster is running.

AUTO_START_TIMEOUT

The amount of time a node waits before it stops trying to join a
cluster during automatic cluster startup. All nodes wait this amount
of time for other nodes to begin startup before the cluster completes
the operation. Base the value on the slowest boot time in the cluster:
enter a value equal to the boot time of the slowest-booting node,
minus the boot time of the fastest-booting node, plus 600 seconds
(ten minutes).

Default is 600,000,000 microseconds (600 seconds, or ten minutes).

Can be changed while the cluster is running.
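
The AUTO_START_TIMEOUT guideline above is simple arithmetic, and a short Python sketch can make it concrete. The function name and the boot times used here are hypothetical and are not part of Serviceguard. The sketch also assumes the value is entered in microseconds, since that is how the default is expressed.

```python
US_PER_SECOND = 1_000_000
MARGIN_SECONDS = 600  # ten-minute margin from the guideline


def auto_start_timeout_us(slowest_boot_s: int, fastest_boot_s: int) -> int:
    """Suggested AUTO_START_TIMEOUT in microseconds (hypothetical helper).

    Applies: slowest boot time - fastest boot time + 600 seconds.
    Assumes the parameter is expressed in microseconds, like its default.
    """
    if slowest_boot_s < 0 or fastest_boot_s < 0:
        raise ValueError("boot times must be non-negative")
    if fastest_boot_s > slowest_boot_s:
        raise ValueError("fastest boot time cannot exceed slowest boot time")
    return (slowest_boot_s - fastest_boot_s + MARGIN_SECONDS) * US_PER_SECOND


# Hypothetical measurements: slowest node boots in 420 s, fastest in 180 s.
print("AUTO_START_TIMEOUT", auto_start_timeout_us(420, 180))
```

With these example boot times the result is 840,000,000 microseconds (840 seconds). When all nodes boot equally fast, the formula reduces to the 600-second margin, which matches the default.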

NETWORK_POLLING_INTERVAL

The interval at which the networks configured for Serviceguard are
checked. In the cluster configuration file, this parameter is
NETWORK_POLLING_INTERVAL.

The default in the configuration file is 2,000,000 microseconds
(2 seconds). Thus every 2 seconds, the network manager polls each
network interface to make sure it can still send and receive
information. Using the default is highly recommended. Changing this
value can affect how quickly a network failure is detected.

The minimum value is 1,000,000 microseconds (1 second). The maximum
value recommended is 15 seconds, and the maximum value supported is
30 seconds.

Can be changed while the cluster is running.
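
The limits stated for NETWORK_POLLING_INTERVAL can be checked before a value is placed in the cluster configuration file. The following Python sketch only encodes the minimum, recommended maximum, and supported maximum given above. The function name and the returned labels are hypothetical, and this is not a Serviceguard tool.

```python
MIN_US = 1_000_000               # minimum value: 1 second
DEFAULT_US = 2_000_000           # default: 2 seconds
RECOMMENDED_MAX_US = 15_000_000  # maximum recommended: 15 seconds
SUPPORTED_MAX_US = 30_000_000    # maximum supported: 30 seconds


def check_polling_interval(value_us: int) -> str:
    """Classify a candidate NETWORK_POLLING_INTERVAL (hypothetical helper)."""
    if value_us < MIN_US:
        return "below-minimum"
    if value_us > SUPPORTED_MAX_US:
        return "above-supported"
    if value_us > RECOMMENDED_MAX_US:
        return "above-recommended"
    return "ok"


# Classify the default and a few illustrative candidate values.
for candidate in (DEFAULT_US, 500_000, 20_000_000, 45_000_000):
    print(candidate, check_polling_interval(candidate))
```

A value labeled "above-recommended" is still within the supported range, but a longer interval delays detection of a network failure.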