Managing Serviceguard A.11.20, March 2013
NOTE:
• For most clusters that use an LVM cluster lock or lock
LUN, a minimum MEMBER_TIMEOUT of 14 seconds is
appropriate.
• For most clusters that use a MEMBER_TIMEOUT value
lower than 14 seconds, a quorum server is more
appropriate than a lock disk or lock LUN.
The cluster will fail if the time it takes to acquire the disk
lock exceeds 0.2 times the MEMBER_TIMEOUT. This
means that if you use a disk-based quorum device (lock
disk or lock LUN), you must be certain that the nodes
in the cluster, the connection to the disk, and the disk
itself can respond quickly enough to perform 10 disk
writes within 0.2 times the MEMBER_TIMEOUT.
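The timing constraint above can be worked through numerically. The
following Python sketch is illustrative only and is not part of
Serviceguard: the function names are hypothetical, and the per-write
figure assumes the 10 writes share the window evenly, which the text
above does not state.

```python
# Illustrative only: these helper names are hypothetical, not Serviceguard APIs.

LOCK_WINDOW_DIVISOR = 5   # 0.2 times MEMBER_TIMEOUT equals MEMBER_TIMEOUT / 5
DISK_WRITES = 10          # writes that must complete within the window


def lock_window_us(member_timeout_us: int) -> int:
    """Time allowed to acquire the disk lock, in microseconds."""
    return member_timeout_us // LOCK_WINDOW_DIVISOR


def per_write_budget_us(member_timeout_us: int) -> int:
    """Average time per write, assuming an even split (an assumption)."""
    return lock_window_us(member_timeout_us) // DISK_WRITES


if __name__ == "__main__":
    for seconds in (3, 14):
        timeout_us = seconds * 1_000_000
        print(seconds, lock_window_us(timeout_us), per_write_budget_us(timeout_us))
```

Under these assumptions, a MEMBER_TIMEOUT of 14 seconds leaves 2.8
seconds to acquire the lock, an average of 0.28 seconds per write. At
3 seconds the window shrinks to 0.6 seconds, or 0.06 seconds per write.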
With the lowest supported value of 3 seconds, a failover
time of 4 to 5 seconds can be achieved.
NOTE: The failover estimates provided here apply to the
Serviceguard component of failover; that is, the package is
expected to be up and running on the adoptive node in this
time, but the application that the package runs may take
more time to start.
Keep the following guidelines in mind when deciding how
to set the value of MEMBER_TIMEOUT.
Guidelines: You need to decide whether it's more important
for your installation to have fewer (but slower) cluster
re-formations, or faster (but possibly more frequent)
re-formations:
• To ensure the fastest cluster re-formations, use the
minimum value applicable to your cluster. But keep in
mind that this setting will lead to a cluster re-formation,
and to the node being removed from the cluster and
rebooted, if a system hang or network load spike
prevents the node from sending a heartbeat signal
within the MEMBER_TIMEOUT value. More than one
node could be affected if, for example, a network event
such as a broadcast storm causes kernel interrupts to
be turned off on some or all nodes while the packets
are being processed, preventing the nodes from
sending and processing heartbeat messages.
See “Cluster Re-formations Caused by
MEMBER_TIMEOUT Being Set too Low” (page 340) for
troubleshooting information.
• For fewer re-formations, use a setting in the range of
10 to 25 seconds (10,000,000 to 25,000,000
microseconds), keeping in mind that a value larger than
the default will lead to slower re-formations than the
default. A value in this range is appropriate for most
installations.
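The guidelines above give values both in seconds and in microseconds,
so a small conversion helper can reduce arithmetic slips. This is a
minimal sketch with hypothetical names, not a Serviceguard tool; the
bounds are the ones quoted in this section (a lowest supported value of
3 seconds and a 10 to 25 second range for fewer re-formations).

```python
# Illustrative only: names are hypothetical, not Serviceguard APIs.

MICROSECONDS_PER_SECOND = 1_000_000

# Bounds quoted in this section, in seconds.
LOWEST_SUPPORTED_S = 3
FEWER_REFORMATIONS_RANGE_S = (10, 25)


def seconds_to_microseconds(seconds: int) -> int:
    """Convert whole seconds to microseconds."""
    return seconds * MICROSECONDS_PER_SECOND


def describe_member_timeout(member_timeout_us: int) -> str:
    """Classify a candidate value against the bounds quoted above."""
    low = seconds_to_microseconds(FEWER_REFORMATIONS_RANGE_S[0])
    high = seconds_to_microseconds(FEWER_REFORMATIONS_RANGE_S[1])
    if member_timeout_us < seconds_to_microseconds(LOWEST_SUPPORTED_S):
        return "below lowest supported value"
    if low <= member_timeout_us <= high:
        return "fewer re-formations range"
    return "outside the 10 to 25 second range"


if __name__ == "__main__":
    for seconds in (3, 14, 25):
        value_us = seconds_to_microseconds(seconds)
        print(seconds, value_us, describe_member_timeout(value_us))
```

The classification only restates the ranges given in the text; it does
not check whether a value suits a particular quorum device or network.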