Optimizing Failover Time in a Serviceguard Environment, June 2007

Quorum server considerations
If you use a quorum server, make sure the network between it and the cluster nodes is highly available and
reliable. If the nodes experience delays reaching the quorum server, you can configure a
QS_TIMEOUT_EXTENSION, but the extension time adds directly to the lock acquisition time.
Serviceguard calculates the time for the actual lock acquisition. You cannot change it directly, but (in
Serviceguard releases earlier than A.11.18) you can configure a faster lock device to reduce the lock
acquisition time.
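Because the extension is purely additive, its cost is easy to reason about. The sketch below assumes times are kept in microseconds, as in the cluster configuration file; the function names and the "calculated" base value are hypothetical placeholders, since Serviceguard computes the real base value itself and it cannot be set directly.

```python
MICROSECONDS_PER_SECOND = 1_000_000


def lock_acquisition_us(calculated_us: int, qs_timeout_extension_us: int = 0) -> int:
    """Estimated lock acquisition time with a QS_TIMEOUT_EXTENSION applied.

    calculated_us stands in for the value Serviceguard computes itself;
    it is a hypothetical placeholder here, not a documented formula.
    """
    if calculated_us < 0 or qs_timeout_extension_us < 0:
        raise ValueError("times must be non-negative")
    # The extension is added one-for-one to the lock acquisition time.
    return calculated_us + qs_timeout_extension_us


def to_seconds(us: int) -> float:
    """Convert microseconds to seconds for display."""
    return us / MICROSECONDS_PER_SECOND


base = 10_000_000       # hypothetical calculated lock acquisition time (10 s)
extension = 2_000_000   # hypothetical QS_TIMEOUT_EXTENSION (2 s)
print(to_seconds(lock_acquisition_us(base)))             # 10.0
print(to_seconds(lock_acquisition_us(base, extension)))  # 12.0
```

Every second of extension is a second added to failover, so it is worth fixing an unreliable path to the quorum server before reaching for a larger extension.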
Heartbeat subnet
If you have one heartbeat configured, with the required standby LAN, and the NODE_TIMEOUT value
is less than 4 seconds, you can reduce failover time by configuring multiple heartbeats instead. Because
heartbeat messages are sent over all heartbeat subnets concurrently, there is no wait for network
switching if a primary LAN fails. To avoid delays caused by busy networks, configure at least one private,
dedicated network for heartbeat.
Keep in mind that certain configurations, such as those using VERITAS Cluster Volume Manager
(CVM) 3.5 or earlier, do not allow multiple heartbeats.
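The reasoning behind multiple heartbeats can be sketched with a deliberately simplified model: a single heartbeat subnet stops delivering heartbeats until its standby LAN takes over, while with several subnets any surviving one keeps heartbeats flowing. The class, function names, and the switch time below are hypothetical illustrations, not Serviceguard internals, and the model ignores details such as how the LAN failure is detected.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HeartbeatPath:
    """One heartbeat subnet as seen by a node (simplified, hypothetical model)."""
    name: str
    failed: bool = False
    # Hypothetical seconds needed to switch to a standby LAN after a failure;
    # None means this path has no standby to fall back on.
    switch_time_s: Optional[float] = None


def heartbeat_gap_s(paths: List[HeartbeatPath], heartbeat_interval_s: float) -> float:
    """Gap between heartbeats a peer would see after the failure."""
    gaps = []
    for path in paths:
        if not path.failed:
            # A healthy path keeps delivering heartbeats on schedule.
            gaps.append(heartbeat_interval_s)
        elif path.switch_time_s is not None:
            # A failed path resumes only after its standby LAN takes over.
            gaps.append(heartbeat_interval_s + path.switch_time_s)
    if not gaps:
        return float("inf")  # no path can deliver a heartbeat
    # Heartbeats go out on every path at once, so the fastest path wins.
    return min(gaps)


def exceeds_node_timeout(paths: List[HeartbeatPath], heartbeat_interval_s: float,
                         node_timeout_s: float) -> bool:
    """True if the gap reaches NODE_TIMEOUT and could trigger cluster re-formation."""
    return heartbeat_gap_s(paths, heartbeat_interval_s) >= node_timeout_s


# One heartbeat subnet whose primary LAN fails and must switch to its standby.
single = [HeartbeatPath("hb_primary", failed=True, switch_time_s=3.0)]
# Two heartbeat subnets; one fails, the other keeps carrying heartbeats.
multiple = [HeartbeatPath("hb_a", failed=True), HeartbeatPath("hb_b")]

print(exceeds_node_timeout(single, 1.0, 2.0))    # True
print(exceeds_node_timeout(multiple, 1.0, 2.0))  # False
```

With a short NODE_TIMEOUT, the time spent switching to the standby LAN is what pushes the single-heartbeat case over the limit; the concurrent second subnet removes that wait entirely.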
Network failure detection
The NETWORK_POLLING_INTERVAL specifies how often Serviceguard checks its configured
networks. In general, the default works best.
Frequent polling lets Serviceguard respond more quickly to a LAN failure. A quick response can
reduce package failover time simply because failover starts earlier. However, if the polling interval
is too short, the extra polling traffic can make the network and the system busier.
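The trade-off can be illustrated with rough arithmetic: a failure that occurs just after a poll goes unnoticed for up to one polling interval, while the number of polls per minute grows as the interval shrinks. This is a sketch under stated assumptions; `polls_to_confirm` is a hypothetical parameter for how many consecutive failed polls are needed before a LAN is declared down, not a documented Serviceguard setting.

```python
def polling_tradeoff(interval_s: float, polls_to_confirm: int = 1):
    """Return (worst-case detection delay in seconds, polls per minute).

    polls_to_confirm is a hypothetical parameter: the number of consecutive
    failed polls assumed before a LAN is declared down.
    """
    if interval_s <= 0 or polls_to_confirm < 1:
        raise ValueError("interval must be positive and polls_to_confirm >= 1")
    # A failure just after a poll can go unseen for one full interval per poll needed.
    detection_s = interval_s * polls_to_confirm
    # Polling frequency serves as a rough proxy for the extra network and system load.
    polls_per_minute = 60.0 / interval_s
    return detection_s, polls_per_minute


for interval in (0.5, 1.0, 2.0, 4.0):
    delay, load = polling_tradeoff(interval)
    print(f"interval={interval:>4}s  detection<={delay:>4}s  polls/min={load:>5.0f}")
```

Halving the interval halves the worst-case detection delay but doubles the polling rate, which is why a shorter interval is not automatically better than the default.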