HP Serviceguard Quorum Server Version A.02.00 Release Notes, Fifth Edition, February 2007

Configuring Serviceguard to Use the Quorum Server
Considerations for Setting Quorum Server Polling Interval
NOTE: This discussion relates to Serviceguard versions 11.16 and later.
Serviceguard probes the Quorum Server at intervals determined by the
QS_POLLING_INTERVAL parameter in the cluster configuration file. The
default value for QS_POLLING_INTERVAL is 5 minutes and the minimum
value is 10 seconds.
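
As a rough illustration of the limits above, the following Python sketch builds
a QS_POLLING_INTERVAL line and rejects values below the 10-second minimum. The
helper name qs_polling_line is hypothetical, and the assumption that the
cluster configuration file expects the value in microseconds is not stated in
these notes; check the comments in your own configuration template before
relying on it.

```python
# Minimal sketch, not an HP-supplied tool.
# ASSUMPTION: the configuration file takes the interval in microseconds.
# The helper name and the units_per_second parameter are hypothetical.

MIN_POLLING_SECONDS = 10          # minimum value stated in these notes
DEFAULT_POLLING_SECONDS = 5 * 60  # default value stated in these notes


def qs_polling_line(seconds=DEFAULT_POLLING_SECONDS, units_per_second=1_000_000):
    """Return a QS_POLLING_INTERVAL line for an interval given in seconds."""
    if seconds < MIN_POLLING_SECONDS:
        raise ValueError(
            f"polling interval must be at least {MIN_POLLING_SECONDS} seconds"
        )
    # Round to a whole number of units so the line never contains a fraction.
    return f"QS_POLLING_INTERVAL {int(round(seconds * units_per_second))}"


if __name__ == "__main__":
    print(qs_polling_line())    # default interval
    print(qs_polling_line(60))  # one-minute interval
```

Under the microsecond assumption, the default of 5 minutes corresponds to a
value of 300000000.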
If the quorum server process goes down while its node is still up, the
Serviceguard cluster nodes can detect that the process has halted.
Serviceguard will then try to reconnect to the quorum server every
10 seconds until the quorum server is back up and the connection
succeeds. If the quorum server is needed as a tie-breaker during this
downtime, the cluster will halt.
However, if the quorum server's node goes down, Serviceguard cannot
immediately detect the loss of connection to the process. Serviceguard
will continue to poll at the configured interval, and it will not discover
that the quorum server connection is down until the next poll. If a
cluster reformation starts before the next poll has occurred,
Serviceguard assumes the Quorum Server is down. Because it requires
the Quorum Server as a tie-breaker, it will halt the cluster. (Even if the
Quorum Server comes back up before or during reformation,
Serviceguard will not know that it has until the next poll.)
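
The timing described above can be sketched with a simplified model. The
function names below are hypothetical, and the model assumes that polls occur
at exact multiples of the polling interval and that a failure at the instant
of a poll is not seen until the following poll; actual Serviceguard timing may
differ.

```python
# Simplified timing model, not Serviceguard code.
# ASSUMPTIONS: polls happen at first_poll, first_poll + interval, and so on;
# a failure exactly at a poll instant is only seen by the following poll.
# All times are in seconds. Function names are hypothetical.


def detection_delay(failure_time, polling_interval, first_poll=0):
    """Seconds from a quorum server node failure to the next scheduled poll."""
    if polling_interval <= 0:
        raise ValueError("polling_interval must be positive")
    elapsed_since_poll = (failure_time - first_poll) % polling_interval
    return polling_interval - elapsed_since_poll


def reformation_at_risk(failure_time, reformation_time, polling_interval):
    """True if a reformation starts after the failure but before the next poll."""
    next_poll = failure_time + detection_delay(failure_time, polling_interval)
    return failure_time <= reformation_time < next_poll


if __name__ == "__main__":
    # Node fails 30 seconds after a poll, with the default 300-second interval.
    print(detection_delay(30, 300))
    # A reformation 70 seconds later falls inside the undetected window.
    print(reformation_at_risk(30, 100, 300))
```

In this model the undetected window can be as long as one full polling
interval, which is why a shorter interval narrows the exposure.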
The minimum value for the polling interval is 10 seconds. Reducing
QS_POLLING_INTERVAL lets Serviceguard detect Quorum Server failures
sooner, but it also increases the load on the Quorum Server. If you set a
low interval, you may have to reduce the number of clusters or nodes
using the Quorum Server to compensate for the added load. This is especially
important if you are using Serviceguard Extension for Faster Failover
(SGeFF) because in that case the lock acquisition time value is also set