HP Serviceguard Quorum Server Version A.04.00 Release Notes, revised August 2009
1. Comment out the Quorum Server entry in /etc/inittab and run the following command:
/sbin/init q
2. Uninstall the existing Quorum Server. For example:
rpm -e qs-A.02.04
CAUTION: This command may remove the file /var/log/qs/qs.log. If this is your log
file, you may want to save it before running this command.
3. Install the version of Quorum Server A.04.00 appropriate to your distribution and hardware.
For example:
rpm -ihv qs-A.04.00.00-0.sles10.i386.rpm
4. Uncomment the entry you commented out in /etc/inittab.
5. Restart the Quorum Server:
/sbin/init q
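The caution in step 2 can be handled by copying the log aside before running the uninstall.
The following is a minimal sketch, not part of the product: the function name backup_qs_log,
the /var/tmp destination, and the qs.log.saved file name are assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch: save the Quorum Server log before uninstalling the old package.
# backup_qs_log, /var/tmp and qs.log.saved are illustrative assumptions.

backup_qs_log() {
    log_file="${1:-/var/log/qs/qs.log}"   # log path named in the caution above
    dest_dir="${2:-/var/tmp}"             # assumed destination outside /var/log/qs
    if [ -f "$log_file" ]; then
        # -p preserves the timestamps and mode of the original log
        cp -p "$log_file" "$dest_dir/qs.log.saved" || return 1
        echo "saved $log_file to $dest_dir/qs.log.saved"
    else
        echo "no log file at $log_file; nothing to save"
    fi
}
```

With the function defined, running backup_qs_log before the rpm -e command in step 2 keeps a copy
of the log even if the uninstall removes the original.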
Configuring Serviceguard to Use the Quorum Server
About the QS Polling Interval and Timeout Extension
Serviceguard probes the Quorum Server at intervals determined by the QS_POLLING_INTERVAL
parameter in the cluster configuration file. The default value for QS_POLLING_INTERVAL is 5
minutes and the minimum value is 10 seconds.
If the Quorum Server process goes down while its node is still up, the Serviceguard cluster nodes
can detect that the process has halted. Serviceguard will try to reconnect to the Quorum
Server every 10 seconds until the Quorum Server is back up and the connection is successful. If
the Quorum Server is needed as a tie-breaker during this downtime, the cluster will halt.
However, if the Quorum Server’s node goes down, Serviceguard cannot immediately detect the loss
of connection to the process. Serviceguard will continue to poll at the configured interval,
and will not discover that the Quorum Server connection is down until the next poll; with the
default interval, that can be up to 5 minutes later.
If a cluster reformation starts before the next poll has occurred, Serviceguard assumes the
Quorum Server is down. Because Serviceguard requires the Quorum Server as a tie-breaker, it will
halt the cluster. (Even if the Quorum Server comes back up before or during reformation,
Serviceguard will not know that it has until the next poll.)
Reducing QS_POLLING_INTERVAL toward its 10-second minimum means Serviceguard will detect
Quorum Server failures sooner, but it will also increase the load on the Quorum Server. If you
set a short interval, you may have to reduce the number of clusters or nodes using the Quorum
Server to reduce the load. Test very low settings carefully to fine-tune all timing parameters,
and do the tests in an environment that imitates the actual production environment as closely
as possible.
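When you try out shorter intervals, it can help to check each candidate value against the
documented 10-second minimum before editing the cluster configuration file. The sketch below
does that and prints the value in microseconds. It assumes that QS_POLLING_INTERVAL, like
QS_TIMEOUT_EXTENSION, is written in microseconds; confirm the unit against the comments in your
cluster configuration template. The function name qs_polling_interval_usec is hypothetical.

```shell
#!/bin/sh
# Sketch: validate a proposed polling interval and print it in microseconds.
# Assumption: QS_POLLING_INTERVAL is expressed in microseconds.

QS_MIN_POLL_SECONDS=10   # documented minimum polling interval

qs_polling_interval_usec() {
    seconds="$1"
    # Reject empty or non-numeric input
    case "$seconds" in
        ''|*[!0-9]*)
            echo "interval must be a whole number of seconds" >&2
            return 2
            ;;
    esac
    if [ "$seconds" -lt "$QS_MIN_POLL_SECONDS" ]; then
        echo "interval ${seconds}s is below the ${QS_MIN_POLL_SECONDS}s minimum" >&2
        return 1
    fi
    echo $((seconds * 1000000))
}
```

For example, qs_polling_interval_usec 300 prints 300000000, which corresponds to the 5-minute
default, and qs_polling_interval_usec 5 fails because 5 seconds is below the minimum.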
You can use the optional QS_TIMEOUT_EXTENSION to increase the time interval (in
microseconds) after which the current connection (or attempt to connect) to the quorum server
is deemed to have failed; see “Network Recommendations” (page 9) and “Setting Quorum
Server Parameters in the Cluster Configuration File” (page 14).
Using Alternate Subnets
Some versions of Serviceguard (see “Compatibility with Serviceguard Versions” (page 10))
support new functionality in the Quorum Server that allows you to configure more than one
subnet on which communication between the Quorum Server and the cluster nodes can take
place.
In this case, you configure a primary subnet (indicated by the QS_HOST parameter in the cluster
configuration file) and a second subnet (indicated by QS_ADDR in the cluster configuration file).
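As an illustration of how the two parameters sit together, the sketch below prints the
corresponding entries for a cluster configuration file. The keyword-then-value layout, the
function name print_qs_entries, and the host names are assumptions for illustration only;
substitute the names or addresses the Quorum Server uses on each subnet.

```shell
#!/bin/sh
# Sketch: print Quorum Server entries for a cluster configuration file.
# Host names are hypothetical placeholders.

print_qs_entries() {
    primary="$1"     # Quorum Server name or address on the primary subnet
    alternate="$2"   # Quorum Server name or address on the second subnet (optional)
    echo "QS_HOST $primary"
    # QS_ADDR is emitted only when an alternate subnet is configured
    if [ -n "$alternate" ]; then
        echo "QS_ADDR $alternate"
    fi
}

print_qs_entries qs-primary.example.com qs-alt.example.com
# prints:
# QS_HOST qs-primary.example.com
# QS_ADDR qs-alt.example.com
```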