Managing HP Serviceguard A.12.00.00 for Linux, June 2014
The default Serviceguard control scripts are designed to take the straightforward steps needed to
get an application running or stopped. If the package administrator specifies a time limit within
which these steps must occur and that limit is subsequently exceeded for any reason, Serviceguard
takes the conservative view that the control script logic must be either hung or defective in
some way. At that point the control script cannot be trusted to perform cleanup actions correctly,
so the script is terminated and the package administrator is given the opportunity to assess what
cleanup steps must be taken.
If you want the package to switch automatically in the event of a control script timeout, set the
node_fail_fast_enabled parameter (page 186) to YES. Serviceguard will then reboot the node
where the control script timed out, which effectively cleans up any side effects of the package’s
run or halt attempt. The package will then be automatically restarted on any available alternate
node for which it is configured.
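To confirm which behavior a package is configured for, you can read the parameter back from the package configuration file. The following is a minimal sketch, not a Serviceguard tool: pkg_param is a hypothetical helper, and it assumes the file holds one whitespace-separated "name value" pair per line, with comment lines starting with #.

```shell
# pkg_param NAME: print the value assigned to NAME in the package
# configuration text read from standard input. Hypothetical helper;
# assumes "name value" lines and "#" comments. The last assignment wins.
pkg_param() {
    awk -v name="$1" '
        $1 ~ /^#/ { next }                           # skip comment lines
        tolower($1) == tolower(name) { value = $2 }  # case-insensitive name match
        END { if (value != "") print value }         # print nothing if not set
    '
}
```

For example, pkg_param node_fail_fast_enabled < pkg1.conf prints the configured value, where pkg1.conf stands for your own package configuration file. If the parameter is absent, the function prints nothing.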
10.8.6 Node and Network Failures
These failures cause Serviceguard to transfer control of a package to another node. This is the
normal action of Serviceguard, but you have to be able to recognize when a transfer has taken
place and decide whether to leave the cluster in its current condition or restore it to its
original condition.
Node failures can be caused by the following conditions:
• Reboot
• Kernel oops
• Hangs
• Power failures
You can use the following commands to check the status of your network and subnets:
• ifconfig - to display LAN status and check whether the package IP address is stacked on the
LAN card.
• arp -a - to check the ARP tables.
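The ifconfig check can be scripted when you need to repeat it on several nodes. The following is a minimal sketch under stated assumptions: has_ip is a hypothetical helper, 192.0.2.10 is a placeholder address, and the pattern assumes the address appears after "inet addr:" (older net-tools output) or "inet " (newer output).

```shell
# has_ip ADDRESS: succeed if ADDRESS is listed as an IPv4 address in the
# ifconfig-style text read from standard input. Hypothetical helper.
has_ip() {
    # Escape the dots so they match literally in the regular expression.
    escaped=$(printf '%s' "$1" | sed 's/\./\\./g')
    # Accept "inet addr:A.B.C.D" and "inet A.B.C.D"; the trailing group
    # prevents 192.0.2.1 from matching 192.0.2.10.
    grep -Eq "inet (addr:)?${escaped}([^0-9.]|\$)"
}
```

A typical use is ifconfig | has_ip 192.0.2.10 && echo "package IP is up on this node", substituting the relocatable IP address of your own package. Broadcast and netmask fields are deliberately not matched.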
Because every cluster is unique, there is no cookbook solution for every possible problem. But if
you apply these checks and commands and work your way through the log files, you should be able
to identify and solve most problems.
10.8.7 Troubleshooting the Quorum Server
NOTE: See the HP Serviceguard Quorum Server Version A.04.00 Release Notes for information
about configuring the Quorum Server. Do not proceed without reading the Release Notes for your
version.
10.8.7.1 Authorization File Problems
The following kind of message in a Serviceguard node’s syslog file or in the output of cmviewcl
-v may indicate an authorization problem:
Access denied to quorum server 192.6.7.4
The reason may be that you have not updated the authorization file. Verify that the node is included
in the file, and try using /usr/lbin/qs -update to re-read the quorum server authorization
file.
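When several quorum servers or long log files are involved, it can help to extract the affected addresses from the log. The following is a minimal sketch: qs_denied_servers is a hypothetical helper that relies only on the message text shown above and makes no assumption about the rest of the log line.

```shell
# qs_denied_servers: read log text on standard input and print each quorum
# server address that appears in an "Access denied to quorum server"
# message, once per address. Hypothetical helper.
qs_denied_servers() {
    # Capture the dotted address after the message text; the pattern ends
    # on a digit so a trailing period is not included.
    sed -n 's/.*Access denied to quorum server \([0-9][0-9.]*[0-9]\).*/\1/p' | sort -u
}
```

For example, qs_denied_servers < /var/log/messages lists the affected quorum servers. The path /var/log/messages is an assumption; use the syslog location configured on your nodes.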
10.8.7.2 Timeout Problems
The following kinds of message in a Serviceguard node’s syslog file may indicate timeout
problems: