Managing Serviceguard 13th Edition, February 2007

Troubleshooting Your Cluster
Solving Problems
In the event of a TOC (Transfer of Control), a system dump is performed
on the failed node and numerous messages are displayed on the console.
You can use the following commands to check the status of your network
and subnets:
• netstat -in - to display LAN status and check to see if the package IP is stacked on the LAN card.
• lanscan - to see if the LAN is on the primary interface or has switched to the standby interface.
• arp -a - to check the arp tables.
• lanadmin - to display, test, and reset the LAN cards.
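The non-interactive checks in this list can be collected into one small script. The following is a minimal sketch, not part of Serviceguard: run_check and network_checks are hypothetical helper names, and lanadmin is left out because it is normally used interactively. Commands that are not installed on the host (lanscan is specific to HP-UX) are reported and skipped.

```shell
#!/bin/sh
# Sketch only: run_check and network_checks are hypothetical helpers,
# not Serviceguard commands.

# Run a command with its arguments if it is installed; otherwise report it.
run_check() {
    cmd=$1
    if command -v "$cmd" >/dev/null 2>&1; then
        echo "== $*"
        "$@"
    else
        echo "== $cmd: not found, skipping"
    fi
}

# Run the status checks from the list above, in order.
network_checks() {
    run_check netstat -in   # LAN status; look for the package IP on the LAN card
    run_check lanscan       # primary or standby interface (HP-UX only)
    run_check arp -a        # arp tables
}
```

Calling network_checks on a cluster node prints the output of each command under a header line, which makes it easier to save the results alongside the log files.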
Because every cluster is unique, there are no cookbook solutions to all
possible problems. But if you apply these checks and commands and
work your way through the log files, you should be able to identify
and solve most problems.
Troubleshooting Quorum Server
Authorization File Problems
The following kind of message in a Serviceguard node’s syslog file or in
the output of cmviewcl -v may indicate an authorization problem:
Access denied to quorum server 192.6.7.4
The reason may be that you have not updated the authorization file.
Verify that the node is included in the file, and then, on the quorum
server system, run
/usr/lbin/qs -update
to re-read the quorum server authorization file.
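To find out which quorum servers a node has been refused by, you can search the node's syslog file for the message shown above. The following is a minimal sketch: denied_quorum_servers is a hypothetical helper name, and the default path /var/adm/syslog/syslog.log is an assumption about where syslog writes on the node. Pass a different file as the first argument if your configuration differs.

```shell
#!/bin/sh
# Sketch only: denied_quorum_servers is a hypothetical helper.
# It prints each quorum server address named in an
# "Access denied to quorum server" message, once per address.
denied_quorum_servers() {
    log=${1:-/var/adm/syslog/syslog.log}   # assumed default syslog path
    sed -n 's/.*Access denied to quorum server \([0-9.][0-9.]*\).*/\1/p' "$log" |
        sort -u
}
```

Each address printed is a quorum server whose authorization file is worth checking for the node's name.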
Timeout Problems
The following kinds of messages in a Serviceguard node’s syslog file may
indicate timeout problems:
Unable to set client version at quorum server 192.6.7.2:reply timed out
Probe of quorum server 192.6.7.2 timed out
These messages could indicate an intermittent network problem, or the
default quorum server timeout may not be sufficient. You can set the
QS_TIMEOUT_EXTENSION parameter in the cluster configuration file to
increase the timeout, or you can increase the heartbeat or node timeout
value.
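Before changing any timeout, it can help to see how often these messages occur and for which quorum server. The following is a minimal sketch: qs_timeout_counts is a hypothetical helper name, and the default path /var/adm/syslog/syslog.log is an assumption about where syslog writes on the node. It counts the two kinds of timeout message shown above, per quorum server address.

```shell
#!/bin/sh
# Sketch only: qs_timeout_counts is a hypothetical helper.
# It prints "<count> <address>" for each quorum server named in a
# timeout message of either kind shown above.
qs_timeout_counts() {
    log=${1:-/var/adm/syslog/syslog.log}   # assumed default syslog path
    awk '
        /Unable to set client version at quorum server/ ||
        /Probe of quorum server/ {
            # "quorum server " is 14 characters; the address follows it.
            if (match($0, /quorum server [0-9.]+/)) {
                addr = substr($0, RSTART + 14, RLENGTH - 14)
                sub(/\.$/, "", addr)   # drop a trailing period, if any
                count[addr]++
            }
        }
        END { for (a in count) print count[a], a }
    ' "$log" | sort
}
```

A handful of messages spread over a long period points toward an intermittent network, while a steady stream suggests that the timeout itself may be too short.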