HP Serviceguard A.11.20- Managing Serviceguard Twentieth Edition, August 2011

largest number says that cmcld was unable to run for the last 1.6 seconds, increase
MEMBER_TIMEOUT to more than 16 seconds.
2. This node is at risk of being evicted from the running cluster.
Increase MEMBER_TIMEOUT.
This means that the hang was long enough for other nodes to have noticed the delay in
receiving heartbeats and marked the node “unhealthy.” This is the beginning of the process
of evicting the node from the cluster; see “What Happens when a Node Times Out” (page 88)
for an explanation of that process.
What to do: In isolation, this could indicate a transitory problem, as described in the previous
section. If you have diagnosed and fixed such a problem and are confident that it won't recur,
you need take no further action; otherwise you should increase MEMBER_TIMEOUT as instructed
in item 1.
3. Member node_name seems unhealthy, not receiving heartbeats from it.
This is the message that indicates that the node has been found “unhealthy” as described in
the previous item.
What to do: See item 2.
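The sizing guidance in item 1 can be sketched as a small shell function. This is an illustration only, under two assumptions: that the relevant log lines contain the phrase "unable to run for the last N seconds" (as quoted in item 1), and that the factor of 10 implied by the 1.6-second to 16-second example applies generally. The function name suggest_member_timeout is hypothetical.

```shell
# Sketch: suggest a lower bound for MEMBER_TIMEOUT from cmcld delay warnings.
# Assumptions: the 10x factor is inferred from the 1.6 s -> 16 s example,
# and each warning contains "unable to run for the last N seconds".
suggest_member_timeout() {
    # Reads log lines on stdin; prints the largest delay and 10 times that value.
    awk '
        /unable to run for the last [0-9.]+ seconds/ {
            for (i = 1; i <= NF; i++)
                if ($i == "last" && ($(i+1) + 0) > max) max = $(i+1) + 0
        }
        END {
            if (max > 0)
                printf "largest delay: %.1f s; set MEMBER_TIMEOUT above %.1f s\n", max, max * 10
            else
                print "no cmcld delay warnings found"
        }'
}
```

A possible invocation, assuming the messages are in the system log at /var/adm/syslog/syslog.log, is: suggest_member_timeout < /var/adm/syslog/syslog.log. The result is in seconds; check the units the cluster configuration file expects before editing MEMBER_TIMEOUT.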
For more information, including requirements and recommendations, see the MEMBER_TIMEOUT
discussion under “Cluster Configuration Parameters” (page 109). See also “Modifying the
MEMBER_TIMEOUT Parameter” (page 192) and “Cluster Daemon: cmcld” (page 41).
System Administration Errors
There are a number of errors you can make when configuring Serviceguard that will not show up
when you start the cluster. Your cluster can be running, and everything appears to be fine, until
there is a hardware or software failure and control of your packages is not transferred to another
node as you would have expected.
These problems are caused specifically by mistakes in the cluster configuration file and package
configuration scripts. Examples include:
Volume groups not defined on adoptive node.
Mount point does not exist on adoptive node.
Network errors on adoptive node (configuration errors).
User information not correct on adoptive node.
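One of the conditions above, a missing mount point, can be checked ahead of a failover by running a short test on each adoptive node. The following is a minimal sketch; the function name check_mount_points is hypothetical, and the directories you pass to it should be taken from your own package configuration.

```shell
# Sketch: report which of the given mount-point directories are missing
# on this node. Returns non-zero if at least one is missing.
check_mount_points() {
    missing=0
    for mp in "$@"; do
        if [ -d "$mp" ]; then
            echo "ok: $mp"
        else
            echo "MISSING: $mp"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_mount_points /mnt/pkg1 /mnt/pkg1/data (both paths are placeholders) prints one line per directory. It only confirms that each directory exists; it does not verify volume groups, network settings, or user information.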
You can use the following commands to check the status of your disks:
bdf - to see if your package's volume group is mounted.
vgdisplay -v - to see if all volumes are present.
lvdisplay -v - to see if the mirrors are synchronized.
strings /etc/lvmtab - to ensure that the configuration is correct.
ioscan -fnC disk - to see physical disks.
diskinfo -v /dev/rdsk/cxtydz - to display information about a disk.
lssf /dev/d*/* - to check logical volumes and paths.
vxdg list - to list Veritas disk groups.
vxprint - to show Veritas disk group details.
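Several of the commands above can be run in one pass when you are comparing nodes. The sketch below is one possible wrapper, not a Serviceguard tool: run_if_present and disk_status_report are hypothetical names, and the wrapper skips any command that is not installed (for example, the Veritas commands on a node without Veritas Volume Manager). Commands that need a device-specific argument (lvdisplay, diskinfo, lssf) are left out.

```shell
# Sketch: run a status command only if it exists on this node.
run_if_present() {
    # $1 is the command name; any remaining arguments are passed to it.
    if command -v "$1" >/dev/null 2>&1; then
        echo "=== $*"
        "$@"
    else
        echo "=== $1: not installed on this node, skipped"
    fi
}

# Runs the argument-free checks from the list above, in the same order.
disk_status_report() {
    run_if_present bdf
    run_if_present vgdisplay -v
    run_if_present strings /etc/lvmtab
    run_if_present ioscan -fnC disk
    run_if_present vxdg list
    run_if_present vxprint
}
```

Running disk_status_report on the primary node and on each adoptive node, then comparing the output, is one way to spot a volume group or disk that is defined on one node but not the other.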
Package Control Script Hangs or Failures
When a RUN_SCRIPT_TIMEOUT or HALT_SCRIPT_TIMEOUT value is set, and the control script
hangs, causing the timeout to be exceeded, Serviceguard kills the script and marks the package