HP-UX HB v13.00 Ch-15 - Serviceguard

HP-UX Handbook Rev 13.00 Page 81 (of 108)
Chapter 15 Serviceguard
October 29, 2013
which missed the heartbeat), voting for a new cluster coordinator, and reforming the cluster. The
new cluster is based on the number of nodes that respond during the reformation. Note that if the
node which missed the heartbeat is able to respond during the reformation, the reformation ends
with the same number of nodes in the cluster, and your packages are not affected.
Example: Node1 is missing heartbeats from Node2. It starts a reformation and races for the
cluster lock disk before Node2 comes back:
Aug 5 16:08:29 Node1 cmcld: Timed out node Node2. It may have failed.
Aug 5 16:08:29 Node1 cmcld: Attempting to form a new cluster
Aug 5 16:08:36 Node1 cmcld: Obtaining Cluster Lock
Aug 5 16:08:36 Node1 vmunix: SCSI: Reset requested from above -- lbolt:25093597, bus:0
Aug 5 16:08:37 Node1 vmunix: SCSI: Resetting SCSI -- lbolt:25093697, bus:0
Aug 5 16:08:37 Node1 vmunix: SCSI: Reset detected -- lbolt:25093697, bus:0
Aug 5 16:08:54 Node1 cmcld: Attempting to adjust cluster membership
Aug 5 16:08:56 Node1 cmcld: Enabling safety time protection
Aug 5 16:08:56 Node1 cmcld: Clearing Cluster Lock
Aug 5 16:08:57 Node1 cmcld: 2 nodes have formed a new cluster, sequence #7
Aug 5 16:08:57 Node1 cmcld: The new active cluster membership is: Node1(id=1), Node2(id=2)
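To see how long a reformation took, the cmcld messages can be pulled out of syslog.log and timed. The following is a minimal sketch, assuming the "Mon D HH:MM:SS host daemon: message" layout shown in the excerpt above; the function name reformation_timeline, the choice of "Timed out node" as the start and "formed a new cluster" as the end, and the supplied year (syslog omits it) are illustrative assumptions, not part of Serviceguard.

```python
import re
from datetime import datetime

# Assumed syslog layout, as in the excerpt above:
#   "Aug 5 16:08:29 Node1 cmcld: Timed out node Node2. It may have failed."
LINE_RE = re.compile(
    r"^(?P<mon>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<daemon>[\w.-]+):\s+(?P<msg>.*)$"
)


def reformation_timeline(lines, year=2013):
    """Hypothetical helper: time the first cmcld reformation found in `lines`.

    Returns (events, seconds):
      events  -- list of (datetime, message) for every cmcld line
      seconds -- time from the first "Timed out node" message to the first
                 "formed a new cluster" message, or None if either is missing.
    `year` is an assumption because syslog timestamps do not carry one.
    """
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("daemon") != "cmcld":
            continue  # skip vmunix messages and wrapped continuation lines
        ts = datetime.strptime(
            f"{year} {m.group('mon')} {m.group('day')} {m.group('time')}",
            "%Y %b %d %H:%M:%S",
        )
        events.append((ts, m.group("msg")))

    start = next((t for t, msg in events if msg.startswith("Timed out node")), None)
    end = next((t for t, msg in events if "formed a new cluster" in msg), None)
    if start is None or end is None:
        return events, None
    return events, (end - start).total_seconds()
```

Fed the excerpt above, the sketch reports 28 seconds between the heartbeat timeout (16:08:29) and the new cluster membership (16:08:57).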
The factory-default setting for the cluster parameter NODE_TIMEOUT, 2 seconds (2000000
microseconds), is too small for many configurations. The general recommendation is to set
NODE_TIMEOUT to between 5000000 and 8000000 microseconds (5 to 8 seconds).
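A quick way to check the value is to scan the cluster ASCII configuration file for the NODE_TIMEOUT line. The following is a minimal sketch, assuming one "PARAMETER value" pair per line with "#" starting a comment and the value given in microseconds; the function name check_node_timeout and its verdict strings are hypothetical.

```python
import re

# Recommended range from the text above, in microseconds (5 s to 8 s).
RECOMMENDED_MIN_US = 5_000_000
RECOMMENDED_MAX_US = 8_000_000


def check_node_timeout(config_text):
    """Hypothetical helper: return (value_us, verdict) for NODE_TIMEOUT.

    Assumes "NODE_TIMEOUT <microseconds>" on its own line; anything after
    "#" is treated as a comment. verdict is one of
    "missing", "below range", "above range", "ok".
    """
    value = None
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        m = re.match(r"^NODE_TIMEOUT\s+(\d+)$", line)
        if m:
            value = int(m.group(1))  # if repeated, the last occurrence wins
    if value is None:
        return None, "missing"
    if value < RECOMMENDED_MIN_US:
        return value, "below range"
    if value > RECOMMENDED_MAX_US:
        return value, "above range"
    return value, "ok"
```

The file to check would typically be one generated with cmgetconf; after editing, the change is validated with cmcheckconf and applied with cmapplyconf.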
• Problem: SCSI reset messages logged to syslog and the kernel's message buffer
If a Serviceguard cluster reformation requires a race for the cluster lock disk, SCSI reset
messages appear in the dmesg output and in syslog.log. To ensure that the SCSI bus is available
for the node to grab the lock, a SCSI bus reset is performed (note the “Reset requested from
above” message), e.g.:
SCSI: Reset requested from above -- lbolt: 400804081, bus: 0
SCSI: Resetting SCSI -- lbolt: 400804381, bus: 0
SCSI: Reset detected -- lbolt: 400804381, bus: 0
These SCSI messages appear only when the cluster lock disk is handled by the sdisk driver. The
disc3 driver does not log anything when a SCSI reset is issued or detected.
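When reviewing dmesg or syslog.log, it can help to group the three SCSI messages into one event per bus. The following is a minimal sketch, assuming the message wording shown above (with or without a space after "lbolt:" and "bus:", and tolerating line wraps); the function names scsi_reset_events and upper_layer_resets are hypothetical, and the reported difference is in raw lbolt ticks, not converted to seconds.

```python
import re

# Matches the three kernel messages shown above. \s+ between words tolerates
# messages that were wrapped or run together across lines.
SCSI_RE = re.compile(
    r"SCSI: (?P<event>Reset\s+requested\s+from\s+above|Resetting\s+SCSI|Reset\s+detected)"
    r"\s+--\s+lbolt:\s*(?P<lbolt>\d+),\s*bus:\s*(?P<bus>\d+)"
)


def scsi_reset_events(text):
    """Hypothetical helper: return [(event, lbolt, bus), ...] in text order.

    finditer() is used so several messages on one line are still separated;
    the event name is normalized to single spaces.
    """
    return [
        (" ".join(m.group("event").split()), int(m.group("lbolt")), int(m.group("bus")))
        for m in SCSI_RE.finditer(text)
    ]


def upper_layer_resets(events):
    """Hypothetical helper: pair each "Reset requested from above" with the
    next "Reset detected" on the same bus.

    Returns [(bus, ticks), ...] where ticks is the lbolt difference between
    the request and the detection.
    """
    pending = {}  # bus -> lbolt of the outstanding request
    paired = []
    for event, lbolt, bus in events:
        if event == "Reset requested from above":
            pending[bus] = lbolt
        elif event == "Reset detected" and bus in pending:
            paired.append((bus, lbolt - pending.pop(bus)))
    return paired
```

For the three messages above, this pairs a single reset on bus 0 with an lbolt difference of 300.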
• Error: The local node Node1 appears to belong to a different cluster.
The node's /etc/cmcluster/cmclconfig file already contains a configuration with a different cluster
ID. Use cmdeleteconf to remove that configuration first. As a last resort, you can remove the file
from that node.
Cluster daemon abort with possible node TOC