HP-UX HB v13.00 Ch-15 - Serviceguard

HP-UX Handbook – Rev 13.00 Page 81 (of 108)

Chapter 15 Serviceguard

October 29, 2013

which missed the heartbeat), voting for a new cluster coordinator, and reforming the cluster. The
new cluster is based on the number of nodes that responded during the reformation. Note that if
the node which missed the heartbeat is able to respond during the reformation, the reformation
ends with the same number of nodes in the cluster and your packages are not affected.

Example: Node1 is missing heartbeats from Node2. It starts a reformation and races for the
cluster lock disk before Node2 comes back:

Aug 5 16:08:29 Node1 cmcld: Timed out node Node2. It may have failed.
Aug 5 16:08:29 Node1 cmcld: Attempting to form a new cluster
Aug 5 16:08:36 Node1 cmcld: Obtaining Cluster Lock
Aug 5 16:08:36 Node1 vmunix: SCSI: Reset requested from above -- lbolt:25093597, bus:0
Aug 5 16:08:37 Node1 vmunix: SCSI: Resetting SCSI -- lbolt:25093697, bus:0
Aug 5 16:08:37 Node1 vmunix: SCSI: Reset detected -- lbolt:25093697, bus:0
Aug 5 16:08:54 Node1 cmcld: Attempting to adjust cluster membership
Aug 5 16:08:56 Node1 cmcld: Enabling safety time protection
Aug 5 16:08:56 Node1 cmcld: Clearing Cluster Lock
Aug 5 16:08:57 Node1 cmcld: 2 nodes have formed a new cluster, sequence #7
Aug 5 16:08:57 Node1 cmcld: The new active cluster membership is: Node1(id=1), Node2(id=2)
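To judge how long such a reformation took, the cmcld timestamps in syslog can be compared. The following Python sketch is only an illustration, not a Serviceguard tool: the function names parse_cmcld and reformation_seconds are hypothetical, the message texts are matched exactly as they appear in the excerpt above, and a placeholder year is assumed because syslog timestamps carry none.

```python
import re
from datetime import datetime

# cmcld lines copied from the syslog excerpt above (vmunix lines omitted).
LOG = """\
Aug 5 16:08:29 Node1 cmcld: Timed out node Node2. It may have failed.
Aug 5 16:08:29 Node1 cmcld: Attempting to form a new cluster
Aug 5 16:08:36 Node1 cmcld: Obtaining Cluster Lock
Aug 5 16:08:54 Node1 cmcld: Attempting to adjust cluster membership
Aug 5 16:08:57 Node1 cmcld: 2 nodes have formed a new cluster, sequence #7
"""

# "<Mon> <day> <hh:mm:ss> <host> cmcld: <message>"
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s+cmcld:\s+(.*)$"
)


def parse_cmcld(text):
    """Return (timestamp, host, message) tuples for every cmcld line."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            # Assumption: syslog omits the year, so a placeholder is used.
            stamp = datetime.strptime(
                "2000 " + match.group(1), "%Y %b %d %H:%M:%S"
            )
            events.append((stamp, match.group(2), match.group(3)))
    return events


def reformation_seconds(events):
    """Seconds from the first node timeout to the new cluster forming."""
    start = None
    for stamp, _host, message in events:
        if start is None and message.startswith("Timed out node"):
            start = stamp
        elif start is not None and "have formed a new cluster" in message:
            return (stamp - start).total_seconds()
    return None  # no complete reformation found in the input


print(reformation_seconds(parse_cmcld(LOG)))
```

For the excerpt above this reports 28 seconds between the timeout of Node2 and the new cluster being formed.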

The factory-default setting of 2 seconds (2000000 microseconds) for the cluster parameter
NODE_TIMEOUT is too small for many configurations. The general recommendation is to set
NODE_TIMEOUT in the range of 5000000-8000000 microseconds (5 to 8 seconds).
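As a quick sanity check, the NODE_TIMEOUT value can be compared against the recommended range. The Python sketch below is an illustration under stated assumptions: it assumes the cluster configuration ASCII file contains a line of the form "NODE_TIMEOUT <microseconds>" with "#" starting a comment, and the function names node_timeout_us and assess are hypothetical.

```python
import re

# Recommended range from the text above, in microseconds.
RECOMMENDED_MIN_US = 5000000
RECOMMENDED_MAX_US = 8000000


def node_timeout_us(config_text):
    """Return the NODE_TIMEOUT value in microseconds, or None if absent."""
    for line in config_text.splitlines():
        # Assumption: '#' starts a comment in the cluster ASCII file.
        line = line.split("#", 1)[0].strip()
        match = re.match(r"^NODE_TIMEOUT\s+(\d+)$", line)
        if match:
            return int(match.group(1))
    return None


def assess(value_us):
    """Classify a NODE_TIMEOUT value against the recommended range."""
    if value_us is None:
        return "NODE_TIMEOUT not found"
    if value_us < RECOMMENDED_MIN_US:
        return "below recommended range"
    if value_us > RECOMMENDED_MAX_US:
        return "above recommended range"
    return "within recommended range"


# Hypothetical fragment of a cluster configuration file.
sample = "HEARTBEAT_INTERVAL 1000000\nNODE_TIMEOUT 2000000  # factory default\n"
print(assess(node_timeout_us(sample)))
```

With the factory default of 2000000 microseconds the sketch reports a value below the recommended range.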

• Problem: SCSI reset messages logged to syslog and the kernel's message buffer

If a cluster reformation requires a race to the cluster lock disk, SCSI reset messages appear in
the dmesg output and in syslog.log. To ensure that the SCSI bus is available for the server to
grab the lock, a reset is performed (note the “Reset requested from above” message), e.g.:

SCSI: Reset requested from above -- lbolt: 400804081, bus: 0
SCSI: Resetting SCSI -- lbolt: 400804381, bus: 0
SCSI: Reset detected -- lbolt: 400804381, bus: 0

The SCSI messages are only seen when the cluster lock disk is handled by the sdisk driver. The

disc3 driver does not output information when a SCSI reset is issued or detected.
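To tell these expected resets apart from resets with another cause, the syslog entries can be correlated with the cmcld "Obtaining Cluster Lock" message. The Python sketch below is an illustration only: the function name classify_resets and the 5 second window are assumptions, not Serviceguard-defined values, and a placeholder year is used because syslog timestamps carry none.

```python
import re
from datetime import datetime, timedelta

# "<Mon> <day> <hh:mm:ss> <host> <rest of line>"
STAMP_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+(.*)$")


def _parse(line):
    """Split a syslog line into (timestamp, remainder), or return None."""
    match = STAMP_RE.match(line)
    if not match:
        return None
    # Assumption: syslog omits the year, so a placeholder is used.
    stamp = datetime.strptime("2000 " + match.group(1), "%Y %b %d %H:%M:%S")
    return stamp, match.group(2)


def classify_resets(lines, window_s=5):
    """Return (timestamp, expected) for each 'Reset requested from above'.

    A reset counts as expected when it is logged within window_s seconds
    after a cmcld 'Obtaining Cluster Lock' message (window is an assumption).
    """
    lock_times = []
    resets = []
    for line in lines:
        parsed = _parse(line)
        if parsed is None:
            continue
        stamp, rest = parsed
        if rest.startswith("cmcld: Obtaining Cluster Lock"):
            lock_times.append(stamp)
        elif "SCSI: Reset requested from above" in rest:
            resets.append(stamp)
    window = timedelta(seconds=window_s)
    return [
        (stamp, any(timedelta(0) <= stamp - lock <= window for lock in lock_times))
        for stamp in resets
    ]


sample = [
    "Aug 5 16:08:36 Node1 cmcld: Obtaining Cluster Lock",
    "Aug 5 16:08:36 Node1 vmunix: SCSI: Reset requested from above -- lbolt:25093597, bus:0",
]
for stamp, expected in classify_resets(sample):
    print(stamp.strftime("%H:%M:%S"), "expected" if expected else "investigate")
```

A reset flagged "investigate" is not necessarily a fault; it only means no cluster lock attempt was logged shortly before it.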

• Error: The local node Node1 appears to belong to a different cluster.

The node's /etc/cmcluster/cmclconfig file already contains a configuration with a different
cluster ID. Use cmdeleteconf to remove that configuration first. As a last resort, you can
remove the file from that node.

Cluster daemon abort with possible node TOC