(See the section titled Serviceguard TOC)
• Active cmcld aborts with syslog messages like:
cmcld: Aborting!
cmcld: Service Guard Aborting!
cmcld: Aborting Serviceguard Daemon to preserve data integrity.
These messages are logged by cmcld just before it deliberately aborts due to a fatal error condition; the specific condition is often included in the error message itself. Typically the syslog.log looks similar to this:
Aug 5 11:05:31 Node1 cmcld: Aborting: cl_rwlock.c 1030 (reader/writer lock not locked)
Aug 5 11:05:35 Node1 cmlvmd: Could not read messages from /usr/lbin/cmcld: Software caused connection abort
Aug 5 11:05:35 Node1 cmlvmd: CLVMD exiting
Aug 5 11:05:35 Node1 cmsrvassistd[8688]: The cluster daemon aborted our connection.
Aug 5 11:05:35 Node1 cmsrvassistd[8688]: Lost connection with Serviceguard cluster daemon (cmcld): Software caused connection abort
Aug 5 11:05:35 Node1 cmtaped[8691]: The cluster daemon aborted our connection.
Aug 5 11:05:35 Node1 cmtaped[8691]: cmtaped terminating. (ATS 1.14)
...
The exact error message should be checked against known problems. In many cases a cmcld core file is also written to the directory /var/adm/cmcluster, which can be helpful for further investigation.
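A quick way to collect the relevant information is to search syslog for the abort message and to check for a recent core file. This is only a minimal sketch: the syslog path shown is the HP-UX default, and the exact name of the core file in /var/adm/cmcluster may differ on your system.
# grep "cmcld: Aborting" /var/adm/syslog/syslog.log
# ll /var/adm/cmcluster/core*
# file /var/adm/cmcluster/core
Record the exact abort string (here the source file, line number and reason) together with the installed Serviceguard version when comparing against known problems or opening a support case.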
• Node TOC after tuning TCP parameters using ndd(1M):
Oct 17 10:10:02 Node1 cmlvmd: Could not read messages from /usr/lbin/cmcld: Connection reset by peer
Oct 17 10:10:02 Node1 cmlvmd: CLVMD exiting
...
Oct 17 10:10:08 Node1 vmunix: Halting Node1 to preserve data integrity
Oct 17 10:10:08 Node1 vmunix: Reason: LVM daemon failed
Changing the TCP settings on a Serviceguard system is officially unsupported. The ndd(1M) parameters involved here are tcp_keepalive_interval and tcp_ip_abort_interval; the same applies to the 10.X nettune(1M) parameters tcp_keepstart, tcp_keepfreq and tcp_keepstop. The reason is that Serviceguard needs all ports to behave the same way: if ndd(1M) is run after the cluster has been started, connections that already exist keep the original TCP behavior, while connections established after running ndd(1M) use the new behavior. Experience shows that usually no problems arise if the ndd(1M) tuning is done before the cluster is started, but even so this is neither tested nor supported.
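To verify whether a node deviates from the default TCP settings, the current values can be displayed read-only with ndd(1M). This is a minimal sketch; displaying the values is harmless, whereas setting them on a cluster node remains unsupported as described above.
# ndd -get /dev/tcp tcp_keepalive_interval
# ndd -get /dev/tcp tcp_ip_abort_interval
If the values have been changed, also check /etc/rc.config.d/nddconf for entries that would re-apply the tuning at the next boot.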