(See the section titled Serviceguard TOC)
• Active cmcld aborts with syslog messages like:
cmcld: Aborting!
cmcld: Service Guard Aborting!
cmcld: Aborting Serviceguard Daemon to preserve data integrity.
These messages are logged by cmcld just before it deliberately aborts due to a fatal error condition; the specific condition is often included in the error message itself. Typically the syslog.log looks similar to this:
Aug 5 11:05:31 Node1 cmcld: Aborting: cl_rwlock.c 1030 (reader/writer lock not locked)
Aug 5 11:05:35 Node1 cmlvmd: Could not read messages from /usr/lbin/cmcld: Software caused connection abort
Aug 5 11:05:35 Node1 cmlvmd: CLVMD exiting
Aug 5 11:05:35 Node1 cmsrvassistd[8688]: The cluster daemon aborted our connection.
Aug 5 11:05:35 Node1 cmsrvassistd[8688]: Lost connection with Serviceguard cluster daemon (cmcld): Software caused connection abort
Aug 5 11:05:35 Node1 cmtaped[8691]: The cluster daemon aborted our connection.
Aug 5 11:05:35 Node1 cmtaped[8691]: cmtaped terminating. (ATS 1.14)
...
The exact error message should be checked against known problems. In many cases a cmcld core file is also written to the directory /var/adm/cmcluster, which can be helpful for further investigation.
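A quick way to collect the relevant information is to search syslog for the abort message and to check for a recent core file. This is only a minimal sketch: the syslog path shown is the HP-UX default, and the exact name of the core file in /var/adm/cmcluster may differ on your system.
# grep "cmcld: Aborting" /var/adm/syslog/syslog.log
# ll /var/adm/cmcluster/core*
# file /var/adm/cmcluster/core
Record the exact abort string (here the source file, line number and reason) together with the installed Serviceguard version when comparing against known problems or opening a support case.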
• Node TOC after tuning TCP parameters using ndd(1M):
Oct 17 10:10:02 Node1 cmlvmd: Could not read messages from /usr/lbin/cmcld: Connection reset by peer
Oct 17 10:10:02 Node1 cmlvmd: CLVMD exiting
...
Oct 17 10:10:08 Node1 vmunix: Halting Node1 to preserve data integrity
Oct 17 10:10:08 Node1 vmunix: Reason: LVM daemon failed
Changing the TCP settings on a Serviceguard system is officially unsupported. The ndd(1M) parameters involved here are tcp_keepalive_interval and tcp_ip_abort_interval; the same applies to the 10.X nettune(1M) parameters tcp_keepstart, tcp_keepfreq and tcp_keepstop. The reason is that Serviceguard needs all ports to behave the same way: if ndd(1M) is run after the cluster has been started, connections that already exist keep the original TCP behavior, while connections established after running ndd(1M) use the new behavior. Experience shows that usually no problems arise if the ndd(1M) tuning is done before the cluster is started, but even so this is neither tested nor supported.
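To verify whether a node deviates from the default TCP settings, the current values can be displayed read-only with ndd(1M). This is a minimal sketch; displaying the values is harmless, whereas setting them on a cluster node remains unsupported as described above.
# ndd -get /dev/tcp tcp_keepalive_interval
# ndd -get /dev/tcp tcp_ip_abort_interval
If the values have been changed, also check /etc/rc.config.d/nddconf for entries that would re-apply the tuning at the next boot.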