HP-UX HB v13.00 Ch-15 - Serviceguard

HP-UX Handbook Rev 13.00 Page 75 (of 108)
Chapter 15 Serviceguard
October 29, 2013
Note that linkloop(1M) often cannot catch such problems because its check differs from
Serviceguard's. Refer to the dlpiping tool; more information about it can be found under
Troubleshooting.
Error: Network interface lanX on node Node1 couldn't talk to itself.
This message is usually the result of a failed LAN interface check, either on link level or on IP
level. Each interface that is part of the cluster configuration must be either configured (UP and
ping'able through its IP address) or unconfigured (an unplumbed standby interface without any IP
address).
If the interface is supposed to be configured with HEARTBEAT_IP or STATIONARY_IP, check,
e.g. with ping(1M), that it is up and running. For standby interfaces, check with ifconfig(1M)
that the interface is really unplumbed; ifconfig should report "no such interface".
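The standby check described above can be sketched as a small POSIX shell helper. This is only an
illustration, not part of Serviceguard: the function name classify_ifconfig is hypothetical, and
the matched strings are assumptions about the ifconfig output (an unplumbed interface reports
"no such interface", a configured one shows an "inet" address, and a plumbed interface without
an address shows "inet 0.0.0.0").

```shell
# Hypothetical helper: classify the text printed by "ifconfig lanX".
# Assumptions about the output format are noted per case.
classify_ifconfig() {
    case "$1" in
        # Assumed message for an unplumbed (standby) interface.
        *"no such interface"*) echo "unplumbed" ;;
        # Assumed output for a plumbed interface that has no IP address.
        *"inet 0.0.0.0"*)      echo "plumbed-without-ip" ;;
        # Any other "inet" line is treated as a configured address.
        *inet*)                echo "configured" ;;
        *)                     echo "unknown" ;;
    esac
}

# Usage on a cluster node (lan2 is an example interface name):
#   classify_ifconfig "$(ifconfig lan2 2>&1)"
```

A standby interface should yield "unplumbed"; any other result for a standby interface suggests
that it still needs to be unplumbed.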
It is also important that every interface can communicate on link level. Verify this for each
reported interface using the linkloop(1M) command:
# linkloop -i <PPA> <MAC address>
The -i option specifies the LAN interface to use as the outgoing interface.
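When several interfaces are reported, the linkloop calls can be generated rather than typed one by
one. The following is a minimal sketch: the function name linkloop_cmds is hypothetical, the MAC
addresses are made up, and the helper only prints the commands so that they can be reviewed
before being run.

```shell
# Hypothetical helper: print one linkloop command per target MAC address.
# $1 is the PPA of the outgoing interface; the remaining arguments are
# station (MAC) addresses of the interfaces to be tested.
linkloop_cmds() {
    ppa=$1
    shift
    for mac in "$@"; do
        echo "linkloop -i $ppa $mac"
    done
}

# Example: test from PPA 1 against two made-up station addresses.
linkloop_cmds 1 0x00306E1A2B3C 0x00306E4D5E6F
```

Piping the printed lines to a shell on the cluster node would then execute the checks.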
Also refer to the dlpiping tool; more information about it can be found under
Troubleshooting.
Note: Some Ignite revisions are known to restore a bogus "0.0.0.0" IP configuration for unused
interfaces into /etc/rc.config.d/netconf. Remove that entry and unplumb the interface as
described above.
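A quick way to look for such entries is to search netconf for address assignments of 0.0.0.0.
The following is a sketch only: the function name find_bogus_ip is hypothetical, and it assumes
the usual IP_ADDRESS[n]=... form of netconf entries, with or without double quotes.

```shell
# Hypothetical helper: list netconf lines that assign the address 0.0.0.0.
# $1 is the file to scan; it defaults to /etc/rc.config.d/netconf.
# Assumption: entries have the form IP_ADDRESS[n]=0.0.0.0 or
# IP_ADDRESS[n]="0.0.0.0". Matching lines are printed with line numbers.
find_bogus_ip() {
    grep -n '^[[:space:]]*IP_ADDRESS\[[0-9]*]="\{0,1\}0\.0\.0\.0' \
        "${1:-/etc/rc.config.d/netconf}"
}

# Usage on a cluster node:
#   find_bogus_ip
```

Each reported line number points at an entry to review; the matching INTERFACE_NAME[n] entry
with the same index names the affected interface.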
Error: Detected a partition of IP subnet X.X.X.X.
The error indicates that some LAN interfaces are unable to talk to each other on IP level,
although they should be able to. Verify that the configuration information in the cluster ASCII
file is correct, and check the network's physical connections. A failure of the linkloop command
means there is no connectivity on link level, possibly because a network component such as a
switch does not pass that type of traffic.
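To review the cluster ASCII file, it can help to list every interface with its node and address
on one line, so that interfaces on the same subnet are easy to compare. The following is a
minimal sketch: the function name list_cluster_lans is hypothetical, and it assumes the file uses
the NODE_NAME, NETWORK_INTERFACE, HEARTBEAT_IP and STATIONARY_IP keywords, with an interface
that has no IP entry being a standby.

```shell
# Hypothetical helper: summarize the LAN interfaces in a cluster ASCII file.
# $1 is the path to the file. Prints: node interface type address
# An interface without HEARTBEAT_IP/STATIONARY_IP is shown as STANDBY.
list_cluster_lans() {
    awk '
        # Print the interface collected so far, if there is one.
        function emit() { if (lan != "") print node, lan, kind, ip }
        $1 == "NODE_NAME"         { emit(); node = $2; lan = "" }
        $1 == "NETWORK_INTERFACE" { emit(); lan = $2; kind = "STANDBY"; ip = "-" }
        $1 == "HEARTBEAT_IP" || $1 == "STATIONARY_IP" { kind = $1; ip = $2 }
        END { emit() }
    ' "$1"
}

# Usage (the file name is an example):
#   list_cluster_lans /etc/cmcluster/cluster.ascii
```

Addresses that are meant to share a subnet can then be tested against each other on IP level and,
with linkloop, on link level.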
• DLPI errors: Serviceguard cluster daemon (cmcld) may abort with error messages: "Unable
to send DLPI message, Interrupted system call" followed by "Aborting! Failed to
send over DLPI"
Serviceguard uses DLPI (Data Link Provider Interface) to perform network polling in order to
check the health of the LAN cards in the cluster. Link-level packets are sent and received, which
allows cmcld to gather statistical information to ensure data is being transmitted and received