HP Serviceguard Version A.11.17 Release Notes, March 2006 (revised)

Fixed in This Version
JAGaf68807 (SR8606408905): Serviceguard lans can
fail incorrectly when high polling interval used
What was the problem? When the network polling interval
(NETWORK_POLLING_TIMEOUT in the cluster configuration file) is
configured to 30 seconds, a LAN interface is marked down after even a
single missed link-level message. As a result, the LAN stays down for
one poll interval. Messages such as the following may appear in syslog:
Jun 29 16:00:37 zeon cmcld: lan6 failed
Jun 29 16:00:37 zeon cmcld: Subnet 10.0.0.0 switched from lan6 to lan1
Jun 29 16:00:37 zeon cmcld: lan6 switched to lan1
Jun 29 16:00:37 zeon cmcld: Switched 10.0.0.2 from lan6 to lan1
Jun 29 16:00:37 zeon cmcld: Finished moving off lan6
Jun 29 16:01:07 zeon cmcld: Interface lan1 missed 1 both send & receive packet(s), being marked doubtful. [1]
Jun 29 16:01:07 zeon cmcld: Interface lan1 has max misses of send and receive packets.1.
Jun 29 16:01:07 zeon cmcld: lan1 failed
Jun 29 16:01:07 zeon cmcld: Subnet 10.0.0.0 down
Jun 29 16:01:37 zeon cmcld: lan1 recovered
Jun 29 16:01:37 zeon cmcld: Subnet 10.0.0.0 up
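One way to spot this pattern in a saved syslog is to pair each "failed"
message with the next "recovered" message for the same interface and
report the time between them. The following Python sketch is illustrative
only and is not a Serviceguard tool: the function name lan_outages, the
regular expression, and the year argument (syslog timestamps carry no
year) are assumptions made for this example.

```python
import re
from datetime import datetime

# Matches only the plain "lanN failed" / "lanN recovered" cmcld messages.
EVENT = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+cmcld:\s+"
    r"(?P<lan>lan\d+)\s+(?P<event>failed|recovered)\s*$"
)


def lan_outages(lines, year=2006):
    """Return (lan, seconds_down) for each failed/recovered pair.

    The year is an assumption supplied by the caller, because syslog
    timestamps in this format do not include one.
    """
    failed_at = {}
    outages = []
    for line in lines:
        m = EVENT.match(line)
        if not m:
            continue
        when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
        if m["event"] == "failed":
            failed_at[m["lan"]] = when
        elif m["lan"] in failed_at:
            delta = when - failed_at.pop(m["lan"])
            outages.append((m["lan"], int(delta.total_seconds())))
    return outages


# Excerpt of the messages shown above, one message per line.
SAMPLE = [
    "Jun 29 16:00:37 zeon cmcld: lan6 failed",
    "Jun 29 16:00:37 zeon cmcld: lan6 switched to lan1",
    "Jun 29 16:01:07 zeon cmcld: lan1 failed",
    "Jun 29 16:01:07 zeon cmcld: Subnet 10.0.0.0 down",
    "Jun 29 16:01:37 zeon cmcld: lan1 recovered",
    "Jun 29 16:01:37 zeon cmcld: Subnet 10.0.0.0 up",
]

if __name__ == "__main__":
    for lan, seconds in lan_outages(SAMPLE):
        print(f"{lan} was down for {seconds} seconds")
```

For the excerpt above, lan1 is reported as down for 30 seconds, which is
one poll interval. lan6 has no "recovered" message in the excerpt, so it
is not reported.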
What was the resolution? On a bridged Ethernet network, the wait time
for detecting the failure of a LAN interface is 12 seconds. When the
network polling interval is longer than 12 seconds, only one poll is
required to detect a failure. When there were no updates to the inbound
and outbound static data during the polling interval, the network
monitor thread used to mark the interface as down without sending the
poll packets. Before the interface is checked, there is now a check to
see whether it is the last interface used for checking the state.
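The relationship between the polling interval and the number of polls
needed to detect a failure can be sketched as follows. This is a minimal
illustration, not Serviceguard code: the 12-second wait time and the
"one poll when the interval exceeds 12 seconds" rule come from the text
above, while the rounding-up rule for shorter intervals and the name
polls_to_detect_failure are assumptions made for this example.

```python
import math

# Wait time for LAN failure detection on a bridged Ethernet network,
# as stated in the release note.
FAILURE_WAIT_SECONDS = 12


def polls_to_detect_failure(polling_interval_seconds):
    """Return the number of polls that must be missed before failure.

    Assumption: the count is the 12-second wait divided by the polling
    interval, rounded up, and never less than one poll.
    """
    if polling_interval_seconds <= 0:
        raise ValueError("polling interval must be positive")
    return max(1, math.ceil(FAILURE_WAIT_SECONDS / polling_interval_seconds))


if __name__ == "__main__":
    for interval in (2, 12, 30):
        polls = polls_to_detect_failure(interval)
        print(f"interval {interval}s: {polls} poll(s)")
```

With a 30-second interval the result is one poll, so a single missed
poll is enough to mark the interface down, which matches the behavior
described in the problem statement.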