HP Serviceguard Version A.11.17 Release Notes, March 2006 (revised)


Fixed in This Version


JAGaf68807 (SR8606408905): Serviceguard LANs can fail incorrectly when a high polling interval is used

What was the problem? When the network polling interval (NETWORK_POLLING_TIMEOUT in the cluster configuration file) is configured to 30 seconds, the LAN interface is marked down after even a single miss in the link-level messages. As a result, the LAN stays down for one polling interval. Messages such as the following may appear in syslog:

Jun 29 16:00:37 zeon cmcld: lan6 failed
Jun 29 16:00:37 zeon cmcld: Subnet 10.0.0.0 switched from lan6 to lan1
Jun 29 16:00:37 zeon cmcld: lan6 switched to lan1
Jun 29 16:00:37 zeon cmcld: Switched 10.0.0.2 from lan6 to lan1
Jun 29 16:00:37 zeon cmcld: Finished moving off lan6
Jun 29 16:01:07 zeon cmcld: Interface lan1 missed 1 both send & receive packet(s), being marked doubtful. [1]
Jun 29 16:01:07 zeon cmcld: Interface lan1 has max misses of send and receive packets.1.
Jun 29 16:01:07 zeon cmcld: lan1 failed
Jun 29 16:01:07 zeon cmcld: Subnet 10.0.0.0 down
Jun 29 16:01:37 zeon cmcld: lan1 recovered
Jun 29 16:01:37 zeon cmcld: Subnet 10.0.0.0 up
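
The timestamps in the excerpt line up with the 30-second polling interval. lan1 is reported failed at 16:01:07, 30 seconds after traffic was switched onto it, and recovers at 16:01:37, one interval later. The short Python sketch below is a reader's aid, not part of Serviceguard: it parses three of those lines and computes the gaps between them. The parsing assumes the `Mon DD HH:MM:SS host daemon: message` syslog layout shown above.

```python
POLLING_INTERVAL_S = 30  # NETWORK_POLLING_TIMEOUT value from the problem description

# Three lines copied from the syslog excerpt above.
LOG = """\
Jun 29 16:00:37 zeon cmcld: lan6 switched to lan1
Jun 29 16:01:07 zeon cmcld: lan1 failed
Jun 29 16:01:37 zeon cmcld: lan1 recovered
"""


def parse_events(log):
    """Return (seconds_since_midnight, message) for each syslog line.

    Assumes the layout 'Mon DD HH:MM:SS host daemon: message'.
    """
    events = []
    for line in log.splitlines():
        _month, _day, clock, _host, _daemon, message = line.split(maxsplit=5)
        hours, minutes, seconds = (int(part) for part in clock.split(":"))
        events.append((hours * 3600 + minutes * 60 + seconds, message))
    return events


def gaps(events):
    """Seconds elapsed between each pair of consecutive events."""
    return [later[0] - earlier[0] for earlier, later in zip(events, events[1:])]


events = parse_events(LOG)
for (_, message), gap in zip(events[1:], gaps(events)):
    print(f"{message}: {gap} s after the previous event "
          f"= {gap / POLLING_INTERVAL_S:g} polling interval(s)")
```

Both gaps come out to exactly one polling interval. This matches the description above: a single missed poll fails the interface, and it stays down until the next poll.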

What was the resolution? On a bridged Ethernet network, the wait time for detecting a LAN interface failure is 12 seconds. When the network polling interval is more than 12 seconds, the number of polls required to detect a failure is one. Previously, when there were no updates to the inbound and outbound statistics during a polling interval, the network monitor thread marked the interface down without sending the poll packets. Now, before checking the interface, the monitor first checks whether it is the last interface being used to determine the state.
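
The poll-count arithmetic in the resolution can be sketched in Python as follows. The 12-second window comes from this note. The rounding rule in the hypothetical `polls_required` function is an assumption: it divides the window by the interval, rounds up, and never goes below one poll. The note only confirms the case where the interval exceeds 12 seconds. `pre_fix_marked_down` is likewise a hypothetical model of the defect described above, not Serviceguard code.

```python
import math

DETECTION_WINDOW_S = 12  # bridged-Ethernet failure-detection time from the note


def polls_required(polling_interval_s: float) -> int:
    """Polls that may be missed before a LAN interface is declared failed.

    ASSUMPTION: the 12-second window is divided by the polling interval and
    rounded up, never dropping below one poll. The release note only confirms
    that an interval above 12 seconds yields one poll.
    """
    if polling_interval_s <= 0:
        raise ValueError("polling interval must be positive")
    return max(1, math.ceil(DETECTION_WINDOW_S / polling_interval_s))


def pre_fix_marked_down(polling_interval_s: float, stats_updated: bool) -> bool:
    """Hypothetical model of the defect: when only one poll is allowed, a single
    interval with no inbound/outbound statistics updates marks the interface
    down, without any poll packets being sent to confirm the failure."""
    return polls_required(polling_interval_s) == 1 and not stats_updated


for interval in (2, 6, 12, 30):
    print(f"{interval:>2} s interval -> {polls_required(interval)} poll(s)")
```

Under these assumptions, intervals of 2, 6, 12, and 30 seconds give budgets of 6, 2, 1, and 1 polls. At 30 seconds, one quiet interval was therefore enough to fail the interface before the fix.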