Shelf Manager Redundant Operation
The active Shelf Manager exposes the ShMC device (address 20h) on IPMB, manages IPMB and the IPM
controllers, and interacts with the System Manager over RMCP and other shelf-external interfaces. It
maintains an open TCP connection with the backup Shelf Manager. It communicates all changes in the state
of the managed objects to the backup Shelf Manager.
The backup Shelf Manager does not expose the ShMC on IPMB, does not actively manage IPMB and IPM
controllers, and does not interact with the System Manager via the shelf-external interfaces, with one
exception (noted later in this section). Instead, it maintains the state of the managed objects in its own
memory (volatile and nonvolatile) and updates the state as directed by the active Shelf Manager.
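The division of roles described above can be illustrated with a minimal sketch. It is not the Shelf Manager's actual replication protocol: the class names, the JSON message format, and the object names are all hypothetical, and a plain function call stands in for the TCP connection between the two Shelf Managers.

```python
import json


class BackupReplica:
    """Backup side: keeps a copy of the managed-object state and
    changes it only when the active side sends an update."""

    def __init__(self):
        self.state = {}

    def apply(self, message: bytes) -> None:
        update = json.loads(message)
        self.state[update["object"]] = update["value"]


class ActiveManager:
    """Active side: owns the managed objects and forwards every change."""

    def __init__(self, send):
        self.state = {}
        self._send = send  # callable standing in for the TCP connection

    def set_state(self, obj: str, value) -> None:
        self.state[obj] = value
        # Every state change is communicated to the backup.
        self._send(json.dumps({"object": obj, "value": value}).encode())


backup = BackupReplica()
active = ActiveManager(send=backup.apply)
active.set_state("fru/3/hot_swap_state", "M4")  # hypothetical object name
active.set_state("fan_tray/1/level", 5)         # hypothetical object name
```

After these calls the backup holds the same state as the active side, which is what allows it to take over after a switchover.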
The backup Shelf Manager may become active as the result of a switchover. The following two types of
switchover are defined:
Cooperative switchover: The active and backup Shelf Managers negotiate the transfer of responsibilities
from the active to the backup Shelf Manager. This mode is supported via the CLI switchover command
issued on the active or backup Shelf Manager.
Forced switchover: The backup Shelf Manager determines that the active Shelf Manager is no longer
alive or healthy, and forcefully takes on the responsibilities of the active Shelf Manager.
The backup Shelf Manager recognizes the departure of the active Shelf Manager when the Remote Healthy or
Remote Presence low-level signal becomes inactive. The Remote Presence signal monitors the presence of the peer
Shelf Manager; this signal going inactive means that the board hosting the peer Shelf Manager has been
removed from the shelf. The Remote Healthy signal is set by the peer Shelf Manager during initialization;
this signal going inactive means that the remote Shelf Manager has become unhealthy (typically, has been
powered off or reset).
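The interpretation of the two signals can be summarized in a short sketch. The function and enumeration names below are hypothetical, and the signals are modeled as plain booleans (True meaning active) rather than read from hardware.

```python
from enum import Enum


class PeerStatus(Enum):
    PRESENT_HEALTHY = "present and healthy"
    UNHEALTHY = "unhealthy (typically powered off or reset)"
    REMOVED = "board removed from the shelf"


def classify_peer(remote_presence: bool, remote_healthy: bool) -> PeerStatus:
    """Interpret the two low-level signals as seen by the backup."""
    if not remote_presence:
        # The board hosting the peer Shelf Manager is gone.
        return PeerStatus.REMOVED
    if not remote_healthy:
        # The board is present, but the peer is no longer healthy.
        return PeerStatus.UNHEALTHY
    return PeerStatus.PRESENT_HEALTHY


def active_has_departed(remote_presence: bool, remote_healthy: bool) -> bool:
    """Either signal going inactive means the active Shelf Manager has departed."""
    return classify_peer(remote_presence, remote_healthy) is not PeerStatus.PRESENT_HEALTHY
```

Checking Remote Presence first is a design choice of this sketch: when the board has been removed, the state of Remote Healthy carries no additional information.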
The backup Shelf Manager must also act when the TCP connection
between the Shelf Managers closes. This happens either when the communication link between the two Shelf
Managers is broken or when the shelfman process on the active Shelf Manager terminates (voluntarily,
involuntarily, or because of a software exception). Also, because the keepalive option is enabled on the TCP
connection, it closes shortly after the active ShMM is switched off or reset. In the case of Shelf Manager
termination, it is possible that the TCP connection is closed before the Remote Healthy signal becomes
inactive. To determine why the TCP connection closed, the backup Shelf Manager samples the
state of the Remote Healthy signal immediately and, if it is still active, again after some delay. If the Remote
Healthy signal ultimately becomes inactive, the backup Shelf Manager concludes that the active Shelf
Manager is dead and initiates a switchover.
Otherwise, if the Remote Healthy signal stays active, the backup Shelf Manager concludes that the
communication link between the Shelf Managers is broken. In that case, no switchover is initiated. Instead,
the backup Shelf Manager repeatedly reinitializes itself and tries to establish a connection with the active
Shelf Manager until the communication link is restored. Reinitialization is achieved by rebooting the ShMM
and automatically restarting the Shelf Manager after the reboot. Special logic in the Shelf Manager
guarantees that it does not try to become active at startup if the peer Shelf Manager is already active.
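The decision the backup Shelf Manager makes when the TCP connection closes can be sketched as follows. This is an illustration only: the function name, the returned action strings, and the default delay are assumptions, since the text says only that the second sample is taken "after some delay".

```python
import time
from typing import Callable


def on_tcp_connection_closed(
    read_remote_healthy: Callable[[], bool],
    recheck_delay_s: float = 1.0,          # assumed value; not given in the text
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Decide what the backup does after the TCP connection closes."""
    # Sample the Remote Healthy signal immediately.
    if not read_remote_healthy():
        return "switchover"

    # Still active: the signal may simply lag behind the TCP close,
    # so sample it again after a delay.
    sleep(recheck_delay_s)
    if not read_remote_healthy():
        return "switchover"

    # The active Shelf Manager is alive, so only the link is broken:
    # reboot the ShMM and try to reconnect, without becoming active.
    return "reboot_and_reconnect"
```

For example, a signal that is active at the first sample and inactive at the second yields "switchover":

```python
samples = iter([True, False])
action = on_tcp_connection_closed(lambda: next(samples), recheck_delay_s=0.0)
```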
The Shelf Manager uses a watchdog timer to protect against becoming unresponsive due to infinite loops or
other software bugs. If the watchdog timer on the active Shelf Manager triggers, that ShMM is
reset, causing the Remote Healthy signal on the backup ShMM to become inactive, thus triggering a
switchover.
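The watchdog principle can be modeled in software as shown below. On the ShMM the watchdog is a timer that resets the module; this sketch only simulates the timing logic, and the class names, the strobe method, and the timeout value are hypothetical.

```python
class FakeClock:
    """Manually advanced clock, so the example runs without real delays."""

    def __init__(self):
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


class Watchdog:
    """Expires unless strobed at least once every timeout_s seconds."""

    def __init__(self, timeout_s: float, clock):
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_strobe = clock()

    def strobe(self) -> None:
        # Called periodically by a healthy main loop.
        self._last_strobe = self._clock()

    def expired(self) -> bool:
        return self._clock() - self._last_strobe > self.timeout_s


clock = FakeClock()
watchdog = Watchdog(timeout_s=2.0, clock=clock)

clock.advance(1.5)
watchdog.strobe()                         # main loop is still responsive
clock.advance(1.5)
healthy_while_strobing = not watchdog.expired()

clock.advance(1.0)                        # main loop stuck: no further strobes
reset_triggered = watchdog.expired()
```

In this model, the point where expired() becomes true corresponds to the ShMM reset that makes the Remote Healthy signal go inactive on the backup.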