HP Superdome 2 Health Management Stack Whitepaper (September 2011, 5900-2013)
Executive Summary
The Legacy HP Superdome Health Management Solution
In the classic health monitoring model, HP server platforms use management processors such as
Integrated Lights-Out 3 (iLO 3) to monitor fundamental hardware health: voltage, temperature, power, and
fans. When the management processor detects a problem that needs an administrator's attention, it signals
software agents running on the OS. These server health agents then alert the administrator through
standard protocols such as Intelligent Platform Management Interface (IPMI), Simple Network Management
Protocol (SNMP), or Web-Based Enterprise Management (WBEM).
HP extended this classic model to the HP Superdome servers. In these systems, a set of management
processors monitors the shared system hardware, while separate components monitor the partition-specific
hardware. Because these servers contain multiple OS-partitions, every OS-partition is notified when a
management processor detects a problem in shared hardware. For example, if a power supply fails, every
OS-partition is notified. Consequently, every OS-partition sends its own alert, and the administrator is
swamped with redundant error messages. Conversely, problems found only on a single partition's hardware
are not shared with the monitoring components in other partitions or with the main management processor.
Thus, administrators must check multiple, separate health logs to get complete system information.
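The notification pattern described above can be illustrated with a small Python model. It is a sketch under stated assumptions, not HP code: the `Fault` type, the `legacy_alerts` and `partition_logs` functions, and the component names are hypothetical, and the model captures only the fan-out behavior (shared-hardware faults reach every OS-partition; partition-specific faults stay in one partition's log).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical model of the legacy Superdome alert path; all names are illustrative.
@dataclass(frozen=True)
class Fault:
    component: str                   # e.g. "power supply 2"
    shared: bool                     # True if the hardware is shared by all OS-partitions
    partition: Optional[int] = None  # owning partition for partition-specific faults

def legacy_alerts(fault: Fault, partitions: List[int]) -> List[Tuple[int, str]]:
    """Return the (partition, component) alerts the administrator would receive."""
    if fault.shared:
        # Every OS-partition is notified, and each one raises its own alert.
        return [(p, fault.component) for p in partitions]
    # A partition-specific fault is seen only by the owning partition.
    return [(fault.partition, fault.component)]

def partition_logs(faults: List[Fault], partitions: List[int]) -> Dict[int, List[str]]:
    """Build the separate per-partition health logs an administrator must check."""
    logs: Dict[int, List[str]] = {p: [] for p in partitions}
    for fault in faults:
        for p, component in legacy_alerts(fault, partitions):
            logs[p].append(component)
    return logs
```

With four OS-partitions, a single power supply failure yields four alerts, while a fault on one partition's hardware shows up only in that partition's log, so no single log holds the complete picture.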
The HP Superdome 2 Health Management Solution
In HP Superdome 2, the legacy health monitoring model has been replaced by the HP Superdome 2
Analysis Engine, which runs on the Onboard Administrator. The HP Superdome 2 Onboard Administrator
(OA) is the central point of control for all health management tasks. The OA delivers common, powerful,
hardware-based management through both a Command Line Interface (CLI) and a Web Graphical User
Interface (Web GUI).
The HP Superdome 2 Analysis Engine (AE) is built into the sx3000 chipset and SD2 firmware and is
currently available only on HP Superdome 2. When a fault is detected, the Analysis Engine automatically
attempts to self-heal the problem and reports any problem that requires the system to be serviced. It can
report directly to customers and, for systems under warranty, to HP Customer Support via HP Insight Remote
Support (Insight RS).
HP Superdome 2 Analysis Engine
Overview
The HP Superdome 2 Analysis Engine (AE) correlates and analyzes hardware errors and automatically initiates
self-repair. When a hardware error occurs in the system, the SD2 AE collects the health data, analyzes it, and
takes action on the defective hardware part, also referred to as the Field Replaceable Unit (FRU). It
recommends actions to rectify the error and, where required, clearly indicates which FRU requires service.
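The collect, correlate, self-heal, and report flow described above can be sketched in Python. This is a hypothetical model, not the Analysis Engine's actual interface: `ErrorEvent`, `ServiceEvent`, `analyze`, `destinations`, and the `try_self_heal` callback (a stand-in for whatever self-repair the firmware performs) are illustrative names, and correlation is simplified to grouping reports by the implicated FRU. The routing rule follows the text: reports go to the customer and, for systems under warranty, to HP Customer Support via Insight RS.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical data types; they do not reflect the real AE's internal interfaces.
@dataclass(frozen=True)
class ErrorEvent:
    fru: str     # Field Replaceable Unit implicated by the error
    source: str  # reporting component, e.g. a partition or the OA

@dataclass(frozen=True)
class ServiceEvent:
    fru: str
    correlated_reports: int
    recommended_action: str

def analyze(events: List[ErrorEvent],
            try_self_heal: Callable[[str], bool]) -> List[ServiceEvent]:
    """Correlate raw errors per FRU, attempt self-repair, report what still needs service."""
    counts: Dict[str, int] = {}
    for event in events:
        # Correlate: many reports of the same fault collapse into one entry per FRU.
        counts[event.fru] = counts.get(event.fru, 0) + 1
    results: List[ServiceEvent] = []
    for fru, n in counts.items():
        if try_self_heal(fru):
            continue  # repaired automatically, so no service event is raised
        results.append(ServiceEvent(fru, n, f"Service FRU: {fru}"))
    return results

def destinations(under_warranty: bool) -> List[str]:
    """Always notify the customer; also notify HP via Insight RS if under warranty."""
    targets = ["customer"]
    if under_warranty:
        targets.append("HP Customer Support (Insight RS)")
    return targets
```

In this sketch, two partitions reporting the same power supply fault produce one `ServiceEvent` with `correlated_reports == 2`, and a FRU that `try_self_heal` repairs produces no service event at all.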