HP Superdome 2 Health Management Stack Whitepaper (September 2011, 5900-2013)

Figure 1: HP Superdome 2 (SD2) Analysis Engine
Error Logging Services
Error Logging Services (ELS) collects the raw data from the error logs in all partitions, bundles it
into an error report, and delivers that report to the Core Analysis Engine for analysis.
Core Analysis Engine
Core Analysis Engine (CAE) resides on the Onboard Administrator and reads the ELS error logs to analyze
each error. CAE also analyzes correctable errors, looking for repeating patterns that may exceed a
predictive error threshold, so that administrators can be alerted before a hard failure occurs.
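The threshold check described above can be sketched as a sliding-window counter. This is a minimal illustration only, not CAE's actual algorithm: the class name `CorrectableErrorTracker`, the per-FRU time window, and the threshold values are all assumptions that do not come from the whitepaper.

```python
from collections import defaultdict, deque


class CorrectableErrorTracker:
    """Counts correctable errors per FRU inside a sliding time window.

    Hypothetical sketch: the whitepaper does not describe how CAE
    implements its predictive error threshold.
    """

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold            # assumed value, chosen by the caller
        self.window_seconds = window_seconds  # assumed window length
        self._events = defaultdict(deque)     # FRU id -> timestamps of recent errors

    def record(self, fru_id, timestamp):
        """Record one correctable error; return True if the threshold is exceeded."""
        events = self._events[fru_id]
        events.append(timestamp)
        # Drop errors that have aged out of the window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) > self.threshold


tracker = CorrectableErrorTracker(threshold=3, window_seconds=60)
results = [tracker.record("dimm-7", t) for t in (0, 10, 20, 30)]
```

In this sketch the fourth error within the window is the first to exceed the threshold of three, which is the point at which an alert would be raised.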
CAE takes appropriate action as part of the error analysis.
The actions initiated by CAE are:
− Indictment -> Identifies a FRU as requiring service
− Suspicion -> Identifies a FRU that may be involved in a fault event
− Deconfigure -> Marks a FRU not to be used at the next boot
− Deactivate -> Stops using a FRU in the current boot, effective immediately
− Acquittal -> Clears the indictments against a FRU (the FRU is considered to have been serviced)
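As a rough model of how these five actions differ, the sketch below applies each one to a per-FRU status record. The names `CaeAction`, `FruStatus`, and `apply_action` are hypothetical, and the choice that an acquittal clears only the indictment flag is an assumption; the whitepaper does not specify the underlying data model.

```python
from dataclasses import dataclass
from enum import Enum


class CaeAction(Enum):
    INDICTMENT = "indictment"
    SUSPICION = "suspicion"
    DECONFIGURE = "deconfigure"
    DEACTIVATE = "deactivate"
    ACQUITTAL = "acquittal"


@dataclass
class FruStatus:
    """Hypothetical per-FRU state; field names are illustrative only."""
    indicted: bool = False      # FRU requires service
    suspected: bool = False     # FRU may be involved in a fault event
    deconfigured: bool = False  # FRU is excluded at the next boot
    active: bool = True         # FRU is in use in the current boot


def apply_action(status: FruStatus, action: CaeAction) -> FruStatus:
    """Update the status record in place for one CAE action."""
    if action is CaeAction.INDICTMENT:
        status.indicted = True
    elif action is CaeAction.SUSPICION:
        status.suspected = True
    elif action is CaeAction.DECONFIGURE:
        status.deconfigured = True   # takes effect at the next boot
    elif action is CaeAction.DEACTIVATE:
        status.active = False        # takes effect immediately
    elif action is CaeAction.ACQUITTAL:
        status.indicted = False      # assumption: only the indictment is cleared
    return status


status = FruStatus()
apply_action(status, CaeAction.INDICTMENT)
apply_action(status, CaeAction.DECONFIGURE)
```

After these two calls the FRU is flagged for service and excluded from the next boot, but it remains active in the current boot because no Deactivate action was applied.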
When the Analysis Engine decides that a FRU needs service, it indicts that FRU. An indictment simply means
that a service action is needed for the indicted FRU. Indictment records contain additional information
about the fault, such as related components and affected sub-FRU components.
Deconfiguration occurs when the Analysis Engine marks a FRU or part of a FRU to be excluded from use the
next time a partition boots. It may do this to prevent the use of faulty or possibly intermittent hardware.
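The boot-time effect of deconfiguration can be illustrated with a small filter over a hardware inventory. This is a sketch under stated assumptions: the function `usable_at_boot`, the inventory layout, and the identifier strings are invented for illustration and do not reflect the actual Superdome 2 firmware interfaces.

```python
from typing import Dict, List, Set


def usable_at_boot(inventory: Dict[str, List[str]],
                   deconfigured: Set[str]) -> Dict[str, List[str]]:
    """Return the FRUs and sub-FRU parts a partition may use at its next boot.

    inventory maps a FRU id to the ids of its sub-FRU parts.
    deconfigured holds ids of whole FRUs or individual parts marked for exclusion.
    Both structures are hypothetical.
    """
    usable = {}
    for fru_id, parts in inventory.items():
        if fru_id in deconfigured:
            continue  # the whole FRU is excluded
        # Keep the FRU but leave out any deconfigured parts of it.
        usable[fru_id] = [p for p in parts if p not in deconfigured]
    return usable


inventory = {
    "blade1": ["blade1/cpu0", "blade1/dimm3"],
    "blade2": ["blade2/cpu0"],
}
marked = {"blade1/dimm3", "blade2"}
next_boot = usable_at_boot(inventory, marked)
```

Here one DIMM on `blade1` and all of `blade2` are left out of the next boot, matching the idea that either a FRU or only part of a FRU can be deconfigured.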