Specifications

For more information about using the Offline Diagnostics Environment, see the Offline
Diagnostics Environment Administrator's and User's Guide
(http://docs.hp.com/en/5992-6605/5992-6605.pdf).
General diagnostic tools
Diagnostic Tool       Description
IPMI Event Decoder    Provides detailed information about the IPMI event (issue
                      description, cause, action)
Fault management overview
The goal of fault management and monitoring is to increase server blade availability by moving
from a reactive fault detection, diagnosis, and repair strategy to a proactive one. The
objectives are:
• To detect issues automatically, as close as possible to the time of occurrence.
• To diagnose issues automatically, at the time of detection.
• To automatically report (in understandable text) a description of the issue, the likely causes
  of the issue, the recommended actions to resolve the issue, and detailed information about
  the issue.
• To ensure that tools are available to repair or recover from the fault.
HP-UX Fault management
Proactive fault prediction and notification are provided on HP-UX by System Fault Management
(SFM) and Web-Based Enterprise Management (WBEM) indications. WBEM is a collection of standards
that aid large-scale systems management. WBEM allows management applications to monitor systems
in a network.
SFM and WBEM indication providers let users monitor the operation of a wide variety of
hardware products and alert them immediately if a failure or other unusual event occurs. By
using hardware event monitoring, users can virtually eliminate undetected hardware failures
that could interrupt server blade operation or cause data loss.
HP SMH is the application used to query information about monitored devices and to view WBEM
indications and instances. This WBEM-based network management application enables you to
create subscriptions and view indications.
SysMgmtPlus functionality displays the property pages of various devices and firmware on HP
SMH. SysMgmtPlus enables HP SMH to display enhanced property pages that contain dynamic
content, allowing the user to view or hide details of devices and firmware. Health tests
are associated with components. The health test feature provides an option to perform a health
test on all device instances of a component.
For complete information on installing, administering, and troubleshooting SFM software and
its components, see the System Fault Management Administrator's Guide
(http://docs.hp.com/hpux/diag).
Errors and error logs
Event log definitions
Often the underlying root cause of an MCA event is captured by the server blade or firmware
in both the SEL and FPL logs. These errors are easily matched with MCA events by timestamps.
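As a rough illustration of matching by timestamp, the following Python sketch selects log
entries recorded within a few seconds of an MCA event. The LogEntry record, the entries_near
function, the five-second window, and the sample data are all assumptions made for
illustration; they do not reflect the actual SEL, FPL, or MCA log formats or any HP tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class LogEntry:
    """One log record; the fields are illustrative, not a real SEL/FPL layout."""
    source: str          # "SEL", "FPL", or "MCA"
    timestamp: datetime
    message: str


def entries_near(mca_event, log_entries, window=timedelta(seconds=5)):
    """Return the entries whose timestamp lies within `window` of the MCA event,
    sorted oldest first so that likely causes are listed before their symptoms."""
    nearby = [
        entry for entry in log_entries
        if abs(entry.timestamp - mca_event.timestamp) <= window
    ]
    return sorted(nearby, key=lambda entry: entry.timestamp)


# Made-up sample data: a VRM power loss shortly before a processor fault.
mca = LogEntry("MCA", datetime(2010, 1, 1, 12, 0, 3), "processor fault")
logs = [
    LogEntry("SEL", datetime(2010, 1, 1, 11, 30, 0), "fan speed change"),
    LogEntry("SEL", datetime(2010, 1, 1, 12, 0, 1), "processor VRM power lost"),
    LogEntry("FPL", datetime(2010, 1, 1, 12, 0, 2), "processor VRM power lost"),
]

for entry in entries_near(mca, logs):
    print(entry.source, entry.timestamp.isoformat(), entry.message)
```

With this sample data, only the two VRM entries fall inside the window; the unrelated entry
from half an hour earlier is filtered out. A suitable window would depend on how closely the
clocks behind the different logs agree.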
For example, the loss of a processor VRM might cause a processor fault. Decoding the MCA error
logs would only identify the failed processor as the most likely faulty CRU, whereas the SEL
and FPL entries may show the VRM loss that caused it. The following are some important points
to remember about events and event logs: