User's Manual
Diagnostics
A suite of offline and online support tools is available for troubleshooting server blade
issues. In general, if the operating system (HP-UX) is already running, HP does not recommend
shutting down the server blade. Use the online support tools instead.
If the OS cannot be booted, use the offline support tools to resolve the issue. The offline support
tools are available from the UEFI partition. After you resolve the issue preventing booting, boot
HP-UX, and use the online support tools for any further testing.
If it is not possible to reach the UEFI from either the main disk or the LAN, you must troubleshoot
using the visual fault indicators, console messages, and system error logs that are available.
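The selection logic described above can be summarized in a short sketch. This is only an illustration of the decision order in this section, not an HP-provided tool; the names `DiagnosticPath` and `choose_diagnostic_path` and the two boolean inputs are hypothetical.

```python
from enum import Enum


class DiagnosticPath(Enum):
    """Troubleshooting approaches named in this section (labels are illustrative)."""
    ONLINE_TOOLS = "online support tools"
    OFFLINE_TOOLS = "offline support tools (from the UEFI partition)"
    MANUAL = "visual fault indicators, console messages, and system error logs"


def choose_diagnostic_path(os_running: bool, uefi_reachable: bool) -> DiagnosticPath:
    """Return the approach this section recommends for the given blade state.

    Both parameters are assumptions made for this sketch: the operator
    determines them by observation; they are not read from the blade.
    """
    if os_running:
        # HP-UX is up: do not shut the blade down, use the online tools.
        return DiagnosticPath.ONLINE_TOOLS
    if uefi_reachable:
        # OS cannot boot, but UEFI is reachable from the main disk or the LAN.
        return DiagnosticPath.OFFLINE_TOOLS
    # Neither the OS nor UEFI is reachable.
    return DiagnosticPath.MANUAL


if __name__ == "__main__":
    print(choose_diagnostic_path(os_running=False, uefi_reachable=True).value)
```

Once the offline tools have resolved the boot issue, the same check with `os_running=True` selects the online tools for any further testing.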
General diagnostic tools
Diagnostic Tool       Description
IPMI Event Decoder    Provides detailed information about the IPMI event (issue
                      description, cause, action)
Fault management overview
The goal of fault management and monitoring is to increase server blade availability by moving
from a reactive fault detection, diagnosis, and repair strategy to a proactive one. The
objectives are:
• To detect issues automatically, as close as possible to the time of occurrence.
• To diagnose issues automatically, at the time of detection.
• To automatically report (in understandable text) a description of the issue, the likely causes of
the issue, the recommended actions to resolve the issue, and detailed information about the
issue.
• To ensure that tools are available to repair or recover from the fault.
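The reporting objective above lists four pieces of information for each issue. As a minimal sketch, they could be carried in one record and rendered as readable text. The `FaultReport` type and `format_report` function are hypothetical and do not represent the format produced by any HP tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FaultReport:
    """Hypothetical container for the four items in the reporting objective."""
    description: str
    likely_causes: List[str] = field(default_factory=list)
    recommended_actions: List[str] = field(default_factory=list)
    details: str = ""


def format_report(report: FaultReport) -> str:
    """Render a report as plain text, one item per line."""
    lines = [f"Issue: {report.description}"]
    lines += [f"Likely cause: {cause}" for cause in report.likely_causes]
    lines += [f"Recommended action: {action}" for action in report.recommended_actions]
    if report.details:
        # Detailed information is optional in this sketch.
        lines.append(f"Details: {report.details}")
    return "\n".join(lines)
```

Keeping causes and actions as lists lets a single report carry several candidates without changing the record layout.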
HP-UX fault management
Proactive fault prediction and notification are provided on HP-UX by System Fault Management
(SFM) and Web-Based Enterprise Management (WBEM) indications. WBEM is a collection of standards
that aid large-scale systems management. WBEM allows management applications to monitor systems
in a network.
SFM and WBEM indication providers enable users to monitor the operation of a wide variety of
hardware products, and alert them immediately if any failure or other unusual event occurs. By
using hardware event monitoring, users can virtually eliminate undetected hardware failures that
could interrupt server blade operation or cause data loss.