System Fault Management White Paper

Executive Summary
This white paper discusses the latest hardware diagnostics product on the HP-UX operating
system, System Fault Management (SFM). It describes the SFM features and benefits. The white
paper also describes how to use SFM to obtain information about server hardware, troubleshoot
faulty hardware devices, and perform server manageability tasks.
Intended Audience
This white paper is intended for system administrators, HP support personnel, and HP field
engineers who require information about server hardware inventory and other details to
troubleshoot faulty devices. Readers of this white paper must be familiar with HP-UX system
administration.
Scope of Hardware Diagnostics
The administration and maintenance of servers is becoming increasingly complex, expensive,
and time-consuming, owing to the growing need to consolidate IT hardware resources across
businesses. When a system fails, key user operations are disrupted because the applications,
tools, and databases running on the server are affected, and the resulting losses can be
significant. Servers must therefore operate continuously and reliably, because they run
critical applications and serve multiple needs. Hardware diagnostic tools help ensure
that the system is functioning normally. They provide critical information in the event of a system
failure and retrieve details about the system components. Operating conditions can also lead
to a device failure or affect system performance. Therefore, it is essential to monitor the server
components and notify the user of any abnormalities.
Hardware diagnostic tools monitor server hardware and report any abnormal
behavior in a monitored component, which helps avoid a potential system breakdown. If a
device breaks down, hardware diagnostic tools narrow the search to the faulty component.
Another key feature of hardware diagnostic tools is their ability to retrieve information related to
the hardware inventory, such as the serial number and part number of processors and memory
modules. Obtaining information such as the processor capacity and memory module path
manually can be complex and time-consuming. With a diagnostic tool, obtaining such details
is both simple and quick.
Traditional Diagnostic Solutions
HP has been delivering a diagnostics product, called Online Diagnostics, on HP-UX-based
systems. The Online Diagnostics product consists of EMS Hardware Monitors and the Support
Tools Manager (STM). EMS Hardware Monitors are a group of monitors that watch
various hardware components on a supported HP 9000 or HP Integrity® server. When a
monitored component does not function normally, an event is generated and reported to the
user through configured channels.
STM manages a set of support tools that the user can use to verify and troubleshoot system
hardware. It provides information about system configuration and an interface to run hardware
validation tests.
However, system manageability needs are constantly evolving. System manageability has expanded
into new dimensions, where tools must support a wide range of products, adhere to industry
standards, coexist, and be easy to manage.
System Fault Management (SFM) delivers the following characteristics that make it a viable
solution to meet the current industry requirements:
supports a wide range of servers, disks, and firmware,
provides critical and granular details to help troubleshoot faulty devices,