System Fault Management White Paper HP Part Number: 5992-5243 Published: June 2008 Edition: 1.
Legal Notices
© Copyright 2008 Hewlett-Packard Company, L.P. Confidential Computer Software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor’s standard commercial license. The information contained herein is subject to change without notice.
Table of Contents
Executive Summary
Intended Audience
Scope of Hardware Diagnostics
Traditional Diagnostic Solutions
List of Figures
1 CMS and Local System Display
2 SFM with HP Service Essentials Remote Support Pack
3 Follow-the-Red
4 Event List
Executive Summary
This white paper discusses the latest hardware diagnostics product on the HP-UX operating system, System Fault Management (SFM). It describes the SFM features and benefits. The white paper also describes how to use SFM to obtain information about server hardware, troubleshoot faulty hardware devices, and perform server manageability tasks.
• co-exists with other industry-standard solutions
• enables simplified user management
The following sections discuss SFM in detail.
Overview
System Fault Management (SFM) is a collection of tools that are used to monitor the health of HP servers running the HP-UX operating system.
Figure 1 CMS and Local System Display
The SFM product includes the following components:
• SFM Providers
• EVWEB
• Error Management Technology (EMT)
A user can use each of these components through the interface of a supported management application, such as HP SMH and HP SIM. SFM providers are components of SFM that retrieve information about the inventory on a system and the events that occur on the hardware resources. SFM providers are of two types: instance providers and indication providers.
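The difference between the two provider types can be illustrated with a minimal Python sketch. It assumes the general WBEM convention that an instance provider answers requests for current data (such as inventory), while an indication provider delivers event notifications to subscribers. The class names, method names, and sample values are hypothetical and are not part of the SFM product interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class InstanceProvider:
    """Hypothetical pull-style provider: returns inventory when asked."""
    inventory: List[Dict[str, str]] = field(default_factory=list)

    def enumerate_instances(self) -> List[Dict[str, str]]:
        # Return copies so callers cannot mutate the provider's own state.
        return [dict(item) for item in self.inventory]


@dataclass
class IndicationProvider:
    """Hypothetical push-style provider: forwards events to subscribers."""
    subscribers: List[Callable[[Dict[str, str]], None]] = field(default_factory=list)

    def subscribe(self, callback: Callable[[Dict[str, str]], None]) -> None:
        self.subscribers.append(callback)

    def raise_event(self, event: Dict[str, str]) -> int:
        # Deliver the event to every subscriber and report how many were notified.
        for callback in self.subscribers:
            callback(event)
        return len(self.subscribers)


# Illustrative data only; not real SFM inventory or events.
cpu_provider = InstanceProvider([{"component": "CPU", "slot": "0"}])
event_provider = IndicationProvider()
received: List[Dict[str, str]] = []
event_provider.subscribe(received.append)
delivered = event_provider.raise_event({"severity": "Minor", "summary": "example event"})
```

In this sketch the management application calls `enumerate_instances` when it wants data, whereas events reach `received` without any polling.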
Integrating SFM with HP Service Essentials Remote Support Pack
SFM integrates with other diagnostic tools, such as the HP Service Essentials Remote Support Pack. HP Service Essentials Remote Support Pack analyzes WBEM indications generated by the SFM providers, generates problem reports, and notifies the user of any potential problems. Figure 2 illustrates the integration of SFM with HP Service Essentials Remote Support Pack.
Figure 3 Follow-the-Red
A demonstration of the follow-the-red strategy is available at: http://h20324.www2.hp.com/hpsdp/index.jsp?auto=1&ib=5009160&category_id=5009304&demo_id=5038469
Advantages of SFM
SFM offers the following advantages:
• Displays information on standards-compliant graphical and command-line system management applications, such as HP SIM and HP SMH.
• Operates within the WBEM environment.
• Supports the Central Management Server (CMS) running on HP-UX, Linux®, or Windows®.
Use Cases
This section describes various use cases of SFM. They are as follows:
• Viewing Errors on a Monitored System
• Receiving Notification of Errors
• Viewing System Health Status
• Defining WBEM Indications Criteria
• Viewing Error Metadata
• Troubleshooting Hardware
• Viewing Event Logs
Viewing Errors on a Monitored System
This use case illustrates how events generated by SFM are displayed on HP SMH and HP SIM. Figure 4 is a sample output of the event list on HP SMH.
Figure 5 Configuring Event Destination
Viewing System Health Status
SFM provides the health status of the system as well as the health status of each of the monitored components. It also provides details of the hardware inventory on the system. Figure 6 shows a sample output of the health status on HP SMH.
Figure 6 Health Status
Defining WBEM Indications Criteria
SFM enables the user to subscribe to indications that suit the user’s needs. It also enables the user to specify the destination of the indications. Figure 7 shows a sample output of the subscription administration tasks.
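The two choices described here, which indications to receive and where to send them, can be sketched in plain Python. In the DMTF CIM event model a subscription conventionally pairs a filter with a listener destination. The sketch below models that pairing only. The class names, the minimum-severity rule, and the destination URL are assumptions for illustration and are not taken from SFM. The numeric severities follow the CIM PerceivedSeverity value map.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# CIM PerceivedSeverity values (DMTF convention), keyed by name.
SEVERITY = {"Information": 2, "Warning": 3, "Minor": 4,
            "Major": 5, "Critical": 6, "Fatal": 7}


@dataclass(frozen=True)
class IndicationFilter:
    """Hypothetical filter: accept indications at or above a severity."""
    min_severity: str

    def matches(self, indication: Dict[str, str]) -> bool:
        return SEVERITY[indication["severity"]] >= SEVERITY[self.min_severity]


@dataclass(frozen=True)
class ListenerDestination:
    """Hypothetical destination that should receive matching indications."""
    url: str


@dataclass(frozen=True)
class Subscription:
    """Ties one filter to one destination."""
    indication_filter: IndicationFilter
    destination: ListenerDestination

    def route(self, indication: Dict[str, str]) -> Optional[str]:
        # Return the destination URL if the indication passes the filter.
        if self.indication_filter.matches(indication):
            return self.destination.url
        return None


# Illustrative values only; the URL is a placeholder, not an SFM endpoint.
subscription = Subscription(
    IndicationFilter(min_severity="Minor"),
    ListenerDestination(url="https://listener.example.com:5989"),
)
routed = subscription.route({"severity": "Major"})
dropped = subscription.route({"severity": "Information"})
```

Keeping the filter and the destination as separate objects mirrors the idea that the same criteria could be reused for several destinations.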
Viewing Error Metadata
SFM enables the user to search and view most of the errors that can occur on the HP-UX 11i v3 operating system. Figure 8 is a sample of the error metadata on a system.
Figure 8 Error Metadata
Troubleshooting Hardware
The user can troubleshoot faulty hardware by using the details displayed in the WBEM indication. Figure 9 shows a sample output of the details of a WBEM indication.
Figure 9 Event Details
Viewing Event Logs
SFM enables the user to receive event logs in text format. These logs contain details of all the events that are generated. The following is a sample excerpt of an event text log:
EvArchNo   Severity     Event #   Event Category   Archive Time     Summary
=========  ===========  ========  ===============  ===============  ==============
13         Minor        100101    System Inte...   2008-06-04 15:   A platform ...
12         Information  7         System Hard...   2008-06-02 11:   The Diagnos...
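A text log of this kind can be post-processed with a short script. The following Python sketch parses the two data rows from the excerpt into dictionaries. It assumes the column order shown in the excerpt, single spaces inside a field, and an archive time that begins with a date and an hour. The field names and the `parse_event_log` helper are hypothetical, and the actual SFM log layout may differ. Truncated values are kept exactly as truncated.

```python
import re
from typing import Dict, List

# Rows copied from the excerpt; the "..." truncations come from the source.
SAMPLE = """\
13 Minor 100101 System Inte... 2008-06-04 15: A platform ...
12 Information 7 System Hard... 2008-06-02 11: The Diagnos...
"""

# One data row: number, severity, event number, category, time, summary.
ROW = re.compile(
    r"^(?P<arch_no>\d+)\s+"
    r"(?P<severity>[A-Za-z]+)\s+"
    r"(?P<event_id>\d+)\s+"
    r"(?P<category>.+?)\s+"
    r"(?P<archive_time>\d{4}-\d{2}-\d{2} \d{2}:)\s+"
    r"(?P<summary>.*)$"
)


def parse_event_log(text: str) -> List[Dict[str, str]]:
    """Return one dict per data row; header and separator lines are skipped."""
    events = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            events.append(match.groupdict())
    return events


events = parse_event_log(SAMPLE)
# Example use: keep only events that are more severe than informational.
non_informational = [e for e in events if e["severity"] != "Information"]
```

Lines that do not start with an archive number, such as the header and the row of `=` characters, simply fail to match and are ignored.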
Glossary
A-C
Central Management Server (CMS)
The server monitoring the client systems in the network using SFM.
Common Information Model Object Manager (CIMOM)
The component of WBEM that manages the interaction between the providers and other modules in the WBEM environment.
D
Distributed Management Task Force (DMTF)
An industry organization involved in the development, adoption, and interoperability of management standards and initiatives for enterprise and Internet environments.
W-Z
WBEM (Web-Based Enterprise Management)
A set of management and Internet standard technologies to unify the management of enterprise computing environments.