User's Manual

A hardware event monitor watches the hardware for unusual behavior (known as an event)
and sends a message to EMS, which notifies the system administrator and suggests how to
correct the problem.
A disk monitor watches all disks attached to the A7173A adapter.
For more information about EMS and other online diagnostic tools, see the documents at:
http://docs.hp.com/en/diag.html#2%20Online%20Diagnostics

HP Offline Diagnostics Environment (ODE)

The A7173A adapter supports HP’s Offline Diagnostics Environment (ODE). ODE is an offline
support tools platform for troubleshooting systems that are running without an operating system
or systems that cannot be tested using online tools. The offline environment is also useful for
testing that needs to be done before a system is booted.
ODE provides a user-friendly interface for diagnostics and utilities that have been developed to
run in this environment.
The Offline Diagnostics Environment has a distributed architecture consisting of several modules.
Each module has a specific function and uses well-defined protocols to communicate with the
other modules.
You can use ODE with either a command-line interface or a menu-driven interface. The
command-line interface enables you to select specific tests and utilities to perform on a
specific hardware module. The menu-driven interface enables you to specify the hardware
module to be tested, then automatically selects and performs the necessary tests.
The Offline Diagnostics Environment consists of:
• A Test Controller, which acts as the user interface and launches the execution of the Test
  Modules.
• Test Modules, which consist of diagnostic or utility programs designed to execute within
  ODE. These modules exercise or diagnose user-specified hardware units.
• A System Library (SysLib), which consists of a set of common routines for use by both the
  Test Controller and the Test Modules. These routines perform I/O, string parsing, and system
  control.
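The relationship among the three ODE components can be illustrated with a small conceptual model. This is only a sketch and is not ODE code: the class names ToySysLib, ToyTestModule, and ToyTestController, the "diskcheck" module, and the "<module name> <hardware unit>" command form are all hypothetical and were chosen for illustration. The sketch shows a controller that parses a command with shared library routines, launches the selected module against a user-specified unit, and reports the result through the same library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


class ToySysLib:
    """Hypothetical stand-in for the shared routines (I/O and string parsing)."""

    def __init__(self) -> None:
        self.output: List[str] = []

    def write(self, text: str) -> None:
        # I/O routine: this model only records output lines in memory.
        self.output.append(text)

    @staticmethod
    def parse(line: str) -> List[str]:
        # String-parsing routine: split a command line into tokens.
        return line.split()


@dataclass
class ToyTestModule:
    """Hypothetical diagnostic or utility program launched by the controller."""
    name: str
    run: Callable[[ToySysLib, str], bool]  # (library, hardware unit) -> passed?


class ToyTestController:
    """Hypothetical user-facing component that launches test modules."""

    def __init__(self, syslib: ToySysLib) -> None:
        self.syslib = syslib
        self.modules: Dict[str, ToyTestModule] = {}

    def register(self, module: ToyTestModule) -> None:
        self.modules[module.name] = module

    def execute(self, command_line: str) -> bool:
        # Assumed command form for this sketch: "<module name> <hardware unit>".
        tokens = self.syslib.parse(command_line)
        if len(tokens) != 2 or tokens[0] not in self.modules:
            self.syslib.write("unknown command: " + command_line)
            return False
        name, unit = tokens
        passed = self.modules[name].run(self.syslib, unit)
        self.syslib.write(f"{name} on {unit}: {'PASS' if passed else 'FAIL'}")
        return passed


def disk_check(lib: ToySysLib, unit: str) -> bool:
    # Placeholder test body: a real module would exercise the hardware unit.
    lib.write(f"checking {unit}")
    return True


if __name__ == "__main__":
    lib = ToySysLib()
    controller = ToyTestController(lib)
    controller.register(ToyTestModule("diskcheck", disk_check))
    controller.execute("diskcheck disk0")
    print("\n".join(lib.output))
```

In this model both the controller and the module call the same library object, which mirrors the description of SysLib as a set of routines common to the Test Controller and the Test Modules.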
For more information about ODE, see the documents at:
http://docs.hp.com/en/diag.html#3%20Offline%20Diagnostics

PCI Error Recovery

The PCI Error Recovery feature provides the ability to detect, isolate, and automatically recover
from a PCI error, avoiding a system crash. PCI Error Recovery is included with the HP-UX 11i
v3 operating system, and it is enabled by default.
NOTE: PCI Error Recovery is not supported on all platforms. To determine if PCI Error Recovery
is supported on your system, see the PCI Error Recovery Support Matrix:
http://www.docs.hp.com/en/ha.html#PCI%20Error%20Recovery
With the PCI Error Recovery feature enabled, if an error occurs on a PCI bus containing an I/O
card that supports PCI Error Recovery, the following events occur:
1. The PCI bus is quarantined to isolate the system from further I/O and prevent the error from
   damaging the system.
2. The PCI Error Recovery feature attempts to recover from the error and reinitialize the
   bus so I/O can resume.
If the PCI Error Recovery feature is disabled and an error occurs on a PCI bus, a Machine Check
Abort (MCA) or a High Priority Machine Check (HPMC) occurs, and the system crashes.
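The two cases above can be summarized as a small decision flow. The following is a conceptual sketch only, not HP-UX code: the function simulate_pci_error, the Outcome values, and the reinit_succeeds parameter are hypothetical names introduced for illustration. The sketch assumes the I/O card supports PCI Error Recovery. This section does not describe what happens when a recovery attempt fails, so the model reports that case as not described rather than guessing.

```python
from enum import Enum
from typing import List, Tuple


class Outcome(Enum):
    IO_RESUMED = "I/O resumed"
    SYSTEM_CRASH = "system crash"
    NOT_DESCRIBED = "not described in this section"


def simulate_pci_error(recovery_enabled: bool,
                       reinit_succeeds: bool = True) -> Tuple[List[str], Outcome]:
    """Model the documented event sequence for an error on a PCI bus.

    Assumes the card on the bus supports PCI Error Recovery.
    """
    steps: List[str] = ["error detected on PCI bus"]

    if not recovery_enabled:
        # Feature disabled: a machine check occurs and the system crashes.
        steps.append("MCA or HPMC raised")
        return steps, Outcome.SYSTEM_CRASH

    # Feature enabled: isolate the bus first, then attempt recovery.
    steps.append("bus quarantined")
    steps.append("recovery attempted")

    if reinit_succeeds:
        steps.append("bus reinitialized")
        return steps, Outcome.IO_RESUMED

    # A failed recovery attempt is outside what this section documents.
    return steps, Outcome.NOT_DESCRIBED


if __name__ == "__main__":
    for enabled in (True, False):
        steps, outcome = simulate_pci_error(recovery_enabled=enabled)
        print(f"enabled={enabled}: {' -> '.join(steps)} => {outcome.value}")
```

The quarantine step comes before the recovery attempt in the model because the text lists isolation of the bus as the first event.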