White Paper
How PCI / PCI-Express Error Recovery Works
When an I/O driver detects a PCI bus parity error, it reports the error to the error recovery infrastructure,
which passes it on to the core platform support module (PSM).
The core PSM implements error recovery through platform-independent interfaces. It first determines
whether the error is a device error or a bus error. If it is a device error, the PSM ignores it. Otherwise,
the PSM handles the I/O error and notifies the error recovery infrastructure of the result. While
handling the error, the core PSM invokes a firmware interface (such as the Health checker daemon shown
in the figure below), which logs and clears the error.
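The decision the core PSM makes can be modeled in a few lines. The sketch below is illustrative only: `ErrorKind`, `IoError`, `FirmwareLog` and `core_psm_handle` are hypothetical names, not HP-UX kernel interfaces, and `FirmwareLog` merely stands in for the firmware interface that logs and clears the error. It shows the one branch the text describes: device errors are ignored, while bus errors are logged, cleared and reported back as handled.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class ErrorKind(Enum):
    DEVICE = auto()  # error confined to a single device
    BUS = auto()     # error on the PCI bus / I/O path


@dataclass
class IoError:
    node: str        # hypothetical I/O node label, e.g. "0/0/8"
    kind: ErrorKind


@dataclass
class FirmwareLog:
    """Hypothetical stand-in for the firmware interface that logs and clears errors."""
    entries: list = field(default_factory=list)

    def log_and_clear(self, err: IoError) -> None:
        # Record the error; in the real system firmware would also clear it.
        self.entries.append(err.node)


def core_psm_handle(err: IoError, firmware: FirmwareLog) -> bool:
    """Return True if the core PSM handled the error, False if it ignored it."""
    if err.kind is ErrorKind.DEVICE:
        # Device errors are ignored by the core PSM.
        return False
    # Bus error: log and clear it through the firmware interface; the
    # True result plays the role of notifying the error recovery infrastructure.
    firmware.log_and_clear(err)
    return True
```

For example, passing a `BUS` error returns `True` and leaves one entry in the firmware log, while a `DEVICE` error returns `False` and logs nothing.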
Figure 3 depicts the PCI I/O error recovery control flow.
Figure 3: PCI Error Recovery Flow Diagram
A system that supports error recovery handles PCI I/O errors as follows:
1. Determine whether the platform and the drivers are error recovery capable. If they are, set the
   I/O paths to SoftFail mode.
2. When a driver detects an error, it reports the error to the error recovery infrastructure, which
   forwards the report to the core PSM.
3. To handle and recover from the PCI error, the core PSM completes the following three phases:
   - Diagnose Phase
   - Synchronization (suspension) Phase
   - Release (resumption) Phase
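The three steps above can be sketched as a small control-flow model. This is a minimal sketch under stated assumptions, not the HP-UX implementation: it assumes SoftFail is decided per I/O path from the capability of the drivers on that path, and the names `softfail_paths`, `recover` and `Phase` are hypothetical. What happens to a path that is not in SoftFail mode is outside the scope of this section, so the sketch simply runs no recovery phases for it.

```python
from enum import Enum


class Phase(Enum):
    DIAGNOSE = "diagnose"
    SYNCHRONIZE = "synchronization (suspension)"
    RELEASE = "release (resumption)"


def softfail_paths(platform_capable: bool, paths: dict) -> set:
    """Step 1: return the I/O paths that can be put in SoftFail mode.

    Assumption: `paths` maps a path label to a list of booleans, one per
    driver on that path, telling whether the driver is recovery capable.
    """
    if not platform_capable:
        return set()
    return {path for path, drivers in paths.items() if all(drivers)}


def recover(path: str, softfail: set) -> list:
    """Steps 2-3: after a driver's report reaches the core PSM, run the three phases in order."""
    if path not in softfail:
        return []  # not covered by the error recovery flow
    phases_run = []
    for phase in (Phase.DIAGNOSE, Phase.SYNCHRONIZE, Phase.RELEASE):
        # Real work would happen here; the sketch only records the ordering.
        phases_run.append(phase)
    return phases_run
```

The fixed tuple of phases reflects the point of the text: the core PSM always diagnoses first, then suspends, then resumes.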
Diagnose Phase: During this phase, the driver passes the I/O node information to the core PSM. This
node forms the initial root of the error path. The primary goal of this phase is to gather additional
information and determine the actual root of the error path. On HP Integrity (Itanium-based) systems,
the errors are logged in the firmware during this phase, in SAL format on legacy platforms and in UEFI
format on the HP Integrity Superdome platform.
No attempt is made during this phase to recover the path from the error, and some hardware may be
inaccessible.
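One way to picture "determine the actual root of the error path" is an upstream walk from the node the driver reported. The sketch below is hypothetical: the section does not say how the core PSM locates the root, so it assumes each node records whether its own error registers show an error and that the root is the highest contiguous upstream node that also shows one. `IoNode`, `find_error_root` and `log_format` are illustrative names. The walk only reads state, consistent with no recovery being attempted in this phase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IoNode:
    path: str                     # hypothetical hardware path label
    parent: Optional["IoNode"]    # upstream bridge or port; None at the top
    error_logged: bool = False    # assumption: this node's error status shows an error


def find_error_root(reported: IoNode) -> IoNode:
    """Walk upstream while the parent also shows an error; read-only, no recovery."""
    root = reported  # the reported node is the initial root of the error path
    while root.parent is not None and root.parent.error_logged:
        root = root.parent
    return root


def log_format(superdome: bool) -> str:
    """Firmware log format named in the text: UEFI on Superdome, SAL on legacy platforms."""
    return "UEFI" if superdome else "SAL"
```

If a device at `0/0/8` reports an error and its upstream bridge `0/0` also shows one, the walk stops at the bridge, which becomes the actual root of the error path.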
Synchronization (suspension) Phase: During this phase, the core PSM attempts to clear logged errors