Chapter 4. Continuous availability and manageability 173
Figure 4-9 shows a schematic of a fault isolation register implementation.
Figure 4-9 Schematic of FIR implementation
Fault isolation
The service processor interprets error data that is captured by the FFDC checkers (saved in
the FIRs or other firmware-related data capture methods) to determine the root cause of the
error event.
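
As a rough illustration of how captured FIR data might be interpreted, the following minimal sketch treats a FIR as a bit field in which each bit belongs to one error checker. The bit assignments, the checker names, and the decode_fir function are hypothetical and are not taken from any IBM firmware interface.

```python
from typing import List

# Hypothetical mapping from FIR bit position to the checker that sets it.
# Real FIR layouts are hardware-specific; these names are assumptions.
CHECKERS = {
    0: "L1 cache checker",
    1: "L2/L3 cache checker",
    2: "memory checker",
    3: "disk checker",
}


def decode_fir(fir_value: int) -> List[str]:
    """Return the names of the checkers whose bits are set in fir_value."""
    return [
        name
        for bit, name in sorted(CHECKERS.items())
        if fir_value & (1 << bit)
    ]
```

In this sketch, the set of latched bits plays the role of the fingerprint of the captured error: the same FIR value always decodes to the same list of checkers.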
Root cause analysis might indicate that the event is recoverable, meaning that a service
action point or need for repair has not been reached. Alternatively, it might indicate that a
service action point has been reached because the event met or exceeded a predetermined
threshold or was unrecoverable. Based on the isolation analysis, the threshold counts for
recoverable errors can be incremented. No specific service action is necessary when the
event is recoverable.
When the event requires a service action, the additional information that is needed to
service the fault is collected. For unrecoverable errors or for recoverable events that meet or
exceed their service threshold, meaning that a service action point has been reached, a
request for service is initiated through an error logging component.
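
The decision flow described above can be sketched as follows. This is a simplified model under stated assumptions, not the service processor's actual logic: the FaultIsolator class, its field names, the error identifiers, and the default threshold of 1 are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FaultIsolator:
    # Hypothetical per-error service thresholds, keyed by error identifier.
    thresholds: dict
    counts: Counter = field(default_factory=Counter)
    service_requests: list = field(default_factory=list)

    def handle(self, error_id: str, recoverable: bool) -> str:
        """Classify one error event and request service when needed."""
        if recoverable:
            # Recoverable events only increment their threshold count.
            self.counts[error_id] += 1
            # Assumed default threshold of 1 for unknown error identifiers.
            if self.counts[error_id] < self.thresholds.get(error_id, 1):
                return "no_action"
        # Unrecoverable, or the count met or exceeded the threshold:
        # a service action point has been reached, so log a request.
        self.service_requests.append(error_id)
        return "service_requested"
```

For example, with a threshold of 3 for a hypothetical "mem_ce" error, the first two recoverable events return "no_action" and the third returns "service_requested", while any unrecoverable event requests service immediately.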
4.3.2 Diagnosing
Using the extensive network of advanced and complementary error detection logic that is built
directly into hardware, firmware, and operating systems, the IBM Power Systems servers can
perform considerable self-diagnosis.
[Figure 4-9 diagram labels: CPU, L1, L2 / L3, Memory, Disk, Non-volatile RAM, Service Processor, Error checkers, Fault isolation register (FIR): unique fingerprint of each captured error, Log error]