
Chapter 4. Continuous availability and manageability 173

Figure 4-9 shows a schematic of a fault isolation register implementation.

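Figure 4-9 Schematic of FIR implementation (memory, CPU, L2/L3 cache, disk, and non-volatile RAM feed error checkers; the fault isolation register (FIR) holds a unique fingerprint of each captured error; the service processor and the "Log error" step complete the path)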
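To make the FIR idea concrete, the following minimal Python sketch models a fault isolation register as a bit field in which each error checker sets its own bit, so the combination of set bits acts as a fingerprint of the captured error. The bit assignments and the `Checker` and `FaultIsolationRegister` names are hypothetical and chosen only for illustration; real FIR layouts are hardware specific and are not described in this section.

```python
from enum import IntFlag


# Hypothetical checker bit assignments, for illustration only.
class Checker(IntFlag):
    MEMORY = 1 << 0
    CPU = 1 << 1
    L2_L3_CACHE = 1 << 2
    DISK = 1 << 3
    NVRAM = 1 << 4


class FaultIsolationRegister:
    """Toy model of a FIR: each error checker sets its own bit when it fires."""

    def __init__(self) -> None:
        self.value = Checker(0)

    def report(self, checker: Checker) -> None:
        # Bits are sticky: once set, they stay set until the register is cleared.
        self.value |= checker

    def fingerprint(self) -> int:
        # The combination of set bits identifies the captured error pattern.
        return int(self.value)

    def clear(self) -> None:
        self.value = Checker(0)


fir = FaultIsolationRegister()
fir.report(Checker.L2_L3_CACHE)
fir.report(Checker.CPU)
print(bin(fir.fingerprint()))  # 0b110
```

Keeping the bits sticky means the snapshot still reflects the first failure when the service processor reads it later.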

Fault isolation

The service processor interprets the error data that is captured by the first-failure data capture (FFDC) checkers (saved in the FIRs or by other firmware-related data capture methods) to determine the root cause of the error event.
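The interpretation step can be pictured as mapping a FIR snapshot to a suspected root cause. The sketch below assumes a hypothetical, priority-ordered rule table: when several bits are set, the first matching rule is treated as the primary cause and the others as secondary symptoms (for example, a cache error that also trips a CPU checker). The bit positions, component names, and priority order are illustrative assumptions, not the actual service processor firmware rules.

```python
from typing import Optional

# Hypothetical FIR bit positions and the component each one points to,
# listed in an assumed isolation priority order.
ISOLATION_RULES = [
    (4, "non-volatile RAM"),
    (0, "memory"),
    (2, "L2/L3 cache"),
    (1, "CPU core"),
    (3, "disk"),
]


def isolate_root_cause(fir_snapshot: int) -> Optional[str]:
    """Return the suspected root cause for a captured FIR value, or None."""
    for bit, component in ISOLATION_RULES:
        if fir_snapshot & (1 << bit):
            return component
    return None


# A cache error that also tripped the CPU checker (bits 1 and 2 set).
print(isolate_root_cause(0b00110))  # L2/L3 cache
```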

Root cause analysis might indicate that the event is recoverable, meaning that neither a service action point nor a need for repair has been reached. Alternatively, it could indicate that a service action point has been reached because the event exceeded a predetermined threshold or was unrecoverable. Based on the isolation analysis, recoverable error threshold counts can be incremented. No specific service action is necessary when the event is recoverable.
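The threshold logic described above can be sketched as a per-component counter. In this minimal Python example, the `ThresholdTracker` class and the threshold value of 3 are hypothetical; the actual predetermined thresholds are not given in this section. An unrecoverable event reaches the service action point immediately, while a recoverable event increments its count and reaches that point only when the count meets or exceeds the threshold.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ThresholdTracker:
    """Counts recoverable events per component against a hypothetical threshold."""

    threshold: int = 3
    counts: Counter = field(default_factory=Counter)

    def classify(self, component: str, recoverable: bool) -> str:
        if not recoverable:
            # Unrecoverable errors reach the service action point immediately.
            return "service action"
        # Recoverable: increment the threshold count for this component.
        self.counts[component] += 1
        if self.counts[component] >= self.threshold:
            return "service action"
        # Below threshold: no specific service action is needed.
        return "recovered"


tracker = ThresholdTracker(threshold=3)
results = [tracker.classify("L2/L3 cache", recoverable=True) for _ in range(3)]
print(results)  # ['recovered', 'recovered', 'service action']
```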

When the event requires a service action, the additional information that is required to service the fault is collected. For unrecoverable errors, or for recoverable events that meet or exceed their service threshold (that is, a service action point has been reached), a request for service is initiated through an error logging component.
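The hand-off to the error logging component might look like the following sketch. The `ErrorLog`, `ErrorLogEntry`, and `handle_event` names, the contents of `service_data`, and the request string format are all hypothetical; the sketch only illustrates the decision described above: skip recoverable events below their threshold, otherwise collect the extra service data and initiate a service request through the log.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ErrorLogEntry:
    component: str
    fir_fingerprint: int
    unrecoverable: bool
    # Additional information collected to service the fault (hypothetical fields).
    service_data: dict


@dataclass
class ErrorLog:
    """Hypothetical error logging component that initiates service requests."""

    entries: List[ErrorLogEntry] = field(default_factory=list)

    def request_service(self, entry: ErrorLogEntry) -> str:
        self.entries.append(entry)
        return f"service request #{len(self.entries)}: {entry.component}"


def handle_event(log: ErrorLog, component: str, fingerprint: int,
                 unrecoverable: bool, count: int, threshold: int) -> Optional[str]:
    # Recoverable events below their threshold need no service action.
    if not unrecoverable and count < threshold:
        return None
    # Service action point reached: collect extra data, then log and request service.
    entry = ErrorLogEntry(component, fingerprint, unrecoverable,
                          service_data={"count": count, "threshold": threshold})
    return log.request_service(entry)


log = ErrorLog()
print(handle_event(log, "CPU core", 0b010, unrecoverable=False, count=1, threshold=3))  # None
print(handle_event(log, "L2/L3 cache", 0b110, unrecoverable=False, count=3, threshold=3))
# service request #1: L2/L3 cache
```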

4.3.2 Diagnosing

By using the extensive network of advanced and complementary error detection logic that is built directly into the hardware, firmware, and operating systems, IBM Power Systems servers can perform considerable self-diagnosis.
