Intel® Xeon® Processor C5500/C3500 Series
Datasheet, Volume 1
February 2010
Order Number: 323103-001
Reliability, Availability, Serviceability (RAS)
11.3.1 Error Severity Classification
Errors are classified into three severities in the IIO: Correctable, Recoverable, and Fatal. This
classification separates those errors resulting in functional failures from those errors resulting in
degraded performance. In the IIO, each severity can trigger a system event according to the mapping
defined by the error severity register. This mechanism provides the software with the flexibility to
map an error to the suitable error severity. For example, one platform might choose to respond to an
uncorrectable ECC error with low priority, while another platform design may require mapping the
same error to a higher severity. At power-on, the mapping is set to the default mapping defined in
Table 129. Software or firmware can alter the default mapping after power-on.
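The remapping described above can be sketched in software as a small lookup structure. This is only an illustrative model: the two-bits-per-error layout, the `sev_map_t` type, the error identifiers, and the default values are assumptions made for the example and do not reflect the actual register format or the defaults in Table 129.

```c
#include <assert.h>
#include <stdint.h>

/* Severity levels as named in this section. */
enum severity { SEV_CORRECTABLE = 0, SEV_RECOVERABLE = 1, SEV_FATAL = 2 };

/* Hypothetical error sources; the real list is given in Table 129. */
enum error_id { ERR_LINK_CRC = 0, ERR_ECC_UNCORRECTABLE = 1, ERR_COUNT };

/* Hypothetical model of a severity mapping: 2 bits per error source. */
typedef struct { uint32_t bits; } sev_map_t;

/* Illustrative power-on defaults (assumed, not taken from Table 129). */
static sev_map_t sev_map_default(void) {
    sev_map_t m = { 0 };
    m.bits |= (uint32_t)SEV_CORRECTABLE << (2 * ERR_LINK_CRC);
    m.bits |= (uint32_t)SEV_RECOVERABLE << (2 * ERR_ECC_UNCORRECTABLE);
    return m;
}

/* Read the severity currently assigned to an error source. */
static enum severity sev_map_get(const sev_map_t *m, enum error_id err) {
    assert(err < ERR_COUNT);
    return (enum severity)((m->bits >> (2 * err)) & 0x3u);
}

/* Reassign an error source to a different severity after power-on. */
static void sev_map_set(sev_map_t *m, enum error_id err, enum severity sev) {
    assert(err < ERR_COUNT);
    assert(sev <= SEV_FATAL);
    m->bits &= ~(0x3u << (2 * err));       /* clear the old 2-bit field */
    m->bits |= (uint32_t)sev << (2 * err); /* write the new severity */
}
```

A platform that wants a stricter response would call `sev_map_set(&m, ERR_ECC_UNCORRECTABLE, SEV_FATAL)` after starting from `sev_map_default()`; other error sources keep their default severity.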
11.3.1.1 Correctable Errors (Severity 0 Error)
Hardware correctable errors include those error conditions in which the system can recover without
loss of information. Hardware corrects these errors and no software intervention is required. For
example, a Link CRC error that is corrected by Data Link Level Retry is considered a correctable error.
— Error is corrected by the hardware without software intervention. System operation may be
degraded but its functionality is not compromised.
— A correctable error may be logged and reported in an implementation-specific manner:
Upon the immediate detection of the correctable error, or
Upon the accumulation of errors reaching a threshold.
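The two reporting options for correctable errors can be illustrated with a simple counter. This is a minimal sketch under assumed names (`ce_counter_t`, `ce_record`); the datasheet leaves the actual mechanism implementation specific. A threshold of 1 models reporting on immediate detection, and a larger value models reporting once errors have accumulated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-source counter for correctable errors. */
typedef struct {
    uint32_t count;     /* correctable errors seen since the last report */
    uint32_t threshold; /* 1 = report every error; N = report every Nth */
} ce_counter_t;

/* Record one correctable error; return true when a report is due. */
static bool ce_record(ce_counter_t *c) {
    assert(c->threshold >= 1);
    c->count++;
    if (c->count >= c->threshold) {
        c->count = 0; /* start accumulating toward the next report */
        return true;
    }
    return false;
}
```

With `threshold = 3`, the first two calls to `ce_record` return false and the third returns true, after which the count starts again from zero.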
11.3.1.2 Recoverable Errors (Severity 1 Error)
Recoverable errors are software-correctable or software/hardware-uncorrectable errors that cause a
particular transaction to be unreliable but the system hardware is otherwise fully functional. Isolating
recoverable from fatal errors provides system management software with the opportunity to recover
from the error without a reset and without disturbing other transactions in progress. Devices not
associated with the transaction in error are not impacted by the error. An example of a recoverable
error is an uncorrectable ECC error that affects only the data portion of a transaction.
— Error could not be corrected by hardware and may require software intervention for
correction.
— Or the error cannot be corrected at all. Data integrity is compromised, but system operation is
not compromised.
— Requires immediate logging and reporting of the error to CPU.
— OS/Firmware takes the action to contain the error.
11.3.1.2.1 Software Correctable Errors
Software correctable errors are considered “recoverable” errors. These errors include those error
conditions where the system can recover without any loss of information. Software intervention is
required to correct these errors.
— Requires immediate logging and reporting of the error to CPU.
— Firmware or other system software layers take corrective actions.
— Data integrity is not compromised with such errors.
11.3.1.3 Fatal Errors (Severity 2 Error)
Fatal errors are uncorrectable error conditions that render the IIO hardware unreliable. For a fatal
error, in-band reporting to the CPU is still possible. A reset might be required to return to reliable
operation.
— System integrity is compromised and continued operation may not be possible.
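The responses described for the three severities can be summarized as a small dispatch function. This is a hedged sketch, not a description of the IIO hardware or of any real firmware interface: the `action` enumeration and `action_for` function are hypothetical names that only restate the behavior given in the text of this section.

```c
#include <assert.h>

/* Severity levels as named in this section. */
enum severity { SEV_CORRECTABLE = 0, SEV_RECOVERABLE = 1, SEV_FATAL = 2 };

/* Hypothetical summary of the response each severity calls for. */
enum action {
    ACT_LOG_OPTIONAL,          /* hardware already corrected the error */
    ACT_REPORT_AND_CONTAIN,    /* report to CPU; OS/firmware contains it */
    ACT_REPORT_AND_MAYBE_RESET /* report in-band; a reset might be needed */
};

/* Map a severity to the response outlined in the text. */
static enum action action_for(enum severity sev) {
    switch (sev) {
    case SEV_CORRECTABLE: return ACT_LOG_OPTIONAL;
    case SEV_RECOVERABLE: return ACT_REPORT_AND_CONTAIN;
    case SEV_FATAL:       return ACT_REPORT_AND_MAYBE_RESET;
    }
    /* Unknown values are treated as the most severe case (an assumption). */
    assert(!"unknown severity");
    return ACT_REPORT_AND_MAYBE_RESET;
}
```

Treating an unrecognized severity as the most severe case is a conservative design choice for the example, not something the datasheet specifies.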