Intel® Xeon® Processor C5500/C3500 Series Datasheet, Volume 1, February 2010, Order Number: 323103-001
11.0 Reliability, Availability, Serviceability (RAS)
11.1 IIO RAS Overview
This chapter describes the features that the Intel® Xeon® processor C5500/C3500 series IIO
module provides for the development of high-RAS (Reliability, Availability, Serviceability) systems. RAS refers
to three main features associated with a system's robustness. These features are summarized as follows:
• Reliability: How often errors occur, and whether the system can recover from an error condition.
• Availability: How flexibly system resources can be allocated or redistributed for system
utilization and for recovery from errors.
• Serviceability: How well the system reports and handles events related to errors, power
management, and hot plug.
IIO RAS features aim to achieve the following:
• Soft, uncorrectable error detection (Intel® QPI, PCIe) and recovery (PCIe) on links. CRC is used
for error detection (Intel® QPI, PCIe), and errors are recovered by packet retry (PCIe).
• Clearly identify non-fatal errors whenever possible and minimize fatal errors.
— Synchronous error reporting of the affected transactions by the appropriate completion
responses or data poisoning.
— Asynchronous error reporting for non-fatal and fatal errors via inband messages or outband
signals.
— Enable the software to contain and recover from errors.
— Error logging/reporting to quickly identify failures, contain and recover from errors.
• PCIe hot add/remove to provide better serviceability.
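The CRC detection and packet retry goal listed above can be illustrated with a minimal software sketch. It is not the IIO hardware mechanism: `zlib.crc32` stands in for the link CRC, and `frame`, `check`, `send_with_retry`, `flaky_channel`, and the `Outcome` names are hypothetical helpers introduced only for this illustration. The retry limit of 3 is likewise an assumed value.

```python
import zlib
from enum import Enum


class Outcome(Enum):
    CLEAN = "clean"              # delivered on the first attempt
    CORRECTED = "corrected"      # CRC error seen, recovered by retry
    UNCORRECTED = "uncorrected"  # retries exhausted; must be reported


def frame(payload: bytes) -> bytes:
    """Append a 32-bit CRC to the payload (stand-in for the link CRC)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check(packet: bytes) -> bool:
    """Return True when the trailing CRC matches the payload."""
    payload, crc = packet[:-4], packet[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == crc


def send_with_retry(payload, channel, max_retries=3):
    """Send over `channel`; resend the packet whenever the CRC check fails.

    Returns (outcome, number of retries used).
    """
    for attempt in range(max_retries + 1):
        received = channel(frame(payload))
        if check(received):
            outcome = Outcome.CLEAN if attempt == 0 else Outcome.CORRECTED
            return outcome, attempt
    return Outcome.UNCORRECTED, max_retries


def flaky_channel(failures: int):
    """Hypothetical channel that flips one bit in the first `failures` packets."""
    remaining = [failures]

    def channel(packet: bytes) -> bytes:
        if remaining[0] > 0:
            remaining[0] -= 1
            return bytes([packet[0] ^ 0x01]) + packet[1:]
        return packet

    return channel


if __name__ == "__main__":
    print(send_with_retry(b"TLP", flaky_channel(0)))
    print(send_with_retry(b"TLP", flaky_channel(2)))
    print(send_with_retry(b"TLP", flaky_channel(10)))
```

In this sketch, a transient corruption that clears within the retry limit ends as `CORRECTED`, while a persistent one ends as `UNCORRECTED`, which is the point at which an error would have to be logged and reported rather than silently recovered.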
The processor IIO RAS features can be divided into five categories. These features are summarized
below and detailed in the subsequent sections:
1. System level RAS
— Platform or system level RAS for inband and outband system management features.
— On-line hot add/remove for serviceability.
— Memory mirroring and sparing for memory protection.
2. IIO RAS
— IIO RAS features for error protection, logging, detection and reporting.
3. Intel® QuickPath Interconnect RAS
— Standard Intel® QuickPath Interconnect RAS features as specified in the Intel® QuickPath
Interconnect specification.
4. PCI Express RAS
— Standard PCIe RAS features as specified in the PCIe specification.
5. Hot Add/Remove
— PCIe hot plug/remove support.