White Paper: The Intel® Itanium® Processor 9300 Series
Appendix B: Table of RAS Features
RAS Capability | Intel® Itanium® Processor 9100 Series | Intel® Itanium® Processor 9300 Series
Processor Core
Extensive Error Protection/Correction
On-core structures and system interface:
ECC, parity and/or SER hardened latches and registers are used to avoid, detect
and correct errors.
On-cache structures:
ECC is used to detect and correct errors.
Intel® Cache Safe Technology is used to disable failed cache lines for improved availability.
Enhanced
9100 Series: L3 only; 9300 Series: L2, L3 and Directory
Processor Socket
Advanced Error Protection/Correction on the Processor Links
Dynamic Link Rerouting: Sustains uninterrupted operation if an Intel® QuickPath
Interconnect or Intel® Scalable Memory Interconnect (Intel® SMI) link physically fails.
Enhanced
Processor Onlining/Ofining: A processor can be functionally enabled or disabled without downtime
to adjust available resources or to map out a failed component.
9100 Series: OEM-based(c); 9300 Series: Native support
Processor Hot Plug: A processor can be physically added, removed or replaced without downtime
for system upgrades or to replace a problematic component.
9100 Series: OEM-based(c); 9300 Series: Native support
Intel® Virtualization Technology – processor cores: Hardware-based virtualization support improves
ability to implement transparent workload migration to optimize resource utilization and simplify failover.
9100 Series: Intel VT-i; 9300 Series: Intel VT-i2
Memory Subsystem
Memory Error Correction mechanisms include:
Memory ECC Support: Automatically detects and corrects all single-bit errors and most double-bit
stored errors (uncorrectable errors are detected and reported). Errors in up to eight consecutive bits
can be corrected.
Single Device Data Correction: Automatically corrects multi-bit errors on a single DRAM device; can
map out a failed device and continue correcting single-bit errors.
Dual Device Data Correction: Automatically corrects multi-bit errors on two DRAM devices; can map
out two failed DRAM devices and continue correcting single-bit errors.
9100 Series: OEM-based(c) or Intel chipset required; 9300 Series: Native support
9100 Series: OEM-based(c); 9300 Series: Native support
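The single-bit-correct, double-bit-detect behavior described for Memory ECC Support can be illustrated with a toy extended Hamming code. This is only a sketch: it protects 4 data bits with 4 check bits, whereas real memory ECC uses much wider codewords, and the names `encode` and `decode` are hypothetical helpers, not part of any Intel interface.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    # Positions 1..7 use the classic Hamming(7,4) layout; position 0 is overall parity.
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos          # XOR of the positions of all set bits
    odd_parity = sum(code) % 2       # 1 means an odd number of bits flipped
    if syndrome == 0 and odd_parity == 0:
        status = "ok"
    elif odd_parity == 1:
        # Single-bit error: the syndrome names the flipped position (0 = parity bit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even parity: two bits flipped, detect and report only.
        return "uncorrectable", None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble
```

Flipping any one bit of a codeword yields `'corrected'` with the original data; flipping any two yields `'uncorrectable'`, which mirrors the "detected and reported" case in the table.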
Memory Channel Protection: Includes three levels of protection: Cyclic Redundancy Check (CRC) to detect
and repair transient errors, a physical layer reset for persistent errors, and lane failover if the reset fails.
9100 Series: OEM-based(c); 9300 Series: Native support
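The three-level escalation described for Memory Channel Protection can be sketched as follows. Everything here is an assumption made for illustration: `ToyChannel` is a hypothetical link model, the retry count is arbitrary, "repair" is modeled as a simple retransmission, and `zlib.crc32` stands in for whatever CRC the hardware actually uses.

```python
import zlib


class ToyChannel:
    """Hypothetical link: may corrupt a few transfers or have a stuck lane."""

    def __init__(self, transient_faults=0, stuck_lane=False, reset_fixes=True):
        self.transient_faults = transient_faults
        self.stuck_lane = stuck_lane
        self.reset_fixes = reset_fixes

    def transfer(self, payload):
        corrupted = bytes([payload[0] ^ 0x01]) + payload[1:]
        if self.stuck_lane:
            return corrupted
        if self.transient_faults > 0:
            self.transient_faults -= 1
            return corrupted
        return payload

    def reset(self):
        if self.reset_fixes:
            self.stuck_lane = False

    def fail_over(self):
        self.stuck_lane = False      # traffic moves to a spare lane


def protected_transfer(channel, payload, max_retries=2):
    """Send a non-empty payload; return the list of recovery actions taken."""
    expected = zlib.crc32(payload)
    actions = []

    def attempt():
        return zlib.crc32(channel.transfer(payload)) == expected

    if attempt():
        return actions
    for _ in range(max_retries):     # level 1: CRC mismatch, retry the transfer
        actions.append("retry")
        if attempt():
            return actions
    actions.append("reset")          # level 2: error persists, reset the link
    channel.reset()
    if attempt():
        return actions
    actions.append("failover")       # level 3: reset did not help, switch lanes
    channel.fail_over()
    if not attempt():
        raise RuntimeError("link unrecoverable")
    return actions
```

A transient fault is absorbed by a retry, while a stuck lane escalates to a reset and, if the reset does not clear it, to lane failover.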
Memory Scrubbing: Memory is monitored to correct errors, which prevents correctable errors from
accumulating and becoming uncorrectable. Performed automatically and periodically (Patrol) and also at
the request of the OS (Demand).
9100 Series: OEM-based(c); 9300 Series: Native support
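Why scrubbing keeps correctable errors from accumulating can be seen in a small simulation. This is a simplified sketch under stated assumptions: each memory word is modeled only by its count of flipped bits, one flipped bit is treated as correctable and two or more as uncorrectable, and `patrol_scrub` and `simulate` are hypothetical names.

```python
def patrol_scrub(error_bits):
    """One patrol pass: clear every word that holds exactly one flipped bit."""
    for addr, flips in enumerate(error_bits):
        if flips == 1:
            # ECC corrects the word on read; writing it back clears the stored error.
            error_bits[addr] = 0


def simulate(fault_schedule, words, scrub_every=None):
    """Apply one bit flip per tick; return addresses that end up uncorrectable."""
    error_bits = [0] * words
    for tick, addr in enumerate(fault_schedule, start=1):
        error_bits[addr] += 1
        if scrub_every and tick % scrub_every == 0:
            patrol_scrub(error_bits)
    return [addr for addr, flips in enumerate(error_bits) if flips > 1]
```

With the fault schedule `[3, 5, 3]`, word 3 collects two flipped bits and becomes uncorrectable when no scrubbing runs; scrubbing after every tick clears the first flip before the second arrives.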
Memory DIMM sparing: Firmware copies data from a failing DIMM to a spare DIMM on the same memory
channel, and maps out the failed component to enable uninterrupted operation.
Memory Migration: Firmware copies data from a failing DIMM and migrates it to a DIMM
on another memory controller of the same or another processor.
9100 Series: OEM-based(c); 9300 Series: Native support
Memory Mirroring: A backup copy of main memory can be maintained for very high-reliability error
correction (if used, requires twice the memory).
9100 Series: OEM-based(c); 9300 Series: Native support
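The idea behind Memory Mirroring, and why it needs twice the memory, can be sketched with a toy model. The class `MirroredMemory` and its `poisoned` set are hypothetical; the repair-on-read step is an assumption for illustration and is not taken from the table.

```python
class MirroredMemory:
    """Toy model: every write goes to two copies; reads fall back to the mirror."""

    def __init__(self, size):
        self.primary = [0] * size
        self.mirror = [0] * size     # the backup copy doubles the memory needed
        self.poisoned = set()        # (copy_name, addr) pairs flagged uncorrectable

    def write(self, addr, value):
        self.primary[addr] = value
        self.mirror[addr] = value
        self.poisoned.discard(("primary", addr))
        self.poisoned.discard(("mirror", addr))

    def read(self, addr):
        if ("primary", addr) not in self.poisoned:
            return self.primary[addr]
        if ("mirror", addr) not in self.poisoned:
            value = self.mirror[addr]
            self.primary[addr] = value           # assumed: repair the bad copy
            self.poisoned.discard(("primary", addr))
            return value
        raise MemoryError(f"both copies of address {addr} are uncorrectable")
```

A read that hits an uncorrectable error in the primary copy is served from the mirror, so the failure is invisible to the caller unless both copies are bad.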
Memory Onlining/Ofining: One or more DIMMS or memory riser cards can be functionally enabled or
disabled without downtime to adjust available resources or to map out a failed component.
9100 Series: OEM-based(c); 9300 Series: Native support
Memory Hot Plug Support: Memory components (DIMMs) can be physically added, removed or replaced
without downtime. Includes OS-visible and OS-transparent capabilities.
9100 Series: OEM-based(c); 9300 Series: Native support(d)
Memory Thermal Protection includes:
Closed Loop Thermal Throttling: Memory channel activity can be reduced (or fan speed increased)
when the temperature of a DIMM exceeds a preset level.
Open Loop Thermal Throttling: Memory channel activity can be reduced (or fan speed increased)
when the number of memory commands per DIMM exceeds a configurable limit over a configurable
time interval.
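The difference between the two throttling modes can be sketched as follows: closed loop reacts to a measured temperature, while open loop reacts to command counts alone. The names `closed_loop_throttle` and `OpenLoopThrottle` and all parameter values are hypothetical, and the sketch models only the "reduce activity" response, not fan speed.

```python
from collections import deque


def closed_loop_throttle(dimm_temp_c, limit_c):
    """Closed loop: throttle when the measured DIMM temperature exceeds the preset level."""
    return dimm_temp_c > limit_c


class OpenLoopThrottle:
    """Open loop: cap commands per DIMM over a sliding time window, with no sensor input."""

    def __init__(self, max_commands, window):
        self.max_commands = max_commands   # configurable limit
        self.window = window               # configurable time interval
        self.times = deque()

    def allow(self, now):
        # Forget commands that have aged out of the window.
        while self.times and now - self.times[0] >= self.window:
            self.times.popleft()
        if len(self.times) >= self.max_commands:
            return False                   # limit reached: hold the command back
        self.times.append(now)
        return True
```

With a limit of 2 commands per 10 time units, a third command inside the window is held back and is accepted again once the oldest command ages out.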