Concept Guide

14 Memory Errors and Dell PowerEdge YX4X Server Memory RAS Features
Figure 7 - PPR for a row in a bank group of a 4Gb x4 device
PPR is always available on PowerEdge server platforms that support it and, if BIOS deems it necessary,
executes automatically after a system cold reboot. For PPR to execute successfully, users should not
swap or replace DIMMs between boots when receiving memory error event messages, unless instructed
to do so by Dell technical support personnel.
In addition to PPR, the PowerEdge server memory self-healing process also includes memory re-training.
Memory training is the process by which the CPU initializes, calibrates, and tunes the link between itself
and the memory modules. While performing full memory training can help to ensure that the memory
bus operates at the highest level of signaling integrity, it is also a time-consuming process that directly
impacts server boot times. Therefore, PowerEdge servers only perform this step when necessary, such
as during the memory self-healing process.
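
The boot-time decision described above can be sketched as a small planning function. This is an illustrative
model only, not Dell BIOS code: the BootContext fields, the step names, and the assumption that full memory
training runs in the same cold boot as PPR are all hypothetical and are chosen just to mirror the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BootContext:
    """Hypothetical snapshot of what BIOS knows at the start of a boot."""
    cold_boot: bool         # True for a cold reboot, False for a warm reset
    ppr_supported: bool     # platform supports Post Package Repair
    repair_requested: bool  # BIOS deemed a repair necessary after earlier errors


def plan_self_healing(ctx: BootContext) -> List[str]:
    """Return the extra self-healing steps for this boot, in order.

    An empty list means a normal boot with no PPR and no full re-training,
    which keeps boot time short.
    """
    if ctx.cold_boot and ctx.ppr_supported and ctx.repair_requested:
        # Assumption: re-training accompanies PPR as part of self-healing.
        return ["post_package_repair", "full_memory_training"]
    return []


if __name__ == "__main__":
    print(plan_self_healing(BootContext(True, True, True)))
    print(plan_self_healing(BootContext(False, True, True)))
```

In this sketch a warm reset never triggers self-healing, which reflects the statement that PPR executes only
after a cold reboot; real platform behavior may include conditions that are not modeled here.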
Machine Check Architecture Recovery
MCA Recovery Feature Support Table
DIMMs Supported
x4 DIMMs:
x8 DIMMs:
Machine Check Architecture Recovery, or MCA Recovery, is an advanced RAS feature that, when used in
conjunction with operating systems that support it, can prevent some uncorrectable memory errors
from generating an unexpected system-wide outage event. MCA Recovery is not a memory-specific RAS
feature. Its capabilities extend to various forms of CPU data consumption including data from I/O. The
scope of MCA Recovery discussed here will be limited to data consumption from system memory.
Essentially, MCA Recovery is a CPU capability that allows BIOS to signal consumption of an uncorrectable
error to the operating system through a Machine Check Exception. This gives the OS an opportunity to
perform memory error containment. The outcome depends entirely on whether the
impacted memory is associated with kernel space or user/application/VM space:
If the impacted data was in kernel memory, then the OS will kernel panic.