Concept Guide

8 Memory Errors and Dell PowerEdge YX4X Server Memory RAS Features
Memory Configuration Required: Two or more memory ranks per memory channel
Adaptive Double Device Data Correction (ADDDC) is an Intel platform-specific technology that allows
two DRAM devices to fail sequentially before fault avoidance is lost. ADDDC is supported only with x4
DIMM populations and requires a memory configuration of two or more memory ranks per channel (two
DIMMs per channel or a single DIMM with multiple ranks).
ADDDC works by having the BIOS track the number of correctable errors per DRAM bank. If the number
approaches a threshold deemed unsafe by BIOS, then ADDDC is activated and the failing DRAM bank is
dynamically mapped out while a ‘buddy’ bank is mapped in to take its place. The DIMM continues to
operate with SDDC coverage. From this point on, memory performance is reduced because the memory
controller must perform two reads for every read to the mapped-out cache lines.
FYI: ADDDC provides fault coverage only for sequential DRAM failures over
time. Two parallel DRAM failures within the same memory access still result in a
service outage. Additionally, ADDDC applies only to correctable errors; it helps
prevent an uncorrectable error from occurring by reducing the chance that
correctable errors become uncorrectable.
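The tracking and map-out behavior described above can be illustrated with a small model. This is only a sketch under stated assumptions: the class name, the bank identifiers, and the threshold value are hypothetical, and the actual threshold and buddy selection are decided by BIOS and are not specified in this guide.

```python
from dataclasses import dataclass, field

# Hypothetical value for illustration; the real threshold is chosen by BIOS.
CE_THRESHOLD = 3


@dataclass
class AdddcChannel:
    """Toy model of per-bank correctable error tracking on one memory channel."""
    threshold: int = CE_THRESHOLD
    ce_counts: dict = field(default_factory=dict)  # bank id -> correctable error count
    buddy_map: dict = field(default_factory=dict)  # mapped-out bank -> buddy bank

    def record_correctable_error(self, bank, buddy):
        """Count one correctable error; return True if this call maps the bank out."""
        if bank in self.buddy_map:
            return False  # bank is already mapped out
        self.ce_counts[bank] = self.ce_counts.get(bank, 0) + 1
        if self.ce_counts[bank] >= self.threshold:
            self.buddy_map[bank] = buddy  # buddy bank takes the failing bank's place
            return True
        return False

    def reads_required(self, bank):
        """Mapped-out banks cost two reads per access, healthy banks cost one."""
        return 2 if bank in self.buddy_map else 1
```

In this model, the third correctable error on a bank triggers the map-out, after which reads_required reports 2 for that bank and 1 for every other bank, mirroring the performance impact noted above.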
Memory Patrol Scrub
Memory Patrol Scrub Feature Support Table
DIMMs Supported
x4 DIMMs:
x8 DIMMs:
Memory Patrol Scrub is a Dell memory RAS feature designed to decrease the probability of a user
encountering a multi-bit error by removing the accumulation of soft errors in DRAM. This in turn
reduces the chance of encountering an uncorrectable error (depending on other RAS capabilities
enabled and where the multi-bit error occurs). Memory patrol scrub works by having the CPU memory
controller periodically scan through DRAM and correct any correctable errors that it encounters.
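The scan-and-correct loop can be sketched with a toy memory model. As an assumption made purely for illustration, each cell below stores three copies of a value and a majority vote stands in for real ECC; actual DRAM ECC works differently, and the function name scrub_pass is hypothetical.

```python
def scrub_pass(memory):
    """Scan every cell once; repair correctable cells, report uncorrectable ones.

    memory is a list of cells, each a list of three redundant copies.
    A cell with one bad copy is correctable; a cell where all three
    copies disagree is treated as uncorrectable and left untouched.
    """
    corrected, uncorrectable = [], []
    for addr, (a, b, c) in enumerate(memory):
        if a == b == c:
            continue  # clean cell, nothing to do
        if a == b or a == c:
            good = a
        elif b == c:
            good = b
        else:
            uncorrectable.append(addr)  # no majority: cannot be repaired
            continue
        memory[addr] = [good, good, good]  # write the corrected value back
        corrected.append(addr)
    return corrected, uncorrectable
```

Running a pass regularly repairs single-copy errors before a second error can land in the same cell, which is the accumulation the feature is meant to prevent.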
In addition to scrubbing for correctable errors, patrol scrub can also detect latent uncorrectable errors (UCEs) in
memory. These UCEs are referred to as unconsumed uncorrectable errors, or uncorrectable errors
detected in a non-execution path of the CPU. Detection of an unconsumed UCE is logged in the
System Event Log as a critical event, MEM9072: “The system memory has faced uncorrectable multi-bit
memory errors in the non-execution path of a memory device at the location <location>.” If the server is
running BIOS version 2.8.2 or higher, it is highly recommended to perform a cold reboot as soon as
possible. This prevents the system from consuming the UCE and allows the server BIOS to perform
self-healing at the affected memory location.
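The reboot guidance above reduces to a simple check, sketched below. It assumes the System Event Log has been exported as plain text lines that include the message ID, and that the BIOS version is a dotted numeric string; both the export format and the function names are assumptions for illustration, not a Dell tool or API.

```python
def parse_version(text):
    """Turn a dotted version string such as '2.8.2' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def cold_reboot_recommended(sel_lines, bios_version):
    """Return True when a MEM9072 event is present and BIOS is 2.8.2 or higher."""
    has_unconsumed_uce = any("MEM9072" in line for line in sel_lines)
    return has_unconsumed_uce and parse_version(bios_version) >= (2, 8, 2)
```

Comparing versions as integer tuples rather than strings keeps a version such as 2.10.0 ordered correctly above 2.8.2.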
Memory patrol scrubbing is enabled by default and configured to run in the background every 24
hours. Memory patrol scrub can be disabled or set to run on an accelerated schedule (every four hours)