Concept Guide

Memory Errors and Dell PowerEdge YX4X Server Memory RAS Features

Memory Configuration Required

• Two or more memory ranks per memory channel

Adaptive Double Device Data Correction (ADDDC) is an Intel platform-specific technology that allows two DRAM devices to fail sequentially before fault-avoidance is lost. ADDDC is supported only with x4 DIMM populations and requires a memory configuration of two or more memory ranks per channel (two DIMMs per channel, or a single DIMM with multiple ranks).
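The population rule above can be expressed as a quick per-channel check. This is a minimal sketch, assuming a hypothetical `Dimm` record that carries only the DRAM device width and rank count; it is not a Dell or Intel tool, and it checks one channel at a time rather than the whole system population.

```python
from dataclasses import dataclass


@dataclass
class Dimm:
    # Hypothetical description of one installed DIMM (not a Dell/Intel data structure).
    device_width: int  # DRAM device data width: 4 for x4 DIMMs, 8 for x8 DIMMs
    ranks: int         # number of ranks on this DIMM


def adddc_supported(channel: list[Dimm]) -> bool:
    """Return True if one memory channel meets the ADDDC rules described in
    the text: x4 DIMMs only, and two or more ranks in the channel."""
    if not channel:
        return False  # an empty channel has no ranks at all
    if any(d.device_width != 4 for d in channel):
        return False  # ADDDC is only supported with x4 DIMM populations
    # Two or more ranks: two single-rank DIMMs, or one multi-rank DIMM.
    return sum(d.ranks for d in channel) >= 2
```

For example, a single dual-rank x4 DIMM qualifies, a single single-rank x4 DIMM does not, and any x8 DIMM in the channel rules ADDDC out.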

ADDDC works by having the BIOS track the number of correctable errors per DRAM bank. If that number approaches a threshold the BIOS deems unsafe, ADDDC is activated: the failing DRAM bank is dynamically mapped out and a ‘buddy’ bank is mapped in to take its place. The DIMM continues to operate with SDDC coverage. From this point on, memory performance is impacted, because the memory controller must perform two reads for every read to the mapped-out cache lines.
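The activation flow above can be modeled as a small state machine. This is a toy sketch, not BIOS code: the activation threshold value, the bank names, and the caller supplying the buddy bank are all hypothetical simplifications chosen to show the two observable effects the text describes (per-bank correctable-error counting and the doubled reads after mapping out).

```python
from collections import defaultdict


class AdddcModel:
    """Toy model of ADDDC bookkeeping. The threshold and the way a buddy
    bank is chosen are hypothetical, not actual BIOS internals."""

    def __init__(self, threshold: int = 10):
        self.threshold = threshold          # hypothetical activation threshold per bank
        self.ce_counts = defaultdict(int)   # correctable errors counted per DRAM bank
        self.mapped_out = {}                # failing bank -> buddy bank mapped in

    def log_correctable_error(self, bank: str, buddy: str) -> None:
        # Count correctable errors per DRAM bank, as the BIOS does.
        self.ce_counts[bank] += 1
        # Once the count reaches the threshold, map the bank out (only once).
        if self.ce_counts[bank] >= self.threshold and bank not in self.mapped_out:
            self.mapped_out[bank] = buddy

    def reads_needed(self, bank: str) -> int:
        # Reads to mapped-out cache lines cost two reads instead of one.
        return 2 if bank in self.mapped_out else 1
```

With `threshold=3`, two correctable errors on a bank leave reads at one per access; the third maps the bank out, and every later read to it costs two, which is the performance impact the text mentions.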

FYI: ADDDC only provides fault coverage for DRAM failures that occur sequentially over time. Two parallel DRAM failures within the same memory access still result in a service outage. Additionally, ADDDC applies only to correctable errors: it helps prevent uncorrectable errors by reducing the chance that correctable errors become uncorrectable.
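The sequential-versus-parallel distinction in the note can be illustrated with a deliberately simplified model. This sketch assumes that ADDDC tolerates at most two devices failing one at a time and that any access with two or more failing devices ends in an outage; the function name and the event encoding are hypothetical, and real outcomes also depend on where the failures land.

```python
def adddc_keeps_running(failure_events: list[int]) -> bool:
    """Simplified reading of the FYI note.

    Each entry in failure_events is the number of DRAM devices that fail
    within the same memory access, in chronological order. Assumptions:
    up to two devices may fail sequentially; two or more devices failing
    in the same access cause a service outage.
    """
    failed_devices = 0
    for devices_in_access in failure_events:
        if devices_in_access >= 2:
            return False  # parallel failures in one access: not covered
        failed_devices += devices_in_access
        if failed_devices > 2:
            return False  # assumed: coverage is exhausted after two sequential failures
    return True
```

Under these assumptions, `[1, 1]` (two failures spread over time) keeps running, while `[2]` (two devices failing in the same access) does not.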

Memory Patrol Scrub

Memory Patrol Scrub Feature Support Table

DIMMs Supported

x4 DIMMs: 

x8 DIMMs: 

Memory Patrol Scrub is a Dell memory RAS feature designed to decrease the probability of a user encountering a multi-bit error by preventing soft errors from accumulating in DRAM. This in turn reduces the chance of encountering an uncorrectable error (depending on the other RAS capabilities enabled and where the multi-bit error occurs). Memory patrol scrub works by having the CPU memory controller periodically scan through DRAM and correct any correctable errors it encounters.
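Why scrubbing limits accumulation can be shown with a toy memory array. This sketch assumes a simplified ECC that corrects at most one bad bit per word (the `CORRECTABLE_LIMIT` constant is an illustrative assumption, not a property of any specific Dell or Intel ECC scheme); one pass fixes every correctable word and counts words that are already past the limit.

```python
CORRECTABLE_LIMIT = 1  # assumption: this toy ECC corrects at most one bad bit per word


def patrol_scrub(error_bits: list[int]) -> tuple[int, int]:
    """Run one patrol-scrub pass over a toy memory.

    error_bits[i] is the number of soft-error bits currently in word i.
    Correctable words are written back clean; words beyond the limit are
    left as-is and counted as latent uncorrectable errors.
    Returns (words_corrected, uncorrectable_words_found).
    """
    corrected = uncorrectable = 0
    for i, bits in enumerate(error_bits):
        if bits == 0:
            continue
        if bits <= CORRECTABLE_LIMIT:
            error_bits[i] = 0  # write corrected data back to DRAM
            corrected += 1
        else:
            uncorrectable += 1  # detected, but scrubbing cannot fix it
    return corrected, uncorrectable
```

For `mem = [0, 1, 2, 0, 1]`, one pass returns `(2, 1)` and leaves `mem == [0, 0, 2, 0, 0]`. A word that takes one soft error, is scrubbed, and then takes another stays correctable each time; without the scrub in between, the same two hits would leave it uncorrectable.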

In addition to scrubbing for correctable errors, patrol scrub can also detect latent uncorrectable errors (UCEs) in memory. These are referred to as unconsumed uncorrectable errors: uncorrectable errors detected in a non-execution path of the CPU. Each detected unconsumed UCE is logged in the System Event Log as a critical event, MEM9072: “The system memory has faced uncorrectable multi-bit memory errors in the non-execution path of a memory device at the location <location>.” If the server is running BIOS version 2.8.2 or higher, performing a cold reboot as soon as possible is highly recommended. This prevents the system from consuming the UCE and allows the server BIOS to perform self-healing at the affected memory location.
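The recommendation above reduces to two checks: is there a MEM9072 event, and is the BIOS at 2.8.2 or later. This sketch assumes the SEL has already been exported as plain-text lines (the export method and line format are not specified here), and the helper names are hypothetical. It encodes only the rule stated in the text; for older BIOS versions it simply returns False, since the text makes no recommendation for that case.

```python
def parse_version(version: str) -> tuple[int, ...]:
    # "2.8.2" -> (2, 8, 2), so "2.10.0" compares as newer than "2.8.2".
    return tuple(int(part) for part in version.split("."))


def cold_reboot_recommended(sel_lines: list[str], bios_version: str) -> bool:
    """Return True if any SEL line carries message ID MEM9072 and the BIOS
    version is 2.8.2 or higher (hypothetical helper, stated rule only)."""
    has_unconsumed_uce = any("MEM9072" in line for line in sel_lines)
    return has_unconsumed_uce and parse_version(bios_version) >= (2, 8, 2)
```

Comparing version tuples numerically avoids the string-comparison trap where `"2.10.0" < "2.8.2"` would incorrectly evaluate to True.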

Memory patrol scrubbing is enabled by default and configured to run in the background every 24 hours. Memory patrol scrub can be disabled or set to run at an accelerated schedule (every four hours)