Release Notes

Memory Errors and Dell PowerEdge YX4X Server Memory RAS Features

a memory module replacement. However, some server competitors go so far as to say that an indefinite number of correctable errors is acceptable, a belief that is not shared by Dell Engineering. Instead, PowerEdge server firmware intelligently monitors the health of memory and recommends self-healing action or module replacement based on a variety of factors, including DIMM capacity, rates of correctable errors, and effectiveness of available self-healing. The intent behind Dell’s proprietary predictive failure algorithms is to proactively identify the DIMMs that are most likely to continue to degrade and potentially generate uncorrectable errors.

o Uncorrectable Errors (UCEs)

o Uncorrectable errors are errors that can be detected but cannot be corrected by the server platform. They are the result of multi-bit errors and may be caused by any combination of soft and hard errors (for example, soft-soft, soft-hard, or hard-hard).

o An uncorrectable error will typically lead to either an application crash (non-fatal error) or a server crash (fatal error), both of which result in unexpected downtime. Systems with MCA Recovery can recover at run time from some types of uncorrectable memory errors.
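
The difference between a correctable single-bit error and a detected-but-uncorrectable multi-bit error can be illustrated with a textbook extended Hamming code, the classic construction behind SEC-DED ECC (described later in this document). The sketch below is only an illustration under that assumption: the function names are hypothetical, and the actual ECC implemented by a given memory controller may differ and is not specified here.

```python
def encode_secded(data_bits):
    """Encode data bits as an extended Hamming codeword.

    Index 0 holds the overall parity bit; indices 1..n hold the Hamming
    code, with check bits at the power-of-two positions.
    """
    m = len(data_bits)
    r = 0
    while (1 << r) < m + r + 1:      # smallest r with 2^r >= m + r + 1
        r += 1
    n = m + r
    code = [0] * (n + 1)
    data = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):          # not a power of two: data position
            code[pos] = next(data)
    for i in range(r):
        p = 1 << i                   # check bit p covers positions with bit i set
        code[p] = sum(code[pos] for pos in range(1, n + 1) if pos & p) % 2
    code[0] = sum(code[1:]) % 2      # overall parity makes the total even
    return code


def decode_secded(code):
    """Return (status, codeword); status is 'ok', 'corrected', or 'uncorrectable'."""
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos          # XOR of the positions of all set bits
    overall_odd = sum(code) % 2 == 1
    if syndrome == 0 and not overall_odd:
        return "ok", list(code)
    if overall_odd and syndrome <= n:
        # Odd overall parity: treat as a single flipped bit at index `syndrome`
        # (index 0 means the overall parity bit itself was flipped).
        fixed = list(code)
        fixed[syndrome] ^= 1
        return "corrected", fixed
    # Non-zero syndrome with even overall parity: two bits flipped.
    return "uncorrectable", list(code)


if __name__ == "__main__":
    word = [1, 0] * 32               # 64 data bits
    codeword = encode_secded(word)
    print(len(codeword))             # 72 (64 data bits + 8 check bits)

    single = list(codeword)
    single[5] ^= 1                   # one flipped bit
    print(decode_secded(single)[0])  # corrected

    double = list(codeword)
    double[5] ^= 1
    double[40] ^= 1                  # two flipped bits in the same word
    print(decode_secded(double)[0])  # uncorrectable
```

In this model it does not matter whether each flipped bit came from a soft or a hard fault: two bad bits in the same protected word are enough to exceed what a single-error-correcting code can repair.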

A Primer on Dell EMC PowerEdge Server Memory RAS Capabilities

The memory errors discussed previously are mitigated by PowerEdge server memory RAS capabilities, which entail fault avoidance, detection, and correction in hardware and software. These RAS features are all intended to improve system reliability and extend uptime in the event of memory errors.

FYI: It is useful to understand the difference between x4 and x8 DIMMs. The designation refers to the data width of the DRAM components on a memory module: x4 DIMMs use DRAM components with a 4-bit width, and x8 DIMMs use components with an 8-bit width.

The common DIMM organizational notation is #RxN, where # is the number of ranks and N is the width of the DRAM devices. For example, 2Rx4 means the DIMM has two ranks of x4 DRAM devices.
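
The #RxN notation can be read mechanically. The following minimal sketch parses an organization string and estimates the DRAM device count; the class and function names are hypothetical, and the device count assumes a 72-bit-wide rank (64 data bits + 8 ECC bits), which does not hold for every memory generation.

```python
import re
from dataclasses import dataclass

_ORG_PATTERN = re.compile(r"(\d+)Rx(\d+)")


@dataclass(frozen=True)
class DimmOrganization:
    ranks: int          # the "#" in #RxN
    device_width: int   # the "N" in #RxN, in bits

    def devices_per_rank(self, rank_width_bits: int = 72) -> int:
        # Assumption: an ECC rank is 72 bits wide (64 data + 8 ECC).
        return rank_width_bits // self.device_width

    def total_devices(self, rank_width_bits: int = 72) -> int:
        return self.ranks * self.devices_per_rank(rank_width_bits)


def parse_dimm_organization(text: str) -> DimmOrganization:
    """Parse a string such as '2Rx4' into ranks and DRAM device width."""
    match = _ORG_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a #RxN organization string: {text!r}")
    return DimmOrganization(ranks=int(match.group(1)),
                            device_width=int(match.group(2)))


if __name__ == "__main__":
    org = parse_dimm_organization("2Rx4")
    print(org.ranks, org.device_width)   # 2 4
    print(org.devices_per_rank())        # 18
    print(org.total_devices())           # 36
```

Under the same assumption, a 1Rx8 module would use 9 devices, which is why x4 DIMMs spread each access across more, narrower DRAM components than x8 DIMMs.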

Single Error Correction - Double Error Detection (SEC-DED) ECC

SEC-DED Feature Support Table

Platforms Supported
    Intel Platforms (All Xeon Families):
    AMD Platforms (All EPYC Families):

DIMMs Supported
    x4 DIMMs:
    x8 DIMMs:

Single Error Correction - Double Error Detection ECC, or SEC-DED ECC, is the most basic form of error-correcting code (ECC) available. All PowerEdge servers (both Intel- and AMD-based platforms) configured with ECC memory modules are capable of SEC-DED for each memory page access (64 data bits + 8 ECC