Release Notes
4 Memory Errors and Dell PowerEdge YX4X Server Memory RAS Features
a memory module replacement. However, some server competitors go as far as to
say that an indefinite number of correctable errors is acceptable, a belief that is not
shared by Dell Engineering. Instead, PowerEdge server firmware will intelligently
monitor the health of memory and recommend self-healing action or module
replacement based on a variety of factors including DIMM capacity, rates of correctable
errors, and effectiveness of available self-healing. The intent behind Dell’s proprietary
predictive failure algorithms is to proactively identify DIMMs that are most likely to
continue to degrade and potentially generate uncorrectable errors.
o Uncorrectable Errors (UCEs)
o Uncorrectable errors are errors that can be detected but cannot be corrected by the
server platform. They result from multi-bit errors and may be caused by any
combination of soft and hard errors (for example, soft-soft, soft-hard, or hard-hard).
o Occurrence of an uncorrectable error will typically lead to either an application crash
(non-fatal error) or server crash (fatal error) – both of which result in unexpected
downtime. Systems with MCA Recovery have the capability of performing run-time
recovery from some types of uncorrectable memory errors.
A Primer on Dell EMC PowerEdge Server Memory RAS Capabilities
The memory errors discussed above are mitigated through PowerEdge server memory RAS capabilities,
which entail fault avoidance, detection, and correction in hardware and software. These RAS
features are all intended to improve system reliability and extend uptime in the event of memory errors.
FYI: It is useful to understand the difference between x4 and x8 DIMMs. The
designation refers to the width of the DRAM components on a memory module: x4
DIMMs use DRAM components that have a 4-bit width, and x8 DIMMs use
components with an 8-bit width.
The common DIMM organizational notation is #RxN, where # is the number of
ranks and N is the width of the DRAM. For example, 2Rx4 means the DIMM has
two ranks of x4 DRAM devices.
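The #RxN notation above can be decoded mechanically. The following is a minimal Python sketch, not part of any Dell tooling: the names `DimmOrganization` and `parse_dimm_organization` are hypothetical, and the devices-per-rank figure assumes a standard ECC DIMM whose rank presents a 72-bit bus (64 data bits + 8 ECC bits).

```python
import re
from dataclasses import dataclass

# Assumption: each rank of an ECC DIMM presents a 72-bit bus
# (64 data bits + 8 ECC bits). Other bus widths would change this constant.
ECC_BUS_WIDTH_BITS = 72


@dataclass(frozen=True)
class DimmOrganization:
    """Hypothetical container for a decoded #RxN string."""
    ranks: int       # the '#' in #RxN
    dram_width: int  # the 'N' in #RxN, in bits

    @property
    def devices_per_rank(self) -> int:
        # Number of DRAM devices needed to fill the assumed 72-bit bus.
        return ECC_BUS_WIDTH_BITS // self.dram_width


def parse_dimm_organization(notation: str) -> DimmOrganization:
    """Parse a string such as '2Rx4' into ranks and DRAM device width."""
    match = re.fullmatch(r"(\d+)Rx(\d+)", notation.strip())
    if match is None:
        raise ValueError(f"not in #RxN form: {notation!r}")
    ranks, width = int(match.group(1)), int(match.group(2))
    if ranks < 1 or width < 1:
        raise ValueError(f"ranks and width must be positive: {notation!r}")
    return DimmOrganization(ranks=ranks, dram_width=width)


if __name__ == "__main__":
    for text in ("2Rx4", "1Rx8"):
        org = parse_dimm_organization(text)
        print(text, org.ranks, org.dram_width, org.devices_per_rank)
```

Under the 72-bit assumption, a rank of x4 devices uses twice as many DRAM components as a rank of x8 devices, which is why the device width matters for the RAS features that follow.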
Single Error Correction - Double Error Detection (SEC-DED) ECC
SEC-DED Feature Support Table
Platforms Supported    Intel Platforms: (All Xeon Families)    AMD Platforms: (All EPYC Families)
DIMMs Supported        x4 DIMMs:                               x8 DIMMs:
Single Error Correction - Double Error Detection ECC, or SEC-DED ECC, is the most basic form of
error-correcting code (ECC) available. All PowerEdge servers (both Intel- and AMD-based platforms) configured
with ECC memory modules are capable of SEC-DED for each memory page access (64 data bits + 8 ECC