Specifications
BIOS Initialization QSSC-S4R Technical Product Specification
16.2.11.3.10 Memory Hot Replace in Inter Socket Mirroring Mode
Memory Hot Replace is not supported in Inter Socket Mirroring Mode. If a user attempts to hot replace memory in Inter
Socket Mirroring Mode, the BIOS logs a configuration error in the SEL.
16.2.12 Memory Error Handling
This section describes the BIOS and chipset policies used for handling and reporting errors occurring in the memory
subsystem.
Memory errors can result from several conditions, such as solar flares. A detailed description of such conditions is
beyond the scope of this document, but the following sections provide an introduction.
16.2.12.1 Memory Error Classification
The BIOS classifies memory errors into the following categories:
• Memory Initialization errors: These are errors that occur during early POST DIMM discovery and channel
initialization. Errors in this category include SPD read errors and failure of DQ/DQS training on the channel during
memory channel initialization.
• Correctable ECC errors: Errors that occur between the Intel® Xeon® 7500 processor and the DRAM memory cells
and are corrected by the chipset. This correction could be the result of ECC correction, a successfully retried
memory cycle, or both. This also includes errors that are corrected in hardware via a RAS feature, such as a
failover mechanism. The memory performance may be compromised as a result.
• Unrecoverable/Fatal ECC errors: Errors that occur in the memory cells and result in data corruption. The chipset's
ECC engine detects these errors, but cannot correct them. These errors create a loss of data fidelity and cause a
catastrophic failure of the system.
There are two specific stages in which memory errors can occur:
• Early POST, during memory discovery.
• Late POST, or at runtime (when the OS is running).
During POST, the BIOS captures and reports memory BIST errors.
At runtime, the BIOS captures and reports correctable and uncorrectable/fatal errors occurring in the memory subsystem.
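
The classification above can be pictured as a simple lookup from an observed event to one of the three categories. The following Python sketch is illustrative only: the event names (such as "spd_read_error") and the helper functions are hypothetical and do not correspond to any BIOS interface described in this document.

```python
from enum import Enum


class MemoryErrorCategory(Enum):
    INITIALIZATION = "memory initialization error"
    CORRECTABLE_ECC = "correctable ECC error"
    UNRECOVERABLE_ECC = "unrecoverable/fatal ECC error"


# Hypothetical event names, invented for this illustration only.
EVENT_CATEGORY = {
    "spd_read_error": MemoryErrorCategory.INITIALIZATION,
    "dq_dqs_training_failure": MemoryErrorCategory.INITIALIZATION,
    "ecc_corrected": MemoryErrorCategory.CORRECTABLE_ECC,
    "retry_succeeded": MemoryErrorCategory.CORRECTABLE_ECC,
    "ras_failover": MemoryErrorCategory.CORRECTABLE_ECC,
    "ecc_detected_not_corrected": MemoryErrorCategory.UNRECOVERABLE_ECC,
}


def classify(event: str) -> MemoryErrorCategory:
    """Map a (hypothetical) event name to its error category."""
    return EVENT_CATEGORY[event]


def is_fatal(category: MemoryErrorCategory) -> bool:
    """Only unrecoverable ECC errors cause catastrophic system failure."""
    return category is MemoryErrorCategory.UNRECOVERABLE_ECC
```

In this sketch, a RAS failover counts as a correctable error even though memory performance may be reduced afterwards, which mirrors the category description above.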
16.2.12.1.1 Invalid DDR3 DIMM Population
The BIOS provides detection of a DDR3 DIMM installation that does not meet memory population requirements – the
“fill farthest first” rule. A DDR3 DIMM that is incorrectly installed as a single DIMM in the wrong socket on the channel
will be disabled. An example of this would be a single DIMM installed in slot DIMM_1D, with slot DIMM_1B empty on
the memory board in MEM1_SLOT. DIMM_1D will be disabled.
However, a DDR3 DIMM that is not valid for the platform, that is, it does not meet the size, organization, speed, or
timing constraints for the Intel® Xeon® 7500 processor series IMC during memory initialization in POST, will be
considered as having failed the memory test.
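
As a rough illustration of the single-DIMM population check, the Python sketch below flags a lone DIMM that is not in the farthest slot of its channel. It assumes that each channel is described as a tuple of slot names ordered from farthest to nearest, and that DIMM_1B and DIMM_1D share a channel with DIMM_1B as the farthest slot (inferred from the example above). The function name and data layout are hypothetical.

```python
def slots_disabled_by_population_rule(populated, channels):
    """Return the slots that would be disabled under "fill farthest first".

    populated: set of slot names that hold a DIMM.
    channels:  list of tuples of slot names, each ordered from the
               farthest slot to the nearest slot (assumed layout).
    """
    disabled = []
    for slots in channels:
        installed = [slot for slot in slots if slot in populated]
        # A single DIMM on a channel must occupy the farthest slot.
        if len(installed) == 1 and installed[0] != slots[0]:
            disabled.append(installed[0])
    return disabled


# Assumed channel layout for the example in the text.
channels = [("DIMM_1B", "DIMM_1D")]
print(slots_disabled_by_population_rule({"DIMM_1D"}, channels))
```

The sketch covers only the single-DIMM case stated above. DIMMs that violate size, organization, speed, or timing constraints are handled separately, as failed memory tests.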
16.2.12.1.2 Faulty DDR3 DIMMs
The BIOS provides detection of a faulty or failing DDR3 DIMM. A DDR3 DIMM is considered faulty if it fails the memory
BIST. The BIOS enables the HW Memory BIST engine in the Intel® Xeon® 7500 processor during memory
initialization in POST.
The Memory BIST function runs on every DDR3 DIMM during each boot of the system, except when waking from S3 (S3 is
supported only on Workstation SKUs, if any). The Memory BIST cycle isolates failed, failing, or faulty DDR3 DIMMs; the
BIOS then marks those DDR3 DIMMs as failed and takes them offline.
If all DDR3 DIMMs fail the Memory BIST, the BIOS halts with POST Diagnostics code 0xEB (Memory Test Error, as
described in Section 16.2.2).
If usable DIMMs remain available, POST continues. The BIOS sends the Set Fault Indication IPMI command for each failed
DDR3 DIMM so that the BMC can light the corresponding Fault LEDs, and the BIOS takes those DDR3 DIMMs offline. A Memory Error
beep code is sounded as described in Section 16.2.2. Later, the Error Manager displays appropriate DIMM error
codes. The DDR3 DIMMs taken offline will be excluded from the available memory shown in the BIOS Setup screen
memory displays and other memory reporting functions.
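
The decision flow after Memory BIST can be summarized in a short Python sketch. It is a simplified model, not BIOS code: the function name, the result dictionary, and the symbolic action tuples are hypothetical, and no real IPMI call is made. Only the POST Diagnostics code 0xEB comes from the text above.

```python
POST_CODE_MEMORY_TEST_ERROR = 0xEB  # POST Diagnostics code named in the text


def handle_bist_results(bist_passed):
    """Model the BIOS reaction to Memory BIST results.

    bist_passed: dict mapping a slot name to True (passed) or False (failed).
    Returns a dict with the halt code (or None), the symbolic actions taken,
    and the slots that remain usable.
    """
    failed = sorted(slot for slot, ok in bist_passed.items() if not ok)
    usable = sorted(slot for slot, ok in bist_passed.items() if ok)

    if not usable:
        # No DIMM passed the Memory BIST: halt with the Memory Test Error code.
        return {"halt_code": POST_CODE_MEMORY_TEST_ERROR,
                "actions": [], "usable": []}

    actions = []
    for slot in failed:
        # Symbolic stand-ins for the IPMI command and the offlining step.
        actions.append(("set_fault_indication", slot))
        actions.append(("take_offline", slot))
    if failed:
        actions.append(("beep", "memory_error"))

    # Offline DIMMs are excluded from the memory reported as available.
    return {"halt_code": None, "actions": actions, "usable": usable}
```

For example, calling handle_bist_results({"DIMM_1A": True, "DIMM_1B": False}) in this model continues POST, records a fault indication and an offline action for DIMM_1B, and reports only DIMM_1A as usable.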