Intel Server Board S2400BB

Intel® Server Board S2400BB TPS

Revision 2.0

When Mirroring Mode is operational, each channel in a pair is “mirrored” by the other channel. As a result,

the Effective Memory Size is half of the total amount of installed memory.

When Mirroring Mode is operational, the system treats Correctable Errors the same way as it would in

Independent channel mode. There is a correctable error threshold. Correctable error counts accumulate by

rank, and the first event is logged.

What Mirroring primarily protects against is the possibility of an Uncorrectable ECC Error occurring with critical

data “in process”. Without Mirroring, the system would be expected to “Blue Screen” and halt, possibly with

serious impact to operations. But with Mirroring Mode in operation, an Uncorrectable ECC Error from one

channel becomes a Mirroring Fail Over (MFO) event instead, in which the IMC retrieves the correct data from

the “mirror image” channel and disables the failed channel. Since the ECC Error was corrected in the process

of the MFO Event, the ECC Error is demoted to a Correctable ECC Error. The channel pair becomes a single

non-redundant channel, but without impacting operations, and the Mirroring Fail Over Event is logged to the SEL

to alert the user that there is memory hardware that has failed and needs to be replaced.
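The fail over sequence described above can be illustrated with a minimal software sketch. This is only a model of the described behavior, not the IMC implementation: the class `MirroredPair`, the `uncorrectable_on` fault-injection parameter, and the list standing in for the SEL are all hypothetical names introduced for illustration.

```python
class MirroredPair:
    """Toy model of one mirrored channel pair (illustrative only)."""

    def __init__(self, data):
        # Both channels of the pair hold identical copies of the data.
        self.channels = {0: dict(data), 1: dict(data)}
        self.enabled = {0: True, 1: True}
        self.sel = []  # stand-in for the System Event Log

    def read(self, addr, uncorrectable_on=None):
        """Read addr. uncorrectable_on names a channel that reports an
        uncorrectable ECC error on this read (fault injection)."""
        primary = 0 if self.enabled[0] else 1
        if uncorrectable_on != primary:
            return self.channels[primary][addr]
        mirror = 1 - primary
        if not self.enabled[mirror]:
            # No mirror image left: the error stays uncorrectable.
            raise RuntimeError("uncorrectable ECC error on non-redundant channel")
        # Mirroring Fail Over: disable the failed channel, log the event,
        # and return the correct data from the mirror image channel.
        self.enabled[primary] = False
        self.sel.append(("MFO", primary))
        return self.channels[mirror][addr]


pair = MirroredPair({0x10: 0xAB})
value = pair.read(0x10, uncorrectable_on=0)  # served from channel 1
```

After the fail over, the pair in this model behaves as a single non-redundant channel: a second uncorrectable error on the surviving channel can no longer be recovered.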

3.2.2.6 Rank Sparing Mode

Rank Sparing Mode enhances the system’s RAS capability by “swapping out” failing ranks of DIMMs. Rank

Sparing is strictly channel and rank oriented. Each memory channel is a Sparing Domain.

For Rank Sparing to be available as a RAS option, there must be two or more single-rank or dual-rank DIMMs, or

at least one quad-rank DIMM, installed on each memory channel.

Rank Sparing Mode is enabled/disabled in the Memory RAS and Performance Configuration screen in the

<F2> BIOS Setup Utility.

When Sparing Mode is operational, the largest memory rank on each channel is reserved as a “spare”

and is not used during normal operations. The impact on Effective Memory Size is to subtract the sum of the

reserved ranks from the total amount of installed memory.
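The Effective Memory Size calculation above can be sketched as a short worked example. The function name `effective_memory_sparing` and the representation of each channel as a list of rank sizes in GB are assumptions made for illustration only.

```python
def effective_memory_sparing(channels):
    """channels: one list of rank sizes (in GB) per populated channel.
    Returns (installed_gb, effective_gb) with Rank Sparing operational."""
    installed = sum(sum(ranks) for ranks in channels)
    # The largest rank on each channel is reserved as the spare.
    reserved = sum(max(ranks) for ranks in channels if ranks)
    return installed, installed - reserved


# Two channels, each with two dual-rank 8 GB DIMMs (four 4 GB ranks per channel).
installed, effective = effective_memory_sparing([[4, 4, 4, 4], [4, 4, 4, 4]])
```

In this example 32 GB are installed, one 4 GB rank per channel is reserved, and 24 GB remain available for use.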

Hardware registers count the number of Correctable ECC Errors for each rank of memory on each channel

during operations and compare the count against a Correctable Error Threshold. When the correctable error

count for a given rank reaches the threshold value, that rank is deemed to be “failing”, which triggers a Sparing Fail

Over (SFO) event for the channel in which that rank resides. The data in the failing rank is copied to the Spare

Rank for that channel, and the Spare Rank replaces the failing rank in the IMC’s address translation registers.

An SFO Event is logged to the BMC SEL. The failing rank is then disabled, and any further Correctable Errors

on that now non-redundant channel will be disregarded.
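The per-rank counting and fail over behavior described above can be modeled in software as follows. This is a simplified sketch, not the hardware register logic: the class `SparingChannel`, the threshold value of 3, and the event list standing in for the BMC SEL are hypothetical.

```python
class SparingChannel:
    """Toy model of one Sparing Domain, i.e. one memory channel."""

    def __init__(self, active_ranks, threshold):
        self.counts = {rank: 0 for rank in active_ranks}  # per-rank error counters
        self.threshold = threshold  # hypothetical Correctable Error Threshold
        self.spare_available = True
        self.failed_rank = None
        self.events = []  # stand-in for the BMC SEL

    def correctable_error(self, rank):
        if not self.spare_available:
            # Channel is already non-redundant: further correctable
            # errors on it are disregarded.
            return
        self.counts[rank] += 1
        if self.counts[rank] >= self.threshold:
            # Sparing Fail Over: the data would be copied to the spare rank,
            # which takes over the failing rank's address range. The event
            # is logged and the failing rank is disabled.
            self.failed_rank = rank
            self.spare_available = False
            self.events.append(("SFO", rank))


channel = SparingChannel(active_ranks=[0, 1, 2], threshold=3)
for _ in range(5):
    channel.correctable_error(1)  # third error triggers the SFO, rest are ignored
```

Because each channel is modeled as its own object, several channels can fail over independently, which matches the description of each channel as a separate Sparing Domain.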

The correctable error that triggered the SFO may be logged to the BMC SEL, if it was the first one to occur in

the system. That first correctable error event will be the only one logged for the system. However, since each

channel is a Sparing Domain, the correctable error counting continues for other channels which are still in a

redundant state. There can be as many SFO Events as there are memory channels with DIMMs installed.

3.2.2.7 Single Device Data Correction (SDDC)

Single Device Data Correction (SDDC) is a technique by which the IMC can replace the data from an

entire failing x4 DRAM device, using a combination of CRC plus parity. This is an automatic, IMC-driven

hardware feature. It can be extended to x8 DRAM technology by placing the system in Channel Lockstep Mode.

3.2.2.8 Error Correction Code (ECC) Memory

ECC uses “extra bits” – 64-bit data in a 72-bit DRAM array – to add an 8-bit calculated “Hamming Code” to

each 64 bits of data. This additional encoding enables the memory controller to detect and report single or

multiple bit errors when data is read, and to correct single-bit errors.
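The principle of adding 8 check bits to each 64 bits of data can be illustrated with a textbook extended Hamming (72,64) code, which corrects single-bit errors and detects double-bit errors. This is a sketch under that assumption; the exact encoding used by the memory controller is not given in this section, and the names `encode` and `decode` are hypothetical.

```python
CODE_LEN = 71  # Hamming positions 1..71; index 0 holds the overall parity bit
PARITY_POS = [1, 2, 4, 8, 16, 32, 64]  # 7 Hamming check bits
DATA_POS = [p for p in range(1, CODE_LEN + 1) if p not in PARITY_POS]  # 64 data bits


def encode(data):
    """Encode a 64-bit integer into a list of 72 bits."""
    bits = [0] * (CODE_LEN + 1)
    for i, pos in enumerate(DATA_POS):
        bits[pos] = (data >> i) & 1
    for p in PARITY_POS:
        # Check bit p covers every position whose index has bit p set.
        bits[p] = sum(bits[k] for k in range(1, CODE_LEN + 1) if k & p) % 2
    bits[0] = sum(bits[1:]) % 2  # make the parity of the whole word even
    return bits


def decode(bits):
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    bits = list(bits)
    syndrome = 0
    for k in range(1, CODE_LEN + 1):
        if bits[k]:
            syndrome ^= k  # XOR of the positions of all set bits
    odd_parity = sum(bits) % 2 == 1
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity and syndrome <= CODE_LEN:
        # Treated as a single-bit error: the syndrome is its position
        # (0 means the overall parity bit itself was flipped).
        bits[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity with a non-zero syndrome: at least two bits flipped.
        return None, "uncorrectable"
    data = sum(bits[pos] << i for i, pos in enumerate(DATA_POS))
    return data, status


word = 0xDEADBEEFCAFEF00D
codeword = encode(word)
single = list(codeword)
single[37] ^= 1          # one flipped bit: corrected on read
double = list(single)
double[5] ^= 1           # two flipped bits: detected but not corrected
```

Here `decode(single)` recovers the original word, while `decode(double)` reports the error as uncorrectable.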