S2600GZ and S2600GL



Intel® Server Board S2600GZ/GL TPS Product Architecture Overview

The system BIOS has logic to cope with the random factor in correctable ECC errors. Rather than reporting

every correctable error that occurs, the BIOS has a threshold and only logs a correctable error when a

threshold value is reached. Additional correctable errors that occur after the threshold has been reached are

disregarded. In addition, on the expectation that the server system may have extremely long operational runs

without being rebooted, there is a “Leaky Bucket” algorithm incorporated into the correctable error counting

and comparing mechanism. The “Leaky Bucket” algorithm reduces the correctable error count as a function of

time – as the system remains running for a certain amount of time, the correctable error count will “leak out” of

the counting registers. This prevents correctable error counts from building up over an extended runtime.
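
The counting behavior described above can be sketched roughly as follows. This is an illustrative model only, not the BIOS implementation: the class name LeakyBucketCounter, the leak_interval_s parameter, and the rate of one count leaked per interval are all assumptions. The specification states only that the count decreases as a function of time and that errors after the threshold has been reached are disregarded.

```python
class LeakyBucketCounter:
    """Illustrative model of thresholded correctable-error counting.

    Assumptions (not from the specification): one count leaks out per
    `leak_interval_s` seconds, and the caller supplies the current time.
    """

    def __init__(self, threshold, leak_interval_s):
        self.threshold = threshold
        self.leak_interval_s = leak_interval_s
        self.count = 0
        self.last_leak_time = 0.0
        self.logged = False  # set once the threshold event has been logged

    def _leak(self, now):
        # Remove one count for every full leak interval that has elapsed.
        intervals = int((now - self.last_leak_time) // self.leak_interval_s)
        if intervals > 0:
            self.count = max(0, self.count - intervals)
            self.last_leak_time += intervals * self.leak_interval_s

    def record_error(self, now):
        """Record one correctable error; return True if it should be logged."""
        self._leak(now)
        if self.logged:
            return False  # errors after the threshold event are disregarded
        self.count += 1
        if self.count >= self.threshold:
            self.logged = True
            return True
        return False
```

With a threshold of 3 and a 10-second leak interval, two errors followed by a long quiet period do not contribute to a later threshold crossing, because their counts have leaked away by the time the next error arrives.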

The correctable memory error threshold value is a configurable option in the <F2> BIOS Setup Utility, where

you can configure it for 20/10/5/ALL/None.

Once a correctable memory error threshold is reached, the event is logged to the System Event Log (SEL) and

the appropriate memory slot fault LED is lit to indicate on which DIMM the correctable error threshold crossing

occurred.

3.2.4.5.2.2 Uncorrectable Memory ECC Error Handling

All multi-bit “detectable but not correctable” memory errors are classified as Uncorrectable Memory ECC Errors.

This is generally a fatal error.

However, before returning control to the OS drivers from Machine Check Exception (MCE) or Non-Maskable

Interrupt (NMI), the Uncorrectable Memory ECC Error is logged to the SEL, the appropriate memory slot fault

LED is lit, and the System Status LED state is changed to a solid Amber.

3.2.4.5.3 Demand Scrubbing for ECC Memory

Demand scrubbing is the ability to write corrected data back to the memory once a correctable error is

detected on a read transaction. This allows data in memory to be corrected at detection, and decreases the

chances of a second error on the same address accumulating to cause a multi-bit error (MBE) condition.

Demand Scrubbing is enabled/disabled (default is enabled) in the Memory Configuration screen in Setup.
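
The write-back step can be illustrated with a small software model. This is a sketch under stated assumptions, not the memory controller's logic: a toy Hamming(7,4) single-error-correcting code stands in for the platform's actual ECC, which this section does not describe, and the names encode, syndrome, decode, and read_with_demand_scrub are hypothetical.

```python
DATA_POSITIONS = (3, 5, 6, 7)   # 1-indexed codeword positions holding data bits
PARITY_POSITIONS = (1, 2, 4)    # 1-indexed positions holding parity bits


def encode(nibble):
    """Encode a 4-bit value as a Hamming(7,4) codeword (index 0 is unused)."""
    bits = [0] * 8
    for i, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (nibble >> i) & 1
    for p in PARITY_POSITIONS:
        for pos in range(1, 8):
            if pos != p and pos & p:
                bits[p] ^= bits[pos]
    return bits


def syndrome(bits):
    """Return 0 for a clean codeword, else the position of a single flipped bit."""
    s = 0
    for pos in range(1, 8):
        if bits[pos]:
            s ^= pos
    return s


def decode(bits):
    """Extract the 4-bit data value from a codeword."""
    return sum(bits[pos] << i for i, pos in enumerate(DATA_POSITIONS))


def read_with_demand_scrub(memory, address, scrub_enabled=True):
    """Read a word, correct a single-bit error, and optionally write it back."""
    word = list(memory[address])
    s = syndrome(word)
    if s != 0:
        word[s] ^= 1                  # correct the flipped bit in the read data
        if scrub_enabled:
            memory[address] = word    # demand scrub: repair the stored copy too
    return decode(word)
```

With scrubbing disabled the read still returns corrected data, but the stored word keeps its single-bit error, so a second flip at the same address would exceed what the code can correct.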

3.2.4.5.4 Patrol Scrubbing for ECC Memory

Patrol scrubs are intended to ensure that data with a correctable error does not remain in DRAM long enough

to stand a significant chance of further corruption to an uncorrectable stage.

3.2.4.5.5 Rank Sparing Mode

Rank Sparing Mode enhances the system’s RAS capability by “swapping out” failing ranks of DIMMs. Rank

Sparing is strictly channel and rank oriented. Each memory channel is a Sparing Domain.

For Rank Sparing to be available as a RAS option, there must be 2 or more single rank or dual rank DIMMs, or

at least one quad rank DIMM installed on each memory channel.
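
The population rule above can be expressed as a simple check. This is a minimal sketch: the function name rank_sparing_available and the input layout (one list per populated channel, giving the rank count of each installed DIMM) are assumptions made for illustration, and unpopulated channels are assumed to be left out of the input.

```python
def rank_sparing_available(channels):
    """Return True if every listed channel meets the population rule.

    `channels` is a list of populated channels; each channel is a list of
    per-DIMM rank counts (1 = single rank, 2 = dual rank, 4 = quad rank).
    """
    def channel_ok(dimm_ranks):
        has_quad_rank = any(ranks == 4 for ranks in dimm_ranks)
        single_or_dual = sum(1 for ranks in dimm_ranks if ranks in (1, 2))
        return has_quad_rank or single_or_dual >= 2

    # An empty configuration never qualifies.
    return bool(channels) and all(channel_ok(ch) for ch in channels)
```

For example, four channels each holding two dual rank DIMMs qualify, while a configuration in which any channel holds only one single rank DIMM does not.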

Rank Sparing Mode is enabled/disabled in the Memory RAS and Performance Configuration screen in the <F2>

BIOS Setup Utility.

When Sparing Mode is operational, the largest memory rank on each channel is reserved as a “spare”

and is not used during normal operations. The impact on Effective Memory Size is to subtract the sum of the

reserved ranks from the total amount of installed memory.
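
As a worked example of this arithmetic, the sketch below computes Effective Memory Size from per-channel rank sizes. The function name effective_memory_gb and the input layout are assumptions for illustration, and the configurations shown are hypothetical.

```python
def effective_memory_gb(channels):
    """Return installed memory minus the largest rank on each channel.

    `channels` is a list of channels; each channel is a list of rank
    sizes in GB for the ranks installed on that channel.
    """
    total = sum(sum(ranks) for ranks in channels)
    reserved = sum(max(ranks) for ranks in channels if ranks)
    return total - reserved


# Hypothetical configuration: 4 channels, each with two dual rank 8 GB DIMMs,
# that is, four 4 GB ranks per channel.
uniform = [[4, 4, 4, 4] for _ in range(4)]
print(effective_memory_gb(uniform))  # 64 GB installed - 16 GB reserved = 48
```

When ranks of different sizes share a channel, the largest one is the rank reserved, so mixed populations give up proportionally more capacity.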

Hardware registers count the number of Correctable ECC Errors for each rank of memory on each channel

during operations and compare the count against a Correctable Error Threshold. When the correctable error

count for a given rank hits the threshold value, that rank is deemed to be “failing”, and it triggers a Sparing Fail

Over (SFO) event for the channel in which that rank resides. The data in the failing rank is copied to the Spare

Rank for that channel, and the Spare Rank replaces the failing rank in the IMC’s address translation registers.
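
The fail-over sequence can be modeled as follows. This is an illustrative sketch rather than the hardware's behavior: the class SparingChannel, its fields, and the dictionary standing in for the IMC's address translation registers are all hypothetical. What happens once the spare has been consumed is not covered in this section, so the model simply stops triggering fail-overs.

```python
class SparingChannel:
    """Toy model of one Sparing Domain (one memory channel)."""

    def __init__(self, rank_data, spare_index, threshold):
        self.rank_data = rank_data            # contents of each physical rank
        self.spare = spare_index              # physical rank held in reserve
        self.threshold = threshold            # Correctable Error Threshold
        self.error_counts = [0] * len(rank_data)
        # Logical-to-physical rank map; stands in for address translation.
        self.rank_map = {i: i for i in range(len(rank_data)) if i != spare_index}

    def record_correctable_error(self, physical_rank):
        """Count one error; return True if a Sparing Fail Over was triggered."""
        self.error_counts[physical_rank] += 1
        if (self.error_counts[physical_rank] >= self.threshold
                and self.spare is not None
                and physical_rank != self.spare):
            self._fail_over(physical_rank)
            return True
        return False

    def _fail_over(self, failing):
        spare = self.spare
        # Copy the failing rank's data to the spare rank.
        self.rank_data[spare] = list(self.rank_data[failing])
        # Point every logical rank that used the failing rank at the spare.
        for logical, physical in self.rank_map.items():
            if physical == failing:
                self.rank_map[logical] = spare
        self.spare = None  # assumption: the spare can be used only once
```

After a fail-over, accesses to the logical rank resolve to the former spare, and the failing physical rank no longer appears in the map.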

Revision 2.4