Server Board Family Datasheet

Intel® S5000 Server Board Family Datasheet System BIOS
Revision 1.3
Intel order number D38960-006
Unrecoverable and Fatal Errors: errors that fall outside the scope of the standard
ECC engine. These are thermal errors, FBD channel errors, and data path errors.
They cause catastrophic failure of the system.
Memory errors can occur in two specific stages:
Early POST, during memory discovery
Late POST, or at runtime, when the operating system is running
During POST, the BIOS captures and reports:
Memory BIST errors
Memory RAS configuration errors
At runtime, the BIOS captures and reports:
Correctable, uncorrectable, and fatal errors occurring in the memory sub-system
Loss of memory RAS functionality
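The reporting split described above can be summarized as a small lookup table. This is an illustrative sketch only, not BIOS code: the names `Phase`, `REPORTED_ERRORS`, and `phase_of` are hypothetical, and the assignment of "memory RAS configuration errors" to POST and "loss of memory RAS functionality" to runtime is an assumption read from the order of the list items.

```python
from enum import Enum


class Phase(Enum):
    """Stages in which the text says memory errors can occur."""
    POST = "post"
    RUNTIME = "runtime"


# Error categories reported in each phase (grouping is an assumption
# based on the order of the items in the text above).
REPORTED_ERRORS = {
    Phase.POST: (
        "memory BIST error",
        "memory RAS configuration error",
    ),
    Phase.RUNTIME: (
        "correctable error",
        "uncorrectable error",
        "fatal error",
        "loss of memory RAS functionality",
    ),
}


def phase_of(category: str) -> Phase:
    """Return the phase in which the given error category is reported."""
    for phase, categories in REPORTED_ERRORS.items():
        if category in categories:
            return phase
    raise KeyError(f"unknown error category: {category}")
```

For example, `phase_of("memory BIST error")` yields `Phase.POST`, while an unknown category raises `KeyError`.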
3.3.10.1.1 Faulty FBDIMMs
The BIOS provides detection of a faulty or failing FBDIMM. An FBDIMM is considered faulty if it
fails the memory BIST. The BIOS enables the in-built memory BIST engine in the Intel® 5000
Series Chipsets during memory initialization in POST. The memory BIST cycle isolates failed,
failing, or faulty FBDIMMs and the BIOS then marks those FBDIMMs as failed and takes these
FBDIMMs off-line.
FBDIMMs can fail during normal operation. The BIOS marks these FBDIMMs as temporarily
disabled, and performs other housekeeping tasks as relevant. The memory BIST function is
performed on every FBDIMM during each boot of the system, unless waking from S3.
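The POST-time policy in this section can be sketched as a short model. This is a simplified illustration under stated assumptions, not the BIOS implementation: the `Fbdimm` record, its `passes_bist` field (a stand-in for the result of the chipset BIST engine), and the function `run_post_bist` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fbdimm:
    slot: str
    passes_bist: bool      # hypothetical stand-in for the chipset BIST result
    failed: bool = False   # set when the BIOS marks the FBDIMM as failed
    online: bool = True    # cleared when the FBDIMM is taken off-line


def run_post_bist(dimms: List[Fbdimm], waking_from_s3: bool = False) -> List[str]:
    """Apply the memory BIST policy and return the slots taken off-line."""
    if waking_from_s3:
        # The text states that BIST is not performed when waking from S3.
        return []
    offlined = []
    for dimm in dimms:
        if not dimm.passes_bist:
            # A FBDIMM that fails BIST is marked failed and taken off-line.
            dimm.failed = True
            dimm.online = False
            offlined.append(dimm.slot)
    return offlined
```

With two FBDIMMs where only the second fails BIST, a normal boot returns the second slot and leaves the first on-line. With `waking_from_s3=True` nothing is tested or changed.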
3.3.10.1.2 Faulty Links
FBDIMM technology is a serial technology. Therefore, errors or failures can occur on the serial
path between FBDIMMs. These errors are different from ECC errors, and do not necessarily
occur as a result of faulty FBDIMMs. The BIOS keeps track of such link-level failures.
In general, when a link failure occurs, the BIOS disables all FBDIMMs on that link. If all
installed FBDIMMs are on the same faulty link, the BIOS generates POST code 0xE1 to
indicate that the system has no usable memory, and then halts the system.
If a link failure occurs during normal operation at runtime (after POST), the BIOS signals a
fatal error and applies its fatal error handling policies.
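The link-failure behavior described in this section can be expressed as a small decision function. This is a minimal sketch under assumptions, not the actual BIOS logic: the mapping of slots to link numbers, the `LinkFailureOutcome` record, and the action strings are hypothetical. Only POST code 0xE1 and the three outcomes come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

POST_CODE_NO_USABLE_MEMORY = 0xE1  # POST code named in the text


@dataclass
class LinkFailureOutcome:
    action: str                               # "continue", "halt", or "fatal_error"
    disabled: List[str] = field(default_factory=list)
    post_code: Optional[int] = None


def handle_link_failure(dimm_links: Dict[str, int], faulty_link: int,
                        at_runtime: bool = False) -> LinkFailureOutcome:
    """Model the policy for a failure on one serial link.

    dimm_links maps each installed FBDIMM slot to its link number
    (a hypothetical representation chosen for this sketch).
    """
    if at_runtime:
        # After POST, a link failure is signaled as a fatal error.
        return LinkFailureOutcome("fatal_error")
    # During POST, every FBDIMM on the faulty link is disabled.
    disabled = sorted(s for s, link in dimm_links.items() if link == faulty_link)
    usable = [s for s in dimm_links if s not in disabled]
    if not usable:
        # No usable memory remains: emit POST code 0xE1 and halt.
        return LinkFailureOutcome("halt", disabled, POST_CODE_NO_USABLE_MEMORY)
    return LinkFailureOutcome("continue", disabled)
```

For example, with slots A1 and A2 on link 0 and B1 on link 1, a POST-time failure of link 0 disables A1 and A2 and boot continues on B1. If only A1 and A2 are installed, the same failure produces the halt outcome with POST code 0xE1.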
3.3.10.1.3 Error Counters and Thresholds
The BIOS handles memory errors through a variety of platform-specific policies. Each of these
policies is aimed at giving the system administrator comprehensive diagnostic support for
recovering the system after a failure.