Datasheet

Intel® Xeon® Processor C5500/C3500 Series
Datasheet, Volume 1, February 2010
Order Number: 323103-001
Interfaces
MC_SMI_SPARE_CNTRL¹ register holds an SMI_ERROR_THRESHOLD¹ to which the
counters are compared. If any counter exceeds the threshold, the enabled interrupt is
generated, and status bits are set to indicate which counter met the threshold.
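The comparison described above can be pictured with a small software model. This is only an illustrative sketch, not the hardware logic: the function name `overflow_status`, the list of counter values, and the choice of a strict "greater than" comparison are assumptions made here for illustration.

```python
def overflow_status(counters, threshold):
    """Return a bitmask with bit i set when counters[i] is above the threshold.

    Hypothetical model: one status bit per error counter, and a strict
    'exceeds' comparison (an assumption; the text also says 'met').
    """
    status = 0
    for i, count in enumerate(counters):
        if count > threshold:
            status |= 1 << i
    return status


# Example: counters 1 and 2 are above a threshold of 4.
status = overflow_status([0, 5, 9, 3], threshold=4)
interrupt_needed = status != 0
```

In this model a nonzero status would correspond to the condition under which the enabled interrupt is raised.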
2.1.7.3 Identifying the Cause of An Interrupt
Table 15 defines how to determine what caused the interrupt.
2.1.8 Single Device Data Correction (SDDC) Support
The Integrated Memory Controller employs a Single Device Data Correction (SDDC)
algorithm that will recover from a x4/x8 component failure. In addition, the Integrated
Memory Controller supports demand and patrol scrubbing.
A scrub corrects a correctable error in memory. A four-byte ECC is attached to each
32-byte “payload”. An error is detected when the ECC calculated from the payload
mismatches the ECC read from memory. The error is corrected by modifying either the
ECC or the payload or both and writing both the ECC and payload back to memory.
Only one demand or patrol scrub can be in process at a time.
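The read, check, correct, and write-back sequence of a scrub can be sketched with a toy code. The example below deliberately substitutes a textbook single-bit Hamming code for the SDDC algorithm, which is not specified here, and uses an 8-bit payload rather than 32 bytes. All names (`encode`, `syndrome`, `scrub`, `memory`) are hypothetical.

```python
def syndrome(word):
    """XOR of the 1-based positions of all set bits; 0 means the check passes."""
    s = 0
    for pos in range(1, len(word)):
        if word[pos]:
            s ^= pos
    return s


def encode(payload_bits):
    """Build a codeword: check bits at power-of-two positions, payload elsewhere."""
    n_check = 0
    while (1 << n_check) < len(payload_bits) + n_check + 1:
        n_check += 1
    word = [0] * (len(payload_bits) + n_check + 1)  # index 0 is unused
    data = iter(payload_bits)
    for pos in range(1, len(word)):
        if pos & (pos - 1):  # not a power of two, so a payload position
            word[pos] = next(data)
    s = syndrome(word)
    for k in range(n_check):
        word[1 << k] = (s >> k) & 1  # choose check bits so the syndrome becomes 0
    return word


def scrub(memory, addr):
    """Read, check, correct a single-bit error, write back. True if corrected."""
    word = list(memory[addr])
    s = syndrome(word)
    if s == 0:
        return False
    if s >= len(word):
        raise ValueError("error is not correctable by this toy code")
    word[s] ^= 1  # position s may hold a check bit or a payload bit
    memory[addr] = word
    return True


memory = {0: encode([1, 0, 1, 1, 0, 0, 1, 0])}
good = list(memory[0])
memory[0][5] ^= 1          # inject a single-bit error in the payload
corrected = scrub(memory, 0)
```

As in the description above, the corrected bit may belong to either the check bits or the payload, and the whole codeword is written back.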
2.1.9 Patrol Scrub
Patrol scrubs are intended to ensure that data with a correctable error does not remain
in DRAM long enough to stand a significant chance of being further corrupted into an
uncorrectable error by particle errors. The Integrated Memory Controller will issue a
Patrol Scrub at a rate sufficient to write every line once a day. For a maximum capacity
of 64 GB, this would be one scrub every 82 µs. The Sparing/Scrub (SS) engine sends
scrubs to one channel at a time. The Patrol Scrub rate is configurable. The scrub engine
scrubs all active channels, including the spare channel. The spare channel is
scrubbed, and errors are signaled and logged if errors are enabled.
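The once-a-day rate can be checked with a short calculation. The sketch below assumes a 64-byte line and reads 64 GB as 64 × 2^30 bytes. Both are assumptions made for illustration, as is the function name `patrol_interval_seconds`.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def patrol_interval_seconds(capacity_bytes, line_bytes, period_seconds=SECONDS_PER_DAY):
    """Time between scrubs so that every line is visited once per period."""
    lines = capacity_bytes // line_bytes
    return period_seconds / lines


# Assumed values: 64 GiB of memory, 64-byte lines.
interval = patrol_interval_seconds(64 * 2**30, 64)
interval_us = interval * 1e6
```

Under these assumptions the interval comes out at roughly 80 microseconds per line. A different line size or capacity convention shifts the result proportionally.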
1.
Table 15. Causes of SMI or NMI

| Condition | Cause | Recommended platform software response |
|---|---|---|
| MC_SMI_SPARE_DIMM_ERROR_STATUS.DIMM_ERROR_OVERFLOW_STATUS != 0 | This register has one bit for each DIMM error counter that meets threshold. This can happen at the same time as any of the other SMI events (Sparing complete, redundancy lost in Mirror Mode). It is recommended that software address one, so that the other cause remains when the second event is taken. | Examine the associated MC_COR_ECC_CNT_X register. Determine the time since the counter has been cleared. If a spare channel exists, and the threshold has been exceeded faster than would be expected given the background rate of correctable errors, Sparing should be initiated. The counter should be cleared to reset the overflow bit. |
| MC_RAS_STATUS.REDUNDANCY_LOSS = 1 | One channel of a mirrored pair had an uncorrectable error and redundancy has been lost. | Raise an indication that a reboot should be scheduled, possibly replace the failed DIMM specified in the MC_SMI_SPARE_DIMM_ERROR_STATUS register. (Not present on A-step) |
| MC_SSRSTATUS.CMPLT = 1 | A sparing copy operation set up by software has completed. | Advance to the next step in the sparing flow. |
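The conditions in Table 15 can be sketched as a small decision routine for platform software. This is a hedged illustration only. The function names, the plain integer arguments standing in for register field reads, and the rate comparison in `should_start_sparing` are assumptions, not an actual register access interface.

```python
def pending_causes(dimm_error_overflow_status, redundancy_loss, sparing_cmplt):
    """Return the Table 15 conditions that currently hold, in table order.

    Arguments are hypothetical stand-ins for the field values
    DIMM_ERROR_OVERFLOW_STATUS, REDUNDANCY_LOSS, and CMPLT.
    """
    causes = []
    if dimm_error_overflow_status != 0:
        causes.append("DIMM_ERROR_OVERFLOW")
    if redundancy_loss == 1:
        causes.append("REDUNDANCY_LOSS")
    if sparing_cmplt == 1:
        causes.append("SPARING_COMPLETE")
    return causes


def should_start_sparing(spare_channel_exists, error_count, seconds_since_clear,
                         background_errors_per_second):
    """Suggest sparing only when a spare exists and errors arrive faster
    than the expected background rate (a simplified reading of the table)."""
    if not spare_channel_exists or seconds_since_clear <= 0:
        return False
    return error_count / seconds_since_clear > background_errors_per_second


# Two causes are pending; software addresses one and leaves the other
# for the next event, as the table recommends.
causes = pending_causes(0b0100, 0, 1)
first = causes[0]
start = should_start_sparing(True, error_count=100, seconds_since_clear=10.0,
                             background_errors_per_second=1.0)
```

Handling only the first pending cause per event mirrors the recommendation that the other cause remain for the second event.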