Processor Uncore Configuration Registers
454 Datasheet, Volume 2
4.2.16.18 DEVTAG_CNTRL[0:7]—Device Tagging Control for Logical
Rank 0 Register
Usage model – When the number of correctable errors (CORRERRCNT_x) from a
particular rank exceeds the corresponding threshold (CORRERRTHRSHLD_y), hardware
generates an SMI and logs (and preserves) the failing device in the FailDevice
field. SMM software reads the failing device for that rank, then sets the EN bit
to enable substitution of the failing device/rank with the parity from the
rest of the devices inline.
For independent channel configuration, each rank can be tagged once. Up to 8 ranks
can be tagged.
There is no hardware logic to report incorrect programming. Unpredictable errors
and/or silent data corruption will result from such a programming error.
If rank sparing is enabled, it is recommended to prioritize rank sparing before
triggering device tagging, because device tagging drops the correction capability
and any subsequent ECC error from this rank would cause an uncorrectable error.
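The ordering recommended above can be sketched as a small decision function. This is only an illustration under stated assumptions: the type and function names are hypothetical, and the spare_consumed flag (whether the spare rank has already been used) is an assumption that does not come from this register description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical action codes for a correctable-error threshold SMI. */
enum ras_action { RAS_NONE, RAS_RANK_SPARE, RAS_DEVICE_TAG };

/* Hypothetical per-rank bookkeeping kept by SMM software. */
struct rank_state {
    bool sparing_enabled;  /* rank sparing is configured */
    bool spare_consumed;   /* assumption: the spare rank is already in use */
    bool already_tagged;   /* EN is already set; a rank can be tagged once */
    unsigned fail_device;  /* value read from the FailDevice field */
};

/* Prefer rank sparing; fall back to device tagging only when the rank has
 * not been tagged yet and FailDevice holds a valid device ID (0 to 17). */
static enum ras_action choose_action(const struct rank_state *s)
{
    assert(s != NULL);
    if (s->sparing_enabled && !s->spare_consumed)
        return RAS_RANK_SPARE;
    if (!s->already_tagged && s->fail_device <= 17u)
        return RAS_DEVICE_TAG;
    return RAS_NONE;
}
```

Because hardware does not report incorrect programming, a sketch like this returns RAS_NONE rather than tagging whenever the FailDevice value is outside the valid range.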
Device Tagging Control CSR for Logical Rank 0
Bus: 1 Device: 16 Function: 2 Offset: 140h - 147h
Bus: 1 Device: 16 Function: 3 Offset: 140h - 147h
Bus: 1 Device: 16 Function: 6 Offset: 140h - 147h
Bus: 1 Device: 16 Function: 7 Offset: 140h - 147h
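As an illustration of the addressing above, the following sketch computes the configuration-space offset of each per-rank register. It assumes that DEVTAG_CNTRL[0:7] are eight byte-wide registers at consecutive offsets 140h through 147h (inferred from the index range and the 8-bit layout), and it uses the standard PCIe ECAM address layout. The macro and function names are hypothetical, and the platform-specific ECAM base address is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the register listing above. */
#define DEVTAG_BUS       1u
#define DEVTAG_DEVICE    16u
#define DEVTAG_BASE_OFF  0x140u  /* DEVTAG_CNTRL[0] */
#define DEVTAG_NUM_RANKS 8u      /* offsets 140h through 147h */

/* Assumption: one byte per logical rank, in ascending rank order. */
static uint32_t devtag_offset(unsigned rank)
{
    assert(rank < DEVTAG_NUM_RANKS);
    return DEVTAG_BASE_OFF + rank;
}

/* Standard PCIe ECAM layout: bus in bits 27:20, device in bits 19:15,
 * function in bits 14:12, register offset in bits 11:0. The result is
 * relative to the ECAM base address. */
static uint32_t ecam_offset(unsigned bus, unsigned dev, unsigned func,
                            unsigned reg)
{
    assert(bus < 256u && dev < 32u && func < 8u && reg < 4096u);
    return ((uint32_t)bus << 20) | ((uint32_t)dev << 15) |
           ((uint32_t)func << 12) | (uint32_t)reg;
}
```

For example, ecam_offset(DEVTAG_BUS, DEVTAG_DEVICE, 2, devtag_offset(0)) evaluates to 0x182140 for logical rank 0 on Function 2.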
Bit Attr Reset Value Description
7 RWS-LB 0b
Device Tagging Enable for this rank
Once the bit is set, the parity device of the rank is used for the replacement
device content. After tagging, the rank will no longer have the "correction"
capability. ECC error "detection" capability will not degrade after setting this bit.
Must never be enabled prior to using IOSAV.
6:5 RV 0h Reserved
4:0 RWS-V 1Fh
Fail Device ID for this rank
When the corresponding rank’s CORRERRCNT is greater than its
CORRERRTHRSHLD, the hardware captures the fail device ID of the rank in the
FailDevice field. Subsequent correctable errors will not change this field until the
field is cleared. Valid range is 0–17 to indicate which x4 device (independent
channel) has failed. If the value is equal to or greater than 24, the field indicates
that no device failure has occurred on this rank.
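The bit layout above can be decoded with a few masks. The sketch below is a minimal illustration that treats the register as a single byte. All macro and function names are hypothetical, and values 18 through 23 are left unclassified because the description does not define them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEVTAG_EN               0x80u  /* bit 7: Device Tagging Enable */
#define DEVTAG_FAILDEV_MASK     0x1Fu  /* bits 4:0: Fail Device ID */
#define DEVTAG_FAILDEV_MAX      17u    /* highest valid device ID */
#define DEVTAG_FAILDEV_NONE_MIN 24u    /* at or above: no device failure */

/* Extract the FailDevice field (bits 4:0). */
static unsigned devtag_fail_device(uint8_t reg)
{
    return reg & DEVTAG_FAILDEV_MASK;
}

/* True when the EN bit (bit 7) is set. */
static bool devtag_enabled(uint8_t reg)
{
    return (reg & DEVTAG_EN) != 0;
}

/* True when FailDevice names a device in the valid range 0 to 17. */
static bool devtag_has_valid_failure(uint8_t reg)
{
    return devtag_fail_device(reg) <= DEVTAG_FAILDEV_MAX;
}

/* True when FailDevice reports that no device failure has occurred. */
static bool devtag_no_failure(uint8_t reg)
{
    return devtag_fail_device(reg) >= DEVTAG_FAILDEV_NONE_MIN;
}

/* Value to write back to enable tagging; all other bits are preserved.
 * Only meaningful when a valid failing device has been captured. */
static uint8_t devtag_with_enable(uint8_t reg)
{
    assert(devtag_has_valid_failure(reg));
    return (uint8_t)(reg | DEVTAG_EN);
}
```

With the reset value (EN = 0b, FailDevice = 1Fh) the register reads 1Fh, so devtag_no_failure returns true and devtag_enabled returns false.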