Specifications

BIOS Error Handling QSSC-S4R Technical Product Specification
Sensor Name | Sensor Number | Sensor Type | E/R Type | Sensor-specific Offset | ED1 | ED2 | ED3
(continued) ... Cor Sensor') | | 13h | 71h | | 3:0 = 04h: Replay Timer timeout | |
PCIe Correctable Advisory non-fatal Error (received ERR_COR message) ('PCIe Cor Sensor') | 05h | Critical Interrupt, 13h | OEM-specified, 71h | 05h | 7:6 = 10b; 5:4 = 10b; 3:0 = 05h: Advisory Non-fatal | Bus | Device/Function
PCIe Correctable Link bandwidth changed (ECN) Error ('PCIe Cor Sensor') | 05h | Critical Interrupt, 13h | OEM-specified, 71h | 06h | 7:6 = 10b; 5:4 = 10b; 3:0 = 06h: Link BW Changed | Bus | Device/Function
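As an illustration of how the three ED1 fields listed for the PCIe correctable sensor combine into a single byte, here is a minimal Python sketch. The function name pack_ed1 and the dictionary PCIE_COR_OFFSETS are hypothetical names introduced only for this example; the offset values and their meanings come from the table rows. The reading of 10b as "OEM code in the corresponding event data byte" follows the usual IPMI event data convention and is stated here as an assumption, not as something this specification spells out.

```python
# Offsets from the ED1 column of the PCIe correctable sensor rows.
PCIE_COR_OFFSETS = {
    0x04: "Replay Timer timeout",
    0x05: "Advisory Non-fatal",
    0x06: "Link BW Changed",
}


def pack_ed1(ed2_code: int, ed3_code: int, offset: int) -> int:
    """Pack bits 7:6, 5:4 and 3:0 into one Event Data 1 byte (hypothetical helper)."""
    if not (0 <= ed2_code <= 0b11 and 0 <= ed3_code <= 0b11 and 0 <= offset <= 0x0F):
        raise ValueError("field out of range")
    # Bits 7:6 describe ED2, bits 5:4 describe ED3, bits 3:0 carry the offset.
    return (ed2_code << 6) | (ed3_code << 4) | offset


for offset, name in PCIE_COR_OFFSETS.items():
    # 10b in both upper fields, as listed in the table.
    print(f"{name}: ED1 = {pack_ed1(0b10, 0b10, offset):02X}h")
```

With both upper fields set to 10b, the offset simply lands in the low nibble of the byte, so the three events differ only in that nibble.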
21.2.3.2 Legacy PCI Sensors
PCI and PCI-X devices report errors via the legacy PERR# or SERR# signaling mechanism. For these devices, the
BIOS defines two link sensors, one per error signal.
Both sensors are associated with the fatal/uncorrectable error classification. For a further description of the PCI
subsystem, see Section 16.3.
Table 161. Legacy PCI Sensors
Sensor Name | Sensor Number | Sensor Type | E/R Type | Sensor-specific Offset | ED1 | ED2 | ED3
PCI Legacy SERR# Error ('PCI Sensor') | 03h | Critical Interrupt, 13h | Sensor-specific Discrete, 6Fh | 05h | 7:6 = 10b; 5:4 = 10b; 3:0 = 0100b | Bus | Device/Function
PCI Legacy PERR# Error ('PCI Sensor') | 03h | Critical Interrupt, 13h | Sensor-specific Discrete, 6Fh | 04h | 7:6 = 10b; 5:4 = 10b; 3:0 = 0101b | Bus | Device/Function
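The event data layout in Table 161 can be unpacked with a few shifts and masks. The following Python sketch is illustrative only: the names PciErrorEvent and decode_pci_event are hypothetical, the sample bytes are made up, and the split of ED3 into a 5-bit device number (bits 7:3) and a 3-bit function number (bits 2:0) is an assumption based on the common PCI device/function encoding, since the table only says "Device/Function".

```python
from typing import NamedTuple


class PciErrorEvent(NamedTuple):
    ed2_code: int   # ED1 bits 7:6
    ed3_code: int   # ED1 bits 5:4
    offset: int     # ED1 bits 3:0
    bus: int        # ED2
    device: int     # ED3 bits 7:3 (assumed layout)
    function: int   # ED3 bits 2:0 (assumed layout)


def decode_pci_event(ed1: int, ed2: int, ed3: int) -> PciErrorEvent:
    """Split the three event data bytes into their fields (hypothetical helper)."""
    for byte in (ed1, ed2, ed3):
        if not 0 <= byte <= 0xFF:
            raise ValueError("event data bytes must be in the range 00h-FFh")
    return PciErrorEvent(
        ed2_code=(ed1 >> 6) & 0b11,
        ed3_code=(ed1 >> 4) & 0b11,
        offset=ed1 & 0x0F,
        bus=ed2,
        device=(ed3 >> 3) & 0x1F,
        function=ed3 & 0x07,
    )


# Made-up sample bytes, not taken from a real SEL record.
event = decode_pci_event(0xA5, 0x03, 0x1A)
print(event)
```

A tool that reads the SEL could use a decoder of this shape to turn an entry into a bus/device/function location before looking up the affected slot.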
21.2.3.3 Memory Sensors
Memory errors detected during system operation are reported by raising an SMI so that they can be handled immediately,
before processing continues, because these errors are potentially catastrophic. Continuing to perform the task at hand
can cause incorrect execution, data loss, or data corruption, depending on the type of error detected.
Note that memory errors are also reported by the BIOS during POST memory testing and initialization. These errors
are not reported and logged by the SMI mechanism, but are typically logged to SEL by the BIOS POST process.
There are three broad categories of errors recognized and reported in the Intel® 7500 Chipset by the BIOS SMI error
handler: ECC-based errors, Address Parity errors, and RAS-based errors.
ECC errors are divided into Uncorrectable ECC Errors and Correctable ECC Errors. A "Correctable ECC Error" actually
represents a threshold overflow: more Correctable Errors than the threshold allows were detected at the memory
controller level for a given DIMM within a given timeframe. In both cases, the error can be narrowed down to particular
DIMM(s). The BIOS SMI error handler uses this information to log the data to the BMC SEL and identify the failing DIMM.
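The threshold-overflow idea behind the Correctable ECC Error event can be sketched as a per-DIMM count within a time window. This is a simplified software model, not the chipset's or the BIOS's actual mechanism, which this section does not describe in detail. The class name CorrectableEccTracker, the DIMM labels, and the threshold and window values are all hypothetical.

```python
from collections import defaultdict, deque


class CorrectableEccTracker:
    """Toy model: report an overflow when one DIMM exceeds a threshold in a window."""

    def __init__(self, threshold: int, window_seconds: float):
        self.threshold = threshold      # hypothetical value, not from the specification
        self.window = window_seconds    # hypothetical value, not from the specification
        self._events = defaultdict(deque)

    def record(self, dimm: str, timestamp: float) -> bool:
        """Record one correctable error; return True on threshold overflow."""
        events = self._events[dimm]
        events.append(timestamp)
        # Drop errors that fell out of the time window.
        while events and timestamp - events[0] > self.window:
            events.popleft()
        if len(events) > self.threshold:
            events.clear()  # restart counting once the overflow is reported
            return True
        return False


tracker = CorrectableEccTracker(threshold=3, window_seconds=10.0)
for t in (0.0, 1.0, 2.0, 3.0):
    if tracker.record("DIMM_A1", t):
        print(f"t={t}: threshold overflow on DIMM_A1, log a Correctable ECC Error")
```

Keeping a separate count per DIMM is what allows the overflow to be attributed to a particular module; isolated correctable errors that are spread far apart in time never trigger an event in this model.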