Datasheet

Reliability, Availability, Serviceability (RAS)
Intel® Xeon® Processor C5500/C3500 Series
Datasheet, Volume 1 February 2010
384 Order Number: 323103-001
Local clusters map detected errors to three error severities and report them to the global error
logic. These errors are sorted into fatal and non-fatal and reported to the respective global error
status register, with severity 2 as fatal and severities 0 and 1 as non-fatal. When a local cluster
reports an error, the corresponding bit in the global fatal or non-fatal error status register is set.
Software clears the error bit by writing 1 to the bit. Each error can be individually masked by the
global error control registers. If an error is masked, the corresponding status bit is not set for any
subsequently reported error. The global error control register is non-sticky and is cleared by reset.
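The status, mask, and write-1-to-clear behavior described above can be sketched as a small behavioral model. This is only an illustration, not the actual register layout: the class name, the bit positions, the single shared mask register, and the mask polarity (1 = masked) are assumptions made for the example.

```python
FATAL_SEVERITY = 2  # per the text: severity 2 is fatal, 0 and 1 are non-fatal


class GlobalErrorStatus:
    """Toy model of the global fatal/non-fatal error status registers.

    Hypothetical: one bit per reporting cluster, one shared mask register,
    and mask polarity 1 = masked. The datasheet text does not give these.
    """

    def __init__(self):
        self.fatal_status = 0
        self.nonfatal_status = 0
        self.mask = 0  # stands in for the global error control register

    def report(self, cluster_bit, severity):
        """A local cluster reports an error of the given severity (0, 1, or 2)."""
        bit = 1 << cluster_bit
        if self.mask & bit:
            return  # masked: the status bit is not set
        if severity == FATAL_SEVERITY:
            self.fatal_status |= bit
        else:
            self.nonfatal_status |= bit

    def write_fatal_status(self, value):
        """Software write: every 1 in value clears the matching status bit."""
        self.fatal_status &= ~value

    def write_nonfatal_status(self, value):
        """Software write: every 1 in value clears the matching status bit."""
        self.nonfatal_status &= ~value

    def reset(self):
        """The control register is non-sticky, so reset clears it.

        Status register behavior across reset is not modeled here.
        """
        self.mask = 0


regs = GlobalErrorStatus()
regs.report(cluster_bit=0, severity=2)   # fatal error from cluster 0
regs.report(cluster_bit=1, severity=1)   # non-fatal error from cluster 1
regs.mask = 0b100                        # mask cluster 2
regs.report(cluster_bit=2, severity=2)   # ignored because it is masked
```

Writing 1 to a set bit, for example `regs.write_fatal_status(0b1)`, clears it, while writing 0 leaves the bit unchanged.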
Global Log Registers
The global error log registers log the errors reported by the IIO clusters. Local clusters map the
detected errors to three error severities and report them to the global error logic. The three error
severities are divided into fatal and non-fatal errors that are logged separately by the FERR and
NERR registers. Each bit in the FERR/NERR register is associated with a specific interface/cluster
(e.g., a PCIe port). Each bit can be individually cleared by writing 1 to the bit. FERR logs the first
report of an error, while NERR logs the subsequent reports of other errors. The time stamp log for
the FERR and NERR provides the time at which the error was logged. Software can read this register
to find out which of the local interfaces have reported the error. The FERR log remains valid and
unchanged from the first error detection until the clearing of the corresponding error bit in the
FERR by the software.
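The first-error/next-error split can be illustrated with a minimal model. It is a sketch under stated assumptions: the class and method names are hypothetical, the time stamp is a plain integer supplied by the caller, and the model covers one severity class only (one instance would be needed for fatal and one for non-fatal). How the hardware treats a repeated report from the interface already held in FERR is not stated in the text; the sketch records it in NERR.

```python
class FirstNextErrorLog:
    """Toy model of an FERR/NERR register pair for one severity class."""

    def __init__(self):
        self.ferr = 0          # holds the first reported error only
        self.nerr = 0          # accumulates subsequent reports
        self.ferr_time = None  # time stamp of the first error (hypothetical format)

    def log(self, cluster_bit, now):
        """Record an error report from the interface tied to cluster_bit."""
        bit = 1 << cluster_bit
        if self.ferr == 0:
            # First report: latch it and its time stamp. FERR stays
            # unchanged until software clears the bit.
            self.ferr = bit
            self.ferr_time = now
        else:
            # FERR is occupied, so later reports go to NERR (assumption
            # for repeats from the same interface, see lead-in).
            self.nerr |= bit

    def clear_ferr(self, value):
        """Software write-1-to-clear on FERR."""
        self.ferr &= ~value

    def clear_nerr(self, value):
        """Software write-1-to-clear on NERR."""
        self.nerr &= ~value


log = FirstNextErrorLog()
log.log(cluster_bit=3, now=100)  # first error: lands in FERR
log.log(cluster_bit=5, now=140)  # later error: lands in NERR
```

After these two reports, `log.ferr` identifies interface 3 as the first reporter and `log.nerr` shows that interface 5 reported afterwards. Once software clears the FERR bit, the next report is again treated as a first error.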
Global System Event Register
Errors collected by the global error registers are mapped to system events. The system event
status bit reflects the OR output of all unmasked errors of the associated error severity. Each
system event status bit can be individually masked by the system event control registers.
Masking a system event status bit forces the corresponding bit to 0. When a system event status
bit transitions from 0 to 1, it can trigger one or more system events based on the programming of
the system event map register as shown in Figure 76.
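As a rough illustration of the OR, mask, and 0-to-1 transition behavior described above, the following sketch models the system event status bits. The class name, the one-status-bit-per-severity layout, the mask polarity (1 = masked), and the particular severity-to-event mapping are assumptions for the example; only the event names SMI, CPEI, and NMI come from this section.

```python
class SystemEventLogic:
    """Toy model of system event status, control (mask), and map registers."""

    def __init__(self, event_map):
        self.event_map = dict(event_map)  # severity -> event name (hypothetical map)
        self.event_mask = 0               # bit n = 1 masks the severity-n status bit
        self.status = 0                   # bit n = system event status for severity n

    def update(self, unmasked_errors):
        """Recompute status from the unmasked global error bits.

        unmasked_errors maps severity -> bitmap of unmasked error status bits.
        Returns the events triggered by 0 -> 1 transitions of status bits.
        """
        new_status = 0
        for severity, errors in unmasked_errors.items():
            if errors != 0:  # OR of all unmasked errors of this severity
                new_status |= 1 << severity
        new_status &= ~self.event_mask        # a masked status bit is forced to 0
        rising = new_status & ~self.status    # bits that went from 0 to 1
        self.status = new_status
        return [self.event_map[s] for s in sorted(self.event_map)
                if rising & (1 << s)]


# Hypothetical programming of the map: severity 2 -> SMI, 1 -> NMI, 0 -> CPEI.
logic = SystemEventLogic({0: "CPEI", 1: "NMI", 2: "SMI"})
first = logic.update({0: 0, 1: 0, 2: 0b100})   # new severity-2 error
second = logic.update({0: 0, 1: 0, 2: 0b110})  # another severity-2 error, no new edge
```

In this sketch `first` contains the SMI event, while `second` is empty because the severity-2 status bit was already 1 and therefore did not transition.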
Each severity type can be associated with one of the system events: SMI, CPEI, or NMI. In
addition, the error pin registers allow error pin assertion for an error. When an error is reported to
the IIO, the IIO uses the severity level associated with the error to look up the system event that
should be sent to the system. For example, error severity 2 may be mapped to SMI with error[2]
pin enabled. If an error with severity level 2 is reported and logged by the Global Log Register,
Figure 75. IIO Global Error Control/Status Register
[Figure: error severities from the local error registers (PCI-E, Intel® QPI, and IIO internal
errors) feed the Global Error Status Register. Each error status bit can be controlled/masked by
the associated bit in the Global Error Control Register, and the result is passed as an error
severity to the system event registers. The global error control and status registers are
replicated (1 per partition).]