Datasheet

Intel® Xeon® Processor C5500/C3500 Series
February 2010 Datasheet, Volume 1
Order Number: 323103-001 379
Reliability, Availability, Serviceability (RAS)
11.3.2.1.1 Completion/Response Status
A Non-posted Request requires the return of a completion cycle. This provides an opportunity for
the responder to communicate to the requester the success or failure of the request. A status field
can be attached to the completion cycle and sent back to the requester. A successful status signifies
the request was completed without an error. Conversely, a “failed” status denotes that an error has
occurred as the result of processing the request.
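The status exchange described above can be sketched in C. This is an illustrative model only, not the IIO's packet format: the type names, field widths, and the two-value status encoding (`completion_t`, `cpl_status_t`, `make_completion`, `request_succeeded`) are assumptions introduced for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status encoding: the text only distinguishes a
 * successful status from a "failed" one. */
typedef enum {
    CPL_STATUS_SUCCESS = 0,
    CPL_STATUS_FAILED  = 1
} cpl_status_t;

/* Hypothetical completion returned for a non-posted request. */
typedef struct {
    uint16_t     requester_id; /* identifies the requester to answer */
    uint8_t      tag;          /* pairs the completion with its request */
    cpl_status_t status;       /* status field attached to the completion */
} completion_t;

/* Responder side: attach a status that reflects whether processing
 * the request hit an error. */
static completion_t make_completion(uint16_t requester_id, uint8_t tag,
                                    bool error_occurred)
{
    completion_t c;
    c.requester_id = requester_id;
    c.tag = tag;
    c.status = error_occurred ? CPL_STATUS_FAILED : CPL_STATUS_SUCCESS;
    return c;
}

/* Requester side: the request succeeded only if the returned
 * completion carries a successful status. */
static bool request_succeeded(const completion_t *c)
{
    assert(c != NULL);
    return c->status == CPL_STATUS_SUCCESS;
}
```

A requester would call `request_succeeded()` on each returned completion and start its own error handling when the result is false.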
11.3.2.1.2 No Response
For errors that have corrupted the requester’s information (e.g., requester/source ID in the header),
the IIO will not send a response to the requester. This will eventually cause the requester to time out
and trigger an error at the requester.
11.3.2.1.3 Data Poisoning
A Posted Request that does not require a completion cycle needs another form of synchronous error
reporting. When a receiver detects an uncorrectable data error, it must forward the data to the target
with the “bad data” status indication. This form of error reporting is known as “data poisoning”. The
target that receives poisoned data must ignore the data or store it with a “poisoned” indication. Both
PCIe and Intel® QuickPath Interconnect provide a poison bit field in the transaction packet that
indicates the data is poisoned. Data poisoning is not limited to the posted requests. Requests that
require completion with data can also indicate poisoned data.
Since the IIO can be programmed to signal (interrupt or error pin) the detection of poisoned data,
software should ensure that poisoned data is reported by only one agent, preferably the agent that
originally detected the error (the one that poisoned the data).
In general, the IIO forwards the poisoned indication from one interface to another, for example from
Intel® QuickPath Interconnect to PCI Express, from PCI Express to Intel® QuickPath Interconnect, or
from PCI Express to PCI Express.
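The poisoning rules in this section can be summarized in a minimal C sketch. It is a behavioral model under stated assumptions, not a description of the actual PCIe or Intel® QuickPath Interconnect packet layout: `packet_t`, `storage_t`, `forward_packet`, `target_receive`, and the `target_policy_t` choice are hypothetical names, and the poison bit field is modeled as a single boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet: payload plus a flag standing in for the
 * poison bit field carried in the transaction packet. */
typedef struct {
    uint32_t data;
    bool     poisoned;
} packet_t;

/* Hypothetical target storage that can keep a "poisoned" indication. */
typedef struct {
    uint32_t data;
    bool     poisoned;
    bool     valid;
} storage_t;

/* The two behaviors the text allows a target that receives poison. */
typedef enum { TARGET_IGNORE, TARGET_STORE_POISONED } target_policy_t;

/* Receiver/forwarder: data with an uncorrectable error is still
 * forwarded, but marked as bad. Poison already present on the
 * incoming packet is carried to the outgoing interface unchanged. */
static packet_t forward_packet(packet_t in, bool uncorrectable_error_detected)
{
    packet_t out = in;
    if (uncorrectable_error_detected) {
        out.poisoned = true;
    }
    return out;
}

/* Target: either ignore poisoned data or store it together with the
 * poisoned indication. Clean data is always stored. */
static void target_receive(storage_t *slot, packet_t p, target_policy_t policy)
{
    assert(slot != NULL);
    if (p.poisoned && policy == TARGET_IGNORE) {
        return; /* slot is left untouched */
    }
    slot->data = p.data;
    slot->poisoned = p.poisoned;
    slot->valid = true;
}
```

In this model only the hop where `uncorrectable_error_detected` is true originates the poison; later hops merely propagate it, which matches the recommendation that a single agent report the error.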
11.3.2.1.4 Time-out
A time-out error indicates that a transaction failed to complete due to expiration of the time-out
counter. This could be a result of corrupted link packets, I/O interface errors, etc. In the IIO, if a
transaction failed to complete within the time-out value, then an error is logged to indicate the failure.
Software has the option to either enable or disable the signaling (via error pin or interrupt) of the
time-out error. On a forwarded transaction for Intel® QuickPath Interconnect or PCIe, the transaction
is completed with a completer abort (PCIe) response status. On IIO-initiated transactions (such as
DMA or interrupts), the IIO drops the transaction. Depending on the cause of the error, the
fail/time-out response may be elevated to a fatal error, resulting in system/partition reset.
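The time-out handling described above might be modeled as follows. This is a sketch under assumptions: the structure and function names (`timeout_ctl_t`, `txn_t`, `check_timeout`), the use of an abstract `elapsed` count, and the choice to treat `elapsed >= timeout_value` as expiry are all hypothetical. Escalation to a fatal error is deliberately left out because the text does not say which causes trigger it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { TXN_FORWARDED, TXN_IIO_INITIATED } txn_origin_t;
typedef enum { TXN_PENDING, TXN_COMPLETER_ABORT, TXN_DROPPED } txn_outcome_t;

/* Hypothetical outstanding transaction. */
typedef struct {
    txn_origin_t origin;
    uint32_t     elapsed;   /* time since issue, in arbitrary units */
} txn_t;

/* Hypothetical control/status state for time-out handling. */
typedef struct {
    uint32_t timeout_value;  /* programmed time-out, same units as elapsed */
    bool     signal_enabled; /* software choice: signal via pin/interrupt */
    bool     error_logged;   /* set whenever a time-out is detected */
    bool     error_signaled; /* set only if signaling is enabled */
} timeout_ctl_t;

static txn_outcome_t check_timeout(timeout_ctl_t *ctl, const txn_t *t)
{
    assert(ctl != NULL && t != NULL);

    /* Assumption: the counter expires once elapsed reaches the value. */
    if (t->elapsed < ctl->timeout_value) {
        return TXN_PENDING;
    }

    /* The error is always logged; signaling is optional. */
    ctl->error_logged = true;
    if (ctl->signal_enabled) {
        ctl->error_signaled = true;
    }

    /* Forwarded transactions complete with a completer abort status;
     * IIO-initiated ones (DMA, interrupts) are dropped. */
    return (t->origin == TXN_FORWARDED) ? TXN_COMPLETER_ABORT : TXN_DROPPED;
}
```

Keeping logging unconditional and signaling conditional mirrors the split in the text between recording the failure and optionally notifying software.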
11.3.2.2 Asynchronous Error Reporting
Asynchronous error reporting is used to signal the system of detected errors. For errors that require
immediate attention, errors not associated with a transaction, or error events requiring system
handling, an asynchronous report is used. Asynchronous error reporting is controlled through the IIO
error registers. These registers enable the IIO to report various errors via system events (e.g., SMI,
CPEI, etc.). In addition, the IIO provides standard sets of error registers as specified in the PCIe
specification.
IIO error registers provide software with the flexibility to map an error to one of three error severities.
Software associates each error severity with one of the supported inband messages, or disables
inband messaging for that severity. Error pin assertion is likewise enabled or disabled for each error
severity. Upon detection of an error of a given severity, the associated events are triggered, conveying
the error indication through inband and/or outband signaling. Asynchronous error reporting methods
are described as follows.
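As a reading aid for the severity mapping in the preceding paragraph, the two-level lookup (error to severity, severity to inband message and error pin) can be sketched in C. The register layout here is invented for illustration: `err_cfg_t`, `report_error`, the number of error sources, and the use of SMI and CPEI as the selectable inband messages are assumptions, not the actual IIO register definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { NUM_SEVERITIES = 3, NUM_ERRORS = 4 }; /* NUM_ERRORS is arbitrary */

/* Hypothetical inband message choices. SMI and CPEI are the system
 * events the text names as examples; the real set may differ. */
typedef enum { INBAND_DISABLED, INBAND_SMI, INBAND_CPEI } inband_msg_t;

/* Hypothetical software-programmed configuration. */
typedef struct {
    uint8_t      error_to_severity[NUM_ERRORS];       /* 0..2 per error */
    inband_msg_t severity_msg[NUM_SEVERITIES];        /* inband message */
    bool         severity_pin_enable[NUM_SEVERITIES]; /* error pin */
} err_cfg_t;

/* What gets triggered when an error is detected. */
typedef struct {
    inband_msg_t msg;          /* INBAND_DISABLED means no message sent */
    bool         pin_asserted; /* outband indication */
} err_event_t;

/* Resolve an error to its severity, then to the events configured
 * for that severity. */
static err_event_t report_error(const err_cfg_t *cfg, unsigned error_id)
{
    assert(cfg != NULL);
    assert(error_id < (unsigned)NUM_ERRORS);

    uint8_t sev = cfg->error_to_severity[error_id];
    assert(sev < NUM_SEVERITIES);

    err_event_t ev;
    ev.msg = cfg->severity_msg[sev];
    ev.pin_asserted = cfg->severity_pin_enable[sev];
    return ev;
}
```

Because messages and pin enables hang off the severity rather than the individual error, remapping an error to a different severity changes both of its signaling paths at once.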