Service manual

and PALcode builds a 660 system uncorrectable error frame that is
deposited in the error log. If the CPU detects the error, an error interrupt is
generated for that CPU, the system crashes, and PALcode builds a 670
processor uncorrectable error frame that is deposited in the error log.
Faults are errors that compromise the coherence of the system. When a
fault is detected, a signal is passed to all QBBs. This signal causes the
system, including the CPUs, to reset and all components (ASICs) in the system
to initialize. Error state is latched, and PALcode attempts to build a 660 error
frame that is deposited in the error log.
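As an illustration only, the frame selection described above can be sketched in Python. The names Severity and error_frame_type are hypothetical and are not part of PALcode or any system software. The sketch assumes that the 660 system uncorrectable error frame applies when a component other than a CPU detects the error, and that a fault leads to a 660 frame regardless of which component detected it.

```python
from enum import Enum


class Severity(Enum):
    """Hypothetical labels for the two severities discussed above."""
    UNCORRECTABLE = "uncorrectable"
    FAULT = "fault"


def error_frame_type(severity, detected_by_cpu):
    """Return the error frame number that PALcode builds (or attempts to build)."""
    if severity is Severity.FAULT:
        # A fault resets the system; PALcode then attempts a 660 frame.
        return 660
    # Uncorrectable error: 670 if a CPU detected it, otherwise 660
    # (the "otherwise" case is an assumption, see the lead-in).
    return 670 if detected_by_cpu else 660
```

For example, error_frame_type(Severity.UNCORRECTABLE, detected_by_cpu=True) returns 670.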
There are six error classes:
ECC errors: Most data paths and large data stores are protected by ECC.
ECC provides single-bit error detection and correction, and double-bit error
detection. For non-coherence-related data stores (memory), single-bit errors are
correctable and multi-bit errors are uncorrectable. For coherence-related data
stores (directory), single-bit errors are correctable and multi-bit errors are faults.
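The ECC rules above reduce to a small decision on two inputs. The following Python sketch is a reading aid, not system code; the function name classify_ecc_error and the string labels are hypothetical. It assumes the error has already been detected and that the number of bits in error is known.

```python
def classify_ecc_error(bits_in_error, coherence_related):
    """Classify a detected ECC error according to the rules in the text.

    bits_in_error     -- number of bits found to be in error (1 or more)
    coherence_related -- True for a directory store, False for memory
    """
    if bits_in_error < 1:
        raise ValueError("bits_in_error must be at least 1")
    if bits_in_error == 1:
        # Single-bit errors are correctable in both kinds of data store.
        return "correctable"
    # Multi-bit errors: a fault in the directory, uncorrectable in memory.
    return "fault" if coherence_related else "uncorrectable"
```

For example, a double-bit error in the directory, classify_ecc_error(2, coherence_related=True), is classified as a fault.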
Parity errors: Some data paths and data stores are protected by parity.
Parity errors on data paths and in non-coherence-related data stores are
uncorrectable errors. Parity errors on address paths and in coherence-related
data stores are faults.
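The Python sketch below shows why a parity error can be detected but never corrected, and how the location of the error determines its severity. It is a sketch under stated assumptions: even parity is assumed (the text does not say which parity sense the hardware uses), and the names Location, even_parity_bit, parity_ok, and classify_parity_error are hypothetical.

```python
from enum import Enum


class Location(Enum):
    """Hypothetical labels for where a parity error is detected."""
    DATA_PATH = "data path"
    ADDRESS_PATH = "address path"
    NON_COHERENCE_STORE = "non-coherence-related data store"
    COHERENCE_STORE = "coherence-related data store"


def even_parity_bit(word):
    """Return the bit that makes the total number of 1 bits even (assumed sense)."""
    return bin(word).count("1") % 2


def parity_ok(word, stored_parity):
    """True if the word still matches its stored parity bit."""
    return even_parity_bit(word) == stored_parity


def classify_parity_error(location):
    """Apply the rule in the text: address paths and coherence stores fault."""
    if location in (Location.ADDRESS_PATH, Location.COHERENCE_STORE):
        return "fault"
    return "uncorrectable"
```

A single flipped bit makes parity_ok return False, but the check gives no indication of which bit flipped, so the value cannot be repaired.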
Forward progress errors: If a given transaction in a quad switch is either
not issued or does not complete, a forward progress error is detected. Such
errors are faults.
Overflow errors: If a system component (an ASIC) receives a new reference
after flow control should have prevented one, an overflow error is detected.
Such errors are faults.
Command inconsistency errors: System components, typically ASICs,
check certain internal consistencies and can report errors if consistency rules
are not met. Examples of such errors include memory access violations and
data command packet reception during ineligible cycles. Such errors are faults.
NXM errors: Memory or I/O references that are out of range cause NXM
(non-existent memory) errors. NXM errors can be faults, uncorrectable errors, or
not an error at all, depending on the component detecting the error, the
configuration register settings, and the command executing.
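The six classes can be summarized as a lookup of the outcomes each class can produce, taken directly from the descriptions above. The Python sketch below is a study aid only; the dictionary POSSIBLE_OUTCOMES, the function always_fault, and the string labels are hypothetical and do not correspond to any register or log field.

```python
# Possible outcomes for each error class, as stated in the text.
POSSIBLE_OUTCOMES = {
    "ECC": {"correctable", "uncorrectable", "fault"},
    "parity": {"uncorrectable", "fault"},
    "forward progress": {"fault"},
    "overflow": {"fault"},
    "command inconsistency": {"fault"},
    "NXM": {"fault", "uncorrectable", "no error"},
}


def always_fault(error_class):
    """True if every error of this class is a fault."""
    return POSSIBLE_OUTCOMES[error_class] == {"fault"}


# Classes for which no configuration or location makes the error less severe.
ALWAYS_FAULT_CLASSES = sorted(c for c in POSSIBLE_OUTCOMES if always_fault(c))
```

Only the ECC class has a correctable outcome; forward progress, overflow, and command inconsistency errors are always faults.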
For a full description of errors and their consequences, see the AlphaServer
GS80/160/320 System Programmers Manual.