32 SPARC Enterprise T1000 Server Administration Guide April 2007
Error Handling Summary
Error handling during the power-on sequence falls into one of the following three
cases:
• If no errors are detected by POST or OpenBoot Diagnostics, the system attempts to boot if auto-boot? is true.

• If only nonfatal errors are detected by POST or OpenBoot Diagnostics, the system attempts to boot if both auto-boot? and auto-boot-on-error? are true. Nonfatal errors include the following:
  - Ethernet interface failure.
  - Serial interface failure.
  - PCI-Express card failure.
  - Memory failure. When a DIMM fails, the firmware unconfigures the entire logical bank associated with the failed module. At least one other, nonfailing logical bank must be present for the system to attempt a degraded boot. Note that certain DIMM failures might not be diagnosable to a single DIMM. These failures are fatal and result in both logical banks being unconfigured.

  Note: If POST or OpenBoot Diagnostics detect a nonfatal error associated with the normal boot device, the OpenBoot firmware automatically unconfigures the failed device and tries the next-in-line boot device, as specified by the boot-device configuration variable.

• If a fatal error is detected by POST or OpenBoot Diagnostics, the system does not boot, regardless of the settings of auto-boot? or auto-boot-on-error?. Fatal nonrecoverable errors include the following:
  - Failure of any CPU
  - Failure of all logical memory banks
  - Flash RAM cyclical redundancy check (CRC) failure
  - Critical field-replaceable unit (FRU) PROM configuration data failure
  - Critical system configuration SEEPROM read failure
  - Critical application-specific integrated circuit (ASIC) failure
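The three error-handling cases can be summarized as a small decision function. This is a minimal sketch for illustration, not firmware code. The short error keys (such as "cpu" and "pcie_card") and the function names attempts_boot and next_boot_device are hypothetical. The boot-device helper assumes that boot-device holds a space-separated list of device aliases, such as "disk net", and that the firmware tries them in order.

```python
# Error categories taken from the nonfatal and fatal lists above.
# The short string keys are hypothetical labels, not firmware identifiers.
FATAL_ERRORS = frozenset({
    "cpu",               # any CPU failed
    "all_memory_banks",  # all logical memory banks failed
    "flash_crc",         # Flash RAM CRC failure
    "fru_prom",          # critical FRU PROM configuration data failure
    "seeprom_read",      # critical system configuration SEEPROM read failure
    "asic",              # critical ASIC failure
})
NONFATAL_ERRORS = frozenset({"ethernet", "serial", "pcie_card", "memory_bank"})


def attempts_boot(errors, auto_boot, auto_boot_on_error):
    """Return True if the system would attempt to boot after POST/OpenBoot Diagnostics."""
    errors = set(errors)
    unknown = errors - FATAL_ERRORS - NONFATAL_ERRORS
    if unknown:
        raise ValueError(f"unclassified errors: {sorted(unknown)}")
    if errors & FATAL_ERRORS:
        # Fatal error: no boot, whatever auto-boot? and auto-boot-on-error? say.
        return False
    if not errors:
        # No errors: booting depends on auto-boot? alone.
        return auto_boot
    # Only nonfatal errors: both variables must be true.
    return auto_boot and auto_boot_on_error


def next_boot_device(boot_device, unconfigured):
    """Return the first device in a space-separated boot-device list that is still configured."""
    for device in boot_device.split():
        if device not in unconfigured:
            return device
    return None  # every listed device has been unconfigured


if __name__ == "__main__":
    print(attempts_boot([], auto_boot=True, auto_boot_on_error=False))            # True
    print(attempts_boot(["ethernet"], auto_boot=True, auto_boot_on_error=False))  # False
    print(attempts_boot(["cpu"], auto_boot=True, auto_boot_on_error=True))        # False
    print(next_boot_device("disk net", unconfigured={"disk"}))                    # net
```

The fatal check runs first so that a fatal error overrides both configuration variables, as the third case requires.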
For more information about troubleshooting fatal errors, refer to the service manual
for your server.
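The memory-failure rule from the nonfatal list can be modeled on its own. This is a minimal sketch, assuming two logical banks (the text refers to "both logical banks"). The bank names, DIMM labels, and the functions surviving_banks and memory_outcome are hypothetical and do not reflect the server's actual DIMM layout.

```python
def surviving_banks(banks, failed_dimms, isolated=True):
    """
    banks: logical bank name -> set of DIMM labels (hypothetical names).
    failed_dimms: DIMMs to which diagnostics attributed a failure.
    isolated: False when a failure could not be traced to a single DIMM.
    Returns the set of logical banks that remain configured.
    """
    if not isolated:
        # Failure not diagnosable to one DIMM: every logical bank is unconfigured.
        return set()
    failed = set(failed_dimms)
    # A failed DIMM takes its whole logical bank out of the configuration.
    return {name for name, dimms in banks.items() if not (dimms & failed)}


def memory_outcome(banks, failed_dimms, isolated=True):
    """Classify the memory result as 'ok', 'degraded' (nonfatal), or 'fatal'."""
    remaining = surviving_banks(banks, failed_dimms, isolated)
    if not remaining:
        return "fatal"      # no logical bank left configured
    if len(remaining) < len(banks):
        return "degraded"   # a good bank remains, so a degraded boot is possible
    return "ok"


if __name__ == "__main__":
    # Hypothetical layout: two logical banks with invented DIMM labels.
    BANKS = {"bank0": {"DIMM_A", "DIMM_B"}, "bank1": {"DIMM_C", "DIMM_D"}}
    print(memory_outcome(BANKS, []))                    # ok
    print(memory_outcome(BANKS, ["DIMM_B"]))            # degraded
    print(memory_outcome(BANKS, ["DIMM_A", "DIMM_C"]))  # fatal
    print(memory_outcome(BANKS, [], isolated=False))    # fatal
```

A "degraded" result corresponds to the nonfatal memory-failure case. A "fatal" result corresponds to the "Failure of all logical memory banks" entry and to failures that cannot be isolated to a single DIMM.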
Reset Scenarios
Three ALOM CMT configuration variables, diag_mode, diag_level, and
diag_trigger, control whether the system runs firmware diagnostics in response
to system reset events.