QSSC-S4R Technical Product Specification BIOS Error Handling
21. BIOS Error Handling
21.1 Fault Resilient Booting
Fault Resilient Booting (FRB) is an Intel-specific feature that detects and handles errors during the system boot
process. The FRB feature guarantees the system boots without hanging. Failures during the booting process that can
be detected and handled by the BIOS and BMC include:
• BSP POST Failure (FRB-2)
• OS load failures
21.1.1 BSP POST Failure (FRB-2)
FRB-2 is a process that uses a BMC watchdog timer, which can be configured to reset the system if it hangs during
POST. The FRB-2 function can be enabled or disabled in the Setup System Management screen (see Section 17.2.3.5
for details). By default, the FRB-2 Timer is enabled. When enabled, the timer is started at the beginning of POST,
and the BIOS sets it to six minutes.
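As an illustration, the six-minute FRB-2 setting can be written as an IPMI "Set Watchdog Timer" request. This sketch is not taken from this specification. It assumes the BMC follows the standard IPMI 2.0 request layout (timer use 1 = BIOS FRB2, timeout action 1 = hard reset, countdown in 100 ms units). The function and constant names are hypothetical.

```python
import struct

# Assumed values from the IPMI 2.0 "Set Watchdog Timer" request layout.
TIMER_USE_BIOS_FRB2 = 0x01
TIMER_USE_OS_LOAD = 0x03
ACTION_HARD_RESET = 0x01


def set_watchdog_request(timer_use, action, timeout_seconds):
    """Return the 6-byte request data for a Set Watchdog Timer command."""
    counts = int(round(timeout_seconds * 10))  # countdown is in 100 ms units
    if not 0 <= counts <= 0xFFFF:
        raise ValueError("timeout does not fit in a 16-bit countdown")
    return struct.pack(
        "<BBBBH",
        timer_use & 0x07,  # byte 1: timer use, expiration logging left enabled
        action & 0x07,     # byte 2: timeout action, no pre-timeout interrupt
        0x00,              # byte 3: pre-timeout interval (unused here)
        1 << timer_use,    # byte 4: clear this timer's stale expiration flag
        counts,            # bytes 5-6: initial countdown, LSB first
    )


# Six minutes = 360 s = 3600 counts = 0x0E10.
frb2_request = set_watchdog_request(TIMER_USE_BIOS_FRB2, ACTION_HARD_RESET, 6 * 60)
print(frb2_request.hex())  # 01010002100e
```

The same helper would cover the OS Boot Timer by passing TIMER_USE_OS_LOAD and the timeout chosen in Setup. The actual bytes the BIOS sends may differ.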
The BIOS disables the watchdog timer before prompting the user for a password to enter Setup or the Boot Popup
Menu <F6>, while scanning for option ROMs, and when the user enters the BIOS Setup or the Boot Popup Menu
<F6>. Finally, at the end of a successful POST, the BIOS disables the FRB-2 Timer before initiating OS boot.
If the FRB-2 Timer times out during POST before the BIOS disables the FRB-2 timer, the system is assumed to have
hung during POST, and the BMC generates an asynchronous system reset (ASR).
The BMC retains status bits that the BIOS reads during the next POST to determine whether an FRB timeout occurred on
the previous boot, to log the appropriate event to the system event log (SEL), and to display an appropriate error
message to the user. However, when an FRB-2 timeout occurs, the BIOS does not send a “Set Fault Indication” command to
the BMC.
In the case of an FRB-2 failure, two events are logged to the SEL:
1. When the BMC services the watchdog timer timeout and initiates a system reset, the BMC logs a “Watchdog Timer
Expiration” event to the SEL, specifying the timer purpose as “BIOS FRB2”.
2. After the system reset, during the following POST, the BIOS queries the BMC and determines that the system had
experienced an FRB-2 timeout/reset on the previous boot. The BIOS sets a POST Error Code of 0x8190, which will
be displayed in the Error Manager if “POST Error Pause” is enabled. In any case, the Error Code 0x8190 is logged
to the SEL.
For details on the format of the events logged, see Section 21.2.3.6.
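The second event above can be sketched as a small decision routine. Only the POST Error Code 0x8190 and the "POST Error Pause" behavior come from this section. The sketch assumes the retained status bits are the IPMI timer-use expiration flags (bit 1 = BIOS FRB2). All function names are hypothetical.

```python
FRB2_EXPIRED_FLAG = 1 << 1        # assumed: IPMI expiration-flag bit for BIOS FRB2
POST_ERROR_FRB2_TIMEOUT = 0x8190  # POST Error Code given in this section


def post_errors_for_previous_boot(expiration_flags):
    """Return the POST Error Codes raised for the flags retained by the BMC."""
    errors = []
    if expiration_flags & FRB2_EXPIRED_FLAG:
        errors.append(POST_ERROR_FRB2_TIMEOUT)
    return errors


def handle_post_errors(errors, post_error_pause):
    """Log every code to the SEL; display it only if POST Error Pause is on."""
    logged = list(errors)
    displayed = list(errors) if post_error_pause else []
    return logged, displayed


errors = post_errors_for_previous_boot(0x02)
logged, displayed = handle_post_errors(errors, post_error_pause=False)
print([hex(code) for code in logged], displayed)
```

With "POST Error Pause" disabled, the code is still logged but nothing appears in the Error Manager.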
21.1.2 Operating System Load Failure (OS Boot Timer)
The BIOS provides an additional watchdog timer that extends fault resilient booting to the OS load. This timer option is
disabled by default. The timeout value and the option to enable the timer are configured in the System
Management screen of the BIOS Setup.
When enabled in the BIOS setup, the BIOS sets the OS Boot Timer in the BMC just at the transition from POST to the
Operating System loader. It is the responsibility of the OS or an application to disable this timer once the OS has
successfully loaded. If the watchdog timer times out before it is stopped, the system is presumed to be hung during OS
boot, and the BMC generates a System Reset to restart POST and try the boot again.
Note: Enabling this option without an OS or a server management application installed that supports this feature causes the system
to reboot repeatedly when the timer expires without being turned off – the system will not be able to boot successfully.
See the application or OS documentation to make sure that this feature is supported for your OS environment.
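The reboot loop described in the note can be illustrated with a toy model. This is a simplified sketch, not BIOS or BMC code. The function name, parameters, and outcome strings are hypothetical, and it assumes each boot attempt behaves identically.

```python
def simulate_boots(timeout_s, os_load_s, agent_stops_timer, max_attempts=5):
    """Return one outcome per boot attempt: "booted" or "reset"."""
    outcomes = []
    for _ in range(max_attempts):
        # The boot succeeds only if something stops the OS Boot Timer
        # before it expires; otherwise the BMC resets the system.
        if agent_stops_timer and os_load_s < timeout_s:
            outcomes.append("booted")
            break
        outcomes.append("reset")
    return outcomes


print(simulate_boots(600, 120, agent_stops_timer=True))
print(simulate_boots(600, 120, agent_stops_timer=False, max_attempts=3))
```

In the second call nothing ever stops the timer, so every attempt ends in a reset even though the OS itself loads well within the timeout.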
In the case of an OS Boot Timer timeout, two events are logged to the SEL:
1. When the BMC services the watchdog timer timeout and initiates a system reset, the BMC logs a “Watchdog Timer
Expiration” event to the SEL, specifying the timer purpose as “OS Load”.
2. After the system reset, during the following POST, the BIOS queries the BMC and determines that the system had
experienced an OS Load timeout/reset on the previous boot. The BIOS sets a POST Error Code of 0x8198, which