Technical Product Specification
Platform Management Functional Overview Intel
®
Server Board S2400EP TPS
Intel order number G50763-002 Revision 2.0
60
FW defect triggered by a rare sequence of events or a BMC hang due to some type of HW
glitch (for example, power).
This feature is comprised of a set of capabilities whose purpose is to detect misbehaving
subsections of BMC firmware, the BMC CPU itself, or HW subsystems of the BMC component,
and to take appropriate action to restore proper operation. The action taken is dependent on
the nature of the detected failure and may result in a restart of the BMC CPU, one or more
BMC HW subsystems, or a restart of malfunctioning FW subsystems.
The BMC watchdog feature will only allow up to three resets of the BMC CPU (such as HW
reset) or entire FW stack (such as a SW reset) before giving up and remaining in the uBOOT
code. This count is cleared upon cycling of power to the BMC or upon continuous operation of
the BMC without a watchdog-generated reset occurring for a period of > 30 minutes. The BMC
FW logs a SEL event indicating that a watchdog-generated BMC reset (either soft or hard reset)
has occurred. This event may be logged after the actual reset has occurred. Refer sensor
section for details for the related sensor definition. The BMC will also indicate a degraded
system status on the Front Panel Status LED after an BMC HW reset or FW stack reset. This
state (which follows the state of the associated sensor) will be cleared upon system reset or
(AC or DC) power cycle.
Note: A reset of the BMC may result in the following system degradations that will require a
system reset or power cycle to correct:
1. Timeout value for the rotation period can be set using this parameter. Potentially
incorrect ACPI Power State reported by the BMC.
2. Reversion of temporary test modes for the BMC back to normal operational modes.
3. FP status LED and DIMM fault LEDs may not reflect BIOS detected errors.
6.5 Fault Resilient Booting (FRB)
Fault resilient booting (FRB) is a set of BIOS and BMC algorithms and hardware support that
allow a multiprocessor system to boot even if the bootstrap processor (BSP) fails. Only FRB2 is
supported using watchdog timer commands.
FRB2 refers to the FRB algorithm that detects system failures during POST. The BIOS uses
the BMC watchdog timer to back up its operation during POST. The BIOS configures the
watchdog timer to indicate that the BIOS is using the timer for the FRB2 phase of the
boot operation.
After the BIOS has identified and saved the BSP information, it sets the FRB2 timer use bit and
loads the watchdog timer with the new timeout interval.
If the watchdog timer expires while the watchdog use bit is set to FRB2, the BMC (if so
configured) logs a watchdog expiration event showing the FRB2 timeout in the event data bytes.
The BMC then hard resets the system, assuming the BIOS-selected reset as the watchdog
timeout action.
The BIOS is responsible for disabling the FRB2 timeout before initiating the option ROM scan
and before displaying a request for a boot password. If the processor fails and causes an FRB2
timeout, the BMC resets the system.