Specifications
QSSC-S4R Technical Product Specification Operating System Boot, Sleep, and Wake
19.2.3.2 WHEA Software Stack
The Operating System (OS) kernel is responsible for installing the Platform-Specific Hardware Error Drivers (PSHEDs)
and providing them with the necessary services. While the PSHEDs interface with the BIOS for hardware error
flow management, the OS kernel also exposes interfaces and an API (WheaReportHwErr) to user-level applications,
called “HW error event consumers”.
The PSHEDs are responsible for hardware error handling and error flow management. They work with the platform
BIOS to achieve this.
The Low-level Hardware Error Handlers (LLHEHs) are error handlers that receive an interrupt when hardware errors
occur on the platform. PCIe errors are handled by a separate LLHEH from Machine Check Exceptions (MCEs); the MCE
handler is specific to the processor architecture used on the platform.
The BIOS publishes WHEA-specific ACPI tables that describe the platform error interfaces for the OS. The BIOS also
implements the ASL code to support and enable WHEA capability in the platform. The BIOS provides the following ACPI
tables:
• Hardware Error Source Table (HEST) – Lists the platform hardware error sources and describes how error
information is extracted from their error registers.
• Error Injection (EINJ) Table – Details the mechanism to inject a simulated HW error to test the WHEA error flow.
• Error Record Serialization Table (ERST) – Describes the platform's serialization interface to the OS for
persistent storage of WHEA error records.
• Boot Error Record Table (BERT) – Captures fatal errors from the last boot that the BIOS or OS were unable to
process.
19.2.3.3 Error Handling Models
To support WHEA, the Intel® 7500 Chipset BIOS publishes an ACPI table called the Hardware Error Source Table
(HEST), which lists all platform hardware error sources. Either of two error handling models can be
applied to each error source:
• Firmware-first error handling – In the firmware-first model, the error is signaled to the BIOS first via an
SMI. The BIOS processes the error, logs a traditional server management event, clears the error, builds a WHEA
error information record for the OS, and then signals the OS via an SCI or MCE.
• Parallel handling – In the parallel model, the error is signaled to the OS via interrupts and to the BIOS via
an SMI at the same time, each with separate status; the BIOS and OS handle and process the error independently.
The parallel model allows the OS to handle errors natively using standard IPMI formatted SEL logs.
The Intel® 7500 Chipset platform employs both error models.
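The two models can be contrasted with a small sketch that lists the handling steps in order. This is only a descriptive model of the text above, not firmware code; the names `Model` and `error_flow` are hypothetical.

```python
from enum import Enum
from typing import List


class Model(Enum):
    FIRMWARE_FIRST = "firmware-first"
    PARALLEL = "parallel"


def error_flow(model: Model) -> List[str]:
    """Return the ordered handling steps for one error under the given model."""
    if model is Model.FIRMWARE_FIRST:
        # The BIOS sees the error first and notifies the OS only at the end.
        return [
            "SMI to BIOS",
            "BIOS processes error",
            "BIOS logs server management event",
            "BIOS clears error",
            "BIOS builds WHEA error record",
            "SCI or MCE to OS",
        ]
    # Parallel: both sides are signaled together and act independently.
    return [
        "interrupt to OS and SMI to BIOS (separate statuses)",
        "BIOS and OS handle the error independently",
    ]
```

Calling `error_flow(Model.FIRMWARE_FIRST)` shows that the OS is the last party to be notified in the firmware-first model, whereas in the parallel model neither side waits for the other.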
Note: The OS can overwrite the correctable error threshold programmed by the BIOS in the MCi_MISC2 register. Therefore,
a WHEA system event log entry appears only after the error count reaches the threshold value programmed by the OS.
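The effect described in the note can be pictured with a toy model. The class below is hypothetical and does not access any hardware register; in particular, resetting the count after an event is logged is an assumption made only to keep the example simple.

```python
class CorrectableErrorCounter:
    """Toy model of a correctable error threshold that the OS may overwrite."""

    def __init__(self, bios_threshold: int) -> None:
        self.threshold = bios_threshold  # value programmed by the BIOS
        self.count = 0

    def os_overwrite(self, os_threshold: int) -> None:
        # The OS value replaces the BIOS value and governs logging from now on.
        self.threshold = os_threshold

    def record_error(self) -> bool:
        """Count one correctable error; return True when an event is logged."""
        self.count += 1
        if self.count >= self.threshold:
            self.count = 0  # assumption: the count restarts after logging
            return True
        return False
```

For example, if the BIOS programs a threshold of 10 and the OS overwrites it with 3, the third call to `record_error` reports an event, not the tenth.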
19.2.3.4 Persistent Error Record Storage
The BIOS provides Persistent Error Record Storage for the OS, which is required to retain error records between
system boots. The BIOS publishes an Error Record Serialization Table (ERST) that defines the persistent storage
interface mechanism. The OS can communicate error records to the BIOS for storage and retrieval through the ERST.
The BIOS allocates persistent error record storage space in non-volatile memory. The OS can search for, read, or
clear an existing error record, or write a new error record. The error record format is dependent on the OS.
Typically, if an uncorrectable or fatal error occurs, Microsoft Windows* logs the error to persistent storage
before displaying the blue screen.
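The four operations available to the OS (search, read, clear, write) can be sketched with an in-memory stand-in for the non-volatile store. This is a hypothetical model for illustration only: the class name, the use of integer record IDs, and the capacity limit are assumptions, and the actual ERST interface is driven through the serialization actions that the table describes.

```python
from typing import Dict, Optional


class ErrorRecordStore:
    """In-memory stand-in for the persistent store behind the ERST."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # number of records the store can hold
        self._records: Dict[int, bytes] = {}

    def write(self, record_id: int, record: bytes) -> bool:
        """Store a record; return False if the store is full."""
        if record_id not in self._records and len(self._records) >= self.capacity:
            return False
        self._records[record_id] = record
        return True

    def search(self) -> Optional[int]:
        """Return the ID of a stored record, or None if the store is empty."""
        return next(iter(self._records), None)

    def read(self, record_id: int) -> Optional[bytes]:
        """Return the record with the given ID, or None if it is absent."""
        return self._records.get(record_id)

    def clear(self, record_id: int) -> bool:
        """Delete a record; return True if it existed."""
        return self._records.pop(record_id, None) is not None
```

After a reboot, an OS following this pattern would typically call `search` to find a leftover record, `read` it for reporting, and then `clear` it to free the space.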
19.2.4 EFI Optimized Boot
The QSSC-S4R 4S platform allows the system to boot an OS that natively supports UEFI instead of using the
traditional legacy INT19 boot mechanism. Enabling the EfiOptimizedBoot option in BIOS Setup reduces boot time by not
loading any legacy drivers such as the Compatibility Support Module (CSM) that supports legacy INT19 booting.
Note: Enabling the EfiOptimizedBoot option disables booting of all legacy operating systems, such as DOS, and allows
booting only to native EFI versions of Linux, Windows, etc.
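The consequence for the boot list can be sketched as a simple filter. The `BootOption` type and the `bootable_options` function are hypothetical and only model the behavior described above; they do not reflect how the BIOS stores its boot entries.

```python
from typing import List, NamedTuple


class BootOption(NamedTuple):
    name: str
    native_efi: bool  # True if the OS loader is a native EFI application


def bootable_options(options: List[BootOption], efi_optimized_boot: bool) -> List[BootOption]:
    """Return the options that remain bootable for the given setting."""
    if not efi_optimized_boot:
        return list(options)
    # With the option enabled, the CSM is not loaded, so legacy entries drop out.
    return [o for o in options if o.native_efi]
```

With a list containing a DOS entry and a native EFI Linux entry, enabling the option leaves only the Linux entry bootable.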