HP-UX HB v13.00 Ch-08 - Crash Dumps

HP-UX Handbook – Rev 13.00 Page 6 (of 38)

Chapter 08 Crash Dumps

October 29, 2013

Getting an HPMC does not always mean that the hardware is at fault. The HPMC tombstone

needs to be analyzed to determine if the hardware was really at fault. Software defects can result

in HPMC crash events, but such defects are very rare in production-quality software.

NOTE: On Itanium systems the naming is slightly different:

HPMC = MCA (Machine Check Abort)

TOC = INIT

What happens when a system crashes?

Now that you understand the different types of crash events (panic, TOC, and HPMC), let’s see

what the system does to process these events. Processing these events usually requires an

interaction between the hardware and the operating system software. There are well-defined

architected interfaces between hardware and software, for example the PDC (processor

firmware) entry points on the processors and the Interruption Vector Table (IVA) in the kernel.

These interfaces allow the hardware to trigger software entry points (or vice versa) to initiate

logging, analysis and error recovery after a hardware fault.

Some of the information presented here may seem quite in-depth on first reading. You may skim

through it initially. It is important to grasp the concepts presented here, since any investigative

dump analysis work begins with the crash events. It is worthwhile understanding what the

system does in response to crash events and what crucial pieces of information are saved and

where they are stored.

We categorize the crash events into two classes, namely hardware crash events and software

crash events. Here is a description of what the system does to process these.

Hardware crash events

A hardware crash event can be a High Priority Machine Check (HPMC), a Low Priority Machine

Check (LPMC) or a Transfer of Control (TOC). The machine checks are typically caused by

hardware malfunctions or certain classes of bus errors. A TOC, on the other hand, is usually initiated

by the operator in response to system software being stuck in an error state.

When a hardware crash event occurs, the processor immediately branches to the PDC entry point

PDCE_CHECK (for HPMC and LPMC faults) or PDCE_TOC (for TOC). The implementation

details of these PDC entry points are processor dependent. Fundamentally they save the

processor’s state (general, control, space and interruption registers) into Processor Internal

Memory (PIM). The processor then vectors back into the operating system entry point,

HPMC_Vector or TOC_Vector. These entry points are defined in the IVA (Interruption Vector

Table) and in MEM_TOC in Page Zero, respectively.
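
The sequence described above can be pictured with a small Python model. This is only an illustrative sketch, not firmware or kernel code: the names CrashEvent, Processor and handle_hardware_crash are hypothetical, and the register contents are made up. Only the entry point names (PDCE_CHECK, PDCE_TOC, HPMC_Vector, TOC_Vector) and the order of the steps come from the text. The text does not say which operating system entry point is used after an LPMC, so the sketch leaves that case unmapped.

```python
from dataclasses import dataclass, field
from enum import Enum


class CrashEvent(Enum):
    HPMC = "High Priority Machine Check"
    LPMC = "Low Priority Machine Check"
    TOC = "Transfer of Control"


# PDC entry point the processor branches to for each event (from the text).
PDC_ENTRY = {
    CrashEvent.HPMC: "PDCE_CHECK",
    CrashEvent.LPMC: "PDCE_CHECK",
    CrashEvent.TOC: "PDCE_TOC",
}

# OS entry point and where it is defined (from the text).
# LPMC is deliberately absent: the text does not name its OS entry point.
OS_ENTRY = {
    CrashEvent.HPMC: ("HPMC_Vector", "IVA"),
    CrashEvent.TOC: ("TOC_Vector", "MEM_TOC in Page Zero"),
}


@dataclass
class Processor:
    # Hypothetical layout: register group name -> {register name: value}.
    registers: dict
    # Stand-in for Processor Internal Memory (PIM).
    pim: dict = field(default_factory=dict)


def handle_hardware_crash(cpu, event):
    """Return the ordered steps taken for a hardware crash event."""
    steps = [f"branch to {PDC_ENTRY[event]}"]

    # The PDC entry point saves the processor state into PIM.
    cpu.pim = {group: dict(values) for group, values in cpu.registers.items()}
    steps.append("save processor state to PIM")

    # The processor then vectors back into the operating system.
    target = OS_ENTRY.get(event)
    if target is not None:
        name, defined_in = target
        steps.append(f"vector to {name} (defined in {defined_in})")
    return steps


if __name__ == "__main__":
    cpu = Processor(registers={
        "general": {"gr1": 0x10},
        "control": {},
        "space": {},
        "interruption": {},
    })
    for step in handle_hardware_crash(cpu, CrashEvent.TOC):
        print(step)
```

The model copies the registers into PIM before vectoring to the operating system, which mirrors the point that the saved state, rather than the live registers, is what is later available for analysis.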

On entry into the kernel, a crash event entry is created. The operating system makes a PDC call

(PDC_PIM) to read the processor’s state information from PIM into a Restart Parameter Block

(RPB). As such, the RPB structure contains information pertinent to the understanding of the

crash. For example, the Program Counter (PC) in the RPB would indicate what routine was