HP-UX 11i v3 Crash Dump Improvements

Executive Summary
Crash dump, the ability to write (dump) a copy of system memory onto disk in the event of a
catastrophic system failure, is critical for system problem analysis and resolution. The Crash Dump
utility has been enhanced in HP-UX 11i v3 to significantly increase performance and scalability,
and to improve the availability and manageability of the dump configuration.
Dump performance matters because the time it takes to perform a dump directly affects system
availability. This is especially true on large-memory systems: the more memory a system has, the
longer it may take to complete the dump and come back up. HP-UX 11i v3 provides new performance
capabilities that can significantly reduce dump time. HP testing has shown that dump times can be
reduced to less than one-quarter of HP-UX 11i v2 dump times on equivalent configurations.
The dump configuration has been simplified in HP-UX 11i v3, unifying multiple diverse
mechanisms (previously defined for different types of volume managers or raw devices) into a
single mechanism, while maintaining backward compatibility.
HP-UX 11i v3 also adds new dump availability and manageability algorithms. These provide run-time
auto-detection of dump device paths and automatic failover/reconfiguration to another path when
an existing path fails. The dump configuration also adjusts automatically when a dump device
expands or contracts in size, or when a dump device goes offline.
The dump format is unchanged in HP-UX 11i v3, so debuggers and dump analysis utilities require
no changes. Backward compatibility is maintained.
1 Overview of HP-UX 11i v3 Dump Improvements
The improvements to the crash dump facility in HP-UX 11i v3 fall into the following categories:
Performance improvements
Configuration improvements
Availability and Manageability improvements
The performance improvements parallelize the dump process: multiple threads of execution write to
multiple devices in parallel, significantly reducing the overall dump time. Each thread of
execution (known as a dump unit; see the definition below) requires its own set of CPUs¹, dump
devices, and other resources. As a result, the system configuration and the dump device
configuration determine the degree of parallelism that can be achieved and the resulting speed-up
of the dump. See Section 3, “Performance Improvements”, for details.
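To illustrate why the configuration bounds the achievable speed-up, here is a minimal Python sketch under simplifying assumptions that are not taken from HP-UX: each dump unit needs one dedicated CPU and at least one dump device of its own, all devices write at the same rate, a unit writes to its devices one after another, and memory is split evenly across units that run in parallel. The names plan_dump_units and estimated_dump_seconds are hypothetical.

```python
# Hypothetical, simplified model of dump parallelism (not HP-UX code).
# Assumptions: each dump unit is one thread of execution that needs its own
# CPU and at least one dump device; all devices write at the same rate;
# a unit writes to its own devices one after another; memory is split
# evenly across units, which all run in parallel.


def plan_dump_units(cpus: int, devices: list) -> list:
    """Assign dump devices round-robin to as many units as resources allow."""
    # A unit needs both a CPU and a device, so the scarcer resource caps the count.
    n_units = min(cpus, len(devices))
    if n_units == 0:
        raise ValueError("need at least one CPU and one dump device")
    units = [[] for _ in range(n_units)]
    for i, dev in enumerate(devices):
        units[i % n_units].append(dev)
    return units


def estimated_dump_seconds(memory_mb: float, n_units: int, mb_per_sec: float) -> float:
    """Units run in parallel, so each writes memory_mb / n_units at mb_per_sec."""
    return memory_mb / n_units / mb_per_sec


if __name__ == "__main__":
    devices = ["disk0", "disk1", "disk2", "disk3"]
    for cpus in (1, 2, 8):
        units = plan_dump_units(cpus, devices)
        secs = estimated_dump_seconds(64 * 1024, len(units), 100.0)
        print(f"{cpus} CPU(s): {len(units)} unit(s), ~{secs:.0f} s")
```

Under these assumptions, a system with four dump devices gains nothing from CPUs beyond the fourth: once devices become the scarcer resource, the unit count and therefore the estimated dump time stop improving.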
The configuration improvements consolidate several different mechanisms for persistently marking
dump devices across reboots (/stand/system file definitions and the lvlnboot and vxvmboot
commands) into a single mechanism. For backward compatibility, the old mechanisms can still be
used, but they will be obsoleted in a future release. These improvements are discussed in
Section 4, “Configuration Improvements”.
The availability and manageability improvements provide automatic reconfiguration of a dump
device through a new path when the currently selected path goes offline. They also include
intelligent path selection to support the performance improvements, exclusion of offline devices
from the dump, and online expansion and contraction of dump devices. Section 5, “Availability
and Manageability Improvements”, discusses this area of improvements.
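The path failover and offline-device behavior can be pictured with the minimal Python sketch below. It is a hypothetical model, not HP-UX code: it assumes each dump device has a set of paths that are either online or offline, that the current path is kept while it stays online, that the first online alternative is chosen otherwise, and that a device with no online path is left out of the dump. DumpDevice, select_path, and usable_devices are illustrative names.

```python
# Hypothetical model of dump-path selection and failover (not HP-UX code).
# Assumptions: each dump device is reachable through one or more paths;
# a path is either online or offline; the current path is kept while it
# is online; a device with no online path is excluded from the dump.
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DumpDevice:
    name: str
    paths: Dict[str, bool] = field(default_factory=dict)  # path -> online?
    current: Optional[str] = None

    def select_path(self) -> Optional[str]:
        """Keep the current path if it is online, else fail over to another."""
        if self.current is not None and self.paths.get(self.current, False):
            return self.current
        # First online path in insertion order, or None if every path is down.
        self.current = next((p for p, up in self.paths.items() if up), None)
        return self.current


def usable_devices(devices: List[DumpDevice]) -> List[str]:
    """Names of devices that still have an online path; others are skipped."""
    return [d.name for d in devices if d.select_path() is not None]


if __name__ == "__main__":
    disk = DumpDevice("disk0", {"pathA": True, "pathB": True}, current="pathA")
    disk.paths["pathA"] = False            # the selected path goes offline
    print(disk.select_path())              # fails over to pathB
    dead = DumpDevice("disk1", {"pathC": False})
    print(usable_devices([disk, dead]))    # ['disk0']
```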
¹ The term “CPU” in this paper refers to a logical processor available to the HP-UX operating
system. This is equivalent to a processor core if Itanium multi-threading is disabled. Note that
online CPU addition or deletion affects the number of CPUs available at dump time.