Specifications

Chapter 4. Continuous availability and manageability 157
The POWER7 family of systems continues to introduce significant enhancements that are
designed to increase system availability and ultimately to meet a high availability objective,
with hardware components that are able to perform the following functions:
- Self-diagnose and self-correct during run time.
- Automatically reconfigure to mitigate potential problems from suspect hardware.
- Self-heal or automatically substitute good components for failing components.
Throughout this chapter, we describe IBM POWER technology’s capabilities that are focused
on keeping a system environment up and running. For a specific set of functions that are
focused on detecting errors before they become serious enough to stop computing work, see
4.3.1, “Detecting” on page 169.
4.2.1 Partition availability priority
Also available is the ability to assign availability priorities to partitions. If an alternate
processor recovery event requires spare processor resources and there are no other means
of obtaining the spare resources, the system determines which partition has the lowest
priority and attempts to claim the needed resource. On a properly configured POWER
processor-based server, this approach allows that capacity to first be obtained from a
low-priority partition instead of a high-priority partition.
This capability is relevant to the total system availability because it gives the system an
additional stage before an unplanned outage. In the event that insufficient resources exist to
maintain full system availability, these servers attempt to maintain partition availability by
user-defined priority.
Partition availability priority is assigned to partitions using a weight value or integer rating,
with the lowest priority partition rated at 0 (zero) and the highest priority partition rated at 255.
The default value is set at 127 for standard partitions and 192 for Virtual I/O Server (VIOS)
partitions. You can vary the priority of individual partitions.
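The rating scheme and the lowest-priority-first rule can be illustrated with a minimal Python sketch. It is not the POWER Hypervisor's implementation: the `Partition` class, the `pick_donor` function, and the tie-breaking behavior (first listed partition wins) are hypothetical names and assumptions introduced only for illustration. The range 0 to 255 and the defaults 127 and 192 come from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_PRIORITY = 0              # lowest priority (from the text)
MAX_PRIORITY = 255            # highest priority (from the text)
DEFAULT_PRIORITY = 127        # default for standard partitions (from the text)
DEFAULT_VIOS_PRIORITY = 192   # default for VIOS partitions (from the text)


@dataclass
class Partition:
    """Hypothetical model of a partition and its availability priority."""
    name: str
    is_vios: bool = False
    priority: Optional[int] = None  # None means "use the default"

    def __post_init__(self) -> None:
        if self.priority is None:
            self.priority = (
                DEFAULT_VIOS_PRIORITY if self.is_vios else DEFAULT_PRIORITY
            )
        if not MIN_PRIORITY <= self.priority <= MAX_PRIORITY:
            raise ValueError(
                f"priority must be in {MIN_PRIORITY}..{MAX_PRIORITY}"
            )


def pick_donor(partitions: List[Partition]) -> Partition:
    """Return the partition with the lowest availability priority.

    Assumption: ties are broken by list order, which the source
    does not specify.
    """
    if not partitions:
        raise ValueError("no partitions to choose from")
    return min(partitions, key=lambda p: p.priority)
```

With a VIOS partition at its default of 192, a production partition set to 200, and a test partition set to 10, `pick_donor` would return the test partition, so spare capacity is taken from it first.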
Partition availability priorities can be set for both dedicated and shared processor partitions.
The POWER Hypervisor uses the relative partition weight value among active partitions to
favor higher priority partitions for processor sharing, for adding and removing processor
capacity, and for normal operation.
Note that the partition specifications for minimum, desired, and maximum capacity are also
taken into account for capacity-on-demand options and if total system-wide processor
capacity is reduced because failed processor cores are deconfigured. For example, if
total system-wide processor capacity is sufficient to run all partitions, at least with the
minimum capacity, the partitions are allowed to start or continue running. If processor
capacity is insufficient to run a partition at its minimum value, then starting that partition
results in an error condition that must be resolved.
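The minimum-capacity rule in the example above can be sketched in Python as follows. This is a simplification under stated assumptions, not the firmware's logic: the function name `can_start` and its parameters are hypothetical, capacity is expressed in processor units, and the desired and maximum values and any capacity-on-demand resources are ignored.

```python
from typing import Iterable


def can_start(
    partition_min: float,
    running_mins: Iterable[float],
    available_capacity: float,
) -> bool:
    """Return True if a partition's minimum capacity still fits.

    partition_min:      minimum capacity of the partition to start
    running_mins:       minimum capacities of partitions already running
    available_capacity: total capacity left after failed cores
                        have been deconfigured
    """
    committed = sum(running_mins)
    return committed + partition_min <= available_capacity


# Example: 8 cores installed, 2 deconfigured after failures.
available = 8.0 - 2.0
print(can_start(2.0, [2.0, 1.5], available))  # fits within 6.0
print(can_start(3.0, [2.0, 1.5], available))  # exceeds 6.0
```

In the second call the partition's minimum cannot be met, which corresponds to the error condition that must be resolved before the partition can start.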
4.2.2 General detection and deallocation of failing components
Runtime correctable or recoverable errors are monitored to determine if there is a pattern of
errors. If the affected components reach a predefined error limit, the service processor
initiates an action to deconfigure the faulty hardware, helping to avoid a potential system
outage and to enhance system availability.
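The threshold-based behavior described above can be sketched as a simple per-component counter. This is an illustrative model only: the `ErrorMonitor` class, its method names, and the component identifiers are hypothetical, and the actual error limits and the service processor's internal logic are not given in this section.

```python
from collections import Counter


class ErrorMonitor:
    """Hypothetical model of error-limit tracking per component."""

    def __init__(self, error_limit: int) -> None:
        self.error_limit = error_limit   # assumed predefined limit
        self.counts = Counter()          # recoverable errors per component
        self.deconfigured = set()        # components taken out of service

    def record_recoverable_error(self, component_id: str) -> bool:
        """Count one recoverable error for a component.

        Returns True if the component is deconfigured after this
        error (or already was), otherwise False.
        """
        if component_id in self.deconfigured:
            return True
        self.counts[component_id] += 1
        if self.counts[component_id] >= self.error_limit:
            # The limit is reached: deconfigure the suspect hardware
            # before it can cause an outage.
            self.deconfigured.add(component_id)
        return component_id in self.deconfigured
```

With a limit of 3, the third recoverable error reported against the same component marks it as deconfigured, while errors on other components are counted separately.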
Note: POWER7 processor-based servers are independent of the operating system for
error detection and fault isolation within the central electronics complex.