
© Copyright IBM Corp. 2011. All rights reserved.
Chapter 4. Continuous availability and manageability
This chapter provides information about IBM reliability, availability, and serviceability (RAS)
design and features. This set of technologies, implemented on IBM Power Systems servers,
can help lower your architecture’s total cost of ownership (TCO) by
reducing unplanned downtime.
RAS can be described as follows:
• Reliability: Indicates how infrequently a defect or fault in a server manifests itself
• Availability: Indicates how infrequently the functionality of a system or application is
impacted by a fault or defect
• Serviceability: Indicates how well faults and their impacts are communicated to users and
services, and how efficiently and nondisruptively the faults are repaired
Each successive generation of IBM servers is designed to be more reliable than the previous
server family. POWER7 processor-based servers have new features to support new levels of
virtualization, help ease administrative burden, and increase system utilization.
Reliability starts with components, devices, and subsystems designed to be fault-tolerant.
POWER7 uses lower-voltage technology and improves reliability with stacked latches that
reduce soft error rate (SER) susceptibility. During the design and development process, subsystems go
through rigorous verification and integration testing processes. During system manufacturing,
systems go through a thorough testing process to help ensure high product quality levels.
The processor and memory subsystem contain a number of features designed to avoid or
correct environmentally induced, single-bit, intermittent failures, as well as handle solid faults
in components, including selective redundancy to tolerate certain faults without requiring an
outage or parts replacement.