Specifications

156 IBM Power 770 and 780 Technical Overview and Introduction

4.1.2 Placement of components

Packaging is designed to deliver both high performance and high reliability. For example,

the reliability of electronic components is directly related to their thermal environment: even

relatively small increases in temperature are correlated with large decreases in component

reliability. POWER processor-based systems are carefully packaged to

ensure adequate cooling. Critical system components such as the POWER7 processor chips

are positioned on printed circuit cards so that they receive fresh air during operation. In

addition, POWER processor-based systems are built with redundant, variable-speed fans that

can automatically increase output to compensate for increased heat in the central

electronics complex.
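The fan behavior described above can be illustrated with a minimal sketch. It is not the system's actual firmware logic: the temperature thresholds, duty-cycle limits, and function names are all hypothetical, and the idea that surviving fans speed up to cover a failed fan is an assumption made for illustration.

```python
def fan_duty(temp_c, min_duty=30.0, max_duty=100.0, low_c=25.0, high_c=45.0):
    """Map a temperature to a fan duty cycle (percent) by linear interpolation.

    All thresholds are hypothetical values chosen for illustration.
    """
    if temp_c <= low_c:
        return min_duty
    if temp_c >= high_c:
        return max_duty
    fraction = (temp_c - low_c) / (high_c - low_c)
    return min_duty + fraction * (max_duty - min_duty)


def duty_with_failed_fans(temp_c, total_fans, working_fans):
    """Scale up the duty of the surviving fans, capped at 100 percent.

    Assumption: airflow is shared evenly, so losing a fan is offset by
    running the remaining fans proportionally faster.
    """
    if working_fans <= 0:
        raise ValueError("no working fans")
    base = fan_duty(temp_c)
    return min(100.0, base * total_fans / working_fans)
```

For example, under these assumed values `fan_duty(35.0)` sits halfway between the minimum and maximum duty, and losing two of four fans at 25 °C doubles the duty of the remaining fans.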

4.1.3 Redundant components and concurrent repair

High-opportunity components, or those that most affect system availability, are protected with

redundancy and the ability to be repaired concurrently.

Redundant parts allow the system to remain operational when a component fails. These parts include:

• POWER7 cores, which include redundant bits in L1-I, L1-D, and L2 caches, and in L2 and L3 directories

• Power 770 and Power 780 main memory DIMMs, which contain an extra DRAM chip for improved redundancy

• Power 770 and 780 redundant system clock and service processor for configurations with two or more central electronics complex (CEC) drawers

• Redundant and hot-swap cooling

• Redundant and hot-swap power supplies

• Redundant 12X loops to I/O subsystem

For maximum availability, be sure to connect power cords from the same system to two

separate Power Distribution Units (PDUs) in the rack and to connect each PDU to

independent power sources. Deskside form factor power cords must be plugged into two

independent power sources to achieve maximum availability.
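The cabling guidance above can be expressed as a simple check. This is a sketch under stated assumptions, not an IBM tool: the function name and the `(pdu, power_source)` tuple layout are hypothetical, and the caller is assumed to supply one entry per power cord of a single system.

```python
def has_redundant_power(cords):
    """Return True if the cords span at least two PDUs and two power sources.

    cords: list of (pdu, power_source) pairs, one per power cord.
    The data layout is hypothetical and only serves this illustration.
    """
    pdus = {pdu for pdu, _ in cords}
    sources = {source for _, source in cords}
    return len(pdus) >= 2 and len(sources) >= 2
```

A system cabled to two PDUs that share one feed fails the check, because a single source outage would still remove power from both cords.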

4.2 Availability

The ability of IBM hardware and microcode to continuously monitor the execution of hardware functions is generally described as first-failure data capture (FFDC). This process includes the strategy of predictive failure analysis: tracking intermittent correctable errors and varying components offline before they reach the point of a hard failure that would cause a system outage, without the need to re-create the problem.
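The idea behind predictive failure analysis can be sketched as a sliding-window error counter. This is an illustration only, not the actual FFDC implementation: the class name, the threshold of three errors, and the one-hour window are hypothetical values.

```python
from collections import defaultdict, deque


class CorrectableErrorTracker:
    """Flag a component for removal once correctable errors cluster in time."""

    def __init__(self, threshold=3, window_s=3600.0):
        self.threshold = threshold    # hypothetical error count that triggers action
        self.window_s = window_s      # hypothetical sliding window, in seconds
        self._events = defaultdict(deque)
        self.offline = set()          # components already varied offline

    def record(self, component, timestamp_s):
        """Record one correctable error; return True if the component is offline."""
        events = self._events[component]
        events.append(timestamp_s)
        # Discard errors that have fallen out of the sliding window.
        while events and timestamp_s - events[0] > self.window_s:
            events.popleft()
        if len(events) >= self.threshold:
            self.offline.add(component)
        return component in self.offline
```

Isolated errors that are spread out in time never reach the threshold, while a burst of errors on one component marks it for removal before a hard failure occurs.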

Note: Check your configuration for optional redundant components before ordering

your system.