Specifications
“cleans” memory without operating system or application knowledge, resulting in much better
coverage.
Protection for I/O
I/O errors are another significant cause of hardware errors and downtime, for two reasons: a typical
system contains a large number of I/O cards, and the I/O cards are the part of the system most
exposed to frequent human interaction in the data center.
To prevent downtime resulting from I/O errors, HP has designed the following features into the HP
9000 rp7440 and rp8440 Servers:
• Online replacement of PCI-X cards
• Hardware firewall that keeps I/O errors from reaching the cell
• High mean time between failures (MTBF) for I/O cards
• Separate PCI-X buses for each I/O card
Taken together, these features reduce hardware downtime by at least 20% compared with similar servers.
HP 9000 rp7440 and rp8440 Server crossbar and I/O backplane protection
The backplane of the HP 9000 rp8440 Server connects everything together. Because all partitions
share the backplane, high reliability and true domain isolation are very important. The specific
features that address these areas are as follows:
• Highly reliable ASICs—The backplane ASIC is manufactured and tested with a process that results
in a demonstrated 10X reliability improvement over comparable chips. This reliability results in virtually zero
backplane ASIC failures in the field.
• Redundant DC-DC converters—The DC-DC converters that power the system backplane chips are
fully redundant, reducing downtime associated with power conversion. (Power conversion is
normally a significant contributor to failure rate.) Redundant DC-DC converters have also been
added for the I/O backplane, extending the same concept to the I/O subsystem.
• Full end-to-end error correction and independent partition design—The HP 9000 rp8440 Server
backplane is built from two crossbar ASICs with point-to-point connections. Traffic within a partition
is contained in that partition, so there is no sharing of links in a properly configured system. Each
port of the crossbar chip is fully independent, allowing cells of different partitions to coexist without
affecting each other in any way. By contrast, in bus-based systems all domains participate in the
coherency scheme and share address buses. Therefore, in these systems all domains are linked in
some fashion, resulting in shared failure modes that might crash multiple partitions.
Also, unlike snoopy coherency systems, which must accept and respond to all coherency requests
from all domains, HP 9000 rp8440 Server partitions have hardware firewalls dedicated to
guarding partitions from errant transactions generated on failing partitions. A failure in one HP
9000 rp8440 Server partition does not affect any other partitions.
Finally, all data paths in the fabric are resistant to both random single-bit errors and persistent
single-wire “stuck at” faults. Therefore, the fabric is resilient to any single-bit failure, including pin,
connector, or solder problems.
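The claim that one mechanism covers both random single-bit errors and persistent "stuck at" wire faults can be illustrated with a small sketch. The Python below uses a Hamming(7,4) single-error-correcting code purely as an assumed stand-in; this document does not state which code the fabric actually uses, and the function names (encode, decode, flip_bit, stuck_at) are hypothetical. The sketch models each bit position as one wire, so a stuck wire corrupts at most one bit per transfer and falls within the correction capability of the code.

```python
# Illustrative sketch only: Hamming(7,4) is an assumed stand-in for the
# (unspecified) error-correcting code used on the real fabric data paths.

def encode(nibble):
    """Encode 4 data bits into a 7-bit Hamming codeword (list of 0/1)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8                             # index 0 unused: positions are 1-based
    code[3], code[5], code[6], code[7] = d     # data bits sit at non-power-of-two positions
    code[1] = code[3] ^ code[5] ^ code[7]      # parity over positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]      # parity over positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]      # parity over positions 4, 5, 6, 7
    return code[1:]

def decode(word):
    """Correct up to one wrong bit, then return the 4 data bits as an integer."""
    code = [0] + list(word)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                    # XOR of set positions is 0 for a valid codeword
    if syndrome:
        code[syndrome] ^= 1                    # a nonzero syndrome names the faulty position
    d = [code[3], code[5], code[6], code[7]]
    return sum(bit << i for i, bit in enumerate(d))

def flip_bit(word, index):
    """Model a random single-bit error on one transfer."""
    out = list(word)
    out[index] ^= 1
    return out

def stuck_at(word, wire, value):
    """Model a wire that always reads `value`, whatever was driven onto it."""
    out = list(word)
    out[wire] = value
    return out

if __name__ == "__main__":
    sent = 0b1011
    word = encode(sent)
    print(decode(flip_bit(word, 2)) == sent)     # random single-bit error
    print(decode(stuck_at(word, 5, 0)) == sent)  # wire 5 stuck at 0
    print(decode(stuck_at(word, 5, 1)) == sent)  # wire 5 stuck at 1
```

A production fabric would use a wider code and would typically also detect double-bit errors; the sketch only shows why a single stuck wire, pin, or solder joint looks to the decoder like a correctable single-bit error on every transfer.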
Reliability in the cabinet infrastructure
In keeping with their focus on maintaining high availability, the HP 9000 rp7440 and rp8440 Servers
include protection against failures within the cabinet infrastructure. The HA features in this area include
true dual AC line cord support and complete resilience to service processor failures.