Providing Open Architecture High Availability Solutions

For fault-tolerant systems, communications among fault domains will vary depending on the
specific characteristics of each fault domain. For example, current-sharing power supplies
communicate by monitoring the voltages and currents being produced or via a separate current
share signal. Even more extreme, fans may communicate by jointly pressurizing a common air
plenum that is engineered in such a way that airflow continues across all system components even
if one of the fans fails.
Other fault domains, such as processing subsystems and I/O controllers, communicate through
various data paths within the system. These may be I/O busses or point-to-point data paths.
Because they typically are non-redundant, bussed interconnects define fault domains consisting of
everything connected to a single bus, but bus failures themselves generally have a very low
probability. As a result, a common compromise in the design of hardware for high availability
systems is to define a hierarchy of fault domains, as shown in Figure 14.
Because of the compromises required for this sort of configuration, next-generation system
interconnects such as InfiniBand, RapidIO, or other switched fabric technologies provide higher
levels of availability with the same amount of hardware by eliminating these larger fault domains,
as shown in Figure 15.
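
The difference between bussed and switched interconnects can be illustrated with a minimal sketch. It is only a toy model, not a description of any particular product: the names `surviving_endpoints`, `bussed`, `switched`, and the endpoint and interconnect labels are all hypothetical, and the model assumes an endpoint remains usable as long as it has not failed itself and at least one interconnect it attaches to is still working.

```python
def surviving_endpoints(links, failed):
    """Return the endpoints that still have at least one working interconnect.

    links  -- maps each endpoint name to the set of interconnects it attaches to
    failed -- set of failed fault domains (endpoints or interconnects)
    """
    return {
        endpoint
        for endpoint, paths in links.items()
        if endpoint not in failed and any(p not in failed for p in paths)
    }


# Hypothetical bussed layout: each card hangs off exactly one shared bus,
# so the bus and everything on it form one larger fault domain.
bussed = {"io1": {"busA"}, "io2": {"busA"}, "io3": {"busB"}, "io4": {"busB"}}

# Hypothetical switched layout: each card attaches to both switches,
# so each card and each switch is its own fault domain.
switched = {endpoint: {"sw1", "sw2"} for endpoint in bussed}

print(sorted(surviving_endpoints(bussed, {"busA"})))   # ['io3', 'io4']
print(sorted(surviving_endpoints(switched, {"sw1"})))  # ['io1', 'io2', 'io3', 'io4']
```

In this model a single bus failure removes every card on that bus, while a single switch failure removes none, which is the sense in which the larger fault domains are eliminated.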
Finally, communication paths to and from the high availability system itself may need to be
redundant. In a sense, this is simply a redefinition of the system to the next higher level, i.e., to
incorporate the devices connected to the system in question into the system itself. Of more
relevance, the termination of a single data path will be either:
• to a single point, which will necessarily be contained within a single fault domain, or
• to multiple points within multiple fault domains.
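
The two termination cases can be checked mechanically. The following is a rough sketch under stated assumptions: the function name `path_survives_any_single_failure` and the fault domain labels are hypothetical, and the path is assumed to remain usable as long as at least one of its termination points lies outside the failed fault domain.

```python
def path_survives_any_single_failure(termination_domains):
    """Return True if no single fault domain failure removes every termination point.

    termination_domains -- the fault domain containing each termination point
    """
    domains = set(termination_domains)
    # Fail each fault domain in turn and check that some termination remains.
    return all(
        any(d != failed for d in termination_domains)
        for failed in domains
    )


# Hypothetical examples: one termination point versus two points in two domains.
single_point = ["io_card_3"]
multi_point = ["io_card_3", "io_card_7"]

print(path_survives_any_single_failure(single_point))  # False
print(path_survives_any_single_failure(multi_point))   # True
```

Note that two termination points inside the same fault domain are no better than one in this model, since the failure of that domain removes both.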
Figure 15. Redundant Switched Interconnects
[Diagram: two Host CPU boxes, twelve I/O Card boxes, and two Switch boxes. Each box and each of the switched networks are fault domains in this system.]