Providing Open Architecture High Availability Solutions

Mass storage subsystems
Peripheral devices
Power supplies
Cooling modules
8.2 Communication
The fault domains within a high availability system interact with each other to create a complete
system. This interaction occurs through various communication mechanisms. For the purpose of a
fault domain analysis, the communication mechanisms of significance are the ones between fault
domains. A failure of a communication path within a fault domain can be treated as just another
fault that can lead to the failure of that fault domain.
Note that systems often contain nested fault domains. For example, if there is a non-redundant
communication path between a set of fault domains (and that communication path is critical to
those fault domains providing required services), then that set of fault domains plus the
communication path becomes a larger fault domain. Figure 14 shows an example of this
arrangement. Such a system architecture may make sense if the MTBF of the communication path
(an I/O bus in Figure 14) is much larger than the MTBF of the nested fault domains (the I/O
controllers in Figure 14), or if its MTTR is much smaller.
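The MTBF and MTTR trade-off above can be made concrete with a small calculation. The following is a minimal sketch, assuming independent failures and the common steady-state approximation availability = MTBF / (MTBF + MTTR). The function names and all numeric figures are hypothetical and chosen only for illustration; they are not taken from Figure 14.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of a single fault domain."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def parallel(*avails: float) -> float:
    """Redundant fault domains: service is up if at least one domain is up."""
    down = 1.0
    for a in avails:
        down *= (1.0 - a)
    return 1.0 - down


def series(*avails: float) -> float:
    """Non-redundant chain: service is up only if every domain is up."""
    up = 1.0
    for a in avails:
        up *= a
    return up


# Hypothetical figures: the bus fails far less often than an I/O controller.
controller = availability(mtbf_hours=50_000, mttr_hours=4)
bus = availability(mtbf_hours=5_000_000, mttr_hours=4)

# Two redundant controllers nested inside the larger fault domain
# formed by the single, non-redundant bus.
nested = series(bus, parallel(controller, controller))

print(f"controller: {controller:.9f}")
print(f"bus:        {bus:.9f}")
print(f"nested:     {nested:.9f}")
```

Under these assumptions the nested domain is more available than a single controller but can never be more available than the bus itself, which is why the arrangement is only attractive when the shared path is much more reliable than the domains it connects.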
In contrast, Figure 15 shows a similar system where the bussed interconnects are replaced by a
redundant switched network interconnect. In this system, each switched network is a separate
fault domain. Because every I/O controller and host processor is connected to both networks,
the fault domains remain separate from each other rather than nesting.
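The difference between the bussed and the switched arrangement can be checked mechanically. The sketch below is illustrative only: it assumes that two components can communicate whenever they share at least one working interconnect, and the component and network names are hypothetical rather than taken from the figures.

```python
from itertools import combinations


def survives_single_network_failure(attachments, networks):
    """Return True if every pair of components still shares a working
    interconnect after any one interconnect fails."""
    for failed in networks:
        working = networks - {failed}
        for a, b in combinations(attachments, 2):
            if not (attachments[a] & attachments[b] & working):
                return False
    return True


# Bussed arrangement: every component hangs off one shared bus.
bussed = {
    "host_cpu": {"bus"},
    "io_card_1": {"bus"},
    "io_card_2": {"bus"},
}

# Switched arrangement: every component is attached to both networks.
switched = {
    "host_cpu": {"net_a", "net_b"},
    "io_card_1": {"net_a", "net_b"},
    "io_card_2": {"net_a", "net_b"},
}

print(survives_single_network_failure(bussed, {"bus"}))               # False
print(survives_single_network_failure(switched, {"net_a", "net_b"}))  # True
```

A False result indicates that the interconnect and the components that depend on it form one larger, nested fault domain. A True result indicates that the interconnects can fail independently of the components they serve.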
For clustered systems, communication between nodes typically runs over standard local area
network (LAN) connections, though these may be dedicated LANs used only within the cluster for
performance reasons. Redundancy in the network connections between nodes is generally required
so that the cluster can continue to function correctly if any one communication link or path fails.
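The single-link-failure requirement can be tested against a cluster topology by removing each link in turn and checking whether the nodes remain connected. This is a minimal sketch, assuming links are bidirectional and that traffic may be relayed through intermediate nodes. The node names and topologies are hypothetical.

```python
from collections import deque


def is_connected(nodes, links):
    """Breadth-first search: can every node reach every other node?"""
    nodes = list(nodes)
    if not nodes:
        return True
    neighbours = {n: set() for n in nodes}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return len(seen) == len(nodes)


def critical_links(nodes, links):
    """Return the links whose individual failure would partition the cluster."""
    links = list(links)
    return [
        failed
        for i, failed in enumerate(links)
        if not is_connected(nodes, links[:i] + links[i + 1:])
    ]


nodes = ["A", "B", "C"]
chain = [("A", "B"), ("B", "C")]             # no redundancy
ring = [("A", "B"), ("B", "C"), ("C", "A")]  # one alternate path per pair

print(critical_links(nodes, chain))  # [('A', 'B'), ('B', 'C')]
print(critical_links(nodes, ring))   # []
```

An empty result means no single link failure isolates a node. Because links are removed by position, two parallel links between the same pair of nodes (for example, dual LANs) are handled correctly as separate entries in the list.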
Figure 14. Nested Fault Domains
[Diagram: Host CPUs and I/O cards connected by buses and switches. Legend: fault domains with lower MTBF; fault domains with higher MTBF.]