Providing Open Architecture High Availability Solutions
Clusters can be homogeneous, when they are composed of identical nodes, or heterogeneous, when
the nodes can vary widely in make-up or even architecture. Nodes are managed on a black-box
basis – either the node is fully functional, or the entire node is taken out of service with no attempt
to diagnose or rectify failures within the node. Failures therefore remove more of the system (i.e.,
the whole node) instead of the single failed component.
Nodes are linked by a network, which must be reliable and is therefore frequently redundant.
Roles can be assigned statically to the nodes making up the cluster system; more advanced
systems, however, can employ smart network switches and end-points to provide a dynamic
reconfiguration capability, which extends fault coverage when multiple faults are present.
Services provided by a node in a cluster can migrate to another node if the primary node fails.
Services therefore need to be location independent, and various software mechanisms such as
CORBA* are available to implement this.
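The migration described above can be illustrated with a minimal sketch. It is not drawn from any particular cluster product: the Node and Cluster classes, the locate and fail_node methods, and the least-loaded placement policy are all hypothetical names and choices made for illustration. The sketch assumes the black-box model described earlier, in which a failed node is removed whole, and it stands in for a real location-independence mechanism such as CORBA with a simple lookup by service name.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node, viewed as a black box."""
    name: str
    healthy: bool = True  # fully functional, or out of service entirely
    services: set = field(default_factory=set)


class Cluster:
    """Hypothetical cluster that migrates services off a failed node."""

    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}

    def locate(self, service):
        """Location-independent lookup: callers name a service, not a node."""
        for node in self.nodes.values():
            if node.healthy and service in node.services:
                return node.name
        return None

    def fail_node(self, name):
        """Remove the whole node and migrate its services to survivors.

        Returns the set of services that could not be placed.
        """
        failed = self.nodes[name]
        failed.healthy = False
        orphaned, failed.services = failed.services, set()
        survivors = [n for n in self.nodes.values() if n.healthy]
        if not survivors:
            return orphaned  # no healthy node is left to host them
        for service in sorted(orphaned):
            # Assumed placement policy: least-loaded survivor, ties by name.
            target = min(survivors, key=lambda n: (len(n.services), n.name))
            target.services.add(service)
        return set()


cluster = Cluster([Node("a", services={"billing", "lookup"}), Node("b"), Node("c")])
unplaced = cluster.fail_node("a")
```

Because clients call locate with a service name only, they need no knowledge of which node hosts the service before or after the failure. A production system would also have to transfer service state and detect the failure in the first place, both of which this sketch omits.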
The goal of cluster technology is to improve service availability. However, depending upon the
industry and the application, the term cluster may describe different implementations. The
following definitions of clusters are used:
E-Commerce Cluster: Clustering is a commercial term describing the technology that allows
multiple nodes to be deployed as a single network-attached computing resource. In this
implementation, complete systems are redundant, and the redundancy is completely transparent to
the consumer of the service. The HA capabilities of this type of cluster include failover modes,
management APIs, storage integrity, predictive failover, arbitration and recovery. This cluster
technology is commonly deployed in e-commerce and IT applications [DHB3’00].
Network Backplane Cluster: Another approach to high availability and live insertion is to
implement a system that uses a dedicated network fabric that is separate from the backplane. In this
implementation hardware components are intelligent network nodes, not I/O cards. As in a
backplane system, the system may be designed with complete n+m redundancy. In order to
eliminate a single point of failure, two redundant networks connect each node. Because of modern
network design, a node failure will not bring the network down, and the units can be replaced or
reprovisioned without bringing the network down or affecting the service capability of the system.
In this implementation, the failure of a single critical I/O channel on a node will fault only that
node. Network technologies typically used for this type of implementation include 100/1000BASE
Ethernet, FDDI, Token Ring and ATM. Switched Ethernet has the advantage of point-to-point
bandwidth and the promise of QoS. Token Ring and FDDI provide assurance of worst-case
delivery time. ATM offers high bandwidth and QoS.
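The combination of dual redundant networks and n+m node redundancy can be expressed as a simple availability check. The following is a minimal sketch under stated assumptions: the function names reachable and service_capable, the link labels net_a and net_b, and the dictionary layout are hypothetical and chosen only for illustration.

```python
def reachable(links):
    """A node stays reachable while at least one of its network links is up."""
    return any(links.values())


def service_capable(nodes, required):
    """n+m check: service continues while at least `required` (n) nodes
    remain reachable; the other m nodes are spares that can be lost."""
    up = [name for name, links in nodes.items() if reachable(links)]
    return len(up) >= required


# Three nodes, each attached to two redundant networks (assumed labels).
nodes = {
    "n1": {"net_a": True, "net_b": True},
    "n2": {"net_a": False, "net_b": True},   # one link lost, still reachable
    "n3": {"net_a": False, "net_b": False},  # both links lost, node faulted
}
```

With these values, service_capable(nodes, 2) holds, since n1 and n2 are still reachable, whereas service_capable(nodes, 3) does not. Losing a single link on n2 costs nothing, which is the point of connecting each node to two networks.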
Element Cluster: Telecom equipment manufacturers typically use the term cluster to describe all
of the systems that provide the service function of a network element. The elements of the
functional cluster may be heterogeneous systems, and may include x-depth duplication of
homogeneous systems to provide element scalability. Any of these classes of systems can itself be
a network or e-commerce system. For instance, an MTS element may include high availability
switching elements as well as e-commerce-like redundant clusters to maintain copies of home
records and to ensure local, secure storage of billing information.