Providing Open Architecture High Availability Solutions

Clusters can be homogeneous, when they are composed of identical nodes, or heterogeneous, when

the nodes can vary widely in make-up or even architecture. Nodes are managed on a black-box

basis – either the node is fully functional, or the entire node is taken out of service with no attempt

to diagnose or rectify failures within the node. Failures therefore remove more of the system (i.e.,

the whole node) instead of the single failed component.
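
The black-box policy described above can be illustrated with a minimal sketch. The class, function, and component names here (Node, manage, "cpu", "nic") are hypothetical and chosen only for illustration. The manager looks only at whether a node is fully functional and never tries to repair an individual component.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeState(Enum):
    IN_SERVICE = "in_service"
    OUT_OF_SERVICE = "out_of_service"


@dataclass
class Node:
    name: str
    # Hypothetical per-component health flags (component name -> healthy?).
    components: dict = field(default_factory=dict)
    state: NodeState = NodeState.IN_SERVICE

    def fully_functional(self) -> bool:
        # The node counts as functional only if every component is healthy.
        return all(self.components.values())


def manage(nodes):
    """Black-box policy: any fault takes the whole node out of service.

    No attempt is made to diagnose or rectify the failed component.
    Returns the names of the nodes that remain in service.
    """
    for node in nodes:
        if node.state is NodeState.IN_SERVICE and not node.fully_functional():
            node.state = NodeState.OUT_OF_SERVICE
    return [n.name for n in nodes if n.state is NodeState.IN_SERVICE]


nodes = [
    Node("node-a", {"cpu": True, "nic": True}),
    Node("node-b", {"cpu": True, "nic": False}),  # one failed component
]
in_service = manage(nodes)
```

In this sketch a single failed network interface on node-b removes the entire node, including its healthy CPU, which is the cost of the black-box approach noted above.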

Nodes are linked by a network, which must of course be reliable and is therefore frequently

redundant. Roles can be assigned statically to the nodes making up the cluster

system; however, more advanced systems can employ smart network switches and end-points to

provide a dynamic reconfiguration capability which can enable a more extensive coverage in the

presence of multiple faults.

Services provided by a node in a cluster can therefore migrate to another node if the

primary node fails. To make this possible, services must be location independent, and various software

mechanisms such as CORBA* are available to implement this.
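
As a rough sketch of location independence, the example below keeps a mapping from service name to hosting node, so that clients look up a service by name rather than by node. It does not use CORBA; the ServiceRegistry class, its methods, and the least-loaded placement rule are assumptions made purely for illustration.

```python
class ServiceRegistry:
    """Hypothetical name-to-node registry; a stand-in for a real naming service."""

    def __init__(self, nodes):
        self.healthy = list(nodes)
        self.location = {}  # service name -> node currently hosting it

    def place(self, service, node):
        if node not in self.healthy:
            raise ValueError(f"{node} is not in service")
        self.location[service] = node

    def resolve(self, service):
        # Clients ask for a service by name and never hard-code a node.
        return self.location[service]

    def _least_loaded(self):
        # Count services per surviving node and pick the one hosting the fewest.
        load = {node: 0 for node in self.healthy}
        for node in self.location.values():
            if node in load:
                load[node] += 1
        return min(self.healthy, key=lambda node: load[node])

    def fail_node(self, failed):
        """Take a node out of service and migrate its services elsewhere."""
        self.healthy.remove(failed)
        if not self.healthy:
            raise RuntimeError("no surviving node to host services")
        moved = []
        for service, node in self.location.items():
            if node == failed:
                self.location[service] = self._least_loaded()
                moved.append(service)
        return moved


registry = ServiceRegistry(["node-a", "node-b"])
registry.place("billing", "node-a")
registry.fail_node("node-a")
current_host = registry.resolve("billing")
```

Because clients call resolve("billing") instead of addressing node-a directly, the migration is invisible to them apart from the time taken to fail over.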

The goal of cluster technology is to improve service availability. However, depending upon

industry and the application, the term cluster may describe different implementations. The

following definitions of clusters are used:

E-Commerce Cluster: Clustering is a commercial term describing the technology that allows

multiple nodes to be deployed as a single network-attached computing resource. In this

implementation, complete systems are redundant, and the redundancy is completely transparent to

the consumer of the service. The HA capabilities of this type of cluster include failover modes,

management APIs, storage integrity, predictive failover, arbitration and recovery. This cluster

technology is commonly deployed in e-commerce and IT applications [DHB3’00].

Network Backplane Cluster: Another approach to high availability and live insertion is to

implement a system that uses a dedicated network fabric that is separate from the backplane. In this

implementation hardware components are intelligent network nodes, not I/O cards. As in a

backplane system, the system may be designed with complete n+m redundancy. In order to

eliminate a single point of failure, two redundant networks connect each node. Because of modern

network design, a node failure will not bring the network down, and the units can be replaced or

reprovisioned without bringing the network down or affecting the service capability of the system.

In this implementation, the failure of a single critical I/O channel on a node will fault only that

node. Network technologies typically used for this type of implementation include 100/1000BASE

Ethernet, FDDI, Token Ring and ATM. Switched Ethernet has the advantage of point-to-point

bandwidth and the promise of QoS. Token Ring and FDDI provide assurance of worst-case

delivery time. ATM offers high bandwidth and QoS.
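
The combination of n+m redundancy and dual networks can be sketched as a simple check. The data layout and function names below (reachable, usable_nodes, service_capable) are hypothetical. The sketch assumes that a node is usable if it is healthy and at least one of its two attached networks is up, and that the system keeps its service capability while at least n nodes are usable.

```python
def reachable(links, networks_up):
    """A node stays reachable while at least one attached network is up."""
    return any(networks_up.get(net, False) for net in links)


def usable_nodes(nodes, networks_up):
    """Names of nodes that are both healthy and reachable."""
    return [
        name
        for name, info in nodes.items()
        if info["healthy"] and reachable(info["links"], networks_up)
    ]


def service_capable(nodes, networks_up, n_required):
    """True while at least n of the n+m nodes remain usable."""
    return len(usable_nodes(nodes, networks_up)) >= n_required


# Assumed example: n = 2 required nodes, m = 1 spare, two redundant networks.
nodes = {
    "node0": {"healthy": True, "links": ("net0", "net1")},
    "node1": {"healthy": True, "links": ("net0", "net1")},
    "node2": {"healthy": True, "links": ("net0", "net1")},
}
networks_up = {"net0": True, "net1": True}

nodes["node1"]["healthy"] = False  # a single node failure faults only that node
networks_up["net0"] = False        # loss of one network leaves the other in use
still_capable = service_capable(nodes, networks_up, n_required=2)
```

In this example the system tolerates one node failure and one network failure at the same time; a second node failure would leave fewer than n usable nodes.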

Element Cluster: Telecom Equipment Manufacturers typically use the term cluster to describe all

of the systems that provide the service function of a network element. The elements of the

functional cluster may include heterogeneous systems as well as x-depth duplication of homogeneous

systems to provide element scalability. Any of these classes of systems can in itself be a network or

e-commerce system. For instance, an MTS element may include high availability switching

elements as well as e-commerce-like redundant clusters to maintain copies of home records and to

ensure local, secure storage of billing information.