Providing Open Architecture High Availability Solutions
4.5 HA Configuration and Cluster Management
Configuration and cluster management is the key controlling function in an HA configuration. It is
implemented by HA middleware distributed among the member systems of the cluster. The middleware
maintains a system model of the components that make up the cluster, and defines how faults are
detected and what action is taken in response. When a failure occurs, it reconfigures the cluster
and notifies and/or restarts the affected parts of the application.
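
The role described above can be illustrated with a minimal sketch. The names used here
(Component, ClusterModel, Action, report_fault) are hypothetical and chosen only for this
illustration; they are not taken from any particular HA middleware. The sketch assumes each
component has one optional standby node and one configured recovery action.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Action(Enum):
    RESTART = "restart"
    FAILOVER = "failover"
    NOTIFY = "notify"


@dataclass
class Component:
    # One managed part of the application (hypothetical model entry).
    name: str
    node: str
    standby_node: Optional[str] = None
    healthy: bool = True


@dataclass
class ClusterModel:
    # System model plus the recovery policy configured for each component.
    components: Dict[str, Component] = field(default_factory=dict)
    policy: Dict[str, Action] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def add(self, component: Component, action: Action) -> None:
        self.components[component.name] = component
        self.policy[component.name] = action

    def report_fault(self, name: str) -> Action:
        # Mark the component failed, then apply its configured action.
        comp = self.components[name]
        comp.healthy = False
        action = self.policy[name]
        if action is Action.FAILOVER and comp.standby_node is not None:
            # Swap active and standby nodes; the old active becomes the standby.
            comp.node, comp.standby_node = comp.standby_node, comp.node
            comp.healthy = True
        elif action is Action.RESTART:
            comp.healthy = True
        else:
            # No automatic recovery is possible: fall back to notification only.
            action = Action.NOTIFY
        self.log.append(f"{name}:{action.value}")
        return action
```

A real implementation would distribute this model across the cluster members and drive
report_fault from fault detectors; the sketch keeps everything in one process for clarity.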
HA cluster configuration and management must be able to operate in a heterogeneous cluster. The
member systems may be of different types, run different operating environments, and use HA
management middleware from different suppliers. The system model, the message protocols between
cluster members, and the application APIs must all support interoperability across these
differences.
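
One practical consequence is that messages exchanged between cluster members need an encoding
that does not depend on the sender's hardware or operating system. The following is a minimal
sketch of such an encoding for a heartbeat message. The 8-byte layout (version, message type,
node id, sequence number) is an assumption made for this example, not a defined protocol.

```python
import struct

# Hypothetical heartbeat layout: version (1 byte), message type (1 byte),
# node id (2 bytes), sequence number (4 bytes).
# "!" selects network byte order with standard sizes and no padding, so the
# bytes on the wire are the same regardless of the sending platform.
HEARTBEAT_FORMAT = "!BBHI"
MSG_HEARTBEAT = 1


def encode_heartbeat(node_id: int, seq: int, version: int = 1) -> bytes:
    return struct.pack(HEARTBEAT_FORMAT, version, MSG_HEARTBEAT, node_id, seq)


def decode_heartbeat(payload: bytes) -> dict:
    version, msg_type, node_id, seq = struct.unpack(HEARTBEAT_FORMAT, payload)
    if msg_type != MSG_HEARTBEAT:
        raise ValueError(f"unexpected message type {msg_type}")
    return {"version": version, "node_id": node_id, "seq": seq}
```

Carrying an explicit version field lets members from different suppliers detect a protocol
mismatch instead of misreading each other's messages.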
4.6 Data Replication and Data Integrity
One of the most difficult aspects of HA application design can be preserving dynamic data and
context across component failures. Preserving this data not only adds to design and testing
complexity, but may also significantly affect the system’s performance, dependability and
accuracy.
An application usually stores long-term data and context in a file system or database, but often
also holds more transient data that represents the instantaneous context of dialogues and/or
transactions. The ability to replicate this kind of data efficiently can dramatically reduce
application recovery time.
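
As an illustration of replicating transient context, the sketch below forwards each update from
an active instance to a standby, tagged with a sequence number so the standby can refuse gaps
and duplicates. The class names PrimaryContext and StandbyReplica are hypothetical, and the
in-process method call stands in for whatever transport a real framework would use.

```python
from typing import Any, Dict


class StandbyReplica:
    def __init__(self) -> None:
        self.context: Dict[str, Any] = {}
        self.last_seq = 0

    def apply(self, seq: int, key: str, value: Any) -> bool:
        # Accept only the next update in sequence; refuse gaps and duplicates.
        if seq != self.last_seq + 1:
            return False
        self.context[key] = value
        self.last_seq = seq
        return True


class PrimaryContext:
    def __init__(self, standby: StandbyReplica) -> None:
        self.context: Dict[str, Any] = {}
        self.seq = 0
        self.standby = standby

    def update(self, key: str, value: Any) -> bool:
        # Apply locally, then forward the same change to the standby.
        # The return value says whether the standby accepted it.
        self.seq += 1
        self.context[key] = value
        return self.standby.apply(self.seq, key, value)
```

If the active instance fails, the standby already holds the dialogue state and can take over
without rebuilding it from the file system or database.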
The open HA framework needs to provide for at least some of the following:
• An HA file system native to the operating system environment
• An HA database system
• An HA shared memory subsystem
• An application context replication and recovery subsystem
The HA file system, database and shared memory need to be integrated with the overall HA
configuration and management subsystem so that their operation can be coordinated with the HA
reconfiguration activities (e.g., failure, repair and recovery handling).
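
One way to picture this integration is a subscription mechanism through which each data
subsystem is told about reconfiguration events. The sketch below is an assumption about how
such coordination could look; ReconfigurationCoordinator, ReplicatedStore and the event name
"failure" are hypothetical names introduced for the example.

```python
from typing import Callable, Dict, Iterable, List, Set


class ReconfigurationCoordinator:
    # Delivers reconfiguration events (failure, repair, recovery) to subscribers.
    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[str], None]]] = {}

    def subscribe(self, event: str, handler: Callable[[str], None]) -> None:
        self._handlers.setdefault(event, []).append(handler)

    def publish(self, event: str, node: str) -> int:
        # Call every handler registered for the event; return how many ran.
        handlers = self._handlers.get(event, [])
        for handler in handlers:
            handler(node)
        return len(handlers)


class ReplicatedStore:
    # Stand-in for an HA file system, database or shared memory subsystem.
    def __init__(self, name: str, replicas: Iterable[str]) -> None:
        self.name = name
        self.replicas: Set[str] = set(replicas)

    def on_node_failed(self, node: str) -> None:
        # Stop using the replica that lived on the failed node.
        self.replicas.discard(node)
```

For example, a file system and a database can both subscribe on_node_failed to the "failure"
event, so that a single publish call updates every subsystem holding a replica on that node.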
Additionally, the API of an HA file system or database may add or modify calls and return values
relative to the non-HA version.
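
As a hedged illustration of a modified return, the sketch below shows a write call that
reports how well the data was replicated instead of a plain success or failure. The
HAKeyValueStore class and the WriteStatus values are hypothetical and exist only for this
example.

```python
from enum import Enum
from typing import Dict, Iterable, Set


class WriteStatus(Enum):
    REPLICATED = "replicated"  # stored on every configured replica
    DEGRADED = "degraded"      # stored, but on fewer replicas than configured
    FAILED = "failed"          # no replica was available


class HAKeyValueStore:
    def __init__(self, replica_names: Iterable[str]) -> None:
        names = list(replica_names)
        self.replicas: Dict[str, Dict[str, bytes]] = {n: {} for n in names}
        self.online: Set[str] = set(names)

    def put(self, key: str, value: bytes) -> WriteStatus:
        # Write to every replica that is currently online.
        targets = [n for n in self.replicas if n in self.online]
        for name in targets:
            self.replicas[name][key] = value
        if not targets:
            return WriteStatus.FAILED
        if len(targets) < len(self.replicas):
            return WriteStatus.DEGRADED
        return WriteStatus.REPLICATED
```

A non-HA store would typically report only success or failure; the extra DEGRADED value lets
the application decide whether to continue while redundancy is reduced.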