Providing Open Architecture High Availability Solutions
Contents
1.0 Executive Summary
2.0 Introduction
2.1 Audience
2.2 HA Forum
2.3 Document Organization and Scope
3.0 High Availability Concepts and Principles
3.1 High Availability and Service Availability
3.2 Terminology
3.3 System Reliability and Availability
3.3.1 Reliability vs. Availability
3.3.2 What Compromises a System’s Reliability
3.3.3 Faults — Prevention, Removal, Tolerance, and Forecasting
3.3.4 The Challenges of Making Highly Reliable and Highly Available Systems
3.4 Reliability Modeling for Hardware and Software Systems
3.4.1 System Decomposition
3.4.2 System Models without Service Restoration
3.4.3 System Models with Service Restoration
3.5 Redundancy
3.5.1 Classical Fault Tolerance
3.5.2 Standby, or Hot Sparing
3.5.3 Load Sharing
3.5.4 Clustering
3.6 Making it All Work — Open vs. Proprietary
4.0 Customer Requirements for Open HA Systems
4.1 Application Areas
4.2 Open HA Framework Outline
4.2.1 Scope of an Open HA Framework
4.2.2 Compatibility and Interoperability
4.2.3 Related Standards Considered
4.3 System Topologies and Components
4.3.1 System Components
4.3.2 Application Environment
4.4 Availability Requirements
4.4.1 Recovery Times
4.4.2 Repair and Testing
4.4.3 Upgrades and Changes
4.5 HA Configuration and Cluster Management
4.6 Data Replication and Data Integrity
5.0 System Capabilities – Configuration Management
5.1 Introduction
5.2 Characteristics of System Components
5.3 Dynamic System Model