Providing Open Architecture High Availability Solutions

1.0 Executive Summary
This document describes the best known methods and capabilities needed for systems that must have
almost no downtime, commonly referred to as high availability (HA) systems. It is intended as a
guide to a common vocabulary and to possible HA functions. The system designer must select the
appropriate functions for each system based on its HA requirements, design complexity, and cost.
The document was written by the HA Forum, an industry group whose goal is to increase the
number and capability of open architecture high availability systems by standardizing the
interfaces and capabilities of building blocks for HA systems. Where building blocks are missing,
the HA Forum will look for solutions to fill the gaps. In addition, the Forum intends to promote
solid development models and best known methods for providing high availability systems.
Standards and guidelines for open architecture HA systems will make these systems easier to
develop. They will also make the systems better, because developers will be able to focus on their
own products instead of on the low-level services required to support high availability.
Three major sections comprise this document: Introductory Material, Functional Capabilities, and
Capabilities of Major Building Blocks. A brief summary of each section is given below:
Introductory Material
The introductory material consists of a document introduction, a review of the concepts and
principles of developing and maintaining an HA system, and an overview of typical customer
requirements for high availability systems.
Concepts and principles of HA are discussed in Section 3.0. This section covers the fundamental
principles of engineering HA systems and the issues involved in their design, development, and
deployment. It includes some industry best practices that mitigate risk and increase the likelihood
of a successful HA deployment, and it reviews modeling techniques that provide a quantitative
understanding of HA measurement.
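As an illustration of the kind of quantitative measure such modeling builds on (this example is not taken from Section 3.0), steady-state availability is commonly computed as MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR is the mean time to repair. The sketch below assumes both values are given in hours and converts unavailability into expected downtime per 365-day year; the function names and the sample figures are hypothetical.

```python
# Minutes in a 365-day year, used to express unavailability as annual downtime.
MINUTES_PER_YEAR = 365 * 24 * 60


def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_minutes(availability: float) -> float:
    """Expected downtime per year implied by a given availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Hypothetical component: fails on average every 10,000 h and takes 1 h to repair.
    a = steady_state_availability(10_000, 1)
    print(f"availability = {a:.6f}")
    print(f"downtime     = {annual_downtime_minutes(a):.1f} min/year")
```

The calculation shows why HA designs concentrate on MTTR as well as MTBF: shortening the repair time (for example, by fast failover) raises availability just as surely as making failures rarer.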
Customer requirements for HA systems are summarized in Section 4.0. This section identifies the
application areas under consideration by the Forum and gives a high-level review of the
requirements, expectations, and desirable features that typical applications may place on an HA
system.
Functional Capabilities
Functional capabilities for HA are separated into those needed to configure and operate the system
during normal operation and those that are used to detect and handle faults.
Managing the system configuration is discussed in Section 5.0. This involves knowing what
hardware, firmware, and software components are intended to be in the system (the system model)
and which of these are actually in the system at any given time. Configuration management also
allows modification of the configuration of the individual components that make up a system.
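The distinction between the system model and the components actually present can be sketched as a simple comparison. This is a minimal illustration under stated assumptions, not the mechanism defined in Section 5.0: the `Component` record, the slot-keyed dictionaries, and `compare_configuration` are all hypothetical names chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical record for one configurable item in the system."""
    kind: str      # "hardware", "firmware", or "software"
    name: str      # part number, image name, or package name
    version: str


def compare_configuration(model: dict[str, Component],
                          actual: dict[str, Component]) -> dict[str, list[str]]:
    """Compare the intended system model with the discovered configuration.

    Both inputs map a slot (a physical or logical location) to the component
    expected or found there. Returns sorted slot lists for each discrepancy.
    """
    # Expected by the model but not found (failed, removed, or never installed).
    missing = sorted(slot for slot in model if slot not in actual)
    # Found in the system but not described by the model.
    unexpected = sorted(slot for slot in actual if slot not in model)
    # Present in both, but the component, type, or version differs.
    mismatched = sorted(slot for slot in model.keys() & actual.keys()
                        if model[slot] != actual[slot])
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}


if __name__ == "__main__":
    model = {
        "slot1": Component("hardware", "cpu-board-A", "rev2"),
        "slot2": Component("software", "call-agent", "4.1"),
        "slot3": Component("firmware", "nic-fw", "1.7"),
    }
    actual = {
        "slot1": Component("hardware", "cpu-board-A", "rev2"),
        "slot2": Component("software", "call-agent", "4.0"),
        "slot4": Component("hardware", "disk-shelf", "rev1"),
    }
    print(compare_configuration(model, actual))
```

Keying both views by slot lets a version difference be reported as a single mismatch rather than as one missing plus one unexpected component, which is closer to how an operator would want the discrepancy presented.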
Fault management, as discussed in Section 6.0, is typically a five-stage process:
1. Detection – The fault is found
2. Diagnosis – The cause of the fault is determined
3. Isolation – The rest of the system is protected from the fault