Providing Open Architecture High Availability Solutions
This classification is useful in understanding the domain of faults one needs to protect a system
against. Note that human-made faults span a wide range. For example,
human-made faults may be of non-malicious or malicious intent, and these faults
are often the chief contributors to a system’s compromised reliability. An example of a
non-malicious fault is simply an error in the design or implementation of a component in
the system. Similarly, systems may be compromised intentionally, but still non-maliciously,
perhaps because of design tradeoffs, where certain limits were intentionally disregarded because of
the infinitesimal probability of their being encountered. At the other extreme, human-made faults may
intentionally and maliciously compromise the reliability of a system, as is so commonly demonstrated
in this Internet era through the use of viruses or Trojan horses.
A key to increasing a system’s availability is defending against these impairments. The
following section explores some remedies.
3.3.3 Faults — Prevention, Removal, Tolerance, and Forecasting
When designing and developing a highly available system it is critical that faults be minimized and
procedures be put in place to handle them when they do occur during operation.
Four primary methods aid in the design and development of reliable systems: fault
prevention, fault removal, fault tolerance, and fault and failure forecasting. These methods are
applied during the development phase of the project. Later sections discuss how to handle faults
once the system is in operation.
Fault Prevention
Fault prevention, while not considered by [Lapr92] and [Rand95], is considered by [Lyu96]. The
application of best-known methods for design and development, a preference for simplicity over
complexity, the refinement of user requirements, and disciplined execution of sound
engineering practices are perhaps the best defense against faults being created in systems. Formal
methods have been researched, but are still left primarily to academia. Few formal methods are
routinely practiced in industry.
Figure 2. Fault Classes