Providing Open Architecture High Availability Solutions

Corrective maintenance – Maintenance for the purpose of fixing a known or anticipated error in
the system.
Curative maintenance – Maintenance for the purpose of fixing a known error in the system.
Data Isolation – Using memory management to keep data from one program, task, or thread from
interfering with data from other program areas.
Degraded condition – A condition in which a system is still operating and providing service, but
perhaps more slowly or without an available standby unit.
Dependability – The attribute of a system such that one can rely on its responses. Dependability is
a combination of Availability, Reliability and Integrity.
Detection – Finding a fault in a system.
Detection Frequency – How often a system or signal is sampled to verify that it is not in an error
condition.
Deterministic – The attribute of a system such that one can determine when events are going to
occur.
Diagnosis – Determining which component caused a fault in a system so that recovery can begin
on that component.
Digraph – A Directed Graph which shows components and their dependencies.
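
As an illustration of the Digraph entry, the following minimal sketch represents a dependency digraph as a mapping from each component to the components it depends on. The component names and the helper affected_by are hypothetical and are not taken from this document; graphlib is part of the Python standard library (3.9 and later).

```python
from graphlib import TopologicalSorter

# Hypothetical components; each maps to the components it depends on.
dependencies = {
    "application": {"database", "network"},
    "database": {"disk"},
    "network": set(),
    "disk": set(),
}


def affected_by(failed, deps):
    """Return every component that depends, directly or indirectly, on `failed`."""
    affected = set()
    changed = True
    while changed:
        changed = False
        for component, needs in deps.items():
            if component in affected:
                continue
            # A component is affected if it needs the failed component
            # or any component that is already known to be affected.
            if failed in needs or needs & affected:
                affected.add(component)
                changed = True
    return affected


# One valid start-up order: every component appears after its dependencies.
start_order = list(TopologicalSorter(dependencies).static_order())
```

Walking the graph in the other direction, as affected_by does, is one way such a digraph could help narrow down which components to examine after a fault.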
Direct Detection – Detection of a fault by directly observing it. For example, detecting high CPU
temperature by measuring the CPU temperature instead of the outlet air temperature.
Directed – Directed activities are those that are controlled by a component other than the one
performing that activity.
Discovery – The process of determining what devices are in a system.
DMR – See Dual Modular Redundant.
DMTF – Distributed Management Task Force. A standards body working in the area of system
management. The DMTF is responsible for CIM and WBEM.
Download – To load new software onto a system.
Dual modular redundant – A redundant system in which two modules operate in parallel. If they
do not give the same information, an error is detected and a recovery must be made.
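
The comparison step in a dual modular redundant arrangement can be sketched as follows. This is a minimal illustration, assuming both modules produce directly comparable outputs; the function name dmr_compare and its return convention are hypothetical.

```python
def dmr_compare(output_a, output_b):
    """Compare the outputs of two modules operating in parallel.

    Returns (value, error_detected). When the outputs disagree, no value
    is returned, because the comparison alone cannot tell which module
    is the faulty one.
    """
    if output_a == output_b:
        return output_a, False
    return None, True


# Example: agreement passes the value through, disagreement flags an error.
agreed = dmr_compare(42, 42)
mismatch = dmr_compare(42, 41)
```

With only two modules a mismatch reveals that an error exists but not where it is, so the recovery step has to rely on further diagnosis.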
Dynamic reconfiguration – The ability to change a system configuration while the system is in
operation.
Error – The occurrence of a component not providing the correct information at the correct time.
Fail-safe – When an error occurs, the component fails in a way that indicates to the rest of the
system that it is no longer reliable. In many cases this means that it simply shuts down rather
than risk giving out incorrect data.
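
Fail-safe behavior might look like the following minimal sketch, in which a component takes itself out of service instead of reporting a reading it cannot trust. The class names, the plausibility range, and the read_raw callable are all hypothetical and serve only to illustrate the idea.

```python
class ComponentUnavailable(Exception):
    """Raised when the component has taken itself out of service."""


class FailSafeSensor:
    def __init__(self, read_raw, low, high):
        self._read_raw = read_raw  # hypothetical callable returning a number
        self._low = low            # lowest plausible reading
        self._high = high          # highest plausible reading
        self.in_service = True

    def read(self):
        if not self.in_service:
            # Already shut down: keep signalling unreliability to callers.
            raise ComponentUnavailable("sensor is shut down")
        value = self._read_raw()
        if not (self._low <= value <= self._high):
            # Implausible reading: shut down rather than give out bad data.
            self.in_service = False
            raise ComponentUnavailable("implausible reading, shutting down")
        return value
```

Once the sensor has shut down, every later call to read raises ComponentUnavailable, so the rest of the system is told explicitly that the component is no longer reliable.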
Failure – The condition in which a system fails to provide service at the level it was designed to provide.