Providing Open Architecture High Availability Solutions

Physical isolation – Isolating a component from the rest of the system by electrical disconnection,
either using switches or by removing a board.
Platform management – Managing a hardware platform using control features of that platform.
IPMI is frequently used to provide a platform management function.
Preventive maintenance – Maintenance performed on a system to prevent it from failing while
it is needed in operation. Preventive maintenance can be either scheduled or triggered by fault
prediction.
Process control – The ability of an application or other non-OS software component to
start and stop processes.
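As an illustration of process control, the following minimal sketch shows a non-OS component starting and stopping a child process through Python's standard `subprocess` module. The helper names `start_process` and `stop_process` are hypothetical and chosen only for this example; real high availability middleware would normally add monitoring and restart policy on top.

```python
import subprocess
import sys
from typing import List


def start_process(argv: List[str]) -> subprocess.Popen:
    """Start a child process and return its handle (hypothetical helper)."""
    return subprocess.Popen(argv)


def stop_process(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    """Ask the process to terminate; force-kill it if it does not exit in time."""
    proc.terminate()
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait()


if __name__ == "__main__":
    # Start a long-running child (a sleeping Python interpreter), then stop it.
    child = start_process([sys.executable, "-c", "import time; time.sleep(60)"])
    stop_process(child)
```

The two-stage stop (polite termination first, forced kill after a timeout) is one common design choice, not the only one.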
Process ID table – The list maintained by the OS of which processes and threads are running. It
typically includes information on the resources (memory, I/O and CPU time) being used and the
amount of time that each process has been running.
Publish/subscribe – A communications method whereby a software component “publishes” a list
of services it can provide. Other components can then “subscribe” to these services (or “register”
with them) and get notifications.
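The publish/subscribe pattern can be sketched in a few lines. The example below is a minimal in-process illustration, assuming a hypothetical `Broker` class and a made-up service name; it is not drawn from any particular middleware API.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Set


class Broker:
    """Hypothetical in-process broker illustrating publish/subscribe."""

    def __init__(self) -> None:
        self._services: Set[str] = set()
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def publish(self, service: str) -> None:
        """A component announces a service it can provide."""
        self._services.add(service)

    def subscribe(self, service: str, callback: Callable[[str], None]) -> None:
        """Another component registers to be notified about a published service."""
        if service not in self._services:
            raise KeyError(f"service not published: {service}")
        self._subscribers[service].append(callback)

    def notify(self, service: str, message: str) -> int:
        """Deliver a message to every subscriber; return how many were notified."""
        for callback in self._subscribers[service]:
            callback(message)
        return len(self._subscribers[service])


if __name__ == "__main__":
    broker = Broker()
    broker.publish("fault-events")           # hypothetical service name
    broker.subscribe("fault-events", print)  # the subscriber simply prints
    broker.notify("fault-events", "fan failure on board 3")
```

The publisher never needs to know who its subscribers are; the broker holds that list, which keeps components loosely coupled.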
Rebalance/Re-route – The process of changing which components are receiving and processing
which messages. This allows processing to be moved from one system to another should one
become over- or underutilized or should a system or component fail.
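One way to picture re-routing after a failure is as an update to a routing table. The sketch below assumes a hypothetical table that maps message types to node names, and a simple policy that moves the message types of a failed node to the healthy node currently handling the fewest types. Both the data layout and the policy are assumptions made for illustration.

```python
from typing import Dict, Set


def reroute(routes: Dict[str, str], healthy: Set[str]) -> Dict[str, str]:
    """Return a new routing table with message types moved off unhealthy nodes.

    Each displaced message type goes to the healthy node that currently
    handles the fewest message types (ties broken by node name).
    """
    if not healthy:
        raise ValueError("no healthy nodes available")

    # Count how many message types each healthy node already handles.
    load = {node: 0 for node in healthy}
    for node in routes.values():
        if node in load:
            load[node] += 1

    new_routes: Dict[str, str] = {}
    for msg_type, node in routes.items():
        if node in healthy:
            new_routes[msg_type] = node
        else:
            target = min(sorted(load), key=lambda n: load[n])
            new_routes[msg_type] = target
            load[target] += 1
    return new_routes


if __name__ == "__main__":
    table = {"billing": "A", "alarms": "B", "logs": "B"}
    print(reroute(table, healthy={"A", "C"}))  # node B has failed
```

The same function could serve rebalancing for over-utilization by treating an overloaded node as temporarily ineligible, although a production policy would normally weigh measured load rather than a simple count.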
Reboot – To re-start a computer. A cold reboot occurs when the system is powered off and back on.
A warm reboot occurs when the hardware is left running, but the OS is re-started.
Recovery – The system is adjusted or re-started so it functions properly.
Redundant – Having a copy of a component that can be used if the original component fails.
Register – To sign up with a software component to participate with it. See also Publish/subscribe.
Reintegration – To take a component that had been out of service and place it back into a system
in either the unassigned, standby or active modes.
Reliability – The attribute associated with systems that do not fail.
Repair – A faulty system component is replaced.
Reporting – The process of passing information on faults and configuration changes to the
management middleware and other components that need to have it.
Residual signature – A software signature which is placed in code to show which version(s) of
patches have been applied to a piece of software. The signature may be removable, but not without
removing the patch it refers to.
Resilience – The property of a component which allows it to function after incurring a fault.
Resilient components provide higher availability than comparable non-resilient components.
Resource allocation – The process of assigning I/O addresses, memory blocks and interrupts to a
software component.
Remote Monitor (RMON) – A standard for remotely monitoring a computing system.