Providing Open Architecture High Availability Solutions
Physical Isolation – Isolating a component from the rest of the system by electrical disconnection,
either using switches or by removing a board.
Platform management – Managing a hardware platform using control features of that platform.
IPMI is frequently used to provide a platform management function.
Preventative maintenance – Maintenance performed on a system to prevent it from failing while
it is needed in operation. Preventative maintenance can be either scheduled or triggered by fault
prediction.
Process control – The ability of an application or other non-OS software component to
start and stop processes.
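
As an illustration only, process control from application code might look like the following minimal sketch. It assumes a POSIX-like or Windows host with Python's standard subprocess module; the helper names start_process and stop_process are hypothetical and do not come from any particular high availability middleware.

```python
import subprocess
from typing import List


def start_process(argv: List[str]) -> subprocess.Popen:
    """Start a child process and return a handle to it (hypothetical helper)."""
    return subprocess.Popen(argv)


def stop_process(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    """Ask the process to exit; force it if it does not stop within `timeout` seconds.

    Returns the exit status reported by the OS.
    """
    proc.terminate()  # polite request (SIGTERM on POSIX)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # forced stop (SIGKILL on POSIX)
        return proc.wait()
```

The two-step stop (request, then force) is a design choice for the sketch: it gives the managed process a chance to clean up before it is removed.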
Process ID table – The list maintained by the OS of which processes and threads are running. It
typically includes information on the resources (memory, I/O and CPU time) being used and the
amount of time that each process has been running.
Publish/subscribe – A communications method whereby a software component “publishes” a list
of services it can provide. Other components can then “subscribe” to these services (or “register”
with them) and receive notifications.
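
The publish/subscribe pattern can be sketched in a few lines. This is a simplified, in-process illustration, assuming a single broker object; the class ServiceBroker and its method names are hypothetical and are not taken from any specific product or standard.

```python
from typing import Callable, Dict, List, Set

Callback = Callable[[str, object], None]


class ServiceBroker:
    """Hypothetical in-process broker illustrating publish/subscribe."""

    def __init__(self) -> None:
        self._published: Set[str] = set()
        self._subscribers: Dict[str, List[Callback]] = {}

    def publish(self, service: str) -> None:
        """A provider announces a service it can offer."""
        self._published.add(service)
        self._subscribers.setdefault(service, [])

    def services(self) -> List[str]:
        """List the services currently published."""
        return sorted(self._published)

    def subscribe(self, service: str, callback: Callback) -> None:
        """A consumer registers for notifications from a published service."""
        if service not in self._published:
            raise KeyError(f"service not published: {service}")
        self._subscribers[service].append(callback)

    def notify(self, service: str, event: object) -> int:
        """Deliver an event to every subscriber; return how many were notified."""
        callbacks = self._subscribers.get(service, [])
        for callback in callbacks:
            callback(service, event)
        return len(callbacks)
```

A provider would call publish("fault-events"), a consumer would call subscribe("fault-events", handler), and each later notify("fault-events", ...) would invoke handler. A real system would normally carry these calls over a messaging layer between processes or nodes.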
Rebalance/Re-route – The process of changing which components are receiving and processing
which messages. This allows processing to be moved from one system to another should one
become over- or underutilized, or should a system or component fail.
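
A minimal sketch of re-routing after a failure follows. It assumes work is modeled as named message streams assigned to named nodes, and it uses a simple least-loaded policy; the function name rebalance and the data layout are hypothetical. It covers only the failure case, not moving work away from an overutilized but healthy node.

```python
from typing import Dict, List, Set


def rebalance(assignments: Dict[str, List[str]], failed: Set[str]) -> Dict[str, List[str]]:
    """Return a new assignment in which streams from failed nodes are moved
    to the least-loaded healthy nodes (ties broken by node name)."""
    healthy = {
        node: list(streams)
        for node, streams in assignments.items()
        if node not in failed
    }
    if not healthy:
        raise RuntimeError("no healthy nodes left to take over the work")

    # Collect the streams that lost their node, in a deterministic order.
    orphaned = [s for node in sorted(failed) for s in assignments.get(node, [])]

    for stream in orphaned:
        # Pick the healthy node currently handling the fewest streams.
        target = min(sorted(healthy), key=lambda n: len(healthy[n]))
        healthy[target].append(stream)
    return healthy
```

The input mapping is left unmodified, so the caller can compare the old and new assignments before switching traffic over.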
Reboot – To restart a computer. A cold reboot occurs when the system is powered off and back on.
A warm reboot occurs when the hardware is left running but the OS is restarted.
Recovery – The system is adjusted or re-started so it functions properly.
Redundant – Having a copy of a component that can be used if the original component fails.
Register – To sign up with a software component to participate with it. See also Publish/subscribe.
Reintegration – To take a component that had been out of service and place it back into a system
in either the unassigned, standby or active modes.
Reliability – The attribute of a system that performs its intended function without failing.
Repair – A faulty system component is replaced.
Reporting – The process of passing information on faults and configuration changes to the
management middleware and other components that need to have it.
Residual signature – A software signature which is placed in code to show which version(s) of
patches have been applied to a piece of software. The signature may be removable, but not without
removing the patch it refers to.
Resilience – The property of a component which allows it to function after incurring a fault.
Resilient components provide higher availability than comparable non-resilient components.
Resource allocation – The process of assigning I/O addresses, memory blocks and interrupts to a
software component.
Remote Monitoring (RMON) – A standard for remotely monitoring networked systems.