Providing Open Architecture High Availability Solutions

components in the system (dependencies) and the conversions of any data storage that may need to
be updated as well. Also, a software upgrade should include a rollback feature that allows the
system to be returned to its pre-upgrade operating state.
Diagnostics are tools for verification. The final step in the repair action is to verify that the new
component is working properly.
6.5.5 Techniques
Component Replacement. This technique includes physical replacement of the failed component.
Debug and Diagnostics. Before a component is removed from a system, diagnostic and debug
utilities may be used to find the atomic component that has faulted within the component. These
techniques are also used just after a component is placed back into a system to verify that the
component works correctly.
Off-line diagnostics are typically provided by the manufacturer of the component. A method needs
to be in place to connect an off-line component to an alternate input and output stream in order to
run the diagnostics.
When off-line diagnostics are complete, the information about the diagnostics (including time,
version, results, etc.) should be stored in the system information data block for that component
and/or its subcomponents.
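As an illustration of the record keeping described above, the following minimal sketch stores the time, version, and results of an off-line diagnostic run against a component or one of its subcomponents. The names DiagnosticRecord, ComponentInfoBlock, and record_diagnostic are hypothetical and are not defined by this document; the actual layout of the system information data block is left to the implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class DiagnosticRecord:
    """One off-line diagnostic run (hypothetical layout)."""
    run_at: datetime      # time the diagnostic completed
    diag_version: str     # version of the diagnostic utility
    passed: bool          # overall result
    details: str = ""     # free-form result text


@dataclass
class ComponentInfoBlock:
    """Hypothetical system information data block for one component."""
    component_id: str
    diagnostics: List[DiagnosticRecord] = field(default_factory=list)
    subcomponents: Dict[str, "ComponentInfoBlock"] = field(default_factory=dict)

    def record_diagnostic(self, record: DiagnosticRecord,
                          subcomponent_id: Optional[str] = None) -> "ComponentInfoBlock":
        """Store a record on this component or on a named subcomponent."""
        target = self
        if subcomponent_id is not None:
            # Create the subcomponent block on first use.
            target = self.subcomponents.setdefault(
                subcomponent_id, ComponentInfoBlock(subcomponent_id))
        target.diagnostics.append(record)
        return target


if __name__ == "__main__":
    board = ComponentInfoBlock("line-card-3")
    run = DiagnosticRecord(datetime.now(timezone.utc), "1.2", True, "memory test ok")
    board.record_diagnostic(run)                           # whole component
    board.record_diagnostic(run, subcomponent_id="dsp-0")  # one subcomponent
    print(len(board.diagnostics), list(board.subcomponents))
```

Keeping the history as a list, rather than only the latest result, lets a later repair action see how many times a component has already been diagnosed.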
Software Patching. This technique involves replacing pieces of a software component, ideally
while the component is in use. A residual signature is needed to show that
a patch has been applied. A mechanism to remove the patch and its associated signature must also
be provided.
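The patching requirements above (replace a piece of a component in use, leave a residual signature, and allow removal of both) can be sketched as follows. This is only an illustration under stated assumptions: the class PatchableComponent, its method names, and the use of a SHA-256 digest as the signature are hypothetical choices, not something this document prescribes. The sketch also assumes patches are removed in the reverse order of application.

```python
import hashlib
from typing import Callable, Dict, Tuple


class PatchableComponent:
    """Hypothetical software component whose handlers can be patched in use."""

    def __init__(self, handlers: Dict[str, Callable]):
        self.handlers = dict(handlers)
        # patch_id -> (residual signature, original handlers that were replaced)
        self._applied: Dict[str, Tuple[str, Dict[str, Callable]]] = {}

    def apply_patch(self, patch_id: str, replacements: Dict[str, Callable],
                    patch_bytes: bytes = b"") -> str:
        """Swap in replacement handlers and leave a residual signature."""
        if patch_id in self._applied:
            raise ValueError(f"patch {patch_id!r} is already applied")
        unknown = set(replacements) - set(self.handlers)
        if unknown:
            raise KeyError(f"unknown handlers: {sorted(unknown)}")
        saved = {name: self.handlers[name] for name in replacements}
        self.handlers.update(replacements)
        # Assumption: the signature is a digest of the patch id and payload.
        signature = hashlib.sha256(patch_id.encode() + patch_bytes).hexdigest()
        self._applied[patch_id] = (signature, saved)
        return signature

    def remove_patch(self, patch_id: str) -> None:
        """Restore the original handlers and drop the signature."""
        if patch_id not in self._applied:
            raise KeyError(f"patch {patch_id!r} is not applied")
        if next(reversed(self._applied)) != patch_id:
            raise ValueError("patches must be removed in reverse order")
        _signature, saved = self._applied.pop(patch_id)
        self.handlers.update(saved)

    def signatures(self) -> Dict[str, str]:
        """Residual signatures of all patches currently applied."""
        return {pid: sig for pid, (sig, _saved) in self._applied.items()}


if __name__ == "__main__":
    comp = PatchableComponent({"route": lambda x: x + 1})
    comp.apply_patch("P-001", {"route": lambda x: x + 2}, b"payload")
    print(comp.handlers["route"](1), comp.signatures())
    comp.remove_patch("P-001")
    print(comp.handlers["route"](1), comp.signatures())
```

Saving the replaced handlers alongside the signature means that removing the patch restores the earlier behavior and clears the signature in a single step.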
Software Upgrade. This technique involves the replacement of the component and dependent
components in the system. The operation should be performed while the system is providing
service but will normally require some level of switchover action and conversion action after the
installation is complete.
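The upgrade sequence (installation, switchover, conversion), combined with the rollback feature that this section asks a software upgrade to include, can be sketched as an ordered list of steps run against a snapshot. The function run_upgrade, the UpgradeError exception, and the dictionary used to model system state are hypothetical; a real system would snapshot and restore far more than an in-memory dictionary.

```python
import copy
from typing import Callable, Dict, List, Tuple


class UpgradeError(Exception):
    """Raised when an upgrade step fails and the system has been rolled back."""


def run_upgrade(state: Dict, steps: List[Tuple[str, Callable[[Dict], None]]]) -> Dict:
    """Run upgrade steps in order; restore the prior state if any step fails.

    Returns the pre-upgrade snapshot so the caller can still roll back later.
    """
    snapshot = copy.deepcopy(state)
    try:
        for name, step in steps:
            step(state)
    except Exception as exc:
        # Rollback: put the system back exactly as it was before the upgrade.
        state.clear()
        state.update(snapshot)
        raise UpgradeError(f"step {name!r} failed; system rolled back") from exc
    return snapshot


if __name__ == "__main__":
    system = {"versions": {"db": "1.0"}, "active": "unit-a", "data": {"schema": 1}}

    def install(s):      # replace the component and its dependent components
        s["versions"]["db"] = "2.0"

    def switchover(s):   # move service to the upgraded unit
        s["active"] = "unit-b"

    def convert(s):      # convert stored data to the new format
        s["data"]["schema"] = 2

    before = run_upgrade(system, [("install", install),
                                  ("switchover", switchover),
                                  ("convert", convert)])
    print(before["versions"], "->", system["versions"])
```

Taking the snapshot before the first step is what lets a failure in a late step, such as data conversion, undo the earlier installation and switchover as well.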
6.5.6 Dependencies
The repair action depends on the manual actions of the craftsperson. This is the most error-prone
item, and the system must be prepared to expect the unexpected. Other dependencies include
component identification and versioning, and diagnostics.
6.6 Notification
6.6.1 Introduction
Components within a system must interact with each other to enable fault management.
Notification of fault information and the progression of the fault management process may occur
through various communication interfaces. For purposes of fault management, the notification
function focuses only on communication capabilities and interfaces between the fault management
processes and the fault communication between the layers.
This function builds upon the concept of system component interfaces (Section 5.4) and
management interfaces (both internal and external) that are detailed in Section 5.5.