Providing Open Architecture High Availability Solutions

Diagnostics. Diagnostics involves testing of the system components. This may be done while the
system component is on-line or off-line. Diagnostic testing may also be done destructively or non-
destructively. Testing that interrupts the normal functioning of the system must be coordinated so it
is either done during off-peak hours or another redundant component can handle the normal system
traffic during the testing period.
Autonomous. Health status and state changes are reported as they occur. Alarm information (see
Section 5.5) is an example of this form of component communication.
Directed, Polled. To get information, the requesting Management Interface function directs
queries for Health Status and State (status) information to one or more particular components,
and each addressed component responds. This is typically a two-way communication. The queries
are generated periodically, with a period short enough that important information does not
overflow the local storage capacity of the components (registers, data structures, etc.).
Directed, Ad Hoc. Same as the polled action, except that queries are generated as needed.
Directed, Control. This communication allows a Management Entity to issue a command to
perform an action. This may be a one-way or two-way form of communication, and may use a
reliable or unreliable delivery mechanism.
Multiple Managers. Health Status and State information may be important to, and subject to
management by, multiple management functions. For instance, it may be used by the middleware
layer, may be accessed through the local trade interface, and may be reported (directly or through
the middleware layer) to one or more tiers of a formal network element management schema.
Management Interfaces. These are defined interfaces to communicate component information
(see Section 5.5).
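
The periodicity constraint in the Directed, Polled item above can be made concrete with a small calculation. The sketch below is illustrative only: the function name, the parameters, and the safety factor are assumptions, and it presumes a component that buffers status events in a fixed number of entries and has a known peak event rate.

```python
def max_poll_interval(buffer_capacity: int, peak_event_rate: float,
                      safety_factor: float = 0.5) -> float:
    """Return the longest polling period, in seconds, that still drains a
    component's status buffer before it can fill at the peak event rate."""
    if buffer_capacity <= 0 or peak_event_rate <= 0:
        raise ValueError("capacity and event rate must be positive")
    if not 0 < safety_factor <= 1:
        raise ValueError("safety_factor must be in (0, 1]")
    # Time to fill the buffer at peak rate, scaled down by the safety margin.
    return safety_factor * buffer_capacity / peak_event_rate


# Hypothetical component: 64-entry status log, at most 8 events per second.
interval = max_poll_interval(buffer_capacity=64, peak_event_rate=8.0)
print(interval)  # 4.0
```

A safety factor below 1 leaves headroom for delayed or lost queries; the value 0.5 is an arbitrary choice for the example.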
5.4.4 Approach
During operation, as the state or health status of a system component changes, these changes
must be reflected in the System Model. There are two ways of reporting these changes internally to
the system. The first is via asynchronous communication methods, in which the component reports
a change as it occurs. The second is via synchronous, or polled, communication, in which a
management function queries the component.
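
The two reporting paths can be contrasted with a minimal sketch. The class names, method names, and state strings below (Component, SystemModel, "degraded", and so on) are hypothetical and are not taken from any specification; the sketch only assumes that a component can either push a change to a registered listener or be queried for its current state.

```python
class Component:
    """A system component with a state that may change during operation."""

    def __init__(self, name, state="in-service"):
        self.name = name
        self.state = state
        self._listeners = []

    def subscribe(self, listener):
        # Register a callback for asynchronous (pushed) change reports.
        self._listeners.append(listener)

    def set_state(self, new_state):
        self.state = new_state
        for listener in self._listeners:
            listener(self.name, new_state)


class SystemModel:
    """Holds the last known state of every component."""

    def __init__(self):
        self.states = {}

    def on_change(self, name, state):
        # Asynchronous path: the component reports the change as it occurs.
        self.states[name] = state

    def poll(self, components):
        # Polled path: the model queries each component for its state.
        for component in components:
            self.states[component.name] = component.state


model = SystemModel()
fan = Component("fan-1")
disk = Component("disk-0")

fan.subscribe(model.on_change)         # fan reports asynchronously
fan.set_state("degraded")
after_push = dict(model.states)        # only the fan is known so far

disk.set_state("out-of-service")       # disk has no listener: change is silent
model.poll([fan, disk])                # the next poll picks it up
after_poll = dict(model.states)
```

In this sketch the disk's change is invisible to the System Model until the next poll, which illustrates the latency cost of polling; the asynchronous path avoids that delay but relies on the component being able to send the report.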
Health Status and State Information may also need to be reported to a manager external to the
system. Either individual system component information or an overall view of the system's
Health Status and State may be conveyed.
It may also be desirable to check the status of a system while it is not operational. This capability
allows a system to be diagnosed and discovered when it is not functional, whether because of a
failure or because it has not yet been started.
Monitoring Health Status
A system may provide trending information by continuously monitoring the health of each
component in the system. Health Status monitoring examines the health of a component that
has not (yet) incurred a fault. Once a fault occurs, the component is logged as entering a faulted
condition, and the Fault Management techniques discussed in Section 6.0 are applied.
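
As a rough illustration of trending, the sketch below keeps a sliding window of readings for one component, estimates how quickly the readings are approaching a fault threshold, and latches a faulted condition once the threshold is crossed. Everything here is an assumption made for the example: the HealthMonitor class, the use of a single numeric reading (such as a temperature), the linear trend estimate, and the threshold value.

```python
from collections import deque


class HealthMonitor:
    """Tracks recent readings of one component and flags a fault."""

    def __init__(self, fault_threshold, window=4):
        self.fault_threshold = fault_threshold
        self.samples = deque(maxlen=window)  # oldest readings drop off
        self.faulted = False

    def record(self, value):
        self.samples.append(value)
        if value >= self.fault_threshold:
            # Latched: from here on, fault management would take over.
            self.faulted = True

    def trend(self):
        """Average change per sample across the current window."""
        if len(self.samples) < 2:
            return 0.0
        return (self.samples[-1] - self.samples[0]) / (len(self.samples) - 1)

    def samples_until_fault(self):
        """Linear projection of samples left before the threshold is hit,
        or None when the readings are not rising."""
        rate = self.trend()
        if rate <= 0:
            return None
        return (self.fault_threshold - self.samples[-1]) / rate


monitor = HealthMonitor(fault_threshold=90.0, window=4)
for reading in (60.0, 64.0, 68.0, 72.0):   # hypothetical readings
    monitor.record(reading)

trend_before = monitor.trend()              # 4.0 per sample
eta = monitor.samples_until_fault()         # 4.5 samples
healthy_so_far = not monitor.faulted

monitor.record(91.0)                        # crosses the threshold
```

Before the last reading the component is still healthy but visibly degrading, which is the information trending is meant to expose; after it, the component is marked faulted and stays so in this sketch until something outside the monitor clears it.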
Even after a component has failed, there is a need to monitor (or view) the health of that
component. The Fault Management (FM) service’s capabilities take care of recovering from the
failed component, but the FM service’s link to the operator is through the Configuration