Providing Open Architecture High Availability Solutions

6.6.2 Objective
Notification is a key capability of the fault management process. The objective of notification is to
enable management middleware and other system components to access fault reporting, state
change, performance, and status information that could help predict faults proactively.
6.6.3 Concepts
Notification may include the following context and content:
Autonomous Notification. Notification information is automatically generated and communicated
through the component’s interface.
Directed Notification. Based upon the system design, notification information and fault
management control information can be directed to specific interfaces. These interfaces may be
inter-layer, intra-layer, or external.
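The difference between autonomous and directed notification can be illustrated with a small router. This is a minimal sketch under stated assumptions: `Scope`, `Notification`, and `NotificationRouter` are hypothetical names, not an API from this document or any middleware product. Here an autonomous notification is pushed to every subscribed interface of the component, while a directed one reaches only the interface categories the system design selects.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Scope(Enum):
    """Interface categories named in the text (hypothetical enum)."""
    INTER_LAYER = "inter-layer"
    INTRA_LAYER = "intra-layer"
    EXTERNAL = "external"


@dataclass
class Notification:
    source: str
    kind: str      # hypothetical labels such as "fault" or "state-change"
    detail: str


Handler = Callable[[Notification], None]


@dataclass
class NotificationRouter:
    """Hypothetical router: autonomous notifications reach every subscriber,
    directed notifications reach only the scopes chosen by the system design."""
    subscribers: Dict[Scope, List[Handler]] = field(
        default_factory=lambda: {s: [] for s in Scope})

    def subscribe(self, scope: Scope, handler: Handler) -> None:
        self.subscribers[scope].append(handler)

    def publish(self, note: Notification) -> int:
        """Autonomous: deliver through all of the component's interfaces."""
        return self.direct(note, list(Scope))

    def direct(self, note: Notification, scopes: List[Scope]) -> int:
        """Directed: deliver only to the listed scopes; return delivery count."""
        delivered = 0
        for scope in scopes:
            for handler in self.subscribers[scope]:
                handler(note)
                delivered += 1
        return delivered
```

Keeping the routing decision in one place means the mapping from notification to interface can follow the system design without changing the components that generate the notifications.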
Indirect Notification. An indirect notification of a fault may come from a higher-level entity, or from a peer
component using the techniques described in Section 6.1.5.
In-line Notification. Non-faulted components immediately adjacent to a detected fault (above and
below it) should typically receive an in-line notification of the fault. For instance, a break in a network
cable or connector will typically be sensed by the Media Access Controller (MAC) in the hardware layer.
The MAC could immediately communicate this information to the driver via an interrupt, or it could
return an interface-unavailable error code in response to the next I/O request. Both of these
responses are in-line notifications. Function return codes are another example of ‘in-line’
notification: examining the return code gives the calling process immediate information on the
failure, status, or success of the operation.
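Both in-line paths in the cable-break example can be modeled in a few lines. This is a hedged sketch: `Mac` and `Driver` are hypothetical stand-ins for hardware and driver code, and the choice of a negative `errno.ENETDOWN` return value is an assumption borrowed from C driver conventions, not something the text specifies.

```python
import errno
from typing import Callable, Optional


class Mac:
    """Hypothetical Media Access Controller model that senses the physical link."""

    def __init__(self) -> None:
        self.link_up = True
        # Optional interrupt line into the driver; None means no interrupt.
        self.interrupt: Optional[Callable[[], None]] = None

    def cable_break(self) -> None:
        self.link_up = False
        if self.interrupt is not None:
            self.interrupt()  # in-line path 1: immediate interrupt to the driver

    def transmit(self, frame: bytes) -> int:
        # In-line path 2: error code returned on the next I/O request.
        return 0 if self.link_up else -errno.ENETDOWN


class Driver:
    """Hypothetical driver sitting directly above the MAC."""

    def __init__(self, mac: Mac, use_interrupt: bool = True) -> None:
        self.mac = mac
        self.interface_available = True
        if use_interrupt:
            mac.interrupt = self._on_link_down

    def _on_link_down(self) -> None:
        self.interface_available = False

    def send(self, frame: bytes) -> int:
        """Return 0 on success or a negative errno value; the caller checks it."""
        rc = self.mac.transmit(frame)
        if rc != 0:
            self.interface_available = False
        return rc
```

With interrupts enabled the driver learns of the fault before it sends anything; without them it learns from the return code of its next `send`, and the caller of `send` learns the same way.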
In-band Notification. Notification of events, faults, or exceptions may be reported using the same
framework, protocols, and hardware as other inter-process messages. These messages are
considered in-band because they use the same “band” as other inter-process messages. Notification
messages may need to be sent several times during a recovery process. For instance, in the
recovery from a fault, redundant components transition from a standby to an active role
assignment, load a copy of a driver or checkpoint data, and may encounter exception conditions.
Each of these occurrences may warrant a notification.
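The recovery sequence above can be sketched as a failover routine that emits one in-band notification per step onto the same bus that carries ordinary messages. Everything here is hypothetical: `MessageBus` uses a `deque` in place of a real inter-process transport, and the step names and message texts are illustrative only.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    channel: str  # "data" or "notification"; both travel on the same bus
    body: str


class MessageBus:
    """Hypothetical inter-process bus; a deque stands in for the transport."""

    def __init__(self) -> None:
        self.queue: deque = deque()

    def send(self, msg: Message) -> None:
        self.queue.append(msg)


def fail_over(bus: MessageBus, name: str, checkpoint_ok: bool) -> bool:
    """Promote a standby component, sending an in-band notification per step."""
    bus.send(Message("notification", f"{name}: standby -> active"))
    bus.send(Message("notification", f"{name}: driver loaded"))
    if not checkpoint_ok:
        # Exception condition encountered while loading checkpoint data.
        bus.send(Message("notification", f"{name}: exception loading checkpoint"))
        return False
    bus.send(Message("notification", f"{name}: checkpoint restored"))
    return True
```

Because notifications share the queue with data messages, they are ordered relative to normal traffic, which is the main property that distinguishes in-band reporting.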
Out-of-band Notification. Notification of events, faults, or exceptions may also be reported
out-of-band, as directed communication with one or more of the management interfaces. For instance,
in the recovery from a fault, redundant components may transition from a standby to an active role
assignment, load a copy of a driver or checkpoint data, or encounter exception conditions. While
state and error conditions are still reported in-band, the same information could also be reported
out-of-band to the system log or console, middleware, and other management interfaces.
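Reporting the same event both in-band and out-of-band amounts to a fan-out. In this minimal sketch, `Reporter` and `add_management_interface` are hypothetical names; Python's standard `logging` module stands in for the system log or console, and any callable can represent middleware or another management interface.

```python
import logging
from typing import Callable, List

Sink = Callable[[str], None]

# Stands in for the system log/console (assumption for illustration).
log = logging.getLogger("fault-mgmt")


class Reporter:
    """Hypothetical reporter: each event goes in-band, and a copy is sent
    out-of-band to every registered management interface."""

    def __init__(self, in_band: Sink) -> None:
        self.in_band = in_band
        # The system log is always one of the out-of-band destinations.
        self.out_of_band: List[Sink] = [lambda text: log.warning("%s", text)]

    def add_management_interface(self, sink: Sink) -> None:
        self.out_of_band.append(sink)

    def report(self, event: str) -> None:
        self.in_band(event)            # normal inter-process path
        for sink in self.out_of_band:  # duplicate copies to management interfaces
            sink(event)
```

Passing the event text as a `%s` argument, rather than as the format string itself, keeps `logging` from misinterpreting any `%` characters in the event.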
Management Interfaces. As described in Section 5.5, these interfaces can be inter-process,
intra-layer, layer-to-layer, local, or remote.
Administrative Notification. When available, fault management notifications should be reported on
the system console. Administrative notification might also consist of activating LEDs, alarms, and
relays. The administrative interface can also supply notification information to the fault
management process: acknowledging or silencing an alarm, or using the extractor handles to light
the blue light in CompactPCI systems, provides event information that is visible (or audible)
without having to use the standard system management console.
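Operator actions on an alarm can themselves feed the fault management process. The sketch below is hypothetical: `Alarm` and its state names are illustrative, and the particular semantics chosen (silencing stops the audible signal but leaves the LED lit, acknowledging moves the alarm out of the active state) are assumptions that vary between real systems.

```python
from enum import Enum
from typing import List


class AlarmState(Enum):
    CLEAR = "clear"
    ACTIVE = "active"              # LED lit, audible relay energized
    ACKNOWLEDGED = "acknowledged"  # operator has seen the alarm


class Alarm:
    """Hypothetical alarm point; each operator action is recorded as an event."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = AlarmState.CLEAR
        self.led_on = False
        self.audible = False
        self.events: List[str] = []

    def raise_alarm(self) -> None:
        self.state = AlarmState.ACTIVE
        self.led_on = True
        self.audible = True
        self.events.append("raised")

    def silence(self) -> None:
        # Assumed semantics: stop the audible signal, keep the LED lit.
        if self.state is not AlarmState.CLEAR:
            self.audible = False
            self.events.append("silenced")

    def acknowledge(self) -> None:
        # Assumed semantics: acknowledgement also silences the alarm.
        if self.state is AlarmState.ACTIVE:
            self.state = AlarmState.ACKNOWLEDGED
            self.audible = False
            self.events.append("acknowledged")
```

The `events` list is where a real implementation would instead forward the operator's actions to the fault management process.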