Providing Open Architecture High Availability Solutions
5.5.2 Alarms
Alarms are intended to convey critical system exception information in an appropriate, effective,
and timely manner.
Definition. Alarms are typically autonomously generated messages that are triggered by a specific
causative stimulus. The stimulus might be: a detected fault; a state change (system power-on
generating a trap, or a role change generating an alarm); a threshold crossing (a specified parameter,
or the rate of change of that parameter, exceeds or falls below a pre-established value); or the
recording of a particular event type or event severity (for instance, when the storage medium is
running out of available space). A filter process is often used to determine which events constitute
alarms and to assign each alarm its level.
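The threshold-crossing stimulus, including the rate-of-change case, can be sketched as a small filter. This is a minimal illustration, not a prescribed design: the `ThresholdRule` type, the `evaluate` function, and the example parameter `disk_free_pct` are all hypothetical names chosen for the sketch, and samples are assumed to arrive as `(timestamp_seconds, value)` pairs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdRule:
    """Hypothetical rule: alarm when a value, or its rate of change, leaves a band."""
    name: str
    low: float
    high: float
    max_rate: Optional[float] = None  # units per second; None disables the rate check


def evaluate(rule: ThresholdRule,
             prev: Optional[tuple[float, float]],
             cur: tuple[float, float]) -> list[str]:
    """Return the alarm causes triggered by sample cur = (timestamp, value).

    prev is the previous (timestamp, value) sample, or None for the first sample.
    An empty list means the filter raised no alarm.
    """
    t, value = cur
    causes = []
    # Absolute threshold crossing: above the ceiling or below the floor.
    if value > rule.high:
        causes.append(f"{rule.name}: {value} exceeds {rule.high}")
    elif value < rule.low:
        causes.append(f"{rule.name}: {value} below {rule.low}")
    # Rate-of-change crossing: needs a previous sample and a positive time step.
    if rule.max_rate is not None and prev is not None:
        dt = t - prev[0]
        if dt > 0:
            rate = (value - prev[1]) / dt
            if abs(rate) > rule.max_rate:
                causes.append(f"{rule.name}: rate {rate:.2f}/s exceeds {rule.max_rate}/s")
    return causes
```

For example, a storage volume whose free space drops from 50% to 8% within ten seconds would trip both the absolute floor and the rate limit in a single evaluation.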
Concept of Levels. A management scheme may include different escalating levels of alarms.
These levels may be indicated by different LED colors or by different message types (alarms,
alerts, warnings, traps). The levels may be differentiated by severity or type, and an alarm may be
escalated if it is not resolved within some defined interval.
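Time-based escalation of an unresolved alarm can be expressed as a simple function of elapsed time. The sketch below assumes a hypothetical four-step scale (`WARNING` through `CRITICAL`) and a single fixed escalation interval; a real management scheme may use different levels, per-level intervals, or manual escalation.

```python
from enum import IntEnum


class Level(IntEnum):
    # Hypothetical escalating levels, lowest to highest.
    WARNING = 1
    MINOR = 2
    MAJOR = 3
    CRITICAL = 4


def current_level(initial: Level, raised_at: float, now: float,
                  escalate_after: float) -> Level:
    """Escalate one level per elapsed escalate_after seconds while unresolved.

    The result never exceeds CRITICAL and never drops below the initial level.
    """
    if escalate_after <= 0:
        raise ValueError("escalate_after must be positive")
    steps = int((now - raised_at) // escalate_after)
    return Level(min(initial + max(steps, 0), Level.CRITICAL))
```

With a 60-second interval, a WARNING raised at t = 0 that is still open at t = 125 has been escalated twice, to MAJOR.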
Context/Content. To be useful, an alarm should include sufficient information about the context of
the exception, its type and severity (if available), and specific information about its location.
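One way to make sure every alarm carries the context, type, severity, and location described above is to give it a fixed record structure. The field names and the slash-separated location format (`shelf1/slot3/port2`) in this sketch are hypothetical; severity is optional because the text notes it may not always be available.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Alarm:
    alarm_type: str          # e.g. "threshold", "fault", "state-change"
    severity: Optional[str]  # None when no severity is available
    location: str            # hypothetical hierarchical path, e.g. "shelf1/slot3/port2"
    context: str             # human-readable description of the exception
    raised_at: float = field(default_factory=time.time)

    def summary(self) -> str:
        """One-line rendering suitable for a console or log."""
        sev = self.severity or "unspecified"
        return f"[{sev}] {self.alarm_type} at {self.location}: {self.context}"
```

Making the record immutable (`frozen=True`) keeps an alarm's original content intact as it is forwarded to several management interfaces.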
Communication Method. Alarm information can be broadcast to numerous management
interfaces and to any management entity. For instance, a power-on trap message might be
published for everyone to hear, without regard to whether any entity received it (unreliable).
Alarms can also be directed to particular management entities, and can be unreliable (as described
above), periodic (generated at an interval until resolved), or persistent (continuing to alarm until
the alarm is acknowledged, even if the fault has not been resolved). For instance, the loss of the
co-generation facility at a central office would generate an alarm message that would be both
persistent (ensuring that all appropriate management interfaces, local and remote, received the
message) and periodic (re-alarming at regular intervals while the office continues to operate on
backup battery power).
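The three delivery behaviors, and the combined persistent-plus-periodic case from the central-office example, can be modeled with two flags. This is a simulation sketch on an integer tick clock, not a transport implementation: `AlarmState`, `send_ticks`, and the `state_at` callback are hypothetical, and resolution and acknowledgement are assumed to be permanent once they occur.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AlarmState:
    resolved: bool = False
    acknowledged: bool = False


def send_ticks(periodic: bool, persistent: bool,
               state_at: Callable[[int], AlarmState],
               horizon: int, interval: int = 1) -> list[int]:
    """Return the ticks at which the alarm is (re)sent. It is always sent once at tick 0.

    periodic:   keep re-sending while the fault is unresolved.
    persistent: keep re-sending while the alarm is unacknowledged.
    Neither flag: a single unreliable, fire-and-forget send.
    """
    ticks = [0]
    for tick in range(interval, horizon, interval):
        state = state_at(tick)
        again = (periodic and not state.resolved) or (persistent and not state.acknowledged)
        if not again:
            break  # assumes resolution and acknowledgement do not revert
        ticks.append(tick)
    return ticks
```

For instance, if an operator acknowledges the alarm at tick 3 and power is restored at tick 6, a persistent-only alarm stops after tick 2, while the combined alarm keeps re-alarming through tick 5.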
5.5.3 Integration with External Network Management Systems
It is critical that a system not only be able to control and monitor itself, but also that it work
within a network of other systems. There are several standard methods that external management
systems use to communicate with the systems they monitor. Most of these methods define messages
with room for expansion, and when they are expanded, there is usually an attempt to standardize the
new messages so that compatibility is maximized. The following standards are in use today:
Simple Network Management Protocol (SNMP). SNMP is a set of protocols that pass
information about whether or not a component is operating properly. SNMP uses Management
Information Bases (MIBs) to store data about a system.
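SNMP managers read an agent's MIB objects by object identifier (OID), and can walk a subtree with repeated GETNEXT requests, which return the next OID in lexicographic order. The toy model below illustrates only that lookup and ordering logic; it is not an SNMP implementation (no BER encoding, no UDP transport, no access control). The OIDs used in the usage example are the MIB-II `sysDescr.0`, `sysUpTime.0`, `sysName.0`, and `ifNumber.0` objects; the `ToyMib` class and its values are hypothetical.

```python
import bisect
from typing import Iterator, Optional


def parse_oid(text: str) -> tuple[int, ...]:
    """'1.3.6.1.2.1.1.3.0' -> (1, 3, 6, 1, 2, 1, 1, 3, 0)."""
    return tuple(int(part) for part in text.strip(".").split("."))


class ToyMib:
    """In-memory stand-in for an agent's MIB view: OID -> value.

    Python tuple comparison is element-wise, which matches SNMP's
    lexicographic OID ordering, so a sorted list supports GETNEXT.
    """

    def __init__(self, objects: dict[str, object]):
        self._oids = sorted(parse_oid(k) for k in objects)
        self._values = {parse_oid(k): v for k, v in objects.items()}

    def get(self, oid: str) -> Optional[object]:
        return self._values.get(parse_oid(oid))

    def get_next(self, oid: str) -> Optional[tuple[str, object]]:
        i = bisect.bisect_right(self._oids, parse_oid(oid))
        if i == len(self._oids):
            return None  # end of the MIB view
        nxt = self._oids[i]
        return ".".join(map(str, nxt)), self._values[nxt]

    def walk(self, prefix: str) -> Iterator[tuple[str, object]]:
        """Yield every object under prefix, as repeated GETNEXTs would."""
        base = parse_oid(prefix)
        oid = prefix
        while (item := self.get_next(oid)) is not None:
            if parse_oid(item[0])[:len(base)] != base:
                break  # left the requested subtree
            yield item
            oid = item[0]
```

Walking `1.3.6.1.2.1.1` (the MIB-II system group) on a populated `ToyMib` returns the system objects in OID order and stops at the first object outside that subtree.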
Remote Monitoring (RMON). RMON allows network usage information to be gathered by
defining additional sets of MIBs. Network devices in a system must be RMON-compliant for
RMON to work.
Common Management Information Protocol (CMIP). CMIP is an OSI standard protocol used
with the Common Management Information Service (CMIS), which defines a system of network
management information services. CMIP was proposed as a replacement for the less sophisticated
SNMP, but it has taken root only in the telecom space. CMIP provides improved security and better
reporting of unusual network conditions.