System information

when the interface is supposed to be up, raise an event. Otherwise store ifOperStatus, ifInOctets and
ifOutOctets in an RRD, adding ifOperStatus to the total interface availability of the device.
After the interfaces are complete, calculate the response time for the device with another ping and store
some health metrics in another RRD. We store the reachability of the device, the interface availability of
the device and the response time, and create a health metric from a simple algorithm which weights the
various collections to indicate the overall health of that device. More on this in the Health section.
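The interface pass described above can be sketched roughly as follows. This is an illustrative sketch
only, not the NMIS code: the Interface class, the admin_up flag (standing in for "supposed to be up"),
the "Interface Down" event text and the choice to report 100% when no interfaces are expected up are
all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    name: str
    admin_up: bool  # assumption: True when the interface is supposed to be up
    oper_up: bool   # True when ifOperStatus reports up


def poll_interfaces(interfaces):
    """Return (availability_percent, events) for one device."""
    events = []
    expected = 0  # interfaces that are supposed to be up
    up = 0        # of those, how many are actually up
    for iface in interfaces:
        if not iface.admin_up:
            continue  # assumption: unused interfaces are not counted
        expected += 1
        if iface.oper_up:
            up += 1
        else:
            # Supposed to be up but is down: raise an event (hypothetical text).
            events.append(f"Interface Down: {iface.name}")
    # Assumption: a device with no expected interfaces counts as fully available.
    availability = 100.0 * up / expected if expected else 100.0
    return availability, events
```

For example, a device with two interfaces that are supposed to be up, one of which is down, would
report 50% interface availability and one event.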
Roles and Groups
Nodes can be put into two types of groups. The first group is a role, which is one of core, distribution
or access. The second group is used to group devices together for reports and general information. It
is logical that the second group be something like the building name or city/suburb of the device, as this
helps identify problem areas.
Roles play an important part in NMIS: they allow things to be weighted for events and various other
functions. The concept of weighting according to role is simple: if it is a core device then it is important
and should be treated as such; if it is an access device then it is less important. The idea is to
remove the noise: when every event comes in at critical, it is hard to see which ones really are.
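One way to picture role weighting is as a step down in severity for less important roles. The sketch
below is only an illustration of the idea: the severity names, the per-role offsets and the
weighted_severity function are assumptions for the example and are not taken from NMIS.

```python
# Assumed severity scale, lowest to highest.
SEVERITIES = ["Normal", "Warning", "Minor", "Major", "Critical"]

# Assumed offsets: core events keep their level, others are stepped down.
ROLE_OFFSET = {"core": 0, "distribution": -1, "access": -2}


def weighted_severity(base, role):
    """Adjust a base severity according to the role of the device."""
    index = SEVERITIES.index(base) + ROLE_OFFSET[role]
    # Clamp so the result always stays on the scale.
    index = max(0, min(index, len(SEVERITIES) - 1))
    return SEVERITIES[index]
```

With these assumed offsets, an event that would be Critical on a core device becomes Major on a
distribution device and Minor on an access device, so only core problems arrive at the top level.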
Health
The following statistics are considered part of the health of the device:
Reachability - is it up or not;
Availability - interface availability of all interfaces which are supposed to be up;
Response Time;
CPU;
Memory;
All of these metrics are weighted and a health metric is created. This metric, when compared over time,
should always indicate the relative health of the device. Interfaces which aren't being used should be
shut down so that the health metric remains realistic. The exact calculations can be seen in the
runReachability subroutine.
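As a rough illustration of a weighted health metric, the sketch below combines the five statistics
into a single 0-100 figure. It is not the calculation from runReachability: the weights, the 1000 ms
worst-case response time and the function names are all assumptions chosen for the example.

```python
# Assumed weights, in percent; they must add up to 100.
WEIGHTS = {
    "reachability": 30,
    "availability": 20,
    "response": 20,
    "cpu": 15,
    "memory": 15,
}


def response_score(response_ms, worst_ms=1000.0):
    """Map a response time to 0-100, where 0 ms scores 100 (assumed scale)."""
    return max(0.0, 100.0 * (1.0 - response_ms / worst_ms))


def health(reachability, availability, response_ms, cpu_util, mem_util):
    """Combine the statistics into one health figure between 0 and 100.

    reachability and availability are percentages (higher is better);
    cpu_util and mem_util are utilisation percentages (lower is better).
    """
    scores = {
        "reachability": reachability,
        "availability": availability,
        "response": response_score(response_ms),
        "cpu": 100.0 - cpu_util,
        "memory": 100.0 - mem_util,
    }
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100.0
```

Under these assumed weights, a reachable device with 50% interface availability, a 500 ms response
time, 20% CPU and 40% memory utilisation would score 71.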
Events
Escalation
Events based on device role
Stateful
Thresholds
The thresholds routine can be run as often as you like. It processes the collected statistics in the
RRDs, compares the numbers to the stored thresholds and, if a threshold is exceeded, raises an event for
that device. The thresholds use the device role to weight the events.
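The threshold check can be sketched as below. This is a minimal illustration, not the NMIS routine:
the threshold values, the severity names per role and the check_thresholds function are assumptions,
and a plain dictionary stands in for the statistics that would be read from the RRDs.

```python
# Assumed thresholds per metric: (warning level, critical level), in percent.
THRESHOLDS = {"cpu": (70.0, 90.0), "memory": (80.0, 95.0)}

# Assumed mapping from threshold level to event severity for each role.
ROLE_SEVERITY = {
    "core": {"warning": "Major", "critical": "Critical"},
    "distribution": {"warning": "Minor", "critical": "Major"},
    "access": {"warning": "Warning", "critical": "Minor"},
}


def check_thresholds(node, role, stats):
    """Compare collected statistics to thresholds and return raised events."""
    events = []
    for metric, value in stats.items():
        if metric not in THRESHOLDS:
            continue  # no threshold stored for this metric
        warning, critical = THRESHOLDS[metric]
        if value >= critical:
            level = "critical"
        elif value >= warning:
            level = "warning"
        else:
            continue  # within limits, nothing to raise
        events.append({
            "node": node,
            "event": f"Threshold {metric}",
            "severity": ROLE_SEVERITY[role][level],
            "value": value,
        })
    return events
```

In this sketch the same 95% CPU reading raises a Critical event on a core device but only a Minor
event on an access device.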
Updates
The update routine ensures that all the cached system and interface information is kept up to date. If
the network is constantly changing then it should be run frequently; otherwise it can be run less often.
NMIS - Network Management Information System http://www.sins.com.au/nmis/