System information

when the interface is supposed to be up, raise an event. Otherwise store ifOperStatus, ifInOctets and ifOutOctets in an RRD, adding ifOperStatus to the total interface availability of the device.

After the interfaces are complete, calculate the response time for the device with another ping and store some health metrics in another RRD: the reachability of the device, the interface availability of the device and the response time. A health metric is also created by a simple algorithm which weights the various collections into a single number indicating the overall health of that device; more on this in the Health section.
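The interface pass described above can be sketched roughly as follows. This is an illustrative Python sketch, not NMIS code: the Interface record, the expected_up flag, poll_interfaces and the raise_event and store callbacks are all hypothetical names. It assumes that an event is raised when an interface that is supposed to be up reports an ifOperStatus other than "up", and that availability is the percentage of such interfaces that are up.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    name: str
    expected_up: bool    # hypothetical flag: the interface is supposed to be up
    if_oper_status: str  # "up" or "down", as reported by ifOperStatus
    if_in_octets: int
    if_out_octets: int


def poll_interfaces(interfaces, raise_event, store):
    """Process one device's interfaces and return availability in percent."""
    expected = 0
    up = 0
    for intf in interfaces:
        if not intf.expected_up:
            continue  # unused interfaces do not count toward availability
        expected += 1
        if intf.if_oper_status != "up":
            # Supposed to be up but is not: raise an event.
            raise_event(intf.name, "Interface Down")
        else:
            up += 1
            # Stand-in for writing the sample to the interface RRD.
            store(intf.name, intf.if_oper_status,
                  intf.if_in_octets, intf.if_out_octets)
    # Assumption: a device with no monitored interfaces counts as fully available.
    return 100.0 * up / expected if expected else 100.0
```

The returned percentage is the kind of value that would then be stored alongside reachability and response time in the device's health RRD.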

Roles and Groups

Nodes can be put into two types of groups. The first group is a role, which is core, distribution or access. The second group is used to group devices together for reports and general information. It is logical that the second group be something like the building name or city/suburb of the device, as this helps identify problem areas.

Roles play an important part in NMIS: they allow things to be weighted for events and various other functions. The concept of weighting according to role is simple. If it is a core device then it is important and should be treated as such; if it is an access device then it is less important. The idea is to remove the noise, i.e. instead of all events coming in at critical, to show which ones really are critical.
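As a rough illustration of weighting by role, the sketch below lowers the level of an event according to the role of the device that raised it. The severity names, the per-role adjustments and the function name weighted_severity are assumptions made for this example; they are not taken from NMIS.

```python
# Hypothetical severity scale, lowest to highest.
SEVERITIES = ["Normal", "Warning", "Minor", "Major", "Critical"]

# Hypothetical adjustment per role: core keeps the full level,
# distribution drops one step, access drops two.
ROLE_ADJUST = {"core": 0, "distribution": -1, "access": -2}


def weighted_severity(base_level, role):
    """Return the event level after weighting it by device role."""
    if base_level == "Normal":
        return base_level  # nothing to weight when there is no problem
    idx = SEVERITIES.index(base_level) + ROLE_ADJUST[role]
    # A real event never drops below Warning and never exceeds Critical.
    idx = max(1, min(idx, len(SEVERITIES) - 1))
    return SEVERITIES[idx]
```

With this scheme the same fault stays at its full level on a core device and is reported at a lower level on an access device, which is one way of keeping the critical list short.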

Health

The following statistics are considered part of the health of the device:

Reachability - is it up or not;

Availability - interface availability of all interfaces which are supposed to be up;

Response Time;

CPU;

Memory;

All of these metrics are weighted and a health metric is created. This metric, when compared over time, should always indicate the relative health of the device. Interfaces which aren't being used should be shut down so that the health metric remains realistic. The exact calculations can be seen in the runReachability subroutine.
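A weighted health metric of this kind could look something like the sketch below. The weights, the scaling of response time, and the function name health are all assumptions chosen for illustration; the actual weights and formula are those in the runReachability subroutine and may differ.

```python
# Hypothetical weights in percent (they sum to 100).
WEIGHTS = {
    "reachability": 30,
    "availability": 20,
    "response": 20,
    "cpu": 15,
    "memory": 15,
}


def health(reachability, availability, response_ms, cpu_pct, mem_pct):
    """Combine the five statistics into one 0-100 health figure."""
    scores = {
        "reachability": reachability,  # percent, higher is better
        "availability": availability,  # percent, higher is better
        # Assumed scaling: 0 ms scores 100, 1000 ms or more scores 0.
        "response": max(0.0, 100.0 - response_ms / 10.0),
        "cpu": 100.0 - cpu_pct,        # utilisation, so lower is better
        "memory": 100.0 - mem_pct,     # utilisation, so lower is better
    }
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS) / 100.0
```

Because availability feeds into the total, an unused interface that is left enabled but down pulls the figure down without any real fault, which is why such interfaces should be shut down.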

Events

Escalation

Events based on device role

Stateful

Thresholds

The thresholds routine can be run whenever you like. It processes the collected statistics in the RRDs, compares the numbers to stored thresholds and, if a threshold is exceeded, raises an event for that device. The thresholds use the device role to weight the events.
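The threshold check can be sketched as below. Everything named here is hypothetical: the THRESHOLDS table, the level names, the ROLE_CAP mapping and check_thresholds are illustrative only, and a plain dictionary stands in for the statistics that would be read from the RRDs. Role weighting is modelled, as an assumption, by capping the event level according to the device role.

```python
# Hypothetical event levels, lowest to highest.
LEVELS = ["Warning", "Minor", "Major", "Critical"]

# Hypothetical stored thresholds, highest limit first (values in percent).
THRESHOLDS = {
    "cpu": [(90, "Critical"), (70, "Warning")],
    "mem": [(95, "Critical"), (80, "Warning")],
}

# Assumed role weighting: the highest level a device of each role may raise.
ROLE_CAP = {"core": "Critical", "distribution": "Major", "access": "Minor"}


def check_thresholds(device, role, stats, raise_event):
    """Compare collected statistics to thresholds and raise weighted events."""
    for name, value in stats.items():
        for limit, level in THRESHOLDS.get(name, []):
            if value > limit:
                # Take the lower of the threshold level and the role's cap.
                capped = min(level, ROLE_CAP[role], key=LEVELS.index)
                raise_event(device, name, value, capped)
                break  # report only the highest threshold exceeded
```

Called with the same statistics, a core router and an access switch would raise events at different levels, so the routine can be run as often as needed without flooding the critical list.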

Updates

Updates ensure that all the cached system and interface information is kept up to date. If the network is constantly changing then they should be run frequently; otherwise they can be run less often.

NMIS - Network Management Information System http://www.sins.com.au/nmis/
