Using the Event Monitoring Service
Manufacturing Part Number: B7612-90015
November 1999
© Copyright 1999 Hewlett-Packard Company
Legal Notices
The information contained in this document is subject to change without notice. Hewlett-Packard makes no warranty of any kind with regard to this manual, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. Hewlett-Packard shall not be liable for errors contained herein or direct, indirect, special, incidental or consequential damages in connection with the furnishing, performance, or use of this material.
Contents
1. Understanding the Event Monitoring Service
   Event Monitoring Service Overview
   EMS Requirements
   EMS Resource Classes
   Client and Target Applications
      EMS with ServiceGuard
   Selecting Protocols for Sending Events
      opcmsg (ITO) Option
      TCP and UDP Options
      SNMP Traps Option
      Email Option
      Console Option
8. Monitoring System Resources
   System Monitor Reference
      Number of Users
      Job Queues
      Filesystem Available Space
   Creating System Resource Monitoring Requests
Printing History
Table 1
Printing Date     Part Number    Edition
March 1999        B7612-90009    Edition 1
November 1999     B7612-90015    Edition 2
This edition documents material related to using the Event Monitoring Service to create monitoring requests for system resources. The printing date changes when a new edition is printed. (Minor corrections and updates which are incorporated at reprint do not cause the date to change.) The part number is revised when extensive technical changes are incorporated.
Preface
This guide describes how to use the Event Monitoring Service (EMS) and how to configure Management Information Base (MIB) monitors. The MIB monitors check and report status on cluster, network, and system resources.
• Peter Weygant, Clusters for High Availability: A Primer of HP-UX Solutions (ISBN 0-13-494758-4). HP Press; Prentice Hall, Inc., 1996
• Tom Madell, Disk and File Management Tasks on HP-UX (ISBN 0-13-518861-X). HP Press; Prentice Hall, Inc., 1997
• HP OpenView IT/Operations Administrator’s Reference (HP Part Number B6941-90001)
• Managing Highly Available NFS (HP Part Number B5125-90001)
• http://docs.hp.
1 Understanding the Event Monitoring Service
The Event Monitoring Service (EMS) is a framework for resource monitoring. Use EMS to configure monitoring requests, check resource status, and send notification when configured conditions are met.
Event Monitoring Service Overview
The Event Monitoring Service (EMS) monitors system resources. Use EMS to configure monitoring requests, check resource status, and send notification when configured conditions are met. EMS can work in a high availability environment. It can report a loss of redundant resources.
This option does not require any extra handling. Specify the email address when the monitoring request is created.
— syslog and textlog
This option does not require any extra handling. Specify the log file when the monitoring request is created. Syslog notifications go to the local system.
— console
This option does not require any extra handling. Specify the console when the monitoring request is created.
Figure 1-1 Event Monitoring Service Components
The process is as follows:
1. The system administrator enters the client application, for example, the EMS GUI or High Availability Clusters area of SAM, to begin the discovery phase of creating a monitoring request. The discovery phase includes identifying the resources to be monitored and configuring the request.
The resources listed in the dictionary are passed back to the client.
4. When a discovery request is made that exceeds the scope of the information in the dictionary, the registrar launches the appropriate resource monitor application, if it is not already running, and passes the request on to the monitor. Multiple registrars may access the same monitor.
5. The EMS API provides the interface between the registrar and the monitor.
request.
12. The EMS API interprets the information received from the monitor, determines if an event occurred, and forwards the notification to the target applications. The method of informing the target application of a critical resource value can vary for different target applications. In the case of ServiceGuard, the client application and the target application are the same and reside on the same system.
EMS Requirements
The following are system requirements for the Event Monitoring Service:
• All hardware you intend to monitor, such as disks and LAN cards, must be configured and tested before you configure EMS.
• EMS must be installed on an HP 9000 Series 700 or Series 800 system running HP-UX version 10.20 or later. When installing one or more EMS components, check that the version levels for the other components are compatible.
EMS Resource Classes
EMS groups resources into classes in a hierarchy similar to that of a filesystem structure. Figure 1-2 is an example of a resource hierarchy.
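Because the hierarchy resembles a filesystem, a full resource path can be handled much like a file path. The following is a minimal Python sketch, not part of EMS, that splits a path into its class path and instance name and lists the enclosing classes. The function names are hypothetical, and treating the last path component as the instance is an assumption made for illustration. The sample path is the volume group example used elsewhere in this chapter.

```python
from typing import NamedTuple


class ResourcePath(NamedTuple):
    """A full resource path split into its class path and instance name."""
    resource_class: str
    instance: str


def split_resource_path(path: str) -> ResourcePath:
    """Split a full resource path, assuming the last component is the instance."""
    if not path.startswith("/"):
        raise ValueError("resource paths are expected to start with '/'")
    parts = [part for part in path.split("/") if part]
    if len(parts) < 2:
        raise ValueError("expected at least one class and one instance")
    return ResourcePath("/" + "/".join(parts[:-1]), parts[-1])


def enclosing_classes(resource_class: str) -> list[str]:
    """Return the class path and every class above it, top level first."""
    parts = [part for part in resource_class.split("/") if part]
    return ["/" + "/".join(parts[: i + 1]) for i in range(len(parts))]


if __name__ == "__main__":
    split = split_resource_path("/vg/vgDataBase/pv_pvlink/status/c0t1d2")
    print(split.resource_class, split.instance)
    print(enclosing_classes(split.resource_class))
```

Keeping the class path and the instance separate mirrors the way the EMS screens present resource classes and resource instances in separate fields.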
Client and Target Applications
This section describes some of the client and target application options and processes. Target applications can be written using the EMS API.
EMS with ServiceGuard
ServiceGuard can be configured with EMS to monitor the health of selected resources, such as disks. Based on the status of the resources, ServiceGuard can decide to fail packages over.
instance. An example of a full resource path for the physical volume status of the device /dev/dsk/c0t1d2 belonging to volume group vgDataBase is /vg/vgDataBase/pv_pvlink/status/c0t1d2.
2. Specify when to collect the value. Select one or more of the following:
• When value is ... If you are setting up a request for an asynchronous monitor, this is the only option available.
• console
• syslog
• textlog
EMS and Target Applications
Target applications receive notification messages about the monitored resources. To help configure your Network Node Manager and IT/Operations or other system management software for EMS, refer to the Writing Monitors for the Event Monitoring Service (EMS) (HP Part Number B7611-90016) developer’s kit web page:
1. Go to the web site: http://software.hp.com.
2.
Resource Monitors
Resource monitors are applications written to gather and report information about specific resources on the system. The resource monitor:
• Provides a list of resources that can be monitored
• Provides information about the resources
• Monitors the resources it supports
• Provides values to the EMS API notification
The EMS framework evaluates the data to determine if an event has occurred.
Monitoring Service (EMS) (HP Part Number B7611-90016) manual and install the developer’s kit. Both are available at the following web site:
1. Go to the web site: http://www.software.hp.com.
2.
EMS Framework Components
This section describes the EMS framework components.
The EMS API
The EMS API is the interface between the registrar, client applications, target applications, and resource monitors as illustrated in Figure 1-1. The EMS API is provided as part of the EMS product.
monitors and to provide communication between clients and monitors. One registrar process is started each time a client application calls rm_client_connect(), so a registrar is always connected to one client. Depending on the requests sent by the client, the registrar may be connected to 0, 1, 2, or more resource monitors concurrently.
The Resource Dictionary
The resource dictionary is the mechanism by which the resource monitor identifies itself to EMS. The purpose of the resource dictionary is to give a preliminary picture of the resource structure on a given system. Its main function is to indicate to the registrar which resource monitors should be contacted when information is needed about a certain resource.
2 Selecting Resources to Monitor
This chapter describes the following:
• Starting the Event Monitoring Service
• Selecting Resources
• Viewing Resource Descriptions
Starting the Event Monitoring Service
To start EMS:
1. Log on as root to the system with EMS and start the graphical version of SAM. From your command line, type:
sam
2. Double-click the Resource Management icon.
3. Double-click on the Event Monitoring Service icon. The main screen with the Actions menu open, shown in Figure 2-1, shows all requests configured on that system.
Selecting Resources
Resources are divided into classes. To select a resource to monitor:
1. From the Event Monitoring Service main screen, click on the Actions menu. Refer to the section “Starting the Event Monitoring Service” for instructions on starting EMS.
2. Select Add Monitoring Request. The top-level resource classes for all installed monitors are dynamically discovered and then listed as shown in Figure 2-2.
The file name corresponds to its monitor. The file extension is .dict. For example, the MIB Monitor dictionary filename is mibmond.dict.
• Review the man page. The man page name can be found in the dictionary file with the monitor’s name. If a man page was created, it is listed in the MONITOR entry section of the dictionary file.
• View the resource class or instance description through EMS.
Figure 2-2 EMS Monitoring Request Parameters Screen
3. Double-click on a resource class. When you monitor a resource, you actually monitor one or more specific instances of its resource class. View the resource instances associated with the selected resource class in the Resource Instance field. See Figure 2-3. If the resource class has subclasses, those subclasses are listed in the Resource Classes field.
n/a.
Figure 2-3 Add or Copy Monitoring Request Screen
4. Select a specific instance or the wildcard (All Instances). The (*) wildcard is a convenient way to create many requests at once. Most systems have more than one disk or network card, and many have several disks. To avoid having to create a monitor request for each disk, select *(All Instances) in the Resource Instance box. The *(All Instances) listing is always the first item on the list.
the same resource type and there are multiple instances. Selecting the wildcard applies the monitor to all the instances of that resource type. Wildcards are not available for resource classes. For example, a wildcard is available for the status instances in the subclass, /system/filesystem/availMb. A wildcard is not available for the entire volume group resource class, /vg.
5. Click OK. You see the Monitoring Request Parameters screen.
Viewing Resource Descriptions
Resource class and resource instance descriptions are available for each resource. To see a resource class description, click the Show Class Description button from the Add or Copy Monitoring Request screen.
3 Defining a Monitoring Request
This chapter describes the following:
• Starting a Monitoring Request
• Specifying When to Send Event Notifications
• Setting the Polling Interval
• Setting Event Value Options
• Selecting Protocols for Sending Events
• Adding a Notification Comment
Starting a Monitoring Request
After you have selected a resource to monitor, use the Monitoring Request Parameters screen to specify when and how to send event notification (Figure 3-1). The following sections describe the monitoring parameters and provide examples of common applications.
Specifying When to Send Event Notifications
When you create a request, you specify the conditions under which you want to collect resource status values. While the monitor may be polling disks every five minutes, for example, you may only want to be alerted when something happens that requires your attention. Specify these conditions in the Notify area of the Monitoring Request Parameters screen.
To set an event trigger:
• Select from the listed options in the Notify area (When value is..., When value changes, or At each interval).
Asynchronous monitors are event-driven, rather than polled. They generate messages as events occur. Therefore, if the request is for an asynchronous monitor, only the When value is... option is available.
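The three Notify options can be read as three different tests applied to each collected value. The sketch below is an illustration in Python, not EMS code. The mode strings, the function name, and the set of comparison operators are assumptions chosen for the example, as is the choice to send nothing under "when value changes" until a previous value exists.

```python
import operator

# Hypothetical comparison operators for the "When value is..." option.
COMPARISONS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def should_notify(mode, current, previous=None, comparison=None, threshold=None):
    """Decide whether a collected value would trigger a notification."""
    if mode == "at_each_interval":
        # Every collected value is reported.
        return True
    if mode == "when_value_changes":
        # Assumption: the first value has nothing to compare against.
        return previous is not None and current != previous
    if mode == "when_value_is":
        if comparison not in COMPARISONS or threshold is None:
            raise ValueError("a comparison operator and a threshold are required")
        return COMPARISONS[comparison](current, threshold)
    raise ValueError(f"unknown notify mode: {mode}")


if __name__ == "__main__":
    # A status of 2 compared against an expected value of 1.
    print(should_notify("when_value_is", 2, comparison="!=", threshold=1))
```

Only the "when value is" branch needs no history or timer, which is consistent with it being the one option offered for event-driven (asynchronous) monitors.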
Setting the Polling Interval
The polling interval specifies how often the resource monitor checks the resource value. The polling interval is the maximum amount of elapsed time before a monitor knows about a change in status for a particular resource. The shorter the polling interval, the more likely you are to have recent data. However, depending on the monitor, a short polling interval may use too much CPU and system resources.
Setting Event Value Options
If you select the When value is... option from the list in the Notify area, the Options area displays three choices. Select one or more of these three options:
Initial
Use this option to establish a baseline when monitoring resources such as available filesystem space or system load. It can also be used to test whether newly requested events are being sent.
Repeat
Use this option for urgent alerts.
Selecting Protocols for Sending Events
In the Notify via area, specify the protocol you want the monitor to use to send events. The options are described in the following sections.
opcmsg (ITO) Option
This option sends messages to ITO applications via the opcmsg daemon. For this option to display, IT Operation Managed Node Software 3.x or 4.x must be installed on the resource server running HP-UX version 10.20.
If opcmsg is selected, EMS sets the following fields:
• ITO application group: EMS (HP)
• message group: HA
• object: to the full path of the resource being monitored
See HP OpenView IT/Operations Administrators Task Guide (Part Number B4249-90003) for more information. Templates for configuring IT/O and Network Node Manager to display monitored events can be found on the Hewlett-Packard web page at http://www.software.hp.com.
SNMP Traps Option
This sends messages to applications, such as Network Node Manager, that use SNMP traps. See HP OpenView Using Network Node Manager (P/N J1169-90002) for more information on configuring SNMP traps. Table 3-1 lists traps used by EMS:
Table 3-1 SNMP Traps
Trap Name             Trap Value                       Description
EMS_ENTERPRISE_OID    "1.3.6.1.4.1.11.2.3.1.7"
EMS_NORMAL_OID        "1.3.6.1.4.1.11.2.3.1.7.0.1"     Normal Event
EMS_ABNORMAL_OID      "1.3.6.1.4.1.11.2.3.
EMS_MAJOR_SEV_OID     "1.3.6.1.4.1.11.2.3.1.7.0.8"     Problem Event w/Major Severity
EMS_CRITICAL_SEV_OID  "1.3.6.1.4.1.11.2.3.1.7.0.
1. Specify the notification type from the list in the Notify area.
2. Select the SNMP trap option from the list in the Notify via area.
3. Select the severity from the list in the Severity area:
• Map from value
• Critical
• Major
• Minor
• Warning
• Normal
Email Option
This sends event notification to the email address indicated for that request. To set up an email notification:
1.
disk is up and running correctly. The condition becomes TRUE, meaning action needs to be taken, when the disk is down or not operating correctly.
• When value changes
If this notification option is set, a non-normal severity occurs when the current value does not match the previous value.
For an abnormal event, a system logging level of error will be associated with the logged message.
Adding a Notification Comment
The notification comment is useful for sending task reminders to the recipients of an event. For example, you can configure a disk monitor request that reports an alert when an entire mirror has failed. When that event shows up in IT/Operations, you may want a notification comment to include the name of the person to contact.
4 Changing Monitoring Requests
This chapter describes the following:
• Copying Monitoring Requests
• Modifying Monitoring Requests
• Removing Monitoring Requests
• Viewing Monitoring Requests
Copying Monitoring Requests
There are two ways to use the copy function:
• To create requests for multiple resources using the same monitoring parameters.
• To create requests for the same resource using different monitoring parameters.
To create requests for multiple resources using the same monitoring parameters:
1. From the Event Monitoring Service main screen, select the monitoring request whose parameters you wish to copy.
4. Modify the parameters as desired in the Monitoring Request Parameters screen.
5. Click OK. You see a message that indicates the new request has been added. You see the Event Monitoring Service main screen.
Modifying Monitoring Requests
To change the monitoring parameters of a request:
1. From the Event Monitoring Service main screen, select the monitoring request you want to modify and either:
• Double-click the request, or
• Select Actions menu: Modify
You see the Monitoring Request Parameters screen.
2. Modify the parameters as desired by editing the fields in the Monitoring Request Parameters screen.
3. Click OK.
Removing Monitoring Requests
You can remove one or more requests using the Remove Monitoring Requests option. To remove monitoring requests:
1. From the Event Monitoring Service main screen, select the monitoring request you wish to remove. To select contiguous multiple requests, hold the Shift key and click. To select individual multiple requests, hold the Ctrl key and click.
2. Select Actions menu: Remove option. You see a Confirmation screen.
3.
Viewing Monitoring Requests
To view the parameters for a monitoring request:
1. From the Event Monitoring Service main screen, select the monitoring request you wish to view.
2. Select Actions menu: View
You see the View Monitoring Request Parameters screen with the parameters specified for the monitoring request.
3. To modify the parameters of this request, click the Modify Monitoring Request option. You see the Monitoring Request Parameters screen.
5 Monitoring ServiceGuard Package Dependencies
This chapter describes how to use SAM to define package dependencies on EMS resources. ServiceGuard by itself automatically monitors specific resources. Using ServiceGuard with EMS adds to the list of resources that can be monitored. These resources need to be configured and identified to ServiceGuard as package resource dependencies.
A package can depend on any resource whose monitor is registered with EMS. To create package dependencies:
1. Halt the cluster. Include a force option to stop all packages, by typing:
cmhaltcl -f
To add an EMS resource, the ServiceGuard cluster must be down. You can modify existing EMS resources through ServiceGuard while the cluster is running.
2. From your command line, start SAM, by typing:
sam
3. Double-click the Clusters icon.
4.
Figure 5-1 High Availability Clusters Screen
6. From the Actions menu, select either the Create/Add a Package or Modify Package Configuration option. If you select Create/Add a Package, a screen similar to Figure 5-2 displays.
7. If you have not previously done so, click Specify Package Name and Node and Specify Package SUBNET Address. Then click on Specify Package Resource Dependencies... to add EMS resources as package dependencies. A screen similar to Figure 5-3 displays.
8. The Resources: field lists all the installed resources discovered by ServiceGuard. To make a package dependent on an EMS resource, select it from the list, then click Add Resource.... An Add Resources screen similar to Figure 5-4 displays.
Figure 5-4 Add Resources Screen
9. Click through the Resource Classes and Resource Names to select the entity you wish to monitor. Click OK. A Resource Parameters screen similar to Figure 5-5 displays.
Figure 5-5 Resource Parameters Screen
NOTE Make sure you always select UP in the Resources Up Values field. ServiceGuard creates an EMS request that sends an event if the Resources Up Value field is not equal to the UP value. If you select only UP, the package fails over if the value is anything but UP.
You can also add resources as package dependencies by modifying the package configuration file. The default filename is /etc/cmcluster/pkg_name.ascii. See Managing MC/ServiceGuard for details on how to modify this file.
6 Monitoring Cluster Resources
The HA Cluster Monitor sends events regarding the status of a cluster. If you have OpenView, we recommend using HP ClusterView to monitor cluster status and receive cluster events. HA Cluster Monitor is primarily for use with non-OpenView systems, for example, CA UniCenter.
Cluster Monitor Reference
The HA Cluster Monitor is useful in environments not running HP OpenView ClusterView. The HA Cluster Monitor reports information on the status of the cluster to which the local node belongs. The resources monitored are:
• /cluster/status/clustername, a summary of the state of all nodes in the cluster clustername
• /cluster/localNode/status/clustername, the status of a given node in a cluster.
Figure 6-1 Cluster Monitor Resource Class Hierarchy
Items in boxes are resource instances that can be monitored. Variables in italics change depending on the names of the clusters and packages on the system.
Cluster Status
The cluster status is the status of the MC/ServiceGuard cluster to which this node belongs. The status is from the perspective of the node for which the request was created.
Table 6-2 Interpreting Cluster Status
Resource Name: /cluster/status/clusterName
Condition   Value   Interpretation
DOWN        3       The node cannot access the cluster.
You might request to be notified when the cluster is not up. You could then verify whether the cluster was shut down intentionally. The minimum polling interval for cluster status is 30 seconds. You may want a longer interval, especially if system performance is affected.
Node Status
The node status is the current status of a node relative to a particular cluster. The hp-mcCluster MIB variable, hpmcNodeStatus, provides the node status information to the monitor. The cmviewcl -v command displays detailed information about the current status of the cluster and packages on the cluster.
Package Status
The package status is the status of each package running on this node. The hp-MCCluster MIB variable, hpmcSGPkgStatus, provides the package status information to the monitor. The cmviewcl -v command displays detailed information about the current status of the cluster and packages on the cluster.
Service Status
A service is part of a package. The service status is the status of each service running on this node. The hp-MCCluster MIB variable, hpmcSGPkgSvcStatus, provides the service status information to the monitor. The cmviewcl -v command displays detailed information about the current status of the cluster and services on the cluster.
Creating Cluster Monitoring Requests
For most ServiceGuard or cluster configurations, we suggest creating the following requests on each node in a cluster:
Table 6-6 Recommended Cluster Requests
                                         Monitoring Parameters
Resources to monitor                     Notify          Value               Option
/cluster/status/clusterName              when value is   not equal UP        INITIAL
/cluster/localNode/status/clusterName    when value is   not equal RUNNING   INITIAL
/cluster/package/status/pac
7 Monitoring Network Interfaces
The HA Network Interface Monitor detects whether your LAN interface is up or down. It allows you to send events to a system management interface as an alternative to looking in syslog for LAN status. The HA Network Interface Monitor, lanmond, is part of the MIB Monitors package.
Network Monitor Reference
The HA Network Interface Monitor provides status on the LAN interfaces in a given node. It monitors all the interfaces that you see when you run the lanscan command on a system. The HA Network Interface Monitor is part of the MIB Monitors package. Table 7-1 lists the HA Network Interface Monitor.
(1M) or lanadmin(1M) commands.
Table 7-2 Interpreting LAN Interface Status
Resource Name: /net/interfaces/lan/status/LANname
Condition   Value   Interpretation
UP          1       The LAN interface is sending and receiving packets.
DOWN        2       The LAN interface is not passing operational packets.
The EMS product depends on TCP/IP (or UDP) to send events to targets such as HP OpenView IT/Operations or MC/ServiceGuard.
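A target application that receives LAN status events has to turn the numeric value back into a condition. The following Python sketch encodes only the two values listed in Table 7-2. The class and function names are hypothetical, and the handling of any other value is an assumption.

```python
from enum import IntEnum


class LanStatus(IntEnum):
    """Numeric LAN interface status values from Table 7-2."""
    UP = 1
    DOWN = 2


INTERPRETATION = {
    LanStatus.UP: "The LAN interface is sending and receiving packets.",
    LanStatus.DOWN: "The LAN interface is not passing operational packets.",
}


def describe_lan_status(value: int) -> str:
    """Return 'CONDITION: interpretation' for a status value."""
    try:
        status = LanStatus(value)
    except ValueError:
        # Assumption: values outside the table are reported, not rejected.
        return f"UNKNOWN: status value {value} is not listed in Table 7-2"
    return f"{status.name}: {INTERPRETATION[status]}"


if __name__ == "__main__":
    for value in (1, 2):
        print(describe_lan_status(value))
```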
Configuring Network Monitoring Requests
Table 7-3 recommends two monitoring requests for each node. With these requests, you would see events when a LAN card failed, and again when it came back up, and you would see an event every hour that the LAN card was down. You may elect to change the polling interval or not to configure a reminder at all.
8 Monitoring System Resources
The HA System Resource Monitor sends events about the number of users, available file system space, and job queues to help you load-balance and tune your system to keep it available. It is an alternative to reading syslog files to get this information. The HA System Resource Monitor, pkgmond, is part of the MIB Monitors package.
System Monitor Reference
The system monitor reports information on system resources:
• /system/numUsers tells you the number of users on a given node.
• /system/jobQueue1Min, /system/jobQueue5Min, and /system/jobQueue15Min tell you the number of processes waiting for CPU and performing disk I/O as an average over 1, 5, and 15 minutes respectively. This is the same as the load averages reported by uptime(1).
Figure 8-1 System Resource Monitor Class Hierarchy
Number of Users
The number of users tells you how many users are logged in to a given system. The MIB variables computerSystem, fileSystemBavail, and fileSystemBsize from the hp-unix MIB provide the resource value to the monitor. To verify the number of users on the system, use the uptime(1) command.
Job Queues
The job queue monitor checks the average number of processes that have been waiting for CPU and performing disk I/O over the last 1, 5, or 15 minutes. A value of 4 in /system/jobQueue5Min means that at the time of polling there was an average of 4 jobs in the queue over the last 5 minutes.
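Because the job queue values correspond to the 1, 5, and 15 minute load averages that uptime(1) reports, they can be cross-checked on a node without going through EMS. The Python sketch below pairs three load averages with the resource names from this chapter. The function names are hypothetical, and os.getloadavg() reads the local operating system's load averages rather than querying the monitor, so treat it as an approximation for comparison only.

```python
import os

# Resource names from this chapter, in 1, 5, and 15 minute order.
JOB_QUEUE_RESOURCES = (
    "/system/jobQueue1Min",
    "/system/jobQueue5Min",
    "/system/jobQueue15Min",
)


def job_queue_values(load_averages):
    """Pair three load averages with the job queue resource names."""
    if len(load_averages) != len(JOB_QUEUE_RESOURCES):
        raise ValueError("expected three load averages (1, 5, and 15 minutes)")
    return dict(zip(JOB_QUEUE_RESOURCES, load_averages))


def queues_over(values, limit):
    """Return the resource names whose value is greater than the limit."""
    return [name for name, value in values.items() if value > limit]


def local_job_queue_values():
    """Read the local load averages (Unix only) as job queue values."""
    return job_queue_values(os.getloadavg())


if __name__ == "__main__":
    # Sample values: the 5 minute average matches the example of 4 above.
    sample = job_queue_values((0.5, 4.0, 2.0))
    print(queues_over(sample, 3))
```

Calling local_job_queue_values() on a Unix system returns the same kind of dictionary built from live values.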
Filesystem Available Space
The filesystem monitor checks the number of megabytes available for use in each file system on the node. File systems must be mounted and active to be monitored. File systems mounted over the network, such as NFS file systems, are not monitored. The MIB variables fileSystemBavail and fileSystemBsize from the hp-unix MIB are used to calculate the number of available Kb in the file systems.
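The calculation from the two MIB variables can be sketched as a multiplication followed by a unit conversion. The Python example below assumes that fileSystemBavail is a count of free blocks and fileSystemBsize is the block size in bytes. That reading, the function names, and the use of 1024 * 1024 bytes per megabyte are assumptions for illustration, not statements about how the monitor is implemented. The second function applies the same arithmetic to os.statvfs() on the local system.

```python
import os

BYTES_PER_MB = 1024 * 1024


def available_mb(bavail_blocks: int, block_size_bytes: int) -> float:
    """Convert a free-block count and a block size into megabytes."""
    if bavail_blocks < 0 or block_size_bytes <= 0:
        raise ValueError("block count must be >= 0 and block size must be > 0")
    return bavail_blocks * block_size_bytes / BYTES_PER_MB


def local_available_mb(mount_point: str) -> float:
    """Apply the same arithmetic to a locally mounted file system (Unix only)."""
    stats = os.statvfs(mount_point)
    return available_mb(stats.f_bavail, stats.f_frsize)


if __name__ == "__main__":
    # 2048 free blocks of 1024 bytes each.
    print(available_mb(2048, 1024))
```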
Creating System Resource Monitoring Requests
Table 8-5 shows examples of how you might monitor system resources.
Table 8-5 Examples of System Resource Requests
To be alerted when...
for the longer job queues because the longer something is in the queue, the more likely it is that a node needs to be load-balanced.
A Dictionary File Command Line Options
This appendix lists the command line options available for the MIB Monitors. Typically, these options may be added to the dictionary file entry that describes how to launch a particular monitor.

MIB Monitor Command-Line Options

Perform additional MIB monitor configuration by using the following command-line options. The MIB monitors include the HA Cluster Monitor, HA Network Interface Monitor, and HA System Resource Monitor. Specify one or more of the options listed below in the MONITOR statement of the dictionary file for each monitor. For example, within the file: /etc/opt/resmon/dictionary/mibmond.dict.

B Troubleshooting

This section gives hints on testing your monitoring requests and provides information about log files and monitor behavior that will help you determine the cause of problems. For information on fixing problems detected by monitors, see the list of related publications in the Preface.

EMS Directories and Files

EMS files are located in /etc/opt/resmon and /opt/resmon. Table B-1 lists files and directories that might help you determine the cause of some problems:

Table B-1 EMS Directories and Files

/etc/opt/resmon/config
    This file determines how often EMS checks that monitors are running (have not died).

/etc/opt/resmon/dictionary
    This directory contains resource dictionaries for the various monitors.

/etc/opt/resmon/log
    This is a directory of log files used by EMS:
    • client.log stores calls made by clients, such as MC/ServiceGuard or the SAM interface to EMS.
    • api.log stores API calls made by monitors.
    • registrar.log contains errors found when reading the resource dictionary.
    • emsagent.log is the log for the SNMP subagent responsible for sending EMS events through an SNMP trap.

Logging and Tracing

Use logging for most troubleshooting activities. By default the monitors log to api.log. Logging to /var/adm/syslog/syslog.log is ON by default for the disk monitor and OFF by default for the remaining monitors. Use tracing only when instructed to do so by HP support personnel. Tracing is not available with all monitors.

EMS Logging

Log files in /etc/opt/resmon/log/ contain information logged by the monitors. Look at the client.

    mkdir /newpath/resmon
    mv /etc/opt/resmon/log /newpath/resmon    # create /newpath/resmon/log
                                              # remove /etc/opt/resmon/log
    ln -s /newpath/resmon/log /etc/opt/resmon/log

NOTE EMS requires that /etc/opt/resmon, the parent directory, reside on the root file system. Do not move all of /etc/opt/resmon to another file system.

High Availability Monitors

High availability monitors provide additional logging support.

NOTE Logging will occur at every polling interval.

Kill the monitor process. The monitor will automatically restart with tracing enabled. To speed up monitor restart, use the resls command with the top level of the resource class as an argument, for example, resls /system.

Tracing is customarily logged to /etc/opt/resmon/log/monitor_name.log. The monitor_name usually matches the name used for the monitor in the dictionary file. For example, the MIB monitor uses mibmond.dict and mibmond.log.

Performance Considerations

Monitoring your system is important to maintain high availability, but monitoring consumes system resources. You must carefully weigh your performance needs against your need to know as soon as possible when a failure threatens availability.

System Performance Issues

The primary performance impact is related to the polling interval and the number of resources being monitored.
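
To get a feel for how these two factors combine, the following back-of-the-envelope sketch estimates how many polls occur per hour. It assumes every resource is polled exactly once per interval and that all resources share the same interval. Both are simplifications, and the function name is hypothetical.

```python
def polls_per_hour(num_resources: int, polling_interval_s: int) -> float:
    """Estimate polls per hour, assuming one poll per resource per interval."""
    if polling_interval_s <= 0:
        raise ValueError("polling interval must be a positive number of seconds")
    return num_resources * 3600 / polling_interval_s


if __name__ == "__main__":
    # Same 10 resources at two different polling intervals.
    print(polls_per_hour(10, 60))
    print(polls_per_hour(10, 300))
```

Under these assumptions, lengthening the interval from 60 to 300 seconds cuts the polling work to one fifth, at the cost of learning about a status change up to 300 seconds late.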

Testing Monitor Requests

To test that events are being sent, use the INITIAL option available with conditional notification when creating a monitoring request. This option sends a notification on startup. Examine the notification to make sure your request is properly configured and showing up in the correct system management tool. An alternative is to use the “At each interval” notification to test that events are being sent to the correct system management tool.
human. Because the monitors are persistent, monitoring requests are kept when you install a new monitor or update an existing monitor. If a condition such as “status > 3” is being monitored for a resource that has a range of 1-7, and a new version of the monitor is installed that supports a new status value, such as “8”, you may start seeing notifications for “status=8”.
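
The effect can be illustrated with a small sketch. It is not EMS request syntax. The two functions are hypothetical and only show why an open-ended condition also matches a value introduced later, while a condition bounded by the originally known range does not.

```python
def open_ended(status: int) -> bool:
    """The condition 'status > 3', with no upper bound."""
    return status > 3


def bounded(status: int, known_max: int = 7) -> bool:
    """The same condition limited to the range known when the request was made."""
    return 3 < status <= known_max


if __name__ == "__main__":
    # 8 is the status value added by the newer monitor version in the example.
    for status in (3, 4, 7, 8):
        print(status, open_ended(status), bounded(status))
```

For the new value 8, open_ended returns true and bounded returns false, which is why existing requests are worth reviewing after a monitor update.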

MIB Monitor Troubleshooting

The MIB monitors that ship with EMS rely on various SNMP MIBs and need the HP-UX SNMP subagents to be configured correctly and running before they can reliably report on the status of their resources. Other monitors that may be added might also need special SNMP configurations. Review the following troubleshooting hints to help ensure that your environment is set up correctly:

• Refer to the standard /var/adm/syslog/syslog.log file.

NOTE On HP-UX version 10.20, if trapdestagt was running, it might need to be restarted manually with the command /usr/sbin/trapdestagt. NNM or OV depends on trapdestagt to set up SNMP trap notification on managed systems.

• If changes are needed to dictionary files, stop any MIB monitors that are already running. If changes are needed to snmpd.conf, stop and restart HP SNMP, following the procedure above.

Glossary

alert
An event. A message sent to tell a user or application that certain conditions have been met, such as an action or state you want to know about. For example, you may want to be alerted when a disk fails or when available filesystem space falls below a certain level.

asynchronous monitor
A monitor that monitors resource instances (or a resource class) asynchronously. It is event driven and sends notifications when events occur.

logical extent
The basic allocation unit for a logical volume is called a logical extent. For mirrored logical volumes, either two or three physical extents are mapped for each logical extent, depending on whether you are using 2-way or 3-way mirroring.

logical volume
The segments of spaces that can be separated physically on a disk or be on serial disks. Each collection appears to the operating system as a single disk.

physical extent
LVM divides each physical disk into addressed units called physical extents.

physical volume
A disk that has been initialized by LVM.

polling
The process by which a monitor obtains the most recent status of a resource. The method is defined by the monitor when it is created.

polling interval
Determines the maximum amount of elapsed time before the monitor knows about a change in resource status.

protocol
The method used to send notification messages.

resource dictionary
A set of files that provides the registrar with a hierarchy of resources on the local system and their respective resource monitors.

resource instance
The actual resources that can be monitored. For example, /net/interfaces/lan/status/lan0 may refer to a particular network interface installed on the monitored system. See resource class.

resource monitor
The process that is used to obtain the status of a resource and send event notifications if appropriate.