Using EMS HA Monitors


B5735-90001

August 1997

Summary of content (87 pages)

PAGE 2
Legal Notices
The information contained in this document is subject to change without notice. Hewlett-Packard makes no warranty of any kind with regard to this manual, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. Hewlett-Packard shall not be liable for errors contained herein or for direct, indirect, special, incidental or consequential damages in connection with the furnishing, performance, or use of this material.
PAGE 3
Contents
1. Installing and Using EMS
  What are EMS HA Monitors? 12
  The Role of EMS HA Monitors in a High Availability Environment 14
  Installing and Removing EMS HA Monitors 15
    Installing EMS HA Monitors 15
    Removing EMS HA Monitors
PAGE 4
  Rules for Using the EMS Disk Monitor with MC/ServiceGuard
    Rules for RAID Arrays
    Adding PVGs to Existing Volume Groups
    Creating Volume Groups on Disk Arrays Using PV Links
    Creating Logical Volumes
PAGE 5
5. Monitoring System Resources
  System Monitor Reference 68
    Number of Users 69
    Job Queues 70
    Filesystem Available Space 71
  Creating System Resource Monitoring Requests
PAGE 6
PAGE 7
Printing History
Table 1
Printing Date   Part Number   Edition
August 1997     B5735-90001   Edition 1
This edition documents material related to installing and configuring the Event Monitoring Service (EMS).
The printing date and part number indicate the current edition. The printing date changes when a new edition is printed. (Minor corrections and updates incorporated at reprint do not cause the date to change.) The part number changes when extensive technical changes are incorporated.
PAGE 8
PAGE 9
Preface
This guide describes how to install and configure the Event Monitoring Service to monitor system health, and how to use EMS in conjunction with availability software such as MC/ServiceGuard and IT/O:
• Chapter 1, “Installing and Using EMS”, presents the exact steps required to install and use the software on your system or cluster.
• Chapter 2, “Monitoring Disk Resources”, gives guidelines on using the disk monitor, including using it with MC/ServiceGuard.
Related Publications
PAGE 10
Problem Reporting
If you have any problems with the software or documentation, please contact your local Hewlett-Packard Sales Office or Customer Service Center.
PAGE 11
1 Installing and Using EMS
EMS HA Monitors (Event Monitoring Service High Availability Monitors) aid in providing high availability in an HP-UX environment by monitoring particular system resources and then informing target applications (e.g. MC/ServiceGuard) when those resources reach critical user-defined values.
PAGE 12
What are EMS HA Monitors?
EMS HA Monitors (Event Monitoring Service High Availability Monitors) are a set of monitors and a monitoring service that poll local system or application resources and send messages when events occur. An event is simply something you want to know about. For example, you may want to be alerted when a disk fails or when available filesystem space falls below a certain level.
PAGE 13
Monitors are applications written to gather and report information about specific resources on the system. They use system information stored in places like /etc/lvmtab and the MIB database. When you make a request to a monitor, it polls the system information and sends a message to the framework, which then interprets the data to determine whether an event has occurred and sends messages in the appropriate format.
PAGE 14
The Role of EMS HA Monitors in a High Availability Environment
The weakest link in a high availability system is the single point of failure. EMS HA Monitors can report information that helps you detect the loss of redundant resources, exposing single points of failure that threaten data and application availability.
PAGE 15
Installing and Removing EMS HA Monitors
NOTE To make best use of EMS HA Monitors, install and configure them on all systems in your environment. Because EMS monitors resources on the local system only, you need to install EMS on every system you want to monitor. EMS HA Monitors run on HP 9000 Series 800 systems running HP-UX version 10.20 or later.
PAGE 16
Removing EMS HA Monitors
Use swremove or the Software Management tools in SAM to remove EMS. Because the monitors are persistent (that is, they are automatically restarted if they stop), your removal log file is likely to contain warnings that say “Could not shut down process” or errors that say “File /etc/opt/resmon/lbin/p_client could not be removed.”
PAGE 17
Using EMS HA Monitors
There are two ways to use EMS HA Monitors:
• Configure monitoring requests from the EMS interface in the Resource Management area of SAM.
• Configure package dependencies in MC/ServiceGuard, either by using the Package Configuration interface in the High Availability Clusters subarea of SAM or by editing the package ASCII configuration file.
PAGE 18
Configuring EMS Monitoring Requests Outside of MC/ServiceGuard
This section describes the steps, in the SAM interface to EMS, for creating monitoring requests that notify management applications other than MC/ServiceGuard, such as IT/Operations.
PAGE 19
Selecting a Resource to Monitor
All resources are divided into classes. When you double-click Add Monitoring Request in the Actions menu, the top-level classes for all installed monitors are dynamically discovered and listed.
PAGE 20
Some Hewlett-Packard products, such as ATM or HP OTS 9000, provide EMS monitors. If those products are installed on the system, their top-level classes also appear here. Similarly, top-level classes belonging to user-written monitors created with the EMS Developer’s Kit are discovered and displayed here.
PAGE 21
Wildcards are available only when all instances of a subclass are of the same resource type; they are not available for resource classes. For example, a wildcard is available for the status instances in the /vg/vgName/pv_pvlink/status subclass, but no wildcard appears for the volume group subclasses under the /vg resource class.
Creating a Monitoring Request
The screen in Figure 1-6 shows where you specify when and how to send events.
PAGE 22
How Do I Tell EMS When to Send Events?
While the monitor may be polling disks every 5 minutes, for example, you may only want to be alerted when something happens that requires your attention. When you create a request, you specify the conditions under which you receive an alert. Here are the terms under which you can be notified:
When value is... You define the conditions under which you wish to be notified for a particular resource using an operator (e.g.
PAGE 23
NOTE Updated monitors may have new status values that change the meaning of your monitoring requests or generate new alerts. For example, assume you have a request for notification if status > 3 for a resource with a value range of 1-7. You would get alerts each time the value equaled 4, 5, 6, or 7. If an updated version of the monitor adds a new status value of 8, you would also see new alerts when the resource equaled 8.
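The comparison behind a “when value is...” request, and the effect described in the note, can be sketched in a few lines of Python. This is an illustration only: the operator table and the function names are assumptions rather than the EMS implementation, and the value ranges 1-7 and 1-8 are the hypothetical ones from the note.

```python
import operator

# Comparison operators a "when value is..." condition might use (assumed set,
# based on the conditions shown in this guide's example requests).
OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def condition_met(value: int, op: str, threshold: int) -> bool:
    """Return True if a polled value satisfies the request's condition."""
    return OPERATORS[op](value, threshold)


def alerting_values(value_range, op: str, threshold: int) -> list:
    """List every possible resource value that would trigger an alert."""
    return [v for v in value_range if condition_met(v, op, threshold)]


# Request: notify when status > 3.
print(alerting_values(range(1, 8), ">", 3))  # original monitor, values 1-7: [4, 5, 6, 7]
print(alerting_values(range(1, 9), ">", 3))  # updated monitor adds 8: [4, 5, 6, 7, 8]
```

The request itself never changes; only the set of values the monitor can report grows, which is why a monitor update can produce alerts you did not see before.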
PAGE 24
You may specify the ITO message severity for both normal and abnormal events:
• Normal
• Warning
• Critical
• Minor
• Major
The ITO application group is EMS(HP), the message group is HA, and the object is the full path of the resource being monitored. See HP OpenView IT/Operations Administrator’s Task Guide (P/N B4249-90003) for more information on configuring notification severity.
PAGE 25
Copying Monitoring Requests
There are two ways to use the copy function:
• To create requests for many resources using the same monitoring parameters, select the monitoring request in the main screen and choose Actions: Copy Monitoring Request. You need to have configured at least one similar request for a similar instance. Choose a different resource instance in the Add a Monitoring Request screen, and click in the Monitoring Request Parameters screen.
PAGE 26
Configuring MC/ServiceGuard Package Dependencies
This section describes how to use SAM to create package dependencies on EMS resources. Doing so creates an EMS request to monitor that resource and to notify MC/ServiceGuard when the resource reaches a critical user-defined level. MC/ServiceGuard then fails the package over.
PAGE 27
Figure 1-7 Package Configuration Screen
Click “Specify Package Resource Dependencies...” to add EMS resources as package dependencies; you see a screen similar to Figure 1-8. If you click “Add Resource”, you get a screen similar to Figure 1-7 on page 27.
PAGE 28
Figure 1-8 Package Resource Dependencies Screen
When you select a resource, either from the “Add a Resource” screen or from the “Package Resource Dependencies” screen (by selecting a resource and clicking “Modify Resource Dependencies...”), you get a screen similar to Figure 1-9. To make a package dependent on an EMS resource, select a Resource Up Value from the list of Available Resource Values, then click “Add.”
PAGE 29
Figure 1-9 Resource Parameters Screen
You can also add resources as package dependencies by modifying the package configuration file in /etc/cmcluster/pkg.ascii. See Managing MC/ServiceGuard for details on how to modify this file.
PAGE 30
PAGE 31
2 Monitoring Disk Resources
This chapter recommends ways to configure requests to the disk monitor for most high availability configurations.
PAGE 32
You can monitor the following SE (single-ended) or F/W (fast/wide) SCSI disks:
• Hewlett-Packard High Availability Disk Array, Models 10 and 20
• Hewlett-Packard Disk Array with AutoRAID, Models 12 and 12H
• EMC Symmetrix arrays
• High Availability Storage System
• Single-spindle SCSI disks
HP-IB and HP-FL disks are not supported by the disk monitor. Fibre Channel disks are not yet supported.
PAGE 33
Disk Monitor Reference
The EMS disk monitor reports information on the physical and logical volumes configured by LVM (Logical Volume Manager). Anything not configured through LVM is not monitored by the disk monitor. Monitored disk resources are:
• Physical volume summary (/vg/vgName/pv_summary), a summary status of all physical volumes in a volume group.
PAGE 34
Physical Volume Summary
The pv_summary is a summary status of all physical volumes in a volume group. This status is based on the compiled results of SCSI inquiries to all physical volumes in a volume group; see “Physical Volume and Physical Volume Link Status” on page 36. If you have configured package dependencies in MC/ServiceGuard, this resource is used to determine package failover based on access to physical disks.
PAGE 35
The pv_summary resource may not be available for a given volume group in the following cases:
• Devices are on an unsupported bus (such as HP-IB or HP-FL) or on an unrecognized bus, as in the case of a new bus technology. The entry in /var/adm/syslog/syslog.log would say:
diskmond[5699]: pv_summary will be unavailable for /dev/vg00 because there are physical volumes in this volume group which are on an unrecognized bus.
PAGE 36
Physical Volume and Physical Volume Link Status
Requests to monitor physical volumes and physical volume links give you the status of the individual physical volumes and PV links in a volume group. For most RAID arrays, this means the monitor can talk to the physical link to a logical unit number (LUN) in the array. For stand-alone disks, it means the monitor can talk to the disk itself. The pv_pvlink status is used to calculate pv_summary.
PAGE 37
Logical Volume Summary
The logical volume summary tells you how accessible the data is in all logical volumes in an active volume group. Sometimes the physical connection may be working, but the application cannot read or write data on the disk. The disk monitor determines I/O activity by querying LVM, and marks a logical volume as DOWN if a portion of its data is unavailable.
PAGE 38
Logical Volume Status
Logical volume status gives you the status of each logical volume in a volume group. While lv_summary tells you whether data in a volume group is available, lv/status/lvName tells you whether specific logical volumes have failed. The values in Table 2-4 are used by the disk monitor to determine how conditions compare in logical operations.
PAGE 39
Logical Volume Number of Copies
The logical volume number of copies is most useful to monitor in a mirrored disk configuration. It tells you how many copies of the data are available. MirrorDisk/UX supports up to 3-way mirroring, so there can be from 0 to 3 copies (see Table 2-5). In a RAID configuration that is not mirrored using LVM, the only possible values are 0 and 1; either the data is accessible or it isn’t.
PAGE 40
Rules for Using the EMS Disk Monitor with MC/ServiceGuard
The disk monitor is designed especially for use with MC/ServiceGuard to provide package failover if host adapters, buses, controllers, or disks fail. Here are some examples:
• In a cluster where one copy of data is shared between all nodes, you may want to fail over a package if the host adapter has failed on the node running the package.
PAGE 41
The pv_summary is calculated based on the compiled results of SCSI inquiries to all physical volumes in a volume group. To help you determine the best way to configure your disks for monitoring, here are the assumptions made when calculating pv_summary:
• PVGs (physical volume groups) are set up to be bus-specific sides of a mirror or redundant links, and have an equal number of physical volumes.
PAGE 42
Table 2-6 summarizes how pv_summary is calculated, where:
• n is the number of paths for the volume group in /etc/lvmtab (physical volumes, paths, or LUNs).
• p is the number of PVGs (physical volume groups) in the volume group.
• x is the number of paths currently available from a SCSI inquiry.
PAGE 43
links are not configured in separate PVGs, the disk monitor sees all links to the array as one physical volume, so if one link fails, pv_summary registers DOWN and your package fails over, even if the other link is still up and data is available. The following sections describe how to make sure your PV links are in physical volume groups.
PAGE 44
type of disk array you wish to configure, and follow the menus to define alternate links. Be sure to specify a different physical volume group for each link to the same disk.
The following example shows how to configure alternate links using LVM commands. In the example, the following disk configuration is assumed:
8/0.15.0
8/0.15.1
8/0.15.2
8/0.15.3
8/0.15.4
8/0.15.
PAGE 45
4.
PAGE 46
Rules for Mirrored Individual Disks
The following rules apply to configuring mirrored disks for use with MC/ServiceGuard and EMS monitoring:
• Mirroring must be PVG-strict. Mirrored volumes must reside on a different bus from the original volume to avoid a single point of failure and to obtain the best pv_summary value for that mirror. LVM does this automatically if you created the PVGs while setting up mirroring.
PAGE 47
Creating Disk Monitoring Requests
There are two ways to create disk monitor requests:
• from the SAM interface to EMS, to send alerts to HP OpenView ITO, ClusterView, or Network Node Manager.
• from MC/ServiceGuard, to configure any disk monitor resource as a package dependency.
These methods are not mutually exclusive: you can configure the disk monitor from both MC/ServiceGuard and the SAM interface to EMS.
PAGE 48
Disk Monitoring Request Suggestions
The examples listed in Table 2-7 are valid for both RAID and mirrored configurations. For examples of configuring MC/ServiceGuard dependencies, see “Configuring MC/ServiceGuard Package Dependencies” in Chapter 1.
Table 2-7 Suggestions for Creating Disk Monitor Requests
To be alerted when...
PAGE 49
The following screens step you through creating a disk monitor request. Assume you want to be alerted when any disks fail and when they come back up. Figure 2-2 shows that you can select all instances of pv_pvlink, so you only have to enter the parameters once for each volume group. You still need to create multiple pv_pvlink requests, one for each volume group on your system. Click OK to set monitoring parameters.
PAGE 50
Assume you need to know the status of your system at all times. You would need a short polling interval, perhaps between 30 and 120 seconds. (If you notice that the disk monitor consumes too much CPU, you may want to set a longer polling interval.) Assume also that you want an Initial event sent to make sure the request is configured properly. You would also set the Return option to send an event when disks come back up.
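To show how the Initial, Return, and Repeat options change which events a request produces, here is a small Python simulation over a hypothetical series of polled pv_pvlink values. The option semantics encoded here are assumptions for illustration (an alert when the condition becomes true, Repeat re-sending it at each poll while the condition holds, Return sending an event when it stops holding, Initial reporting the first polled value); the function name and data are not part of EMS.

```python
from typing import Callable, List, Tuple

Event = Tuple[int, str, str]  # (poll index, event kind, polled value)


def simulate(polls: List[str], is_alert: Callable[[str], bool],
             initial: bool = False, repeat: bool = False,
             ret: bool = False) -> List[Event]:
    """Return the events one request would send for a sequence of polls.

    Assumed option semantics, for illustration only:
      - ALERT is sent when the condition becomes true;
      - repeat=True also sends ALERT at every later poll while it stays true;
      - ret=True sends RETURN when the condition stops being true;
      - initial=True reports the first polled value, whatever it is.
    """
    events: List[Event] = []
    previously_met = False
    for i, value in enumerate(polls):
        met = is_alert(value)
        if i == 0 and initial:
            # INITIAL: lets you check that the request is configured properly.
            events.append((i, "INITIAL", value))
        elif met and (repeat or not previously_met):
            events.append((i, "ALERT", value))
        elif not met and previously_met and ret:
            events.append((i, "RETURN", value))
        previously_met = met
    return events


def is_down(value: str) -> bool:
    """Condition for this example request: the link is DOWN."""
    return value == "DOWN"


# Hypothetical polled values for one PV link.
polls = ["UP", "UP", "DOWN", "DOWN", "UP"]

print(simulate(polls, is_down, initial=True, ret=True))
# [(0, 'INITIAL', 'UP'), (2, 'ALERT', 'DOWN'), (4, 'RETURN', 'UP')]
print(simulate(polls, is_down, repeat=True))
# [(2, 'ALERT', 'DOWN'), (3, 'ALERT', 'DOWN')]
```

With Initial and Return set, as in the request described above, you get one confirmation event, one alert when the disk fails, and one event when it comes back up.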
PAGE 51
Resources to Monitor for RAID Arrays
These considerations apply to all supported RAID configurations listed at the beginning of this chapter.
PAGE 52
Each LUN on the RAID array is in its own volume group: vgdance and vgsing. Assume this is one node in a 2-node cluster and you want to be notified when there is a failover, when any physical device fails, and when any logical volume becomes unavailable. To be notified when a package fails over, you must configure an EMS request that is the same as the package dependency you configured in MC/ServiceGuard.
PAGE 53
Resources to Monitor for Mirrored Disks
This section applies to mirrored disks created with MirrorDisk/UX. If you are using the disk monitor, mirroring must be PVG-strict. Mirrored configurations that are not PVG-strict will not give you a correct pv_summary.
PAGE 54
To configure the EMS alerts, create the following requests on each node:
Monitoring Parameters
Resource                 Notify             Condition     Option
/vg/vg01/pv_summary      when value is...   >= PVG_UP     RETURN
/vg/vg01/lv_summary      when value is...   >= INACTIVE   RETURN
/vg/vg01/lv/copies/*     when value is...   <= 1          RETURN
Alerts need to be interpreted in relation to each other. In the table above, you would get an alert when PVG_UP is true.
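The /vg/vg01/lv/copies/* request above uses a wildcard, which stands for every instance in one subclass. The following Python sketch shows one way such a request could be expanded into per-instance requests. The instance list (lvol1, lvol2, and so on), the helper name expand_wildcard, and the one-level matching rule are illustrative assumptions; in practice SAM discovers the instances for you.

```python
from typing import List


def expand_wildcard(request_path: str, instances: List[str]) -> List[str]:
    """Expand 'subclass/*' into the instance paths directly under that subclass.

    Only a trailing '/*' is treated as a wildcard, and it matches exactly one
    path level. This mirrors the rule that wildcards apply to the instances
    of a single subclass, not to whole resource classes.
    """
    if not request_path.endswith("/*"):
        # Not a wildcard request: it already names a single resource.
        return [request_path]
    prefix = request_path[:-1]  # keep the trailing "/", drop the "*"
    return sorted(
        path for path in instances
        if path.startswith(prefix) and "/" not in path[len(prefix):]
    )


# Hypothetical resource instances reported by the disk monitor.
known = [
    "/vg/vg01/lv/copies/lvol1",
    "/vg/vg01/lv/copies/lvol2",
    "/vg/vg01/lv/status/lvol1",
    "/vg/vg02/lv/copies/lvol1",
]

for path in expand_wildcard("/vg/vg01/lv/copies/*", known):
    print(path)
```

Because the match stops at one level, a class-level pattern such as /vg/* expands to nothing here, which matches the guide’s statement that wildcards are not offered for resource classes.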
PAGE 55
Table 2-9 Example for Interpreting the pv_summary for Mirrored Disks
Number of valid devices   Meaning                                                                                                       pv_summary value
10                        all PVs and data accessible                                                                                   UP
9                         1 PV down, all data accessible                                                                                PVG_UP
8-5                       if the down PVs are all from the same PVG, then all data is available                                         PVG_UP
8-5                       if 2 or more physical volumes from different PVGs are DOWN, the disk monitor cannot conclude that all data is available   SUSPECT
4-1                       some data missing
0                         no data available
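Table 2-9 implies a volume group of 10 physical volumes split into two PVGs of 5, one per side of the mirror. The Python sketch below encodes that reading of the table, taking the number of DOWN volumes in each PVG as input. The function and its inputs are hypothetical, and because the table excerpt shows no pv_summary value for the last two rows, the sketch returns the table’s meaning text for those cases instead of guessing a value.

```python
PVG_SIZE = 5  # each of the two PVGs in Table 2-9 holds 5 physical volumes


def interpret_pv_summary(down_in_pvg_a: int, down_in_pvg_b: int) -> str:
    """Classify a two-PVG mirrored volume group following Table 2-9.

    Returns the pv_summary value where the table gives one, otherwise the
    table's "meaning" text.
    """
    for down in (down_in_pvg_a, down_in_pvg_b):
        if not 0 <= down <= PVG_SIZE:
            raise ValueError("down count must be between 0 and the PVG size")
    valid = 2 * PVG_SIZE - down_in_pvg_a - down_in_pvg_b
    if valid == 2 * PVG_SIZE:
        return "UP"
    if valid == 0:
        return "no data available"
    if down_in_pvg_a == 0 or down_in_pvg_b == 0:
        # All failures are on one side of the mirror; the other side is intact.
        return "PVG_UP"
    if valid >= PVG_SIZE:
        # Failures on both sides: the monitor cannot prove all data is there.
        return "SUSPECT"
    return "some data missing"


print(interpret_pv_summary(1, 0))  # 9 valid devices: PVG_UP
print(interpret_pv_summary(1, 1))  # 8 valid, failures in both PVGs: SUSPECT
```

The sketch makes the point of the surrounding text explicit: the same count of valid devices (8 down to 5) can mean PVG_UP or SUSPECT, depending on which PVGs the failures fall in.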
PAGE 56
Resources to Monitor for Root Volumes
In a high availability system, we recommend that you mirror your root volume and place the copies on separate links in separate PVGs. Note that the root volume should always be ACTIVE. Requests to monitor the root volume might look like this:
Monitoring Parameters
Resource                      Notify             Condition   Option
/vg/vg00/pv_pvlink/c0t0d0     when value is...   >= BUSY     REPEAT
/vg/vg00/pv_pvlink/c1t0d0     when value is...
PAGE 57
3 Monitoring Cluster Resources
The EMS cluster monitor gives you the ability to send events regarding the status of a cluster. If you have OpenView, we recommend using HP ClusterView to monitor cluster status and receive cluster events. The EMS cluster monitor is primarily for use with non-OpenView systems, e.g. CA UniCenter.
PAGE 58
Cluster Monitor Reference
The cluster monitor is useful in environments not running HP OpenView ClusterView. It reports information on the status of the cluster to which the local node belongs. The resources monitored are:
• Cluster status (/cluster/status/clusterName), a summary of the state of all nodes in the named cluster.
PAGE 59
Cluster Status
The cluster status is the status of the MC/ServiceGuard cluster to which this node belongs. The status is from the perspective of the node for which the request was created. The MIB variable hpmcClusterState, which is part of the hp-mcCluster MIB, provides the cluster status information to the monitor. The cmviewcl -v command displays detailed information about the current status of the cluster and its packages.
PAGE 60
Node Status
The node status is the current status of a node relative to a particular cluster. The MIB variable hpmcClusterState, which is part of the hp-mcCluster MIB, provides the node status information to the monitor. The cmviewcl -v command displays detailed information about the current status of the cluster and its packages.
PAGE 61
Package Status
The package status is the status of each package running on this node. The MIB variable hpmcClusterState, which is part of the hp-mcCluster MIB, provides the package status information to the monitor. The cmviewcl -v command displays detailed information about the current status of the cluster and its packages.
PAGE 62
Creating Cluster Monitoring Requests
For most MC/ServiceGuard or cluster configurations, we suggest creating the following requests on each node for each cluster to which the node belongs:
Table 3-4 Recommended Cluster Requests
Monitoring Parameters
Resource to monitor                      Notify                    Value     Option
/cluster/status/clusterName              when value is not equal   UP        INITIAL
/cluster/localNode/status/clusterName    when value is not equal   RUNNING   INITIAL
PAGE 63
4 Monitoring Network Interfaces
The network interface monitor detects whether your LAN interfaces are up or down. It lets you send events to a system management interface as an alternative to looking in syslog for LAN status.
PAGE 64
Network Monitor Reference
The network monitor provides status on the LAN interfaces in a given node. It monitors all interfaces visible when you run the lanscan command on a system.
Figure 4-1 Network Monitor Resource Class Hierarchy: /net/interfaces/lan/status/LANname
The MIB variable ifOperStatus, which is part of MIB-2, provides the LAN interface status to the monitor. The MIB value TESTING is reported by the monitor as DOWN.
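The mapping from ifOperStatus to the value the network monitor reports can be written as a small lookup. The integer codes below are the MIB-II definitions from RFC 1213 (up(1), down(2), testing(3)); the function name is hypothetical, and how the monitor actually reads the variable is not shown.

```python
# ifOperStatus values defined for MIB-II in RFC 1213.
IF_OPER_STATUS = {1: "up", 2: "down", 3: "testing"}


def lan_status(if_oper_status: int) -> str:
    """Map a MIB-II ifOperStatus integer to the network monitor's UP/DOWN value."""
    name = IF_OPER_STATUS.get(if_oper_status)
    if name is None:
        raise ValueError(f"unexpected ifOperStatus value: {if_oper_status}")
    # Only an operationally up interface is UP; "testing" counts as DOWN.
    return "UP" if name == "up" else "DOWN"


for code in (1, 2, 3):
    print(code, lan_status(code))
```

Treating TESTING as DOWN errs on the side of reporting an interface as unavailable whenever it is not carrying normal traffic.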
PAGE 65
Standby LANs are reported as DOWN unless they have been activated to replace a failed LAN interface. The minimum polling interval is 30 seconds.
PAGE 66
Configuring Network Monitoring Requests
Table 4-2 recommends monitoring requests for each node. With these requests, you would see an event when a LAN card fails and again when it comes back up, and you would see an event each hour for as long as the LAN card remains down. You may elect to change the polling interval or not to configure a reminder at all.
PAGE 67
5 Monitoring System Resources
The system resource monitor gives you the ability to send events about the number of users, available file system space, and job queues, to help you load balance and tune your system to keep it available. It is an alternative to reading syslog files to get this information.
PAGE 68
System Monitor Reference
The system monitor reports information on the following system resources (Figure 5-1):
• number of users (/system/numUsers) tells you the number of users on a given node.
• job queues (/system/jobQueue1Min, /system/jobQueue5Min, and /system/jobQueue15Min) tell you the number of processes waiting for CPU and performing disk I/O, as an average over 1, 5, and 15 minutes respectively.
PAGE 69
Number of Users
The number of users tells you how many users are logged in to a given system. The MIB variable computerSystemUsers from the hp-unix MIB provides the resource value to the monitor. To verify the number of users on the system, use the uptime(1) command.
Table 5-1 Interpreting Number of Users
Resource Name      Value Range   Interpretation
/system/numUsers   integer       Total number of users logged in to the node.
PAGE 70
Job Queues
The job queue monitor checks the average number of processes that have been waiting for CPU and performing disk I/O over the last 1, 5, or 15 minutes. For example, a value of 4 in /system/jobQueue5Min means that, at the time of polling, there was an average of 4 jobs in the queue over the last 5 minutes.
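On HP-UX the monitor takes these averages from the MIB. As a rough analogue only, Python’s os.getloadavg() returns the 1-, 5-, and 15-minute load averages on most Unix-like systems; it is not the source EMS uses, and these kernel load averages are exponentially damped rather than simple arithmetic means. The helper name and the mapping onto EMS resource names are illustrative assumptions.

```python
import os


def job_queue_averages() -> dict:
    """Return local load averages keyed by the analogous EMS resource names.

    os.getloadavg() raises OSError if the load average is unobtainable
    and is not available on Windows.
    """
    one, five, fifteen = os.getloadavg()
    return {
        "/system/jobQueue1Min": one,
        "/system/jobQueue5Min": five,
        "/system/jobQueue15Min": fifteen,
    }


if __name__ == "__main__":
    for resource, value in job_queue_averages().items():
        print(f"{resource}: {value:.2f}")
```

Comparing the three values gives a quick sense of trend: a 1-minute figure well above the 15-minute figure suggests load is rising.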
PAGE 71
Filesystem Available Space
The filesystem monitor checks the number of megabytes available for use in each file system on the node. File systems must be mounted and active to be monitored. File systems mounted over the network, such as NFS file systems, are not monitored. The MIB variables fileSystemBavail and fileSystemBsize from the hp-unix MIB are used to calculate the number of available Kb in each file system.
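The calculation is simple arithmetic: available blocks times block size, converted to KB or MB. The sketch below assumes fileSystemBavail is a count of available blocks and fileSystemBsize is the block size in bytes (an assumption about the MIB semantics); the function names are hypothetical. For comparison on a local machine, it also uses Python’s os.statvfs(), where f_bavail is counted in units of f_frsize bytes; this is an analogue, not how the EMS monitor gets its data.

```python
import os


def available_kb(bavail: int, bsize: int) -> float:
    """Available space in KB from a free-block count and a block size in bytes."""
    return bavail * bsize / 1024


def available_mb(bavail: int, bsize: int) -> float:
    """The same quantity in megabytes (1 MB = 1024 KB here)."""
    return available_kb(bavail, bsize) / 1024


def local_available_mb(mount_point: str) -> float:
    """Analogous figure for a mounted local file system, via statvfs(2)."""
    st = os.statvfs(mount_point)
    # f_bavail: blocks available to unprivileged users, in units of f_frsize.
    return available_mb(st.f_bavail, st.f_frsize)


print(available_kb(2048, 8192))  # 16384.0
print(available_mb(2048, 8192))  # 16.0
```

For example, 2048 free blocks of 8192 bytes each is 16,777,216 bytes, or 16384 KB, or 16 MB.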
PAGE 72
Creating System Resource Monitoring Requests
Table 5-4 shows examples of how you might monitor system resources.
Table 5-4 Examples of System Resource Requests
Monitoring Parameters
To be alerted when...
PAGE 73
6 Troubleshooting
This chapter gives hints on testing your monitoring requests, and provides information about log files and monitor behavior that will help you determine the cause of problems. For information on fixing problems detected by monitors, see the list of related publications in the Preface.
PAGE 74
EMS Directories and Files
EMS files are located in /etc/opt/resmon and /opt/resmon. The following files and directories might help you determine the cause of some problems:
/etc/opt/resmon/config
  A file that sets the restart interval for monitor persistence.
/etc/opt/resmon/dictionary
  A directory that contains resource dictionaries for the various monitors. The disk monitor resources are listed in diskmond.
PAGE 75
/etc/opt/resmon/log
  A directory of log files used by EMS:
  client.log stores calls made by clients, such as MC/ServiceGuard or the SAM interface to EMS.
  api.log stores API calls made by monitors.
  registrar.log contains errors found when reading the resource dictionary.
/opt/resmon/resls
  A command that lists the latest polled status of the specified resource on a specified system.
PAGE 76
Logging and tracing
Use logging for most troubleshooting activities. By default the monitors log to api.log and client.log. Logging to /var/adm/syslog/syslog.log is on by default for the disk monitor and off by default for the remaining monitors. Tracing should only be used when instructed to do so by HP support personnel.
EMS Logging
As mentioned in the previous section, log files in /etc/opt/resmon/log contain information logged by the monitors. Look at the client.
PAGE 77
Entries in /var/adm/syslog/syslog.log are marked with the monitor daemon name, e.g. diskmond or fsmond, followed by the resource name and logging data. Additions, deletions, notifications, and changes in resource states are logged. Errors explaining why a resource is not available for monitoring, or why the monitor cannot access a resource, are also logged in /var/adm/syslog/syslog.log. Look at the registrar.
PAGE 78
Performance Considerations
Monitoring your system, although an important part of high availability, consumes system resources. You must weigh your performance needs against your need to know as soon as possible when a failure threatens availability.
System Performance Issues
The primary performance impact is related to the polling interval and the number of resources being monitored.
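Since the load scales with both factors, a back-of-the-envelope estimate is polls per hour = resources × 3600 / polling interval in seconds. The Python helper below is a hypothetical illustration of that arithmetic; it assumes each monitored resource is polled once per interval, and real monitors may batch or share work, so treat the result as a rough guide.

```python
def polls_per_hour(resources: int, polling_interval_s: int) -> float:
    """Rough number of resource polls per hour for one monitor.

    Assumes each monitored resource is polled once per interval.
    """
    if polling_interval_s <= 0:
        raise ValueError("polling interval must be positive")
    return resources * 3600 / polling_interval_s


# 20 physical volume links polled every 30 s versus every 300 s.
print(polls_per_hour(20, 30))   # 2400.0
print(polls_per_hour(20, 300))  # 240.0
```

Lengthening the interval tenfold cuts the polling work tenfold, at the cost of learning about a failure up to that much later.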
PAGE 79
Testing Monitor Requests
To test that events are being sent, use the INITIAL option available with conditional notification when creating a monitoring request. This option sends an initial event that you can examine to make sure your request is properly configured and showing up in the correct system management tool. An alternative is to use the “At each interval” notification to test that events are being sent to the correct system management tool.
PAGE 80
Making Sure Monitors are Running
Monitor daemons start automatically when you create a request to monitor something. Because monitoring is designed to work in a high availability environment, monitors are written to restart automatically if anything causes them to fail. A daemon called p_client restarts all appropriate monitors using the monitor restart interval defined in /etc/opt/resmon/config.
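This guide does not describe p_client’s internals. The Python sketch below only illustrates the general persistence pattern the paragraph describes: check each monitor, restart any that has stopped, wait the restart interval, and repeat. The function names, the is_running probes, the start launcher, and the fixed number of passes are all hypothetical.

```python
import time
from typing import Callable, Dict, List


def restart_dead_monitors(monitors: Dict[str, Callable[[], bool]],
                          start: Callable[[str], None]) -> List[str]:
    """One persistence pass: restart every monitor that is not running.

    `monitors` maps a daemon name to a hypothetical is_running() probe,
    and `start` is a hypothetical launcher. Returns the names restarted.
    """
    restarted = []
    for name, is_running in monitors.items():
        if not is_running():
            start(name)
            restarted.append(name)
    return restarted


def supervise(monitors: Dict[str, Callable[[], bool]],
              start: Callable[[str], None],
              restart_interval_s: float, passes: int) -> None:
    """Repeat the check every restart_interval_s seconds, `passes` times."""
    for _ in range(passes):
        restart_dead_monitors(monitors, start)
        time.sleep(restart_interval_s)


# Usage with stand-in probes: diskmond is running, fsmond has stopped.
launched: List[str] = []
status = {"diskmond": lambda: True, "fsmond": lambda: False}
print(restart_dead_monitors(status, launched.append))  # ['fsmond']
```

A short restart interval brings a failed monitor back sooner; a long one reduces the overhead of checking, the same trade-off discussed under Performance Considerations.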
PAGE 81
Glossary
A-H
alert
  An event. A message sent to warn a user or application when certain conditions are met.
client
  The application that creates or cancels requests to monitor particular resources. The consumer of a resource status message. A user of the Resource Monitor framework. This user may browse resources, request status, and make requests to have resources monitored. Examples are MC/ServiceGuard as it starts a package, or the SAM interface to EMS.
PAGE 82
MIB II (MIB2)
  A MIB that defines information about the system, the network interface cards it contains, its routing information, the TCP and UDP sockets it contains and their states, and various statistics related to error counts. This MIB is widely adopted and is served by most IP-addressed devices. Most system and network resources managed by EMS HA Monitors are taken from this MIB.
monitor
  See resource monitor.
N-P
notification
  See alert.
PAGE 83
resource instance
  The actual monitorable resource. For example, /net/interfaces/lan/status/lan0 may refer to a particular network interface installed on the monitored system.
resource monitor
  A framework for selecting resources of interest and monitoring them according to the user’s criteria. When the resource value matches the user’s criteria, a notification is sent according to the user’s instructions.
users as logical volumes.
PAGE 84
PAGE 85
Index
A
alternate links, creating volume groups with, 44
API for EMS, 12
api.log file, 76
C
calculating pv_summary, 42
classes, 17, 20
  cluster resources, 58
  system resources, 68
client.
PAGE 86
monitor persistence, 15, 80
monitor request example
  disk monitor request, 50
monitoring disk space, 71
monitoring filesystem space, 71
monitoring request
  cluster status, 59
  copying, 25
  creating, 21
  creating comments, 24
  for clusters, 57
  for disk monitor, 47
  for lock disks, 55
  for mirror disks, 53
  for root volumes, 56
  lan interfaces, 64
  modifying, 25
  node status, 60
  number of users, 69
  package status, 61
  polling interval, 23
  removing, 25
  system resources, 68
  testing, 79
monitoring system load, 68, 70
m
PAGE 87
testing job queue monitoring requests, 70
testing lan monitoring requests, 64
testing monitoring requests, 79
testing number of users monitoring requests, 69
tracing, 77
U
UP value, 28
updating monitors, 23
users, monitoring number on system, 69
V
volume group
  creating, 44
  creating for a cluster, 44
volume groups
  active, 37
W
wildcard, 20, 36, 38, 39