Using High Availability Monitors (June 2003)

Using High Availability Monitors

Manufacturing Part Number: B5736-90046

E0603

Summary of content (102 pages)

PAGE 2
Legal Notices
The information contained in this document is subject to change without notice. Hewlett-Packard makes no warranty of any kind with regard to this manual, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. Hewlett-Packard shall not be liable for errors contained herein or direct, indirect, special, incidental or consequential damages in connection with the furnishing, performance, or use of this material.
PAGE 3
Copyright Notice
Copyright © 1997-2003 Hewlett-Packard Development Company, L.P. All rights reserved. Reproduction, adaptation, or translation of this document without prior written permission is prohibited, except as allowed under the copyright laws. High Availability Monitors, Event Monitoring Service, HP OpenView, HP OpenView IT/Operations, ServiceGuard Extension for RAC, and MC/ServiceGuard are products of Hewlett-Packard Development Company, L.P., and all are protected by copyright.
PAGE 4
PAGE 5
Contents
1. Understanding the Event Monitoring Service
   Event Monitoring Service Overview
   EMS Requirements
   EMS Resource Classes
   Client and Target Applications
PAGE 6
Contents
   Resources to Monitor for Root Volumes ... 54
   Excluding Volume Groups from being Monitored ... 55
3. Monitoring Database Resources
   Database Monitor Reference
   Database Resources
PAGE 7
Printing History
Table 1
Printing Date    Part Number    Edition
August 1997      B5735-90001    Edition 1
October 1998     B5736-90006    Edition 2
March 1999       B5736-90012    Edition 3
May 1999         B5736-90018    Edition 4
August 1999      B5736-90022    Edition 5
November 1999    B5736-90025    Edition 6
June 2003        B5736-90046    Edition 7
This edition documents configuring High Availability Monitors. The printing date changes when a new edition is printed.
PAGE 8
PAGE 9
Preface
This guide describes how to install and configure the High Availability Monitors to monitor system health. The chapters are as follows:
• “Understanding the Event Monitoring Service,” which describes the Event Monitoring Service components and operations, including the role of High Availability Monitors.
• “Monitoring Disk Resources,” which provides guidelines on using the disk monitor, including using it with MC/ServiceGuard or ServiceGuard OPS Edition.
PAGE 10
• Tom Madell, Disk and File Management Tasks on HP-UX (ISBN 0-13-518861-X). HP Press; Prentice Hall, Inc., 1997.
• Managing Systems and Workgroups (HP Part Number B2355-90664)
• HP OpenView IT/Operations Administrator’s Reference (HP Part Number B6941-90001)
• Managing Highly Available NFS (HP Part Number B5125-90001)
• http://docs.hp.
PAGE 11
1 Understanding the Event Monitoring Service
This document, Using High Availability Monitors, describes how to configure high availability monitors. Each chapter in this book is specific to one HA Monitor: it describes the options and settings and provides suggestions for configuring that monitor. HA Monitors is part of a total Event Monitoring Service. This chapter describes the components of the Event Monitoring Service.
PAGE 12
Event Monitoring Service Overview
The Event Monitoring Service (EMS) monitors system resources. EMS is used by system administrators to configure monitoring requests, check resource status, and send notification when configured conditions are met. EMS can work in a high availability environment. It can report a loss of redundant resources.
PAGE 13
- email: This option does not require any extra handling. Specify the email address when the monitoring request is created.
- syslog and textlog: This option does not require any extra handling. Specify the log file when the monitoring request is created. Syslog notifications go to the local system.
- console: This option does not require any extra handling. Specify the console when the monitoring request is created.
PAGE 14
Figure 1-1 shows the relationships between the Event Monitoring Service components.
Figure 1-1 Event Monitoring Service Components
The process is as follows:
1. The system administrator starts a client application, for example the EMS GUI or the EMS CLI, to begin the discovery phase of creating a monitoring request. The discovery phase includes identifying the resources to be monitored and configuring the request.
PAGE 15
2. The EMS API provides the interface between the client request and the registrar. There is a one-to-one correspondence between the client and registrar.
3. The registrar refers to the dictionary for a list of available resources and related monitors. The resources listed in the dictionary are passed back to the client.
4.
PAGE 16
9. The monitor checks the resource as specified in the monitor request. It passes back to the EMS API whether the request is accepted or rejected.
10. The EMS API provides the interface between the monitor and the target.
11. The monitor begins collecting data as specified in the monitoring request.
12.
PAGE 17
EMS Requirements
The following are system requirements for the Event Monitoring Service:
• All hardware you intend to monitor, such as disks and LAN cards, must be configured and tested before you configure EMS.
• EMS must be installed on an HP 9000 Series 700 or Series 800 system running HP-UX version 10.20 or later. When installing one or more EMS components, check that the version levels for the other components are compatible.
PAGE 18
EMS Resource Classes
EMS groups resources into classes in a hierarchy similar to that of a file system structure. Figure 1-2 is an example of a resource hierarchy.
PAGE 19
Client and Target Applications
This section describes some of the client and target application options and processes. Target applications can be written using the EMS API.
EMS with ServiceGuard
ServiceGuard can be configured with EMS to monitor the health of selected resources, such as disks. Based on the status of the resources, ServiceGuard can decide to fail packages over.
PAGE 20
2. Specify when to collect the value. Select one or more of the following:
• When value is ... If you are setting up a request for an asynchronous monitor, this is the only option available.
• When value changes
• At each interval. Select this option to send an event periodically, regardless of the value. Define a polling interval that is appropriate to your system performance and reaction time needs. See Step 3.
3.
PAGE 21
• textlog
EMS CLI Client Application
emscli is a command-line utility used to configure and manage persistent monitoring requests for Event Monitoring Service (EMS) monitors, such as HA Monitors, Hardware Monitors, and Kernel Monitors. The emscli utility can be used to add, modify, delete, list, and view monitoring requests and resources. It also allows the user to generate scripts of the configured requests.
PAGE 22
Resource Monitors
Resource monitors are applications written to gather and report information about specific resources on the system. The resource monitor:
• Provides a list of resources that can be monitored
• Provides information about the resources
• Monitors the resources it supports
• Provides values to the EMS API notification
The EMS framework evaluates the data to determine if an event has occurred.
PAGE 23
Writing Resource Monitors
The EMS API provides a method for writing new resource monitors. To create your own monitor, read the Writing Monitors for the Event Monitoring Service (EMS) (HP Part Number B7611-90016) manual and install the developer’s kit. Both are available at the following Website:
1. Go to the Website: http://www.software.hp.com
2.
PAGE 24
EMS Framework Components
This section describes the EMS framework components.
The EMS API
The EMS API is the interface between the registrar, client applications, target applications, and resource monitors as illustrated in Figure 1-1. The EMS API is provided as part of the EMS product.
PAGE 25
The registrar does not need to keep any state information and does not need to be highly available. It does not need to be running while a resource is being monitored. The registrar is needed only to start the monitors and to provide communication between clients and monitors. One registrar process is started each time a client application calls rm_client_connect(), so a registrar is always connected to one client.
PAGE 26
When the registrar needs to pass the request to a resource monitor, it needs to determine if the resource monitor is currently running. If the appropriate resource monitor process is not found, the registrar starts the process and waits until the resource monitor can communicate with the registrar.
The Resource Dictionary
The resource dictionary is the mechanism by which the resource monitor identifies itself to EMS.
PAGE 27
2 Monitoring Disk Resources
This chapter recommends ways to configure requests to the HA Disk Monitor for most high-availability configurations.
PAGE 28
HA Disk Monitor Reference
The HA Disk Monitor reports information about the physical and logical volumes configured by LVM (Logical Volume Manager). Anything not configured through LVM cannot be monitored by the HA Disk Monitor.
PAGE 29
Figure 2-1 shows the class hierarchy for the HA Disk Monitor.
Figure 2-1 Disk Monitor Resource Class Hierarchy
Bold items are resource instances that can be monitored. Bold italic variables represent specific instances of volume groups, devices, and logical volumes on the system.
Physical Volume Summary
The pv_summary is the summary status of all physical volumes in a volume group.
PAGE 30
Table 2-1 lists how conditions compare in logical operations. Specify the logical operation in the monitor request parameters portion of the monitor request. For example, to create a request that alerts you when the condition is SUSPECT or DOWN, specify greater than or equal to 3 (>=3).
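The comparison described above can be sketched in Python as an ordered set of condition values. This is only an illustration, not part of EMS: the text confirms that SUSPECT and DOWN both satisfy >=3, while the values used here for UP (1), PVG_UP (2), and DOWN (4), and the helper names condition_met and alerting_conditions, are assumptions made for the example.

```python
from enum import IntEnum
import operator


class PvSummary(IntEnum):
    """pv_summary conditions as ordered integers (values partly assumed)."""
    UP = 1        # assumed value
    PVG_UP = 2    # assumed value
    SUSPECT = 3   # consistent with ">=3" in the text
    DOWN = 4      # assumed value; the text only implies it is >= 3


# Logical operations a monitoring request might specify.
COMPARISONS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "=": operator.eq,
    "!=": operator.ne,
}


def condition_met(value, op, threshold):
    """Return True if `value <op> threshold` holds for the integer codes."""
    return COMPARISONS[op](int(value), int(threshold))


def alerting_conditions(op, threshold):
    """List the condition names that would trigger an alert."""
    return [c.name for c in PvSummary if condition_met(c, op, threshold)]


print(alerting_conditions(">=", 3))
```

Because the conditions are ordered, a single threshold such as >=3 covers every state at or beyond SUSPECT, so one request is enough to catch both failure states.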
PAGE 31
• PVGs (physical volume groups) exist in a volume group, but not all physical volumes are assigned to a PVG. The /var/adm/syslog/syslog.log entry would say:
diskmond[18323]: pv_summary will be unavailable for /dev/vgtest because the physical volume groups (PVGs) in this volume group do not have an equal number of PVs or there are PVs not in a PVG. (DRM-503)
• Unequal numbers of physical volumes exist in each PVG in the volume group.
PAGE 32
pv_pvlinks and pv_summary supplement lv_summary by giving status on the accessibility of volume groups (both active and inactive) and logical volumes. To pinpoint a failure of a particular disk, bus, or I/O card, you need to use the HA Disk Monitor alerts in conjunction with standard troubleshooting methods: reading log files and inspecting the actual devices.
PAGE 33
NOTE: If the logical volume is in an inactive volume group, the HA Disk Monitor cannot determine whether the data is accessible.
Table 2-3 lists how conditions compare in logical operations. You specify the logical operation in the monitor request parameters portion of the monitor request. For example, to create a request that alerts you when the condition is INACTIVE_DOWN, you would specify greater than or equal to 3 (>=3).
PAGE 34
Table 2-3 Interpreting Logical Volume Summary (Continued)
Resource Name: /vg/vgName/lv_summary
Condition: DOWN
Value: 4
Interpretation: At least one logical volume in the volume group reports a status of either INACTIVE or DOWN. Note that an inactive logical volume in an active volume group is rare, but possible. See “Logical Volume Status” on page 34.
Logical Volume Status
Logical volume status gives you status on each logical volume in a volume group.
PAGE 35
Logical Volume Number of Copies
The logical volume number of copies is most useful to monitor in a mirrored disk configuration. It tells you how many copies of the data are available. The HA Disk Monitor monitors all copies of data, and therefore counts the “original” as part of the total number of copies. MirrorDisk/UX supports up to 3-way mirroring, so the range can be from 0 to 3 copies (see Table 2-5).
PAGE 36
Rules for Using the HA Disk Monitor with ServiceGuard
The HA Disk Monitor is designed for use with ServiceGuard to trigger package failover if host adapters, buses, controllers, or disks fail. Here are some examples:
• In a cluster where one copy of data is shared between all nodes in the cluster, you may want to fail a package if the host adapter has failed on the node running the package.
PAGE 37
Setting Failover Parameters
When using the HA Disk Monitor with ServiceGuard, the parameters listed in Table 2-6 should be set so that a package failover will occur when access to a disk resource fails.
Table 2-6 Setting Failover Parameters
Parameter              Setting
RUN_SCRIPT_TIMEOUT     non-zero timeout value
HALT_SCRIPT_TIMEOUT    non-zero timeout value
Notes: Do not leave these parameters set to the default, NO_TIMEOUT.
PAGE 38
• n is the number of paths for the volume group in /etc/lvmtab (physical volumes, paths, or LUNs).
• p is the number of PVGs (physical volume groups) in the volume group.
• x is the number of paths currently available from a SCSI inquiry.
Table 2-7 pv_summary Calculations
Case            Conclusion                                          State
x=n             All physical volumes and all data are available.    UP
n>x>=n-(p-1)    All data is available.
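The two rows of Table 2-7 visible in this excerpt can be expressed as a small Python function. This is a sketch under stated assumptions, not the monitor's implementation: the function name pv_summary_state is hypothetical, the state name PVG_UP for the second row is inferred from the later remark that PVG_UP means all data is available, and any case not shown in the excerpt is deliberately left undetermined.

```python
def pv_summary_state(n, p, x):
    """Classify pv_summary from path counts, for the visible rows only.

    n: number of paths for the volume group in /etc/lvmtab
    p: number of PVGs in the volume group
    x: number of paths currently available
    """
    if x == n:
        # All physical volumes and all data are available.
        return "UP"
    if n > x >= n - (p - 1):
        # All data is available; state name inferred, not shown in the excerpt.
        return "PVG_UP"
    # Remaining rows of the table are not in this excerpt.
    return None


print(pv_summary_state(4, 2, 4))
print(pv_summary_state(4, 2, 3))
```

With n = 4 paths in p = 2 PVGs, losing one path still satisfies the second case, while losing two falls outside the rows shown here and returns None.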
PAGE 39
To use the HA Disk Monitor with ServiceGuard, PV links must be configured in separate PVGs (physical volume groups). This new requirement allows pv_summary to accurately calculate data availability based on physical volume availability, thus including both ACTIVE and INACTIVE volume groups. If PV links are not configured in separate PVGs, the HA Disk Monitor sees all links to the array as one physical volume.
PAGE 40
NOTE: Make sure you edit lvmpvg to contain the correct link names in /dev/dsk/device for that system.
Creating Volume Groups on Disk Arrays Using PV Links
If you will be monitoring volume groups that use mass storage on disk arrays, you should use redundant I/O channels from each node, and connect them to separate controllers on the array.
PAGE 41
# mkdir /dev/vgdatabase
2. Next, create a control file named group in the directory /dev/vgdatabase, as follows:
# mknod /dev/vgdatabase/group c 64 0xhh0000
The major number is always 64, and the hexadecimal minor number has the form:
0xhh0000
where hh must be unique to the volume group you are creating. Use an appropriate hexadecimal number that is available on your system, that is, one not used by any volume group already configured.
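The minor number format can be illustrated with a short Python sketch that builds a 0xhh0000 string and picks the first hh value not already in use. The helper names group_minor and next_free_minor are hypothetical, and the list of minor numbers already in use is assumed to have been collected separately by inspecting the existing group files on the system.

```python
from typing import Iterable


def group_minor(hh: int) -> str:
    """Format the minor number 0xhh0000 for a volume group's group file."""
    if not 0 <= hh <= 0xFF:
        raise ValueError("hh must fit in two hexadecimal digits")
    return f"0x{hh:02x}0000"


def next_free_minor(in_use: Iterable[str]) -> str:
    """Return the lowest 0xhh0000 value not present in `in_use`."""
    # Extract the hh byte from each existing minor number.
    used = {int(minor, 16) >> 16 for minor in in_use}
    for hh in range(0x100):
        if hh not in used:
            return group_minor(hh)
    raise RuntimeError("no free minor number available")


print(next_free_minor(["0x000000", "0x010000"]))
```

The resulting string is the value that would take the place of 0xhh0000 in the mknod command shown above.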
PAGE 42
This command creates a 120 MB mirrored volume named lvol1. The name is supplied by default, since no name is specified in the command. The -s g option means that mirroring is PVG-strict, that is, the mirror copies of data will be in different physical volume groups.
PAGE 43
• /vg/vgName/lv/status will have a new /lvName resource instance that represents the split-off mirror.
• /vg/vgName/lv_summary will change depending on the state of the new logical volume created by the split mirror.
If you restore the split mirror normally using supported LVM commands, the HA Disk Monitor will detect the merged mirror and report it.
PAGE 44
Creating Disk Monitoring Requests
There are two ways to create HA Disk Monitor requests:
• From the EMS GUI, to send alerts to HP OpenView ITO, Network Node Manager, email addresses, the console, a textlog file, or the system log.
• From ServiceGuard, to configure any HA Disk Monitor resource as a package dependency.
These requests are not exclusive. You can configure the HA Disk Monitor from both ServiceGuard and EMS.
PAGE 45
Disk Monitoring Request Suggestions
The examples listed in Table 2-8 are valid for both RAID and mirrored configurations.
Table 2-8 Suggestions for Creating Disk Monitor Requests
To be alerted when ...
PAGE 46
The following series of screens provides a sample process for creating an HA Disk Monitor request. These samples use the EMS GUI, though the Package Dependency screens in ServiceGuard are similar. Refer to Using the Event Monitoring Service (HP Part Number B7612-90015) for specific instructions. Assume you want to be alerted when any disks fail and when they are back up.
PAGE 47
The parameters for the monitoring request in Figure 2-3 request an event notification when the resource value is not equal to UP. The polling interval for checking the resource value is 300. The notification method is an SNMP trap with a minor severity level. No initial, repeat, or return values are requested.
Figure 2-3 Example: Configuring /vg/vg01/pv_pvlink/status Parameters to Notify When Disks Fail
All requests are created in a similar way.
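The request parameters described for Figure 2-3 can be modeled as a small Python record to make the pieces explicit. This is a hypothetical data model for illustration only and does not correspond to any EMS API: the class MonitoringRequest, its field names, and the sample value "DOWN" are assumptions, and the polling interval is kept as the bare number 300 because the text gives no unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitoringRequest:
    """Illustrative model of one monitoring request (not an EMS structure)."""
    resource: str
    comparison: str          # "!=" or "="
    threshold: str
    polling_interval: int    # unit not stated in the text
    notify_via: str
    severity: str
    initial: bool = False
    repeat: bool = False
    notify_on_return: bool = False

    def should_notify(self, current_value: str) -> bool:
        """Evaluate the 'when value is ...' condition for a polled value."""
        if self.comparison == "!=":
            return current_value != self.threshold
        if self.comparison == "=":
            return current_value == self.threshold
        raise ValueError(f"unsupported comparison: {self.comparison}")


# Parameters taken from the description of Figure 2-3.
request = MonitoringRequest(
    resource="/vg/vg01/pv_pvlink/status",
    comparison="!=",
    threshold="UP",
    polling_interval=300,
    notify_via="SNMP trap",
    severity="minor",
)

print(request.should_notify("DOWN"))  # "DOWN" is a placeholder value
print(request.should_notify("UP"))
```

Keeping initial, repeat, and return as separate flags mirrors the statement that none of them is requested in this example.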
PAGE 48
Resources to Monitor for RAID Arrays
These considerations are relevant to all supported RAID configurations listed at the beginning of this chapter.
PAGE 49
Figure 2-4 represents a node with two RAID arrays and two PV links.
Figure 2-4 RAID Array Example
Each LUN on the RAID array is in its own volume group: vgdance and vgsing. Assume this is one node in a 2-node cluster and you want to be notified when there is a failover, when any physical device fails, and when any logical volume becomes unavailable.
PAGE 50
To configure the EMS alerts, create the following requests:
Table 2-10 Sample Disk Monitoring Requests
Resource                   Notify           Condition      Option
/vg/vgdance/pv_summary     when value is    > PVG_UP       RETURN
/vg/vgsing/pv_summary      when value is    > PVG_UP       RETURN
/vg/vgdance/lv_summary     when value is    >= INACTIVE    RETURN
/vg/vgsing/lv_summary      when value is    >= INACTIVE    RETURN
If pv_summary is SUSPECT, you know a physical device fail
PAGE 51
Table 2-11 Resources to Monitor for Mirrored Disks (Continued)
/vg/vgName/lv/copies/*: This gives you the total number of copies of data currently available. Copies in addition to the original copy.
Figure 2-5 represents two nodes with 2-way mirrored configuration with 10 disks on 2 buses. Both copies are in a single volume group.
PAGE 52
To configure this last request, you must duplicate your ServiceGuard package dependency.
PAGE 53
Table 2-12 EMS Alert Requests (Continued)
Resource                 Notify           Condition    Option
/vg/vg01/lv/copies/*     when value is    <= 1         RETURN
Alerts need to be interpreted in relation to each other. In the table above, you would get an alert when PVG_UP is true. Although all data is available, the condition PVG_UP implies there are physical volumes that are not functioning and need to be fixed. See Table 2-15.
PAGE 54
Resources to Monitor for Lock Disks
Lock disks are used as a tie-breaker in forming or reforming a cluster. If the lock disk is unavailable during cluster formation, the cluster may fail to reform. If you are using a lock disk with your cluster, you should configure a monitoring request for that disk and send an alert to your system management software if the lock disk is unavailable.
PAGE 55
Table 2-15 Root Volumes Monitoring Requests (Continued)
Resource                    Notify           Condition    Option
/vg/vg00/lv/copies/lv01     when value is    < 1          RETURN
If one of the root volumes is unavailable, you are alerted and told which one has failed (pv_pvlink/status). You are alerted if you lose a root disk mirror. With the RETURN option, you are also notified when the mirror is restored.
PAGE 56
PAGE 57
3 Monitoring Database Resources
The HA Database Monitor monitors values and sends events regarding the status of databases and the database servers that support them. These values are defined as part of the rdbms public MIB definition (RFC1697).
PAGE 58
Database Monitor Reference
The HA Database Monitor reports events based on the status of supported databases configured on HP-UX systems. These MIB resources can be monitored:
• Database resources: /rdbms/database/resource_class/database_name
Information about a database on a given system, such as status and disk usage.
PAGE 59
Database Resources
The database resources available for monitoring are defined under:
/rdbms/database
The database resource class name is then specified, followed by the database name. The database resource class names are:
• status
• allocated
• usage
• used
The database name varies depending upon your environment and the number of databases installed. The minimum polling interval for all database resources is 30 seconds.
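The naming scheme above can be sketched in Python as a helper that assembles a database resource name and rejects polling intervals below the stated minimum. The function names database_resource and check_polling_interval are hypothetical, and db_1 is simply the sample database name used later in this chapter.

```python
# Resource class names and minimum interval as listed in the text.
DATABASE_RESOURCE_CLASSES = ("status", "allocated", "usage", "used")
MIN_POLLING_INTERVAL_SECONDS = 30


def database_resource(resource_class: str, database_name: str) -> str:
    """Build /rdbms/database/<resource_class>/<database_name>."""
    if resource_class not in DATABASE_RESOURCE_CLASSES:
        raise ValueError(f"unknown database resource class: {resource_class}")
    if not database_name:
        raise ValueError("database name must not be empty")
    return f"/rdbms/database/{resource_class}/{database_name}"


def check_polling_interval(seconds: int) -> int:
    """Reject intervals shorter than the documented 30-second minimum."""
    if seconds < MIN_POLLING_INTERVAL_SECONDS:
        raise ValueError("polling interval must be at least 30 seconds")
    return seconds


print(database_resource("status", "db_1"))
```

Validating the class name up front catches a mistyped resource path before it is used in a monitoring request.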
PAGE 60
Table 3-1 Interpreting Database Resource Classes
Resource Name: /rdbms/database/resource_class/database_name
resource_class: status
Condition: The values are:
  ACTIVE: The database is currently being used by a database server.
  AVAILABLE: The database is accessible, but it is not currently being used by a database server.
  UNAVAILABLE: The database is not accessible to any database server.
resource_class: allocated
PAGE 61
Table 3-1 Interpreting Database Resource Classes (Continued)
Resource Name: /rdbms/database/resource_class/database_name
resource_class: usage
Condition: a floating-point value expressed as a percentage
Description: This resource class describes the percentage of allocated space currently being used in the database indicating how full the database is and whether it is approaching capacity.
PAGE 62
Server Resources
The database server resources available for monitoring are defined under /rdbms/server/, followed by the server resource class name, and then followed by the server name. The server resource class names are listed in Table 3-2. The server name varies, depending on your environment.
PAGE 63
Table 3-2 Interpreting Server Resource Classes (Continued)
Resource Name: /rdbms/server/resource_class/server_name
resource_class: status (footnote 1)
Description: This resource class describes the state of the database server supporting the database instance.
Condition: The values are (continued):
  CONGESTED: the database server is not
PAGE 64
Table 3-2 Interpreting Server Resource Classes (Continued)
Resource Name: /rdbms/server/resource_class/server_name
resource_class: disk_reads
Condition: an integer
Description: This resource class describes the number of disk reads on the database server since it started.
PAGE 65
Table 3-2 Interpreting Server Resource Classes (Continued)
Resource Name: /rdbms/server/resource_class/server_name
resource_class: peak_connects
Condition: an integer; if it keeps increasing over time, this could be an indication that a configuration parameter needs to be increased
Description: This resource class describes the greatest number of simultaneous connections made to the database server since the database server sta
resource_class: read_cache_hit_rate
PAGE 66
Table 3-2 Interpreting Server Resource Classes (Continued)
Resource Name: /rdbms/server/resource_class/server_name
resource_class: usage
Condition: a floating-point number expressed as a percentage; a value of 100 indicates that all available connections are
Description: This resource class describes the percentage of maximum allowed connections to the database server currently in use.
PAGE 67
Rules for Using the HA Database Monitor with ServiceGuard
The HA Database Monitor with ServiceGuard provides package failover if database servers fail or if the usage or number of connections exceeds specified levels.
PAGE 68
Setting Failover Parameters
When using the HA Database Monitor with ServiceGuard, the ServiceGuard parameters listed in Table 3-3 should be set so that a package failover will occur when access to the database resource fails.
Table 3-3 Setting Failover Parameters
Columns: Parameter, Recommended Setting, File Location, Notes
File Location: ServiceGuard package configuration file
Notes: Do not leave these parameters set to the default, NO_TIMEOUT.
PAGE 69
Table 3-3 Setting Failover Parameters (Continued)
Parameter: RESOURCE_POLLING_INTERVAL
Recommended Setting: number of seconds
File Location: ServiceGuard package configuration file
Notes: Specify how often ServiceGuard will check the resource, for example once every 30 seconds
Parameter: DEFERRED_RESOURCE_NAME
Recommended Setting: database resource
File Location: ServiceGuard package control
Notes: The name of the database resource that must be started
PAGE 70
Sample File Settings
The following is an example of how you might set up a ServiceGuard cluster, mycluster, with two nodes, nestle and whitman, to monitor the availability of a database, db_1, that is defined on a volume group, VG01, that can be accessed in exclusive mode by either nestle or whitman. Figure 3-2 shows the sample cluster setup.
PAGE 71
parameters needed to set up ServiceGuard. For a complete listing and explanation of all the parameters, refer to your Managing MC/ServiceGuard (B3936-90024) book. Table 3-4 lists the ServiceGuard package configuration file parameters that are used to configure the sample two-node cluster called mycluster that is depicted in Figure 3-2.
PAGE 72
Table 3-4 Some ServiceGuard Package Configuration File Parameters
Parameter                    Sample Data
PACKAGE_NAME                 SG_pkg1
NODE_NAME                    nestle
NODE_NAME                    whitman
RUN_SCRIPT_TIMEOUT           60
HALT_SCRIPT_TIMEOUT          60
RESOURCE_NAME                /rdbms/server/status/db_1
RESOURCE_POLLING_INTERVAL    30
RESOURCE_START               DEFERRED
RESOURCE_UP_VALUE            =UP
PAGE 73
Table 3-5 lists the ServiceGuard package control script parameters for the shared volume group, VG01.
PAGE 74
Monitoring Database Resources Rules for Using the HA Database Monitor with ServiceGuard su oracle -c $ORACLE_HOME/bin/svrmgrl < export ORACLE_SID= PFILE=$ORACLE_HOME/dbs/init.
PAGE 75
Creating Database Monitoring Requests
You can create monitor requests for each database that the database server supports. What specific values you use depends on your system’s configuration and available resources. For example, you may monitor database server status or database status to reflect the health of your system.
PAGE 76
Table 3-7 Sample Server Monitor Requests (Continued)
Columns: Resources to monitor, Notify, Value, Option
commits_per_sec: when value is > | some number depending on your expectation of database usage | INITIAL
connects: when value is >= | some number less than the allowed_max_connects value | INITIAL
disk_reads: at each interval | n/a | n/a
disk_reads_per_sec: when value is | some number depending on your expectation of
PAGE 77
PAGE 78
Table 3-7 Sample Server Monitor Requests (Continued)
Columns: Resources to monitor, Notify, Value, Option
logical_writes_per_sec: when value is | some number depending on your expectation of database usage | INITIAL
peak_connects (footnote 1): at each interval | n/a | n/a
read_cache_hit_rate: when value is | 80 | INITIAL
started: when value changes | n/a | n/a
uptime: at each interval | n/a | n/a
usage (footnote 1): when value is > | 80 | INITIAL
w
PAGE 79
A Dictionary File Command-Line Options
This appendix lists the command-line options available for the HA Monitors. Typically, these options may be added to the dictionary file entry that describes how to launch a particular monitor.
PAGE 80
HA Database Monitor Command-Line Options
Perform additional HA Database Monitor configuration by using the following command-line options. Specify one or more of the options listed below in the MONITOR statement of the dictionary file for each monitor, for example, within the file /etc/opt/resmon/dictionary/mibmond.dict.
-c community    The community string to use with the SNMP requests. The string public is used by default.
PAGE 81
HA Disk Monitor Command-Line Options
Use the following command-line options with your HA Disk Monitor. These options may be specified in the MONITOR statement of the dictionary file for the HA Disk Monitor, for example, within the file /etc/opt/resmon/dictionary/diskmond.dict.
-d
CAUTION    The /var/opt/resmon/log/diskmond.log file grows without bound. Only use this option if you really need a trace.
PAGE 82
PAGE 83
B Troubleshooting HA Monitors
This appendix lists some troubleshooting guidelines for working with HA Monitors. The HA Monitors that rely on various SNMP MIBs need the HP-UX SNMP subagents (and any other related “vendor specific” subagents) to be configured correctly and running before they can reliably report on the status of their resources.
PAGE 84
Table B-1 Event Monitoring Service Monitors (Continued)

Monitor                        MIB Type?   Associated Product   SNMP Required?   Documentation Reference
HA Cluster Monitor             Yes         EMS                  Yes              Using the Event Monitoring Service (HP Part Number B7612-90015)
HA Network Interface Monitor   Yes         EMS                  Yes              Using the Event Monitoring Service (HP Part Number B7612-90015)
HA System Resource Monitor     Yes         EMS                  Yes              Using the Event Monitoring Service (HP Part Number B7612-90015)

The sections in this appendix are
PAGE 85
General MIB Monitor Troubleshooting
Review the following troubleshooting hints to help ensure that your environment is set up correctly:
• Refer to the standard /var/adm/syslog/syslog.log file. It is always useful when troubleshooting system and ServiceGuard concerns.
• Certain log files may grow without bound. This may fill up file systems and cause unpredictable behavior in SNMP.
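One way to act on the hint about unbounded log files is to list the files in a log directory that exceed a size threshold. The sketch below is illustrative only: the helper name large_logs and the 1 MB threshold are assumptions, not part of the HA Monitors product. The directory in the example comment is the one this manual names for diskmond.log.

```shell
#!/bin/sh
# List regular files under a directory that are larger than a byte threshold.
# large_logs is a hypothetical helper name, not an EMS command.
large_logs() {
    dir=$1
    min_bytes=$2
    # -size +Nc matches files larger than N bytes (POSIX find).
    find "$dir" -type f -size +"${min_bytes}c" -print
}

# Example (assumed threshold): report logs over 1 MB (1048576 bytes).
# large_logs /var/opt/resmon/log 1048576
```

Running the helper periodically, for example from cron, gives early warning before a file system fills up.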
PAGE 86
Database Monitor Troubleshooting for Oracle
For Oracle MIB monitors, review the following troubleshooting hints.
• If MIB resource classes under rdbms continue to be unavailable, there might be a problem with the Oracle SNMP daemons. For Oracle, these are ora_naaagt, master_peer, dbsnmp, and tnslsnr. Try using the following commands to stop and restart Oracle SNMP:
su oracle -c "cd $ORACLE_HOME/bin; .
PAGE 87
• The HA Database Monitor relies on the proper installation and configuration of the Oracle Net8 product and processes (ora_naaagt, master_peer, dbsnmp, and tnslsnr). If you are able to connect to the database using the Oracle sqlplus utility, then the HA Database Monitor should also work. To verify Oracle Net8 connectivity, run the Oracle sqlplus utility from a client system that uses the same tnsnames.
PAGE 88
Database Monitor Troubleshooting for Informix
For Informix, the SNMP environment is established by issuing the following command, either before or after the database instance is started:
onsrvapd -rall -k0
onsrvapd is the Informix daemon, which launches the Informix SNMP subagent (onsnmp) if there is an Informix database instance running on the node. If there is no Informix instance running, onsnmp will not be started.
PAGE 89
Debug Logging of EMS HA Monitors
Debug logging for the EMS HA Monitors (mibmond, lanmond, fsmond, pkgmond, svcmond, clustermond, diskmond, rdbmsmond) can be enabled by adding the options "-d -l" to the monitor invocation string in the monitor’s dictionary file and then restarting the monitor. See also the manual pages for the monitors.
PAGE 90
When you change the start-up string of an EMS monitor in the dictionary file, the name of its persistence file in /etc/opt/resmon/persistence also changes. The persistence file name is obtained by running a hash algorithm on the monitor start-up string stored in the dictionary file.
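This manual does not say which hash algorithm EMS uses, so the following sketch only illustrates the consequence: any change to the start-up string yields a different key. It uses the POSIX cksum utility purely as a stand-in. The helper name startup_key and the example start-up strings are hypothetical, and the output will not match real persistence file names.

```shell
#!/bin/sh
# Illustration only: cksum stands in for the unspecified EMS hash algorithm.
# startup_key is a hypothetical helper; it does NOT compute real
# persistence file names.
startup_key() {
    # Print the CRC of the given string (first field of cksum output).
    printf '%s' "$1" | cksum | awk '{ print $1 }'
}

# Hypothetical start-up strings, before and after adding debug options.
before=$(startup_key "/path/to/monitor")
after=$(startup_key "/path/to/monitor -d -l")

if [ "$before" != "$after" ]; then
    echo "start-up string changed: key differs ($before -> $after)"
fi
```

The same reasoning applies in reverse: restoring the original start-up string restores the original key.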
PAGE 91
Troubleshooting 0 Byte snmpd.conf Problem
If the size of /etc/SnmpAgent.d/snmpd.conf is zero, the SNMP daemon will not work, and the MIB monitors (mibmond, pkgmond, clustermond, lanmond) will not behave as expected. This may result in loss of persistence requests. To troubleshoot this problem, check the size of /etc/SnmpAgent.d/snmpd.conf with the following command:
ls -rlt /etc/SnmpAgent.d/snmpd.
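A scripted version of the same size check might look like the following sketch. It relies on the shell's -s test, which is true only when a file exists and is larger than zero bytes. The helper name conf_state is an assumption and is not part of EMS or the SNMP agent.

```shell
#!/bin/sh
# Report whether a configuration file is missing, empty (0 bytes), or non-empty.
# conf_state is a hypothetical helper name, not part of EMS or the SNMP agent.
conf_state() {
    if [ ! -e "$1" ]; then
        echo "missing"
    elif [ -s "$1" ]; then
        echo "ok"        # file exists and has a size greater than zero
    else
        echo "empty"     # file exists but is 0 bytes
    fi
}

# Check the SNMP agent configuration file named in this section.
echo "snmpd.conf: $(conf_state /etc/SnmpAgent.d/snmpd.conf)"
```

A result of "empty" corresponds to the 0 byte condition described above.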
PAGE 92
Steps to Obtain EMS Data to Reproduce an EMS Problem
If you are about to reproduce an EMS problem, first collect a full set of EMS data, including all log files and configuration information. Here is an example for a diskmond problem.
1. Enable debugging for EMS by issuing the command:
# touch /etc/opt/resmon/debug
2. Enable diskmond logging and debugging.
# vi /etc/opt/resmon/dictionary/diskmond.
PAGE 93
4. Set up everything for reproduction.
Back up the /etc/opt/resmon/ tree:
# mkdir -p /tmp/RESMON/before
# cp -Rp /etc/opt/resmon/* /tmp/RESMON/before
Save logfiles (if needed):
# cd /etc/opt/resmon/log
# tar cvf /tmp/oldlogs.tar *
# cd /var/opt/resmon/log
# tar rvf /tmp/oldlogs.tar diskmond.log
Clear logfiles:
# for i in /etc/opt/resmon/log/*log* /var/opt/resmon/log/diskmond.log
> do
> rm $i
> done
5.
PAGE 94
PAGE 95
Glossary

A-H
alert    An event. A message sent to warn a user or application when certain conditions are met.
client    The application that creates or cancels requests to monitor particular resources. The consumer of a resource status message. A user of the Resource Monitor framework.

I-K
ITO    HP OpenView IT/Operations, formerly known as Operations Center. It is a software application that provides central operations and problem management for a multi-vendor distributed system.
PAGE 96
MIB II (MIB2)
Information” (SMI) format. This grammar concisely defines the objects being managed, the data types these objects take, descriptions of how the objects can be used, whether the objects are read-only or read-write, and identifiers for the objects.
PV links    A method of LVM configuration that allows you to provide redundant SCSI interfaces and buses to disk arrays, thereby protecting against single points of failure in SCSI cards and cables.
PAGE 97
resource and send event notifications if appropriate. A monitor checks resources on the local system. The resource monitor maps the physical resource into a standard interface understood by EMS.

S-T
SNMP (Simple Network Management Protocol)    Standard protocol for network-based retrieval of information about system resources.
state    The current value of a resource (UP or DOWN).
PAGE 98
PAGE 99
Index A allocated database resource, 59, 60 allocated database resource, 59 allowed_max_connects server resource, 62, 63 alternate links creating volume groups with, 40 C calculating pv_summary, 38 cluster, 36 cluster monitor, 58 example requests, 75 cluster status, 62 ClusterView, 58 commits server resource, 62, 63 commits_per_sec server resource, 62, 63 configuring EMS with MC/ServiceGuard, 36 connects server resource, 62, 63 creating disk monitoring requests, 44 creating logical volumes, 41 creating vol
PAGE 100
Index view information, 22 monitor request, 15, 25 example disk monitor request, 47 monitoring request cluster status, 62 for clusters, 58 for disk monitor, 44 for lock disks, 54 for mirror disks, 50 for root volumes, 54 node status, 62 N node status, 62 notification, 16 when packages fail over, 49 O opcmsg, 12 P package dependencies, 29, 36 package failover, 36 peak_connects server resource, 62, 65 physical volume status, 31 physical volume summary, 29 polling interval cluster status, 59,
PAGE 101
Index V volume group creating, 40 creating for a cluster, 40 volume groups active, 33 excluding, 55 W wildcard, 32, 34, 35 write_cache_hit_rate server resource, 66
PAGE 102