Managing Serviceguard Nineteenth Edition, Reprinted June 2011

since the applications are still running normally. But at this point, there is no redundant path if
another failover occurs, so the mass storage configuration is vulnerable.
Using Event Monitoring Service
Event Monitoring Service (EMS) allows you to configure monitors of specific devices and system
resources. You can direct alerts to an administrative workstation, where operators can be notified
of a problem and of any further action required. For example, you could configure a disk monitor
to report when a mirror is lost from a mirrored volume group being used in the cluster.
See the manual Using High Availability Monitors at the address given in the preface to this manual.
Using EMS (Event Monitoring Service) Hardware Monitors
A set of hardware monitors is available for monitoring and reporting on memory, CPU, and many
other system values. Some of these monitors are supplied with specific hardware products.
Hardware Monitors and Persistence Requests
When hardware monitors are disabled using the monconfig tool, associated hardware monitor
persistent requests are removed from the persistence files. When hardware monitoring is re-enabled,
the monitor requests that were initialized using the monconfig tool are re-created.
However, hardware monitor requests created using Serviceguard Manager, or established when
Serviceguard is started, are not re-created. These requests are related to the psmmon hardware
monitor.
To re-create the persistence monitor requests, halt Serviceguard on the node, and then restart it.
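The halt-and-restart step can be sketched as a small shell function. This is a minimal sketch, not a definitive procedure: it assumes the standard Serviceguard commands cmhaltnode and cmrunnode, the function name restart_serviceguard_node and the DRY_RUN variable are hypothetical, and ftsys9 in the usage line is only an example node name. With DRY_RUN left at its default of 1, the function only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: halt and then restart Serviceguard on one node so that the
# persistence monitor requests are re-created.
# Assumptions: restart_serviceguard_node and DRY_RUN are illustrative
# names, not part of Serviceguard. DRY_RUN=1 (the default) prints the
# commands instead of running them.

restart_serviceguard_node() {
    node="$1"
    if [ -z "$node" ]; then
        echo "usage: restart_serviceguard_node <node_name>" >&2
        return 2
    fi
    # cmhaltnode -f halts the node even if packages are running on it;
    # cmrunnode then starts the cluster daemon on that node again.
    for cmd in "cmhaltnode -f $node" "cmrunnode $node"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"
        else
            $cmd || return 1
        fi
    done
}
```

For example, restart_serviceguard_node ftsys9 prints the two commands for review; setting DRY_RUN=0 runs them. Because the -f option halts the node while packages are still running on it, check where those packages will run before using it on a production node.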
Using HP ISEE (HP Instant Support Enterprise Edition)
In addition to messages reporting actual device failure, the logs may accumulate messages of
lesser severity which, over time, can indicate that a failure may happen soon. One product that
provides a degree of automation in monitoring is called HP ISEE, which gathers information from
the status queues of a monitored system to see what errors are accumulating. This tool will report
failures and will also predict failures based on statistics for devices that are experiencing specific
non-fatal errors over time. In a Serviceguard cluster, HP ISEE should be run on all nodes.
HP ISEE also reports error conditions directly to an HP Response Center, alerting support personnel
to the potential problem. HP ISEE is available through various support contracts. For more
information, contact your HP representative.
Replacing Disks
The procedure for replacing a faulty disk mechanism depends on the type of disk configuration
you are using. Separate descriptions are provided for replacing an array mechanism and a disk
in a high availability enclosure.
For more information, see the section Replacing a Bad Disk in the Logical Volume Management
volume of the HP-UX System Administrator’s Guide, at http://www.hp.com/go/hpux-core-docs.
Replacing a Faulty Array Mechanism
With any HA disk array configured in RAID 1 or RAID 5, refer to the array’s documentation for
instructions on how to replace a faulty mechanism. After the replacement, the device itself
automatically rebuilds the missing data on the new disk. No LVM or VxVM activity is needed. This
process is known as hot swapping the disk.