Managing Serviceguard 11th Edition, Version A.11.16, Second Printing June 2004

Troubleshooting Your Cluster

Monitoring Hardware

Good standard practice in handling a high availability system includes

careful fault monitoring so as to prevent failures if possible or at least to

react to them swiftly when they occur. The following should be monitored

for errors or warnings of all kinds:

• Disks

• CPUs

• Memory

• LAN cards

• Power sources

• All cables

• Disk interface cards

Some monitoring can be done through simple physical inspection, but for

the most comprehensive monitoring, you should examine the system log

file (/var/adm/syslog/syslog.log) periodically for reports on all configured

HA devices. Errors reported for a device indicate that it needs

maintenance.
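
The periodic log check described above can be partly automated. The following is a minimal sketch in Python, not a supported tool: it assumes that messages of interest contain words such as "error", "warning", or "failure", and it groups matches by a hypothetical list of device keywords. Both the severity pattern and the keyword list are illustrative assumptions and should be adjusted to the messages your hardware actually logs. Only the log file path is taken from the text above.

```python
import re
from collections import Counter

# Assumed severity words; real syslog messages vary by driver and product.
SEVERITY = re.compile(r"\b(error|warning|fail(?:ed|ure)?)\b", re.IGNORECASE)

# Hypothetical device keywords used only to group matches for the report.
DEVICE_HINTS = ("disk", "scsi", "lan", "memory", "cpu", "power")


def scan_syslog(lines):
    """Return (matching lines, count of matching lines per device keyword)."""
    hits = []
    counts = Counter()
    for line in lines:
        if not SEVERITY.search(line):
            continue  # skip lines with no error/warning wording
        hits.append(line.rstrip("\n"))
        lowered = line.lower()
        for hint in DEVICE_HINTS:
            if hint in lowered:
                counts[hint] += 1  # counted once per line, per keyword
    return hits, counts


if __name__ == "__main__":
    # Path as given in this section; reading it may require root privileges.
    with open("/var/adm/syslog/syslog.log", errors="replace") as log:
        hits, counts = scan_syslog(log)
    for keyword, total in counts.most_common():
        print(f"{keyword}: {total} line(s) with errors or warnings")
    for line in hits:
        print(line)
```

A script like this could be run from cron on each node. It only complements physical inspection and EMS monitoring, and a clean report does not prove that the hardware is healthy.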

Using Event Monitoring Service

Event Monitoring Service (EMS) allows you to configure monitors of

specific devices and system resources. You can direct alerts to an

administrative workstation, where operators can be notified and take

further action if a problem occurs. For example, you could configure a

disk monitor to report when a mirror is lost from a mirrored volume

group being used in the cluster.

Refer to the manual Using HA Monitors for additional information.

Using EMS Hardware Monitors

A set of hardware monitors is available for monitoring and reporting on

memory, CPU, and many other system values. Some of these monitors

are supplied with specific hardware products.