Managing Serviceguard 11th Edition, Version A.11.16, Second Printing June 2004

Troubleshooting Your Cluster
Monitoring Hardware
Good practice in managing a high availability system includes careful
fault monitoring, so that you can prevent failures where possible, or at
least react to them swiftly when they occur. The following should be
monitored for errors or warnings of all kinds:
• Disks
• CPUs
• Memory
• LAN cards
• Power sources
• All cables
• Disk interface cards
Some monitoring can be done through simple physical inspection, but for
the most comprehensive monitoring, you should examine the system log
file (/var/adm/syslog/syslog.log) periodically for reports on all configured
HA devices. The presence of errors relating to a device will show the need
for maintenance.
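The periodic log check described above could be partly automated. The
following is a minimal sketch, not a Serviceguard or HP-UX tool: it scans
syslog lines for severity words that appear next to a device keyword. The
log path is the one given in the text; the keyword patterns and the
function names scan_lines and scan_file are illustrative assumptions, and
real driver messages vary by hardware and release.

```python
import re
from collections import Counter

# Path taken from the text above.
SYSLOG_PATH = "/var/adm/syslog/syslog.log"

# Assumption: illustrative patterns only. Actual message wording depends
# on the driver and OS release, so tune these for your own systems.
SEVERITY = re.compile(r"\b(error|warning|fail(ed|ure)?)\b", re.IGNORECASE)
DEVICES = {
    "disk": re.compile(r"\b(disk|scsi|lvm)\b", re.IGNORECASE),
    "lan": re.compile(r"\blan\d*\b", re.IGNORECASE),
    "memory": re.compile(r"\bmemory\b", re.IGNORECASE),
    "cpu": re.compile(r"\b(cpu|processor)\b", re.IGNORECASE),
    "power": re.compile(r"\bpower\b", re.IGNORECASE),
}


def scan_lines(lines):
    """Return (count per device class, list of matching log lines)."""
    counts = Counter()
    hits = []
    for line in lines:
        if not SEVERITY.search(line):
            continue  # skip lines with no error or warning wording
        for name, pattern in DEVICES.items():
            if pattern.search(line):
                counts[name] += 1
                hits.append(line.rstrip("\n"))
                break  # count each line once, under the first match
    return counts, hits


def scan_file(path=SYSLOG_PATH):
    """Scan a syslog file; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as log:
        return scan_lines(log)
```

Calling scan_file() on a cluster node returns the per-device counts and the
matching lines, which an operator could review or mail from a scheduled job.
A keyword scan of this kind only narrows down what to read; it does not
replace inspecting the log itself.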
Using Event Monitoring Service
Event Monitoring Service (EMS) allows you to configure monitors of
specific devices and system resources. You can direct alerts to an
administrative workstation, where operators are notified and can take
further action if a problem occurs. For example, you could configure a
disk monitor to report when a mirror is lost from a mirrored volume group
being used in the cluster.
Refer to the manual Using HA Monitors for additional information.
Using EMS Hardware Monitors
A set of hardware monitors is available for monitoring and reporting on
memory, CPU, and many other system values. Some of these monitors
are supplied with specific hardware products.