Using Serviceguard Extension for RAC, 4th Edition, February 2007

Maintenance and Troubleshooting

Monitoring Hardware

Good standard practice in handling a high availability system includes careful fault monitoring so as to prevent failures if possible, or at least to react to them swiftly when they occur. The following should be monitored for errors or warnings of all kinds:

• Disks

• CPUs

• Memory

• LAN cards

• Power sources

• All cables

• Disk interface cards

Some monitoring can be done through simple physical inspection, but for the most comprehensive monitoring, you should examine the system log file (/var/adm/syslog/syslog.log) periodically for reports on all configured HA devices. Errors relating to a device indicate that it needs maintenance.
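The periodic log review described above can be partly automated. The following is a minimal sketch in Python, not a Serviceguard or EMS tool: the function names, the severity keywords, and the device names in the usage note are all assumptions made for illustration. Adjust them to the messages your hardware actually logs.

```python
import re
from collections import Counter

# Assumed severity keywords; real syslog messages vary by driver and subsystem.
SEVERITY_PATTERN = re.compile(r"\b(error|warning|fail(?:ed|ure)?)\b", re.IGNORECASE)


def find_suspect_lines(lines, devices):
    """Return (device, line) pairs for lines that contain a severity
    keyword and mention one of the given device names."""
    hits = []
    for line in lines:
        if not SEVERITY_PATTERN.search(line):
            continue
        lowered = line.lower()
        for device in devices:
            if device.lower() in lowered:
                hits.append((device, line.rstrip("\n")))
    return hits


def summarize(hits):
    """Count suspect lines per device."""
    return Counter(device for device, _ in hits)


def scan_log(path, devices):
    """Scan a log file on disk and return per-device counts of suspect lines."""
    with open(path, errors="replace") as log:
        return summarize(find_suspect_lines(log, devices))
```

For example, calling scan_log("/var/adm/syslog/syslog.log", ["lan0", "c0t6d0"]) would return a count of suspect lines for each of those two hypothetical device names. A simple substring match like this can miss or over-report messages, so treat the result as a prompt for closer inspection rather than a diagnosis.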

Using Event Monitoring Service

Event Monitoring Service (EMS) allows you to configure monitors of specific devices and system resources. You can direct alerts to an administrative workstation, where operators can be notified of any further action required if a problem occurs. For example, you could configure a disk monitor to report when a mirror is lost from a mirrored volume group being used in a non-RAC package. Refer to the manual Using the Event Monitoring Service (B7609-90022) for additional information.

Using EMS Hardware Monitors

A set of hardware monitors is available for monitoring and reporting on memory, CPU, and many other system values. Refer to the EMS Hardware Monitors User’s Guide (B6191-90015) for additional information.