Using Serviceguard Extension for RAC, 4th Edition, February 2007
Maintenance and Troubleshooting
Monitoring Hardware
Good standard practice in handling a high availability system includes
careful fault monitoring so as to prevent failures if possible or at least to
react to them swiftly when they occur. The following should be monitored
for errors or warnings of all kinds:
• Disks
• CPUs
• Memory
• LAN cards
• Power sources
• All cables
• Disk interface cards
Some monitoring can be done through simple physical inspection, but for
the most comprehensive monitoring, you should examine the system log
file (/var/adm/syslog/syslog.log) periodically for reports on all
configured HA devices. Errors relating to a device indicate that the
device needs maintenance.
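The periodic log check described above can be partly automated. The following is a minimal sketch, not an HP-supplied tool: it reads the system log file named above and prints any line containing a keyword that suggests a problem. The keyword list and the function names find_suspect_lines and scan_log are assumptions made for illustration; adjust the pattern to match the messages your devices actually write.

```python
import re
from pathlib import Path

# Log file location as given in the text above.
SYSLOG_PATH = Path("/var/adm/syslog/syslog.log")

# Assumption: illustrative keywords only, not an official list of
# messages written by HA devices.
PATTERN = re.compile(r"\b(error|warning|fail(ed|ure)?)\b", re.IGNORECASE)


def find_suspect_lines(lines):
    """Return (line_number, text) pairs for lines that match PATTERN."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if PATTERN.search(line)
    ]


def scan_log(path=SYSLOG_PATH):
    """Scan the log file at `path` and return its suspect lines."""
    # errors="replace" keeps the scan going if the log holds odd bytes.
    with open(path, errors="replace") as log_file:
        return find_suspect_lines(log_file)


if __name__ == "__main__":
    if SYSLOG_PATH.exists():
        for number, text in scan_log():
            print(f"{number}: {text}")
    else:
        print(f"{SYSLOG_PATH} not found on this system")
```

Run on a schedule, a script of this kind only narrows down which log entries to read; it does not replace physical inspection or the monitors described below.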
Using Event Monitoring Service
Event Monitoring Service (EMS) allows you to configure monitors of
specific devices and system resources. You can direct alerts to an
administrative workstation, where operators can be notified of a problem
and of any further action required. For example, you could configure a
disk monitor to report when a mirror is lost from a mirrored volume group
being used in a non-RAC package. Refer to the manual Using the Event
Monitoring Service (B7609-90022) for additional information.
Using EMS Hardware Monitors
A set of hardware monitors is available for monitoring and reporting on
memory, CPU, and many other system values. Refer to the EMS
Hardware Monitors User’s Guide (B6191-90015) for additional
information.