Managing Serviceguard A.11.20, March 2013

NICs
Power sources
All cables
Disk interface cards
Some monitoring can be done through simple physical inspection, but for the most comprehensive
monitoring, you should examine the system log file (/var/adm/syslog/syslog.log) periodically
for reports on all configured HA devices. The presence of errors relating to a device will show the
need for maintenance.
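The periodic log check described above can be partly automated. The following is a minimal sketch in Python, not a Serviceguard tool: the log path comes from the text, but the keyword list and the function names (find_suspect_lines, scan_file) are assumptions made for illustration and should be adapted to the messages your HA devices actually log.

```python
import re
from pathlib import Path

# Log path as given in the text above.
SYSLOG = Path("/var/adm/syslog/syslog.log")

# Assumed keywords only; tune these to the devices you monitor.
SUSPECT = re.compile(r"error|fail|lost|timeout", re.IGNORECASE)


def find_suspect_lines(lines, pattern=SUSPECT):
    """Return the lines that match the pattern, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


def scan_file(path=SYSLOG, pattern=SUSPECT):
    """Scan a log file and return its suspect lines."""
    # errors="replace" keeps the scan going if the log holds odd bytes.
    with Path(path).open(errors="replace") as handle:
        return find_suspect_lines(handle, pattern)
```

Calling scan_file() on a cluster node and printing the result, for example from a scheduled job, gives a quick list of entries worth a closer look. A simple keyword match will produce false positives, so treat the output as a prompt for inspection rather than a diagnosis.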
When the proper redundancy has been configured, failures can occur with no external symptoms.
Proper monitoring is important. For example, if a Fibre Channel switch in a redundant mass storage
configuration fails, LVM will automatically fail over to the alternate path through another Fibre
Channel switch. Without monitoring, however, you may not know that the failure has occurred,
since the applications are still running normally. But at this point, there is no redundant path if
another failover occurs, so the mass storage configuration is vulnerable.
Using System Fault Management Service
System Fault Management (SFM) monitors the health of HP servers running HP-UX. SFM retrieves
information about a system’s hardware devices, such as CPU, memory, power supply, and cooling
devices. SFM operates within the Web-Based Enterprise Management (WBEM) environment. WBEM
is an industry-wide, standards-based initiative to aid the management of large-scale systems. SFM
provides the same features and benefits as those found in the EMS Hardware Monitors.
These system devices can be monitored in Serviceguard by configuring generic resources. See
“Using the Generic Resources Monitoring Service” (page 58).
See the System Fault Management Administrator Guide at
http://www.hp.com/go/hpux-diagnostics-docs.
Using Event Monitoring Service
Event Monitoring Service (EMS) allows you to configure monitors of specific devices and system
resources. You can direct alerts to an administrative workstation where operators can be notified
of further action in case of a problem. For example, you could configure a disk monitor to report
when a mirror was lost from a mirrored volume group being used in the cluster.
See the manual Using High Availability Monitors at the address given in the preface to this manual.
Using EMS (Event Monitoring Service) Hardware Monitors
A set of hardware monitors is available for monitoring and reporting on memory, CPU, and many
other system values. Some of these monitors are supplied with specific hardware products.
Hardware Monitors and Persistence Requests
When hardware monitors are disabled using the monconfig tool, associated hardware monitor
persistent requests are removed from the persistence files. When hardware monitoring is re-enabled,
the monitor requests that were initialized using the monconfig tool are re-created.
However, hardware monitor requests created using Serviceguard Manager, or established when
Serviceguard is started, are not re-created. These requests are related to the psmmon hardware
monitor.
To re-create the persistence monitor requests, halt Serviceguard on the node, and then restart it.
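The halt-and-restart sequence can be sketched as a small Python helper that only builds the command lines, so they can be reviewed before anything is run. This is an illustration under assumptions: it uses the Serviceguard commands cmhaltnode and cmrunnode, the function name restart_plan and the node name node1 are hypothetical, and whether to pass -f (which halts the node even when packages are running on it) is a decision for your site.

```python
def restart_plan(node, force=False):
    """Build the commands to halt and restart Serviceguard on one node.

    Returns a list of argument lists; nothing is executed here.
    """
    halt = ["cmhaltnode"]
    if force:
        # -f halts the node even if packages are running on it.
        halt.append("-f")
    halt.append(node)
    start = ["cmrunnode", node]
    return [halt, start]


# Print the plan for a hypothetical node so it can be reviewed first.
for command in restart_plan("node1"):
    print(" ".join(command))
```

Building the plan separately from running it keeps the sketch safe to try on any machine. On a real cluster node, each argument list could be passed to subprocess.run with check=True once the plan has been reviewed.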