User's guide
Using the Service Processor
Service Processor System Monitoring – Surveillance
Surveillance is a function in which the service processor monitors the system, and the
system monitors the service processor. This monitoring is accomplished by periodic
signals called heartbeats.
Surveillance is available during two phases:
• System firmware bringup (automatic)
• Operating system runtime (optional)
System Firmware Surveillance
System firmware surveillance is automatically enabled during system power-on. It cannot
be disabled by the user.
If the service processor detects no heartbeats for 7 minutes during system IPL, it cycles
the system power to attempt a reboot. The maximum number of retries is set from the
service processor menus. If the fail condition persists, the service processor leaves the
machine powered on, logs an error, and displays menus to the user. If Call-out is enabled,
the service processor calls to report the failure and displays the operating-system
surveillance failure code on the operator panel.
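The retry behavior described above can be pictured as a short simulation. This is a sketch under stated assumptions, not the service processor's firmware: the function name `simulate_firmware_surveillance` is hypothetical, each IPL attempt is reduced to the longest heartbeat silence (in minutes) observed during it, and "maximum number of retries" is read as the number of power cycles allowed after the first IPL. Only the 7-minute threshold comes from this guide.

```python
from typing import List, Sequence, Tuple

IPL_TIMEOUT_MIN = 7  # from the guide: no heartbeat for 7 minutes during IPL


def simulate_firmware_surveillance(
    silence_per_ipl: Sequence[float],
    max_retries: int,
    call_out_enabled: bool,
) -> Tuple[bool, List[str]]:
    """Replay IPL attempts (hypothetical model).

    Each entry in silence_per_ipl is the longest heartbeat silence, in minutes,
    seen during one IPL attempt. Returns (booted, actions taken).
    """
    actions: List[str] = []
    for attempt, silence in enumerate(silence_per_ipl):
        if silence < IPL_TIMEOUT_MIN:
            actions.append("IPL completes")
            return True, actions
        # Assumption: attempt 0 is the first IPL, so up to max_retries power cycles follow it.
        if attempt < max_retries:
            actions.append("cycle power and retry IPL")
            continue
        # The fail condition persists after the configured number of retries.
        actions += ["leave machine powered on", "log error", "display service processor menus"]
        if call_out_enabled:
            actions += ["call out to report failure", "show failure code on operator panel"]
        return False, actions
    raise ValueError("silence_per_ipl ended before the IPL succeeded or retries ran out")
```

For example, with the retry limit set to 2, three silent IPL attempts in a row use up both power cycles and end in the failure actions.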
Operating System Surveillance
Operating system surveillance lets the service processor detect hang conditions, as well as
hardware or software failures, while the operating system is running. It also lets the
operating system detect a service processor failure when no return heartbeat arrives.
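The reverse direction, in which the operating system detects a failed service processor, can be sketched the same way. The guide does not say how long the operating system waits for the return heartbeat, so `reply_timeout` and the function `first_unanswered_heartbeat` are hypothetical, and times are plain numbers in any consistent unit.

```python
from typing import Optional, Sequence


def first_unanswered_heartbeat(
    sent: Sequence[float],
    returned: Sequence[float],
    reply_timeout: float,
    now: float,
) -> Optional[float]:
    """Return the send time of the first heartbeat whose reply is overdue, else None.

    sent: times the operating system sent heartbeats.
    returned: times return heartbeats arrived from the service processor.
    reply_timeout: assumed maximum wait for a return heartbeat (hypothetical).
    """
    replies = sorted(returned)
    i = 0
    for s in sorted(sent):
        if s + reply_timeout > now:
            break  # this reply is not overdue yet, and neither are later ones
        # Skip replies that arrived before this heartbeat was sent.
        while i < len(replies) and replies[i] < s:
            i += 1
        if i == len(replies) or replies[i] - s > reply_timeout:
            return s  # no return heartbeat in time: treat the service processor as failed
        i += 1  # this reply answers heartbeat s
    return None
```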
Operating system surveillance is not enabled by default, allowing you to run operating
systems that do not support this service processor option.
You can use either the Service Processor Menus or the AIX Diagnostic Service Aids to enable
or disable operating system surveillance.
For operating system surveillance to work correctly, you must set three parameters:
• Surveillance enable/disable
• Surveillance interval
The maximum time the service processor should wait for a heartbeat from the operating
system before timeout.
• Surveillance delay
The length of time to wait from the time the operating system is started to when the first
heartbeat is expected.
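One way to picture how the interval and delay interact is the check below. It assumes, since the guide does not spell this out, that the interval timer starts when the surveillance delay expires and restarts at every heartbeat. `SurveillanceSettings`, `hang_detected_at`, and the use of minutes are illustrative choices, not names from the service processor or AIX.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SurveillanceSettings:
    enabled: bool
    interval_min: float  # longest wait for a heartbeat before timeout
    delay_min: float     # wait after OS start before the first heartbeat is expected


def hang_detected_at(
    settings: SurveillanceSettings,
    os_start: float,
    heartbeats: Iterable[float],
    now: float,
) -> Optional[float]:
    """Return the time a timeout would fire, or None if none has fired by `now`."""
    if not settings.enabled:
        return None  # surveillance disabled: never times out
    # Assumption: the interval timer starts once the surveillance delay has elapsed.
    last = os_start + settings.delay_min
    for beat in sorted(heartbeats):
        if beat > now:
            break  # ignore heartbeats after the moment being evaluated
        deadline = last + settings.interval_min
        if beat > deadline:
            return deadline  # the gap was too long; the timeout fired before this beat
        last = max(last, beat)  # a heartbeat restarts the interval timer
    deadline = last + settings.interval_min
    return deadline if now > deadline else None
```

With a 10-minute delay and a 5-minute interval, for instance, an operating system that sends no heartbeat at all would be treated as hung 15 minutes after it starts.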
Surveillance does not take effect until the next time the operating system is started after the
parameters have been set.
From Service Aids, you can also start surveillance immediately: in addition to the three
parameters above, a fourth option selects immediate surveillance, so the system does not
have to be rebooted.
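The difference between settings that wait for the next operating system start and the immediate option in Service Aids can be modeled as a pending/active pair. `SurveillanceConfig` and its methods are hypothetical, and the zero interval and delay used for the disabled default are placeholders rather than documented values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SurveillanceSettings:
    enabled: bool
    interval_min: int
    delay_min: int


class SurveillanceConfig:
    """Hypothetical model of when new surveillance settings become active."""

    def __init__(self) -> None:
        # Disabled by default; the zero interval and delay are placeholders.
        self.active = SurveillanceSettings(enabled=False, interval_min=0, delay_min=0)
        self.pending: Optional[SurveillanceSettings] = None

    def set_parameters(self, new: SurveillanceSettings, immediate: bool = False) -> None:
        if immediate:
            # Immediate surveillance: applies now, no reboot needed.
            self.active = new
            self.pending = None
        else:
            # Otherwise the settings wait for the next operating system start.
            self.pending = new

    def on_os_start(self) -> None:
        if self.pending is not None:
            self.active = self.pending
            self.pending = None
```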
If operating system surveillance is enabled (and system firmware has passed control to the
operating system), and the service processor does not detect any heartbeats from the
operating system, the service processor assumes the system is hung and takes action
according to the reboot/restart policy settings. See “Service Processor Reboot/Restart
Recovery” on page 3-25.
If surveillance is selected from the service processor menus, which are available only at
bootup, it is enabled as soon as the system boots. When surveillance is selected from
Service Aids, starting it immediately is optional.