Specifications

Service Processor System Monitoring - Surveillance
Surveillance is a function in which the service processor monitors the system, and the
system monitors the service processor. This monitoring is accomplished by periodic
samplings called heartbeats.
Surveillance is available during two phases:
• System firmware bring-up (automatic)
• Operating system run time (optional)
System Firmware Surveillance
System firmware surveillance is automatically enabled during system power-on. It
cannot be disabled by the user, and the surveillance interval and surveillance delay
cannot be changed by the user.
If the service processor detects no heartbeats for a set time period during system IPL,
it cycles the system power to attempt a reboot. The maximum number of retries is set
from the service processor menus. If the failure condition persists, the service processor
leaves the machine powered on, logs an error, and displays menus to the user. If
call-out is enabled, the service processor calls to report the failure and displays the
operating system surveillance failure code on the operator panel.
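The retry behavior described above can be sketched as a small simulation. This is an illustrative model only, not service processor firmware: the function name, the action strings, and the assumption that each retry corresponds to exactly one power cycle are hypothetical and are not taken from this guide.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class IplSurveillanceResult:
    booted: bool                 # True if a heartbeat was seen during some IPL attempt
    power_cycles: int            # number of power cycles performed as retries
    actions: List[str] = field(default_factory=list)


def supervise_ipl(
    heartbeat_seen: Callable[[int], bool],
    max_retries: int,
    call_out_enabled: bool,
) -> IplSurveillanceResult:
    """Model firmware surveillance during IPL.

    heartbeat_seen(attempt) is a hypothetical probe that returns True if a
    heartbeat arrived within the set time period during that IPL attempt.
    Assumption: one retry equals one power cycle.
    """
    actions: List[str] = []
    power_cycles = 0
    attempt = 0
    while True:
        if heartbeat_seen(attempt):
            return IplSurveillanceResult(True, power_cycles, actions)
        if power_cycles >= max_retries:
            break  # retries exhausted, the failure condition persists
        actions.append("cycle_power")
        power_cycles += 1
        attempt += 1

    # Failure persists: machine stays powered on, error is logged, menus shown.
    actions += ["leave_powered_on", "log_error", "display_menus"]
    if call_out_enabled:
        actions += ["call_out", "show_failure_code_on_operator_panel"]
    return IplSurveillanceResult(False, power_cycles, actions)
```

For example, `supervise_ipl(lambda attempt: False, max_retries=2, call_out_enabled=True)` models a system that never produces a heartbeat: two power cycles are attempted before the failure actions are recorded.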
Operating System Surveillance
Operating system surveillance provides the service processor with a means to detect
hang conditions, as well as hardware or software failures, while the operating system is
running. It also provides the operating system with a means to detect a service
processor failure, which is indicated by the lack of a return heartbeat.
Operating system surveillance is not enabled by default, allowing you to run operating
systems that do not support this service processor option.
You can use either the service processor menus or the AIX diagnostic service aids to
enable or disable operating system surveillance.
For operating system surveillance to work correctly, you must set the following
parameters:
• Surveillance enable/disable
• Surveillance interval
  The maximum time the service processor waits between heartbeats from the
  operating system before reporting a surveillance failure.
• Surveillance delay
  The maximum time the service processor waits for the first heartbeat from the
  operating system, after the operating system has been started, before reporting a
  surveillance failure.
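To illustrate how the surveillance delay and surveillance interval relate, the following sketch computes when a surveillance failure would be reported for a given sequence of heartbeat times. It is a simplified model under stated assumptions: the function name and arguments are hypothetical, all times use the same arbitrary unit, and a heartbeat arriving exactly at a deadline is treated as on time.

```python
from typing import Optional, Sequence


def first_surveillance_failure(
    os_start: float,
    heartbeats: Sequence[float],
    delay: float,
    interval: float,
    observed_until: float,
) -> Optional[float]:
    """Return the time at which a surveillance failure would be reported,
    or None if no deadline is missed up to observed_until.

    delay    : maximum wait for the first heartbeat after the OS is started
    interval : maximum wait between subsequent heartbeats
    """
    deadline = os_start + delay          # the first heartbeat is governed by the delay
    for t in sorted(heartbeats):
        if t > deadline:
            return deadline              # heartbeat arrived too late
        deadline = t + interval          # later heartbeats are governed by the interval
    # No more heartbeats: a failure occurs only if the deadline has already passed.
    return deadline if observed_until > deadline else None
```

With `os_start=0`, `delay=10`, and `interval=2`, heartbeats at times 5, 6, and 7.5 keep surveillance satisfied until time 9.5. If no further heartbeat arrives, the model reports a failure at 9.5.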
Surveillance does not take effect until the next time the operating system is started after
the parameters have been set.
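The deferred activation described above can be pictured as a pair of pending and active settings. This is a minimal sketch with hypothetical class and field names. It does not describe how the service processor actually stores its configuration, and the numeric defaults are placeholders.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SurveillanceSettings:
    enabled: bool = False   # operating system surveillance is not enabled by default
    interval: int = 0       # placeholder value, unit not specified here
    delay: int = 0          # placeholder value, unit not specified here


class SurveillanceConfig:
    """Keeps changed parameters pending until the next operating system start."""

    def __init__(self) -> None:
        self.active = SurveillanceSettings()
        self.pending = self.active

    def set(self, **changes) -> None:
        # Changes are recorded but do not affect the running surveillance.
        self.pending = replace(self.pending, **changes)

    def on_os_start(self) -> None:
        # The pending parameters take effect when the operating system is started.
        self.active = self.pending
```

In this model, calling `set(enabled=True, interval=5, delay=10)` leaves `active` unchanged until `on_os_start()` is called.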