Managing ProLiant servers with Linux HOWTO
Events that can contribute to the operating system locking up include:
• A peripheral device, such as a Peripheral Component Interconnect (PCI) adapter,
generates numerous spurious interrupts when it fails.
• A high-priority software application consumes all available central processing unit (CPU) cycles
and does not allow the operating system scheduler to run the ASR timer reset process.
• A software or kernel application consumes all available memory, including the virtual memory
space (for example, swap). This can cause the operating system scheduler to cease functioning.
• A critical operating system component, such as a file system, fails and causes the operating system
scheduler to cease functioning.
• Any event other than an ASR timeout causes a Non-Maskable Interrupt (NMI) to be generated.
The ASR feature is a hardware-based timer. If a true hardware failure occurs, the Health Monitor
might not be called, but the server resets as if the power switch was pressed. The ProLiant ROM code
might log an event to the IML when the server reboots.
The Health Monitor is notified of an ASR timeout through an NMI. If possible, the driver attempts to
perform the following actions:
• Display a message on the console stating the problem
• Make an entry in the IML
• Shut down the operating system gracefully to close the file systems
There is no guarantee that the operating system will shut down gracefully. This shutdown depends on
the type of error condition (software or hardware) and its severity. The Health Monitor logs a series of
messages when an ASR event occurs. The presence or absence of these messages can provide some
insight into the reason for the ASR event. The order of the messages is important, since the ASR event
is always a symptom of another error condition.
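The timer behavior described above can be illustrated with a small software model. This is only a conceptual sketch: the real ASR timer is implemented in hardware, and the class name WatchdogTimer, its methods, and the timeout value used with it are hypothetical names chosen for illustration, not part of any HP interface.

```python
import time


class WatchdogTimer:
    """Toy software model of a countdown timer such as ASR (illustrative only)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        # timeout_s is an arbitrary illustrative value, not the real ASR timeout.
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_reset = clock()

    def reset(self):
        """Restart the countdown; a healthy system does this periodically."""
        self._last_reset = self._clock()

    def expired(self):
        """Return True once no reset has arrived within the timeout window."""
        return self._clock() - self._last_reset >= self.timeout_s
```

If the process that calls reset() is starved of CPU time or blocked by a failed component, expired() eventually becomes true. In the real system that condition is what leads to the NMI and the server reset.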
1-1-2 Console messages
When events occur outside of normal operations, the Health Monitor might display a console
message or log a message to the IML. Operational messages, such as fan failures or temperature
violations, are logged to the standard /var/log/messages file. Messages specific to device drivers
(such as NMI type messages) can be viewed using dmesg, if the system is not completely locked up.
The hp-health man page documents how to interpret the messages produced by the Health Monitor.
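As a rough sketch, the log file mentioned above could be searched for messages of interest with a few lines of Python. The keyword "NMI" and the function names matching_lines and scan_log are assumptions made for this example; the actual message text produced by the Health Monitor is described in the hp-health man page and is not reproduced here.

```python
def matching_lines(lines, keywords=("NMI",)):
    """Return the lines that contain any keyword (case-insensitive match)."""
    wanted = [k.lower() for k in keywords]
    return [ln.rstrip("\n") for ln in lines
            if any(k in ln.lower() for k in wanted)]


def scan_log(path="/var/log/messages", keywords=("NMI",)):
    """Scan a log file; reading /var/log/messages usually requires root."""
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(path, errors="replace") as fh:
        return matching_lines(fh, keywords)
```

Because the order of the messages matters, the sketch preserves the order in which lines appear in the file. The same matching_lines function can be applied to captured dmesg output split into lines.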
1-1-3 HP Integrated Management Logging Utility (hplog)
The HP ProLiant Integrated Management Logging utility (hplog) allows system administrators to view
IML pages. Commands are listed in Table 2.
Table 2: hplog options
Command     Description
hplog -t    Shows the current temperature and the threshold levels of all temperature sensors
hplog -f    Shows the status of all fans
hplog -p    Shows the status of all power supplies
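For scripted checks, the three options in Table 2 could be wrapped as shown in this minimal sketch. It assumes that hplog is installed and on the PATH, and that each option is typed with a plain ASCII hyphen. The names HPLOG_OPTIONS, hplog_command, and run_hplog are hypothetical, and the output is returned as raw text because its layout is not described in this section.

```python
import subprocess

# Report names are illustrative; the option letters come from Table 2.
HPLOG_OPTIONS = {
    "temperature": "-t",
    "fans": "-f",
    "power": "-p",
}


def hplog_command(report):
    """Build the argument list for one of the reports listed in Table 2."""
    return ["hplog", HPLOG_OPTIONS[report]]


def run_hplog(report):
    """Run hplog and return its unparsed text output."""
    # check=True raises CalledProcessError if hplog exits with an error.
    result = subprocess.run(hplog_command(report), capture_output=True,
                            text=True, check=True)
    return result.stdout
```

For example, run_hplog("fans") runs hplog -f and returns whatever the utility prints. Requesting a report name that is not in HPLOG_OPTIONS raises a KeyError.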