Managing ProLiant servers with Linux HOWTO
These man pages include detailed information on error messages and the actions that the
administrator can take in response.
Additional information about the Insight Management SNMP Agents for HP ProLiant Systems is
available at the following locations:
• www.hp.com/servers/manage
• http://h18000.www1.hp.com/products/servers/management/agents.html
1-1-1 Health Monitor
The Health Monitor augments the hardware features built into ProLiant servers. Basic features, such as
temperature, fan, power supply, and memory monitoring, are standard on almost all ProLiant servers.
On some ProLiant servers, the Health Monitor supports features such as variable speed fans, server
lights that give a visual indication of a possible error condition, and Advanced Memory Protection
(AMP). AMP can reserve memory for failover when a Single Bit Correctable Error (SBCE) threshold
is exceeded.
Note:
On some ProLiant servers, the entire memory subsystem can be mirrored to
survive an uncorrectable memory error. Without AMP, uncorrectable
memory errors are always fatal and cause a kernel panic. AMP allows a
server to continue execution until the faulty memory can be replaced.
Mirrored AMP solutions usually allow removing the memory board that holds the
faulty dual in-line memory module (DIMM) and replacing the faulty
DIMM while the server continues execution. When the repaired AMP
memory board is inserted back into the server, the AMP mirror
is restored automatically. This allows mission-critical 24x7 applications to
continue execution without interruption or downtime.
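The threshold-triggered failover described above can be illustrated with a small sketch. This is a toy model only: on real ProLiant hardware the counting and failover are handled by the memory subsystem and firmware, not by user code. The class name, method names, and the threshold value are hypothetical and do not come from any HP tool.

```python
class SpareMemoryModel:
    """Toy model of AMP-style sparing: fail over to reserved memory
    once a DIMM exceeds a correctable-error threshold."""

    def __init__(self, sbce_threshold):
        # Hypothetical threshold; the real value is set by the platform.
        self.sbce_threshold = sbce_threshold
        self.error_counts = {}     # DIMM label -> correctable errors seen
        self.failed_over = None    # DIMM whose role the reserved memory took over

    def record_sbce(self, dimm):
        """Record one single-bit correctable error on a DIMM.

        Returns True only when this error triggers the failover.
        """
        self.error_counts[dimm] = self.error_counts.get(dimm, 0) + 1
        over = self.error_counts[dimm] > self.sbce_threshold
        if over and self.failed_over is None:
            # The reserved memory can be used only once in this model.
            self.failed_over = dimm
            return True
        return False
```

In this model the reserved memory is consumed by the first DIMM that crosses the threshold; later errors on other DIMMs are counted but cannot trigger a second failover.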
The following sections explain the features provided by the Health Monitor for the overall health of the
ProLiant server.
1-1-1-1 System temperature monitoring
A ProLiant server can contain several temperature sensors. On ProLiant servers with intelligent
temperature sensors, check the current and threshold temperatures by running hplog -t.
If the normal operating range is exceeded for any of these sensors, the Health Monitor does the
following:
• Displays a message on the console stating the problem
• Makes an entry in the system health log and the operating system log
Additionally, on some servers, the fans gradually ramp up to full speed as the ambient temperature
rises, in an attempt to cool the server. If the server remains above the normal operating range and
does not cool down within 60 seconds, the operating system is, in most cases, shut down so that the
file systems are closed cleanly.
Tip:
On servers that do not have variable speed fans, the server is shut down
unless the ROM-Based Setup Utility (RBSU) Thermal Shutdown feature is
disabled. This feature is enabled by default. Use RBSU to control the
shutdown option.
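The temperature handling described in this section can be summarized as a small decision function. This is a simplified sketch, not the Health Monitor's actual implementation: the function names, action labels, and data structure are hypothetical. It also assumes that the 60-second window and the RBSU Thermal Shutdown setting apply to every server, whereas the text states the RBSU setting only for servers without variable speed fans.

```python
from dataclasses import dataclass

COOL_DOWN_SECONDS = 60  # cool-down window stated in the text


@dataclass
class SensorReading:
    name: str           # hypothetical sensor label
    current_c: float    # current temperature
    threshold_c: float  # upper limit of the normal operating range


def sensors_over_range(readings):
    """Return the readings whose current temperature exceeds the threshold."""
    return [r for r in readings if r.current_c > r.threshold_c]


def health_actions(readings, seconds_over_range, variable_speed_fans,
                   thermal_shutdown_enabled=True):
    """List the actions this simplified model takes for a set of readings."""
    if not sensors_over_range(readings):
        return []
    # Always report the problem on the console and in both logs.
    actions = ["console_message", "health_log_entry", "os_log_entry"]
    if variable_speed_fans:
        actions.append("raise_fan_speed")
    # Assumption: shut down only if the feature is enabled and the
    # server has stayed over range for the whole cool-down window.
    if thermal_shutdown_enabled and seconds_over_range >= COOL_DOWN_SECONDS:
        actions.append("shutdown_os")
    return actions
```

For example, `health_actions([SensorReading("cpu", 75.0, 70.0)], 90, False, thermal_shutdown_enabled=False)` returns only the console and log actions, which mirrors the case where Thermal Shutdown has been disabled in RBSU.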