HP Insight Management Agents 8.40 Managing ProLiant Servers with Linux HOW TO Whitepaper
The Health Monitor augments the hardware features built into ProLiant servers. Basic features, such as
temperature, fan, power supply, and memory monitoring, are standard on almost all ProLiant servers. On
some ProLiant servers, the Health Monitor also supports variable speed fans, server lights that
give a visual indication of a possible error condition, and Advanced Memory Protection (AMP). AMP
can reserve memory for failover when a Single Bit Correctable Error (SBCE)
threshold is exceeded.
Note:
On some ProLiant servers, the entire memory subsystem can be mirrored to
survive an uncorrectable memory error. Without AMP, uncorrectable
memory errors are always fatal and cause a kernel panic. AMP allows a
server to continue execution until the faulty memory can be replaced.
Mirrored AMP solutions usually allow the memory board that holds the
faulty dual in-line memory module (DIMM) to be removed and the faulty
DIMM replaced while the server continues execution. When the repaired
memory board is inserted back into the server, the AMP mirror
is restored automatically. This allows mission-critical 24x7 applications to
continue execution without interruption or downtime.
The following sections explain the features provided by the Health Monitor for the overall health of the ProLiant
server.
1-1-1-1 System temperature monitoring
A ProLiant server can contain several temperature sensors. On ProLiant servers with intelligent temperature
sensors, check the current and threshold temperatures by running hplog -t.
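For scripted checks, the command can be wrapped so that a missing hplog binary is reported cleanly. The following is a minimal sketch, assuming Python 3 is available on the server and hplog is on the PATH. The helper name read_hplog is hypothetical, and the output is returned unparsed because the sketch makes no assumption about its layout.

```python
import shutil
import subprocess


def read_hplog(flag, runner=subprocess.run, which=shutil.which):
    """Run `hplog <flag>` and return its raw text output, or None if hplog is absent.

    read_hplog is a hypothetical helper. It accepts only the -t (temperature)
    and -f (fan) flags named in this document.
    """
    if flag not in ("-t", "-f"):
        raise ValueError("unsupported flag: " + flag)
    path = which("hplog")
    if path is None:
        return None  # hplog is not installed or not on the PATH
    # check=False: a non-zero exit status does not raise; the caller sees stdout only
    result = runner([path, flag], capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    output = read_hplog("-t")
    print(output if output is not None else "hplog not found")
```

The runner and which parameters exist only so the helper can be exercised on a machine without hplog installed; in normal use they keep their defaults.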
If the normal operating range is exceeded for any of these sensors, the Health Monitor does the following:
• Displays a message on the console stating the problem
• Makes an entry in the system health log and the operating system log
Additionally, on some servers, the fans gradually increase to full speed as the ambient
temperature rises, in an attempt to cool the server. If the server exceeds the normal operating range and does
not cool down within 60 seconds, the operating system is, in most cases, shut down so that the file
systems are closed.
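The shutdown timing described above can be pictured as a small decision function. This is an illustrative sketch only, not the Health Monitor's implementation: the function name, the (seconds, temperature) sample format, and any threshold value passed in are hypothetical. Only the 60-second cool-down window comes from the text, and the sketch ignores fan behavior and the RBSU setting.

```python
GRACE_SECONDS = 60  # cool-down window stated in this document


def should_shut_down(samples, threshold, grace=GRACE_SECONDS):
    """Return True if readings stay above threshold for at least `grace` seconds.

    samples: iterable of (timestamp_seconds, temperature) pairs in time order.
    threshold: upper limit of the normal operating range (hypothetical value).
    """
    over_since = None  # time at which the current over-threshold run started
    for timestamp, temperature in samples:
        if temperature > threshold:
            if over_since is None:
                over_since = timestamp
            if timestamp - over_since >= grace:
                return True
        else:
            over_since = None  # server cooled down, so the window resets
    return False
```

For example, with a hypothetical threshold of 65, readings of 70, 72, and 71 taken at 0, 30, and 60 seconds return True, while a dip to 60 at the 30-second mark resets the window and the function returns False.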
Tip:
On servers that do not have variable speed fans, the server is shut down
unless the ROM-Based Setup Utility (RBSU) Thermal Shutdown feature is
disabled. This feature is enabled by default. Use RBSU to control the
shutdown option.
1-1-1-2 System fan monitoring
A ProLiant server can contain fan sensors. On ProLiant servers with intelligent fan sensors, check the status of
the fans by running hplog -f.
If a cooling fan fails and there is no secondary redundant fan, the Health Monitor does the following:
• Displays a message on the console stating the problem