VCEM Profile Failover and Profile Moves White Paper
HP Systems Insight Manager monitoring
HP Insight Management (IM) agents monitor the health of ProLiant systems. These agents
require a host operating system, such as Windows or Linux. IM agents monitor server
hardware, largely at the component level, and send events (SNMP traps or WBEM events; see
footnote 2) to HP Systems Insight Manager to report changes in the server’s health status.
For this communication, HP Systems Insight Manager requires a working IP network
connection to the server. HP Systems Insight Manager groups events by category, such as
server, storage, NIC, and so forth. HP Systems Insight Manager also assigns a severity level
to each event.
In addition to receiving events from servers, HP Systems Insight Manager polls each server
for its health status every few minutes. This server health reflects a combination of all the
subsystems monitored by the IM agent. For each server it monitors, HP Systems Insight
Manager displays an icon indicating its health status, as well as reflecting the category of
the most severe event received.
The component-level health information that is reported does not take into account how
important each individual component is to the system’s overall ability to meet its
service-level objectives. System administrators must therefore review the events to
determine appropriate actions.
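As a rough illustration of the event model described in this section, the following Python sketch groups events by category, orders them by severity, and picks the most severe event received for a server. It is not HP Systems Insight Manager code: the Severity levels, the Event fields, and the most_severe_event function are hypothetical names chosen for this example, and the severity scale is an assumed ordering.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable, Optional


class Severity(IntEnum):
    # Assumed, illustrative ordering: a higher value means a more severe event.
    NORMAL = 0
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


@dataclass(frozen=True)
class Event:
    server: str         # hypothetical server identifier
    category: str       # for example "server", "storage", or "NIC"
    severity: Severity


def most_severe_event(events: Iterable[Event], server: str) -> Optional[Event]:
    """Return the most severe event received for one server, or None if there is none."""
    own = [e for e in events if e.server == server]
    return max(own, key=lambda e: e.severity, default=None)


events = [
    Event("blade1", "NIC", Severity.MINOR),
    Event("blade1", "storage", Severity.CRITICAL),
    Event("blade2", "server", Severity.MAJOR),
]

worst = most_severe_event(events, "blade1")
print(worst.category, worst.severity.name)
```

Note that the result says only which category reported the most severe event. It says nothing about whether that component matters to the server's service-level objectives, which is why the events still need to be reviewed.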
Determining a critical server hardware failure
When a server hardware component fails:
• the server may fail, thereby ceasing to deliver services; or
• the server may continue to operate and meet its service level objectives.
For purposes of failover, when the failure of a hardware component materially impacts a
server’s ability to meet its service-level objective, the component is “critical” and its failure
is a “critical server hardware failure”. In that case, failover should provide effective
remediation. On the other hand, if the server continues to operate acceptably, then the
component is not critical and immediate failover is usually unnecessary. For example,
failure of a redundant power supply is not a critical failure: the server continues to
operate and, with HP BladeSystem, the power supply can be replaced without impacting the
server’s operation.
Since HP Systems Insight Manager does not know which components might be redundant or
unused, it rates all component failures as “critical” events, leaving you to determine the most
effective remediation.
When a component is operating in a degraded state that threatens the server or the
integrity of its retained data, there is cause to fail over the server. Examples of such
degraded states include certain CPU and memory error conditions. HP Systems Insight
Manager rates conditions that indicate impending failures as “major” events; however, for
failover purposes, these events can also be considered critical.
In addition, the server configuration and workload can further determine which components
are critical for any individual server.
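The decision rules in this section can be sketched as a small Python function. This is a minimal illustration under stated assumptions, not a VCEM or HP Systems Insight Manager interface: ServerPolicy, should_fail_over, the component names, and the set of components whose “major” events are treated as impending failures are all hypothetical, and the text says only that certain CPU and memory error conditions qualify.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Assumption: "major" events on these components signal an impending failure.
IMPENDING_FAILURE_COMPONENTS = frozenset({"cpu", "memory"})


@dataclass(frozen=True)
class ServerPolicy:
    """Per-server view of which components are redundant or unused (hypothetical)."""
    name: str
    noncritical_components: FrozenSet[str] = frozenset()


def should_fail_over(policy: ServerPolicy, component: str, severity: str) -> bool:
    """Decide whether an event justifies failing over the server."""
    severity = severity.lower()
    # Redundant or unused components do not justify immediate failover.
    if component in policy.noncritical_components:
        return False
    # A failure of any other component is treated as a critical server hardware failure.
    if severity == "critical":
        return True
    # Impending failures are rated "major" but are treated as critical for failover.
    return severity == "major" and component in IMPENDING_FAILURE_COMPONENTS


policy = ServerPolicy("blade1", frozenset({"redundant_power_supply"}))
print(should_fail_over(policy, "redundant_power_supply", "critical"))
print(should_fail_over(policy, "memory", "major"))
print(should_fail_over(policy, "nic", "major"))
```

Keeping the list of non-critical components per server reflects the point that server configuration and workload determine what counts as critical. A real policy would probably also need to distinguish individual error conditions, not just component types.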
2. See the latest version of the HP Virtual Connect Enterprise Manager User Guide at
http://h20000.www2.hp.com/bizsupport/TechSupport/DocumentIndex.jsp?lang=en&cc=us&taskId=101&prodClassId=10008&contentType=SupportManual&docIndexId=64255&prodTypeId=18964&prodSeriesId=3601866.
Also see the HP Systems Insight Manager document, Part Number: 347870-003: The Microsoft® Windows Event ID and SNMP Traps Reference Guide,
http://h20000.www2.hp.com/bc/docs/support/SupportManual/c00293064/c00293064.pdf.