VCEM Profile Failover and Profile Moves
• Using the VCEM CLI, the custom tool invokes the VCEM failover command and passes
either the host name or IP address of the system that posted the failover event.
• The host name or IP address is mapped to its enclosure and bay location.
• A failover job is started on that enclosure and bay.
• The failover job runs as described above. Communicating via Ethernet with the VC
interconnects for the enclosures containing the source and spare bays, it moves the server
profile from the source to the spare bay and then applies power to the spare server.
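The sequence above can be illustrated with a minimal in-memory sketch. It is not the VCEM implementation and does not call the VCEM CLI. The names `Bay`, `fail_over`, `host_map`, and `bays` are hypothetical, and so is the rule for choosing a spare (the first unassigned spare bay found). Powering off the source server before the move is also an assumption made for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# (enclosure name, bay number); hypothetical representation of a bay location
Location = Tuple[str, int]


@dataclass
class Bay:
    profile: Optional[str] = None  # name of the server profile assigned to the bay
    is_spare: bool = False         # True if the bay is designated as a spare
    powered_on: bool = False


def fail_over(host: str,
              host_map: Dict[str, Location],
              bays: Dict[Location, Bay]) -> Location:
    """Move the profile of the failed host to a spare bay and return that bay."""
    # Step 1: map the host name or IP address to its enclosure and bay.
    source_loc = host_map[host]
    source = bays[source_loc]
    if source.profile is None:
        raise ValueError(f"no server profile assigned to {source_loc}")

    # Step 2: pick a spare bay (assumption: first unassigned spare wins).
    for loc, bay in bays.items():
        if bay.is_spare and bay.profile is None:
            spare_loc, spare = loc, bay
            break
    else:
        raise RuntimeError("no spare bay available")

    # Step 3: move the profile, then apply power to the spare server.
    source.powered_on = False  # assumption: source is powered off before the move
    spare.profile, source.profile = source.profile, None
    spare.is_spare = False
    spare.powered_on = True
    return spare_loc
```

For example, with a failed host in bay 3 of one enclosure and a spare in bay 7 of another, `fail_over` returns the spare's location and leaves the profile assigned to it.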
Choosing HP SIM events for Failover
You must select the HP SIM events that will initiate failover. HP recommends a
collection of events for your consideration. These are listed after several
practical topics on using events to automatically initiate failover.
HP SIM monitoring
HP Insight Management (IM) agents monitor the health of ProLiant systems. These agents
require a host operating system, such as Windows or Linux. IM agents monitor server
hardware, largely at the component level, and send events (SNMP traps or WBEM events) [2]
to HP SIM to report changes in the server’s health status. For this communication, HP SIM
requires a working IP network connection to the server. HP SIM groups events by category,
for example: server, storage, NIC, etc. HP SIM also assigns a severity level to each event.
In addition to receiving events from servers, HP SIM polls each server for its health status
every few minutes. This server health reflects a combination of all the subsystems monitored
by the IM agent. For each server it monitors, HP SIM displays an icon indicating its health
status, reflecting the category of the most severe event received.
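One way to picture a health status that combines all monitored subsystems is a worst-of roll-up, sketched below. The severity labels, their ordering, and the function name `rollup_health` are assumptions for illustration only and are not taken from HP SIM.

```python
from typing import Iterable

# Assumed severity labels, ordered from least to most severe (illustrative only).
SEVERITY_ORDER = ["normal", "minor", "major", "critical"]


def rollup_health(subsystem_statuses: Iterable[str]) -> str:
    """Return the most severe status among the subsystems.

    Returns "unknown" when no subsystem status is available.
    Raises ValueError for a label that is not in SEVERITY_ORDER.
    """
    return max(subsystem_statuses, key=SEVERITY_ORDER.index, default="unknown")
```

Under this sketch a single failed subsystem is enough to mark the whole server with that severity, regardless of how many other subsystems are healthy.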
The component level health information reported does not consider the importance of the
individual components to the overall ability of the system to meet its service level objectives.
Further, HP SIM was originally designed to alert human operators. A human operator is
assumed to have the wherewithal to review the events and determine appropriate actions.
Determining a critical server hardware failure
When a server hardware component fails:
• the server may fail, thereby ceasing to deliver services; or
• the server may continue to operate and meet its service level objectives.
For purposes of failover, when the failure of a hardware component materially impacts a
server’s ability to meet its service level objective, the component is “critical” and its failure
becomes a “critical server hardware failure”. Failover should provide effective remediation.
On the other hand, if the server continues to operate acceptably, then the component is not
critical and immediate failover is usually unnecessary. For example, failure of a
redundant power supply does not indicate a critical failure, since the server continues to
operate and, with HP BladeSystem, the failed supply can be replaced without interrupting the server.
Since HP SIM does not know which components might be redundant or unused, it rates all
component failures as “critical” events, leaving it to system administrators to determine the
most effective remediation.
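Because the monitoring side cannot tell which components are redundant or unused, that knowledge has to be supplied by the administrator. The sketch below shows one possible form of such a filter. The `HardwareEvent` fields, the `should_fail_over` function, and the administrator-maintained set of non-critical components are all hypothetical, as are the component and severity labels.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class HardwareEvent:
    host: str       # host name or IP address of the reporting server
    component: str  # failed component, e.g. "power_supply" (illustrative label)
    severity: str   # severity as reported by the monitoring system


def should_fail_over(event: HardwareEvent,
                     non_critical: Set[Tuple[str, str]]) -> bool:
    """Decide whether an event represents a critical server hardware failure.

    `non_critical` holds (host, component) pairs that the administrator has
    marked as redundant or unused on that server.
    """
    if event.severity != "critical":
        return False
    # A "critical" event on a redundant or unused component does not
    # warrant immediate failover.
    return (event.host, event.component) not in non_critical
```

With a redundant power supply registered for a given host, a critical power supply event from that host is ignored, while the same event from a host without the entry would initiate failover.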
[2] Refer to the HP Systems Insight Manager document, Part Number 347870-003: The Microsoft® Windows Event ID and SNMP Traps Reference
Guide, http://h20000.www2.hp.com/bc/docs/support/SupportManual/c00293064/c00293064.pdf.