VCEM Profile Failover and Profile Moves

• Using the VCEM CLI, the custom tool invokes the VCEM failover command and passes either the host name or IP address of the system that posted the failover event.

• The host name or IP address is mapped to its enclosure and bay location.

• A failover job is started on that enclosure and bay.

• The failover job runs as described above. Communicating over Ethernet with the VC interconnects in the enclosures that contain the source and spare bays, it moves the server profile from the source bay to the spare bay and then powers on the spare server.
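The mapping step in the list above can be sketched as follows. This is a minimal illustration, not the VCEM implementation: the inventory table, the host names, enclosure names and bay numbers, and the `FailoverRequest` record are all hypothetical, and the actual VCEM CLI failover command and its syntax are deliberately not shown.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverRequest:
    """Hypothetical record handed to whatever step invokes the VCEM failover command."""
    enclosure: str
    bay: int


# Hypothetical inventory: host name -> (IP address, enclosure, bay).
# In practice this mapping would come from VCEM or your own asset data.
INVENTORY = {
    "web01": ("10.0.0.11", "enc-a", 3),
    "db01": ("10.0.0.12", "enc-b", 7),
}


def _as_ip(text):
    """Return an ip_address object if text is an IPv4/IPv6 literal, else None."""
    try:
        return ipaddress.ip_address(text)
    except ValueError:
        return None


def resolve_location(host_or_ip: str) -> FailoverRequest:
    """Map the host name or IP address that posted the event to its enclosure and bay."""
    ip = _as_ip(host_or_ip)
    for host, (addr, enclosure, bay) in INVENTORY.items():
        # Match either the IP address or the (case-insensitive) host name.
        if (ip is not None and ip == ipaddress.ip_address(addr)) or host == host_or_ip.lower():
            return FailoverRequest(enclosure, bay)
    raise LookupError(f"no enclosure/bay known for {host_or_ip!r}")
```

For example, `resolve_location("10.0.0.12")` returns `FailoverRequest(enclosure='enc-b', bay=7)`. Raising an error for an unknown system, rather than guessing a bay, avoids starting a failover job against the wrong server.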

Choosing HP SIM events for failover

You must select the HP SIM events that will initiate failover. HP recommends a set of events for your consideration; they are listed after several practical topics on using events to initiate failover automatically.

HP SIM monitoring

HP Insight Management (IM) agents monitor the health of ProLiant systems. These agents require a host operating system, such as Windows or Linux. IM agents monitor server hardware, largely at the component level, and send events (SNMP traps or WBEM events) to HP SIM to report changes in the server’s health status. For this communication, HP SIM requires a working IP network connection to the server. HP SIM groups events by category (for example, server, storage, and NIC) and assigns a severity level to each event.

In addition to receiving events from servers, HP SIM polls each server for its health status every few minutes. This health status reflects a combination of all the subsystems monitored by the IM agent. For each server it monitors, HP SIM displays an icon indicating the server’s health status, which reflects the most severe event received.
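The "most severe event wins" rule described above can be sketched as a simple maximum over event severities. The severity scale below is an illustrative assumption and may not match HP SIM's exact list of levels; the `Event` record and server names are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Illustrative severity scale, least to most severe (assumed, not HP SIM's exact list)."""
    NORMAL = 0
    WARNING = 1
    MINOR = 2
    MAJOR = 3
    CRITICAL = 4


@dataclass(frozen=True)
class Event:
    server: str
    category: str       # e.g. "server", "storage", "NIC"
    severity: Severity


def health_status(events, server: str) -> Severity:
    """Status icon for one server: the severity of the most severe event it posted,
    or NORMAL when no events have been received."""
    return max((e.severity for e in events if e.server == server),
               default=Severity.NORMAL)
```

Note that the event's category plays no part in the result: a minor NIC event and a failed system board are ranked purely by severity, which is why the component-level status alone says little about whether a failure matters.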

The component-level health information that is reported does not consider how important each component is to the system’s overall ability to meet its service level objectives. Furthermore, HP SIM was originally designed to alert human operators, who are assumed to have the wherewithal to review the events and determine appropriate actions.

Determining a critical server hardware failure

When a server hardware component fails:

• the server may fail, thereby ceasing to deliver services; or

• the server may continue to operate and meet its service level objectives.

For purposes of failover, when the failure of a hardware component materially impacts a server’s ability to meet its service level objectives, the component is “critical” and its failure is a “critical server hardware failure”. Failover should provide effective remediation for such a failure.

On the other hand, if the server continues to operate acceptably, the component is not critical and immediate failover is usually unnecessary. For example, failure of a redundant power supply does not indicate a critical failure, since the server continues to operate and, with HP BladeSystem, the power supply can be replaced without affecting the server’s operation.

Because HP SIM does not know which components might be redundant or unused, it rates all component failures as “critical” events, leaving it to system administrators to determine the most effective remediation.
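The distinction drawn above, between what HP SIM reports as "critical" and what is critical for failover, can be sketched as a filter that applies knowledge HP SIM lacks. This is a minimal sketch under stated assumptions: the `ComponentFailure` fields (`in_use`, `redundant`, `healthy_peers`) are hypothetical and would have to come from the administrator's own knowledge of each server, and the rule in `is_critical` is one plausible reading of the text, not an HP-defined algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentFailure:
    """Hypothetical description of a failed component, enriched with knowledge
    that HP SIM itself does not have (whether the part is used and redundant)."""
    component: str      # e.g. "power supply 2", "system board"
    in_use: bool        # does the server's workload depend on this component?
    redundant: bool     # is it one member of a redundant set?
    healthy_peers: int  # working members of that set that remain


def is_critical(failure: ComponentFailure) -> bool:
    """True if the failure should count as a critical server hardware failure.

    Assumed rule: an unused component is never critical, and a redundant one is
    not critical while at least one healthy peer still carries the load.
    """
    if not failure.in_use:
        return False
    if failure.redundant and failure.healthy_peers > 0:
        return False
    return True


def should_initiate_failover(failures) -> bool:
    """HP SIM reports every one of these as "critical"; fail over only if at least
    one failure is critical under the stricter definition above."""
    return any(is_critical(f) for f in failures)
```

With this rule, a failed power supply that still has a working peer does not trigger failover, while a failed system board does.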

Refer to the HP Systems Insight Manager document The Microsoft® Windows Event ID and SNMP Traps Reference Guide (Part Number 347870-003), http://h20000.www2.hp.com/bc/docs/support/SupportManual/c00293064/c00293064.pdf