book.book Page 1 Thursday, December 10, 2009 12:36 PM

Dell™ Server PRO Management Pack 2.0 For Microsoft® System Center Virtual Machine Manager
User’s Guide

www.dell.com | support.dell.
Notes and Cautions

NOTE: A NOTE indicates important information that helps you make better use of your computer.

CAUTION: A CAUTION indicates potential damage to hardware or loss of data if instructions are not followed.

____________________

Information in this document is subject to change without notice.
© 2009 Dell Inc. All rights reserved.
Reproduction of these materials in any manner whatsoever without the written permission of Dell Inc.
Contents

1 Introduction
    What’s New in this Release?
    Overview
    Related Terms
    What is a PRO Tip?
    Feature Highlights
    Uninstalling PRO Pack
    Security Considerations

3 Using Dell PRO Pack
    Monitoring Using SCVMM
    Monitoring Using PRO Specific Alerts on SCOM/SCE
    Using Health Explorer to Reset Alerts
1 Introduction

This document is intended for system administrators who use the Dell™ Server PRO Management Pack (Dell PRO Pack) to monitor Dell systems and take remedial action when an inefficient system is identified.
What’s New in this Release?

This release of PRO Pack supports the following:
• SCOM 2007 R2
• SCVMM 2008 R2
• Virtual machine live migration with no downtime
• Feature to Override Dell PRO Pack default recovery actions
• Additional Dell OpenManage™ alerts
• Change in the names of recovery actions from "Maintenance mode" and "VM Migration" in PRO Pack 1.
What is a PRO Tip?

A PRO (Performance and Resource Optimization) Tip is a feature that monitors your virtualized infrastructure and alerts you when there is an opportunity to optimize the usage of its resources. A PRO Tip window contains the description of the event that produced the PRO Tip and the suggested remedial action.
• Restrict and migrate: In this mode, to prevent loss of service from the virtual workloads, it is recommended that all running virtual machines be migrated immediately from the server to another healthy server.

Understanding PRO Tip Management

To help you understand how Dell PRO Pack works, this section explains a typical setup and the sequence of events involved.

Figure 1-1.
The following table describes the sequence of events that occur when a typical PRO Tip is generated and handled.

Table 1-1. Sequence of events with description

Sequence Number | Event
1 | Operations Manager agents on the host detect the warning, error, or failure alerts that are logged by Dell OpenManage Server Administrator.
2 | The alert is sent to Operations Manager.
3 | The Operations Manager console displays active PRO specific alerts.
Supported Operating Systems

For the detailed operating system support matrix, see the Dell PRO Pack readme file, DellPROMP2.0_Readme.txt. The readme is packaged in the self-extracting executable, Dell_PROPack_2.0.0_A00.exe. It is also posted on the Systems Management documentation page on the Dell Support website at support.dell.com/manuals.
• The Dell OpenManage Server Administrator Storage Management User's Guide is a comprehensive reference guide for configuring and managing local and remote storage attached to a system. This document is also available in HTML and PDF formats on the Dell Systems Management Tools and Documentation DVD and from the Storage Management console as online help.
Installing SCOM/SCE and SCVMM Agents

When you use the setup to monitor your infrastructure, SCOM/SCE (Operations Manager) and SCVMM agents installed on the managed hosts enable data transfer between the managed system and management stations. Agents of both SCVMM and Operations Manager are installed manually or automatically during the discovery process on all Hyper-V hosts.
The Import Management Packs screen is displayed with a warning message in the Management Pack Details section, as shown in Figure 2-1. Operations Manager displays this generic warning as a part of the security process when you manually install a management pack. For more information on how you can change the security settings for installing Management Packs manually, see the Microsoft TechNet Library.

Figure 2-1.
Configuring PRO Tips

The Dell systems and virtual infrastructure are monitored for either Critical alerts only or both Critical and Warning alerts.
• A Warning alert is generated when a reading for the component is above or below the acceptable level. For example, the component may still be functioning, but it could potentially fail, or the component may be functioning in an impaired state.
3 Select the PRO tab and select the Enable PRO on this Host Group option.
4 By default, the monitoring level is set to Warning and Critical, which means that the application will display PRO Tips generated for both Warning and Critical alerts. To restrict the PRO Tips to Critical alerts only, select the Critical only option.
5 Select the Automatically implement PRO tips on this Host Group option.
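The effect of the monitoring level described in step 4 can be sketched in Python. This is only an illustrative model of the filtering rule, not SCVMM code: the names `Severity`, `MonitoringLevel`, and `should_raise_pro_tip` are hypothetical and do not correspond to any SCVMM or Operations Manager API.

```python
from enum import Enum


class Severity(Enum):
    WARNING = "Warning"
    CRITICAL = "Critical"


class MonitoringLevel(Enum):
    # Hypothetical names for the two options on the PRO tab.
    WARNING_AND_CRITICAL = "Warning and Critical"  # default, per step 4
    CRITICAL_ONLY = "Critical only"


def should_raise_pro_tip(severity, level=MonitoringLevel.WARNING_AND_CRITICAL):
    """Return True if an alert of this severity would produce a PRO Tip."""
    if level is MonitoringLevel.CRITICAL_ONLY:
        return severity is Severity.CRITICAL
    # Default level: both Warning and Critical alerts produce PRO Tips.
    return severity in (Severity.WARNING, Severity.CRITICAL)
```

With the default level, a Warning alert yields a PRO Tip; with `MonitoringLevel.CRITICAL_ONLY`, the same alert is filtered out.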
Table 2-1. Checking recovery action for warning alert conditions. (continued)

Your Actions:
Verify that the host is placed in the Restrict mode and the PRO Tip resolved the alert.

Expected System Response:
• After successful implementation of the PRO Tip, the status changes to "Resolved" and the PRO Tip entry is moved out of the PRO Tip window.
• Corresponding alert disappears in the Operations Manager Alert View.
Table 2-2. Checking recovery action for failure alert conditions. (continued)

Your Actions:
Verify that the virtual systems are moved to a healthy host and PRO Tip resolved the alert.

Expected System Response:
• After successful implementation of the PRO Tip, the status changes to "Resolved" and the PRO Tip entry is moved out of the PRO Tip window.
• Corresponding alert disappears in the Operations Manager Alert View.
3 Using Dell PRO Pack

Monitoring Using SCVMM

You can manage the health of your virtualized environment using PRO Tips displayed on the SCVMM console. To see the PRO Tip window, click the PRO Tips menu on the toolbar located below the main menu, as shown in Figure 3-1. The menu also shows the number of active PRO Tips in brackets.

Figure 3-1. PRO Tip Button on the SCVMM Console

Click the PRO Tips menu.
Figure 3-2. PRO Tip Window

Implementation of Recovery Actions

The PRO Tip window provides an option to either implement or dismiss the recommended action. If you select the Implement option, SCVMM implements one of the recovery tasks described below, based on the nature of the alert.

Placing the host in Restrict mode

Placing a host in Restrict mode prevents future assignment of workload to the host until the problem is resolved.
Select the Load Balance algorithm if you want SCVMM to evenly distribute virtual machines (VMs) across a pool of hosts. Select the Resource Maximization algorithm if you prefer to saturate the host completely before moving to a new one.
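The difference between the two algorithms can be illustrated with a minimal Python sketch. It is a simplified model under stated assumptions, not SCVMM's placement logic: the `Host` type, the `capacity` and `vm_count` fields, and the `pick_host` function are hypothetical, and the model considers only the number of VMs per host, whereas SCVMM weighs additional factors when it rates hosts.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    capacity: int  # hypothetical: maximum number of VMs the host can run
    vm_count: int  # number of VMs currently placed on the host

    def has_room(self):
        return self.vm_count < self.capacity


def pick_host(hosts, algorithm):
    """Pick a target host for one VM, or None if every host is full."""
    candidates = [h for h in hosts if h.has_room()]
    if not candidates:
        return None
    if algorithm == "load_balance":
        # Spread VMs evenly: choose the least utilized host.
        return min(candidates, key=lambda h: h.vm_count / h.capacity)
    if algorithm == "resource_maximization":
        # Fill one host before using the next: choose the most utilized
        # host that still has room.
        return max(candidates, key=lambda h: h.vm_count / h.capacity)
    raise ValueError(f"unknown algorithm: {algorithm}")
```

For example, given hosts with 3 of 4, 1 of 4, and 4 of 4 slots used, `load_balance` selects the host with one VM, while `resource_maximization` selects the host with three.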
Figure 3-3. Completed Job

PRO Tip implementation of moving VMs can fail if no other healthy hosts are available in the host group or host cluster. In such a case, the PRO Tip window displays the state of the corresponding PRO Tip as Failed, and the reason is explained in the Error section. The status of the corresponding entry in the Jobs section on the SCVMM console is also displayed as Failed.
VM Live Migration

With live migration, you can migrate a VM from one node of a Windows Server 2008 R2 failover cluster to another node in the same cluster without any downtime. As a connected user, you will not experience any interruption during live migration. The difference between quick migration and live migration is that quick migration involves downtime, whereas live migration does not.
Monitoring Using PRO Specific Alerts on SCOM/SCE

You can monitor the physical devices in your network using the Operations Manager console. The Operations Manager console provides the following views:
• Alert View - Displays Dell PRO specific alerts in a tabular format, including the severity level, source, name, resolution state, and the date and time of creation.
• State View - Displays the Dell system objects discovered in a tabular format. The State View displays objects with the name, path, storage health of the Dell system, and so on. You can personalize the State View by defining which objects you want displayed and customizing how the data looks.

Figure 3-6. State View

For more information on creating a State view, see the Microsoft website.
Recovery Action Overrides

PRO Pack 2.0 supports two recovery actions. The following flag values trigger the respective recovery action:
• 1: For migration recovery action
• 2: For placing the server in restricted mode

You can override the default recovery actions by changing the default recovery action flag value.
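The flag-to-action mapping above can be expressed as a short Python sketch. Only the two flag values (1 and 2) come from this guide; the names `RecoveryAction` and `resolve_recovery_action` are hypothetical, and the sketch does not show how the override is actually configured in Operations Manager.

```python
from enum import IntEnum


class RecoveryAction(IntEnum):
    MIGRATE = 1   # flag value 1: migration recovery action
    RESTRICT = 2  # flag value 2: place the server in restricted mode


def resolve_recovery_action(default_flag, override_flag=None):
    """Return the action that applies, preferring an override if one is set.

    Raises ValueError for any flag value other than 1 or 2.
    """
    flag = default_flag if override_flag is None else override_flag
    return RecoveryAction(flag)
```

For instance, an alert whose default flag is 1 resolves to `RecoveryAction.MIGRATE`, but resolves to `RecoveryAction.RESTRICT` when the flag is overridden with 2.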
This verifies that the overridden recovery action is successful.

Figure 3-7. Override Recovery Action

Alert Cause and Recovery Action

The following table lists the alerts and the corresponding recommended remedial action:

Restrict: It is recommended that the server be made temporarily unavailable for placement of new VMs until the maintenance tasks have been completed.
Table 3-1. Alert Cause and Recovery Action (continued)

Dell Event ID | Alert Description | Severity in SCOM/SCE & PRO Tip in SCVMM | Alert Cause | Dell PRO Tip Recommended Remedial Action
1054 | Temperature sensor detected a failure value | Error | A temperature sensor on the backplane board, system board, CPU, or drive carrier in the specified system exceeded its failure threshold value. | Restrict and Migrate
1104 | Fan sensor detected a failure value.
1353 | Power supply detected a warning. | Warning | A power supply sensor reading in the specified system exceeded definable warning threshold. | Restrict
1354 | Power supply detected a failure. | Error | A power supply has been disconnected or has failed.
2048 | Device Failed Error. | Critical | A storage component such as a physical disk or an enclosure has failed. The failed component may have been identified by the controller while performing a task such as a rescan or a check consistency. | Restrict and Migrate
2056 | Virtual Disk Failed.
2083 | Physical Disk Rebuild Failed | | A physical disk included in the virtual disk has failed or is corrupt. | Restrict
2094 | Predictive Failure reported | Warning | The physical disk is predicted to fail.
2100 | Temperature exceeded Maximum Warning Threshold | Warning | The physical disk enclosure is too hot.
2112 | Enclosure shutdown | Critical | The physical disk enclosure is either hotter or cooler than the maximum or minimum allowable temperature range.
2122 | Redundancy degraded | Warning | One or more of the enclosure components has failed. For example, a fan or power supply may have failed. | Restrict
2145 | Controller battery low | Warning | The controller battery charge is low.
2169 | The controller battery needs to be replaced | Critical
2171 | The controller battery temperature is above normal.
2174 | The controller battery has been removed. | Warning
2201 | A global hot spare failed | Warning | The controller is not able to communicate with a disk that is assigned as a global hot spare. The disk may have failed or been removed. | Restrict
2246 | The controller battery is degraded | Warning | The temperature of the battery is high. This may be due to the battery being charged. | Restrict
2264 | A device is missing | Warning | The controller cannot communicate with a device. The device may be removed. | Restrict
2273 | A block on the physical disk has been punctured by the controller | Critical | The controller encountered an unrecoverable medium error when attempting to read a block on the physical disk and marked that block as invalid. | Restrict and Migrate
2292 | Communication with the enclosure has been lost | Critical | The controller has lost communication with an enclosure management module (EMM). The cables may be loose or defective. | Restrict and Migrate
2302 | The enclosure is not responding | Critical | The enclosure or an enclosure component is in a Failed or Degraded state. | Restrict and Migrate
2306 | Bad block table is full | Warning | The bad block table is the table used for remapping bad disk blocks. This table fills as bad disk blocks are remapped.
2314 | The initialization sequence of SAS components failed during system startup. SAS management and monitoring is not possible. | Critical | Storage Management is unable to monitor or manage SAS devices. | Restrict and Migrate
2318 | Problems with the battery or the battery charger have been detected.
2321 | Single-bit ECC error. The controller DIMM is nonfunctional. There will be no further reporting. | Critical | The dual in-line memory module (DIMM) is malfunctioning. Data loss or data corruption is imminent. | Restrict and Migrate
2322 | The DC power supply is switched off.
2329 | SAS port report | Warning | The text for this alert is generated by the controller and can vary depending on the situation. | Restrict and Migrate
2337 | The controller is unable to recover cached data from the battery backup unit (BBU). | Critical
2350 | There was an unrecoverable disk media error during the rebuild or recovery operation | Critical | The rebuild or recovery operation encountered an unrecoverable disk media error. | Restrict
2356 | SAS SMP communications error. | Critical
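For scripting or reporting purposes, the alert table can be treated as a lookup from Dell Event ID to severity and recommended remedial action. The following Python sketch is a partial excerpt only: it includes just a few rows whose severity and action are both stated in Table 3-1, and the names `RECOVERY_BY_EVENT_ID` and `recommended_action` are hypothetical, not part of Dell PRO Pack.

```python
# Partial excerpt of Table 3-1: event ID -> (severity, recommended action).
# Rows with a missing severity or action in the table are left out.
RECOVERY_BY_EVENT_ID = {
    1054: ("Error", "Restrict and Migrate"),
    1353: ("Warning", "Restrict"),
    2122: ("Warning", "Restrict"),
    2201: ("Warning", "Restrict"),
    2273: ("Critical", "Restrict and Migrate"),
    2292: ("Critical", "Restrict and Migrate"),
    2350: ("Critical", "Restrict"),
}


def recommended_action(event_id):
    """Return the recommended action for an event ID, or None if unlisted."""
    entry = RECOVERY_BY_EVENT_ID.get(event_id)
    return None if entry is None else entry[1]
```

Returning `None` for an unlisted ID keeps the excerpt from implying an action for alerts it does not cover.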