book.book Page 1 Tuesday, October 4, 2011 6:58 PM Dell Server PRO Management Pack 2.
Notes and Cautions

NOTE: A NOTE indicates important information that helps you make better use of your computer.

CAUTION: A CAUTION indicates potential damage to hardware or loss of data if instructions are not followed.

Information in this document is subject to change without notice.
© 2011 Dell Inc. All rights reserved.

Reproduction of these materials in any manner whatsoever without the written permission of Dell Inc.
Contents

1 Introduction
  What's New
  Overview
  Related Terms
  What is a PRO Tip?
  Features and Functionalities
  Understanding PRO Tip Management
  Supported Operating Systems
3 Related Documentation and Resources
  Security Considerations
1 Introduction

This document is intended for system administrators who use the Dell Server PRO Management Pack (Dell PRO Pack) to monitor Dell systems and take remedial action when an inefficient system is identified.
What's New

This release of PRO Pack supports the following:
• SCE 2010
• SCVMM 2008 R2 SP1
• New hardware support
• Additional Dell OpenManage alerts and Network Interface Card (NIC) alerts
• Improved resolutions for some existing alerts

For more information on the alerts and their resolutions, see Alerts and Recovery Actions.
Related Terms

• A managed system is a Dell system running the Dell OpenManage Server Administrator (OMSA), which is monitored and managed using Operations Manager and SCVMM. It can be managed locally or remotely using supported tools.
• A management station or managing station is a Microsoft Windows-based Dell system that has Operations Manager and SCVMM installed to manage virtual workloads.
Features and Functionalities

Dell PRO Pack:
• Performs PRO-management of Dell PowerEdge systems running Microsoft Hyper-V platforms by continually monitoring the health of your physical and virtual infrastructure.
• Works with Operations Manager and SCVMM to detect events such as loss of power supply redundancy, temperatures above threshold values, system storage battery error, virtual disk failure, and so on.
Understanding PRO Tip Management

This section explains a typical Dell PRO Pack setup and the sequence of events involved in PRO Tip management.

Figure 1-1.
Table 1-1 describes the sequence of events that occur in PRO Tip management.

Table 1-1. Sequence of Events With Description

Sequence Number   Event
1                 Operations Manager agents on the host are enabled to detect the warning, error, or failure alerts that are generated by OMSA.
2                 The alert is sent to Operations Manager.
3                 The Operations Manager console displays active PRO alerts.
4                 Operations Manager notifies SCVMM of the alert and the associated PRO Tip ID.
Supported Operating Systems

The operating systems that Dell PRO Pack supports on the managed system and the management station are as follows:

Managed system: The managed system for PRO Pack is a Virtual Machine Manager Server. For more information, see technet.microsoft.com/en-us/library/cc764213.aspx

Management station: For the list of supported configurations of SCOM, SCE, and SCVMM, see the following:
• SCOM 2007 R2 - technet.microsoft.
2 Using Dell Performance Resource Optimization Pack

Monitoring Using SCVMM

You can manage the health of your virtualized environment using PRO Tips displayed on the SCVMM console. To see the PRO Tip window, click the PRO Tips menu on the toolbar, as shown in Figure 2-1. The menu also displays the number of active PRO Tips in parentheses.

Figure 2-1. PRO Tip Button on the SCVMM Console

Click the PRO Tips menu.
Implementing Recovery Actions

The PRO Tip window provides an option to either implement or dismiss the recommended action. If you select Implement, SCVMM implements one of the recovery tasks described below, based on the nature of the alert.

Placing the Host in Restrict Mode

Placing a host in Restrict mode prevents assignment of workload to the host until the problem is solved.
After you successfully implement the recovery task, the following changes take place:
• The status of the PRO Tip changes to Resolved and the PRO Tip entry moves out of the PRO Tip window.
• The corresponding alert disappears from the Operations Manager Alert View.
• An entry is displayed in the Jobs section on the SCVMM console. This entry displays the status of the job as Completed, as shown in Figure 2-2.

Figure 2-2.
If you select Dismiss, the PRO Tip is not executed and the following changes take place:
• The PRO Tip is removed from the SCVMM PRO Tip console.
• The alert in Operations Manager is removed from the Dell Server PRO Alerts.

For more information, see Using Health Explorer to Reset Alerts.
The alerts are displayed on the right side of the screen, as shown in Figure 2-3.

Figure 2-3. Alert View

• State View: Displays the discovered Dell system objects in a tabular format. The State View displays objects with the name, path, storage health of the Dell system, and so on. You can personalize the State View by defining which objects you want to display and how the data is displayed.

Figure 2-4.
Overriding Recovery Actions

PRO Pack 2.1 supports two recovery actions. The following flag values trigger the respective recovery action:
• 1: For migration
• 2: For placing the server in restricted mode

You can override the default recovery action by changing the default recovery action flag value. For example, change the recovery flag value from 2 to 1 using the overrides option provided in the SCOM console.
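The relationship between the two flag values and an override can be modeled in a few lines of Python. This is only an illustrative sketch of the rule described above: the names RecoveryFlag and effective_recovery_action are hypothetical, and the code does not call or represent any SCOM or PRO Pack interface. The actual override is made through the overrides option in the SCOM console.

```python
from enum import IntEnum
from typing import Optional


class RecoveryFlag(IntEnum):
    """Hypothetical model of the two recovery action flag values."""
    MIGRATE = 1   # flag value 1: migration
    RESTRICT = 2  # flag value 2: place the server in restricted mode


def effective_recovery_action(default_flag: int,
                              override_flag: Optional[int] = None) -> RecoveryFlag:
    """Return the action that applies: the override if one is set, else the default."""
    chosen = default_flag if override_flag is None else override_flag
    # RecoveryFlag(...) raises ValueError for any value other than 1 or 2.
    return RecoveryFlag(chosen)


# Example from the text: the default flag is 2, overridden to 1.
print(effective_recovery_action(2, override_flag=1).name)
```

Rejecting any value other than 1 or 2 mirrors the statement that only two recovery actions are supported.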
10 Generate an alert and PRO Tip.
11 Select Implement PRO Tip. This verifies that the overridden recovery action is successful.

Figure 2-5. Overriding Recovery Action

Alerts and Recovery Actions

Table 2-1 lists the alerts and the corresponding recommended remedial actions:

Table 2-1.
Table 2-1. Alert Cause and Recovery Action (continued)

Dell Event ID: 1054
Alert Description on SCOM/SCE and PRO Tip in SCVMM: Temperature sensor detected a failure value
Severity: Error
Alert Cause: A temperature sensor on the backplane board, system board, CPU, or drive carrier in the specified system exceeded its failure threshold value.
Dell PRO Tip Recommended Remedial Action: Restrict and Migrate
Dell Event ID: 1353
Alert Description on SCOM/SCE and PRO Tip in SCVMM: Power supply detected a warning
Severity: Warning
Alert Cause: A power supply sensor reading in the specified system exceeded a definable warning threshold.
Dell PRO Tip Recommended Remedial Action: Restrict

Dell Event ID: 1354
Alert Description on SCOM/SCE and PRO Tip in SCVMM: Power supply detected a failure
Severity: Error

Dell Event ID: 1403
Alert Description on SCOM/SCE and PRO Tip in SCVMM: Memory Device Status Warning
Severity: Warning
Alert Cause: A memory device correction rate exceeded an acceptable value.
Dell Event ID: 2048
Alert Description on SCOM/SCE and PRO Tip in SCVMM: Device Failed
Severity: Critical Error
Alert Cause: A storage component such as a physical disk or an enclosure has failed. The failed component may have been identified by the controller while performing a task such as a rescan or a check consistency.
Dell PRO Tip Recommended Remedial Action: Restrict and Migrate
Dell Event ID: 2083
Alert Description on SCOM/SCE and PRO Tip in SCVMM: Physical Disk Rebuild Failed
Severity: Critical
Alert Cause: A physical disk included in the virtual disk has failed or is corrupt.
Dell PRO Tip Recommended Remedial Action: Restrict

Dell Event ID: 2094
Alert Description on SCOM/SCE and PRO Tip in SCVMM: Predictive Failure reported
Severity: Warning
Alert Cause: The physical disk is predicted to fail.
Dell Event ID: 2112
Alert Description on SCOM/SCE and PRO Tip in SCVMM: Enclosure shutdown
Severity: Critical
Alert Cause: The physical disk enclosure is either hotter or cooler than the maximum or minimum allowable temperature range.

Dell Event ID: 2122
Alert Description on SCOM/SCE and PRO Tip in SCVMM: Redundancy degraded
Severity: Warning
Alert Cause: One or more of the enclosure components has failed. For example, a fan or power supply may have failed.
Dell PRO Tip Recommended Remedial Action: Restrict
Dell Event ID: 2145
Alert Description on SCOM/SCE and PRO Tip in SCVMM: Controller battery low

Dell Event ID: 2169
Alert Description on SCOM/SCE and PRO Tip in SCVMM: The controller battery needs to be replaced
Severity: Critical

Dell Event ID: 2171
Alert Description on SCOM/SCE and PRO Tip in SCVMM: The controller battery temperature is above normal
Severity: Warning
Alert Cause: The room temperature may be too hot. The system fan may also be degraded or failed.
Dell PRO Tip Recommended Remedial Action: Restrict
Dell Event ID: 2187
Alert Description on SCOM/SCE and PRO Tip in SCVMM: Single-bit ECC error limit exceeded on the controller DIMM
Severity: Warning
Alert Cause: The controller memory is malfunctioning.

Dell Event ID: 2201
Alert Description on SCOM/SCE and PRO Tip in SCVMM: A global hot spare failed
Severity: Warning
Alert Cause: The controller is not able to communicate with a disk that is assigned as a global hot spare. The disk may have failed or been removed.

Dell Event ID: 2203
Dell Event ID: 2213
Alert Description on SCOM/SCE and PRO Tip in SCVMM: Recharge count maximum exceeded

Dell Event ID: 2246
Alert Description on SCOM/SCE and PRO Tip in SCVMM: The controller battery is degraded
Severity: Warning
Alert Cause: The temperature of the battery is high. This may be due to the battery being charged.
Dell PRO Tip Recommended Remedial Action: Restrict
Dell Event ID: 2272
Alert Description on SCOM/SCE and PRO Tip in SCVMM: Patrol Read found an uncorrectable media error
Severity: Critical
Alert Cause: The Patrol Read task has encountered an error that cannot be corrected. There may be a bad disk block that cannot be remapped.
Dell PRO Tip Recommended Remedial Action: Restrict and Migrate
Dell Event ID: 2290
Alert Description on SCOM/SCE and PRO Tip in SCVMM: Single-bit ECC error on controller DIMM
Severity: Warning
Alert Cause: An error involving a single bit has been encountered during a read or write operation.
Dell Event ID: 2300
Alert Description on SCOM/SCE and PRO Tip in SCVMM: Unstable Enclosure Failure
Severity: Critical
Alert Cause: The controller is not receiving a consistent response from the enclosure.

Dell Event ID: 2301
Alert Description on SCOM/SCE and PRO Tip in SCVMM: Enclosure Hardware Error
Severity: Critical
Alert Cause: The enclosure or an enclosure component is in a Failed or Degraded state.
Dell PRO Tip Recommended Remedial Action: Restrict and Migrate
Dell Event ID: 2310
Alert Description on SCOM/SCE and PRO Tip in SCVMM: A virtual disk is permanently degraded
Alert Cause: A redundant virtual disk has lost redundancy. This may occur when the virtual disk suffers the failure of more than one physical disk.
Dell PRO Tip Recommended Remedial Action: Restrict and Migrate
Dell Event ID: 2318
Alert Description on SCOM/SCE and PRO Tip in SCVMM: Problems with the battery or the battery charger have been detected. The battery health is poor.
Severity: Warning
Alert Cause: The battery or the battery charger is not functioning properly.
Dell Event ID: 2322
Alert Description on SCOM/SCE and PRO Tip in SCVMM: The DC power supply is switched off
Severity: Critical
Alert Cause: The power supply unit is switched off. Either a user switched off the power supply unit or it is defective.
Dell PRO Tip Recommended Remedial Action: Restrict and Migrate
Dell Event ID: 2337
Alert Description on SCOM/SCE and PRO Tip in SCVMM: The controller is unable to recover cached data from the battery backup unit (BBU)
Severity: Critical
Alert Cause: The controller was unable to recover data from the cache.
Dell PRO Tip Recommended Remedial Action: Restrict
Dell Event ID: 2350
Alert Description on SCOM/SCE and PRO Tip in SCVMM: There was an unrecoverable disk media error during the rebuild or recovery operation
Severity: Critical
Alert Cause: The rebuild or recovery operation encountered an unrecoverable disk media error.
Dell Event ID: 2397
Alert Description on SCOM/SCE and PRO Tip in SCVMM: The Check Consistency completed with uncorrectable errors
Severity: Critical
Alert Cause: Medium errors in the physical drives.
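When triaging alerts by event ID, Table 2-1 can be treated as a simple lookup table. The following Python sketch shows that idea using a subset of the rows, limited to those for which both the severity and the recommended remedial action are listed. The names AlertEntry, ALERT_TABLE, and recommended_action are hypothetical and are not part of PRO Pack, Operations Manager, or SCVMM.

```python
from typing import NamedTuple, Optional


class AlertEntry(NamedTuple):
    severity: str
    action: str  # "Restrict" or "Restrict and Migrate", as printed in Table 2-1


# Subset of Table 2-1, keyed by Dell Event ID (illustrative, not exhaustive).
ALERT_TABLE = {
    1054: AlertEntry("Error", "Restrict and Migrate"),
    1353: AlertEntry("Warning", "Restrict"),
    2083: AlertEntry("Critical", "Restrict"),
    2122: AlertEntry("Warning", "Restrict"),
    2171: AlertEntry("Warning", "Restrict"),
    2246: AlertEntry("Warning", "Restrict"),
    2272: AlertEntry("Critical", "Restrict and Migrate"),
    2301: AlertEntry("Critical", "Restrict and Migrate"),
    2322: AlertEntry("Critical", "Restrict and Migrate"),
    2337: AlertEntry("Critical", "Restrict"),
}


def recommended_action(event_id: int) -> Optional[str]:
    """Return the recommended remedial action, or None if the ID is not in the subset."""
    entry = ALERT_TABLE.get(event_id)
    return entry.action if entry else None


print(recommended_action(2272))
```

Returning None for an unknown ID keeps the lookup from implying an action for events that this subset does not cover.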
3 Related Documentation and Resources

This chapter gives the details of documents and resources to help you work with PRO Pack 2.1.

Security Considerations

Operations Console access privileges are handled internally by SCOM/SCE. These privileges can be set up using the User Roles option under Administration > Security on the SCOM/SCE console.
• The Dell OpenManage Server Administrator Command Line Interface User's Guide documents the complete command line interface for Server Administrator, including an explanation of the command line interface (CLI) commands to view system status, access logs, create reports, configure various component parameters, and set critical thresholds.

For information on terms used in this document, see the Glossary at support.dell.com/manuals.