Dell Server PRO Management Pack 3.0.
Notes, Cautions, and Warnings
NOTE: A NOTE indicates important information that helps you make better use of your computer.
CAUTION: A CAUTION indicates either potential damage to hardware or loss of data and tells you how to avoid the problem.
WARNING: A WARNING indicates a potential for property damage, personal injury, or death.
© 2013 Dell Inc. All Rights Reserved.
Contents
1 Introduction
What's New in This Release
Overview
Related Terms
1 Introduction
This document is intended for system administrators who use the Dell Server PRO Management Pack (Dell PRO Pack) to monitor Dell systems and take remedial action when an inefficient system is identified. The Dell PRO Pack 3.0.
recommends remedial actions when monitored objects transition to an unhealthy state (for example, virtual disk failure or predictive drive error), by leveraging the monitoring and alerting capabilities of Operations Manager and remediation capabilities of VMM.
Figure 1. Interaction of Components
In the figure, a group of PowerEdge systems acts as the managed systems, and two PowerEdge systems act as management stations hosting Operations Manager and VMM. OMSA generates alerts with a corresponding severity when a monitored object transitions to an unhealthy state. Dell PRO Pack monitors the same alerts for PRO and maps each OMSA alert to its remedial action. The following table describes the sequence of events that occur in PRO Tip management.
The managed system for PRO Pack is a Virtual Machine Manager Server. For more information, see technet.microsoft.com/en-us/library/gg610649.aspx.
Management station: For the list of supported configurations of Operations Manager and VMM, see the following:
• Operations Manager 2012 R2, Operations Manager 2012 SP1, or Operations Manager 2012 - technet.microsoft.com/en-us/library/hh205990.aspx
• Operations Manager 2007 R2 - technet.microsoft.com/en-us/library/bb309428.
2 Using Dell Performance Resource Optimization Pack
This chapter suggests steps to use PRO Pack.
Planning The Environment For PRO Tips
You can plan for enabling the PRO Monitors that are relevant for the environment. By default, all the PRO Monitors are disabled in the Dell PRO Pack. For the list of alerts and the recovery actions, see Alerts and Recovery Actions. Select the alerts that you want to enable.
Alternatively, if you select the Show this window when new PRO Tips are created option in the PRO Tip window, the window automatically opens on the VMM console when a PRO Tip is generated. The PRO Tip window displays information such as the source, tip, and state of the PRO Tip in a tabular format. The window also displays a description of the problem that triggered the alert, the cause, and the suggested remedial action for recovery.
PRO Tip implementation of moving VMs can fail if no other healthy hosts are available in the host group or host cluster. In such a case, the PRO Tip window displays the state of the corresponding PRO Tip as Failed, and the reason is elaborated in the Error section. The status of the corresponding entry in the Jobs section on the VMM console is also displayed as Failed.
NOTE: In the PRO Tip window, the failure message is updated dynamically.
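The failure condition described above can be illustrated with a minimal sketch. This is a simplified model, not VMM's actual placement logic: the `Host` type, both function names, and the "Resolved" state label are assumptions introduced for illustration. Only the "Failed" state comes from this guide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    """Hypothetical stand-in for a host in a host group or host cluster."""
    name: str
    healthy: bool


def pick_migration_target(source: Host, group: List[Host]) -> Optional[Host]:
    """Return the first healthy host other than the source, or None."""
    for host in group:
        if host.name != source.name and host.healthy:
            return host
    return None


def implement_migration_tip(source: Host, group: List[Host]) -> str:
    """Return the PRO Tip state in this simplified model.

    "Failed" mirrors the state this guide describes when no other healthy
    host is available; "Resolved" is an assumed label for success.
    """
    target = pick_migration_target(source, group)
    return "Failed" if target is None else "Resolved"
```

In this model, a host group that contains only the unhealthy source host, or only other unhealthy hosts, always yields "Failed", which matches the behavior described for the PRO Tip window and the Jobs section.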
To access the Alert View:
1. Launch the Operations Manager console.
2. Select the Monitoring tab.
3. From Dell Server PRO Pack, select Dell Server PRO Alerts.
The alerts are displayed on the right side of the screen, as shown in the following figure.
State View
Displays the discovered Dell system objects in a tabular format. The State View displays objects with the name, path, storage health of the Dell system, and so on.
Overriding Recovery Actions
PRO Pack 3.0.1 supports two recovery actions. The following flag values trigger the respective recovery action:
• 1: For migration
• 2: For placing the server in restricted mode
You can override the default recovery action by changing the default recovery action flag value. For example, change the recovery flag value from 2 to 1 using the overrides option provided in the Operations Manager console.
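The flag semantics can be summarized in a short sketch. It is illustrative only: `RecoveryAction` and `resolve_recovery_action` are hypothetical names, not part of the PRO Pack or Operations Manager, and the actual override is applied through the Operations Manager console.

```python
from enum import IntEnum
from typing import Optional


class RecoveryAction(IntEnum):
    """Recovery action flag values as listed in this guide."""
    MIGRATE = 1   # migration
    RESTRICT = 2  # place the server in restricted mode


def resolve_recovery_action(default_flag: int,
                            override_flag: Optional[int] = None) -> RecoveryAction:
    """Return the effective action: the override if one is given, else the default.

    Raises ValueError for any flag value other than 1 or 2.
    """
    flag = default_flag if override_flag is None else override_flag
    return RecoveryAction(flag)
```

For example, `resolve_recovery_action(2, override_flag=1)` models the override from restricted mode to migration that the example above describes.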
Alerts and Recovery Actions
The following table lists the alerts and the recommended remedial actions:
Dell Event ID | Alert Description on Operations Manager and PRO Tip in VMM | Severity | Alert Cause | Dell PRO Tip Recommended Remedial Action
1053 | Temperature sensor detected a warning value | Warning | A temperature sensor on the backplane board, system board, CPU, or drive carrier in the specified system exceeded its warning threshold value. | Restrict
1353 | Power supply detected a warning | Warning | A power supply sensor reading in the specified system exceeded a definable warning threshold. | Restrict
1354 | Power supply detected a failure | Error | A power supply has been disconnected or has failed. | Restrict
1403 | Memory Device Status Warning | Warning | A memory device correction rate exceeded an acceptable value. |
2076 | Virtual Disk Check Consistency Failed | Critical | A physical disk included in the virtual disk failed or there is an error in the parity information. | Restrict and Migrate
2082 | Virtual Disk Rebuild Failure | Critical | A physical disk included in the virtual disk has failed or is corrupt. |
power supply may have failed.
2123 | Redundancy Lost | Warning | A virtual disk or an enclosure has lost data redundancy. |
2187 | Single-bit ECC error limit exceeded on the controller DIMM | Warning | The controller memory is malfunctioning. | Restrict and Migrate
2201 | A global hot spare failed | Warning | The controller is not able to communicate with a disk that is assigned as a global hot spare. The disk may have failed or been removed. |
device. The state of the device cannot be determined.
2268 | Storage Management communication Error | Critical | Storage Management has lost communication with a controller. This may occur if the controller driver or firmware is experiencing a problem. |
read or write operation.
2292 | Communication with the enclosure has been lost | Critical | The controller has lost communication with an enclosure management module (EMM). The cables may be loose or defective. | Restrict and Migrate
2293 | EMM (Enclosure Management Module) Failure | Error | The failure may be caused by a loss of power to the EMM. |
2310 | A virtual disk is permanently degraded | Critical | A redundant virtual disk has lost redundancy. This may occur when the virtual disk suffers the failure of more than one physical disk. | Restrict and Migrate
2312 | A power supply in the enclosure has an AC failure | Warning | The power supply has an AC failure. | Restrict
2313 | A power supply in the enclosure has a DC failure | Warning | The power supply has a DC failure. |
power supply unit or it is defective.
2324 | The AC power supply cable has been removed | Critical | The power cable may be pulled out or removed. The power cable may also have overheated and become warped and nonfunctional. |
2327 | The NVRAM has corrupted data. The controller is reinitializing the NVRAM | Warning | The NVRAM has corrupted data. | Restrict and Migrate
during a write operation | because the disk contains bad disk blocks that could not be reassigned. Data loss may have occurred.
2350 | There was an unrecoverable disk media error during the rebuild or recovery operation | Critical | The rebuild or recovery operation encountered an unrecoverable disk media error. |
2, 4 (Driver Name: b06bdrv, ebdrv, b57w2k, b57nd60x, b57nd60a, l2nd) | Dell OMNIC Broadcom Network Interface Link Down | Critical | The network link is down. | Restrict
13, 27, 29, 70 (Driver Name: e1express, e1qexpress, ixgbe, e1000) | Dell OMNIC Intel Network Interface Link Down | Critical | Link has been disconnected. |
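For scripting or reporting purposes, the event-to-action mapping in the table above can be held in a simple lookup. The sketch below is an assumption-labeled illustration, not a PRO Pack interface: `RECOMMENDED_ACTION` and `recommended_action` are hypothetical names, and only events whose remedial action is visible in the table above are included.

```python
from typing import Optional

# Event ID -> recommended remedial action, copied from the table above.
# Events whose action is not shown in the table are deliberately omitted.
RECOMMENDED_ACTION = {
    1053: "Restrict",
    1353: "Restrict",
    1354: "Restrict",
    2076: "Restrict and Migrate",
    2187: "Restrict and Migrate",
    2292: "Restrict and Migrate",
    2310: "Restrict and Migrate",
    2312: "Restrict",
    2327: "Restrict and Migrate",
}


def recommended_action(event_id: int) -> Optional[str]:
    """Return the listed remedial action, or None if the event is not in the map."""
    return RECOMMENDED_ACTION.get(event_id)
```

Returning None for unknown events, rather than guessing a default, keeps the lookup honest about which rows are actually documented.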
3 Related Documentation and Resources
This chapter gives the details of documents and resources to help you work with the PRO Pack 3.0.1.
Security Considerations
Operations Console access privileges are handled internally by Operations Manager. You can set this up using the User Roles option under Administration → Security on the Operations Manager console. The profile of the role assigned to you determines what actions you can perform and which objects you are able to manage.