HP ProLiant Servers Troubleshooting Guide
June 2006 (Fifth Edition)
Part Number 375445-005
© Copyright 2004-2006 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein. Microsoft, Windows, and Windows NT are U.S.
Contents
Introduction (page 10)
What's new (page 10)
Revision history (page 10)
375445-xx4 (May 2006)

Video problems (page 43)
Mouse and keyboard problems (page 44)
Audio problems (page 45)
Printer problems

Clustering software (page 61)
Diagnostic tools (page 61)
HP Insight Diagnostics (page 61)
Survey Utility

Teardown procedures, part numbers, specifications (page 72)
Technical topics (page 72)
Error messages (page 73)
ADU error messages

Drive Time-Out Occurred on Physical Drive Bay X (page 80)
Drive X Indicates Position Y (page 80)
Duplicate Write Memory Error (page 80)
Error Occurred Reading RIS Copy from SCSI Port X Drive ID

Swapped Cables or Configuration Error Detected. A Drive Rearrangement... (page 88)
Swapped Cables or Configuration Error Detected. An Unsupported Drive Arrangement Was Attempted... (page 88)
Swapped cables or configuration error detected. The cables appear to be interchanged... (page 88)
Swapped cables or configuration error detected. The configuration information on the attached drives... (page 89)
Swapped Cables or Configuration Error Detected.

System Power Supply Failure (Power Supply X) (page 126)
Unrecoverable Host Bus Data Parity Error... (page 126)
Uncorrectable Memory Error (Slot X, Memory Module Y)... (page 127)
HP BladeSystem infrastructure error codes
Introduction

In this section
What's new (page 10)
Revision history
• Updated contacting HP:
• Contacting HP technical support or an authorized reseller
• Server information you need

Getting started
NOTE: For common troubleshooting procedures, the term "server" is used to mean servers and server blades.
This guide provides common procedures and solutions for the many levels of troubleshooting a ProLiant server, from the most basic connector issues to complex software configuration problems.
Pre-diagnostic steps
WARNING: To avoid potential problems, ALWAYS read the warnings and cautionary information in the server documentation before removing, replacing, reseating, or modifying system components.
IMPORTANT: This guide provides information for multiple servers. Some information may not apply to the server you are troubleshooting. Refer to the server documentation for information on procedures, hardware options, software tools, and operating systems supported by the server.
1.
This symbol indicates that the component exceeds the recommended weight for one individual to handle safely.
[Symbol labels: weight in kg / weight in lb]
WARNING: To reduce the risk of personal injury or damage to the equipment, observe local occupational health and safety requirements and guidelines for manual material handling.
These symbols, on power supplies or systems, indicate that the equipment is supplied by multiple sources of power.
CAUTION: The server is designed to be electrically grounded (earthed). To ensure proper operation, plug the AC power cord into a properly grounded AC outlet only.
Common problem resolution

In this section
Loose connections (page 15)
Service notifications (page 15)
Updating firmware
Components for option firmware updates are also available from the HP Storage Products Software and Drivers website (http://www.hp.com/support/proliantstorage).
1. Find the most recent version of the component that you require. Components for controller firmware updates are available in offline and online formats.
2. Follow the instructions for installing the component on the server. These instructions are included with the CD and on the component website.
3.
Activity LED (1) | Online LED (2) | Fault LED (3) | Interpretation
On | Off | Off | Do not remove the drive. The drive is being accessed, but (1) it is not configured as part of an array; (2) it is a replacement drive and rebuild has not yet started; or (3) it is spinning up during the POST sequence.
Flashing | Flashing | Flashing | Do not remove the drive. Removing a drive may cause data loss in non-fault-tolerant configurations.
Online/activity LED (green) | Fault/UID LED (amber/blue) | Interpretation
Flashing irregularly | Amber, flashing regularly (1 Hz) | The drive is active, but a predictive failure alert has been received for this drive. Replace the drive as soon as possible.
Flashing irregularly | Off | The drive is active, and it is operating normally.
Off | Steadily amber | A critical fault condition has been identified for this drive, and the controller has placed it offline. Replace the drive as soon as possible.
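If you log drive LED observations in a script, the Online/activity and Fault/UID combinations in the table above can be captured as a simple lookup. The following is a minimal sketch, not an HP tool: the names DRIVE_LED_STATES, interpret_drive_leds, and needs_replacement are hypothetical, the LED states are assumed to be typed in by hand from what you observe, and only the three combinations listed in the table are encoded.

```python
# Hypothetical helper, not an HP utility. Keys are (Online/activity LED,
# Fault/UID LED) states in lowercase; only the table's three rows are encoded.
DRIVE_LED_STATES = {
    ("flashing irregularly", "amber, flashing regularly (1 hz)"): (
        "The drive is active, but a predictive failure alert has been "
        "received for this drive. Replace the drive as soon as possible."
    ),
    ("flashing irregularly", "off"): (
        "The drive is active, and it is operating normally."
    ),
    ("off", "steadily amber"): (
        "A critical fault condition has been identified for this drive, and "
        "the controller has placed it offline. Replace the drive as soon as "
        "possible."
    ),
}

UNKNOWN = "Combination not covered here; refer to the server documentation."


def interpret_drive_leds(online_activity: str, fault_uid: str) -> str:
    """Return the table's interpretation for an observed LED combination."""
    # Normalize case and surrounding spaces so "Off" and " off " match.
    key = (online_activity.strip().lower(), fault_uid.strip().lower())
    return DRIVE_LED_STATES.get(key, UNKNOWN)


def needs_replacement(online_activity: str, fault_uid: str) -> bool:
    """True when the interpretation tells you to replace the drive."""
    return "Replace the drive" in interpret_drive_leds(online_activity, fault_uid)
```

For example, interpret_drive_leds("Off", "Steadily amber") returns the critical-fault interpretation, and any combination not in the table falls back to the server documentation.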
Diagnostic flowcharts

In this section
Troubleshooting flowcharts (page 19)

Troubleshooting flowcharts
To effectively troubleshoot a problem, HP recommends that you start with the first flowchart in this section, "Start diagnosis flowchart (on page 20)," and follow the appropriate diagnostic path.
Start diagnosis flowchart Use the following flowchart to start the diagnostic process.
The General diagnosis flowchart provides a generic approach to troubleshooting. If you are unsure of the problem, or if the other flowcharts do not fix the problem, use the following flowchart.
Power-on problems flowchart
Server power-on problems flowchart
Symptoms:
• The server does not power on.
• The system power LED is off or amber.
• The external health LED is red or amber.
• The internal health LED is red or amber.
NOTE: For the location of server LEDs and information on their statuses, refer to the server documentation.
p-Class server blade power-on problems flowchart
Symptoms:
• The server does not power on.
• The system power LED is off or amber.
• The health LED is red or amber.
NOTE: For the location of server LEDs and information on their statuses, refer to the server documentation.
• Loose or faulty power cord
• Power source problem
• Power-on circuit problem
• Improperly seated component or interlock problem
• Faulty internal component
Diagnostic flowcharts 24
c-Class server blade power-on problems flowchart
Symptoms:
• The server does not power on.
• The system power LED is off or amber.
• The health LED is red or amber.
NOTE: For the location of server LEDs and information on their statuses, refer to the server documentation.
POST problems flowchart
Symptoms:
• Server does not complete POST
NOTE: The server has completed POST when the system attempts to access the boot device.
Server and p-Class server blade POST problems flowchart
c-Class server blade POST problems flowchart

Operating system boot problems flowchart
Symptoms:
• Server does not boot a previously installed OS
• Server does not boot SmartStart
Possible causes:
• Corrupted OS
• Hard drive subsystem problem
• Incorrect boot order setting in RBSU
There are two ways to use SmartStart when diagnosing OS boot problems on a server blade:
• Use iLO to remotely attach virtual devices to mount the SmartStart CD onto the server blade.
• Use a local I/O cable and drive to connect to the server blade, and then restart the server blade.
NOTE: For the location of server LEDs and information on their statuses, refer to the server documentation.
c-Class server blade fault indications flowchart
Hardware problems

In this section
Procedures for all ProLiant servers (page 32)
Power problems (page 32)
General hardware problems (page 33)
Internal system problems
UPS problems
UPS is not working properly
Action:
1. Be sure the UPS batteries are charged to the proper level for operation. See the UPS documentation for details.
2. Be sure the UPS power switch is in the On position. See the UPS documentation for the location of the switch.
3. Be sure the UPS software is updated to the latest version. Use the Power Management software located on the Power Management CD.
4.
2. Refer to the release notes included with the hardware to be sure the problem is not caused by a change to the hardware release. If no documentation is available, refer to the HP support website (http://www.hp.com/support).
3. Be sure the new hardware is installed properly. Refer to the device, server, and OS documentation to be sure all requirements are met.
• If the system boots and video is working, add each component back to the server one at a time, restarting the server after each component is added to determine if that component is the cause of the problem. When adding each component back to the server, be sure to disconnect power to the server and follow the guidelines and cautionary information in the server documentation.

Third-party device problems
Action:
1.
3. Be sure no loose connections (on page 15) exist.
4. Be sure the media from which you are attempting to boot is not damaged and is a bootable CD.
5. If attempting to boot from a USB CD-ROM drive:
• Refer to the operating system and server documentation to be sure both support booting from a USB CD-ROM drive.
• Be sure legacy support for a USB CD-ROM drive is enabled in RBSU.

Data read from the drive is inconsistent, or drive cannot read data
Action:
1. Clean the drive and media.
2.
Drive is not found
Action: Be sure no loose connections (on page 15) exist with the drive.

Non-system disk message is displayed
Action:
1. Remove the non-system diskette from the drive.
2. Check for and disconnect any non-bootable USB devices.

Diskette drive cannot write to a diskette
Action:
1. If the diskette is not formatted, format the diskette.
2. Be sure the diskette is not write protected. If it is, use another diskette or remove the write protection.
3.
Read/write issue
Action:
1. Run the Acceptance Test in HP StorageWorks Library and Tape Tools.
CAUTION: Running the Acceptance Test overwrites the tape. To avoid overwriting the tape, run the shorter Device Analysis Test instead.
2. Run the Media Validation Test in HP StorageWorks Library and Tape Tools.

Backup issue
Action:
1. Run the Acceptance Test in HP StorageWorks Library and Tape Tools.
CAUTION: Running the Acceptance Test overwrites the tape.
Hard drive problems
System completes POST but hard drive fails
Action:
1. Be sure no loose connections (on page 15) exist.
2. Be sure no device conflict exists.
3. Be sure the hard drive is properly cabled and terminated if necessary.
4. Be sure the hard drive data cable is working by replacing it with a known functional cable.
5. Run Insight Diagnostics ("HP Insight Diagnostics" on page 61) and replace failed components as indicated.

No hard drives are recognized
Action:
1.
1. Be sure the files are not corrupt. Run the repair utility for the operating system.
2. Be sure no viruses exist on the server. Run a current version of a virus scan utility.

Server response time is slower than usual
Action: Be sure the hard drive is not full, and increase the amount of free space on the hard drive, if needed. Hard drives should have a minimum of 15 percent free space.

Fan problems
General fan problems are occurring
Action:
1.
Memory problems
General memory problems are occurring
Action:
• Isolate and minimize the memory configuration.
• Be sure the memory meets the server requirements and is installed as required by the server. Some servers may require that memory banks be fully populated or that all memory within a memory bank be the same size, type, and speed. Refer to the server documentation to determine if the memory is installed properly.
• Check any server LEDs that correspond to memory slots.
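The bank rules described above (full population, and matching size, type, and speed within a bank) can be checked against an inventory before you start reseating or swapping modules. This is a hypothetical sketch: the Dimm record and the check_bank function are illustrative names, the inventory is assumed to come from your own notes, and whether a particular server requires fully populated banks must be confirmed in the server documentation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Dimm:
    """Hypothetical record for one installed memory module."""
    size_mb: int
    mem_type: str
    speed_mhz: int


def check_bank(slots: List[Optional[Dimm]], require_full: bool) -> List[str]:
    """Return the problems found in one memory bank (empty list = none).

    `slots` lists every slot in the bank; None marks an empty slot.
    `require_full` models servers that need fully populated banks.
    """
    problems: List[str] = []
    installed = [d for d in slots if d is not None]
    if require_full and len(installed) != len(slots):
        problems.append(
            f"bank not fully populated ({len(installed)}/{len(slots)} slots)"
        )
    # All modules in the bank must match in size, type, and speed.
    for field in ("size_mb", "mem_type", "speed_mhz"):
        values = {getattr(d, field) for d in installed}
        if len(values) > 1:
            problems.append(f"mixed {field}: {sorted(map(str, values))}")
    return problems
```

An empty result only means the bank passes these two rules; it does not replace the server documentation's full population guidelines.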
Server fails to recognize new memory
Action:
1. Be sure the memory is the correct type for the server and is installed according to the server requirements. Refer to the server documentation or HP website (http://www.hp.com).
2. Be sure you have not exceeded the memory limits of the server or operating system. Refer to the server documentation.
3. Be sure no Event List error messages (on page 124) are displayed in the IML.
4. Be sure the memory is properly seated.
5.
c. Replace the remaining processor with a known functional processor. If the problem is resolved after you restart the server, a fault exists with one or more of the original processors. Install each processor and its associated PPM (if applicable) one by one, restarting each time, to find the faulty processor or processors. Be sure the processor configurations at each step are compatible with the server requirements.
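The one-at-a-time procedure above is a linear isolation search: each candidate is added to a known-good configuration, and any candidate whose addition brings the problem back is flagged and left out. The sketch below assumes a hypothetical restart_ok callback that stands in for the manual power-down, install, and restart cycle, and it assumes the fault comes from individual parts rather than from interactions between them. Each intermediate configuration must still meet the server requirements, as stated above.

```python
from typing import Callable, List, Sequence


def isolate_faulty(
    candidates: Sequence[str],
    restart_ok: Callable[[List[str]], bool],
) -> List[str]:
    """Add candidates back one at a time; flag each one that brings the fault back.

    `restart_ok(installed)` is a hypothetical stand-in for powering down,
    installing exactly the parts in `installed` (on top of the known-good
    baseline), restarting, and returning True if the server starts normally.
    """
    installed: List[str] = []
    faulty: List[str] = []
    for part in candidates:
        trial = installed + [part]
        if restart_ok(trial):
            installed = trial      # part behaved; keep it installed
        else:
            faulty.append(part)    # part reproduced the fault; leave it out
    return faulty
```

Because a flagged part is left out before the next one is added, the search can report more than one faulty processor in a single pass.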
7. Be sure a video expansion board, such as a RILOE board, has not been added to replace onboard video, making it seem like the video is not working. Disconnect the video cable from the onboard video, and then reconnect it to the video jack on the expansion board.
NOTE: All servers automatically bypass onboard video when a video expansion board is present.
8. Press any key, or type the password, and wait a few moments for the screen to activate to be sure the power-on password feature is not in effect.
Audio problems
Action: Be sure the server speaker is connected. Refer to the server documentation.

Printer problems
Printer does not print
Action:
1. Be sure the printer is powered up and online.
2. Be sure no loose connections (on page 15) exist.
3. Be sure the correct printer drivers are installed.

Printer output is garbled
Action: Be sure the correct printer drivers are installed.

Local I/O cable problems
NOTE: The local I/O cable is used only with HP ProLiant p-Class server blades.
Data is displayed as garbled characters after the connection is established
Action:
1. Be sure both modems have the same settings, including speed, data bits, parity, and stop bits.
2. Be sure the software is set for the correct terminal emulation.
a. Reconfigure the software correctly.
b. Restart the server.
c. Run the communications software, checking settings and making corrections where needed.
d. Restart the server, and then reestablish the modem connection.
3. Be sure no line interference exists. Retry the connection by dialing the number several times. If conditions remain poor, contact the telephone company to have the line tested.
4. Be sure the modem is current and compliant with CCITT and Bell standards. Replace with a supported modem if needed.

You are unable to connect to an online subscription service
Action:
1. If the line you are accessing requires error control to be turned off, do so using the AT command AT&Q6%C0.
2.
2. Be sure the correct network driver is installed for the controller and that the driver file is not corrupted. Reinstall the driver.
3. Be sure no loose connections (on page 15) exist.
4. Be sure the network cable is working by replacing it with a known functional cable.
5. Check the PCI Hot Plug power LED to be sure the PCI slot is receiving power, if applicable.
6. Be sure the network controller is not damaged.
7.
Software problems

In this section
Operating system problems and resolutions (page 49)
Application software problems (page 52)
Remote ROM flash problems
If neither of these actions resolves the problem, contact an authorized service provider. For more information about debugging tools or specific GPF messages, refer to the Microsoft website (http://www.microsoft.com/whdc/devtools/debugging/default.mspx).

Errors are displayed in the error log
Action: Follow the information provided in the error log, and then refer to the operating system documentation.
3. Install the current drivers.
If you apply the update and have problems, refer to the Software and Drivers Download website (http://h18007.www1.hp.com/support/files/server) to find files to correct the problems.

Restoring to a backed-up version
If you recently upgraded the operating system or software and cannot resolve the problem, you can try restoring a previously saved version of the system. Before restoring the backup, make a backup of the current system.
• Linux: Refer to the operating system documentation for information.

Linux operating systems
For troubleshooting information specific to Linux operating systems, refer to the Linux for ProLiant website (http://h18000.www1.hp.com/products/servers/linux).

Application software problems
Software locks up
Action:
1. Check the application log and operating system log for entries indicating why the software failed.
2. Check for incompatibility with other software on the server.
3.
• One or more remote servers with system ROMs requiring upgrade
• An administrative user account on each target system. The administrative account must have the same username and password as the local administrative client system.
• All target systems are connected to the same network and use protocols that enable them to be seen from the administrative client.
• Each target system has a system partition that is at least 32 MB in size.
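Before starting a remote ROM flash, the requirements listed above can be checked for each target. The following is a minimal sketch under stated assumptions: the Target record, its field names, and the preflight function are hypothetical, and the values are assumed to be collected beforehand by other means. Only the 32 MB partition minimum, the matching-credentials rule, the network visibility rule, and the need for at least one target come from the list above.

```python
from dataclasses import dataclass
from typing import List

MIN_SYSTEM_PARTITION_MB = 32  # minimum stated in the requirements list


@dataclass
class Target:
    """Hypothetical inventory record for one target system."""
    name: str
    admin_user: str
    admin_password: str
    reachable: bool            # visible from the administrative client
    system_partition_mb: int


def preflight(targets: List[Target], client_user: str,
              client_password: str) -> List[str]:
    """Return one message per unmet prerequisite; an empty list means ready."""
    if not targets:
        return ["no target systems listed"]
    issues: List[str] = []
    for t in targets:
        # The admin account must match the client's username and password.
        if (t.admin_user, t.admin_password) != (client_user, client_password):
            issues.append(f"{t.name}: admin credentials differ from the client")
        if not t.reachable:
            issues.append(f"{t.name}: not visible from the administrative client")
        if t.system_partition_mb < MIN_SYSTEM_PARTITION_MB:
            issues.append(
                f"{t.name}: system partition is {t.system_partition_mb} MB, "
                f"needs at least {MIN_SYSTEM_PARTITION_MB} MB"
            )
    return issues
```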
Software tools and solutions

In this section
Configuration tools (page 54)
Management tools (page 58)
Diagnostic tools
• Installing software drivers directly from the CD. With systems that have an Internet connection, the SmartStart Autorun Menu provides access to a complete list of ProLiant system software.
• Enabling access to the Array Configuration Utility (on page 54), Array Diagnostic Utility (on page 63), and Erase Utility (on page 59)

SmartStart is included in the HP ProLiant Essentials Foundation Pack.
Auto-configuration process The auto-configuration process automatically runs when you boot the server for the first time. During the power-up sequence, the system ROM automatically configures the entire system without needing any intervention. During this process, the ORCA utility, in most cases, automatically configures the array to a default setting based on the number of drives connected to the server. NOTE: The server may not support all the following examples.
7. Press the Esc key to exit the current menu, or press the F10 key to exit RBSU.
For more information on online spare memory, refer to the white paper on the HP website (http://h18000.www1.hp.com/products/servers/technology/memoryprotection.html).

Option ROM Configuration for Arrays
Before installing an operating system, you can use the ORCA utility to create the first logical drive, assign RAID levels, and establish online spare configurations.
Management CD The Management CD contains the latest tools available for easily managing the server, such as HP SIM ("HP Systems Insight Manager" on page 59) and Management Agents (on page 59). Run the Management CD shipped with the server. Refer to the Management CD user documentation as well as the ProLiant Server Management website (http://www.hp.com/servers/manage).
• Access advanced troubleshooting features through the iLO and iLO 2 interface.
• Diagnose iLO and iLO 2 using HP SIM through a web browser and SNMP alerting.
For more information about iLO and iLO 2 features, refer to the iLO and iLO 2 documentation on the Documentation CD or on the HP website (http://www.hp.com/servers/lights-out).

Erase Utility
CAUTION: Perform a backup before running the System Erase Utility.
The Virtual Machine Management Pack provides the following functionality:
• Central management and control of VMware® and Microsoft® virtual machines with physical host to virtual machine association
• Easy identification of VMs or host servers reaching high CPU, memory, or disk utilization levels
• Highly flexible move capabilities that enable live moves and moves to dissimilar host resources
• Back up, template, and alternate host capabilities that enable restoration of VMs on any available host T
System Management homepage
To access the System Management homepage of a server, go to https://localhost:2381.

USB support
HP provides both standard USB support and legacy USB support. Standard support is provided by the operating system through the appropriate USB device drivers. HP provides support for USB devices before the operating system loads through legacy USB support, which is enabled by default in the system ROM. HP hardware supports USB version 1.1.
Smart Array SCSI Diagnosis feature
NOTE: This feature is only available in HP Insight Diagnostics Online Edition.
The HP Insight Diagnostics Online Edition ("HP Insight Diagnostics" on page 61) provides the capability to use non-intrusive, system-level checks to diagnose Smart Array SCSI hard drives. Diagnosis supports SCSI, SATA, and SAS hard drives that are attached to a Smart Array controller and configured as part of a logical volume.
Array Diagnostic Utility
ADU is a tool that collects information about array controllers and generates a list of detected problems. ADU can be accessed from the SmartStart CD ("SmartStart software" on page 54) or downloaded from the HP website (http://www.hp.com).

Remote support and analysis tools
HP Instant Support Enterprise Edition
ISEE, a feature of HP support, is a proactive remote monitoring and diagnostic tool that helps manage your systems and devices.
If you do not use the SmartStart CD to install an operating system, drivers for some of the new hardware are required. These drivers, as well as other option drivers, ROM images, and value-add software can be downloaded from the HP website (http://www.hp.com/support).
IMPORTANT: Always perform a backup before installing or updating device drivers.

Version control
The VCRM and VCA are Web-enabled Insight Management Agents tools that HP SIM uses to facilitate software update tasks.
Care Pack
HP Care Pack Services offer upgraded service levels to extend and expand standard product warranty with easy-to-buy, easy-to-use support packages that help you make the most of your server investments. Refer to the Care Pack website (http://www.hp.com/hps/carepack/servers/cp_proliant.html).

Firmware maintenance
HP has developed technologies that ensure that HP servers provide maximum uptime with minimal maintenance.
For additional information, refer to the HP Online ROM Flash User Guide on the HP website (http://h18023.www1.hp.com/support/files/server/us/romflash.html).

Option ROMs
Smart Components for option ROMs provide for efficient administration of option ROM upgrades. Types of option ROMs include:
• Array controller ROMs
• iLO ROMs
• RILOE II ROMs
• Hard drive ROMs
NOTE: Online ROM Flash components are not available for hard drive ROMs.
2. Shut down each server where the system or option ROM images are to be upgraded and reboot using the correct ROMPaq diskette for that server.
3. Follow the interactive session in the ROMPaq utility, which enables you to select the devices to be flashed.
4. After the ROMPaq utility flashes the ROM for the selected devices, cycle power manually to reboot the system back into the operating system.
4. Verify the firmware update by checking the version of the current firmware.
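When you check the version after an update, compare version numbers component by component rather than as text, because a plain string comparison ranks "1.9" above "1.10". The sketch below assumes dotted, purely numeric version strings; parse_version and update_verified are hypothetical helpers, and the sketch treats any installed version at least as new as the expected one as verified.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Split a dotted numeric version such as "1.92" into integers.

    Assumes every component is numeric; other formats need their own parser.
    """
    return tuple(int(part) for part in text.strip().split("."))


def update_verified(installed: str, expected: str) -> bool:
    """True when the installed version is at least the expected version."""
    a, b = parse_version(installed), parse_version(expected)
    width = max(len(a), len(b))
    # Pad with zeros so that "2.1" and "2.1.0" compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

Version strings that combine a ROM family code with a date would need a different parser before this comparison applies.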
HP resources for troubleshooting

In this section
Online resources (page 69)
General server resources
White papers
White papers are electronic documentation on complex technical topics. Some white papers contain in-depth details and procedures. Topics include HP products, HP technology, OS, networking products, and performance. Refer to one of the following websites:
• HP Business Support Center (http://www.hp.com/go/bizsupport)
• HP Industry Standard Server Technology Papers (http://h18004.www1.hp.com/products/servers/technology/whitepapers/index.
Management of the server
Refer to the HP Systems Insight Manager Help Guide on the Management CD or the HP website (http://www.hp.com/go/hpsim).

Operating system installation and configuration information (for factory-installed operating systems)
Refer to the factory-installed operating system installation documentation that ships with the server.

Operating system version support
Refer to the operating system support matrix (http://www.hp.com/go/supportos).
Server and option specifications, symbols, installation warnings, and notices
Refer to the server documentation and printed notices. Printed notices are available in the Reference Information pack. Server documentation is available in the following locations:
• Documentation CD that ships with the server
• HP Business Support Center website (http://www.hp.com/go/bizsupport)
• HP Technical Documentation website (http://www.docs.hp.
Error messages

In this section
ADU error messages (page 73)
POST error messages and beep codes (page 92)
Event list error messages (page 124)
HP BladeSystem infrastructure error codes
Accelerator Parity Write Errors: X
Description: Number of times that write memory parity errors were detected during transfers to memory on the array accelerator board.
Action: If many parity errors occurred, you may need to replace the array accelerator board.

Accelerator Status: Cache was Automatically Configured During Last Controller Reset
Description: Cache board was replaced with one of a different size.
Action: No action is required.

Accelerator Status: Data in the Cache was Lost...
...
Description: The number of cache lines experiencing excessive ECC errors has reached a preset limit. Therefore, the cache has been shut down.
Action:
1. Reseat the cache to the controller.
2. If the problem persists, replace the cache.

Accelerator Status: Obsolete Data Detected
Description: During reset initialization, obsolete data was found in the cache due to the drives being moved and written to by another controller.
Action: No action is required.
Accelerator Status: Valid Data Found at Reset
Description: Valid data was found in posted-write memory at reinitialization. Data will be flushed to disk.
Action: No error or data loss condition exists. No action is required.

Accelerator Status: Warranty Alert
Description: A catastrophic problem exists with the array accelerator board. Refer to other messages on the Diagnostics screen for the exact meaning of this message.
Action: Replace the array accelerator board.
Cache Has Been Disabled; Likely Caused By a Loose Pin on One of the RAM Chips
Description: Cache has been disabled due to a large number of ECC errors detected while testing the cache during POST. This is probably caused by a loose pin on one of the RAM chips.
Action: Try reseating the cache to the controller. If that does not work, replace the cache.

Configuration Signature is Zero
Description: ADU ("Array Diagnostic Utility" on page 63) detected that NVRAM contains a configuration signature of zero.
page 63) examines each physical drive and looks for drives that have been moved to a different drive bay.
Action: Look for messages indicating which drives have been moved. If no messages are displayed and drive swapping did not occur, run ACU ("Array Configuration Utility" on page 54) to configure the controller and run the server setup utility to configure NVRAM. Do not run either utility if you believe drive swapping has occurred.

Controller Reported POST Error.
4. If the problem persists, power down the system and replace the cable.
5. If the problem persists, power down the system and replace the drive.

Drive (Bay) X is a Replacement Drive
Description: This drive has been replaced. This message is displayed if a drive is replaced in a fault-tolerant logical volume.
Action: If the replacement was intentional, allow the drive to rebuild.
Drive Monitoring Features Are Unobtainable
Description: ADU ("Array Diagnostic Utility" on page 63) is unable to get monitoring and performance data, either because of a fatal command problem (such as a drive time-out) or because these features are not supported on the controller.
Action: Check for other errors such as time-outs. If no other errors occur, upgrade the firmware to a version that supports monitoring and performance features, if desired.
Identify Logical Drive Data did not Match with NVRAM
Description: The identify unit data from the array controller does not match the information stored in NVRAM. This can occur if new, previously configured drives have been placed in a system that has also been previously configured.
Action: Run the server setup utility to configure the controller and NVRAM.
Action: Check for drive failures, wrong drive replaced, or loose cable messages. If a drive failure occurred, replace the failed drive or drives, and then restore the data for this logical drive from the tape backup. Otherwise, follow the procedures for correcting problems when an incorrect drive is replaced or a loose cable is detected.

Logical Drive X Status = Interim Recovery (Volume Functional, but not Fault Tolerant)
Description: A physical drive in this logical drive has failed.
Logical Drive X Status = Wrong Drive Replaced
Description: A physical drive in this logical drive has failed, and the incorrect drive was replaced.
Action:
1. Power down the server.
2. Reinstall the drive that was removed by mistake.
3. Replace the original failed drive with a new drive.
CAUTION: Do not run the server setup utility and try to reconfigure, or data will be lost.
Other Controller Indicates Different Firmware Version
Description: The other controller in the redundant controller configuration is using a different firmware version.
Action: Be sure both controllers are using the same firmware revision.

Other Controller Indicates Different Cache Size
Description: The other controller in the redundant controller configuration has a different size array accelerator.
Action: Be sure both controllers are using array accelerators of the same capacity.
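Both redundant-controller messages above reduce to comparing one attribute across the two controllers. The following is a minimal Python sketch of that comparison. It assumes a hypothetical `ControllerInfo` record; the field names and version strings are illustrative and are not an ADU or HP data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControllerInfo:
    """Hypothetical summary of one controller in a redundant pair."""

    firmware_version: str
    cache_size_mb: int


def redundancy_mismatches(a: ControllerInfo, b: ControllerInfo) -> list[str]:
    """Return the messages that a mismatched controller pair would trigger."""
    messages = []
    if a.firmware_version != b.firmware_version:
        messages.append("Other Controller Indicates Different Firmware Version")
    if a.cache_size_mb != b.cache_size_mb:
        messages.append("Other Controller Indicates Different Cache Size")
    return messages


# Example: same cache size, different firmware revisions.
print(redundancy_mismatches(ControllerInfo("2.58", 128), ControllerInfo("2.50", 128)))
```

An empty list means the pair passes both checks. Each check is independent, so a pair that differs in both attributes reports both messages.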
5. If the error persists after completing steps 1 through 4, contact an HP authorized service provider.

SCSI Port X Drive ID Y Failed - REPLACE (failure message)
Description: ADU ("Array Diagnostic Utility" on page 63) detected a drive failure.
Action: Correct the condition that caused the error, if possible, or replace the drive.

SCSI Port X, Drive ID Y Firmware Needs Upgrading
Description: Drive firmware may cause problems and should be upgraded.
Description: A predictive failure warning has been generated for this hard drive, indicating that a drive failure is imminent.
Action: Replace this drive at the earliest opportunity. Refer to the server documentation for drive replacement information before performing this operation.

SCSI Port X, Drive ID Y...S.M.A.R.T. Predictive Failure Errors Have Been Detected in the Power Monitor and Performance Data...
...SOLUTION: Please replace this drive when conditions permit.
Description: A power supply in the external storage unit has failed.
Action: Replace the power supply.

Storage Enclosure on SCSI Bus X Indicated an Overheated Condition...
...SOLUTION: Make sure all cooling fans are operating properly. Also be sure the operating environment of storage enclosure is within temperature specifications.
Description: The external storage unit is generating a temperature alert.
Action:
1. Be sure all fans are connected and operating properly.
2.
Swapped cables or configuration error detected. A configured array of drives...
...was moved from another controller that supported more drives than this controller supports. SOLUTION: Upgrade the firmware on this controller. If this doesn’t solve the problem, then power down system and move the drives back to the original controller.
Description: You have exceeded the maximum number of drives supported by this controller, and the connected controller was not part of the original array configuration.
Swapped cables or configuration error detected. The configuration information on the attached drives...
...is not backward compatible with this controller’s firmware. SOLUTION: Upgrade the firmware on this controller. If this doesn’t solve the problem then power down system then move drives back to the original controller.
Description: The current firmware version on the controller cannot interpret the configuration information on the connected drives.
Description: ADU detected two different controller models installed in a redundant controller configuration. This configuration is not supported, and one or both controllers may not operate properly.
Action: Use the same controller model for redundant controller configurations.

This Controller Can See the Drives but the Other Controller Can't
Description: The other controller in the redundant controller configuration cannot recognize the drives, but this controller can.
Unsupported Processor Configuration (Processor Required in Slot #1)
Description: A processor is required in slot 1.
Action: Install a supported processor in slot 1. If slot 1 does not contain a supported processor, this message is displayed and the system halts.

Warning Bit Detected
Description: A monitor and performance threshold violation may have occurred. The status of a logical drive may not be OK.
Action: Check the other error messages for an indication of the problem.
WARNING: Storage Enclosure on SCSI Bus X Indicated it is Operating in Single Ended Mode...
...SOLUTION: This usually occurs when a single-ended drive type is inserted into an enclosure with other drive types; and that makes the entire enclosure operate in single ended mode. To maximize performance replace the single-ended drive with a type that matches the other drives.
Description: One or more single-ended SCSI drives are installed in an external storage unit that operates in LVD mode.
Non-numeric messages or beeps only

Advanced Memory Protection mode: Advanced ECC
Audible Beeps: None
Possible Cause: Advanced ECC support is enabled.
Action: None.

Advanced Memory Protection mode: Advanced ECC with hot-add support
Audible Beeps: None
Possible Cause: Advanced ECC with hot-add support is enabled.
Action: None.

Advanced Memory Protection mode: Online spare with Advanced ECC
...Xxxx MB System memory and xxxx MB memory reserved for Online Spare.
Critical Error Occurred Prior to this Power-Up
Audible Beeps: None
Possible Cause: A catastrophic system error, which caused the server to crash, has been logged.
Action: Run Insight Diagnostics ("HP Insight Diagnostics" on page 61) and replace failed components as indicated.

Fan Solution Not Fully Redundant
Audible Beeps:
Possible Cause: The minimum number of required fans is installed, but some redundant fans are missing or have failed.
Action: Install fans or replace failed fans to complete redundancy.
Fatal Hub Link Error
Audible Beeps: None
Possible Cause: The hub link interface has experienced a critical failure that caused an NMI.
Action: Run Insight Diagnostics ("HP Insight Diagnostics" on page 61) and replace failed components as indicated.

FATAL ROM ERROR: The System ROM is not Properly Programmed.
Audible Beeps: 1 long, 1 short
Possible Cause: The System ROM is not properly programmed.
Action: Replace the physical ROM part.

Fibre Channel Mezzanine/Balcony Not Supported.
Invalid memory types were found on the same node. Please check DIMM compatibility. - Some DIMMs may not be used
Description: Invalid or mixed memory types were detected during POST.
Action: Use only supported DIMM pairs when populating memory sockets. Refer to the memory requirements in the applicable server user guide.

Invalid Password - System Halted!
Audible Beeps: None
Possible Cause: An invalid password was entered.
Action: Enter a valid password to access the system.
NMI - Undetermined Source
Audible Beeps: None
Possible Cause: An NMI event has occurred.
Action: Reboot the server.

Node Interleaving disabled - Invalid memory configuration
Description: Each node must have the same memory configuration to enable interleaving.
Action: Populate each node with the same memory configuration and enable interleaving in RBSU.

No Floppy Drive Present
Audible Beeps: None
Possible Cause: No diskette drive is installed, or a diskette drive failure has occurred.
Action:
1.
Power Fault Detected in Hot-Plug PCI Slot x
Audible Beeps: 2 short
Possible Cause: The PCI-X Hot Plug expansion slot was not powered up properly.
Action: Reboot the server.

Power Supply Solution Not Fully Redundant
Audible Beeps: None
Possible Cause: The minimum power supply requirement is installed, but a redundant power supply is missing or has failed.
Action: Do one of the following:
• Install a power supply.
• Replace failed power supplies to complete redundancy.
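The "Not Fully Redundant" messages for fans and power supplies describe the same three-way distinction: below the minimum, at the minimum but missing redundant units, or fully redundant. The following minimal Python sketch shows that classification. The function name and the idea of passing unit counts are assumptions for illustration. A real server determines its required and redundant counts from its own configuration.

```python
def redundancy_status(required: int, working: int, redundant: int) -> str:
    """Classify a fan or power-supply configuration (hypothetical helper).

    required:  minimum number of units the server needs to run
    working:   units installed and operating
    redundant: extra units that a fully redundant configuration adds
    """
    if working < required:
        # Below the minimum requirement.
        return "insufficient"
    if working < required + redundant:
        # Minimum met, but redundant units are missing or have failed.
        return "not fully redundant"
    return "fully redundant"


# Example: two power supplies required, one redundant supply expected, two working.
print(redundancy_status(required=2, working=2, redundant=1))
```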
Temperature violation detected - system Shutting Down in x seconds
Audible Beeps: 1 long, 1 short
Possible Cause: The system has reached a cautionary temperature level and is shutting down in x seconds.
Action: Adjust the ambient temperature, install fans, or replace any failed fans.

There must be a first DIMM in pair if second DIMM in pair is populated. Second DIMM in pair ignored.
Description: The first DIMM socket in the pair is not populated. The second DIMM in the pair is not recognized or used.
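The DIMM pair rule above means that a second DIMM counts only when the first socket of its pair is populated. The following minimal Python sketch applies that rule. It models each pair as a hypothetical `(first_populated, second_populated)` tuple and is not an HP tool.

```python
def usable_dimms(pairs: list[tuple[bool, bool]]) -> list[tuple[bool, bool]]:
    """Return the (first, second) population that POST would actually use.

    A populated second socket is ignored when the first socket of the
    same pair is empty, as the message above describes.
    """
    return [(first, first and second) for first, second in pairs]


# Example: the second pair has only its second socket populated, so that DIMM is ignored.
print(usable_dimms([(True, True), (False, True), (True, False)]))
```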
1. Enable OBDR 2. Exit
Audible Beeps: None
Possible Cause: A USB tape device that supports One Button Disaster Recovery (OBDR) is installed in the system.
Action:
1. Press 1 or 2.
• Pressing 2 exits the configuration.
• Pressing 1 starts the configuration. The following message appears: Attempting to enable OBDR for the attached USB tape drive...
2. Observe the configuration progress. The following error may appear: Error - USB tape drive not in Disaster Recovery mode.
3.
100 Series

101-I/O ROM Error
Audible Beeps: None
Possible Cause: The option ROM on a PCI, PCI-X, or PCI Express device is corrupt.
Action: If the device is removable, remove the device and verify that the message disappears. Update the option ROM for the failed device.

101-Option ROM Checksum Error...
...An add-in card in your system is not working correctly. If you have recently added new hardware, remove it and see if the problem remains.
-System Board Failure, DMA Test Failed
Audible Beeps: None
Possible Cause: A failure occurred in the 8237 DMA controllers, 8254 timers, or similar devices.
CAUTION: Only authorized technicians trained by HP should attempt to remove the system board. If you believe the system board requires replacement, contact HP Technical Support before proceeding.
Action: Contact an authorized service provider for system board replacement.
180-Log Reinitialized
Audible Beeps: None
Possible Cause: The IML has been reinitialized due to corruption of the log.
Action: Event message; no action is required.

200 Series

201-Memory Error
Audible Beeps: None
Possible Cause: Memory failure detected.
Action: Run Insight Diagnostics ("HP Insight Diagnostics" on page 61) and replace failed components as indicated.

203-Memory Address Error
Audible Beeps: None
Possible Cause: Memory failure detected.
207-Invalid Memory Configuration - Incomplete Bank Detected in Bank X
Audible Beeps: 1 long, 1 short
Possible Cause: The bank is missing one or more DIMMs.
Action: Fully populate the memory bank.

207-Invalid Memory Configuration - Insufficient Timings on DIMM
Audible Beeps: 1 long, 1 short
Possible Cause: The installed memory module is not supported.
Action: Install a memory module of a supported type.
207-Invalid Memory Configuration - Unsupported DIMM in Socket X
Audible Beeps: 1 long, 1 short
Possible Cause: Unregistered DIMMs or insufficient DIMM timings.
Action: Install registered ECC DIMMs.

207-Memory Configuration Warning - DIMM In Socket x does not have Primary Width of 4 and only supports standard ECC
Advanced ECC does not function when mixing DIMMs with Primary Widths of x4 and x8.
Audible Beeps: 1 long, 1 short, or none
Possible Cause: Installed DIMMs have a primary width of x8.
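The 207 warning implies a simple rule: any x8 DIMM limits the system to standard ECC, whether it is installed alone or mixed with x4 DIMMs. The following minimal Python sketch expresses that rule. It assumes a hypothetical list of primary widths (4 or 8), one per installed DIMM. It also assumes that every other Advanced ECC requirement in the server user guide is already met.

```python
def ecc_mode(primary_widths: list[int]) -> str:
    """Return the ECC mode a DIMM set supports, per the 207 warning (sketch).

    primary_widths: primary data width of each installed DIMM (4 = x4, 8 = x8).
    """
    if not primary_widths:
        raise ValueError("no DIMMs installed")
    if all(width == 4 for width in primary_widths):
        # All x4: Advanced ECC can function (other requirements assumed met).
        return "Advanced ECC"
    # Any x8 DIMM, alone or mixed with x4, supports only standard ECC.
    return "standard ECC"


print(ecc_mode([4, 4, 8, 8]))
```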
210-Memory Board Power Fault on board X
Audible Beeps: 1 long, 1 short
Possible Cause: A memory board did not power up properly.
Action: Exchange DIMMs and retest. Replace the memory board if the problem persists.

210-Memory Board Failure on board X
Audible Beeps: 1 long, 1 short
Possible Cause: A memory board did not power up properly.
Action: Exchange DIMMs and retest. Replace the memory board if the problem persists.
303-Keyboard Controller Error
Audible Beeps: None
Possible Cause: A system board, keyboard, or mouse controller failure occurred.
Action:
1. Be sure the keyboard and mouse are connected.
CAUTION: Only authorized technicians trained by HP should attempt to remove the system board. If you believe the system board requires replacement, contact HP Technical Support before proceeding.
2. Run Insight Diagnostics ("HP Insight Diagnostics" on page 61) and replace failed components as indicated.
600 Series

601-Diskette Controller Error
Audible Beeps: None
Possible Cause: A diskette controller circuitry failure occurred.
Action:
1. Be sure the diskette drive cables are connected.
2. Replace the diskette drive, the cable, or both.
3. Run Insight Diagnostics ("HP Insight Diagnostics" on page 61) and replace failed components as indicated.

602-Diskette Boot Record Error
Audible Beeps: None
Possible Cause: The boot sector on the boot disk is corrupt.
Action:
1.
2. Run Insight Diagnostics ("HP Insight Diagnostics" on page 61) and replace failed components as indicated.

1100 Series

1151-Com Port 1 Address Assignment Conflict
Audible Beeps: 2 short
Possible Cause: Both external and internal serial ports are assigned to COM X.
Action: Run the server setup utility and correct the configuration.

1600 Series

1609 - The server may have a failed system battery. Some...
...configuration settings may have been lost and restored to defaults.
1611-CPU Zone Fan Assembly Failure Detected. Single fan...
...failure. Assembly will provide adequate cooling.
Audible Beeps: None
Possible Cause: A required fan is not spinning.
Action: Replace the failed fan to provide redundancy, if applicable.

1611-Fan Failure Detected
Audible Beeps: 2 short
Possible Cause: A required fan is not installed or not spinning.
Action:
1. Check the fans to be sure they are working.
2. Be sure each fan cable is properly connected and each fan is properly seated.
3.
1611-Fan x Not Present (Fan Zone I/O)
Audible Beeps: 2 short
Possible Cause: A required fan is not installed or not spinning.
Action:
1. Check the fans to be sure they are working.
2. Be sure each fan cable is properly connected, if applicable, and each fan is properly seated.
3. If the problem persists, replace the failed fans.

1611-Power Supply Zone Fan Assembly Failure Detected. Either...
...the Assembly is not installed or multiple fans have failed.
1615-Power Supply Configuration Error
Audible Beeps: None
Possible Cause: The server configuration requires an additional power supply. A moving bar is displayed, indicating that the system is waiting for another power supply to be installed.
Action: Install the additional power supply.

1615-Power Supply Configuration Error - A working power supply must be installed in Bay 1 for proper cooling.
Audible Beeps: None
Possible Cause: The array accelerator module is too small for the configured array; it should be upgraded to a larger size.
Action: Migrate logical drives to RAID 0 or RAID 1, reduce the number of drives in the array, or upgrade to a larger array accelerator module.

1713-Slot Z Drive Array Controller - Redundant ROM Reprogramming Failure...
...Replace the controller if this error persists after restarting system.
Audible Beeps: None
Possible Cause: The flash ROM is failing.
1720-S.M.A.R.T. Hard Drive Detects Imminent Failure
Audible Beeps: None
Possible Cause: A SMART predictive failure condition has been detected on a hard drive. The drive may fail at some time in the future.
Action:
• If the drive is in a fault-tolerant (non-RAID 0) array, replace the failing or failed drive. Refer to the server documentation.
• If the drive is in a RAID 0 array or a non-RAID setup, back up the drive or drives, replace the drive, and restore the system.
1727-Slot X Drive Array - New Logical Drive(s) Attachment Detected...
...If more than 32 logical drives, this message will be followed by: “Auto-configuration failed: Too many logical drives.”
Audible Beeps: None
Possible Cause: The controller has detected an additional array of drives that was connected while the power was off. The logical drive configuration information has been updated to add the new logical drives. The maximum number of logical drives supported is 32.
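The 1727 message implies a simple limit check: the existing logical drives plus the newly attached ones must not exceed 32. The following minimal Python sketch shows that check. The function name and return strings other than the quoted failure message are hypothetical.

```python
MAX_LOGICAL_DRIVES = 32  # limit stated in the 1727 message


def auto_configure(existing: int, new: int) -> str:
    """Sketch of the logical drive limit check (hypothetical helper)."""
    total = existing + new
    if total > MAX_LOGICAL_DRIVES:
        return "Auto-configuration failed: Too many logical drives."
    return f"{total} logical drives configured"


print(auto_configure(existing=30, new=3))
```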
Expansion will resume when automatic data recovery has been completed.
Audible Beeps: None
Possible Cause: The capacity expansion process has been temporarily disabled.
Action: Follow the instructions displayed onscreen to resume the capacity expansion process.

1753-Slot Z Drive Array - Array Controller Maximum Operating Temperature Exceeded During Previous Power Up
Audible Beeps: None
Possible Cause: The controller is overheating.
1774-Slot X Drive Array - Obsolete Data Found in Array Accelerator
Audible Beeps: None
Possible Cause: Drives were used on another controller and reconnected to the original controller while data was in the original controller cache. Data found in the array accelerator is older than data found on the drives and has been automatically discarded.
Action: Check the file system to determine whether any data has been lost.

1775-Slot X Drive Array - ProLiant Storage System Not Responding SCSI Port Y: ...
1776-Drive Array Reports Improper SCSI Port 1 Cabling
Audible Beeps: None
Possible Cause:
• The integrated array enabler board failed.
• The I/O board, drive backplane fan board, or drive backplane failed.
Action:
1. Replace the integrated array enabler board.
2. Update the integrated Smart Array option to the latest firmware version ("Firmware maintenance" on page 65).
CAUTION: Only authorized technicians trained by HP should attempt to remove the I/O board.
1779-Slot X Drive Array - Replacement drive(s) detected OR previously failed drive(s) now operational:...
...Port Y: SCSI ID Z: Restore data from backup if replacement drive X has been installed.
Audible Beeps: None
Possible Cause: More drives failed (or were replaced) than the fault-tolerance level allows, so the array cannot be rebuilt. If drives have not been replaced, this message indicates an intermittent drive failure.
2. Be sure all drives are fully seated.
3. Replace defective cables, drive X, or both.

1785-Slot X Drive Array Not Configured... (followed by one of the following):
...(1) Run Array Configuration Utility (2) No drives detected (3) Drive positions appear to have changed – Run Drive Array Advanced Diagnostics if previous positions are unknown. Then turn system power OFF and move drives to their original positions.
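The 1779 cause depends on whether the number of failed drives exceeds the array's fault-tolerance level. The following minimal Python sketch shows that comparison for three simple cases. It uses the general RAID properties of RAID 0 (no tolerance), RAID 5 (one drive), and RAID 6/ADG (two drives). RAID 1 and 1+0 are left out on purpose, because their tolerance depends on which drives fail. The table and function names are illustrative only.

```python
# Number of failed drives each level survives. RAID 1/1+0 are omitted because
# their tolerance depends on which specific drives fail.
FAULT_TOLERANCE = {"RAID 0": 0, "RAID 5": 1, "RAID 6 (ADG)": 2}


def can_rebuild(raid_level: str, failed_drives: int) -> bool:
    """True if the array still has enough data to rebuild the failed drives."""
    return failed_drives <= FAULT_TOLERANCE[raid_level]


print(can_rebuild("RAID 5", 2))
```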
1786-Slot 1 Drive Array Recovery Needed. Automatic Data Recovery Previously Aborted!...
...The following SCSI drive(s) need Automatic Data Recovery: SCSI Port Y: SCSI ID Z Select F1 to retry Automatic Data Recovery to drive. Select F2 to continue without starting Automatic Data Recovery.
Audible Beeps: None
Possible Cause: The system is in Interim Recovery Mode, and a failed or replacement drive has not yet been rebuilt.
a. Repair the connection and press the F2 key.
b. If the problem persists, run ADU ("Array Diagnostic Utility" on page 63) to resolve it.
• Be sure the cable is routed properly.

1789-Slot X Drive Array SCSI Drive(s) Not Responding...
...Check cables or replace the following SCSI drives: SCSI Port Y: SCSI ID Z Select F1 to continue – drive array will remain disabled. Select F2 to fail drives that are not responding – Interim Recovery Mode will be enabled if configured for fault tolerance.
1794-Drive Array - Array Accelerator Battery Charge Low...
...Array Accelerator is temporarily disabled. Array Accelerator will be re-enabled when battery reaches full charge.
Audible Beeps: None
Possible Cause: The battery charge is below 75 percent. Posted writes are disabled.
Action: Replace the array accelerator board if the batteries do not recharge within 36 powered-on hours.

1795-Drive Array - Array Accelerator Configuration Error...
...Data does not correspond to this drive array.
Audible Beeps: None
Possible Cause: One or more logical drives failed due to loss of data in posted-write memory.
Action:
• Press the F1 key to continue with the logical drives disabled.
• Press the F2 key to accept data loss and re-enable logical drives. After pressing the F2 key, check the integrity of the file system and restore lost data from backup.
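The 1794 battery message describes two thresholds. Posted writes are disabled when the charge drops below 75 percent, and they are re-enabled only when the battery reaches full charge. The board should be replaced if the battery has not recharged within 36 powered-on hours. The following minimal Python sketch models that behavior. It assumes that "recharged" means full charge, and it treats the band between 75 and 100 percent as keeping the previous state. Both function names are hypothetical.

```python
def posted_writes_enabled(currently_enabled: bool, charge_percent: float) -> bool:
    """Next posted-write state for a given battery charge (sketch)."""
    if charge_percent < 75:
        return False  # below 75 percent: posted writes are disabled
    if charge_percent >= 100:
        return True  # re-enabled only when the battery reaches full charge
    return currently_enabled  # between thresholds: keep the previous state


def should_replace_board(charge_percent: float, powered_on_hours: float) -> bool:
    """True if the battery has not recharged within 36 powered-on hours."""
    return charge_percent < 100 and powered_on_hours >= 36


print(posted_writes_enabled(False, 90), should_replace_board(90, 40))
```

Keeping the previous state between the two thresholds stops posted writes from switching on and off repeatedly while the battery charges.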
Automatic operating system shutdown initiated due to fan failure
Event Type: Fan failure
Action: Replace the fan.

Automatic Operating System Shutdown Initiated Due to Overheat Condition...
...Fatal Exception (Number X, Cause)
Event Type: Overheating condition
Action: Check the fans. Also, be sure the server is properly ventilated and the room temperature is within the required range.

Blue Screen Trap: Cause [NT]...
...
Real-Time Clock Battery Failing
Event Type: System configuration battery low
Action: Replace the system configuration battery.

System AC Power Overload (Power Supply X)
Event Type: Power supply overload
Action:
1. Switch the voltage from 110 V to 220 V, or add an additional power supply (if applicable to the system).
2. If the problem persists, remove some of the installed options.

System AC Power Problem (Power Supply X)
Event Type: AC voltage problem
Action: Check for any power source problems.
Event Type: Host bus error
CAUTION: Only authorized technicians trained by HP should attempt to remove the system board. If you believe the system board requires replacement, contact HP Technical Support before proceeding.
Action: Replace the board on which the processor is installed.

Uncorrectable Memory Error (Slot X, Memory Module Y)...
...Uncorrectable Memory Error (System Memory) Uncorrectable Memory Error (Memory Module Unknown)
Event Type: Uncorrectable error
Action: Replace the memory module.
1. Press the server blade management module reset button.
2. Replace the server blade management module.

Server blade management module signal backplane error codes
LED code: 10-1, 10-2, or 10-3
Location: Server blade management backplane
Action: Perform the following steps to resolve the problem. Stop when the problem is resolved.
1. Press the server blade management module reset button.
2. Replace the signal backplane.
Interconnect B Error Code
LED code: 14-1, 14-2, 14-3, or 14-4
Location: Interconnect device - side B
Action: Perform the following steps to resolve the problem. Stop when the problem is resolved.
1. Press the server blade management module reset button.
2. Reseat the interconnect device. For more information, refer to the HP BladeSystem Maintenance and Service Guide on the HP website (http://www.hp.com/products/servers/proliant-bl/p-class/info).
3. Replace the interconnect device.
For more information, refer to the HP BladeSystem Maintenance and Service Guide on the HP website (http://www.hp.com/products/servers/proliant-bl/p-class/info).
3. Replace the interconnect module. For more information, refer to the HP BladeSystem Maintenance and Service Guide on the HP website (http://www.hp.com/products/servers/proliant-bl/p-class/info).
Power management module board error codes
LED code: 7-1, 7-2, 7-3, 7-4, 7-5, 7-6, 7-7, 7-8, 7-9, 7-10, 7-11, 7-12, or 7-13
Location: Power management board
Action: Perform the following steps to resolve the problem. Stop when the problem is resolved.
1. Reseat the power management module.
2. Replace the power management module.
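The blade LED codes in this section follow a major-minor pattern, and the major number identifies the component to check. The following minimal Python sketch maps a code string such as "14-2" to the location listed for it here. It covers only the 7-x, 10-x, and 14-x ranges shown in this section. The function name and the "unknown" fallback are hypothetical.

```python
def led_code_location(code: str) -> str:
    """Map a blade LED code such as "14-2" to the location listed in this section."""
    major_text, minor_text = code.split("-")
    major, minor = int(major_text), int(minor_text)
    # major code -> (highest minor code listed, location)
    table = {
        7: (13, "Power management board"),
        10: (3, "Server blade management backplane"),
        14: (4, "Interconnect device - side B"),
    }
    if major in table and 1 <= minor <= table[major][0]:
        return table[major][1]
    return "unknown (not listed in this section)"


print(led_code_location("10-2"))
```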
IMPORTANT: Reboot the server after completing each numbered step. If the error condition continues, proceed with the next step.
To troubleshoot processor-related error codes:
1. Bring the server to base configuration by removing all components that are not required by the server to complete POST.
3. Reseat the remaining memory boards, rebooting after each installation to isolate any failed memory boards, if applicable.
4. Replace the DIMMs with a remaining bank of memory.
5. Replace the memory board, if applicable.
6. Replace the system board.
IMPORTANT: If replacing the system board or clearing NVRAM, you must re-enter the server serial number through RBSU ("Re-entering the server serial number and product ID" on page 57).
IMPORTANT: Processor socket 1 and PPM slot 1 must be populated at all times, or the server does not function properly.
• PPMs, except the PPM installed in slot 1
• DIMMs, except the first bank
• Hard drives
• Peripheral devices
2. Install each remaining system component, rebooting between each installation to isolate any failed components.
3. Clear the system NVRAM.
4. Replace the system board.
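Step 2 above is an incremental isolation loop. Starting from the base configuration, components are added back one at a time with a reboot after each, until the error returns. The following minimal Python sketch shows that loop. The `posts_ok` callback is a hypothetical stand-in for "reboot and check whether the error recurs". Component names are plain strings for illustration.

```python
from typing import Callable, Optional


def isolate_failed_component(
    base_config: list[str],
    remaining: list[str],
    posts_ok: Callable[[list[str]], bool],
) -> Optional[str]:
    """Add components back one at a time, checking POST after each (sketch).

    posts_ok: hypothetical stand-in for rebooting with the given components
    installed; returns True when POST completes without the error.
    Returns the first component whose installation brings the error back,
    or None if every component installs cleanly.
    """
    installed = list(base_config)
    for component in remaining:
        installed.append(component)
        if not posts_ok(installed):
            return component
    return None


# Example with a simulated faulty DIMM bank.
faulty = "DIMM bank 2"
print(isolate_failed_component(
    ["Processor 1", "PPM 1", "DIMM bank 1"],
    ["Hard drive", "DIMM bank 2", "NIC"],
    lambda config: faulty not in config,
))
```

If the error already appears at the base configuration, the loop cannot isolate a component. That case points to steps 3 and 4 (clearing NVRAM and replacing the system board).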
Description: The system encountered an NMI prior to this boot. The NMI source was: Uncorrectable cache memory error.
Action: Replace the processor.

Insight Diagnostics processor error codes

MSG_CPU_RR_1
Event type: Unable to divide and multiply with zero and infinity.
Action:
• Ensure proper ventilation and cooling for the server.
• Ensure the processor heatsinks are attached correctly (do not remove them).
• Check diagnostics and the Integrated Management Log for heat-related events.
MSG_CPU_RR_7
Event type: CPU speed is out of range.
Action: Replace the processor.

MSG_CPU_RR_8
Event type: Unable to update the CMOS time.
Action: Replace the board that CMOS is on.

MSG_CPU_RR_9
Event type: MMX hardware is not present.
Action: Replace the processor.

MSG_CPU_RR_10
Event type: MMX add instruction has failed.
Action: Replace the processor.

MSG_CPU_RR_11
Event type: MMX subtraction instruction has failed.
Action: Replace the processor.
Action: Replace the processor.

MSG_CPU_RR_17
Event type: Stress integer math test has failed.
Action:
• Ensure proper ventilation and cooling for the server.
• Ensure the processor heatsinks are attached correctly (do not remove them).
• Check diagnostics and the Integrated Management Log for heat-related events.
• Upgrade to the latest versions of system BIOS and Insight Diagnostics.
• Replace the processor.
Contacting HP
In this section
Contacting HP technical support or an authorized reseller 138
Customer self repair 138
Server information you need 139
Operating system information you need
Server information you need
Before contacting HP technical support, collect the following information:
• Explanation of the issue, the first occurrence, and frequency
• Any changes in hardware or software configuration before the issue surfaced
• Third-party hardware information:
    • Product name, model, and version
    • Company name
• Specific hardware configuration:
    • Product name, model, and serial number
    • Number of processors and speed
    • Number of DIMMs and their size and speed
    • Lis
• An updated Emergency Repair Diskette
• If HP drivers are installed:
    • Version of the PSP used
    • List of drivers from the PSP
• The drive subsystem and file system information:
    • Number and size of partitions and logical drives
    • File system on each logical drive
• Current level of Microsoft® Windows® Service Packs and Hotfixes installed
• A list of each third-party hardware component installed, with the firmware revision
• A list of each third-party software component installed, with the
Novell NetWare operating systems
Collect the following information:
• Whether the operating system was factory installed
• Operating system version number
• Printouts or electronic copies (to e-mail to a support technician) of AUTOEXEC.NCF, STARTUP.NCF, and the system directory
• A list of the modules. Use CONLOG.NLM to identify the modules and to check whether errors occur when the modules attempt to load.
• If management agents are installed, version number of the agents
• System dumps, if they can be obtained (in case of panics)
• A list of each third-party hardware component installed, with the firmware revisions
• A list of each third-party software component installed, with the versions
• A detailed description of the problem and any associated error messages

IBM OS/2 operating systems
Collect the following information:
• Operating system version number and printouts or electronic
    • DU number
    • List of drivers in the DU diskette
• The drive subsystem and file system information:
    • Number and size of partitions and logical drives
    • File system on each logical drive
• A list of all third-party hardware and software installed, with versions
• A detailed description of the problem and any associated error messages
• Printouts or electronic copies (to e-mail to a support technician) of:
    • /usr/sbin/crash (accesses the crash dump image at /var/crash/$hostname)
    • /var/adm/m
Acronyms and abbreviations
ACPI Advanced Configuration and Power Interface
ACU Array Configuration Utility
ADG Advanced Data Guarding (also known as RAID 6)
ADU Array Diagnostics Utility
CCITT International Telegraph and Telephone Consultative Committee
CS cable select
DMA direct memory access
DU driver update
EFS Extended Feature Supplement
EULA end user license agreement
FC Fibre Channel
HTTP hypertext transfer protocol
IDE integrated device electronics
iLO Integrated Lights-Out
IMD Integrated Management Display
IML Integrated Management Log
IP Internet Protocol
ISEE Instant Support Enterprise Edition
ISP Internet service provider
KVM keyboard, video, and mouse
LED light-emitting diode
LVD low-voltage differential
MMX multimedia extensions
NMI non-maskable interrupt
NVRAM non-volatile memory
OBDR One Button Disaster Recovery
ORCA Option ROM Configuration for Arrays
OS operating system
POST Power-On Self Test
PPM processor power module
PSP ProLiant Support Pack
RBSU ROM-Based Setup Utility
RILOE Remote Insight Lights-Out Edition
RILOE II Remote Insight Lights-Out Edition II
RIS reserve information sector
ROM read-only memory
SAS serial attached SCSI
SATA serial ATA
SIM Systems Insight Manager
SIMM single inline memory module
SMART self-monitoring analysis and reporting technology
SNMP Simple Network Management Protocol
SSD support software diskette
UPS uninterruptible power system
USB universal serial bus
VCA Version Control Agent
VCRM Version Control Repository Manager
Index 1 C 120PCI.
diskette image creation 57
DMA error 94
documentation 69, 70
drive errors 36, 37, 78, 79, 80, 90, 97, 108
drive failure, detecting 36
drive LEDs 16, 17, 36
drive not found 37, 39
drive problems 35, 36, 37
drivers 63, 70
DVD-ROM drive 35
E
ECC errors 74
energy saver features 44
Erase Utility 59
error log 50
error messages 50, 53, 73, 92, 124, 127
event list error messages 124
expansion board 48, 71
expansion board-related port 85 codes 133
express port error 94
external device problems 43
F
fan assembly 10
Management Agents 59
Management CD 58, 70, 71
management tools 58
media issue, tape drive 38
MEGA4 XX.
read/write errors 36, 37
read/write issue, tape drive 38
redundant ROM 65, 98, 113
registering the server 71
Remote Insight Lights-Out Edition II (RILOE II) 50, 58
remote ROM flash 52, 53
remote ROM flash problems 52
remote support and analysis tools 63
required information 139
Resource Paqs 64
resources 69
resources, troubleshooting 69
restoring 51
RILOE II (Remote Insight Lights-Out Edition II) 50, 58
ROM error 95, 98
ROM legacy USB support 61
ROM update utility 67
ROM, types 65
ROM, updating 53
ROM-Based
W
warnings 13, 72
Web-Based Enterprise Service 63
website, HP 69, 70
white papers 70, 72
Windows Event Log processor error codes 134