AlphaServer 800 Service Guide Order Number: EK–ASV80–SG. A01 This guide describes diagnostics used in troubleshooting system failures, as well as the procedures for replacing field-replaceable units (FRUs).
First Printing, April 1997 Digital Equipment Corporation makes no representations that the use of its products in the manner described in this publication will not infringe on existing or future patent rights, nor do the descriptions contained in this publication imply the granting of licenses to make, use, or sell equipment or software in accordance with the description.
Contents
Preface
Chapter 1 Troubleshooting Strategy
1.1 Questions to Consider
1.2 Problem Categories
1.3 Service Tools and Utilities
1.4 Information Services
Chapter 3
3.1 Command Summary
3.2 Command Reference
3.2.1 test
3.2.2 sys_exer
3.2.3 cat el and more el
6.2.2 Memory Modules
6.2.3 Motherboard
6.3 EISA Bus Options
6.3.1 Identifying ISA and EISA Options
Figures
2-1 AlphaBIOS Boot Menu
2-2 Hard Disk Drive LEDs
2-3 Floppy Drive Activity LED
2-4 CD-ROM Drive Activity LED
7-19 Removing DIMMs from Motherboard
7-20 Installing DIMMs on Motherboard
7-21 Removing Disk Status Module
7-22 Removing EISA and PCI Options
7-23 Removing CPU Daughter Board
6-2 Summary of Procedure for Configuring EISA Bus (EISA Options Only)
6-3 Summary of Procedure for Configuring ISA Options
6-4 Serial Line Keyboard Commands
7-1 AlphaServer 800 FRUs
7-2 Power Cord Order Numbers (Pedestal Systems)
Preface Intended Audience This guide describes the procedures and tests used to service AlphaServer 800 systems and is intended for use by Digital Equipment Corporation service personnel and qualified self-maintenance customers. The material is presented as follows: • Chapter 1, Troubleshooting Strategy, describes the troubleshooting strategy for AlphaServer 800 systems. • Chapter 2, Power-Up Diagnostics and Display, provides information on how to interpret error beep codes and the power-up display.
Conventions
The following conventions are used in this guide:
Convention Meaning
WARNING: A warning contains information to prevent injury.
CAUTION: A caution contains information essential to avoid damage to equipment or software.
NOTE: A note calls the reader’s attention to important information.
[] In command format descriptions, brackets indicate optional elements.
italic type In console command sections, italic type indicates a variable.
Related Documentation Table 1 lists the documentation kits and related documentation for AlphaServer 800 systems.
Chapter 1 Troubleshooting Strategy This chapter describes the troubleshooting strategy for AlphaServer 800 systems. • Questions to consider before you begin troubleshooting • Diagnostics flows for each problem category • List of service tools and utilities • List of information services 1.1 Questions to Consider Before troubleshooting any system problem, first check the site maintenance log for the system's service history.
1.2 Problem Categories System problems can be classified into the following five categories. Using these categories, you can quickly determine a starting point for diagnosis and eliminate the unlikely sources of the problem. 1. Power problems (Table 1-1) 2. No access to console mode (Table 1-2) 3. Console-reported failures (Table 1-3) 4. Boot failures (Table 1-4) 5.
Table 1-1 Power Problems Symptom Action System does not power on. • Check the power source and power cord. • Check the On/Off setting on the operator control panel. Toggle the On/Off button to off, then back to the On position to clear a remote power disable. • Check the indicator lights on the operator control panel.
Table 1-2 Problems Getting to Console Mode Symptom Action Power-up screen is not displayed. Interpret the error beep codes at power-up (Section 2.1) for a failure detected during self-tests. Check that the keyboard and monitor are properly connected and turned on. If the power-up screen is not displayed, yet the system enters console mode when you press the Return key, check that the console environment variable is set correctly.
Table 1-2 Problems Getting to Console Mode (continued) Symptom Action Try connecting a console terminal to the COM1 serial communication port (Section 6.7). Check the baud rate setting for the console terminal and the system. The system baud rate setting is 9600. When using the COM1 port, you must set the console environment variable to serial. If none of the above considerations solve the problem, check that the J1 jumper on the CPU daughter board is not missing.
Table 1-3 Problems Reported by the Console Symptom Action Power-up tests do not complete. Interpret the error beep codes at power-up (Section 2.1) and check the power-up screen (Section 2.2) for a failure detected during self-tests. The system attempts to boot from the floppy drive after a checksum error is reported (error beep code 1-1-2 or 1-1-4). Reinstall firmware by inserting a fail-safe loader diskette. Refer to the procedure provided with the firmware update documentation.
Table 1-3 Problems Reported by the Console (continued) Symptom Action • Power-up screen or console event log indicates problems with EISA devices. Use the troubleshooting table in Section 2.7 to determine the problem. • EISA devices are missing from the show config display. Use the troubleshooting table in Section 2.7 to determine the problem. Run the ROM-based diagnostic (RBD) tests (Chapter 3) to verify the problem.
Table 1-4 Boot Problems Symptom Action System cannot find boot device. Check the system configuration for the correct device parameters (node ID, device name, and so on). • For DIGITAL UNIX and OpenVMS, use the show config and show device commands (Section 6.1.4). • For Windows NT, use the AlphaBIOS menus to examine and set the system configuration (Section 6.1.3). Check the system configuration for the correct environment variable settings.
Table 1-5 Errors Reported by the Operating System Symptom Action System is hung or has crashed. Press the Halt button and enter the crash command to provide a crash dump file for analysis. Refer to OpenVMS Alpha System Dump Analyzer Utility Manual for information on how to interpret OpenVMS crash dump files. Refer to the Guide to Kernel Debugging for information on using the DIGITAL UNIX Krash Utility. Errors have been logged and the operating system is up.
1.3 Service Tools and Utilities This section lists the tools and utilities available for acceptance testing, diagnosis, and serviceability and provides recommendations for their use. Error Handling/Logging Tools (DECevent) DIGITAL UNIX, OpenVMS, and Microsoft Windows NT operating systems provide recovery from errors, fault handling, and event logging. The DECevent Translation and Reporting Utility provides bit-to-text translation of DIGITAL UNIX and OpenVMS event logs for interpretation.
Firmware Console Commands Console commands are used to set and examine environment variables and device parameters, as well as to invoke ROM-based diagnostics and exercisers. For example, the show memory, show configuration, and show device commands are used to examine the configuration; the set bootdef_dev, set auto_action, and set boot_osflags commands are used to set environment variables; and the cdp command is used to configure DSSI parameters.
1.4 Information Services Several information resources are available, including online information for service providers and customers, computer-based training, and maintenance documentation database services. A brief description of some of these resources follows. Service Help File The information contained in this guide, including the field-replaceable unit (FRU) procedures and illustrations, is available in online format. You can download the hypertext file (AS800.HLP) or order a self-extracting .
ECU Revisions The EISA Configuration Utility (ECU) is used for configuring EISA options on AlphaServer systems. Systems are shipped with an ECU kit, which includes the ECU license. Customers who already have the ECU and license, but need the latest ECU revision (a minimum revision of 1.10 for AlphaServer 800 systems), can order a separate kit. Call 1-800-DIGITAL to order. If the customer plans to migrate from DIGITAL UNIX or OpenVMS to Windows NT, you must re-run the appropriate ECU.
Supported Options A list of options supported on AlphaServer 800 systems is available on the Internet: FTP address: ftp://ftp.digital.com/pub/Digital/Alpha/systems/as800/ World Wide Web address: http://www.digital.com/info/alphaserver/tech_docs/alphasrv800/ You can obtain information about hardware configurations for the AlphaServer 800 from the DIGITAL Systems and Options Catalog. The catalog can be used to order and configure systems and hardware options.
Chapter 2 Power-Up Diagnostics and Display This chapter provides information on how to interpret error beep codes and the power-up display on the console screen. In addition, a description of the power-up and firmware power-up diagnostics is provided as a resource to aid in troubleshooting. • Section 2.1 describes how to interpret error beep codes at power-up. • Section 2.2 describes how to interpret the power-up screen display. • Section 2.
2.1 Interpreting Error Beep Codes If errors are detected at power-up, audible beep codes are emitted from the system. For example, if the SROM code could not find any good memory, you would hear a 1-3-3 beep code (one beep, a pause, a burst of three beeps, a pause, and another burst of three beeps). Be sure to check that the CPU daughter board is properly seated in its connector if errors are reported. NOTE: A single beep is emitted when the SROM code completes successfully.
Table 2-1 Interpreting Error Beep Codes Beep Code Problem Corrective Action 1 A single beep is emitted when the SROM code has successfully completed. Not applicable. No error. 1-3 VGA monitor is not plugged in. Plug in the graphics monitor. If you do not want the graphics monitor, disable the VGA jumper (J27) on the motherboard. Refer to Appendix A. 1-1-2 ROM data path error detected while loading AlphaBIOS/SRM console code.
Table 2-1 Interpreting Error Beep Codes (continued) Beep Code Problem Corrective Action 1-2-4 Backup cache error. Replace the CPU daughter board (Chapter 7). The system can be operated with the B-cache disabled until a replacement CPU daughter board is available. Bank 1 of the J1 jumper on the CPU daughter board is used to disable the B-cache. Refer to Appendix A. 1-3-3 No usable memory detected. Verify that the memory modules are properly seated and try powering up again.
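The beep-code lookup described in Section 2.1 and Table 2-1 can be sketched as a small table-driven helper. This is only an illustration: the dictionary holds the codes that appear in this portion of the guide (1-1-4 is taken from the checksum description in Section 2.10), and the function names are hypothetical, not part of any DIGITAL tool.

```python
# Beep codes and meanings as given in this guide (Table 2-1, Section 2.10).
# Only the codes visible in this excerpt are listed; others return None.
BEEP_CODES = {
    "1": "SROM code completed successfully (no error)",
    "1-3": "VGA monitor is not plugged in",
    "1-1-2": "ROM data path error while loading AlphaBIOS/SRM console code",
    "1-1-4": "Console image checksum test failed",
    "1-2-4": "Backup cache error",
    "1-3-3": "No usable memory detected",
}


def describe_pattern(code):
    """Spell out a beep code as bursts of beeps separated by pauses."""
    bursts = []
    for part in code.split("-"):
        count = int(part)
        bursts.append("1 beep" if count == 1 else f"burst of {count} beeps")
    return ", pause, ".join(bursts)


def lookup_beep_code(code):
    """Return the problem text for a beep code, or None if it is not listed."""
    return BEEP_CODES.get(code.strip())


print(describe_pattern("1-3-3"))
print(lookup_beep_code("1-3-3"))
```

A code that is not in the dictionary returns None, which should be read as "consult the full Table 2-1", not as "no fault".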
2.2 Power-Up Display During power-up self-tests, test status and results are displayed on the console terminal. Information similar to that in Example 2-1 is displayed on the screen. Example 2-1 Sample Power-Up Display ff.fe.fd.fc.fb.fa.f9.f8.f7.f6.f5.ef.df.ee.f4.
Table 2-2 provides a description of the power-up countdown for output to the serial console port. If the power-up display stops, use the beep codes (Table 2-1) and the power-up countdown (Table 2-2) to isolate the likely field-replaceable unit (FRU).
Table 2-2 Console Power-Up Countdown Description and FRUs (continued) Countdown Number Description Likely FRU ea Start phase 4 drivers: console support drivers.
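When the power-up display stops, the last countdown number shown is the one to look up in Table 2-2. The sketch below assumes the countdown has been captured as text from the serial console; the sample string is the (partial) sequence from Example 2-1, and the lookup dictionary contains only the one entry visible in this excerpt of Table 2-2. The function name is hypothetical.

```python
# Descriptions from Table 2-2; only the entry visible in this excerpt is included.
COUNTDOWN = {
    "ea": "Start phase 4 drivers: console support drivers",
}


def last_countdown_code(display):
    """Return the last countdown number in a captured power-up display."""
    codes = [code for code in display.strip().split(".") if code]
    return codes[-1] if codes else None


# Partial countdown as shown in Example 2-1.
sample = "ff.fe.fd.fc.fb.fa.f9.f8.f7.f6.f5.ef.df.ee.f4."
stopped_at = last_countdown_code(sample)
print(stopped_at, "-", COUNTDOWN.get(stopped_at, "see Table 2-2"))
```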
2.2.1.1 DIGITAL UNIX or OpenVMS Systems
The DIGITAL UNIX and OpenVMS operating systems are supported by the SRM firmware. The SRM console prompt follows:
>>>
2.2.1.2 Windows NT Systems
The Windows NT operating system is supported by the AlphaBIOS firmware. Systems using Windows NT power up to the AlphaBIOS boot menu as shown in Figure 2-1.
Figure 2-1 AlphaBIOS Boot Menu
AlphaBIOS Version 5.26
Please select the operating system to start:
Windows NT Server 4.00
Use ↑ and ↓ to move the highlight to your choice.
2.2.2 Console Event Log A console event log consists of status messages received during power-up self-tests. If problems occur during power-up, standard error messages indicated by asterisks (***) may be embedded in the console event log. To display a console event log, use the more el or cat el command. NOTE: To stop the screen display from scrolling, press Ctrl/S. To resume scrolling, press Ctrl/Q. You can also use the more el command to display the console event log one screen at a time.
2.3 Mass Storage Problems Mass storage failures at power-up are usually indicated by read fail messages. Other problems are indicated by storage devices missing from the show config display. • Table 2-3 provides information for troubleshooting mass storage problems indicated at power-up or storage devices missing from the show config display. • Table 2-4 provides troubleshooting tips for AlphaServer systems that use a RAID array subsystem. • Section 2.4 provides information on storage device LEDs.
Table 2-3 Mass Storage Problems Symptom Problem Corrective Action Drives are missing from the show config display. Drives have duplicate SCSI IDs. Correct SCSI IDs. SCSI bus not properly terminated. Check the following jumpers and terminator to ensure that proper termination is provided for all internal SCSI devices. Note: Internal hard disk drives are automatically assigned SCSI IDs 0, 1, 2, and 3 (from left to right for pedestal systems; and bottom to top for rackmount systems).
Table 2-3 Mass Storage Problems (continued) Symptom Problem Corrective Action Drives are missing from the show config display/One drive appears seven times on the show config display. Drive SCSI ID set to 7 (reserved for host ID) Correct SCSI IDs. Duplicate host IDs on a shared bus. Change host ID by setting the pk*0_host_id environment variable (set pk*0_host_id) through the SRM console. LEDs do not come on. Drive missing from the show config display. Missing or loose cables.
Table 2-3 Mass Storage Problems (continued) Symptom Problem Corrective Action Read/write errors in the console event log; storage adapter port fails. Terminator missing or wrong terminator used. Check the following jumpers and terminator to ensure that proper termination is provided for all internal SCSI devices. • The SCSI terminator jumper (J51) on the system motherboard should be set to “on.” Refer to Appendix A.
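Two of the SCSI ID problems in Table 2-3 (duplicate IDs, and a drive set to ID 7, which is reserved for the host) are easy to check mechanically once the drive IDs have been written down. The following is a minimal sketch under that assumption; the drive names and the function name are hypothetical, and the IDs would in practice be read from jumpers or the show config display.

```python
from collections import Counter

HOST_ID = 7  # Table 2-3: SCSI ID 7 is reserved for the host ID


def check_scsi_ids(drive_ids, host_id=HOST_ID):
    """Report drives whose SCSI ID is duplicated or collides with the host ID.

    drive_ids maps a drive label (hypothetical names) to its SCSI ID.
    """
    problems = []
    counts = Counter(drive_ids.values())
    for name, scsi_id in sorted(drive_ids.items()):
        if scsi_id == host_id:
            problems.append(f"{name}: SCSI ID {scsi_id} is reserved for the host")
        elif counts[scsi_id] > 1:
            problems.append(f"{name}: SCSI ID {scsi_id} is used by more than one drive")
    return problems


for line in check_scsi_ids({"disk_a": 0, "disk_b": 0, "disk_c": 7, "disk_d": 3}):
    print(line)
```

An empty result only means that these two conditions were not found; termination and cabling still need to be checked as described in the table.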
Table 2-4 provides troubleshooting hints for systems with a StorageWorks RAID array subsystem. Table 2-4 Troubleshooting RAID Problems Symptom Action Some RAID drives do not appear on the show device d display. Valid configured RAID logical drives will appear as DRA0--DRAn, not as DKn. Configure the drives by running the RAID Configuration Utility (RCU), following the instructions provided with the StorageWorks RAID array subsystem.
2.4 Storage Device LEDs Storage device LEDs indicate the status of the device. • Figure 2-2 shows the hard disk drive LEDs for disk drives in the system enclosure. • Figure 2-3 shows the Activity LED for the floppy drive. This LED is on when the drive is in use. • Figure 2-4 shows the Activity LED for the CD-ROM drive. This LED is on when the drive is in use. For information on other storage devices, refer to the documentation provided by the manufacturer or vendor.
Table 2-5 Hard Disk Drive LEDs
LED Meaning
Activity (green) When lit, indicates disk activity.
Fault (amber) At product introduction, this LED has no function; it may be used with future enhancements.
Disk Present (green) When lit, indicates that a disk drive is installed for that position in the hard disk drive backplane.
Figure 2-4 CD-ROM Drive Activity LED
2.5 Control Panel LEDs Control panel LEDs (Figure 2-5) indicate the status of the system. Table 2-6 describes the LEDs.
Table 2-6 Control Panel LEDs
Power OK (green) Halt (amber) Status
Off Off System powered-off using control panel Power button or no AC power is present.
Off On System power is enabled using the control panel Power button, but the system has been powered off by one of the following: • Remote management console command • System software • Fan failure • Overtemperature condition • Power supply failure
On Off System is powered-on and is not in a halt state.
2.6 PCI Bus Problems PCI bus failures at power-up are usually indicated by the inability of the system to see the device. Table 2-7 provides steps for troubleshooting PCI bus problems. Use the table to diagnose the likely cause of the problem. NOTE: Some PCI devices do not implement PCI parity, and some have a parity-generating scheme in which parity is sometimes incorrect or is not compliant with the PCI Specification. In such cases, the device functions properly as long as parity is not checked.
2.7 EISA Bus Problems EISA bus failures at power-up may be indicated by the following messages: EISA Configuration Error. Run the EISA Configuration Utility. Run the EISA Configuration Utility (ECU) when this message is displayed. Other EISA bus problems are indicated by the absence of EISA devices from the show config display. Table 2-8 provides steps for troubleshooting EISA bus problems that persist after you run the ECU.
Table 2-8 EISA Troubleshooting Step Action 1 Confirm that the EISA module and any cabling are properly seated. 2 Run the ECU to: • Confirm that the system has been configured with the most recently installed controller. • See what the hardware jumper and switch setting should be for each ISA controller. • See what the software setting should be for each ISA and EISA controller. • See if the ECU deactivated (<>) any controllers to prevent conflict.
Additional EISA Troubleshooting Tips The following tips can aid in isolating EISA bus problems: • Peripheral device controllers need to be seated firmly in their slots to make all necessary contacts. Improper seating is a common source of problems. • Be sure you run the correct version of the ECU for the operating system. For Windows NT, use ECU diskette DECpc AXP (AK-PYCJ*-CA); for DIGITAL UNIX and OpenVMS, use ECU diskette DECpc AXP (AK-Q2CR*-CA).
2.8 Fail-Safe Loader The fail-safe loader (FSL) allows you to boot an SRM console from a diskette at power-up. This allows you to power up without running power-up diagnostics and load new SRM and FSL console firmware from the firmware diskette. NOTE: The fail-safe loader should be used only when a failure at power-up prohibits you from getting to the console program. You cannot boot an operating system from the fail-safe loader.
To activate the FSL: 1. Move the jumper at bank 7 of the J1 jumper on the CPU daughter board. The jumper is normally installed in the standard boot setting (position 0). Refer to Figure A-1 in Appendix A. 2. Insert the console firmware diskette and turn on the system. 3. Reinstall the console firmware from diskette. 4. Power down and return the J1 jumper to the standard boot setting (position 0). 2.
2. 12V, 5V, 3.3V, and –12V outputs are energized and stabilized. If the outputs do not come into regulation, the power-up is aborted and the power supply enters the latching-shutdown mode. 2.10 Firmware Power-Up Diagnostics After successful completion of AC and DC power-up sequences, the processor performs diagnostics to verify system operation, loads the system console, and tests the core system (CPU, memory, and system board), including all boot path devices.
7. The console program is loaded into memory from the FEPROM on the system board. A checksum test is executed for the console image. If the checksum test fails, an error beep code (1-1-4) is generated, the power-up tests are terminated, and the fail-safe loader is activated. If the checksum test passes, a single audible beep is issued, control is passed to the console code, and the console firmware diagnostics are run. 2.10.
Chapter 3 Running System Diagnostics This chapter tells how to run ROM-based diagnostics. ROM-based diagnostics (RBDs), which are part of the console firmware, offer many powerful diagnostic utilities, including the ability to examine error logs from the console environment and run system- or device-specific exercisers. AlphaServer 800 system RBDs rely on exerciser modules to isolate errors.
3.1 Command Summary Table 3-1 provides a summary of the diagnostic and related commands. Table 3-1 Summary of Diagnostic and Related Commands Command Function Section Acceptance Testing test Quickly tests the core system. The test command is the primary diagnostic for acceptance testing and console environment diagnosis. 3.2.1 The test command runs one pass of the tests. To run tests concurrently and indefinitely, use the sys_exer command. Error Reporting cat el Displays the console event log. 3.
Table 3-1 Summary of Diagnostic and Related Commands (continued) Command Function Section Loopback Testing sys_exer -lb Conducts loopback tests for COM2 and the parallel port in addition to core system tests. 3.2.2 test -lb Conducts loopback tests for COM2 and the parallel port in addition to quick core system tests. 3.2.1 Diagnostic-Related Commands kill Terminates a specified process. 3.2.8 kill_diags Terminates all executing diagnostics. 3.2.
When using the test command after shutting down an operating system, you must initialize the system to a quiescent state. Enter the following command at the SRM console: >>> init . . . >>> test The tests are run in the following order: 1. Memory tests (one pass). 2. Read-only tests: DK* disks, DR* disks, DU* disks, MK* tapes, DV* floppy. 3. Console loopback tests if -lb argument is specified: COM2 serial port and parallel port. 4. VGA/TGA console tests.
Examples In the following example, the tests complete successfully. NOTE: Examine the console event log after running tests.
>>> test
Testing the Memory
Testing the DK* Disks(read only)
No DU* Disks available for testing
No DR* Disks available for testing
No MK* Tapes available for testing
No MU* Tapes available for testing
Testing the DV* Floppy Disks(read only)
file open failed for dva0.0.0.1000.
3.2.2 sys_exer The sys_exer command runs diagnostics for the system. The same tests that are run using the test command are run with sys_exer, only these tests are run concurrently and in the background. Nothing is displayed, after the initial test startup messages, unless an error occurs. The diagnostics started by the sys_exer command automatically reallocate memory resources, as these tests require additional resources.
Example >>> sys_exer Default zone extended at the expense of memzone.
3.2.3 cat el and more el The cat el and more el commands display the contents of the console event log. Status and error messages are logged to the console event log at power-up, during normal system operation, and while running system tests. Standard error messages are indicated by asterisks (***). When cat el is used, the contents of the console event log scroll by. You can use the Ctrl/S key combination to stop the screen from scrolling and Ctrl/Q to resume scrolling.
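If the output of cat el has been captured to a text file through a terminal emulator, the standard error messages can be picked out by their asterisk marker. The sketch below assumes such a capture exists; the sample text is a placeholder, not real console output, and the function name is hypothetical.

```python
def error_lines(event_log_text):
    """Return the lines of a captured console event log that contain '***'."""
    return [line for line in event_log_text.splitlines() if "***" in line]


# Placeholder text standing in for a captured log; not actual console messages.
sample_log = "\n".join([
    "placeholder status message 1",
    "*** placeholder error message",
    "placeholder status message 2",
])

for line in error_lines(sample_log):
    print(line)
```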
3.2.4 crash The crash command forces a crash dump to the selected device for DIGITAL UNIX and OpenVMS systems. Use this command when an error has caused the system to hang and the system can be halted with the Halt button or the RMC halt command. The crash command restarts the operating system and forces a crash dump to the selected device. Refer to OpenVMS Alpha System Dump Analyzer Utility Manual for information on how to interpret OpenVMS crash dump files.
3.2.5 memexer The memexer command tests memory by running a specified number of memory exercisers. The exercisers are run in the background and nothing is displayed unless an error occurs. On each pass, each exerciser tests all available memory in blocks twice the size of the backup cache. To terminate the memory tests, use the kill command to terminate an individual diagnostic or the kill_diags command to terminate all diagnostics.
The following is an example with a memory compare error indicating bad DIMMs. In most cases, the failing bank and DIMM position (Figure 3-1) are specified in the error message. If the failing DIMM information is not provided, use the procedure that follows to isolate a failing DIMM.
To determine the failing DIMM, match the lowest five bits of the failing address at which the bad data is received to the failing DIMM using the table below.
Failing Address Lowest Five Bits (hexadecimal) Failing DIMM
0 0
8 1
10 2
18 3
In the example, the failing address is a11848. It ends in 48 hexadecimal, so its lowest five bits are 01000 binary, or 8. Therefore, the failing DIMM is DIMM 1.
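The address-to-DIMM match above can be written as a mask and a table lookup. This is a minimal sketch that assumes the table values are hexadecimal and that the failing address has been copied from the memexer error message; the function name is hypothetical.

```python
# Lowest five bits of the failing address -> failing DIMM (table above).
DIMM_BY_LOW_BITS = {0x00: 0, 0x08: 1, 0x10: 2, 0x18: 3}


def failing_dimm(failing_address):
    """Return the DIMM number for a failing address, or None if not in the table."""
    low_five_bits = failing_address & 0x1F  # keep bits 4..0 only
    return DIMM_BY_LOW_BITS.get(low_five_bits)


# Failing address from the example: a11848 hexadecimal.
print(failing_dimm(0xA11848))
```

A result of None means the low bits did not match any row of the table, in which case the address was probably copied incorrectly.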
3.2.6 net -s The net -s command displays the MOP counters for the specified Ethernet port.
3.2.7 net -ic The net -ic command initializes the MOP counters for the specified Ethernet port.
3.2.8 kill and kill_diags The kill and kill_diags commands terminate diagnostics that are currently executing. NOTE: A serial loopback connector (12-27351-01) must be installed on the COM2 serial port for the kill_diags command to successfully terminate system tests. • The kill command terminates a specified process. • The kill_diags command terminates all diagnostics. Syntax kill_diags kill [PID. . . ] Argument: [PID. . . ] The process ID of the diagnostic to terminate.
3.2.9 show_status Use the show_status command to display the progress of diagnostics. The show_status command reports one line of information per executing diagnostic. The information includes ID, diagnostic program, device under test, error counts, passes completed, bytes written, and bytes read. Many of the diagnostics run in the background and provide information only if an error occurs.
Chapter 4 Server Management Console This chapter describes the function and operation of the integrated server management console. • Section 4.1 describes how the remote management console (RMC) allows you to remotely monitor and control the system. • Section 4.2 describes the first-time setup procedures for using the RMC modem port and enabling the system to call out to a remote operator. • Section 4.3 describes the procedure to reset the RMC to its factory settings. • Section 4.
4.1 Operating the System Remotely The remote management console (RMC) enables the user to monitor and control the system remotely. The RMC resides on the system backplane and allows a remote operator to connect to the system through a modem, using a serial terminal or terminal emulator.
You can access the RMC through either of two serial lines: the standard console terminal COM1 (MMJ) port or the RMC modem port (9-pin DIN). • To enter the RMC console remotely, dial in through a modem, enter a password, and then type a special escape sequence that invokes the RMC command mode. The default escape sequence is ^[^[rcm. This is equivalent to <Esc><Esc>rcm, where <Esc> is the escape key on a PC keyboard. The default string can be changed using the set escape command.
The remote operator can disconnect (using the quit command) from the RMC and connect to the system’s COM1 port. Through the remote terminal, the operator can then communicate with the software and firmware that normally use the local serial terminal: • SRM and AlphaBIOS firmware consoles • ECU and RCU configuration utilities • Operating systems The RMC also provides a watchdog timer, whose interval is set using the RMC set wdt command.
4.2 First-Time Setup Before you can dial in remotely through the RMC modem port or enable the system to call out to a remote operator in response to system alerts, several RMC strings and parameters must be set. Use the following procedure to set up the RMC strings, password, and parameters, and to send a test alert to verify that the modem strings are set correctly. 1. From the local console terminal, enter the RMC escape sequence at the SRM prompt. The default escape sequence is ^[^[rcm.
Table 4-1 Dial and Alert String Elements String Elements Description Dial String ATXDT (Enter characters either in all uppercase or all lowercase). AT = Attention X = Forces the modem to dial “blindly” (not look for the dial tone). Enter this character if the dial-out line modifies its dial tone when used for services such as voice mail. D = Dial T = Tone (for touch-tone) , = Pause for 2 seconds. 9, In the example, “9” gets an outside line.
7. Using the RMC command send alert, force an alert condition in order to test the dial-out function and verify proper setup of the modem initialization, dial, and alert strings. 8. Once the alert is received successfully, use the RMC command clear alert to clear the current alert condition and cause the RMC to stop paging the remote operator. If the alert is not cleared, the RMC continues to page the remote operator approximately every 30 minutes.
RCM> status
PLATFORM STATUS:
Firmware Revision: V1.0
Server Power: OFF
Fanstate:
System Halt:
Temperature: 29.0°C (warnings at 46°C, power-off at 52°C)
RCM Power Control: ON
Escape sequence: ^[^[RCM
Remote Access: Enabled
Alert Enable: Enabled
Alert Pending: NO
Init String: at&f0e0v0x0s0=2
Dial String: atxdt9,15085553333
Alert String: ,,,,,,5085553332#;
Modem and COM1 baud: 9600
Last Alert: RCM User Requested
Watchdog Timer: 60 seconds
Autoreboot : ON
4.
4.4 Remote Management Console Commands The remote management console supports the following commands:
clear {alert, port}
disable {alert, reboot, remote}
enable {alert, reboot, remote}
halt {in, out}
hangup
help or ?
power {off, on}
quit
reset
send alert
set {alert, baud, dial, escape, init, password, wdt}
status
Explanations and examples of the RMC command set follow.
clear alert The clear alert command clears the current alert condition and causes the RMC to stop paging the remote operator.
disable alert The disable alert command disables alert conditions from paging an external operator. Monitoring continues and alerts are still logged in the “last alert” field; however, alerts are not sent to the remote user. Example: RCM> disable alert RCM> disable reboot The disable reboot command disables automatic reboot of the system when the watchdog timer expires.
enable reboot The enable reboot command enables automatic reboot of the system when the watchdog timer expires. The watchdog timer is enabled and operated by the operating system. It periodically interrupts the server management microcontroller and assists in clearing a hung state in the operating system. If the microcontroller does not receive a watchdog timer interrupt for a specified period of time, it will reset the system.
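The watchdog behavior described for enable reboot and disable reboot can be modeled in a few lines. This is a conceptual sketch only, not the microcontroller firmware: the class and method names are hypothetical, time is passed in explicitly so the example is deterministic, and treating "exactly at the timeout" as expired is an assumption.

```python
class WatchdogTimer:
    """Toy model of the RMC watchdog: reset if no interrupt arrives in time."""

    def __init__(self, timeout_seconds, reboot_enabled=True):
        self.timeout_seconds = timeout_seconds
        self.reboot_enabled = reboot_enabled  # enable reboot / disable reboot
        self.last_interrupt = 0.0

    def interrupt(self, now):
        """Record a watchdog interrupt from the operating system."""
        self.last_interrupt = now

    def should_reset(self, now):
        """True when the timer has expired and automatic reboot is enabled."""
        expired = (now - self.last_interrupt) >= self.timeout_seconds
        return expired and self.reboot_enabled


watchdog = WatchdogTimer(timeout_seconds=60)  # 60 seconds, as in the status example
watchdog.interrupt(now=10)
print(watchdog.should_reset(now=69))  # interrupt seen 59 seconds ago
print(watchdog.should_reset(now=70))  # no interrupt for 60 seconds
```

With reboot_enabled set to False, the model never requests a reset, which corresponds to the disable reboot command.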
halt out
The halt out command is the equivalent of setting the Halt button on the server front panel to the “out” position. After executing the halt out command, the user is switched from the RMC monitor to the server’s COM1 port. Note that a local operator physically placing the front panel Halt button to the “In” position takes precedence over the setting of this command.
Example:
RCM>halt out
Returning to COM port.
hangup
The hangup command terminates the modem session.
power off
The power off command is the equivalent of turning off the system power from the operator control panel. If the system is already powered off, this command has no effect. The system can be powered back on either by issuing a power on command or by toggling the power button on the system front panel.
Example:
RCM>power off
RCM>
power on
The power on command is the equivalent of turning on the system power from the operator control panel.
reset
The reset command is the equivalent of pushing the Reset button on the operator control panel. It causes a full re-initialization of the system firmware. When the reset command is executed, the user’s terminal exits console monitor mode and reconnects to the server’s COM1 port.
Example:
RCM>reset
Returning to COM port.
send alert
The send alert command forces an alert condition.
set baud The set baud command sets the baud rate on the RMC modem port and on the COM1 to microcontroller port. Allowed values are 1, 2, and 3. Note that the microcontroller port that is connected to the 6-pin MMJ connector for the local console terminal is not affected. This port is fixed at 9600 baud.
set dial The set dial command sets the dial string to be used when the RMC detects an alert condition. Note that this string must be in the correct dial string format for the attached modem. If a paging service is to be contacted, then the dial string must include the appropriate modem commands to dial the number, wait for the line to connect, and send the appropriate touch tones to leave a pager message. The dial string is limited to 31 characters.
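Before entering a dial string, it can help to check it against the stated 31-character limit. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and the check for an "at" prefix assumes a Hayes-compatible modem, which the text does not require.

```python
MAX_DIAL_STRING_LEN = 31  # limit stated in the set dial description


def check_dial_string(dial):
    """Return a list of problems found in a candidate dial string (empty if none)."""
    problems = []
    if len(dial) > MAX_DIAL_STRING_LEN:
        problems.append(
            f"too long: {len(dial)} characters (limit {MAX_DIAL_STRING_LEN})"
        )
    if not dial.lower().startswith("at"):
        # Assumption: Hayes-style modems expect commands to begin with "AT".
        problems.append("does not start with an AT command prefix")
    return problems
```

For example, the dial string shown in the status display, atxdt9,15085553333, passes both checks.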
Example:
RCM> set init
init> at&f0e0v0x0s0=2
RCM>
set password
The set password command allows the user to change the password that is requested at the beginning of a modem session. The password is stored in nonvolatile memory. The maximum password length is 14 characters. The password is not echoed on the user’s terminal. The password must be set before access through the modem can be enabled.
status The status command displays the current state of the server’s sensors, as well as the current escape sequence and alarm information. Example: RCM> status PLATFORM STATUS: Firmware Revision: V1.0 Server Power: ON Fanstate: OK System Halt: Deasserted Temperature: 29.
4.5 RMC Troubleshooting Tips
Table 4-2 lists a number of possible causes and suggested solutions for symptoms you might see.
Table 4-2 RMC Troubleshooting
Symptom: The local terminal will not communicate with the system or RMC console.
Possible Cause: System, terminal, or RMC baud rate set incorrectly.
Suggested Solution: Set the baud rates for the system, RMC, and terminal to 9600 baud. For first-time setup, suspect the console terminal, since the RMC and system default is 9600.
Table 4-2 RMC Troubleshooting (continued)
Symptom: After the system is powered up, the COM1 port seems to hang and then starts working after a few seconds.
Possible Cause: This delay is normal behavior due to initialization.
Suggested Solution: Wait a few seconds for the COM1 port to start working.
Symptom: New password and escape sequence are forgotten.
Suggested Solution: Reset the RMC to its factory default settings. Refer to Section 4.3.
Symptom: The remote user sees a “+++” string on the screen.
Chapter 5 Error Log Analysis
This chapter tells how to interpret error logs reported by the operating system.
• Section 5.1 summarizes the fault detection and correction components of AlphaServer 800 systems.
• Section 5.2 describes machine checks/interrupts and how these errors are detected and reported.
• Section 5.3 describes how to generate a formatted error log using the DECevent Translation and Reporting Utility available with OpenVMS and DIGITAL UNIX.
5.1 Fault Detection and Reporting Table 5-1 provides a summary of the fault detection and correction components of AlphaServer 800 systems. Generally, PALcode handles exceptions as follows: • The PALcode determines the cause of the exception. • If possible, it corrects the problem and passes control to the operating system for reporting before returning the system to normal operation.
5.2 Machine Checks/Interrupts The exceptions that result from hardware system errors are called machine checks/interrupts. They occur when a system error is detected during the processing of a data request.
System Machine Check (SCB: 660) A system machine check is a system- or processor-detected error that occurred as a result of an “off-chip” request to the system. The following conditions cause PALcode to build the 660/670 machine check logout frame and invoke the 660 error handler.
5.2.1 Error Logging and Event Log Entry Format The DIGITAL UNIX and OpenVMS error handlers generate several entry types. Error entries, except for correctable memory errors, are logged immediately. Entries can be of variable length based on the number of registers within the entry. Each entry consists of an operating system header, several device frames, and an end frame. Most entries have a PAL-generated logout frame, and may contain frames for CPU, memory, and I/O. 5.
5.3.1 OpenVMS Alpha Translation Using DECevent The kernel error log entries are translated from binary to ASCII using the DIAGNOSE command. To invoke the DECevent utility, enter the DCL command DIAGNOSE. Format: DIAGNOSE/TRANSLATE [qualifier] [,. . .] [infile[,. . .]] Example: $ DIAGNOSE/TRANSLATE/SINCE=14-JUN-1997 For more information on generating error log reports using DECevent, refer to DECevent Translation and Reporting Utility for OpenVMS Alpha, User and Reference Guide.
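When the /SINCE qualifier is generated by a script rather than typed by hand, the date has to be written in the DD-MMM-YYYY form used in the example above. The sketch below only builds the command text; the function name is hypothetical, and it assumes a two-digit day is acceptable to DCL.

```python
from datetime import date

# Month abbreviations spelled out so the result does not depend on the locale.
_MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
           "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def diagnose_since_command(since):
    """Build a DIAGNOSE/TRANSLATE command line limited to entries since a date."""
    stamp = f"{since.day:02d}-{_MONTHS[since.month - 1]}-{since.year}"
    return f"DIAGNOSE/TRANSLATE/SINCE={stamp}"


print(diagnose_since_command(date(1997, 6, 14)))
```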
Chapter 6 System Configuration and Setup This chapter provides configuration and setup information for AlphaServer 800 systems and system options. • Section 6.1 describes how to examine the system configuration using the console firmware. —Section 6.1.1 describes the function of the two firmware interfaces used with AlphaServer systems. —Section 6.1.2 describes how to switch between firmware interfaces. —Sections 6.1.3 and 6.1.
6.1 Verifying System Configuration Figure 6-1 illustrates the system architecture for AlphaServer 800 systems.
SRM Interface Systems running DIGITAL UNIX or OpenVMS access the SRM firmware through a command-line interface, a UNIX style shell that provides a set of commands and operators, as well as a scripting facility. The SRM console allows you to configure and test the system, examine and alter system state, and boot the operating system. The SRM console prompt is >>>.
6.1.2 Switching Between Interfaces For a few procedures it is necessary to switch from one console interface to the other. • The test command and other diagnostic commands are run from the SRM interface. • The EISA Configuration Utility (ECU) and the RAID Configuration Utility (RCU) are run from the AlphaBIOS interface, as are some option-specific configuration utilities.
6.1.3 Verifying Configuration: AlphaBIOS Menu Options for Windows NT The following AlphaBIOS menu options are used for verifying system configuration on Windows NT systems: • Display System Configuration menu—Provides information about the system’s installed processor, memory, attached devices, and option boards. From the AlphaBIOS Setup screen, select Display System Configuration..., then the category for the requisite information.
The configuration display includes the following:
Firmware: The version numbers for the firmware code, PALcode, SROM chip, and CPU are displayed, along with the CPU clock speed.
System motherboard revision: The hardware revision number of the system motherboard.
Memory: The memory size and configuration for each bank of memory.
Hose 0, Bus 0, PCI: All controllers on Hose 0, Bus 0 of the primary PCI bus. The logical slot numbers are listed in the left column of the display.
Syntax
show config
Example
>>> show config
Digital Equipment Corporation
AlphaServer 800 5/400
Firmware
SRM Console: V4.8-29
ARC Console: v5.8
PALcode: VMS PALcode V1.19-3, OSF PALcode V1.21-5
Serial Rom: X0.4
Processor
DECchip (tm) 21164A-1 400MHz
System Motherboard Revision: 0
Memory
64 Meg of System Memory
Bank 0 = 64 Mbytes(16 MB Per DIMM) Starting at 0x00000000
Bank 1 = No Memory Detected
Slot   Option           Hose 0, Bus 0, PCI
5      QLogic ISP1020   pka0.7.0.5.0   SCSI Bus ID 7
                        dka100.1.0.5.0
                        dka200.2.0.
6.1.4.2 show device The show device command displays the console bootable devices and controllers in the system. The device name convention is shown in Figure 6-2. Figure 6-2 Device Name Convention dka0.0.0.0.
Example
>>> show device
dka100.1.0.5.0     DKA100   RZ28M-S             0021
dka200.2.0.5.0     DKA200   RZ28M-S             0526
dka400.4.0.5.0     DKA400   RRD45               1645
dkc0.0.0.2003      DKC9     RZ25                0900
dva0.0.0.1000.0    DVA0
ewa0.0.0.1001.0    EWA0     08-00-2B-3E-BC-B5
ewb0.0.0.12.0      EWB0     00-00-C0-33-E0-0D
ewc0.0.0.13.0      EWC0     08-00-2B-E6-4B-F3
pka0.7.0.5.0       PKA0     SCSI Bus ID 7       2.10
pka0.7.0.2002.0    PKB0     SCSI Bus ID 7       2.10
pka0.7.0.2003.0    PKC0     SCSI Bus ID 7       2.
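Device names in the show device listing can be split into their dotted fields when a script needs to match them. The sketch below is hedged: the field labels (unit, bus node, channel, slot, hose) are an assumption based on the usual SRM naming convention, since Figure 6-2 is not reproduced here, and the function name is hypothetical.

```python
import re

# Assumed meaning of the numeric fields, in order of appearance.
FIELD_NAMES = ("unit", "bus_node", "channel", "slot", "hose")


def parse_device_name(name):
    """Split a console device name such as dka100.1.0.5.0 into labeled fields."""
    match = re.fullmatch(r"([a-z]{2})([a-z])(\d+)((?:\.\d+)*)", name.lower())
    if match is None:
        raise ValueError(f"unrecognized device name: {name!r}")
    driver, adapter, unit, rest = match.groups()
    numbers = [int(unit)] + [int(part) for part in rest.split(".") if part]
    fields = {"driver": driver, "adapter": adapter}
    # zip stops at the shorter sequence, so short names simply omit later fields.
    fields.update(zip(FIELD_NAMES, numbers))
    return fields
```

For dka100.1.0.5.0, this yields driver dk, adapter a, unit 100, and the remaining numbers under the assumed labels.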
6.1.4.4 set and show (Environment Variables)
The environment variables described in Table 6-1 are typically set when you are configuring a system.
Syntax:
set [-default] [-integer] [-string] envar value
NOTE: Whenever you use the set command to reset an environment variable, you must initialize the system to put the new setting into effect. You initialize the system by entering the init command or pressing the Reset button.
Table 6-1 Environment Variables Set During System Configuration
Variable: auto_action
Attributes: NV,W (footnote 1)
Description: The action the console should take following an error halt or power failure. Defined values are:
BOOT — Attempt bootstrap.
HALT — Halt, enter console I/O mode.
RESTART — Attempt restart. If restart fails, try boot.
No other values are accepted.
Variable: bootdef_dev
Attributes: NV,W
Description: The device or device list from which booting is to be attempted when no path is specified.
boot_flags: The hexadecimal value of the bit number or numbers to set. To specify multiple boot flags, add the flag values (logical OR).
1—Bootstrap conversationally (enables you to modify SYSGEN parameters in SYSBOOT).
2—Map XDELTA to running system.
4—Stop at initial system breakpoint.
8—Perform a diagnostic bootstrap.
10—Stop at the bootstrap breakpoints.
20—Omit header from secondary bootstrap file.
80—Prompt for the name of the secondary bootstrap file.
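Because the flag values are hexadecimal, combining them is a bitwise OR rather than decimal addition. The sketch below shows the arithmetic; the dictionary keys and the function name are hypothetical labels chosen for illustration, while the values come from the list above.

```python
# Flag values from the list above (hexadecimal); the names are illustrative only.
BOOT_FLAGS = {
    "conversational": 0x1,
    "xdelta": 0x2,
    "initial_breakpoint": 0x4,
    "diagnostic_bootstrap": 0x8,
    "bootstrap_breakpoints": 0x10,
    "omit_header": 0x20,
    "prompt_secondary": 0x80,
}


def combine_boot_flags(*names):
    """OR the selected flags together and return the result as a hex string."""
    value = 0
    for name in names:
        value |= BOOT_FLAGS[name]
    return format(value, "x")
```

For example, a conversational bootstrap that also stops at the bootstrap breakpoints combines 1 and 10 to give 11 (hexadecimal).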
Variable: com1_baud
Attributes: NV,W
Description: Sets the baud rate of the COM1 (MMJ) port. The default baud rate is 9600. Baud rate values are 9600, 19200, 38400. If you change com1_baud to a setting other than 9600, you need to change the RMC baud rate to match.
Variable: com2_baud
Attributes: NV,W
Description: Sets the baud rate of the COM2 port. The default baud rate is 9600. Baud rate values are 300, 600, 1200, 2400, 4800, 9600, and 19200.
Variable: ew*0_mode (continued)
fastfd—Sets the default device to fast full duplex 100BaseT.
full—Sets the default device to full duplex twisted pair.
twisted-pair—Sets the default device to 10BaseT (twisted-pair).
Variable: ew*0_protocols
Attributes: NV
Description: Determines which network protocols are enabled for booting and other functions.
mop—Sets the network protocol to MOP, the setting typically used for systems running the OpenVMS operating system.
Variable: pk*0_fast (continued)
Description: If a controller is set to standard SCSI mode, both standard and fast SCSI devices will perform in standard mode.
1—Sets the default speed for devices on the controller to fast SCSI mode. Devices on a controller that connect to both standard and Fast SCSI devices will automatically perform at the appropriate rate for the device, either fast or standard mode.
Variable: pk*0_host_id
Attributes: NV
Description: Sets the controller host bus node ID to a value between 0 and 7.
Variable: tga_sync_green (continued)
Description: This environment variable must be set correctly so that the graphics monitor will synchronize. The parameter is a bit mask: the least significant bit (LSB) sets the vertical SYNC for the first graphics card found, the next bit for the second card found, and so on. The command set tga_sync_green 00 sets all graphics cards to synchronize on a separate vertical SYNC line, as required by some monitors.
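The bit-mask layout can be illustrated with a short calculation. This sketch assumes that a set bit selects sync-on-green for the corresponding card and a clear bit selects a separate SYNC line; the text only confirms the all-zero case, so treat the meaning of a set bit as an assumption. The function name is hypothetical.

```python
def tga_sync_green_mask(sync_on_green):
    """Build the mask from one boolean per graphics card, in discovery order.

    Assumption: True (bit set) means sync on green; False (bit clear) means
    a separate vertical SYNC line, matching the documented value 00.
    """
    mask = 0
    for position, enabled in enumerate(sync_on_green):
        if enabled:
            mask |= 1 << position  # bit 0 is the first card found
    return mask
```

Under this assumption, two cards that both use a separate SYNC line give a mask of 0, which corresponds to the set tga_sync_green 00 command above.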
6.2 CPU, Memory, and Motherboard Brief descriptions of the CPU daughter board, memory cards, and motherboard and its connectors are provided in this section. 6.2.1 CPU Daughter Board The CPU daughter board provides: • The Alpha 21164 microprocessor • Backup cache • ALCOR-2 chipset, which provides logic for external access to the cache for main memory control, and the PCI bus interface • SROM code 6.2.
6.2.3 Motherboard
The motherboard provides a standard set of I/O functions:
• A fast, wide SCSI controller chip (QLogic) that supports up to seven SCSI devices: up to three narrow SCSI removable-media devices and up to four wide SCSI hard disk drives.
Figure 6-3 Motherboard Connectors
Callouts: RMC PIC Processor; Power Supply Connectors; E26; Memory Module Connectors (Bank 0, Bank 1); CPU Daughter Board; E44; BIOS Chip; Removable Media Narrow SCSI Connector; Hard Disk Wide SCSI Connector; PCI Option Slots (PCI 11, PCI 12, PCI 13, PCI 14 (64-bit)); Shared PCI or EISA; EISA Option Slots (EISA 1, EISA 2, EISA 3); NVRAM Chip (E14); NVRAM TOY Clock Chip (E78)
6.3 EISA Bus Options
The EISA (Extended Industry Standard Architecture) bus is a 32-bit industry-standard I/O bus. EISA is a superset of the well-established ISA bus: it was designed to accept newer 32-bit components while remaining backward compatible with older 8-bit and 16-bit cards. EISA offers performance of up to 33 Mbytes/sec for bus masters and DMA devices.
6.4 EISA Configuration Utility Whenever you add or move EISA options or some ISA options in the system, you need to run the EISA Configuration Utility (ECU). Each EISA or ISA board has a corresponding configuration (CFG) file that describes the characteristics and the system resources required for that option. The ECU uses the CFG file to create a conflict-free configuration. The ECU is a menu-based utility that provides online help to guide you through the configuration process.
6.4.1 Before You Run the ECU Before running the ECU: 1. Install EISA option(s). (Install ISA boards after you run the ECU). For information about installing a specific option, refer to the documentation for that option. 2. Familiarize yourself with the utility. You can find more information about the ECU by reading the ECU online help. Start the ECU (Refer to Section 6.4.2). Online help for the ECU is located under Step 1, “Important EISA Configuration Information.” 3. 4.
• For systems running Windows NT—Select the following menus:
a. From the AlphaBIOS Setup menu, select Utilities.
b. From the submenu, select Run Maintenance Program.
c. Insert the ECU diskette for Windows NT (AK-PYCJ*-CA) into the diskette drive and select Run ECU from floppy.
• For systems running OpenVMS or DIGITAL UNIX—Start the ECU as follows:
a. Insert the ECU diskette for OpenVMS or DIGITAL UNIX (AK-Q2CR*-CA) into the diskette drive.
b.
6.4.3 Configuring EISA Options
EISA boards are recognized and configured automatically. See Table 6-2 for a summary of steps to configure an EISA bus that contains no ISA options. Review Section 6.4.1. Then run the ECU as described in Section 6.4.2.
NOTE: It is not necessary to run Step 2 of the ECU, “Add or remove boards.” (EISA boards are recognized and configured automatically.)
Table 6-2 Summary of Procedure for Configuring EISA Bus (EISA Options Only)
Step Explanation
Install EISA option.
6.4.4 Configuring ISA Options
ISA boards are configured manually, whereas EISA boards are configured through the ECU software. See Table 6-3 for a summary of steps to configure an EISA bus that contains both EISA and ISA options. Review Section 6.4.1. Then run the ECU as described in Section 6.4.2.
Table 6-3 Summary of Procedure for Configuring ISA Options
Step: Install or move EISA option. Do not install ISA boards.
Explanation: Use the instructions provided with the EISA option.
Table 6-3 Summary of Procedure for Configuring ISA Options (continued) Step Explanation Examine and set required switches to match the displayed settings. The "Examine Required Switches" ECU option displays the correct switch and jumper settings that you must physically set for each ISA option. Although the ECU cannot detect or change the settings of ISA boards, it uses the information from the previous step to determine the correct settings for these options.
6.5 PCI Bus Options
PCI (Peripheral Component Interconnect) is an industry-standard expansion I/O bus that is the preferred bus for high-performance I/O options. The AlphaServer 800 provides three slots for 32-bit PCI options and one slot for 64-bit PCI options. A PCI board is shown in Figure 6-5.
Figure 6-5 PCI Board
Install PCI boards according to the instructions supplied with the option.
6.6.1 Configuring Internal Storage Devices The AlphaServer 800 system supports up to seven internal SCSI storage devices. The hard disk drive backplane automatically supplies the SCSI IDs for the hard disk drives as shown in Figure 6-6. The CD-ROM drive is assigned SCSI ID 4 at the factory.
When configuring the SCSI bus, note the following: • If you plan to connect the internal hard disk drives to a RAID controller option or a SCSI controller other than the onboard controller, you need to use cable PB8HA-DA. This cable provides additional length needed to reach the connector on the controller option. Figure 6-7 shows the cable routing from the hard disk backplane to the storage controller option.
Figure 6-7 RAID/SCSI Cable for Internal Disk Drive Backplane
Figure 6-8 Wide SCSI Cable for Breakouts at Rear of Enclosure
Figure 6-9 Wide SCSI Dual Connector Cable for Standard Bulkhead Connector
Figure 6-10 Removing Divider to Allow for Full-Height Device
6.6.2 External SCSI Expansion
External SCSI devices, such as tabletop or rackmounted storage devices, can be connected to the system using EISA- or PCI-based SCSI adapters. Use the following rules to determine if a particular device can be used:
• The device must be supported by the operating system. Consult the software product description for the device or contact the hardware vendor.
6.7 Console Port Configurations Power-up information is typically displayed on the system's console terminal. The console terminal may be either a graphics monitor or a serial terminal. If you use a serial terminal, it is connected through the COM1 (MMJ) serial port. Several SRM console environment variables are used to configure the console ports: Environment Variable Description console Determines where the system will display power-up output.
6.7.1 set console The setting of the console environment variable determines where the system will display power-up output. Power-up information is typically displayed on the console terminal. The console terminal can be either a graphics monitor or a serial terminal. Set this environment variable according to the console terminal that you are using. Whenever you change the value of this environment variable, you must initialize the firmware with the init command or press the Reset button.
6.7.2 set tt_allow_login The setting of the tt_allow_login environment variable enables or disables login to the SRM console firmware on alternative console ports. Syntax set tt_allow_login [0,1] Arguments: 1 Enables login on alternative console ports (default setting). If the console output device is set to serial, you can log in on the COM1(MMJ) port, COM2 port, or the graphics monitor.
6.7.3 set tga_sync_green The tga_sync_green environment variable sets the location of the SYNC signal generated by the ZLXp-E PCI graphics accelerator card. The correct setting, displayed with the show command, is: >>> show tga_sync_green tga_sync_green If the monitor does not synchronize, set the parameter as follows: >>> set tga_sync_green 00 This command sets all graphics cards to synchronize on a separate vertical SYNC line, as required by some monitors.
Table 6-4 Serial Line Keyboard Commands
Graphics Line Commands   Serial Line Commands
F1                       CTRL+A
F2                       CTRL+B
F3                       CTRL+C
F4                       CTRL+D
F5                       CTRL+E
F6                       CTRL+F
F7                       CTRL+P
F8                       CTRL+R
F9                       CTRL+T
F10                      CTRL+U
Insert                   CTRL+V
Delete                   CTRL+W
Backspace                CTRL+H
ESC                      CTRL+[
6.7.
Chapter 7 FRU Removal and Replacement This chapter describes the field-replaceable unit (FRU) removal and replacement procedures for AlphaServer 800 systems, pedestal and rackmount. • Section 7.1 lists the FRUs. • Section 7.2 provides the removal and replacement procedures for the FRUs. 7.1 AlphaServer 800 FRUs Table 7-1 lists the FRUs by part number and description and provides the reference to the figure or section that shows the removal/replacement procedure.
Table 7-1 AlphaServer 800 FRUs
Part #: Description (Reference)
17-03970-03: Floppy drive cable (Figure 7-5)
17-03971-04: Control panel module cable (Figure 7-6)
Power cord (pedestal systems) (Table 7-2)
Power cord (rackmount systems) (Table 7-3)
17-01476-02: Hard disk drive status cable, 20-pin (Figure 7-8)
17-04400-01: SCSI (embedded 16-bit) disk drive cable, 68-pin (Figure 7-9)
17-04399-01: SCSI (embedded 8-bit) removable-media cable, 50-pin (Figure 7-10)
PB8HA-DA: SCSI (16-bit)/RAID option to hard disk b
Table 7-1 AlphaServer 800 FRUs (continued)
Part #: Description (Reference)
CPU Modules
54-24801-01: 333 MHz CPU daughter board (EV5.6) (Figure 7-14)
54-24801-02: 400 MHz CPU daughter board (EV5.6) (Figure 7-14)
Fan
12-23609-24: Fan, 4.75-inch with 3-pin cable (Figure 7-16)
Fixed Disks
RZ28M-S: 2.1 GB SCA2 disk drive (Section 7.2.7)
For a complete listing of supported disk options, refer to the DIGITAL Systems and Options Catalog (ftp://ftp.digital.com/pub/Digital/info/SOC/).
Table 7-1 AlphaServer 800 FRUs (continued) Part # Description Reference Memory Modules (continued) 20-47170-D7 Alternate for 54-24352-DA Section 7.2.8 20-47083-D7 Alternate for 54-24329-DA Section 7.2.8 20-47167-D7 Alternate for 54-24344-DA Section 7.2.8 20-47137-D7 Alternate for 54-24823-DA Section 7.2.8 NOTE: Alternate and standard DIMM options cannot be mixed. Determine DIMM type before ordering. Other Modules and Components 54-24978-01 Control panel module Section 7.2.
7.2 Removal and Replacement This section describes the procedures for removing and replacing FRUs. CAUTION: Static electricity can damage integrated circuits. Always use a grounded wrist strap (29-26246) and grounded work surface when working with internal parts of a computer system. Unless otherwise specified, you can install a FRU by reversing the steps shown in the removal procedure. 7.2.
Figure 7-1 Opening Front Door, Pedestal Systems
Figure 7-2 Removing Top Cover and Side Panels (Pedestal Systems)
7.2.2 Accessing FRUs, Rackmount Systems Access rackmount FRUs as follows (refer to Figure 7-3): WARNING: The system can weigh 27.45 kg (61 lb). To prevent injury and equipment damage, ensure that only one system is extended out of the cabinet at any one time and that the cabinet is stabilized before pulling the system out on its slides. The adjustable leveling feet should be down and the cabinet’s stabilizing bar fully extended before any component is extended out of the cabinet on slides.
Figure 7-3 Accessing FRUs, Rackmount Systems
Figure 7-4 FRUs, Pedestal and Rackmount Enclosure
Callouts: Power Supply; Removable Media Drives; Power Cord; Control Panel Cable; Control Panel; DIMM Memory; Hard Disk Drive; Disk Status Module; CPU Daughter Board; Disk Status Cable; Speaker; SCSI Disk Cable; Motherboard; NVRAM Chip (E14); NVRAM TOY Clock Chip (E78); SCSI Removable Media Cable; Fan
7.2.3 Cables This section shows the routing for each cable in the system.
230V Figure 7-7 Power Cords 230V 100-120 100-120 220-240 220-240 115V 115V 100-120VAC 7.0A 50/60 Hz 220-240VAC 3.
Table 7-2 lists the country-specific power cords for pedestal systems. Table 7-3 lists the country-specific power cords for rackmount systems.
Table 7-2 Power Cord Order Numbers (Pedestal Systems)
Country   Power Cord BN Number   DIGITAL Number
U.S., Japan, Canada   BN09-1K   17-00083-09
Australia, New Zealand   BN019H-2E   17-00198-14
Central Europe (Aus, Bel, Fra, Ger, Fin, Hol, Nor, Swe, Por, Spa)   BN19C-2E   17-00199-21
U.K.
Figure 7-8 Hard Disk Drive Status Cable (20-Pin)
Figure 7-9 SCSI (Embedded 16-Bit) Disk Drive Cable (68-Pin)
Figure 7-10 SCSI (Embedded 8-Bit) Removable-Media Cable (50-Pin)
Figure 7-11 SCSI (16-Bit)/RAID Option to Disk Drive Backplane (68-Pin)
Figure 7-12 Wide SCSI Cable for Breakouts at Rear of Enclosure
Figure 7-13 Wide SCSI Dual Connector Cable for Standard PCI/EISA Bulkhead Connector
7.2.4 CPU Daughter Board
Figure 7-14 Removing CPU Daughter Board
WARNING: CPU and memory modules have parts that operate at high temperatures. Wait 2 minutes after power is removed before handling these modules.
When installing the CPU daughter board, be sure to insert it straight and square, so as not to damage the connector pins. Once the levers are in place and screwed closed, press in on the front of the module to ensure that it is properly seated.
7.2.5 Control Panel Module Disconnect the control panel cable and remove the control panel module.
7.2.
7.2.7 Hard Disk Drives NOTE: If the drives are plugged into a RAID controller, you can “hot swap” drives; that is, you can add or replace drives without first shutting down the operating system or powering down the server hardware. For more information, see the StorageWorks RAID Array Subsystem Family Installation and Configuration Guide. If the drives are not plugged into a RAID controller, you will need to shut down the operating system before swapping a drive.
7.2.8 Memory Modules
The positions of failing dual-inline memory modules (DIMMs) are reported by the SROM power-up scripts (Section 2.2.2) or can be determined using the procedures described with the memexer command (Section 3.2.5).
Note the following memory configuration rules when replacing memory:
• At least one memory bank must contain a memory option.
• A memory option consists of four DIMMs (0, 1, 2, and 3).
• All DIMMs in a bank must be of the same capacity and part number.
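The configuration rules above can be expressed as a simple check, which may be useful when planning a replacement. This is an illustrative sketch only: the function name and the data layout (a dictionary of bank number to a list of capacity and part-number pairs) are hypothetical, and the part numbers used with it are up to the caller.

```python
def check_memory_banks(banks):
    """Check a planned DIMM layout against the three rules listed above.

    banks maps a bank number to a list of (capacity_mb, part_number) tuples,
    one tuple per installed DIMM. Returns a list of problems (empty if none).
    """
    problems = []
    populated = {number: dimms for number, dimms in banks.items() if dimms}
    if not populated:
        problems.append("no memory bank contains a memory option")
    for number, dimms in sorted(populated.items()):
        if len(dimms) != 4:
            problems.append(f"bank {number}: expected 4 DIMMs, found {len(dimms)}")
        if len(set(dimms)) > 1:
            problems.append(f"bank {number}: DIMMs differ in capacity or part number")
    return problems
```

A layout with four identical DIMMs in bank 0 and an empty bank 1 returns an empty list.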
Figure 7-19 Removing DIMMs from Motherboard
Figure 7-20 Installing DIMMs on Motherboard
NOTE: When installing DIMMs, make sure that the DIMMs are fully seated. The two latches on each DIMM connector should lock around the edges of the DIMMs.
7.2.
7.2.10 System Motherboard STEP 1: RECORD THE POSITION OF EISA AND PCI OPTIONS. STEP 2: REMOVE EISA AND PCI OPTIONS. STEP 3: REMOVE THE CPU DAUGHTER BOARD.
Figure 7-23 Removing CPU Daughter Board
WARNING: CPU and memory modules have parts that operate at high temperatures. Wait 2 minutes after power is removed before handling these modules.
When installing the CPU daughter board, be sure to insert it straight and square, so as not to damage the connector pins. Once the levers are in place and screwed closed, press in on the front of the module to ensure that it is properly seated.
STEP 4: REMOVE AIRFLOW BAFFLE FROM THE MOTHERBOARD.
Figure 7-24 Removing Airflow Baffle and Motherboard (12X)
STEP 6: MOVE THE NVRAM CHIP (E14) AND NVRAM TOY CHIP (E78) TO THE NEW MOTHERBOARD.
Move the socketed NVRAM chip (position E14) and NVRAM TOY chip (E78) to the replacement motherboard and set the jumpers to match previous settings.
7.2.12 PCI/EISA Options STEP 1: RECORD THE POSITION OF FAILING EISA OR PCI OPTION. STEP 2: REMOVE FAILING OPTION.
7.2.13 SCSI Disk Drive Backplane
STEP 1: REMOVE HARD DISK DRIVES.
Figure 7-27 Removing Hard Disk Drives
STEP 2: DISCONNECT DISK POWER, DISK STATUS, AND SCSI DATA CABLES FROM THE DISK DRIVE BACKPLANE AND REMOVE BACKPLANE.
Figure 7-28 Removing Disk Drive Backplane
Callouts: Disk Power; SCSI Data; Disk Status; (6X)
7.2.14 Power Supply
STEP 1: DISCONNECT POWER SUPPLY CABLES AND REMOVE POWER SUPPLY.
Figure 7-29 Removing Power Supply
Callouts: 115V; 230V
WARNING: Hazardous voltages are contained within the power supply. Do not attempt to service. Return to factory for service.
STEP 2: SET VOLTAGE SELECT SWITCH ON REPLACEMENT POWER SUPPLY AND INSTALL POWER SUPPLY. CAUTION: Incorrectly setting the voltage select switch can destroy the power supply.
7.2.
7.2.
Figure 7-32 Removing the CD-ROM Drive
NOTE: When removing a 5.25-inch device from the upper two 5.25-inch storage slots, you must first remove the diskette drive in order to access the screws that retain the 5.25-inch device.
Appendix A Default Jumper Settings
This appendix provides the location and default setting for all jumpers in AlphaServer 800 systems.
• Section A.1 provides the location and default settings for jumpers on the motherboard.
• Section A.2 provides the location and supported settings for the J3 jumper on the CPU daughter board.
• Section A.3 provides the location and default setting for the J1 jumper on the CPU daughter board.
A.1 Motherboard Jumpers Figure A-1 shows the location and default settings for jumpers on the motherboard.
Jumper: J16
Name: Fan fail override
Description: Allows the fan failure detection logic to be disabled to accommodate alternative enclosures.
Default Setting: This jumper is not installed on AlphaServer 800 systems.
Jumper: J22
Name: Remote management console (RMC)
Description: Sets default values to the RMC NVRAM.
Default Setting: Disabled (as shown in Figure A-1).
Jumper: J27
Name: VGA Enable
Description: When enabled (as shown in Figure A-1), the onboard VGA logic is activated.
Default Setting: Enabled for onboard VGA; Disabled if an EISA- or PCI-based VGA option is installed.
Figure A-2 AlphaServer 800 5/400 and 5/333 CPU Daughter Board (Jumper J3)
Figure panels: J3 settings for 400 MHz and 333 MHz (banks 0 through 4); J1 (banks 0 through 7)
A.3 CPU Daughter Board (J1 Jumper)
Figure A-3 shows the system default setting for the J1 jumper on the CPU daughter board. For information on the fail-safe loader, which is activated through the J1 jumper, refer to Chapter 2.
Figure A-3 Jumper J1 on the CPU Daughter Board (J1, banks 0 through 7)
Bank   Jumper Setting Function
0      Standard boot setting
1      Power up with backup cache disabled: Allows the system to run despite bad B-cache until a replacement CPU board is available.
2      Power up to the fail-safe loader with backup cache disabled.
A.4 Hard Disk Drive Backplane (J5) Supported Settings Figure A-4 shows the supported setting for the J5 jumper on the SCSI hard disk backplane.
Appendix B Connector Pin Layout
This appendix provides the pin layout for AlphaServer 800 internal and external connectors.
• Section B.1 provides the layout for internal connectors.
• Section B.2 provides the layout for external connectors.
B.
B.
Parallel Port Connector
Pin   Signal
1     ~STRB
2     DAT0
3     DAT1
4     DAT2
5     DAT3
6     DAT4
7     DAT5
8     DAT6
9     DAT7
10    ~ACK
11    BUSY
12    EN
13    SLCT
14    ~AUTOFD
15    ~ERROR
16    ~INIT
17    ~SLCTIN
18    CHAS
19    CHAS
20    CHAS
21    CHAS
22    CHAS
23    CHAS
24    CHAS
25    CHAS
VGA Connector
Pin   Signal
1     RED
2     GREEN
3     BLUE
4     NC
5     CHAS GND
6     CHAS GND
7     CHAS GND
8     CHAS GND
9 10 11 12 13 14