AlphaServer 1000A Service Guide Order Number: EK–ALPSV–SV.
First Printing, March 1996 Digital Equipment Corporation makes no representations that the use of its products in the manner described in this publication will not infringe on existing or future patent rights, nor do the descriptions contained in this publication imply the granting of licenses to make, use, or sell equipment or software in accordance with the description.
Contents
Preface
1 Troubleshooting Strategy
1.1 Troubleshooting the System
1.1.1 Problem Categories
1.2 Service Tools and Utilities
1.3 Information Services
3 Running System Diagnostics
3.1 Running ROM-Based Diagnostics
3.2 Command Summary
3.3 Command Reference
3.3.1 test
3.3.2 cat el and more el
3.3.3 memory
3.3.4 netew
3.3.5 network
3.3.6 net -s
.2.2 5.3 5.4 5.5 5.5.1 5.6 5.6.1 5.6.2 5.6.3 5.6.4 5.7 5.7.1 5.8 5.8.1 5.8.2 5.8.3 5.9 5.10 5.10.1 5.10.2 5.10.3 5.10.4 5.10.5 Memory Modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Motherboard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . EISA Bus Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ISA Bus Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Identifying ISA and EISA options . . . . . . . . . . . . . . . .
.2.14 Removable Media
A Default Jumper Settings
A.1 Motherboard Jumpers
A.2 CPU Daughter Board (J3 and J4) Supported Settings
A.3 CPU Daughter Board (J1 Jumper)
Sample Hardware Configuration Display
Jumper J1 on the CPU Daughter Board
AlphaServer 1000A Memory Layout
6–2 6–3 6–4 6–5 6–6 6–7 6–8 6–9 6–10 6–11 6–12 6–13 6–14 6–15 6–16 6–17 6–18 6–19 6–20 6–21 6–22 6–23 6–24 6–25 6–26 6–27 6–28 6–29 6–30 6–31 6–32 6–33 6–34 FRUs, Rear Left . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Opening Front Door . . . . . . . . . . . . . . . . . . . . . . . . . . Removing Top Cover and Side Panels . . . . . . . . . . . . Floppy Drive Cable (34-Pin) . . . . . . . . . . . . . . . . . . . . OCP Module Cable (10-Pin) . . . . . . . . . . . . . . . . . . . . Power Cord . . .
6–35 6–36 6–37 6–38 A–1 A–2 A–3 A–4 Removing Speaker . . . . . . . . . . . . . . . . . . . . . . . Removing a CD–ROM Drive . . . . . . . . . . . . . . . Removing a Tape Drive . . . . . . . . . . . . . . . . . . . Removing a Floppy Drive . . . . . . . . . . . . . . . . . . Motherboard Jumpers (Default Settings) . . . . . . AlphaServer 1000A 4/266 CPU Daughter Board (Jumpers J3 and J4) . . . . . . . . . . . . . . . . . . . . . AlphaServer 1000A 4/233 CPU Daughter Board (Jumpers J3 and J4) . . . . . . . . . .
5–6 5–7 5–8 6–1 6–2 Summary of Procedure for Configuring EISA Bus (EISA Options Only) . . . . . . . . . . . . . . . . . . . . . . . . . . Summary of Procedure for Configuring EISA Bus with ISA Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . SCSI Storage Configurations . . . . . . . . . . . . . . . . . . . . AlphaServer 1000A FRUs . . . . . . . . . . . . . . . . . . . . . . Power Cord Order Numbers . . . . . . . . . . . . . . . . . . . . .
Preface This guide describes the procedures and tests used to service AlphaServer 1000A systems. AlphaServer 1000A systems use a deskside ‘‘wide-tower’’ enclosure. Intended Audience This guide is intended for use by Digital Equipment Corporation service personnel and qualified self-maintenance customers.
Conventions The following conventions are used in this guide: Convention Meaning Return A key name enclosed in a box indicates that you press that key. Ctrl/x Ctrl/x indicates that you hold down the Ctrl key while you press another key, indicated here by x. In examples, this key combination is enclosed in a box, for example, Ctrl/C . Warning Warnings contain information to prevent personal injury. Caution Cautions provide information to prevent damage to equipment or software.
• DECevent Analysis and Notification Utility for OpenVMS Alpha, User and Reference Guide, AA-Q73LC-TE
• DECevent Analysis and Notification Utility for Digital UNIX, User and Reference Guide, AA-QAA4A-TE
• StorageWorks RAID Array 200 Subsystems Controller Installation and Standalone Configuration Utility User’s Guide, EK-SWRA2-IG
1 Troubleshooting Strategy This chapter describes the troubleshooting strategy for AlphaServer 1000A systems. • Section 1.1 provides questions to consider before you begin troubleshooting an AlphaServer 1000A system. • Tables 1–1 through 1–5 provide a diagnostic flow for each category of system problem. • Section 1.2 lists the product tools and utilities. • Section 1.3 lists available information services. 1.
1.1.1 Problem Categories System problems can be classified into the following five categories. Using these categories, you can quickly determine a starting point for diagnosis and eliminate the unlikely sources of the problem. 1. Power problems (Table 1–1) 2. No access to console mode (Table 1–2) 3. Console-reported failures (Table 1–3) 4. Boot failures (Table 1–4) 5.
Table 1–1 Diagnostic Flow for Power Problems Symptom Action System does not power on. Power supply shuts down after a few seconds (fan failure). • Check the power source and power cord. • Check that the system’s top cover is properly secured. A safety interlock switch shuts off power to the system if the top cover is removed. • If there are two power supplies, make sure both power supplies are plugged in. • Check the On/Off switch setting on the operator control panel.
Table 1–2 Diagnostic Flow for Problems Getting to Console Mode Symptom Action Power-up screen is not displayed. Interpret the error beep codes at power-up (Section 2.1) for a failure detected during self-tests. Check that the keyboard and monitor are properly connected and turned on. If the power-up screen is not displayed, yet the system enters console mode when you press Return , check that the console environment variable is set correctly.
Table 1–3 Diagnostic Flow for Problems Reported by the Console Program Symptom Action Power-up tests do not complete. Interpret the error beep codes at power-up (Section 2.1) and check the power-up screen (Section 2.3) for a failure detected during self-tests. Console program reports error: Use the error beep codes (Section 2.1) and/or console terminal (Section 2.3) to determine the error. • Error beep codes report an error at power-up. • Power-up screen includes error messages.
Table 1–4 Diagnostic Flow for Boot Problems Symptom Action System cannot find boot device. Check the system configuration for the correct device parameters (node ID, device name, and so on). • For Digital UNIX and OpenVMS, use the show config and show device commands (Section 5.1). • For Windows NT, use the Display Hardware Configuration display and the Set Default Environment Variables display (Section 5.1). Check the system configuration for the correct environment variable settings.
Table 1–5 Diagnostic Flow for Errors Reported by the Operating System Symptom System is hung or has crashed. Action Examine the crash dump file. Refer to the OpenVMS Alpha System Dump Analyzer Utility Manual (AA-PV6UB-TE) for information on how to interpret OpenVMS crash dump files. Refer to the Guide to Kernel Debugging (AA–PS2TD–TE) for information on using the Digital UNIX Krash Utility. Errors have been logged and the operating system is up.
RECOMMENDED USE: ROM-based diagnostics are the primary means of testing the console environment and diagnosing the CPU, memory, Ethernet, I/O buses, and SCSI and DSSI subsystems. Use ROM-based diagnostics in the acceptance test procedures when you install a system, add a memory module, or replace the following components: CPU module, memory module, motherboard, I/O bus device, or storage device. Refer to Chapter 3 for information on running ROM-based diagnostics.
Crash Dumps For fatal errors, such as fatal bugchecks, Digital UNIX and OpenVMS operating systems will save the contents of memory to a crash dump file. RECOMMENDED USE: Crash dump files can be used to determine why the system crashed. To save a crash dump file for analysis, you need to know the proper system settings. Refer to the OpenVMS Alpha System Dump Analyzer Utility Manual (AA-PV6UB-TE) or the Guide to Kernel Debugging (AA–PS2TD–TE) for Digital UNIX. 1.
ECU Revisions The EISA Configuration Utility (ECU) is used for configuring EISA options on AlphaServer systems. Systems are shipped with an ECU kit, which includes the ECU license. Customers who already have the ECU and license, but need the latest revision of the ECU, can order a separate kit. Call 1-800-DIGITAL to order. If the customer plans to migrate from Digital UNIX or OpenVMS to Windows NT, you must re-run the appropriate ECU. Failure to run the operating system-specific ECU will result in system failure.
You can obtain information about hardware configurations for the AlphaServer 1000A from the Digital Systems and Options Catalog. The catalog is published regularly to assist in ordering and configuring systems and hardware options. Each printing of the catalog presents all of the products that are announced, actively marketed, and available for ordering. Access printable PostScript files of any section of the catalog from the Internet as follows (be sure to check the Readme file): • ftp://ftp.digital.
2 Power-Up Diagnostics and Display This chapter provides information on how to interpret error beep codes and the power-up display on the console screen. In addition, a description of the power-up and firmware power-up diagnostics is provided as a resource to aid in troubleshooting. • Section 2.1 describes how to interpret error beep codes at power-up. • Section 2.2 describes SROM memory tests that can be run at power-up to isolate failing SIMM memory. • Section 2.
2.1 Interpreting Error Beep Codes If errors are detected at power-up, audible beep codes are emitted from the system. For example, if the SROM code could not find any good memory, you would hear a 1-3-3 beep code (one beep, a pause, a burst of three beeps, a pause, and another burst of three beeps). The beep codes are the primary diagnostic tool for troubleshooting problems when console mode cannot be accessed. Refer to Table 2–1 for information on interpreting error beep codes.
Table 2–1 (Cont.) Interpreting Error Beep Codes Beep Code Problem 1-3-3 No usable memory detected. Corrective Action 1. Verify that the memory modules are properly seated and try powering up again. 2. Swap bank 0 memory with known good memory and run SROM memory tests at power-up (Section 2.2). 3. If populating bank 0 with known good memory does not solve the problem, replace the CPU daughter board (Chapter 6). 4.
Table 2–1 (Cont.) Interpreting Error Beep Codes Beep Code 3-3-1 Problem Corrective Action Generic system failure. Possible problem sources include the TOY NVRAM chip (Dallas DS1287A) or PCI-to-EISA bridge chipset (Intel 82375EB). 1. Replace the TOY NVRAM chip (E78) on the system motherboard (Chapter 6). 2. If replacing the TOY NVRAM chip did not solve the problem, replace the motherboard (Chapter 6).
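When logging a service call, it can help to record the beep code in a consistent form. The following Python sketch is illustrative only and is not a Digital tool: it holds the beep codes that appear in this guide (Table 2–1 and Section 2.9) in a lookup table and spells out a code as bursts and pauses. The names BEEP_CODES, describe_beeps, and lookup are assumptions made for this example, and the table covers only the codes quoted in this section.

```python
# Beep codes and meanings as quoted in Table 2-1 and Section 2.9 of this guide.
# The names below are hypothetical; they are not part of any Digital utility.
BEEP_CODES = {
    "1-3-3": "No usable memory detected",
    "3-3-1": "Generic system failure (TOY NVRAM chip or PCI-to-EISA bridge chipset)",
    "3-3-2": "PCI-to-PCI bus bridge failure",
    "3-1-2": "Native SCSI controller failure",
}


def describe_beeps(code: str) -> str:
    """Spell out a beep code as bursts of beeps separated by pauses."""
    bursts = [int(n) for n in code.split("-")]
    parts = [f"{n} beep" + ("" if n == 1 else "s") for n in bursts]
    return ", pause, ".join(parts)


def lookup(code: str) -> str:
    """Return the problem description for a beep code, if it is in the table."""
    return BEEP_CODES.get(code, "Unknown code; see Table 2-1")


if __name__ == "__main__":
    print(describe_beeps("1-3-3"))
    print(lookup("1-3-3"))
```

The corrective actions themselves remain those listed in Table 2–1; the sketch only identifies the problem category.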
Table 2–2 SROM Memory Tests, CPU Jumper J1 Bank # 3 Test Description Test Results Cache Test: Tests backup cache. Test status displays on OCP: ....done. If the test takes longer than a few seconds to complete, there is a problem with the backup cache—replace the CPU daughter board (Chapter 6). 5 Memory Test: Tests memory with backup and data cache disabled. Test status displays on OCP: 12345.done. If an error is detected, the bank number and failing SIMM position are displayed.
Table 2–2 (Cont.) SROM Memory Tests, CPU Jumper J1 Bank # 6 Test Description Test Results Memory Test, Cache Enabled: Tests memory with backup and data cache enabled. Test status displays on OCP: 12345.done. If an error is detected, the bank number and failing SIMM position are displayed. The following OCP message indicates a failing SIMM at bank 0, SIMM position 2. FAIL B:0 S:2 Test duration: Approximately 2 seconds per 8 megabytes of memory.
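As an aid to reading the OCP results, the following Python sketch parses a failure message of the form shown in the table (FAIL B:0 S:2) and estimates test duration from the stated figure of approximately 2 seconds per 8 megabytes. It is an illustration only: the function names are assumptions, and the message pattern is inferred from the single example given in this guide.

```python
import re

# Pattern inferred from the one OCP example in Table 2-2: "FAIL B:0 S:2".
FAIL_RE = re.compile(r"FAIL\s+B:(\d+)\s+S:(\d+)")


def parse_ocp_failure(message: str):
    """Return (bank, simm_position) from an OCP failure message, or None."""
    match = FAIL_RE.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def estimated_test_seconds(memory_mb: int) -> float:
    """Rough duration using the table's figure of about 2 seconds per 8 MB."""
    return memory_mb / 8 * 2


if __name__ == "__main__":
    print(parse_ocp_failure("FAIL B:0 S:2"))   # (0, 2)
    print(estimated_test_seconds(64))          # 16.0
```

For example, a system with 64 MB of memory would be expected to take roughly 16 seconds per memory test pass under this estimate.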
Table 2–2 (Cont.) SROM Memory Tests, CPU Jumper J1 Bank # 4 Test Description Test Results Backup Cache Test: Tests backup cache alternately with data cache enabled then disabled. Test status displays on OCP: d D D d 12345.done. 12345.done. 12345.done. 12345.done. If an error is detected, the bank number and failing SIMM position are displayed. The following OCP message indicates a failing SIMM at bank 0, SIMM position 2.
Figure 2–1 Jumper J1 on the CPU Daughter Board
[Figure: jumper J1, banks 0 through 7]
Bank  Jumper Setting
0     Standard boot setting (AlphaServer 1000 systems)
1     Standard boot setting (AlphaServer 1000A systems)
2     Mini-console setting: Internal use only
3     SROM CacheTest: backup cache test
4     SROM BCacheTest: backup cache and memory test
5     SROM memTest: memory test with backup and data cache disabled
6     SROM memTestCacheOn: memory test with backup and data cache enabled
7     Fail-Safe Loader setting: selects
Figure 2–2 AlphaServer 1000A Memory Layout
[Figure: banks 0 through 3, each with SIMM positions 0 through 3, plus one ECC SIMM for each bank]
2.3 Power-Up Screen During power-up self-tests, the test status and result are displayed on the console terminal. Information similar to the following example should be displayed on the screen. ff.fe.fd.fc.
Table 2–3 Console Power-Up Countdown Description and Field Replaceable Units (FRUs)
Countdown Number   Description                      Likely FRU
ff                 Console initialization started   Non-specific/Status message
fe                 Initialized idle PCB             Non-specific/Status message
fd                 Initializing semaphores          Non-specific/Status message
fc,fb,fa           Initializing heap                Non-specific/Status message
f9                 Initializing driver structures   Non-specific/Status message
f8                 Initializing idle process PID    Non-specific/Status message
f7                 Initializing fil
Windows NT Systems The Windows NT operating system is supported by the ARC firmware (see Section 5.1.1). Systems using Windows NT power up to the ARC boot menu as follows:
Alpha Firmware Version n.nn
Copyright (c) 1993-1995 Microsoft Corporation
Copyright (c) 1993-1995 Digital Equipment Corporation
Boot menu:
Boot Windows NT
Boot an alternate operating system...
Run a program...
Supplementary menu...
Use the arrow keys to select, then press Enter.
2.3.
2.4 Mass Storage Problems Indicated at Power-Up Mass storage failures at power-up are usually indicated by read fail messages. Other problems are indicated by storage devices missing from the show config display. • Table 2–4 provides information for troubleshooting mass storage problems indicated at power-up or storage devices missing from the show config display. • Table 2–5 provides troubleshooting tips for AlphaServer systems that use the RAID Array 200 Subsystem. • Section 2.
Table 2–4 (Cont.) Mass Storage Problems Problem Symptom Corrective Action Missing or loose cables. Drives not properly seated on StorageWorks shelf Activity LEDs do not come on. Drive missing from the show config display. Remove device and inspect cable connections. Reseat drive on StorageWorks shelf. SCSI bus length exceeded Drives may disappear intermittently from the show config and show device displays.
Table 2–4 (Cont.) Mass Storage Problems Problem Symptom Corrective Action SCSI storage controller failure Problems persist after eliminating the problem sources. Replace failing EISA or PCI storage adapter module (or motherboard for the native SCSI controller). Table 2–5 provides troubleshooting hints for AlphaServer 1000A systems that have the StorageWorks RAID Array 200 Subsystem. The RAID subsystem includes either the KZESC-xx (SWXCR-Ex) or the KZPSC-xx (SWXCR-Px) PCI backplane RAID controller.
Table 2–5 (Cont.) Troubleshooting RAID Problems Symptom Action Cannot access disks connected to the RAID subsystem on Windows NT systems. On Windows NT systems, disks connected to the controller must be spun up before they can be accessed. While running the ECU, verify that the controller is set to spin up two disks every six seconds. This is the default setting if you are using the default configuration files for the controller. If the settings are different, adjust them as needed. 2.
Figure 2–3 StorageWorks Disk Drive LEDs (SCSI) [Figure: Activity and Fault LEDs]
Figure 2–4 Floppy Drive Activity LED [Figure: Activity LED]
Figure 2–5 CD–ROM Drive Activity LED [Figure: Activity LED]
2.6 EISA Bus Problems Indicated at Power-Up EISA bus failures at power-up are usually indicated by the following messages displayed during power-up: EISA Configuration Error. Run the EISA Configuration Utility. Run the EISA Configuration Utility (ECU) (Section 5.4) when this message is displayed. Other EISA bus problems are indicated by the absence of EISA devices from the show config display. Table 2–6 provides steps for troubleshooting EISA bus problems that persist after you run the ECU.
2.6.1 Additional EISA Troubleshooting Tips The following tips can aid in isolating EISA bus problems. • Peripheral device controllers need to be seated (inserted) carefully, but firmly, into their slots to make all necessary contacts. Improper seating is a common source of problems for EISA modules. • Be sure you run the correct version of the ECU for the operating system. For Windows NT, use ECU diskette DECpc AXP (AK-PYCJ*-CA); for Digital UNIX and OpenVMS, use ECU diskette DECpc AXP (AK-Q2CR*-CA).
2.7 PCI Bus Problems Indicated at Power-Up PCI bus failures at power-up are usually indicated by the inability of the system to see the device. Table 2–7 provides steps for troubleshooting PCI bus problems. Use the table to diagnose the likely cause of the problem. Note Some PCI devices do not implement PCI parity, and some have a parity-generating scheme in which parity is sometimes incorrect or is not compliant with the PCI Specification.
ftp://ftp.digital.com/pub/DEC/Alpha/systems/ http://www.service.digital.com/alpha/server/ 2.8 Fail-Safe Loader The fail-safe loader (FSL) is a redundant or backup ROM that allows you to power up without running power-up diagnostics and load new SRM/ARC and FSL console firmware from the firmware diskette. Note The fail-safe loader should be used only when a failure at power-up prohibits you from getting to the console program. You cannot boot an operating system from the fail-safe loader.
2.8.2 Activating the Fail-Safe Loader To activate the FSL: 1. Install the jumper at bank 7 of the J1 jumper on the CPU daughter board (Figure 2–6). The jumper is normally installed in the standard boot setting (bank 1 for AlphaServer 1000A systems). 2. Install the console firmware diskette and turn on the system. Two messages are displayed on the operator control panel (OCP) when the FSL program loads the diskette: OCP Message Meaning Floppy Loader FSL firmware is executing.
Figure 2–6 Jumper J1 on the CPU Daughter Board
[Figure: jumper J1, banks 0 through 7]
Bank  Jumper Setting
0     Standard boot setting (AlphaServer 1000 systems)
1     Standard boot setting (AlphaServer 1000A systems)
2     Mini-console setting: Internal use only
3     SROM CacheTest: backup cache test
4     SROM BCacheTest: backup cache and memory test
5     SROM memTest: memory test with backup and data cache disabled
6     SROM memTestCacheOn: memory test with backup and data cache enabled
7     Fail-Safe Loader setting: selects
2.9 Power-Up Sequence During the AlphaServer 1000A power-up sequence, the power supplies are stabilized and the system is initialized and tested through the firmware power-on self-tests. The power-up sequence includes the following:
• Power supply power-up:
  – AC power-up
  – DC power-up
• Two sets of power-on diagnostics:
  – Serial ROM diagnostics
  – Console firmware-based diagnostics
Caution The AlphaServer 1000A enclosure will not power up if the top cover is not securely attached.
2.9.2 DC Power-Up Sequence DC power is applied to the system with the DC On/Off button on the operator control panel. A summary of the DC power-up sequence follows: 1. When the DC On/Off button is pressed, the power supply checks for a POK_H condition. 2. 12V, 5V, 3.3V, and -12V outputs are energized and stabilized. If the outputs do not come into regulation, the power-up is aborted and the power supply enters the latching-shutdown mode. 2.
3. Test the system bus to PCI bus bridge and system bus to EISA bus bridge. If the PCI bridge fails or EISA bridge fails, an audible error beep code (3-3-1) sounds (Table 2–1). The power-up tests continue despite these errors. 4. Test the PCI-to-PCI bus bridge. If the bridge fails, an error beep code (3-3-2) sounds. 5. Test the native SCSI controller. If the controller fails, an error beep code (3-1-2) sounds. 6. Configure the memory in the system and test only the first 4 MB of memory.
4. Run exercisers on the drives currently seen by the system. Note This step does not ensure that all disks in the system will be tested or that any device drivers will be completely tested. Spin-up time varies for different drives, so not all disks may be on line at this point in the power-up sequence. To ensure complete testing of disk devices, use the test command (Section 3.3.1). 5. Enter console mode or boot the operating system. This action is determined by the auto_action environment variable.
3 Running System Diagnostics This chapter provides information on how to run system diagnostics. • Section 3.1 describes how to run ROM-based diagnostics, including error reporting utilities and loopback tests. • Section 3.4 describes acceptance testing and initialization procedures. • Section 3.5 describes the DEC VET operating system exerciser. 3.
3.2 Command Summary Table 3–1 provides a summary of the diagnostic and related commands. Table 3–1 Summary of Diagnostic and Related Commands Command Function Reference Acceptance Testing test Quickly tests the core system. The test command is the primary diagnostic for acceptance testing and console environment diagnosis. Section 3.3.1 cat el Displays the console event log. Section 3.3.2 more el Displays the console event log one screen at a time. Section 3.3.
Table 3–1 (Cont.) Summary of Diagnostic and Related Commands Command Function Reference test lb Conducts loopback tests for COM2 and the parallel port in addition to quick core system tests. Section 3.3.1 netew Runs external MOP loopback tests for specified EISA- or PCI-based ew* (DECchip 21040, TULIP) Ethernet ports. Section 3.3.4 network Runs external MOP loopback tests for specified EISA- or PCI-based er* (DEC 4220, LANCE) Ethernet ports. Section 3.3.
3.3.1 test The test command runs firmware diagnostics for the entire core system. The tests are run concurrently in the background. Fatal errors are reported to the console terminal. The cat el command should be used in conjunction with the test command to examine test/error information reported to the console event log. Because the tests are run concurrently and indefinitely (until you stop them with the kill_diags command), they are useful in flushing out intermittent hardware problems.
The test script tests devices in the following order: 1. Console loopback tests if lb argument is specified: COM2 serial port and parallel port. 2. Network external loopback tests for E*A0. This test requires that the Ethernet port be terminated or connected to a live network; otherwise, the test will fail. 3. Memory tests (one pass). 4. Read-only tests: DK* disks, DR* disks, DU* disks, MK* tapes, DV* floppy. 5. VGA console tests.
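The ordering above can be summarized in a short Python sketch. It is only a restatement of the list for reference, for example when writing a checklist for acceptance testing; the function name build_test_order and the loopback parameter are assumptions made for this illustration and do not correspond to any firmware interface.

```python
from typing import List


def build_test_order(loopback: bool = False) -> List[str]:
    """Return the device test order described for the test script.

    The loopback flag stands in for the lb argument; the console loopback
    step is included only when it is set.
    """
    steps: List[str] = []
    if loopback:
        steps.append("Console loopback tests: COM2 serial port and parallel port")
    steps += [
        "Network external loopback tests for E*A0",
        "Memory tests (one pass)",
        "Read-only tests: DK* disks, DR* disks, DU* disks, MK* tapes, DV* floppy",
        "VGA console tests",
    ]
    return steps


if __name__ == "__main__":
    for number, step in enumerate(build_test_order(loopback=True), start=1):
        print(f"{number}. {step}")
```

Without the lb argument the sequence has four steps and begins with the network external loopback tests.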
Testing the memory
Testing parallel port
Testing the SCSI Disks
Non-destructive Test of the Floppy started
dka400.4.0.6.0 has no media present or is disabled via the RUN/STOP switch
file open failed for dka400.4.0.6.
3.3.2 cat el and more el The cat el and more el commands display the current contents of the console event log. Status and error messages (if problems occur) are logged to the console event log at power-up, during normal system operation, and while running system tests. Standard error messages are indicated by asterisks (***). When cat el is used, the contents of the console event log scroll by. You can use Ctrl/S to stop the screen from scrolling and Ctrl/Q to resume scrolling.
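If the event log has been captured to a file from a serial console session, the error entries can be picked out by the asterisk marker. The following Python sketch assumes such a capture exists; the sample log text is hypothetical and is not actual console output, and the function name error_lines is an assumption for this example.

```python
from typing import List


def error_lines(event_log: str) -> List[str]:
    """Return the lines of a captured event log that carry the *** marker."""
    return [line for line in event_log.splitlines() if "***" in line]


if __name__ == "__main__":
    # Hypothetical capture; real event log wording will differ.
    sample = (
        "status message (hypothetical)\n"
        "*** error message (hypothetical)\n"
        "another status message (hypothetical)\n"
    )
    for line in error_lines(sample):
        print(line)
```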
3.3.3 memory The memory command tests memory by running a memory exerciser each time the command is entered. The exercisers are run in the background and nothing is displayed unless an error occurs. The number of exercisers, as well as the length of time for testing, depends on the context of the testing. Generally, running three to five exercisers for 15 minutes to 1 hour is sufficient for troubleshooting most memory problems.
The following is an example with a memory compare error indicating bad SIMMs.
3.3.4 netew The netew command is used to run MOP loopback tests for any EISA- or PCI-based ew* (DECchip 21040, TULIP) Ethernet ports. The command can also be used to test a port on a ‘‘live’’ network. The loopback tests are set to run continuously (-p pass_count set to 0). Use the kill command (or Ctrl/C) to terminate an individual diagnostic or the kill_diags command to terminate all diagnostics. Use the show_status display to determine the process ID when terminating an individual diagnostic test.
Testing an Ethernet Port:
>>> netew
>>> show_status
ID       Program      Device       Pass   Hard/Soft Bytes Written Bytes Read
-------- ------------ ------------ ------ --------- ------------- -------------
00000001 idle         system            0    0    0             0             0
000000d5 nettest      ewa0.0.0.0.
>>> kill_diags
>>>
3.3.5 network The network command is used to run MOP loopback tests for any EISA- or PCI-based er* (DEC 4220, LANCE) Ethernet ports. The command can also be used to test a port on a ‘‘live’’ network. The loopback tests are set to run continuously (-p pass_count set to 0). Use the kill command (or Ctrl/C) to terminate an individual diagnostic or the kill_diags command to terminate all diagnostics. Use the show_status display to determine the process ID when terminating an individual diagnostic test.
Testing an Ethernet Port:
>>> network
>>> show_status
ID       Program      Device       Pass   Hard/Soft Bytes Written Bytes Read
-------- ------------ ------------ ------ --------- ------------- -------------
00000001 idle         system            0    0    0             0             0
000000d5 nettest      era0.0.0.0.
>>> kill_diags
>>>
3.3.6 net -s The net -s command displays the MOP counters for the specified Ethernet port.
3.3.7 net -ic The net -ic command initializes the MOP counters for the specified Ethernet port.
3.3.8 kill and kill_diags The kill and kill_diags commands terminate diagnostics that are currently executing. Note A serial loopback connector (12-27351-01) must be installed on the COM2 serial port for the kill_diags command to successfully terminate system tests. • The kill command terminates a specified process. • The kill_diags command terminates all diagnostics. Synopsis: kill_diags kill [PID . . . ] Argument: [PID . . . ] The process ID of the diagnostic to terminate.
3.3.9 show_status The show_status command reports one line of information per executing diagnostic. The information includes ID, diagnostic program, device under test, error counts, passes completed, bytes written, and bytes read. Many of the diagnostics run in the background and provide information only if an error occurs. Use the show_status command to display the progress of diagnostics.
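When a show_status display has been captured from a console session, each row can be split into the fields listed above. The Python sketch below is an illustration under stated assumptions: it assumes eight whitespace-separated fields per row (with the Hard/Soft column printed as two numbers), and it assumes the ID is hexadecimal. The first sample row is taken from the example display in this guide; the second is hypothetical. The names DiagStatus, parse_status_line, and pids_for are invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiagStatus:
    pid: int            # process ID, assumed to be printed in hexadecimal
    program: str
    device: str
    passes: int
    hard_errors: int
    soft_errors: int
    bytes_written: int
    bytes_read: int


def parse_status_line(line: str) -> DiagStatus:
    """Parse one data row of a captured show_status display."""
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}: {line!r}")
    passes, hard, soft, written, read = (int(f) for f in fields[3:])
    return DiagStatus(int(fields[0], 16), fields[1], fields[2],
                      passes, hard, soft, written, read)


def pids_for(program: str, rows: List[DiagStatus]) -> List[str]:
    """Return the IDs (as printed) of all rows running the given program."""
    return [format(r.pid, "08x") for r in rows if r.program == program]


if __name__ == "__main__":
    rows = [
        parse_status_line("00000001 idle system 0 0 0 0 0"),
        # Hypothetical row for illustration; not taken from a real display.
        parse_status_line("000000d5 nettest ewa0.0.0.0.0 13 0 0 308672 308672"),
    ]
    print(pids_for("nettest", rows))
```

The ID found this way is the value to supply as the PID argument of the kill command when terminating an individual diagnostic.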
3.4 Acceptance Testing and Initialization Perform the acceptance testing procedure listed below after installing a system or whenever adding or replacing the following:
• Memory modules
• Motherboard
• CPU daughter board
• Storage devices
• EISA or PCI options
1. Run the RBD acceptance tests using the test command.
2. If you have added or moved an EISA option or some ISA options, run the EISA Configuration Utility (ECU).
3. Bring up the operating system.
4.
4 Error Log Analysis This chapter provides information on how to interpret error logs reported by the operating system. • Section 4.1 describes machine check/interrupts and how these errors are detected and reported. • Section 4.2 describes the entry format used by the error formatters. • Section 4.3 describes how to generate a formatted error log using the DECevent Translation and Reporting Utility available with OpenVMS and Digital UNIX. 4.
Table 4–1 AlphaServer 1000 Fault Detection and Correction Component Fault Detection/Correction Capability KN22A Processor Module DECchip 21064 and 21064A microprocessors Contains error detection and correction (EDC) logic for data cycles. There are check bits associated with all data entering and exiting the 21064(A) microprocessor. A single-bit error on any of the four longwords being read can be corrected (per cycle).
Processor Machine Check (SCB: 670) Processor machine check errors are fatal system errors that result in a system crash. The error handling code for these errors is common across all platforms using the DECchip 21064 and 21064A microprocessors.
• Invalid page table lookup (scatter gather) • Memory cycle error • B-cache tag address parity error • B-cache tag control parity error • Non-existent memory error • ESC NMI: IOCHK Processor-Corrected Machine Check (SCB: 630) Processor-corrected machine checks are caused by B-cache errors that are detected and corrected by the DECchip 21064 or 21064A microprocessor. These are nonfatal errors that result in an error log entry.
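For quick triage of error log entries, the two machine check types described above can be held in a small lookup keyed by SCB vector. This Python sketch covers only the two vectors named in this section (670 and 630); the names SCB_VECTORS and is_fatal are assumptions for the example, and the vectors are kept as strings exactly as printed in this guide.

```python
# SCB vectors as printed in this section; severity as described in the text.
# Tuple layout: (description, fatal).
SCB_VECTORS = {
    "670": ("Processor machine check", True),
    "630": ("Processor-corrected machine check", False),
}


def is_fatal(scb_vector: str) -> bool:
    """Return True if the machine check for this SCB vector crashes the system."""
    return SCB_VECTORS[scb_vector][1]


if __name__ == "__main__":
    for vector, (name, fatal) in SCB_VECTORS.items():
        outcome = "system crash" if fatal else "error log entry only"
        print(f"SCB {vector}: {name} ({outcome})")
```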
4.3 Event Record Translation Systems running Digital UNIX and OpenVMS operating systems use the DECevent management utility to translate events into ASCII reports derived from system event entries (bit-to-text translations).
System faults can be isolated by examining translated system error logs or using the DECevent Analysis and Notification Utility. Refer to the DECevent Analysis and Notification Utility for OpenVMS Alpha, User and Reference Guide, AA-Q73LC-TE, for more information. 4.3.2 Digital UNIX Translation Using DECevent The kernel error log entries are translated from binary to ASCII using the dia command. To invoke the DECevent utility, enter the dia command. Format: dia [-a -f infile[ . . .
5 System Configuration and Setup This chapter provides configuration and setup information for AlphaServer 1000A systems and system options. • Section 5.1 describes how to examine the system configuration using the console firmware. – Section 5.1.1 describes the function of the two firmware interfaces used with AlphaServer 1000A systems. – Section 5.1.2 describes how to switch between firmware interfaces. – Sections 5.1.3 and 5.1.
5.1 Verifying System Configuration Figure 5–1 illustrates the system architecture for AlphaServer 1000A systems.
SRM Command Line Interface Systems running Digital UNIX or OpenVMS access the SRM firmware through a command line interface (CLI). The CLI is a UNIX-style shell that provides a set of commands and operators, as well as a scripting facility. The CLI allows you to configure and test the system, examine and alter system state, and boot the operating system. The SRM console prompt is >>>.
5.1.2 Switching Between Interfaces For a few procedures it is necessary to switch from one console interface to the other. • The test command is run from the SRM interface. • The EISA Configuration Utility (ECU) and the RAID Configuration Utility (RCU) are run from the ARC interface. Switching from SRM to ARC Two SRM console commands are used to temporarily switch to the ARC console: • The arc command loads the ARC firmware and switches to the ARC menu interface.
5.1.3.1 Display Hardware Configuration The hardware configuration display provides the following information: • The first screen displays system information, such as the memory, CPU type, speed, NVRAM usage, the ARC version time stamp, and the type of video option detected. • The second screen displays devices detected by the firmware, including the monitor, keyboard, serial ports and devices on the SCSI bus. Tape devices are displayed, but cannot be accessed from the firmware.
Table 5–2 ARC Firmware Device Names Name Description multi(0)key(0)keyboard(0) multi(0)serial(0) multi(0)serial(1) The multi( ) devices are located on the system module. These devices include the keyboard port and the serial line ports. eisa(0)video(0)monitor(0) eisa(0)disk(0)fdisk(0) The eisa( ) devices are provided by devices on the EISA bus. These devices include the monitor and the diskette drive. scsi(0)disk(0)rdisk(0) scsi(0)cdrom(5)fdisk(0) The scsi( ) devices are SCSI disk or CD–ROM devices.
Example 5–1 (Cont.) Sample Hardware Configuration Display eisa(0)video(0)monitor(0) multi(0)key(0)keyboard(0) eisa(0)disk(0)fdisk(0) (Removable) multi(0)serial(0) multi(0)serial(1) scsi(0)disk(0)rdisk(0) (4 Partitions) DEC scsi(0)cdrom(0)fdisk(0) (Removable) DEC RZ29B RRD43 (C)DEC007 (C) DEC 1084 Press any key to continue...
Table 5–3 lists and explains the default ARC firmware environment variables. Table 5–3 ARC Firmware Environment Variables Variable Description A: The default floppy drive. The default value is eisa( )disk( )fdisk( ). AUTOLOAD The default startup action, either YES (boot) or NO or undefined (remain in Windows NT firmware). CONSOLEIN The console input device. The default value is multi( )key( )keyboard( )console( ). CONSOLEOUT The console output device.
5.1.4 Verifying Configuration: SRM Console Commands for Digital UNIX and OpenVMS
The following SRM console commands are used to verify system configuration on Digital UNIX and OpenVMS systems:
• show config (Section 5.1.4.1)—Displays the buses on the system and the devices found on those buses.
• show device (Section 5.1.4.2)—Displays the devices and controllers in the system.
• show memory (Section 5.1.4.3)—Displays main memory configuration.
• set and show (Section 5.1.4.
Bus 0, Slots 11–13 correspond to physical PCI card cage slots on the primary PCI bus: Slot 11 = PCI11 Slot 12 = PCI12 Slot 13 = PCI13 In the case of storage controllers, the devices off the controller are also displayed. • EISA Bus: Slot numbers correspond to EISA card cage slots (1 and 2). For storage controllers, the devices off the controller are also displayed. For more information on device names, refer to Figure 5–2. Refer to Figure 5–3 for the location of physical slots.
Synopsis: show config Example: >>> show config Firmware SRM Console: ARC Console: PALcode: Serial Rom: X4.4-5365 4.43p VMS PALcode X5.48-115, OSF PALcode X1.35-84 X2.1 Processor DECchip (tm) 21064A-6 MEMORY 32 Meg of System Memory Bank 0 = 32 Mbytes() Starting at 0x00000000 PCI Bus Bus 00 Slot 07: Intel 8275EB PCI to Eisa Bridge Bus 00 Slot 08: Digital PCI to PCI Bridge Chip Bus 02 Slot 00: ISP1020 Scsi Controller pka0.7.0.2000.0 dka0.0.0.2000.0 dka500.5.0.2000.
The following show config example illustrates how PCI options that contain a PCI-to-PCI bridge are represented in the display. For each option that contains a PCI-to-PCI bridge, the bus number increments by 1, and the logical slot numbers start anew at 0.
Example: >>> show config Firmware SRM Console: ARC Console: PALcode: Serial Rom: X4.4-5365 4.43p VMS PALcode X5.48-115, OSF PALcode X1.35-84 X2.1 Processor DECchip (tm) 21064A-6 MEMORY 32 Meg of System Memory Bank 0 = 32 Mbytes() Starting at 0x00000000 PCI Bus Bus 00 Slot 07: Intel 8275EB PCI to Eisa Bridge Bus 00 Slot 08: Digital PCI to PCI Bridge Chip Bus 02 Slot 00: ISP1020 SCSI Controller pka0.7.0.2000.0 dka0.0.0.2000.0 dka500.5.0.2000.
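When comparing the PCI portion of a show config display against the installed options, it can help to pull the bus and slot numbers out of captured console text. The following Python sketch is illustrative only: it assumes the display has been captured with one "Bus nn Slot nn: description" entry per line, and the function name parse_pci_entries is hypothetical, not part of any console or operating system tool.

```python
import re

# Matches entries such as "Bus 00 Slot 07: Intel 8275EB PCI to Eisa Bridge".
# Assumption: each entry sits on its own line in the captured text.
ENTRY = re.compile(r"Bus (\d+) Slot (\d+): (.+)")


def parse_pci_entries(lines):
    """Return (bus, slot, description) tuples for lines that look like PCI entries."""
    entries = []
    for line in lines:
        match = ENTRY.match(line.strip())
        if match:
            bus, slot, description = match.groups()
            entries.append((int(bus), int(slot), description.strip()))
    return entries


# Sample lines taken from the show config example in this section.
sample = [
    "Bus 00 Slot 07: Intel 8275EB PCI to Eisa Bridge",
    "Bus 00 Slot 08: Digital PCI to PCI Bridge Chip",
    "Bus 02 Slot 00: ISP1020 SCSI Controller",
]

for bus, slot, description in parse_pci_entries(sample):
    print(f"bus {bus} slot {slot}: {description}")
```

Entries with a nonzero bus number are devices behind a PCI-to-PCI bridge, where the logical slot numbers start again at 0.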
5.1.4.2 show device The show device command displays the devices and controllers in the system. The device name convention is shown in Figure 5–2. Figure 5–2 Device Name Convention dka0.0.0.0.
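The device names reported by the console can be split into their fields programmatically. The Python sketch below is a hedged illustration: Figure 5–2 is not reproduced in this text, so the field names used here (driver ID, adapter ID, unit number, bus node number, channel number, logical slot number, hose number) are an assumption based on the usual SRM console naming convention, and the function parse_device_name is hypothetical.

```python
import re

# Assumed layout (not confirmed by the text above):
#   <2-letter driver ID><adapter letter><unit>.<node>.<channel>.<slot>.<hose>
NAME = re.compile(r"^([a-z]{2})([a-z])(\d+)\.(\d+)\.(\d+)\.(\d+)\.(\d+)$")


def parse_device_name(name):
    """Split a console device name such as 'dka400.4.0.6.0' into assumed fields."""
    match = NAME.match(name.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized device name: {name!r}")
    driver, adapter = match.group(1), match.group(2)
    unit, node, channel, slot, hose = (int(part) for part in match.groups()[2:])
    return {
        "driver": driver,    # for example dk, dv, er, pk
        "adapter": adapter,  # a, b, c, ...
        "unit": unit,
        "node": node,
        "channel": channel,
        "slot": slot,
        "hose": hose,
    }


print(parse_device_name("dka400.4.0.6.0"))
```

A name that does not have five dot-separated numeric fields raises ValueError rather than being guessed at.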
Example: >>> show device dka400.4.0.6.0 dva0.0.0.0.1 era0.0.0.2.1 pka0.7.0.6.0 >>> DKA400 DVA0 ERA0 PKA0 RRD43 2893 08-00-2B-BC-93-7A SCSI Bus ID 7 Console device name Node name (alphanumeric, up to 6 characters) Device type Firmware version (if known) 5.1.4.3 show memory The show memory command displays information for each bank of memory in the system.
show envar Arguments: envar The name of the environment variable to be modified. value The value that is assigned to the environment variable. This may be an ASCII string. Options: -default Restores variable to its default value. -integer Creates variable as an integer. -string Creates variable as a string (default).
Table 5–4 (Cont.) Environment Variables Set During System Configuration Variable Attributes Function bootdef_dev NV The device or device list from which booting is to be attempted, when no path is specified on the command line. Set at factory to disk with Factory Installed Software; otherwise NULL. boot_file NV,W The default file name used for the primary bootstrap when no file name is specified by the boot command. The default value when the system is shipped is NULL.
Table 5–4 (Cont.) Environment Variables Set During System Configuration Variable Attributes Function bus_probe_ algorithm NV Specifies a bus probe algorithm for the system. OLD—Systems running OpenVMS V6.1 or earlier must set the bus probe algorithm to old—Failure to do so could result in bugcheck errors when booting from an EISA device. NEW—Systems running Digital UNIX V3.0B or later or OpenVMS V6.2 or later should be set to new.
Table 5–4 (Cont.) Environment Variables Set During System Configuration Variable Attributes Function er*0_protocols, ew*0_protocols NV Determines which network protocols are enabled for booting and other functions. ‘‘mop’’—Sets the network protocol to MOP: the setting typically used for systems using the OpenVMS operating system. ‘‘bootp’’—Sets the network protocol to bootp: the setting typically used for systems using the Digital UNIX operating system.
Table 5–4 (Cont.) Environment Variables Set During System Configuration Variable Attributes Function pci_parity NV Disable or enable parity checking on the PCI bus. ON—PCI parity enabled. OFF—PCI parity disabled. Some PCI devices do not implement PCI parity checking, and some have a parity-generating scheme in which the parity is sometimes incorrect or is not fully compliant with the PCI specification. In such cases, the device functions properly as long as parity is not checked.
Table 5–4 (Cont.) Environment Variables Set During System Configuration Variable Attributes Function pk*0_host_id NV Sets the controller host bus node ID to a value between 0 and 7. 0 to 7—Assigns bus node ID for specified host adapter. pk*0_soft_term NV Enables or disables SCSI terminators. This environment variable applies to systems using the QLogic ISP1020 SCSI controller. The QLogic ISP1020 SCSI controller implements the 16-bit wide SCSI bus.
Table 5–4 (Cont.) Environment Variables Set During System Configuration Variable Attributes Function tga_sync_green NV Sets the location of the SYNC signal generated by the ZLXp-E PCI graphics accelerator (PBXGA). This environment variable must be set correctly so that the graphics monitor will synchronize. The parameter is a bit mask, where the least significant bit (LSB) sets the vertical SYNC for the first graphics card found, the second for the second found, and so on.
the system by entering the init command or pressing the Reset button. 5.2 System Bus Options The system bus interconnects the CPU and memory modules. Figure 5–3 shows the card cage and bus locations.
5.2.1 CPU Daughter Board AlphaServer 1000A systems use a CPU daughter board. The daughter board provides: • The DECchip 21064 or 21064A processor • 2 megabytes of backup cache • APECS chipset, which provides logic for external access to the cache for main memory control, and the PCI bus interface • SROM code (SROM tests are controlled by jumper J6 on the CPU daughter board) 5.2.2 Memory Modules AlphaServer 1000A systems can support from 16 megabytes to 1024 megabytes of memory.
Table 5–5 provides the memory requirements and recommendations for each operating system.
• Two serial ports with full modem control and the parallel port • The keyboard and mouse interface • CIRRUS VGA controller • The speaker interface • PCI-to-PCI bridge chip set (PPB) • PCI-to-EISA bridge chip set • EISA system component chip • Time-of-year (TOY) clock • Connectors: – EISA bus connectors (Slots 1 and 2) – PCI bus connectors (Slots 11, 12, and 13–before the bridge) – PCI bus connectors (Slots 1, 2, 3, and 4–behind the bridge) – Memory module connectors (20 SIMM connector
5.5 ISA Bus Options The ISA (Industry Standard Architecture) bus is an industry-standard, 16-bit I/O bus. The EISA bus is a superset of the well-established ISA bus and has been designed to be backward compatible with 16-bit and 8-bit architecture. Therefore, ISA modules can be used in AlphaServer 1000A systems, provided the operating system supports the device. Up to two ISA (or EISA) modules can reside in the EISA bus portion of the card cage. Refer to Section 5.
5.6 EISA Configuration Utility Whenever you add or move EISA options or some ISA options in the system, you will need to run a utility called the EISA Configuration Utility (ECU). Each EISA or ISA board has a corresponding configuration (CFG) file, which describes the characteristics and the system resources required for that option. The ECU uses the CFG file to create a conflict-free configuration. The ECU is a menu-based utility that provides online help to guide you through the configuration process.
• If you are configuring an EISA bus that contains both ISA and EISA options, refer to Table 5–7. 4. Locate the correct ECU diskette for your operating system. The ECU diskette is shipped in the accessories box with the system. Make a copy of the appropriate diskette, and keep the original in a safe place. Use the backup copy for configuring options.
The system displays ‘‘loading ARC firmware.’’ When the firmware has finished loading, the ECU program is booted. 3. Complete the ECU procedure according to the guidelines provided in the following sections. • If you are configuring an EISA bus that contains only EISA options, refer to Table 5–6. Note If you are configuring only EISA options, do not perform Step 2 of the ECU, ‘‘Add or remove boards.’’ (EISA boards are recognized and configured automatically.
5.6.3 Configuring EISA Options EISA boards are recognized and configured automatically. Study Table 5–6 for a summary of steps to configure an EISA bus that contains no ISA options. Review Section 5.6.1. Then run the ECU as described in Section 5.6.2. Note It is not necessary to run Step 2 of the ECU, ‘‘Add or remove boards.’’ (EISA boards are recognized and configured automatically.) Table 5–6 Summary of Procedure for Configuring EISA Bus (EISA Options Only) Step Explanation Install EISA option.
5.6.4 Configuring ISA Options ISA boards are configured manually, whereas EISA boards are configured through the ECU software. Study Table 5–7 for a summary of steps to configure an EISA bus that contains both EISA and ISA options. Review Section 5.6.1. Then run the ECU as described in Section 5.6.2. Table 5–7 Summary of Procedure for Configuring EISA Bus with ISA Options Step Explanation Install or move EISA option. Do not install ISA boards. Use the instructions provided with the EISA option.
Table 5–7 (Cont.) Summary of Procedure for Configuring EISA Bus with ISA Options Step Explanation Return to the SRM console (Digital UNIX and OpenVMS systems only) and turn off the system. Refer to step 4 of Section 5.6.2 for information about returning to the console. Install ISA board and turn on the system. Use the instructions provided with the ISA option. 5.
5.7.1 PCI-to-PCI Bridge AlphaServer 1000A systems have a PCI-to-PCI bridge (DECchip 21050) on the motherboard. • Physical PCI slots 11, 12, and 13 (primary PCI) are located before the bridge. • Physical PCI slots 1, 2, 3 and 4 (secondary PCI) are located behind the bridge. Some PCI options are restricted to the primary PCI bus, slots 11, 12, and 13. Refer to the following documents for restrictions on specific PCI options: • AlphaServer 1000A READ THIS FIRST—shipped with the system.
When configuring the StorageWorks shelf, note the following: • Narrow SCSI (8-bit) devices can be used in the wide StorageWorks shelf, as long as the devices are at a supported revision level. The narrow devices will run in narrow mode. • Narrow and wide devices can be mixed in the wide StorageWorks shelf. In a mixed configuration, wide devices run in wide mode and narrow devices run in narrow mode. • For best performance, wide devices should be operated in wide SCSI-2 mode.
5.8.3 SCSI Bus Configurations Table 5–8 provides descriptions of the SCSI configurations available using single, dual, and triple controllers, as well as single and split StorageWorks backplanes. Table 5–8 SCSI Storage Configurations SCSI Buses Configuration Single The native Fast-SCSI-2 controller on the backplane provides 8-bit SCSI support for the removable-media bus; and 16-bit SCSI support for up to four StorageWorks drives in the internal StorageWorks shelf (Figure 5–7).
Figure 5–7 Single Controller Configuration
Figure 5–8 Dual Controller Configuration with Split StorageWorks Backplane
Figure 5–9 Triple Controller Configuration with Split StorageWorks Backplane
5.9 Power Supply Configurations
AlphaServer 1000A systems offer added reliability with redundant power options, as well as UPS options. The power supplies for AlphaServer 1000A systems support two modes of operation. Refer to Figure 5–10. Power supply modes of operation: 1. Single power supply 2. Dual power supply (redundant mode)—Provides redundant power (n + 1). In redundant mode, the failure of one power supply does not cause the system to shut down.
Figure 5–10 Power Supply Configurations (single and redundant modes, 400 watts DC or less, each shown with a UPS)
The H7290-AA power supply kit is used to order a second power supply and current sharing cable.
Figure 5–11 Power Supply Cable Connections Signal/Misc. Harness (22-Pin/15-Pin) + 3.3V Harness (20-Pin) + 5V Harness (24-Pin) 17-03969-01 Current Sharing Harness (3-Pin) J12 Storage Harness (12-Pin) + 5V Harness (24-Pin) J13 + 3.3V Harness (20-Pin) Signal/Misc.
5.10 Console Port Configurations Power-up information is typically displayed on the system’s console terminal. The console terminal may be either a graphics monitor or a serial terminal (connected through the COM1 serial port). Several SRM console environment variables are related to configuring the console ports: Environment Variable Description console Determines where the system will display power-up output.
serial Sets the power-up output to be displayed on the device that is connected to the COM1 port at the rear of the system. Example: P00>>> set console serial P00>>> init . . . !Now switch to the serial terminal. P00>>> show console console serial 5.10.2 set tt_allow_login The setting of the tt_allow_login environment variable enables or disables login to the SRM console firmware on alternative console ports.
5.10.3 set tga_sync_green
The tga_sync_green environment variable sets the location of the SYNC signal generated by the ZLXp-E PCI graphics accelerator card. The correct setting, displayed with the show command, is:
>>> show tga_sync_green tga_sync_green
If the monitor does not synchronize, set the parameter as follows:
>>> set tga_sync_green 00
Description: This command sets all graphics cards to synchronize on a separate vertical SYNC line, as required by some monitors.
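Because tga_sync_green is a bit mask with one bit per graphics card (the least significant bit for the first card found), the mask for a multi-card system can be worked out mechanically. The Python sketch below is illustrative only. It assumes that a set bit selects sync on green for that card and a cleared bit selects the separate vertical SYNC line, which is consistent with the value 00 described above but is not stated explicitly in this guide. The function name and the two-digit hexadecimal formatting are also assumptions.

```python
def sync_green_mask(sync_on_green):
    """Build a bit mask from per-card flags, first card found in bit 0 (LSB).

    Assumption: True (bit set) means sync on green for that card,
    False (bit clear) means a separate vertical SYNC line.
    """
    mask = 0
    for index, flag in enumerate(sync_on_green):
        if flag:
            mask |= 1 << index
    return mask


# Two cards, both using a separate vertical SYNC line.
all_separate = sync_green_mask([False, False])
print(f"{all_separate:02x}")

# Hypothetical three-card system: first and third cards use sync on green.
mixed = sync_green_mask([True, False, True])
print(f"{mixed:02x}")
```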
5.10.5 Using a VGA Controller Other than the Standard On-Board VGA When the system is configured to use a PCI- or EISA-based VGA controller instead of the standard on-board VGA (CIRRUS), consider the following: • The on-board CIRRUS VGA options must be set to disabled through the ECU. • The VGA jumper (J27) on the upper-left corner of the motherboard must then be set to disable (off). • The console environment variable should be set to graphics.
6 AlphaServer 1000A FRU Removal and Replacement This chapter describes the field-replaceable unit (FRU) removal and replacement procedures for AlphaServer 1000A systems, which use a deskside ‘‘wide-tower’’ enclosure. • Section 6.1 lists the FRUs for AlphaServer 1000A-series systems. • Section 6.2 provides the removal and replacement procedures for the FRUs. 6.
Table 6–1 AlphaServer 1000A FRUs Part # Description Section 17-03970-02 Floppy drive cable (34-pin) Figure 6–5 17-03971-01 OCP module cable (10-pin) Figure 6–6 17-00083-09 Power cord Figure 6–7 17-04195-01 Power supply current sharing cable (3-pin) Figure 6–8 70-31346-01 Power supply DC cable assembly 17-03965-01, Power supply signal/misc harness (15-pin) 17-03966-01, Power supply +5V harness (24-pin) 17-03968-01, Power supply +3.3V harness (20-pin) Section 6.2.
Table 6–1 (Cont.) AlphaServer 1000A FRUs Part # Description Section 70-31350-01 92 mm fan Section 6.2.4 70-31351-01 120 mm fan Section 6.2.4 Fans Internal StorageWorks RZnn -VW StorageWorks disk drive (16-bit SCSI) Section 6.2.5 54-23365-01 Internal StorageWorks backplane Section 6.2.6 12-45490-01 Internal SCSI terminator Figure 6–13 17-04021-01 Internal StorageWorks jumper cable (68-pin) Figure 6–13 Memory Modules 54-21225-BA 1 x 4MB SIMM Section 6.2.
Table 6–1 (Cont.) AlphaServer 1000A FRUs Part # Description Section Other Modules and Components 70-31348-01 Interlock switch Section 6.2.8 54-23499-01 System motherboard Section 6.2.9 21-29631-02 NVRAM chip (E14) Section 6.2.10 21-32423-01 NVRAM TOY clock chip (E78) Section 6.2.10 54-23302-02 OCP module Section 6.2.11 30-43120-02 Power supply (H7290-AA) Section 6.2.12 70-31349-01 Speaker Section 6.2.
Figure 6–1 FRUs, Front Right
Callouts: Tape Drive, Interlock Switch, Interlock/Server Management Cable, DC Cable Assembly, CDROM Drive, Floppy Drive, Floppy Drive Cable, OCP Module, OCP Cable, Hard Disk Drives, Current Sharing Cable, Power Supply, Power Supply Storage Harness, StorageWorks Backplane, StorageWorks Jumper Cable
Figure 6–2 FRUs, Rear Left
Callouts: Memory, Upper Fan, SCSI Cables, Speaker, Lower Fan, Power Cord, SCSI Removable Media Cable, Motherboard, CPU Daughter Board, NVRAM Chip (E14), NVRAM TOY Clock Chip (E78)
6.2 Removal and Replacement
This section describes the procedures for removing and replacing FRUs for AlphaServer 1000A systems, which use the deskside ‘‘wide-tower’’ enclosure. Caution: Before removing the top cover and side panels: 1. Perform an orderly shutdown of the operating system. 2. Set the On/Off button on the operator control panel to off. 3. Unplug the AC power cords. Caution Static electricity can damage integrated circuits.
Figure 6–4 Removing Top Cover and Side Panels (callout: Top Cover Release Latch)
6.2.1 Cables This section shows the routing for each cable in the system.
Figure 6–6 OCP Module Cable (10-Pin), 17-03971-01
Figure 6–7 Power Cord
Table 6–2 lists the country-specific power cables. Table 6–2 Power Cord Order Numbers Country Power Cord BN Number Digital Number U.S., Japan, Canada BN09A-1K 17-00083-09 Australia, New Zealand BN019H-2E 17-00198-14 Central Europe (Aus, Bel, Fra, Ger, Fin, Hol, Nor, Swe, Por, Spa) BN19C-2E 17-00199-21 U.K.
6.2.2 Power Supply DC Cable Assembly STEP 1: REMOVE THE CABLE CHANNEL GUIDE. STEP 2: REMOVE THE POWER SUPPLY DC CABLE ASSEMBLY. The power supply DC cable assembly contains the following cables: • Power supply signal/misc cable (15-pin) • Power supply +5V cable (24-pin) • Power supply +3.
Figure 6–10 Power Supply DC Cable Assembly DC Cable Assembly Signal/Misc. Harness (22-Pin/15-Pin) + 3.3V Harness (20-Pin) + 5V Harness (24-Pin) + 5V Harness (24-Pin) + 3.3V Harness (20-Pin) Signal/Misc.
Figure 6–11 Power Supply Storage Harness (12-Pin)
Figure 6–12 Interlock/Server Management Cable (2-pin)
Figure 6–13 Internal StorageWorks Jumper Cable (68-Pin)
Figure 6–14 Wide-SCSI (Controller to StorageWorks Shelf) Cable (68-Pin)
Note Figure 6–14 shows the 17-04022-01 SCSI cable used from the native wide SCSI controller to the J17 connector of the StorageWorks backplane, and the 17-04022-02 SCSI cable used from the option controller to the J11 connector of the StorageWorks backplane. In Figure 6–15, just the 17-04022-02 variant is used in a single bus configuration.
Figure 6–16 Wide-SCSI (J10 to Bulkhead Connector) Cable (68-Pin)
Figure 6–17 SCSI (Embedded 8-bit) Removable-Media Cable (50-Pin)
6.2.3 CPU Daughter Board Figure 6–18 Removing CPU Daughter Board Crossbar Retaining Screw CPU Card Handle Clips MA00312 Warning: CPU and memory modules have parts that operate at high temperatures. Wait 2 minutes after power is removed before handling these modules.
6.2.4 Fans STEP 1: REMOVE THE CPU DAUGHTER BOARD AND ANY OTHER OPTIONS BLOCKING ACCESS TO THE FAN SCREWS. See Figure 6–18 for removing the CPU daughter board. STEP 2: DISCONNECT THE FAN CABLE FROM THE MOTHERBOARD AND REMOVE FAN.
6.2.5 StorageWorks Drive Note If the StorageWorks drives are plugged into an SWXCR-xx controller, you can ‘‘hot swap’’ drives; that is, you can add or replace drives without first shutting down the operating system or powering down the server hardware. For more information, see StorageWorks RAID Array 200 Subsystem Family Installation and Configuration Guide, EK-SWRA2-IG.
6.2.6 Internal StorageWorks Backplane STEP 1: REMOVE POWER SUPPLIES. Figure 6–21 Removing Power Supply Rear Screws 6/32 Inch (4) Internal Screws 3.5 mm (2) Current Sharing Harness (3-Pin) Storage Harness (12-Pin) + 5V Harness (24-Pin) + 3.3V Harness (20-Pin) Signal/Misc.
STEP 2: REMOVE INTERNAL STORAGEWORKS BACKPLANE.
6.2.7 Memory Modules The positions of the failing single-inline memory modules (SIMMs) are reported by SROM power-up scripts (Section 2.2). Note • Bank 0 must contain a memory option (5 SIMMs–0, 1, 2, 3, and 1 ECC SIMM). • A memory option consists of five SIMMs (0, 1, 2, 3 and 1 ECC SIMM for the bank). • All SIMMs within a bank must be of the same capacity. STEP 1: RECORD THE POSITION OF THE FAILING SIMMS. STEP 2: LOCATE THE FAILING SIMM ON THE MOTHERBOARD.
Warning: Memory and CPU modules have parts that operate at high temperatures. Wait 2 minutes after power is removed before handling these modules. Caution Do not use any metallic tools or implements including pencils to release SIMM latches. Static discharge can damage the SIMMs.
Note SIMMs can only be removed and installed in successive order. For example, to remove a SIMM at bank 0, SIMM 1, SIMMs 0 and 1 for banks 3, 2, and 1 must first be removed.
Note When installing SIMMs, make sure that the SIMMs are fully seated. The two latches on each SIMM connector should lock around the edges of the SIMMs.
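The bank rules noted earlier in this section (bank 0 must be populated, a memory option is five SIMMs, and all SIMMs within a bank must be the same capacity) can be checked mechanically before reassembly. The Python sketch below is illustrative only: the data layout (a list of banks, each a list of SIMM capacities in megabytes, with an empty list for an unpopulated bank) and the function name check_banks are assumptions, not part of any Digital tool.

```python
SIMMS_PER_OPTION = 5  # SIMMs 0, 1, 2, 3 plus one ECC SIMM


def check_banks(banks):
    """Return a list of rule violations for a planned memory configuration.

    banks: list indexed by bank number; each item is a list of SIMM
    capacities in MB, or an empty list for an unpopulated bank.
    """
    problems = []
    if not banks or not banks[0]:
        problems.append("bank 0 must contain a memory option")
    for number, simms in enumerate(banks):
        if not simms:
            continue  # unpopulated banks are allowed (other than bank 0)
        if len(simms) != SIMMS_PER_OPTION:
            problems.append(f"bank {number} must hold {SIMMS_PER_OPTION} SIMMs")
        if len(set(simms)) > 1:
            problems.append(f"bank {number} mixes SIMM capacities")
    return problems


# Hypothetical plan: bank 0 fully populated, bank 1 with a mismatched SIMM.
plan = [[4, 4, 4, 4, 4], [8, 8, 8, 16, 8], [], []]
for problem in check_banks(plan):
    print(problem)
```

An empty result means the plan satisfies the three rules checked here; it does not verify operating system memory requirements or supported SIMM sizes.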
6.2.
6.2.9 Motherboard STEP 1: RECORD THE POSITION OF EISA AND PCI OPTIONS. STEP 2: REMOVE EISA AND PCI OPTIONS. STEP 3: REMOVE CPU DAUGHTER BOARD.
Figure 6–28 Removing CPU Daughter Board Crossbar Retaining Screw CPU Card Handle Clips MA00312 Warning: CPU and memory modules have parts that operate at high temperatures. Wait 2 minutes after power is removed before handling these modules. STEP 4: DETACH MOTHERBOARD CABLES, REMOVE SCREWS AND MOTHERBOARD. Caution When replacing the system bus motherboard, install the screws in the order indicated.
Figure 6–29 Removing Motherboard
STEP 5: MOVE THE NVRAM CHIP (E14) AND NVRAM TOY CHIP (E78) TO THE NEW MOTHERBOARD. Move the socketed NVRAM chip (position E14) and NVRAM TOY chip (E78) to the replacement motherboard and set the jumpers to match previous settings. Note The NVRAM TOY chip contains the os_type environment variable. This environment variable may need to be reset (Section 5.1.4.4).
Figure 6–30 Motherboard Layout Power Connectors Diskette Drive Connector Upper Fan Connector Lower Fan Connector Bank 3 Bank 2 Memory Module Connectors (20) Bank 1 Bank 0 ECC Banks TOY/NVRAM Chip (E78 On Board) CPU Module Connector NVRAM Chip (E14 On Board) EISA 1 EISA 2 PCI Primary Slots PCI Secondary Slots RCM DC Enable Connector 11 12 13 1 2 3 4 Removable Media Internal SCSI Connector (50 Pin Narrow) Speaker Connector RCM Interconnect Connector StorageWorks Internal SCSI Connector (68 pin Wi
6.2.10 NVRAM Chip (E14) and NVRAM TOY Clock Chip (E78) See Figure 6–30 for the motherboard layout. Note The NVRAM TOY clock chip contains the os_type environment variable. The default setting is for the SRM console for OpenVMS or Digital UNIX operating systems. 6.2.11 OCP Module STEP 1: REMOVE FRONT DOOR. STEP 2: REMOVE FRONT PANEL. STEP 3: REMOVE OCP MODULE.
Figure 6–32 Removing Front Panel (callouts: Remove Hidden Screws, Remove Screws)
Figure 6–33 Removing the OCP Module (callouts: J254; Black/Red to Interlock Switch; Green/Yellow to Motherboard)
6.2.12 Power Supply STEP 1: DISCONNECT POWER SUPPLY CABLES. STEP 2: REMOVE POWER SUPPLY. Figure 6–34 Removing Power Supply Rear Screws 6/32 Inch (4) Internal Screws 3.5 mm (2) Current Sharing Harness (3-Pin) Storage Harness (12-Pin) + 5V Harness (24-Pin) + 3.3V Harness (20-Pin) Signal/Misc. Harness (15-Pin) MA00933 Warning: Hazardous voltages are contained within. Do not attempt to service. Return to factory for service.
6.2.
6.2.
Figure 6–37 Removing a Tape Drive
Figure 6–38 Removing a Floppy Drive
A Default Jumper Settings This appendix provides the location and default setting for all jumpers in AlphaServer 1000A systems: • Section A.1 provides location and default settings for jumpers located on the motherboard. • Section A.2 provides the location and supported settings for jumpers J3 and J4 on the CPU daughter board. • Section A.3 provides the location and default setting for the J1 jumper on the CPU daughter board.
A.1 Motherboard Jumpers Figure A–1 shows the location and default settings for jumpers located on the motherboard.
Jumper Name Description Default Setting J16 Large Fan Allows the large fan to be disabled to accommodate the alternative enclosures. This jumper is not installed on AlphaServer 1000A systems. J25 Remote Console Module (RCM) DC Enable When enabled, activates the RCM DC enable connector (J17) for use with the RCM. Disabled (as shown in Figure A–1). J27 VGA Enable When enabled (as shown in Figure A–1), the on-board VGA logic is activated.
A.2 CPU Daughter Board (J3 and J4) Supported Settings Figure A–2 shows the supported AlphaServer 1000A 4/266 settings for the J3 and J4 jumpers on the CPU daughter board. These jumpers affect clock speed and other critical system settings. Figure A–3 shows the supported AlphaServer 1000A 4/233 settings for the J3 and J4 jumpers on the CPU daughter board. These jumpers affect clock speed and other critical system settings.
Figure A–3 AlphaServer 1000A 4/233 CPU Daughter Board (Jumpers J3 and J4) J4 J3 MA00791 Supported settings: • J4 Jumper: Off On Off Off On • J3 Jumper: Off Default Jumper Settings A–5
A.3 CPU Daughter Board (J1 Jumper) Figure A–4 shows the default setting for the J1 jumper on the CPU daughter board. For information on SROM tests and the fail-safe loader, which are activated through the J1 jumper, refer to Chapter 2.
Glossary 10Base-T Ethernet network IEEE standard 802.3-compliant Ethernet products used for local distribution of data. These networking products characteristically use twisted-pair cable. ARC User interface to the console firmware for operating systems that require firmware compliance with the Windows NT Portable Boot Loader Specification. ARC stands for Advanced RISC Computing. AUI Ethernet network Attachment unit interface. An IEEE standard 802.
backup cache A second, very fast cache memory that is closely coupled with the processor. bandwidth The rate of data transfer in a bus or I/O channel. The rate is expressed as the amount of data that can be transferred in a given time, for example megabytes per second. battery backup unit A battery unit that provides power to the entire system enclosure (or to an expander enclosure) in the event of a power failure. Another term for uninterruptible power supply (UPS). boot Short for bootstrap.
bystander A system bus node (CPU or memory) that is not addressed by a current system bus commander. byte A group of eight contiguous bits starting on an addressable byte boundary. The bits are numbered right to left, 0 through 7. cache memory A small, high-speed memory placed between slower main memory and the processor. A cache increases effective memory transfer rates and processor speed.
cluster A group of networked computers that communicate over a common interface. The systems in the cluster share resources, and software programs work in close cooperation. cold bootstrap A bootstrap operation following a power-up or system initialization (restart). On Alpha based systems, the console loads PALcode, sizes memory, and initializes environment variables. commander In a particular bus transaction, a CPU or standard I/O that initiates the transaction.
data cache A high-speed cache memory reserved for the storage of data. Abbreviated as D-cache. DEC VET Digital DEC Verifier and Exerciser Tool. A multipurpose system diagnostic tool that performs exerciser-oriented maintenance testing. diagnostic program A program that is used to find and correct problems with a computer system. Digital UNIX A general-purpose operating system based on the Open Software Foundation technology.

ECC
Error correction code. Code and algorithms used by logic to facilitate error detection and correction.

EEPROM
Electrically erasable programmable read-only memory. A memory device that can be byte-erased, written to, and read from.

EISA bus
Extended Industry Standard Architecture bus. A 32-bit industry-standard I/O bus used primarily in high-end PCs and servers.

FIB
Flexible interconnect bridge. A converter that allows the expansion of the system enclosure to other DSSI devices and systems.

field-replaceable unit
Any system component that a qualified service person is able to replace on site.

firmware
Software code stored in hardware.

fixed-media compartments
Compartments that house nonremovable storage media.

Flash ROM
Flash-erasable programmable read-only memory. Flash ROMs can be bank- or bulk-erased.

FRU
Field-replaceable unit.

instruction cache
A high-speed cache memory reserved for the storage of instructions. Abbreviated as I-cache.

interrupt request lines (IRQs)
Bus signals that connect an EISA or ISA module (for example, a disk controller) to the system so that the module can get the system’s attention through an interrupt.

ISA
Industry Standard Architecture. An 8-bit or 16-bit industry-standard I/O bus, widely used in personal computer products. The EISA bus is a superset of the ISA bus.

LAN
Local area network.

MAU
Medium attachment unit. On an Ethernet LAN, a device that converts the encoded data signals from various cabling media (for example, fiber optic, coaxial, or ThinWire) to permit connection to a networking station.

memory interleaving
The process of assigning consecutive physical memory addresses across multiple memory controllers. Improves total memory bandwidth by overlapping system bus command execution across multiple memory modules.
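
The memory interleaving entry can be illustrated with a minimal sketch. It assumes a hypothetical 32-byte interleave granularity and a hypothetical bank count; neither value is taken from this guide, and the function name `interleave` is chosen only for illustration. The sketch shows how consecutive blocks of physical addresses could be spread across banks in round-robin order.

```python
def interleave(address, num_banks, line_size=32):
    """Map a physical address to (bank, offset within that bank).

    Assumption: blocks of `line_size` bytes are assigned to banks in
    round-robin order. The 32-byte default is hypothetical.
    """
    line = address // line_size        # which block the address falls in
    bank = line % num_banks            # consecutive blocks rotate across banks
    bank_line = line // num_banks      # index of that block inside its bank
    offset = bank_line * line_size + address % line_size
    return bank, offset


if __name__ == "__main__":
    # Walk four consecutive blocks across two banks.
    for addr in (0, 32, 64, 96):
        print(hex(addr), interleave(addr, num_banks=2))
```

Because neighboring blocks land in different banks, a sequential access stream can keep more than one bank busy at a time, which is the bandwidth benefit the definition describes.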

NVRAM
Nonvolatile random-access memory. Memory that retains its information in the absence of power.

OCP
Operator control panel.

portability
The degree to which a software application can be easily moved from one computing environment to another.

porting
Adapting a given body of code so that it will provide equivalent functions in a computing environment that differs from the original implementation environment.

power-down
The sequence of steps that stops the flow of electricity to a system or its components.

power-up
The sequence of events that starts the flow of electrical current to a system or its components.

reliability
The probability that a device or system will not fail to perform its intended functions during a specified time.

responder
In any particular bus transaction, memory, CPU, or I/O that accepts or supplies data in response to a command/address from the system bus commander.

RISC
Reduced instruction set computer. A processor with an instruction set that is reduced in complexity.

ROM-based diagnostics
Diagnostic programs resident in read-only memory.

SRM
User interface to console firmware for operating systems that expect firmware compliance with the Alpha System Reference Manual (SRM).

storage array
A group of mass storage devices, frequently configured as one logical disk.

StorageWorks
Digital’s modular storage subsystem (MSS), which is the core technology of the Alpha SCSI-2 mass storage solution. Consists of a family of low-cost mass storage products that can be configured to meet current and future storage needs.

test-directed diagnostics (TDDs)
An approach to diagnosing computer system problems whereby error data logged by diagnostic programs resident in read-only memory (RBDs) is analyzed to capture information about the problem.

thickwire
One-half inch, 50-Ohm coaxial cable that interconnects the components in many IEEE standard 802.3-compliant Ethernet networks.

ThinWire
Ethernet cabling and technology used for local distribution of data communications. ThinWire cabling uses BNC connectors.

write back
A cache management technique in which data from a write operation to cache is written into main memory only when the data in cache must be overwritten.

write-enabled
Indicates a device onto which data can be written.

write-protected
Indicates a device onto which data cannot be written.

write through
A cache management technique in which data from a write operation is copied to both cache and main memory.
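
The difference between the write back and write through entries can be sketched as a toy model. The `Cache` class, its method names, and the use of a dictionary to stand in for main memory are all hypothetical and chosen only for illustration; this is not a description of the cache hardware in this system.

```python
class Cache:
    """Toy cache that models only the two write policies defined above."""

    def __init__(self, memory, write_back):
        self.memory = memory          # dict standing in for main memory
        self.write_back = write_back  # True: write back, False: write through
        self.lines = {}               # address -> cached value
        self.dirty = set()            # addresses changed but not yet in memory

    def write(self, address, value):
        self.lines[address] = value
        if self.write_back:
            # Write back: defer the memory update until the line is replaced.
            self.dirty.add(address)
        else:
            # Write through: copy the data to main memory immediately.
            self.memory[address] = value

    def evict(self, address):
        # A dirty line must reach main memory before it is overwritten.
        if address in self.dirty:
            self.memory[address] = self.lines[address]
            self.dirty.discard(address)
        self.lines.pop(address, None)
```

With `write_back=True`, main memory does not see a value until `evict` runs for that address. With `write_back=False`, every `write` updates main memory at once, at the cost of more memory traffic.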
How to Order Additional Documentation Technical Support If you need help deciding which documentation best meets your needs, call 800-DIGITAL (800-344-4825) and press 2 for technical assistance. Electronic Orders If you wish to place an order through your account at the Electronic Store, dial 800-234-1998, using a modem set to 2400- or 9600-baud. You must be using a VT terminal or terminal emulator set at 8 bits, no parity.