DEC 7000 AXP System
VAX 7000 System
Service Manual
Order Number EK–7002B–SV.002

This manual tells how to add or replace CPU and memory modules in a DEC 7000 AXP or VAX 7000 system.
First Printing, November 1992

The information in this document is subject to change without notice and should not be construed as a commitment by Digital Equipment Corporation. Digital Equipment Corporation assumes no responsibility for any errors that may appear in this document. The software, if any, described in this document is furnished under a license and may be used or copied only in accordance with the terms of such license.
Contents
Preface
Chapter 1 Adding or Replacing CPUs and Memories
1.1 What Is Required
1.2 LSB Configuration Rules
1.3 Identifying the Kernel FRUs
1.4 Removing a Module from the LSB Card Cage
3.6 Exit
3.7 Display and Verify Commands
3.8 How to Update Corrupted Firmware
3.9 How to Modify Device Attributes
Table 2 Related Documents
Table 1-1 Field-Replaceable Units
Table 2-1 EEPROM Environment Variables
Preface

Intended Audience
This manual is written for Digital customer service engineers and self-maintenance customers servicing DEC 7000 AXP or VAX 7000 systems. This manual is a follow-on to Basic Troubleshooting and Advanced Troubleshooting.

Document Structure
This manual uses a structured documentation design. Topics are organized into small sections for efficient online and printed reference. Each topic begins with an abstract.
Conventions Used in This Document
Terminology. Unless specified otherwise, "system" refers to either a DEC 7000 AXP or a VAX 7000 system. The DEC 7000 AXP systems use the Alpha AXP architecture. References in text use DEC 7000 to refer to DEC 7000 AXP systems. When a discussion applies to only one system, an icon is used to highlight that system. Otherwise, the discussion applies to both systems.
Table 1 DEC 7000/VAX 7000 Documentation

Title                                     Order Number
Installation Kit                          EK–7000B–DK
Site Preparation Guide                    EK–7000B–SP
Installation Guide                        EK–700EB–IN
Hardware User Information Kit             EK–7001B–DK
Operations Manual                         EK–7000B–OP
Basic Troubleshooting                     EK–7000B–TS
Service Information Kit—VAX 7000          EK–7002A–DK
Platform Service Manual                   EK–7000A–SV
System Service Manual                     EK–7002A–SV
Pocket Service Guide                      EK–7000A–PG
Advanced Troubleshooting                  EK–7001A–TS
Service Information Kit—DEC 7000          EK–7002
Table 1 DEC 7000/VAX 7000 Documentation (Continued)

Title                                     Order Number
Reference Manuals
Console Reference Manual                  EK–70C0B–TM
KA7AA CPU Technical Manual                EK–KA7AA–TM
KN7AA CPU Technical Manual                EK–KN7AA–TM
MS7AA Memory Technical Manual             EK–MS7AA–TM
I/O System Technical Manual               EK–70I0A–TM
Platform Technical Manual                 EK–7000A–TM
Upgrade Manuals
KA7AA CPU Installation Guide              EK–KA7AA–IN
KN7AA CPU Installation Guide              EK–KN7AA–IN
MS7AA Memory Installation Guide           EK–MS7AA–IN
KZMSA Adapter Installa
Table 2 Related Documents

Title                                                                  Order Number
General Site Preparation
Site Environmental Preparation Guide                                   EK–CSEPG–MA
System I/O Options
BA350 DECstor/me Modular Storage Shelf Subsystem Configuration Guide   EK–BA350–CG
BA350 DECstor/me Modular Storage Shelf Subsystem User’s Guide          EK–BA350–UG
BA350-LA DECstor/me Modular Storage Shelf User’s Guide                 EK–350LA–UG
CIXCD Interface User Guide                                             EK–CIXCD–UG
DEC FDDIcontroller 400 Installation/Problem Solving                    EK–DEMFA–IP
DEC LANcontroller 400 Installation
Table 2 Related Documents (Continued)

Title                                                 Order Number
Operating System Manuals
Alpha Architecture Reference Manual                   EY–L520E–DP
DEC OSF/1 Guide to System Administration              AA–PJU7A–TE
DECnet for OpenVMS Network Management Utilities       AA–PQYAA–TK
Guide to Installing DEC OSF/1                         AA–PS2DA–TE
OpenVMS Alpha Version 1.
Chapter 1 Adding or Replacing CPUs and Memories
This chapter describes how to remove and install processor and memory modules in DEC 7000 and VAX 7000 systems.
1.1 What Is Required
Adding or replacing processor or memory modules is a simple operation. Afterward, you must verify that the system recognizes the new modules. You may also need to set system parameters.
Processor and memory modules reside in the LSB card cage, a centerplane card cage with nine slots for modules. The LSB card cage always contains an IOP module, a clock module, and at least one processor and one memory module (see Figure 1-1). To add or replace modules, follow the steps in Sections 1.2 through 1.6. Then:
• Set system parameters to restore the original operating environment (Chapter 2).
• Upgrade firmware if required (Chapter 3).
1.2 LSB Configuration Rules
The first CPU module is at node 0, and the first memory module is at node 7. The LSB bus requires an IOP module at node 8. See Figure 1-2.
The LSB card cage (see Figure 1-2) has nine slots. Slot numbers are equivalent to node numbers. Four slots are at the front of the cabinet (nodes 0 through 3, numbered right to left), and five slots are at the rear (nodes 4 through 8, numbered right to left). A system can have up to six processors and up to seven memory modules, as space allows. The maximum memory configuration is bounded by operating system support and by the number of physical slots.
• The first CPU module is installed in node 0 (in the front at the far right).
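The placement rules in this section are simple enough to check mechanically. The Python sketch below is illustrative only and is not a Digital utility; the function names slot_location and check_config and the module labels "CPU", "MEM", and "IOP" are hypothetical. It encodes only the rules stated above: nine slots, front nodes 0 through 3 and rear nodes 4 through 8 numbered right to left, the first CPU at node 0, the first memory module at node 7, the IOP at node 8, and at most six processors and seven memory modules. It does not model operating-system limits on memory.

```python
# Hypothetical checker for the LSB configuration rules in Section 1.2.
# Module kinds are labels chosen for this sketch: "CPU", "MEM", "IOP".

FRONT_NODES = range(0, 4)  # nodes 0-3, front of cabinet, numbered right to left
REAR_NODES = range(4, 9)   # nodes 4-8, rear of cabinet, numbered right to left
MAX_CPUS = 6
MAX_MEMORIES = 7


def slot_location(node: int) -> tuple[str, int]:
    """Return (side, position counted from the right, starting at 1)."""
    if node in FRONT_NODES:
        return ("front", node - FRONT_NODES.start + 1)
    if node in REAR_NODES:
        return ("rear", node - REAR_NODES.start + 1)
    raise ValueError(f"LSB node {node} does not exist (valid nodes are 0-8)")


def check_config(modules: dict[int, str]) -> list[str]:
    """Return rule violations for a node -> module-kind mapping."""
    problems = []
    for node in modules:
        if node not in range(0, 9):
            problems.append(f"node {node} is not an LSB slot")
    if modules.get(0) != "CPU":
        problems.append("first CPU module must be at node 0")
    if modules.get(7) != "MEM":
        problems.append("first memory module must be at node 7")
    if modules.get(8) != "IOP":
        problems.append("IOP module must be at node 8")
    kinds = list(modules.values())
    if kinds.count("CPU") > MAX_CPUS:
        problems.append(f"more than {MAX_CPUS} processors")
    if kinds.count("MEM") > MAX_MEMORIES:
        problems.append(f"more than {MAX_MEMORIES} memory modules")
    return problems


# The configuration verified in Section 1.6: CPUs at 0 and 1, memory at 6 and 7.
print(check_config({0: "CPU", 1: "CPU", 6: "MEM", 7: "MEM", 8: "IOP"}))  # []
```

An empty list means no stated rule is broken; it does not prove that the operating system supports the configuration.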
1.3 Identifying the Kernel FRUs
Table 1-1 lists the field-replaceable units (FRUs) for DEC 7000 and VAX 7000 systems that are discussed in this book.

Table 1-1 Field-Replaceable Units
Option No.    Part No.
Each memory or processor board is enclosed in a module case that protects the module electronics from static discharge. A barcode label gives information about the module, including the module part number, revision level, and serial number (see Figure 1-4).
1.4 Removing a Module from the LSB Card Cage
Use the following procedure to remove a module from the LSB card cage for replacement or reconfiguration.

Figure 1-5 Removing a Module from the LSB Card Cage
1. Perform an orderly shutdown of the system.
2. Turn the keyswitch on the front control panel to the Disable position and wait for the yellow Fault LED on the control panel to stop flashing. When the Fault LED stops flashing, power has been removed from the LSB backplane and you may safely proceed.
3. Open the cabinet door by holding the recessed handhold and pulling it toward you.
4. Put on the antistatic wrist strap.
CAUTION: You must wear a wrist strap when you handle any modules.
5.
1.5 Inserting a Module in the LSB Card Cage Use the following procedure when replacing or adding a module in the system card cage during maintenance or upgrade.
Follow Steps 1 through 6 in Section 1.4, and then:
1. If you are adding a module, remove the filler module from the slot where you will install the new module. Hold the filler module firmly on the vertical piece closest to you and gently pull it toward you. Set it aside for return.
2. On the module to be inserted, pull the two black restraining clips out to the right and pull the two levers out until they are perpendicular to the front edge of the module. The clips snap open.
3.
1.6 Verifying the System
Power up the system and check that all processor and memory modules appear in the self-test display.

Example 1-1 Self-Test Display
[Self-test display not reproduced in this copy. It shows processors (P) at nodes 0 and 1, memory modules (M) at nodes 6 and 7, and the IOP (A) at node 8, followed by the P00>>> prompt.]
Power up the system by turning the keyswitch from Disable to either the Enable or the Restart position. Power sequencing begins and the system runs self-test. Check the self-test display to make sure that the system recognizes the newly installed modules. Example 1-1 shows the self-test display of a system to which one processor and one memory module were added. The new processor is at node 1 and the new memory module is at node 6.
1 On the TYP line, the P indicates that processors are at nodes 0 and 1.
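If you capture the self-test display in a terminal log, a few lines of code can confirm that the new modules were recognized. This Python sketch is hypothetical: it assumes that the node header reads F through 0 in hexadecimal and that the TYP line carries one whitespace-separated symbol per node in the same order, with "." for an empty node and an optional "TYP" label. A real display has further rows and formatting that the sketch ignores.

```python
# Hypothetical TYP-line parser for a captured self-test display.
# Assumed layout: one symbol per node, in the same order as the node header,
# "." for an empty node, and an optional "TYP" row label.

NODE_HEADER = "F E D C B A 9 8 7 6 5 4 3 2 1 0"


def parse_typ_line(typ_line: str, header: str = NODE_HEADER) -> dict[int, str]:
    """Return a mapping of node number -> type symbol for occupied nodes."""
    nodes = [int(label, 16) for label in header.split()]
    symbols = [tok for tok in typ_line.split() if tok != "TYP"]
    if len(symbols) != len(nodes):
        raise ValueError("expected one TYP symbol per node")
    return {node: sym for node, sym in zip(nodes, symbols) if sym != "."}


def nodes_of_type(typ: dict[int, str], symbol: str) -> list[int]:
    """List the nodes whose TYP symbol matches, e.g. 'P' for processors."""
    return sorted(node for node, sym in typ.items() if sym == symbol)


# Hypothetical TYP line for the system described in Example 1-1.
typ = parse_typ_line(". . . . . . . A M M . . . . P P TYP")
print(nodes_of_type(typ, "P"))  # [0, 1]
print(nodes_of_type(typ, "M"))  # [6, 7]
```

Checking that nodes 1 and 6 appear in the result is the programmatic equivalent of reading the display by eye.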
Chapter 2 Servicing the CPU
This chapter describes how to service a CPU in a DEC 7000 or VAX 7000 system when a CPU fails or when new CPUs are added to the system. Some CPU firmware problems are covered in this chapter; others are covered in Chapter 3.
2.1 System Parameters
Several system parameters must be set during a repair or when adding CPUs. Other system parameters may need to be set, depending on how the customer wants the system configured.

Table 2-1 EEPROM Environment Variables
auto_action (default: Halt)
    Specifies the action the system takes following an error halt. Values are:
    restart - Automatically restart. If restart fails, boot the operating system.
    boot - Automatically boot the operating system.
Table 2-1 shows the permanent environment variables stored in EEPROM. Some of these variables must be set when you add a CPU or replace a broken one. You can view these variables by typing show * at the console prompt. Volatile environment variables are initialized by a system reset; nonvolatile variables persist across system failures. Environment variables are created with the create command and modified with the set command.
Table 2-1 EEPROM Environment Variables (Continued)
cpu_enabled (default: 0xff)
    A bitmask determining which CPUs are enabled to run (leave console mode). If not defined, all available processors are considered enabled.
cpu_primary (default: 0xff)
    A bitmask indicating which CPUs are eligible to become the primary processor following the next system reset. If not defined, all available processors are considered eligible.
d_harderr (default: Halt)
    Determines the action taken following a hard error.
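Because cpu_enabled and cpu_primary are bitmasks, it can help to translate between a mask value and a list of CPUs. The Python sketch below rests on an assumption that Table 2-1 does not state: that bit n of the mask stands for the CPU at LSB node n. The default 0xff, which sets eight bits, is at least consistent with nodes 0 through 7. Confirm the bit assignment in the Console Reference Manual before setting a mask on a customer system.

```python
# Sketch for reading and building cpu_enabled / cpu_primary values.
# ASSUMPTION: bit n of the mask stands for the CPU at LSB node n.
# Table 2-1 says only that these variables are bitmasks.

MASK_BITS = 8  # the default value 0xff sets exactly eight bits


def mask_to_nodes(mask: int, bits: int = MASK_BITS) -> list[int]:
    """Return the node numbers whose bits are set in mask."""
    return [node for node in range(bits) if mask & (1 << node)]


def nodes_to_mask(nodes: list[int]) -> int:
    """Return a mask with one bit set for each listed node."""
    mask = 0
    for node in nodes:
        mask |= 1 << node
    return mask


print(hex(nodes_to_mask([0, 1])))  # 0x3
print(mask_to_nodes(0xff))         # [0, 1, 2, 3, 4, 5, 6, 7]
```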
Another important environment variable, not shown in Table 2-1 because it is not a default variable, is a nickname, which the customer may define. If the customer wants one default boot path for a cluster and a different local boot path, a nickname variable can be used for that purpose. Nicknames are set by a console command of the form create -nv old_disk dua0.0.0.4.0. The -nv option indicates that this nonvolatile environment variable will be stored in EEPROM.
2.2 How to Replace the Only Processor
When replacing the only processor in a system, you must store the system ID and customized boot paths. If the customer changed console environment variables from their default values, you must also set them again to the customer's values.

Example 2-1 Replacing a Single Processor
    >>> show device                  # Shows device sizes in the
                                     # system and the path to the
                                     # devices.
    polling for units on kzmsa0, slot 3, xmi0...
    dka300.3.0.3.0     DKA300     RZ73
    dka400.4.0.3.0     DKA400     RZ73
    dkb400.4.1.3.
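When you record boot paths before replacing the only processor, it helps to know how the device names in show device output break down. Reading Example 2-1 against the show device output in Section 3.1, each name has five dot-separated fields, and the fourth field matches the slot reported on the polling line (for example, dka300.3.0.3.0 appears under "kzmsa0, slot 3"). The field names in this Python sketch (unit, channel, slot, bus) are an interpretation of those examples, not definitions taken from this manual.

```python
from typing import NamedTuple


class DevicePath(NamedTuple):
    name: str     # console device name, e.g. "dka300"
    unit: int     # assumed: unit number (3 in dka300.3.0.3.0)
    channel: int  # assumed: adapter channel (0 for dka..., 1 for dkb...)
    slot: int     # assumed: adapter slot shown on the polling line
    bus: int      # assumed: bus number (0 in every example shown)


def parse_device_path(path: str) -> DevicePath:
    """Split a console device name such as 'dka300.3.0.3.0' into fields."""
    parts = path.split(".")
    if len(parts) != 5 or not all(p.isdigit() for p in parts[1:]):
        raise ValueError(f"unexpected device path: {path!r}")
    name, unit, channel, slot, bus = parts
    return DevicePath(name, int(unit), int(channel), int(slot), int(bus))


print(parse_device_path("dka300.3.0.3.0").slot)  # 3
print(parse_device_path("dua0.0.0.4.0"))         # the old_disk nickname target
```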
After you have removed and replaced the defective module following the instructions in Chapter 1, take the following steps:
1. Power up the system. Self-test runs; use the self-test display to decide whether the new CPU module is functioning properly. If it is not, reseat the new CPU, refer to the Advanced Troubleshooting manual, or both; otherwise, continue.
2.
2.3 How to Replace the Boot Processor
When the CPU that needs repair is the boot processor in a multiprocessing system, you must change which CPU receives data from the console terminal.

Example 2-2 Replacing the Boot Processor
[Self-test display not reproduced in this copy.]
There are at least two factors to consider when replacing a primary CPU:
1. The need to retain the system environment.
2. The possibility that the new CPU has a higher or lower firmware revision than the other CPUs in the system.
Callout 2 in Example 2-2 shows the mismatch message displayed when the firmware differs between CPUs. In this case, the new primary has a higher firmware revision than the secondary. The procedure described here takes these factors into account.
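The console reports a firmware mismatch itself; the Python sketch below only illustrates the comparison involved, so you can see which CPUs are behind when you tabulate revisions by hand. It assumes revisions are dotted numbers like the LFU revisions in Chapter 3 (for example, 1.00 and 1.10), compared component by component as integers. The function names are hypothetical.

```python
# Sketch: find CPUs whose firmware is older than the newest revision present.
# ASSUMPTION: revisions are dotted numbers such as "1.00" or "1.10",
# compared component by component as integers.


def rev_key(rev: str) -> tuple[int, ...]:
    """Turn '1.10' into (1, 10) so revisions compare numerically."""
    return tuple(int(part) for part in rev.split("."))


def older_cpus(revs_by_node: dict[int, str]) -> list[int]:
    """Return the nodes whose revision is below the highest one installed."""
    newest = max(revs_by_node.values(), key=rev_key)
    return sorted(node for node, rev in revs_by_node.items()
                  if rev_key(rev) < rev_key(newest))


# New primary at node 0 with newer firmware than the secondary at node 1.
print(older_cpus({0: "1.10", 1: "1.00"}))  # [1]
```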
2.4 How to Add a New Processor or Replace a Secondary Processor
Install a new secondary processor in the slot to the left of the boot processor or the other secondary processors.

Example 2-3 Adding or Replacing a Secondary Processor
[Self-test display not reproduced in this copy.]
There are at least two factors to consider when adding or replacing a CPU:
1. The need to retain the system environment.
2. The possibility that the new CPU has a higher firmware revision than the other CPUs in the system.
Callout 2 in Example 2-3 shows the mismatch message displayed when the firmware differs between CPUs. In this case, the new secondary has a higher firmware revision than the older primary. The procedure described here takes these factors into account.
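Because front slots are numbered right to left, "the slot to the left" of the existing processors corresponds to the next higher node number. The Python sketch below is a hypothetical helper built on that reading. It assumes the processors already occupy consecutive nodes starting at 0, and it simply takes the next node number even past node 3, where the next slot is at the rear of the cabinet rather than literally to the left.

```python
# Hypothetical helper for Section 2.4: pick the node for a new secondary CPU.
# ASSUMPTION: existing CPUs occupy consecutive nodes starting at node 0, and
# the new CPU goes in the next higher node number.

MAX_CPUS = 6
IOP_NODE = 8


def next_secondary_node(cpu_nodes: set[int], occupied: set[int]) -> int:
    """Return the node for a new secondary CPU, or raise if none is usable."""
    if len(cpu_nodes) >= MAX_CPUS:
        raise ValueError("system already has the maximum of six processors")
    candidate = max(cpu_nodes) + 1
    if candidate >= IOP_NODE or candidate in occupied:
        raise ValueError(f"node {candidate} is not free for a processor")
    return candidate


print(next_secondary_node({0}, {0, 7, 8}))  # 1
```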
2.5 Build EEPROM Command
If the EEPROM becomes corrupted, you can use the build eeprom command to recover. The build eeprom command is the proper response to the console error messages shown in Example 2-4. If the build eeprom command fails, return the module for repair.

Example 2-4 Build EEPROM Command
    EEPROM image failed to verify                 #Checksum bad.
    EEPROM environment parameters not set up      #Ev area corrupt.
    Fail to update EEPROM envar on CPU x          #Cannot write
                                                  #EEPROM.
If an EEPROM becomes corrupted, the console prints the error message EEPROM image failed to verify. If this occurs, use the build eeprom command to rebuild the EEPROM. When the EEPROM is rebuilt, all settings revert to their default values; follow Section 2.2 to customize the environment variables again. The build eeprom command prompts you for several pieces of information.
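If you review captured console logs, the three messages in Example 2-4 and the response this section prescribes can be matched automatically. In the Python sketch below, the messages and their meanings come from Example 2-4; the function name eeprom_action and the idea of scanning a log are hypothetical.

```python
from typing import Optional

# Console messages and their meanings, as listed in Example 2-4.
EEPROM_MESSAGES = {
    "EEPROM image failed to verify": "Checksum bad",
    "EEPROM environment parameters not set up": "Ev area corrupt",
    "Fail to update EEPROM envar on CPU": "Cannot write EEPROM",
}


def eeprom_action(console_line: str) -> Optional[str]:
    """Return the recommended response if the line is an Example 2-4 message."""
    for message, meaning in EEPROM_MESSAGES.items():
        if console_line.strip().startswith(message):
            return (f"{meaning}: enter build eeprom; "
                    "if build eeprom fails, return the module for repair")
    return None


print(eeprom_action("Fail to update EEPROM envar on CPU 1"))
```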
2.6 FEPROM Recovery—Hardware Requirements
When the FEPROMs are corrupt and no other CPU is available from which to run the update -f command, you may be able to recover the console and diagnostic code through the console terminal line. A serial line receive program in the serial ROM forces a prompt, AXP7000/10000-FRRC> or VAX7000/10000-FRRC>, on the console terminal.
There are three methods for recovering console and diagnostic code: the update -f command; the Loadable Firmware Update (LFU) Utility; and downline loading the console/diagnostic firmware into the damaged system and copying it into the FEPROMs. The update -f command can be used only in a multiprocessing system and is documented in Sections 2.3 and 2.4. LFU can be used when the console is fully functional and is described in Chapter 3.
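The conditions attached to each method can be summarized as a small decision function. This Python sketch is illustrative: the conditions (LFU needs a fully functional console, update -f needs another CPU in a multiprocessing system, and downline loading covers the remaining case) come from this section, but the order in which the sketch prefers one method over another when several apply is an assumption, not guidance from this manual.

```python
# Sketch: choose a firmware recovery method from the conditions in Section 2.6.
# ASSUMPTION: the order of preference below is illustrative only.


def recovery_method(console_functional: bool, other_good_cpu: bool) -> str:
    """Return the recovery method that fits the stated conditions."""
    if console_functional:
        return "LFU (Chapter 3)"
    if other_good_cpu:
        return "update -f (Sections 2.3 and 2.4)"
    return "downline load over the console line (Sections 2.6 through 2.8)"


print(recovery_method(console_functional=False, other_good_cpu=False))
```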
2.7 FEPROM Recovery—Software Requirements and Setup
On the source system, you need to "bind" the RRD42 or InfoServer to a virtual disk container, mount it, and set the terminal speed to that of the target console.

Example 2-5 Setting Up the Source System
    $ set term/speed=9600/perm txa3:
    $ mcr ess$ladcp
    LADCP> BIND VAX7000_V01
    VAX7000_V01 is bound to DAD104
    LADCP> exit
    $ mount/ov=id dad104
    $ dir dad104:[sys0.sysexe]
    .
    .
    VAX7000_10000_CONSOLE_IMAGE.GROM
    .
Example 2-5 illustrates the steps needed to prepare Kermit on OpenVMS VAX. The steps are:
1. Make sure that you have the hardware necessary to perform the task.
2. Make sure you have the correct CD-ROM for the damaged system.
3. Set the terminal speed on the source system to 9600. FRRC works only at 9600 baud.
4. Run LADCP on the source system to "bind" the CD-ROM volume name to a virtual disk container pointed to by a logical name created by LADCP.
5.
2.8 FEPROM Recovery—Procedure
After Kermit has been set up and you are ready to downline load the file AXP7000_10000_CONSOLE_IMAGE.GROM or VAX7000_10000_CONSOLE_IMAGE.GROM, connect to the target system, prepare it to receive the file, and then load it. The final steps are to copy the file into the FEPROMs and boot the system. Note that all commands are entered on the source system.

Example 2-6 Using Kermit to Downline Load FEPROM Code
    Kermit-32> connect txa5:          # Line = com path e.g.
Assuming you have the correct CD-ROM in an InfoServer, you are now ready to connect to the damaged target system and downline load the code. Example 2-6 illustrates a VAX 7000 recovery. Follow the same steps for the DEC 7000, using the AXP7000_10000_console_image.grom file.
1. At the Kermit prompt, connect to the target system. Here are two examples of connections:
    connect txa5:        Logically connect to the target console line
    connect lta1004:     Logically connect to the target console line
2.
Chapter 3 Updating Firmware Use the Loadable Firmware Update (LFU) Utility to update system firmware. LFU runs without any operating system and can update the firmware on any system module. LFU handles modules on the LSB bus (for example, the CPU) as well as modules on the I/O buses (for example, a CI controller on the XMI bus). You are not required to specify any hardware path information, and the update process is highly automated.
3.1 Booting LFU on a DEC 7000 System
LFU is supplied on the DEC 7000/10000 AXP Console CD-ROM (Part Number AG-PQW3*-RE, where * is the letter that denotes the disk revision). Make sure this CD-ROM is mounted in the RRD42 in-cabinet CD drive. Boot LFU from the CD-ROM.

Example 3-1 RRD42 LFU Booting
    >>> show device                                   1
    polling for units on kzmsa, slot 1, xmi0...
    dka100.1.0.1.0     dka100     RRD42
    polling for units on kfmsb0, slot 6, xmi0...
    dub1.1.0.6.0
    dub2.2.0.6.0
    >>> boot dka100                                   2
    Booting...
1 Use the show device command to find the name of the RRD42 CD drive. 2 Enter the boot command to boot from the RRD42. The RRD42 has a device name of dka100. 3 LFU starts, displays a summary of its commands, and issues its prompt (Function?).
3.2 Booting LFU on a VAX 7000
LFU is supplied on the VAX 7000/10000 Console CD-ROM (Part Number AG-PQW1*-RE, where * is the letter that denotes the disk revision). Make sure this CD-ROM is mounted in one of the system’s InfoServers. Boot the Initial System Load (ISL) program, and select the service name corresponding to the console CD-ROM.

Example 3-2 Booting LFU
    >>> boot exa0 -file ISL_LVAX_V01
    Resulting file is mopdl:ISL_LVAX_V01/exa0
    ......
    Copyright Digital Equipment Corporation 1992 All Rights Reserved.
    Loadable Environment Rev: V1.0-1625 Jul 12 1992 10:50:56
    ***** Loadable Firmware Update Utility ***** Version 2.01 16-jun-1992
    ------------------------------------------------------------------
    Function    Description
    ------------------------------------------------------------------
    Display     Displays the system’s configuration table.
    Exit        Return to loadable offline operating environment.
    List
3.3 Show The show command shows the current revision of firmware and hardware for every module in the system that contains microcode. In the display, each module that needs to be updated is indicated by a plus sign (+) following the device mnemonic.
1 If you type just the command show without a device mnemonic, LFU prompts for the device mnemonic. All the commands that require device mnemonics will prompt. 2 If you enter ? (or help) for the device, a table displays the syntax for specifying devices. All the commands that require device specifications use this syntax. Note the use of wildcards. For example, show kdm70* would display all KDM70 controller modules.
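A device pattern such as kdm70* selects every mnemonic that begins with kdm70. The Python sketch below models that behavior with the standard fnmatch module, under the assumption that LFU's "*" acts like a shell-style wildcard. fnmatch also understands "?" and "[...]", which LFU may not accept. The mnemonic kdm701 in the sample list is hypothetical.

```python
from fnmatch import fnmatchcase

# ASSUMPTION: LFU's "*" behaves like a shell-style "*" (any run of characters).


def select_devices(pattern: str, mnemonics: list[str]) -> list[str]:
    """Return the mnemonics matched by an LFU-style device pattern."""
    return [m for m in mnemonics if fnmatchcase(m, pattern)]


installed = ["kn7aa0", "ms7aa0", "iop0", "kdm700", "kdm701", "cixcd0"]
print(select_devices("kdm70*", installed))  # ['kdm700', 'kdm701']
```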
3.4 List
The list command displays the inventory of update firmware on the CD-ROM. Only the devices listed at your terminal are supported for firmware updates.

Example 3-4 List Command
    Function? l                                                1
    Loadable Firmware Update Utility Version 2.01
    Name      Mnemonic    Update Firmware Revision             2
    CIXCD     cixcd*      70.00
    KDM70     kdm70*      3.00     All Revisions
    KN7AA     kn7aa*      1.01     All Revisions
    KZMSA     kzmsa*      2.
1 The list command shows the firmware revisions corresponding to the hardware revisions for each device. (There may be several hardware revisions for a particular device, but only one firmware revision corresponds to any hardware revision.) Comparing the output of the list and show commands helps you determine which devices should receive firmware updates.
2 VAX 7000 systems do not support kn7aa and kzmsa. The following devices appear in the display instead:
    KA7AA     ka7aa*      1.10
    KFMSA     kfmsa*      5.
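The comparison of list and show output described in callout 1 can be sketched in a few lines. LFU already marks out-of-date modules with a plus sign, so this Python sketch only illustrates the reasoning. The update revisions come from Example 3-4; the installed revisions are hypothetical. The sketch ignores the pairing of firmware with particular hardware revisions, and it assumes shell-style "*" matching and numeric comparison of dotted revisions.

```python
from fnmatch import fnmatchcase


def rev_key(rev: str) -> tuple[int, ...]:
    """Turn '70.00' into (70, 0) so revisions compare numerically."""
    return tuple(int(part) for part in rev.split("."))


def devices_to_update(installed: dict[str, str],
                      available: dict[str, str]) -> list[str]:
    """Compare show output (device -> current firmware) with list output
    (mnemonic pattern -> update firmware); return devices that are behind."""
    behind = []
    for device, current in installed.items():
        for pattern, update_rev in available.items():
            if fnmatchcase(device, pattern):
                if rev_key(update_rev) > rev_key(current):
                    behind.append(device)
                break  # use the first matching pattern only
    return behind


# Update revisions from Example 3-4; installed revisions are hypothetical.
available = {"cixcd*": "70.00", "kdm70*": "3.00", "kn7aa*": "1.01"}
installed = {"kn7aa0": "1.00", "kdm700": "3.00", "cixcd0": "69.00"}
print(devices_to_update(installed, available))  # ['kn7aa0', 'cixcd0']
```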
3.5 Update
The update command writes new firmware from the CD-ROM to the module. LFU then automatically verifies the update by reading the new firmware image from the module back into memory and comparing it with the CD-ROM image.

Example 3-5 Update Command
    Function? update kn7aa0 cixcd0                              1
    Update kn7aa0? [Y/(N)] y                                    2
    WARNING: updates may take several minutes to complete for each device.
    DO NOT ABORT!
    Updating to 1.10... Reading Device... Verifying 1.10...PASSED.
    Continue? [Y/(N)] y                                         7
    WARNING: updates may take several minutes to complete for each device.
    DO NOT ABORT!
    demna0 Updating to 6.06... Reading Device... Verifying 6.06... PASSED.
    Function? update demna*                                     8
    Update all demna? [Y/(N)] n
    Function?                                                   9

1 This command specifically requests firmware updates for the CPU and CIXCD modules. Note the syntax of a device list: device names are separated by spaces.
2 LFU requires you to confirm each update if you named the modules specifically.
3.6 Exit
The exit command terminates the LFU program, causes system initialization and self-test, and returns to the system console prompt.

Example 3-6 Exit Command
    Function? show
    Device Mnemonic(s)? exit                                    1
    Function? exit                                              2
    Initializing...
    [Self-test display not reproduced in this copy.]
1 From within the "Device Mnemonic(s)?" prompt, exit returns to the Function prompt. 2 At the Function prompt, exit causes the system to be initialized. 3 The console prompt appears.
3.7 Display and Verify Commands
Display and verify commands are used in special situations. Display shows the physical configuration. Verify repeats the verification process performed by the update command.

Example 3-7 Display and Verify Commands
    Function? disp                                              1
    Name            Type      Rev     Mnemonic
    LSB
    0+  KN7AA       (8002)    0000    kn7aa0
    5+  MS7AA       (4000)    0000    ms7aa0
    7+  MS7AA       (4000)    0000    ms7aa1
    8+  IOP         (2000)    0001    iop0
    1.
1 Display shows the system's physical configuration. Display is equivalent to issuing the console command show configuration. Because it shows the LSB slot for each module, display can help you identify unknown devices.
2 Verify reads the firmware from the module into memory and compares it with the update firmware on the CD-ROM. If a module verified successfully when you updated it but later failed self-test, you can use verify to tell whether the firmware has become corrupted.
3.8 How to Update Corrupted Firmware
If LFU identifies a device as unknown, the firmware on the module is corrupted. The update command allows you to specify the correct device type so that new firmware can be written to the module.

Example 3-8 Updating an "Unknown" Device
    Function? sho *                                             1
    kn7aa0 +  ms7aa0  iop0  xmi0  kdm700  demna0  unknown0  cixcd0 +      2
    Firmware Revision:  1.00  ---  3.00  --  69.00
    Hardware Revision:  E04  ---  Cannot be read  --  A01  not supported.  not supported.  not supported.  not supported.
    unknown0 Updating to 70.00... Reading Device... Verifying 70.00... PASSED.     8
    Function? exit                                                                9
    Initializing...
    [self-test map appears]
    Firmware Rev = V1.0-1625    SROM Rev = V1.
    >>> sho config
    Name          Type
    LSB
    0+  KN7AA     (8002)
    7+  MS7AA     (4000)
    8+  IOP       (2000)
3.9 How to Modify Device Attributes The modify command can change parameters stored in EEPROM on the following devices: KZMSA (DEC 7000 system), KFMSA (VAX 7000), DEC LANcontroller 400 (DEMNA), and CIXCD. The attributes are specific to each device.
    Enter new value (HEX) or to keep present value:
    Secondary Lock Retries? [(0020)]
    Enter new value (HEX) or to keep present value:
    Modify DSSI Timeouts? [Y/(N)]n                              4
    Modify DSSI Retries? [Y/(N)]n                               4
    Modify XMI Timeouts? [Y/(N)]n                               4
    Finished display/modify parameters? [(Y)/N]y
    Function? m demna0                                          5
    demna0
    Remote Boot:
    Remote Console:
    Local Console:
    Monitor Facility:
    Promiscuous Mode:
    Log Selftest Errors:
    Log NI RBD Errors:
    Log XMI RBD Errors:
    Log XNA RBD Errors:
    Diagnostic Error Logging:
    Error Frame Ove
1 The CIXCD has only one parameter: the hardware revision. You would need to modify the value only if the EEPROM had become corrupted. 2 When you modify the KFMSA, LFU first displays all the parameters for both ports. 3 You select which port to modify. 4 LFU prompts for parameters by category. 5 LFU displays the DEMNA parameters. 6 This example modifies one parameter, disabling remote booting.
Appendix A Kermit Parameters
To transmit a file using Kermit, the following parameters must be set:
    Kermit-32> show all
    VMS Kermit-32 version 3.3.
    End of line character          015 (octal)
    Quoting character              043 (octal)
    8-bit quoting character        046 (octal)
    Start of packet                001 (octal)
    Transmit parameters
    Delay
    Echo
    Repeat quoting character
    0.
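Several Kermit-32 parameters are shown as octal character codes. The Python sketch below decodes them, which helps when you compare settings on the source and target sides. The pairing of names and codes follows the two-column layout above and agrees with the usual Kermit conventions: carriage return, "#", "&", and Ctrl/A (SOH). The names KERMIT_OCTAL and octal_to_char are hypothetical.

```python
# Octal character codes for Kermit-32 parameters, as listed in Appendix A.
KERMIT_OCTAL = {
    "End of line character": "015",
    "Quoting character": "043",
    "8-bit quoting character": "046",
    "Start of packet": "001",
}


def octal_to_char(code: str) -> str:
    """Convert an octal code such as '043' to the character it names."""
    return chr(int(code, 8))


for name, code in KERMIT_OCTAL.items():
    # repr() makes control characters such as carriage return visible.
    print(f"{name:25} {code} -> {octal_to_char(code)!r}")
```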
Index
A
Attributes, setting device, 3-18, 3-19
B
Booting
  DSSI VAXcluster, 3-2
  LFU, 3-2
Build eeprom command, 2-12
C
Configuration rules, LSB, 1-4
Console CD-ROM
  part number AG-PQW1*-RE, 3-4
  part number AG-PQW3*-RE, 3-2
Console commands
  build eeprom, 2-12
  create, 2-6
  set eeprom serial, 2-6
  update -e, 2-8, 2-10
  update -f, 2
FEPROM recovery
  code, 2-15
  software requirements, 2-16
Field-replaceable units, 1-6
Firmware corrupted, 3-16
Firmware revision of CPU, 2-9
Firmware updating, 3-1
FRU part numbers, 1-6
I
R
Replace
  boot processor, 2-8
  only processor, 2-6
  secondary processor, 2-10
RRD42, 2-15, 2-16, 3-1, 3-3
S
Self-test display, 1-12
Set CPU command, 2-9
Set eeprom serial command, 2-6
Show command, LFU, 3-6
Show device command, 2-7
System parameters, how to set, 2-2
U
Update command, 2-9
Update command, LFU, 3-10
V
Verification, 1-2, 2-2
Verify command, LFU, 3-14