AlphaServer GS60E Service Manual Order Number: EK-GS60E-SV. A01 This manual is intended for Compaq service engineers. It includes troubleshooting information, configuration rules, and instructions for removal and replacement of field-replaceable units (FRUs) for the Compaq AlphaServer GS60E system.
First Printing, February 2000 The information in this publication is subject to change without notice. COMPAQ COMPUTER CORPORATION SHALL NOT BE LIABLE FOR TECHNICAL OR EDITORIAL ERRORS OR OMISSIONS CONTAINED HEREIN, NOR FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES RESULTING FROM THE FURNISHING, PERFORMANCE, OR USE OF THIS MATERIAL. This publication contains information protected by copyright.
Contents
Preface
Chapter 1
1.1 System Overview
1.2 TLSB System Bus
1.3 Processor Module
Chapter 4
4.1 Brief Description of the TLSB Bus
4.1.1 Command/Address Bus
4.1.2 Data Bus
4.1.3 Error Checking
Appendix A
A.1 Booting LFU
A.2 List
A.3 Update
A.4 Exit
A.5 Display and Verify Commands
A–5 Display and Verify Commands
A–6 Create Command
Figures
1–1 AlphaServer GS60E System
1–2 TLSB Card Cage
5–14 Plenum Assembly
5–15 Cabinet Panels
5–16 Cables
Tables
1 Compaq AlphaServer GS60E Documentation
1–1 Memory Modules and Related SIMMs
Preface Intended Audience This manual is written for the customer service engineer. Document Structure This manual uses a structured documentation design. Topics are organized into small sections, usually consisting of two facing pages. Most topics begin with an abstract that provides an overview of the section, followed by an illustration or example. The facing page contains descriptions, procedures, and syntax definitions. This manual has five chapters and two appendixes.
Documentation Titles
Table 1 Compaq AlphaServer GS60E Documentation
Title                                                                  Order Number
Hardware User Information and Installation
  AlphaServer GS60E Installation Guide                                 EK–GS60E–IN
  AlphaServer GS60E Operations Manual                                  EK–GS60E–OP
  KFTHA System I/O Module Installation Card                            EK–KFTHA–IN
  KFE72 Installation Guide                                             EK–KFE72–IN
Service Information
  AlphaServer GS60E Service Manual                                     EK–GS60E–SV
Reference Manual
  AlphaServer GS60E and GS140 Getting Started with Logical Partitions  EK–TUNLP–SF
Upgrade Manuals
  GS
Chapter 1 Introduction The AlphaServer GS60E system is a high-performance, symmetric multiprocessing system. It offers access to multiple high-bandwidth I/O buses, very large memory capacities, up to eight high-performance CPUs, and many other features normally associated with mainframe systems. This chapter introduces the AlphaServer GS60E system.
1.1 System Overview The Compaq AlphaServer GS60E system is the latest offering in the GS60/GS140 family. It uses the same system bus, the TLSB, with seven slots. It provides the reliability and availability features normally associated with mainframe systems. The GS60E has redundant, hot-swappable N+1 power supplies.
AlphaServer GS60E System The AlphaServer GS60E system main cabinet contains the seven-slot TLSB card cage, power supplies, and space for PCI I/O shelves and StorageWorks shelves. The GS60E system can have up to two expander cabinets (see Figure 1-1), containing additional PCI I/O shelves and StorageWorks shelves. Chapter 2 describes how to use LEDs and other indicators to troubleshoot the system. Chapter 3 describes the console display and diagnostics.
1.2 TLSB System Bus The TLSB card cage is a 7-slot card cage that contains slots for up to four CPU modules, up to five memory array modules, and up to three I/O modules. The TLSB bus interconnects the CPU, memory, and I/O modules.
The TLSB card cage is located in the upper part of the system cabinet. The TLSB card cage contains seven module slots (slots 3 and 4 are not used). The slots are numbered 0 through 2 from right to left in the front of the cabinet and slots 5 through 8 right to left in the rear of the cabinet (see Figure 1-2). The minimum configuration is a processor module in slot 0, an I/O module in slot 8, a memory module in slot 7, and terminator modules in all other slots.
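The slot numbering and minimum configuration described above can be checked mechanically. The following Python sketch is only an illustration: the function name, the module-kind strings, and the dictionary layout are hypothetical and are not part of any Compaq tool. It checks a cage description against the minimum configuration only, not against every rule for larger configurations.

```python
# Slot numbers present in the seven-slot TLSB card cage (slots 3 and 4 are not used).
TLSB_SLOTS = (0, 1, 2, 5, 6, 7, 8)

# Minimum configuration stated in the text: slot number -> module kind.
MINIMUM = {0: "processor", 7: "memory", 8: "io"}


def check_minimum_configuration(cage):
    """Return a list of problems; an empty list means the minimum is met.

    `cage` maps a slot number to "processor", "memory", "io", or "terminator".
    These kind names are assumptions made for this sketch.
    """
    problems = []
    for slot in sorted(cage):
        if slot not in TLSB_SLOTS:
            problems.append(f"slot {slot} does not exist in the GS60E card cage")
    for slot in TLSB_SLOTS:
        if slot not in cage:
            problems.append(f"slot {slot} is empty (needs a module or a terminator)")
    for slot, kind in MINIMUM.items():
        if cage.get(slot) != kind:
            problems.append(f"slot {slot} should hold a {kind} module")
    return problems


if __name__ == "__main__":
    minimal = {0: "processor", 1: "terminator", 2: "terminator",
               5: "terminator", 6: "terminator", 7: "memory", 8: "io"}
    print(check_minimum_configuration(minimal))
```

An empty result means every slot holds a module or terminator and slots 0, 7, and 8 hold the modules the minimum configuration calls for.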
1.3 Processor Module Up to four processor modules can be used in an AlphaServer GS60E system. Each processor module contains two CPU chips.
The KN7CG processor module has two Alpha 21264 chips, with a clock speed of 525 MHz. The KN7CH processor module has two 21264A chips, with a clock speed of 700 MHz. If one of the CPUs on the processor module is malfunctioning, you replace the entire module. The chip is not a field-replaceable unit (FRU). The console display (see Section 3.1) shows each processor on a module. Figure 1-3 shows the processor module. The raised blocks in the figure represent heatsinks that cover the chips. ➊ CPU chips.
1.4 MS7CC Memory Module The GS60E uses three variants of the MS7CC memory module: 1 Gbyte, 2 Gbytes, and 4 Gbytes. Up to 20 Gbytes of memory can be configured using combinations of the three module variants.
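The 20-Gbyte figure follows from the slot limit in Section 1.2 (up to five memory array modules) and the largest variant (4 Gbytes). The Python sketch below enumerates the totals that mixes of the three variants can reach. The names are hypothetical, and the sketch assumes any mix of variants is allowed, which the text implies but does not spell out.

```python
from itertools import combinations_with_replacement

VARIANTS_GB = (1, 2, 4)    # MS7CC module sizes named in the text
MAX_MEMORY_MODULES = 5     # limit stated in Section 1.2


def reachable_capacities(max_modules=MAX_MEMORY_MODULES):
    """Return the sorted list of total capacities (in Gbytes) reachable
    with one to `max_modules` memory modules of any variant."""
    totals = set()
    for count in range(1, max_modules + 1):
        for combo in combinations_with_replacement(VARIANTS_GB, count):
            totals.add(sum(combo))
    return sorted(totals)


if __name__ == "__main__":
    capacities = reachable_capacities()
    print("largest configuration:", max(capacities), "Gbytes")
```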
All memory modules for the AlphaServer GS60E have SIMMs (single inline memory modules). DRAMs are mounted on small cards that are fixed to the larger memory module by spring-held mounting clips that grip both sides of the SIMM. Figure 1-4 shows: ➊ ➋ The array of SIMMs in an MS7CC–EA (1-Gbyte) memory module. ➌ The control address interface (CTL) gate array that provides the interface to the TLSB, controls DRAM timing and refresh, runs memory self-test, and contains TLSB and memory-specific registers.
1.5 KFTHA Module The KFTHA module offers four “hose” connections that interface between the TLSB and the I/O subsystem.
The KFTHA module is designed for high-speed, high-volume data transfers. Direct memory access (DMA) transfers are pipelined to allow for up to 500 Mbytes/second throughput. The major elements of the KFTHA module are: ➊ RAM to buffer data for the DMA transfers. ➋ Four hose-to-data (HDP) chips, each handling 32 bits from two “hoses” (I/O cables connecting to an adapter in an associated I/O bus). Data on the HDPs flows in one direction, either “up” (to the KFTHA) or “down” (to the I/O adapter).
1.6 Power Subsystem Overview The power subsystem consists of an AC input box, a DC distribution module, redundant hot swap power supplies, a cabinet control logic (CCL) panel, and cables.
Three-phase AC power enters the system by cable through the AC input box (see Figure 1-7). The H7506 power supplies convert three-phase AC power to 48 VDC. Three hot-swappable power supplies offer N+1 redundancy; that is, if any one power supply fails, the remaining two supply the needed power.
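The N+1 rule above reduces to a simple comparison: three supplies are installed and two are enough to carry the load. The Python sketch below expresses that rule. The function name and the status strings are hypothetical and chosen only for illustration.

```python
def power_status(working_supplies, installed=3, required=2):
    """Classify the power subsystem from the number of working H7506 supplies.

    The defaults follow the text: three supplies installed, two required.
    """
    if working_supplies >= installed:
        return "redundant"      # N+1: one supply can fail without losing power
    if working_supplies >= required:
        return "degraded"       # still powered, but no redundancy left
    return "insufficient"       # fewer supplies than the load requires


if __name__ == "__main__":
    for count in (3, 2, 1):
        print(count, "working supplies:", power_status(count))
```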
1.7 I/O Bus and In-Cab Storage Devices Both the AlphaServer GS60E main cabinet and expander cabinets are designed to hold PCI shelves and StorageWorks I/O shelves.
Figure 1-8 shows an AlphaServer GS60E system cabinet. As shown, PCI shelves and StorageWorks shelves are mounted horizontally. Each StorageWorks shelf has room for up to seven devices, including a signal converter and 3.25-inch disks or tapes. A power unit (DC-to-DC converter) is in the leftmost slot of the shelf. The system cabinet has space for up to two PCI shelves (DWLPB-DA) and three StorageWorks shelves (BA36R-RC/RD UltraSCSI).
1.8 Troubleshooting Overview Follow steps to isolate system problems. A possible routine is shown below.
Figure 1–9 Troubleshooting Steps
You cannot find the cause of the user problem by phone. Go to the site and follow these steps.
Control panel LEDs lit?
  No: Check power subsystem (see Section 2.5).
  Yes: Continue.
Operating system running?
  Yes: Customer experiences intermittent error. Check error log (see Chapter 4).
  No: Continue.
Console software running?
  Yes: Type "init" command. Check system self-test display (see Section 3.
The system hardware, console software, and operating system software provide three types of troubleshooting tools, as shown in Figure 1-10. Chapters 2, 3, and 4 tell how to use these tools to isolate faulty components or report software problems for AlphaServer GS60E systems.
Chapter 2 Troubleshooting with LEDs This chapter tells how to use the LED displays and other indicators to track down faulty components that you can replace in the AlphaServer GS60E system. LEDs give status on the power subsystem, system bus (TLSB) modules (processor, memory, and I/O), the I/O bus, and devices in shelves. The cooling subsystem consists of two blowers located in the center of the system cabinet. They can be checked by looking and listening for the fans.
2.1 Operator Control Panel Start with the operator control panel (OCP). Check the OCP lights. The OCP has six status LEDs, three pushbuttons, and a keyswitch.
Figure 2–1 Operator Control Panel
Table 2–1 Operator Control Panel LEDs
Light      Color   State  Meaning
➊ Run      Green   On     Power is supplied to entire system; the blowers are running. System has exited console.
➋ Power    Green   On     System is powered on.
➌ Fault    Yellow  On     Fault on system bus.
Six status indicator LEDs (see Figure 2-1) show the state of the system. Table 2-1 describes the conditions indicated by the lights. NOTE: With the keyswitch in the On position, if all six LEDs are blinking, one or more of the power supplies has failed or there is a missing power supply. With the keyswitch in the Off position, the LEDs will also blink but do not provide power supply status.
Figure 2-2 Troubleshooting: Start with the Operator Control Panel On/Off button/ keyswitch is Off Yes 1 No 2 Fault LED is lit 3 Yes No Fix problem identified. If a faulty component or firmware update was identified as the problem, replace the component or update the firmware. If the problem has not yet been identified, go to 2 Turn power on and watch power-up. As 48-VDC power is passed to the system, initial tests are run on the CPU, memory, and I/O adapters on the system.
Figure 2-2 Troubleshooting: Start with the Operator Control Panel (Continued) A Any LEDs lit on control panel No 4 Yes Green LED(s) lit Yes 5 Status LEDs are not receiving power/signals. Check the power supplies to see if DC power is leaving the supply. If so, check the power and signal lines to the CCL panel. Check the cabling between the CCL and the operator control panel. If connections seem OK, replace CCL. If still no lights on control panel, replace control panel.
2.2 Troubleshooting TLSB Modules You can check individual module self-test results by looking at the status LEDs on the module.
In general, if a module on the TLSB does not pass self-test (green light is not lit) it should be replaced. There is a case where some removal and replacement action may be needed even though the module passes self-test. Failure of the built-in self-test for the MS7CC modules indicates that testing has shown that there is no single 64-Kbyte segment of memory that is usable. Each 64-Kbyte segment must show at least 256 bad pages before it is noted as unusable.
2.3 Troubleshooting a PCI Shelf LEDs show the status of the power supplies, as well as the adapter self-test results in the PCI shelf.
Figure 2-5 Troubleshooting Steps for PCI Shelf LED 3 lit No Yes No LED 1 lit 11 Check Cabling to PCI shelf. Check to make sure the clip connectors are engaged properly. If so, proceed to 2 2 Check 48V Power Supply. 13 Internal Power System Error. Check fans in blower; check for jumper cable (a small plug) replacing fan connection. Yes 3 LED 2 lit No Power Board. Yes 15 Replace Motherboard. Yes LED 4 lit 4 Replace 16 Hose Error.
2.4 Troubleshooting StorageWorks Shelves StorageWorks devices are mounted in horizontal shelves in the GS60E system or expander cabinet. LEDs are located on each disk drive.
Table 2-3 SCSI Disk Drive LEDs
Indicator LED   LED State   Meaning
Green           Off         No activity
                Flashing    Activity
                On          Activity
Yellow          Off         Normal
                Flashing    Spin up/spin down
                On          Not used
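Table 2-3 is small enough to carry as a lookup table, for example in a site checklist script. The Python sketch below encodes the table as read above. The function name and the lowercase key spelling are hypothetical.

```python
# (LED color, LED state) -> meaning, transcribed from Table 2-3.
DISK_LED_MEANING = {
    ("green", "off"): "No activity",
    ("green", "flashing"): "Activity",
    ("green", "on"): "Activity",
    ("yellow", "off"): "Normal",
    ("yellow", "flashing"): "Spin up/spin down",
    ("yellow", "on"): "Not used",
}


def describe_disk_led(color, state):
    """Return the Table 2-3 meaning for one disk drive LED observation."""
    key = (color.strip().lower(), state.strip().lower())
    if key not in DISK_LED_MEANING:
        raise ValueError(f"unknown LED observation: {color!r}, {state!r}")
    return DISK_LED_MEANING[key]


if __name__ == "__main__":
    print(describe_disk_led("Yellow", "Flashing"))
```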
2.5 Troubleshooting the Power Subsystem The GS60E power supplies accept three-phase AC and produce 48 VDC power. Each power supply has two LEDs that indicate normal conditions and faults.
The system must be provided with a suitable source of 3-phase AC power. Three H7506 power supplies (see Figure 2-7) provide the necessary power and power redundancy required for all internal system components. The AC input box is located at the bottom of the system cabinet (when viewing the system cabinet from the rear). The 48 VDC power supplies are located above the AC input box and are visible when viewing the system cabinet from the front.
2.6 Troubleshooting the Cooling Subsystem The cooling system cools the power subsystem, the TLSB card cage, and shelves.
The cooling system is designed to keep the system components at an optimal operating temperature. It is important to keep the front and rear doors free of obstructions, leaving a minimum clearance space of 1.5 meters (59 inches) in the front and 1 meter in the rear to maximize airflow. Two blowers, located in the center of the cabinet (see Figure 2-8) draw air downward through the TLSB card cage. Air is exhausted at the middle of the cabinet, to the rear (see Figure 2-9).
Chapter 3 Console Display and Diagnostics This chapter describes how hardware diagnostic programs are executed when the system is initialized.
3.1 Checking Self-Test Results: Console Display The self-test console display gives information for the TLSB modules and the PCIs in the system. Example 3–1 System Self-Test Console Display F E D C B A + . . . . . . . . . . . . . . . . . . 9 + . . . 8 7 6 5 4 3 2 1 0 NODE # A M M M . . P P P TYP o + + + . . ++ ++ ++ ST1 . . . . . . EE EE EB BPD o + + + . . ++ ++ ++ ST2 . . . . . . EE EE EB BPD o + + + . . ++ ++ ++ ST3 .
➊ ➋ The NODE # line lists the node numbers on the TLSB and I/O buses. The TYP line in the printout indicates the type of module at each TLSB node. Processors are type P, memories are type M, and the KFTHA port module is type A. A period (.) indicates that the slot is not populated or that the module is not reporting. ➌ This line shows the results of individual processor and memory module tests. Possible values are pass (+) or fail (–).
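When a console capture has to be summarized for a service report, the NODE # and TYP lines can be paired token by token. The Python sketch below does that under two assumptions that should be checked against a real capture: the two lines contain the same number of whitespace-separated entries, and node numbers are printed as single hexadecimal digits. The function name and the description strings are hypothetical.

```python
# Type letters described in the text for the TYP line.
TYPE_NAMES = {
    "P": "processor",
    "M": "memory",
    "A": "KFTHA I/O port",
    ".": "empty or not reporting",
}


def parse_typ_line(node_line, typ_line):
    """Pair each node number with the module type reported on the TYP line."""
    node_tokens = [t for t in node_line.split() if t not in ("NODE", "#")]
    type_tokens = [t for t in typ_line.split() if t != "TYP"]
    if len(node_tokens) != len(type_tokens):
        raise ValueError("NODE # and TYP lines do not align")
    return {
        int(node, 16): TYPE_NAMES.get(kind, "unknown")
        for node, kind in zip(node_tokens, type_tokens)
    }


if __name__ == "__main__":
    nodes = "8 7 6 5 4 3 2 1 0 NODE #"
    types = "A M M M . . P P P TYP"
    for node, kind in sorted(parse_typ_line(nodes, types).items()):
        print(node, kind)
```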
3.2 Show Configuration Display The show configuration console command is useful to obtain more information about the system configuration, in case you need to replace a module.
➌ ➍ ➎ ➏ Node 0 is the KFE72 standard I/O PCI/EISA adapter module. Nodes 7 and 8 are the KZPSA adapters. This line shows the DA960 controller. These lines show the controllers on the SIO module. Figure 3-1 shows the connector numbering scheme for the KFTHA module. Each slot has four connector numbers associated with it, numbered in increasing order from top to bottom, as shown.
3.3 Running Diagnostics: the Test Command The test command allows you to run diagnostics on the entire system, an I/O subsystem, a single module, a group of devices, or a single device.
Example 3–3 Sample Test Commands
P00>>> test              # Tests the entire system.
                         # Default run time is 10 minutes.
P00>>> t pci0 -t 60      # Tests all devices associated
                         # with the PCI0 subsystem. Test
                         # run time is 60 seconds.
P00>>> test ms*          # Tests all ms7cc memory modules.
You enter the command test to test the entire system using exercisers resident in ROM on the boot processor module. No module self-tests are executed when the test command is issued without a mnemonic. When you specify a subsystem mnemonic or a device mnemonic with test, such as test pci0 or test ms7cc0, self-tests are executed on the associated modules first and then the appropriate exercisers are run.
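The distinction above (no mnemonic means exercisers only; a mnemonic means module self-tests first, then exercisers) can be captured in a small helper when scripting a serial console session. The Python sketch below only builds command strings of the form shown in Example 3-3. The function names are hypothetical, and the run-time qualifier is assumed to be a plain hyphen followed by t.

```python
def build_test_command(target=None, seconds=None):
    """Build a console test command string such as 'test pci0 -t 60'."""
    parts = ["test"]
    if target:
        parts.append(target)                 # subsystem or device mnemonic
    if seconds is not None:
        parts.extend(["-t", str(seconds)])   # run time in seconds
    return " ".join(parts)


def runs_module_self_tests(target=None):
    """Per the text, self-tests run only when a mnemonic is given."""
    return target is not None


if __name__ == "__main__":
    print(build_test_command())
    print(build_test_command("pci0", 60))
    print(build_test_command("ms*"), runs_module_self_tests("ms*"))
```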
3.4 Testing the Entire System The test command with no modifiers runs all exercisers for subsystems and devices on the system. Example 3–4 Sample Test Command for the Entire System P00>>>test ➊ Console is in diagnostic mode Complete Test Suite for runtime of 1200 seconds Type ^C to stop testing ➋ Configuring system... : : Memory Tests not run.
Example 3–4 Sample Test Command, System Test (Continued)
Shutting down drivers...
Shutting down units on tulip2, slot 12, bus 0, hose 4...
Shutting down units on floppy1, slot 0, bus 1, hose 4...
Shutting down units on isp4, slot 6, bus 0, hose 4...
Shutting down units on isp5, slot 7, bus 0, hose 4...
Shutting down units on isp6, slot 8, bus 0, hose 4...
:
:
P00>>>
3.5 Sample Test Command for a Memory Module To test a processor, memory module, or an I/O adapter and its associated devices, enter the test command and the correct mnemonic. Mnemonics are displayed when you enter a show configuration or a show device command. Example 3–5 Sample Test Command, Memory Test P00>>> set d_report full ➊ P00>>> test ms* Console is in diagnostic mode Memory subsystem test selected for runtime of 1200 seconds Type Ctrl/C to abort...
Example 3–5 Sample Test Command, Memory Test (Continued)
Shutting down drivers...
Shutting down units on tulip2, slot 12, bus 0, hose 4...
Shutting down units on floppy1, slot 0, bus 1, hose 4...
Shutting down units on isp4, slot 6, bus 0, hose 4...
Shutting down units on isp5, slot 7, bus 0, hose 4...
Shutting down units on isp6, slot 8, bus 0, hose 4...
Shutting down units on isp7, slot 9, bus 0, hose 4...
:
:
P00>>>
3.6 Identifying a Failing SIMM From the console, you can check for flawed or poorly seated SIMMs in memory boards. This information is useful as a simple on-site check as part of a service call, as a validation procedure after upgrading a memory, or adding or changing SIMMs for any reason. Failing SIMMs are also reported in the error log (see Chapter 4). Example 3–6 Console Mode: No Failing SIMMS ➊ P00>>> set simm_callout on P00>>> init ➋ Initializing…. . . WARNING: F E D C B A + . . . . . . .
➊ The set simm_callout on command sets an internal environment variable that enables code that isolates failing SIMMs during memory testing. With this variable enabled, system self-test can take up to 40 seconds longer if a faulty SIMM is present. ➋ The init command initializes the system and prints the console map. ➌ This line in the console display notes that the SIMM callout environment variable is on. ➍ The show simm command requests a display of faulty SIMMs.
3.7 Info Command The info command provides information useful in debugging the system. Some of the information it provides can be useful for isolating FRUs in the field.
Example 3–8 Examples of the Info Command
P00>>> info ➊
0. About the console ➋
1. Bitmap
2. PAL symbols
3. IMPURE area (abbreviated)
4. IMPURE area (full)
5. TLSB Registers
6. GBUS
7. LOGOUT area
8. Per Cpu HWRPB areas ➋
9. LAMB registers
10. TLSB register addresses
11. Page Tables
12. FRU table ➋
13. Console internals
14.
TLFADR0 TLFADR1 TLESR0 TLESR1 TLESR2 TLESR3 TLILID0 0011ab00 07050000 00000303 00000c0c 00006060 00009090 00400303 00400c0c 00406060 00409090 Node0 KN7CG-AB Node1 MS7CC 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 Node 7 Node8 MS7CC KFTHA 00000000 00000000 00000000 00000000 00000000 00000000 00000010 00000010 TLILID1 TLILID2 TLILID3 TLCPUMASK . . .
Chapter 4 DECevent Error Log This chapter discusses error logs produced by the DECevent bit-to-text translator.
4.1 Brief Description of the TLSB Bus The error log entries discussed here are specific to the AlphaServer GS60E system. Most of the errors occur during the transmission of commands or data along the TLSB system bus or in buses or storage internal to a particular module. To understand some of the terms used in the error log, you should understand how data is transferred on the TLSB system bus. The TLSB has two separate buses: a command/address bus and a data bus.
4.1.2 Data Bus The TLSB transfers data in the order in which valid address bus commands are issued. In addition to 256 bits of data, the data bus contains associated ECC bits and some control signals. Three signals are of particular significance in read and write operations. TLSB_SHARED – When a request is made to access memory, each CPU notes whether the block of memory is currently resident in cache, and, if so, asserts a signal that the data is shared.
4.2 Producing an Error Log with DECevent The DECevent utility is available for both Tru64 UNIX and OpenVMS operating systems to help diagnose what are called “intermittent errors.” These errors may or may not cause the operating system to crash. Example 4–1 Producing an Error Log with DECevent $ diagnose/output=errlog.dat DECevent Version V3.0 In this example, the error log information is directed to a file called errlog.dat.
4.3 Getting a Summary Error Log Running DECevent with the /summary qualifier is a good way to start analyzing the error log. It gives you a “table of contents” for the error log. Example 4–2 Summary Error Log $ diagnose/summary SUMMARY OF ALL ENTRIES LOGGED ON NODE CLYP01 Unknown major class New errorlog created Timestamp Machine check (670 entry) Crash Re-start System startup Volume mount Adapter Error Soft ECC error 1. 3. 7. 2. 3. 3. 4. 1.
4.4 Supported Event Types The events that DECevent logs can be logged by the CPU modules or one of the TLSB or I/O adapters. (Memory errors are logged by the CPU.)
Table 4–2 Supported Event Types
Event Types            Description
Machine check 670      670 processor checks
Machine check 660      660 system machine checks
630 error interrupts   630 correctable processor checks
620 errors             620 correctable system errors
Extended CRD           Memory single-bit error footprints
Adapter                Adapter is logging entity.
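When scanning a summary or a full report for machine checks, the numeric code in the entry label is enough to look up the Table 4-2 description. The Python sketch below is a hypothetical helper, not part of DECevent. It assumes labels of the form shown in this chapter, such as "Machine check (670 entry)".

```python
import re

# Machine check and error interrupt codes with their Table 4-2 descriptions.
MACHINE_CHECK_CODES = {
    670: "670 processor checks",
    660: "660 system machine checks",
    630: "630 correctable processor checks",
    620: "620 correctable system errors",
}


def describe_machine_check(label):
    """Return the Table 4-2 description for a label containing a code,
    or None if the label carries none of the four codes."""
    match = re.search(r"\b(620|630|660|670)\b", label)
    if match is None:
        return None
    return MACHINE_CHECK_CODES[int(match.group(1))]


if __name__ == "__main__":
    print(describe_machine_check("Machine check (670 entry)"))
```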
Example 4-3 OSF Event Type Identification *********************** ENTRY 1 ************************** Logging OS 2. DIGITAL UNIX System Architecture 2. ALPHA Event sequence number 1. Timestamp of occurrence 21-OCT-1999 16:57:19 Host name clyp01 AXP HW model AlphaServer GS60E Number of CPUs (mpnum) x0000002 CPU logging event (mperr) x0000006 Event validity Entry type CPU Minor class Event severity 1. Valid 100. CPU Machine Check Errors 1. Machine check (670 entry) 1.
4.5 Sample Error Log Entries 4.5.1 Machine Check 660 Error You can identify problem FRUs in an error log entry by checking the contents of the registers against the parse trees. The following steps (relating to the callouts in Example 4-5) isolate the error and the FRU most likely responsible. Table 4–3 Parsing a Sample 660 Error (Example 4-5) ➊ ➋ This line identifies the error log entry as a machine check 660 error. ➌ The TLBER register is next in the parse tree. UNCORRECTABLE DATA ERROR is set.
-- TLaser MCHK 660 -Software Flags Packet Present Active CPUs Hardware Rev System Serial Number Module Serial Number System Revision MCHK Reason Mask MCHK Frame Rev ➊ x00000001 TLSB Error Log Snapshot x00000003 x00000000 12345678 NI81000080 x00000000 x0000FFF0 x00000001 MCHK Frame Rev: 1.
Performance Cnt Interrupt x0000000000000000 Corr Read Error Intr Dis Serial Line Intr Dis EIEN Interrupts: x0000000000000000 PAL_Base x0000000000020000 Base address of PAL Code: x0000000000000004 I_CTL xFFFFFFFC03300396 System Performance Counter Dsb Icache Set enabled x0000000000000003 Super page Mode Bits x0000000000000002 I-Stream Buffer Enable 3.
TLESR0 TLESR1 TLESR2 TLESR3 TLMODCONFIG0 TLMODCONFIG1 TCCERR TDIERR INTR MASK 0 INTR MASK 1 INTR SUM 0 INTR SUM 1 x0008D4D4 SYND0 x000000D4 SYND1 x000000D4 UNCORRECTABLE ECC ERROR x00000300 SYND0 x00000000 SYND1 x00000003 x00000300 SYND0 x00000000 SYND1 x00000003 x00000300 SYND0 x00000000 SYND1 x00000003 x00700B80 DPQ MAX Entries x00000007 enable fast fills BQ_MAX_ENTRIES 7 Bcache size = 4MB x08B00111 Overtake Enabled P0 Reqest ID line 0 P1 Reqest ID line 1 TLMBPR_RETRY_Count 2**10 retries - 6.
TLEP VMG TLEPWERR0 TLEPWERR1 TLEPWERR2 TLEPWERR3 x00000000 x00000380 x00047804 x0006E680 x00047810 CPU0 Last Win Sp Access x000000C780400380 Pending Bit=0, Address NOT LATCHED/NOT VALID CPU1 Last Win Sp Access x000000C78106E680 Pending Bit=0, Address NOT LATCHED/NOT VALID Palcode Revision x0000000400000402 Palcode Rev: 4.2-4 TLSB Base Adr x0000000000000000 *TLaser CPU Registers* TLSB Node Number 0.
tbc fast path disabled dm_dslb_prio - fills, probes, victims or wrio en_fst_vq en_fst_prq en_fts_writes TCCERR x00011800 TCC Chip Revision x00000001 TDIERR x00000000 INTRMASK0 x000000FE ipl 14 interrupt enable ipl 15 interrupt enable ipl 16 interrupt enable ipl 17 interrupt enable ip enable intim enable CPU halt enable INTRMASK1 x00000000 TLEP Interrupt Sum 0 x00000000 TLEP Interrupt Sum 1 x00000000 TLEP VMG x00000000 TLEPWERR0 x00000000 TLEPWERR1 x00000000 TLEPWERR2 x00000000 TLEPWERR3 x00047810 * TLaser M
Refresh Cnt = 1360 x00000000 Failing String = x00000000 x00000000 Refresh Rate 1X x00000000 x00000000 x00000000 x00000000 TMER TMDRA TDDR0 TDDR1 TDDR2 TDDR3 * TLaser Memory Regs * TLSB Node Number 5.
TLDEV x02045000 -- Device Type: Memory -- Module Revision: x00000204 TLBER TLCNR TLVID FADR 0 FADR 1 TLESR0 TLESR1 TLESR2 TLESR3 TMIR TMCR x00800000 x000FC260 x000000B3 x0032000000300010 x00320000 x00000300 x00000300 x00000300 x00000300 x80000002 Interleave x00000002 x00000208 256MB Module (E2035-CA) 4 MB DRAM 60ns DRAM Strings Installed = 4 DRAM timing: Bus Spd = 10.0-11.
TMER TMDRA TDDR0 TDDR1 TDDR2 TDDR3 Strings Installed = 2 DRAM timing: Bus Spd = 10.0-11.2 Refresh Cnt = 1360 x00000000 Failing String = x00000000 x00000000 Refresh Rate 1X x00000000 x00000000 x00000000 x00000000 * TLaser I/O Registers * TLSB Node Number TLDEV x00002000 8.
4.5.2 Machine Check 620 Error Machine check 620 errors are nearly always soft errors; that is, they do not cause the system to crash. Correctable write data errors (CWDE) on CSR writes are the exception. Example 4-6 shows a sample machine check 620 error. In this case, all nodes on the TLSB are presented in the error log entry. The steps in Table 4-4 isolate the error and the FRU most likely responsible.
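The TLESR lines in the sample entries of this chapter print two syndrome fields, SYND0 and SYND1, beside the raw register value (for example, x0008D4D4 with SYND0 xD4 and SYND1 xD4, and x00000300 with SYND0 x00 and SYND1 x03). The Python sketch below reproduces that split. The bit positions are inferred from those sample values only, not taken from a register specification, and the upper error-flag bits are not decoded.

```python
def decode_tlesr_syndromes(value):
    """Split a TLESR register value into the SYND0 and SYND1 fields.

    Assumption based on the sample error log entries: SYND0 is bits 7:0
    and SYND1 is bits 15:8. Higher bits (error flags) are ignored here.
    """
    return {
        "SYND0": value & 0xFF,
        "SYND1": (value >> 8) & 0xFF,
    }


if __name__ == "__main__":
    for raw in (0x0008D4D4, 0x0020D5D5, 0x00000300):
        fields = decode_tlesr_syndromes(raw)
        print(f"x{raw:08X}  SYND0 x{fields['SYND0']:02X}  SYND1 x{fields['SYND1']:02X}")
```

A nonzero syndrome pair on one node, with the other TLESR registers showing the idle pattern, is the kind of difference the parse steps use to narrow down the failing FRU.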
Event validity Event severity Entry type 1. O/S claims event is valid 5. Low Priority 100. Machine Check Error - (major class) 3. - (minor class) ➊ -- TLaser 620 Corr Error Software Flags x00000001 TLSB Error Log Snapshot Packet Present Active CPUs x0000000F Hardware Rev x00000000 System Serial Number Module Serial Number SSS System Revision x00000000 MCHK Reason Mask x00000086 MCHK Frame Rev x00000001 MCHK Frame Rev: 1.
DOF_CNT TLDEV 700Mhz, TLBER TLSB RUN Signal CPU0 Running console CPU1 Running console x00000000 xB0008027 -- Device Type: Dual EV67 Proc, x00140000 TLESR0 TLESR1 TLESR2 TLESR3 Palcode Revision TLSB Base Adr 4meg Bcache CORRECTABLE READ DATA ERROR ➌ Data Syndrome 0 x0020D5D5 SYND0 x000000D5 SYND1 x000000D5 CORRECTABLE ECC ERROR DURING READ x00000300 SYND0 x00000000 SYND1 x00000003 x00000300 SYND0 x00000000 SYND1 x00000003 x00000300 SYND0 x00000000 SYND1 x00000003 x0000001300000504 Palcode Rev: 5.
MODCONFIG1 x08B00141 Overtake Enabled P0 Reqest ID line 0 P1 Reqest ID line 4 MBPR_RETRY_Count 2**10 retries - 6.
SYND1 x00000003 SYND0 x00000000 SYND1 x00000003 TLESR3 x00000300 SYND0 x00000000 SYND1 x00000003 MODCONFIG0 x00700B80 DPQ MAX Entries x00000007 enable fast fills BQ_MAX_ENTRIES 7 Bcache size = 4MB MODCONFIG1 x08B00153 Overtake Enabled P0 Reqest ID line 1 P1 Reqest ID line 5 TLMBPR_RETRY_Count 2**10 retries - 6.
TLBER x01140000 ERROR TLCNR TLVID FADR FADR 1 TLESR0 x000FC240 x00000080 x0702000000874000 x07020000 Failing Command: Read Failing Bank = Bank 0 x0021D5D5 ECC Syndrome 0 x000000D5 CC Syndrome 1 x000000D5 TRANSMITTER DURING ERROR CORRECTABLE READ ECC ERROR ECC Code Second ECC Code TLESR1 TLESR2 TLESR3 TMIR TMCR TMER TMDRA TDDR0 TDDR1 TDDR2 TDDR3 CORRECTABLE READ DATA ERROR ➍ DATA SYNDROME 0 DATA TRANSMITTER DURING xD5 xD5 Failing SIMM Number = J22 Failing SIMM Number = J22 x00000300 x00000300 x00000
TLESR2 x00000000 TLESR3 x00000000 CPU Interrupt Mask x00000001 Cpu Interrupt Mask = x00000001 ICCMSR x00000000 Arbitration Control Minimum Latency Mode Suppress Control Suppress after 16 Translations ICCNSE x80000000 Interrupt Enable on NSES Set ICCMTR x00000002 Mbox Trans in Prog, Hose 1 IDPNSE-0 x00000006 Hose Power OK Hose Cable OK IDPNSE-1 x00000006 Hose Power OK Hose Cable OK IDPNSE-2 x00000000 IDPNSE-3 x00000000 IDPVR x00000800 ICCWTR x00000000 TLMBPR x0000000000000000 IDPDR0 x20000000 IDPDR1 x0000000
4.5.3 DWLPB Motherboard (PCIA) Adapter Error Log Registers on the DWLPB motherboard are printed in the error log when one of these errors occurs. You use the parse tree for the DWLPB motherboard to determine the most likely FRU. Example 4-7 shows a sample DWLPB motherboard (PCIA) adapter error. The following steps isolate the error and the FRU most likely responsible. Table 4–5 Parsing a DWLPB Motherboard Error (Example 4-7) ➊ This line identifies the error as a PCIA (DWLPB motherboard) adapter error.
SWI Minor sub class Software Flags 5. PCIA ➊ x0028000 PCIA Subpacket Present PCI Bus Snapshot Present x000000FF89800000 Base Phys Addr of TIOP -Tlaser PCIA RegistersChannel No.
Window Base Address=x00004000 Translation Base Reg B0 x00000000 Trans Base Address=x00000000 Window Mask Reg C0 x0FFF0000 Window Size = 256 MB Window Base Reg C0 xF0000003 Scatter/Gather Enable Window Enable Window Base Address=x0000F000 Translation Base Reg C0 x00000000 Trans Base Address=x00000000 Error Vector 0 x00000945 Interrupt Vector x00000945 Dev Vec 0 Slot 0, IntA x00000B70 Interrupt Vector x00000B70 Dev Vec 0 Slot 0, IntB x00000B80 Interrupt Vector x00000B80 Dev Vec 0 Slot 0, IntC x00000B90 Interr
Translation Base Reg A1 x00000000 Trans Base Address=x00000000 Window Mask Reg B1 x3FFF0000 Window Size = 1 GB Window Base Reg B1 x40000002 Window Enable Window Base Address=x00004000 Translation Base Reg B1 x00000000 Trans Base Address=x00000000 Window Mask Reg C1 x0FFF0000 Window Size = 256 MB Window Base Reg C1 xF0000003 Scatter/Gather Enable Window Enable Window Base Address=x0000F000 Translation Base Reg C1 x00000000 Trans Base Address=x00000000 Error Vector 1 x00000956 Interrupt Vector x00000956 Dev V
Window Mask Reg A2 Window Base Reg A2 x007F0000 Window Size = 8 MB x00800003 Scatter/Gather Enable Window Enable Window Base Address=x00000080 Translation Base Reg A2 x00000000 Trans Base Address=x00000000 Window Mask Reg B2 x3FFF0000 Window Size = 1 GB Window Base Reg B2 x40000002 Window Enable Window Base Address=x00004000 Translation Base Reg B2 x00000000 Trans Base Address=x00000000 Window Mask Reg C2 x0FFF0000 Window Size = 256 MB Window Base Reg C2 xF0000003 Scatter/Gather Enable Window Enable Window
Base Address Register 3      x00000000
Base Address Register 4      x00000000
Base Address Register 5      x00000000
Base Address Register 6      x00000000
Expansion Rom Base Address   x00000000
Interrupt P1                 xE5
Interrupt P2                 x01
Min Gnt                      x00
Max Lat                      x00
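The window registers in the sample log follow a pattern that can be checked by hand. The Python sketch below is inferred only from the values printed in Example 4-7, not from a register specification: the window size appears to be the mask register with its low 16 bits filled in, plus one, and the window base register appears to hold the window base address in its upper 16 bits with the two enable flags in bits 1 and 0. The function names are hypothetical.

```python
def window_size_bytes(mask_reg):
    """Window size implied by a Window Mask register value (inferred rule)."""
    return (mask_reg | 0xFFFF) + 1


def decode_window_base(base_reg):
    """Split a Window Base register value the way the sample log prints it."""
    return {
        "scatter_gather_enable": bool(base_reg & 0x1),  # assumed bit 0
        "window_enable": bool(base_reg & 0x2),          # assumed bit 1
        "window_base_address": base_reg >> 16,          # upper 16 bits
    }


MB = 1024 * 1024

# Values taken from the sample log.
print(window_size_bytes(0x0FFF0000) // MB)   # Window Mask Reg C0
print(decode_window_base(0x40000002))        # Window Base Reg B1
```

If a log entry shows a window size or base address that does not match this rule, treat the log's own translation as authoritative.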
4.6 Console Halt Conditions Double error halts are conditions in which the processing of a fatal error triggers a second error. The TL6 Machine Check 670/660 logout frame provides error information to the operating system error handler. 4.6.1 CPU Double Error Halt The CPU double error halt is caused by two conditions: 1. The machine is processing a Machine Check and trapping back into the Machine Check prior to exiting the first machine check.
Figure 4-1 illustrates the format of the entry type 71 error log, which uses the header structures. If the console has two halt frames to log, it puts a header on each as shown. Normally there is only one halt frame in this event. In either case, there is an End of Event frame at the bottom of the entry. The packets for memory, TIOP, and PCI use the same forms specified in the TurboLaser 5 Product Fault Management Specification.
CPU Double Error Halt Content

TL6 CPU DBL ERR HLT Frame Content       Size
HEADER                                  2 LW
HALT CODE                               1 LW
RSVD                                    1 LW
WATCH                                   2 LW
670/660 Logout                          72 LW
Node 0 … 8 TLEP SUB-Packet (mini)       14 LW/Node, 126 LW for 9 nodes
PCI 0 … 19                              3 LW/Node, 60 LW for 20 PCI
Total byte count for two events         2112 bytes

TLEP Sub-Packet (minimized)
TLBER        TLDEV
TLESR1       TLESR0
TLESR3       TLESR2
TDIERR       TCCERR
TLEPWERR1    TLEPWERR0
TLEPWERR3    TLEPWERR2
RESERVED     RESERVED

PCI Sub-Packet
PCIA ERR1    PCIA ERR0
             PCIA ERR2
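The total byte count in the frame content table can be verified with simple arithmetic. The following Python sketch assumes a longword (LW) is 4 bytes and uses only the sizes listed in the table; the constant and function names are illustrative.

```python
BYTES_PER_LW = 4  # assumption: one longword is 4 bytes

# Fixed-size parts of one CPU double error halt frame, in longwords.
FIXED_PARTS_LW = {
    "HEADER": 2,
    "HALT CODE": 1,
    "RSVD": 1,
    "WATCH": 2,
    "670/660 Logout": 72,
}
TLEP_LW_PER_NODE, TLSB_NODES = 14, 9   # nodes 0 through 8
PCI_LW_EACH, PCI_COUNT = 3, 20         # PCI 0 through 19


def frame_longwords():
    """Longwords in one halt frame, summed from the table."""
    return (sum(FIXED_PARTS_LW.values())
            + TLEP_LW_PER_NODE * TLSB_NODES
            + PCI_LW_EACH * PCI_COUNT)


def total_bytes(events=2):
    """Byte count for the given number of halt events."""
    return frame_longwords() * BYTES_PER_LW * events


print(frame_longwords(), total_bytes())
```

One frame comes to 264 longwords (1056 bytes), so two events give the 2112 bytes shown in the table.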
Memory Sub-Packet
TLBER        TLDEV
TLESR1       TLESR0
TLESR3       TLESR2
TLFADR1      TLFADR0
TLMIR        TLVID
MER          MCR
RESERVED     RESERVED

TIOP SUB-Packet
TLBER        TLDEV
TLESR1       TLESR0
TLESR3       TLESR2
ICCWTR       ICCNSE
IDPNSE1      IDPNSE0
IDPNSE3      IDPNSE2
RESERVED     RESERVED

Example 4-8 CPU Double Error Halt
***************** ENTRY 1 ******************************** Logging OS System Architecture OS version Event sequence number Timestamp of occurrence Time since reboot Host name System Model 1. OpenVMS 2. Alpha V6.2 11.
Watch $ MCHK Reason Mask MCHK Frame Rev - CPU Registers I_STAT DC_STAT C_ADDR DC1_SYNDROME DC0_SYNDROME C_STAT C_STS MM_STAT EXC_ADDR IER_CM x0000620306101227 Halt On 6-Mar-1998 at 16:18:39 x0000FFFA x00000001 MCHK Frame Rev: 0.
Performance Cnt Interrupt x0000000000000000 PAL_Base I_CTL Corr Read Error Intr Dis Serial Line Intr Dis EIEN Interrupts: x0000000000000000 x0000000000000000 Base address of PAL Code: x0000000000000000 x0000000000000000 System Performance Counter Dsb Icache Set enabled x0000000000000000 Super page Mode Bits x0000000000000000 I-Stream Buffer Enable Only Demand Requests Launched I-Stream Buffer Enable DBP based on state of chooser Branches chosen PALRES Inst NOT executed in Kernel Mode VA_48, 43 Bit Virtual
TLBER TLCNR TLVID TLESR0 TLESR1 TLESR2 TLESR3 TLMODCONFIG0 TLMODCONFIG1 TCCERR TDIERR INTR MASK 0 INTR MASK 1 x00000000 x00000000 x00000000 x00400303 SYND0 x00000003 SYND1 x00000003 CPU0 Sourced Data x00400C0C SYND0 x0000000C SYND1 x0000000C CPU0 Sourced Data x00406060 SYND0 x00000060 SYND1 x00000060 CPU0 Sourced Data x00409090 SYND0 x00000090 SYND1 x00000090 CPU0 Sourced Data x00040000 DPQ MAX Entries x00000000 dtag1 disable BQ_MAX_ENTRIES NO Limit Bcache size = 4MB x00098AD4
INTR SUM 0 INTR SUM 1 TLEP VMG TLEPWERR0 TLEPWERR1 TLEPWERR2 TLEPWERR3 ipl 17 interrupt enable ip enable intim enable CPU halt enable x00000000 x00000000 x00000000 x00000000 x00000000 x00000000 x00000000 CPU0 Last Win Sp Access x000000DBEEFDBEE8 Pending Bit=1, Address Valid CPU1 Last Win Sp Access x000000DBEEFDBEE8 Pending Bit=1, Address Valid TLSB Node: 5.
TLESR2 TLESR3 ICCNSE ICCWTR IDPNSE-0 x00000000 x00000000 x80000000 Interrupt Enable on NSES Set x00000000 x00000006 Hose Power OK Hose Cable OK x00000006 Hose Power OK Hose Cable OK x00000000 x00000000 IDPNSE-1 IDPNSE-2 IDPNSE-3 TLSB Node: TLDEV TLBER TLESR0 TLESR1 TLESR2 TLESR3 ICCNSE ICCWTR IDPNSE-0 IDPNSE-1 IDPNSE-2 IDPNSE-3 8.
4.6.2 Machine Check Logout Frames
Machine Check Logout Frame - 670/660
The TL6 Machine Check 670/660 logout frame provides error information to the operating system error handler. When a fault is detected, PALcode enters an error handler, captures the state of the processor and system, and builds a logout frame. One frame is built for both processor-detected and system-detected errors.
System Area:
63 … 48   47 … 32      31 … 16   15 … 00      Offset
RSVD                   MISCR | WHAMI          A0
TLBER                  TLDEV                  A8
TLVID                  TLCNR                  B0
TLESR1                 TLESR0                 B8
TLESR3                 TLESR2                 C0
TLMODCONFIG1           TLMODCONFIG0           C8
TDIERR                 TCCERR                 D0
TLINTRMASK1            TLINTRMASK0            D8
TLINTRSUM1             TLINTRSUM0             E0
TLEPWERR0              TLEP_VMG               E8
TLEPWERR2              TLEPWERR1              F0
RESERVED               TLEPWERR3              F8
RESERVED               RESERVED               100
RESERVED               RESERVED               108
RESERVED               RESERVED               110
RESERVED               RESERVED               118
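The system area layout above lists sixteen offsets, A0 through 118, in steps of 8. The Python sketch below builds the offset-to-register map under two assumptions: the offsets are hexadecimal byte offsets from the start of the logout frame, and the first register named for each quadword occupies bits 63:32 while the second occupies bits 31:00, as the column headings suggest. The names are hypothetical.

```python
SYSTEM_AREA_BASE = 0xA0   # first offset listed for the system area
QUADWORD_BYTES = 8

# (bits 63:32, bits 31:00) for each quadword, in the order listed.
SYSTEM_AREA_ROWS = [
    ("RSVD", "MISCR | WHAMI"),
    ("TLBER", "TLDEV"),
    ("TLVID", "TLCNR"),
    ("TLESR1", "TLESR0"),
    ("TLESR3", "TLESR2"),
    ("TLMODCONFIG1", "TLMODCONFIG0"),
    ("TDIERR", "TCCERR"),
    ("TLINTRMASK1", "TLINTRMASK0"),
    ("TLINTRSUM1", "TLINTRSUM0"),
    ("TLEPWERR0", "TLEP_VMG"),
    ("TLEPWERR2", "TLEPWERR1"),
    ("RESERVED", "TLEPWERR3"),
    ("RESERVED", "RESERVED"),
    ("RESERVED", "RESERVED"),
    ("RESERVED", "RESERVED"),
    ("RESERVED", "RESERVED"),
]


def system_area_offsets():
    """Map each quadword offset to its (upper, lower) register names."""
    return {SYSTEM_AREA_BASE + i * QUADWORD_BYTES: row
            for i, row in enumerate(SYSTEM_AREA_ROWS)}


def split_quadword(value):
    """Return the (bits 63:32, bits 31:00) longwords of a 64-bit value."""
    return (value >> 32) & 0xFFFFFFFF, value & 0xFFFFFFFF


for offset, (upper, lower) in system_area_offsets().items():
    print(f"{offset:03X}  {upper:<14} {lower}")
```

Under these assumptions the last quadword ends at byte 120 (hex), which is 288 bytes, matching the 72 LW given for the 670/660 logout in the double error halt frame content.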
Machine Check Logout Frame - 630/620
The TL6 Machine Check 630/620 logout frame provides error information to the operating system error handler. When a fault is detected, PALcode enters an error handler, captures the state of the processor and system, and builds a logout frame. One frame is built for both processor-detected and system-detected errors that are correctable. Machine check logout 630 contains EV6 CPU-specific error registers, while machine check logout 620 contains system-specific error registers.
4.6.3 Machine Check Error Log
The Error Log contains relevant system register information used to diagnose hardware system faults. Because most of the Error Log is specified in Chapter 5 of the TL5 Product Fault Management Specification, this section covers only the changes between TL5 and TL6.
Error Log Size
The Operating System Header for OpenVMS and Compaq Tru64 UNIX remains the same size as in TL5.
TLSB Bus Snapshot Error Types Requiring TLSB SNAPSHOT The following is a list of registers and errors that require the operating system to append a SNAPSHOT to the error log file.
TLEP Subpacket The TLEP sub-packet contains TurboLaser CPU module registers. It can be part of the TLSB sub-packet of a machine check entry packet or part of a LASTFAIL packet. The TL6 TLEP has been extended to include additional system registers.
TLDEV Format

Name                    Bit(s)   Type   Init   Description
CHIP TYPE               31:28    M      0      EV5 = 5
                                               EV5/6 = 7
                                               EV6 = 8
                                               EV67 = 11
CHIP SPEED EV5 & EV56   27:24    M      0      350MHZ = 0
                                               300MHZ = 1
                                               525MHZ = 2
                                               437MHZ = 3
                                               625MHZ with 8M BCACHE = 5
                                               625MHZ with 4M BCACHE = 6
CHIP SPEED EV6          27:24    M      0      525MHZ = 0
                                               700MHZ = 1
DTYPE                   15:0     M      0      I/O MODULE = 2000
                                               INTEGRATED I/O MODULE = 2020
                                               MEMORY MODULE = 5000
                                               SINGLE PROCESSOR, 4M BCACHE = 8011
                                               DUAL PROCESSOR, 4M BCACHE = 8014
                                               DUAL EV6, 4M BCACHE = 8025
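The TLDEV field layout above can be applied to a raw register value with a few shifts and masks. The Python sketch below is illustrative only: it assumes the DTYPE codes in the table are hexadecimal, it applies the EV5 & EV56 speed codes to chip types 5 and 7 and the EV6 speed codes to chip type 8, and the register value in the example is made up rather than taken from a log.

```python
CHIP_TYPES = {5: "EV5", 7: "EV5/6", 8: "EV6", 11: "EV67"}

SPEED_EV5_EV56 = {
    0: "350 MHz", 1: "300 MHz", 2: "525 MHz", 3: "437 MHz",
    5: "625 MHz with 8M Bcache", 6: "625 MHz with 4M Bcache",
}
SPEED_EV6 = {0: "525 MHz", 1: "700 MHz"}

# Assumption: the DTYPE codes in the table are hexadecimal.
DTYPES = {
    0x2000: "I/O module",
    0x2020: "Integrated I/O module",
    0x5000: "Memory module",
    0x8011: "Single processor, 4M Bcache",
    0x8014: "Dual processor, 4M Bcache",
    0x8025: "Dual EV6, 4M Bcache",
}


def decode_tldev(value):
    """Split a 32-bit TLDEV value into the fields listed in the table."""
    chip = (value >> 28) & 0xF      # bits 31:28
    speed = (value >> 24) & 0xF     # bits 27:24
    dtype = value & 0xFFFF          # bits 15:0
    if chip in (5, 7):
        speed_table = SPEED_EV5_EV56
    elif chip == 8:
        speed_table = SPEED_EV6
    else:
        speed_table = {}            # the table gives no speed codes for other chips
    return {
        "chip_type": CHIP_TYPES.get(chip, "unknown"),
        "chip_speed": speed_table.get(speed, "unknown"),
        "dtype": DTYPES.get(dtype, "unknown"),
    }


# Hypothetical value: chip type 8, speed code 1, DTYPE 8025.
print(decode_tldev(0x81008025))
```

The chip type and speed fields are meaningful only for processor modules; for memory and I/O modules only DTYPE identifies the device.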
Chapter 5 Removal and Replacement Procedures This chapter contains removal and replacement procedures for the components of the AlphaServer GS60E system.
5.1 TLSB Modules This section covers replacing processor, memory, terminator, or I/O modules, as well as SIMM removal and replacement. 5.1.1 How to Replace the Only Processor Before replacing processor modules, update console firmware and any customized environment variables and boot paths. Example 5–1 Replacing the Only Processor Module P00>>> sho * ➊ [list of environment variables appears] P00>>> boot dkd400 ➌ Building FRU table............ (boot dkd400.4.0.5.
1. List the system’s environment variables to determine if any have been customized (see ➊ in Example 5-1). You will set these in step 7.
2. Power down the system and remove and replace the module. See Section 5.1.4.
3. Power up the system. Boot LFU and issue the update command to ensure that the module has the latest version of console firmware (see ➌).
4. Exit LFU (see ➍).
5. Build the EEPROM (see ➎). The format of data often changes between versions of console firmware. This command reformats the data.
5.1.2 How to Replace the Boot Processor Check the console firmware version in the existing and replacement modules and, if they differ, use the LFU update command to bring the replacement module to the current version. Build the replacement EEPROM on the replacement module. Example 5–2 Replacing the Boot Processor F E D C B A + . . . . . . . . . . . . . . . 9 + . . . 8 A o . o . o . + . . . 7 M + . + . + . + . . . . 6 M + . + . + . + . . . . 5 M + . + . + . + . . . . 4 . . . . . . .
1. Remove the failing module (see Section 5.1.4). In this example, the primary processor is the failing module and it is in slot 0.
2. Power up the system and make note of the version of console firmware in the remaining modules. See ➋ in Example 5-2.
3. Power down the system and remove all processor modules. See Section 5.1.4.
4. Insert the replacement modules. See Section 5.1.4.
5. Power up the system and determine the version of console firmware in the replacement module.
Example 5–2 Replacing the Boot Processor (Continued) kn7cg-ab0 Updating to V4.9-20... Verifying V4.9-20... Passed. UPD> exit Initializing... [self-test display appears] P00>>> build -e kn7cg-ab0 ➏ Build EEPROM on kn7cg-ab0 ? [Y/N]y EEPROM built on kn7cg-ab0 F E D C B A + . . . . . . . . . . . . . . . 9 + . . . 8 A o . o . o . + . . . 7 M + . + . + . + . . . . 6 M + . + . + . + . . . . 5 M + . + . + . + . . . . 4 . . . . . . . + . . . . 3 . . . . . . . . . . . .
6. Build the EEPROM. See ➏.
7. Power down the system, replace the other processor modules (see Section 5.1.4), and power up the system.
8. Copy the EEPROM environment variables from a secondary processor to the new primary processor. To do this, set a different module as primary and copy the environment variables using the build –c command. See ➑.
9. Set processor 0 as the primary processor.
5.1.3 How to Add a New Processor or Replace a Secondary Processor Check the console firmware version in the existing modules and the new or replacement module and, if they differ, use the LFU update command to bring the new module to the current version. Build the EEPROM on the new module. Example 5–3 Adding or Replacing a Secondary Processor F E D C B A + . . . . . . . . . . . . . . . 9 + . . . 8 A o . o . o . + . . . 7 M + . + . + . + . . . . 6 M + . + . + . + . . . . 5 M + . + . + .
In this example, the primary processor is in slot 0 and a secondary processor is being replaced in slot 1.
1. If you are replacing a secondary processor, remove the module from the system. See Section 5.1.4.
2. Power up the system and make note of the version of console firmware in the processor modules. See ➋ in Example 5-3.
3. Power down the system and remove all processor modules. See Section 5.1.4.
4. Insert the new processor module. See Section 5.1.4.
5.
Example 5–3 Adding or Replacing a Secondary Processor (Continued) kn7cg-ab0 Updating to V4.9-20... Verifying V4.9-20... Passed. UPD> exit Initializing... [self-test display appears] P00>>> build -e kn7cg-ab0 ➏ Build EEPROM on kn7cg-ab0 ? [Y/N]y EEPROM built on kn7cg-ab0 F E D C B A + . . . . . . . . . . . . . . . 9 + . . . 8 A o . o . o . + . . . 7 M + . + . + . + . . . . 6 M + . + . + . + . . . . 5 M + . + . + . + . . . . 4 . . . . . . . + . . . . 3 . . . . . . . . . . . .
6. Build the EEPROM. See ➏.
7. Power down the system and replace the other processor modules. See Section 5.1.4.
8. Power up the system. Copy the EEPROM environment variables to the new processor using the build –c command. See ➑.
9. Enter into the EEPROM the 8-digit LARS number and a short message (68 characters maximum) stating the date and reason for service. See ➒.
10. Boot the operating system.
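Step 9 places two limits on the service record: an 8-digit LARS number and a message of at most 68 characters. A service script could check both before the values are typed at the console. The Python sketch below is a hypothetical helper, not part of the console firmware, and the sample values are invented.

```python
LARS_DIGITS = 8          # from step 9: 8-digit LARS number
MAX_MESSAGE_CHARS = 68   # from step 9: 68 characters maximum


def check_service_record(lars_number, message):
    """Return a list of problems; an empty list means both values fit."""
    problems = []
    if not (len(lars_number) == LARS_DIGITS
            and lars_number.isascii()
            and lars_number.isdigit()):
        problems.append("LARS number must be exactly 8 digits")
    if len(message) > MAX_MESSAGE_CHARS:
        problems.append("message is longer than 68 characters")
    return problems


# Invented sample values.
print(check_service_record("12345678", "Replaced secondary CPU in slot 1"))
```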
5.1.4 Processor, Memory, or Terminator Module Removal and Replacement Wear an antistatic wrist strap. Release the handles and slide the module out of the card cage. To replace, line up the module and cover the guide and rail in the card cage, be sure the projections on the top and bottom of the end plate align with the slots in the card cage, and slide the module into the cage. Push the handles in to connect at the centerplane, and let them spring into the stops.
NOTE: If you are replacing or adding a processor module, see Section 5.1.1, 5.1.2, or 5.1.3 before using this procedure. Removal 1. Shut down the operating system and power down the system. CAUTION: You must wear a wrist strap when you handle any modules. 2. Ground yourself to the cabinet with an antistatic wrist strap. 3. Push the handles of the module to be removed in toward the module end plate and to the left, releasing them from the stops. 4.
5.1.5 SIMM Removal and Replacement Remove both covers from the memory module. Remove the standoff at the end of the row with the failing SIMM. Remove all SIMMs in the row up to and including the failing SIMM. Release the latches on both ends of the SIMM by gently inserting a small Phillips head screwdriver.
Removal 1. Remove the appropriate memory module from the card cage. 2. Place the module on an ESD pad on a level surface. Remove both module covers by removing the eight screws from each. (The screws that attach to the end plate of the module are larger than those that attach to the standoffs.) 3. Use an adjustable wrench to remove the standoff at the end of the row with the failing SIMM. See ➌ in Figure 5-3 or 5-4. 4.
Figure 5-3 SIMM Connector Numbers – E2035 Module
[Figure not reproduced: it labels SIMM connectors J2 through J33 and marks the standoffs with callout ➌.]
Figure 5-4 SIMM Connector Numbers – E2036 (2-Gbyte) and E2037 (4-Gbyte) Modules
[Figure not reproduced: it labels SIMM connectors J2 through J37 and marks the standoffs with callout ➌.]
5.1.6 I/O Cable and KFTHA Module Removal and Replacement The I/O hose cable connects the KFTHA module to an I/O bus. Remove a hose by loosening the captive screws on the connector. After disconnecting all cables, removal of the module is the same as other modules.
I/O Hose Cable Removal
1. Shut down the operating system and power down the system.
2. Ground yourself to the cabinet with an antistatic wrist strap.
3. Loosen the captive screws (slotted) to remove the cable connectors at both ends of the I/O cable to be replaced. See ➌ in Figure 5-5.
I/O Hose Cable Replacement
1. Attach the TLSB end with pin 50 on top. Torque the screws to 6 inch-pounds.
2. Route the replacement I/O cable along the same path as the original.
3. Attach the I/O bus end.
5.2 TLSB Card Cage Removal
Remove all modules (front and rear), disconnect the cables from the card cage, remove and save the mounting brackets, and slide the cage out from the front. You will need a Phillips head screwdriver and 8-mm and 10-mm nutdrivers.
Removal 1. Shut down the operating system and turn the keyswitch to Off. 2. Ground yourself to the cabinet with an antistatic wrist strap. 3. Note the locations of the modules in the card cage and remove the modules. See Section 5.1. 4. At the front of the card cage, use the 8-mm nutdriver to remove the kepnuts from the terminal cover (see ➍ in Figure 5-6). Save the kepnuts. Using the 10-mm nutdriver, remove the nuts and washers that attach the power and ground cables to the power posts.
Replacement 1. Ground yourself to the cabinet with an antistatic wrist strap. CAUTION: The following step requires two people. Because of the height of the card cage in the cabinet, you should not install this assembly by yourself. 2. From the front, slide the replacement card cage into the cabinet so that the label is at the top on the front and the power filter is to the left. 3.
4. At the rear of the cabinet, use the Phillips head screwdriver to loosely install the reserved side bracket to the frame with two reserved screws. Line up the other two holes in the bracket with the card cage holes and insert two reserved screws. Tighten all four screws. Attach the card cage to the frame at the bottom with the reserved screws. 5. At the front of the cabinet, use the Phillips head screwdriver to attach the card cage to the frame at the top and bottom with five reserved screws. 6.
5.3 Operator Control Panel The operator control panel (OCP) attaches to the top of the front door. It is held in place by a boss on each side of the plastic bezel. The signal cable is attached to the bottom connector on the left side at the back of the OCP, accessible from the backside of the front door.
Removal 1. Shut down the operating system and turn the keyswitch to Off. 2. Shut the main circuit breaker off by pushing down the handle. 3. Ground yourself to the cabinet with an antistatic wrist strap. 4. Open the front cabinet door. 5. Remove the signal cable by loosening the two thumbscrews. 6. From the inside of the door, push on the left hand side boss until it snaps out of the opening. 7. Move to the outside of the door.
5.4 CD Tray The CD tray houses the CD-ROM drive and optional floppy drive. It mounts to the left-hand rail in front of the DWLPB PCI box.
Removal 1. Shut down the operating system and turn the keyswitch to Off. 2. Shut the main circuit breaker off by pushing down the handle. 3. Remove all cable connectors from the right side of the tray that houses the CD-ROM drive. 4. Loosen the two captive screws on the left side of the tray (see Figure 5-8). 5. Slide the tray out of the cabinet and place it on a stable working surface. Replacement • Reverse the steps in the removal procedure. Verification Boot LFU.
5.5 AC Distribution Box The 3-phase 208 VAC distribution box, located at the bottom rear of the system cabinet, rests on right and left side stop brackets and is attached to the cabinet rails with four screws.
Removal 1. Shut down the operating system and turn the keyswitch to Off. 2. At the rear of the cabinet, shut the main circuit breaker off by pushing down the handle. 3. Disconnect the system power cord. 4. From the front of the cabinet, unplug all option power cords from the AC distribution box. 5. At the rear of the cabinet (see Figure 5-9), loosen the four screws (two on each side) attaching the AC distribution box to the cabinet rails. 6. Slide the AC distribution box from the rear of the cabinet.
5.6 Power Rack Assembly The power rack assembly contains the DC distribution module and three H7506 power supplies.
Removal 1. Shut down the operating system and turn the keyswitch to Off. 2. At the rear of the cabinet, shut the main circuit breaker off by pushing down the handle. 3. Disconnect the system power cord. 4. From the front of the cabinet, remove the three H7506 power supplies by loosening the two screws in the front of each power supply and pulling out the power supply. 5. Remove the two screws (see Figure 5-10) attaching the power rack assembly to the right and left cabinet rails. 6.
5.7 Cabinet Control Logic (CCL) Panel The cabinet control logic (CCL) panel monitors signals from parts of the power system and provides error information to the console software. It is located in the rear lower cabinet, right behind the power rack assembly.
Removal 1. Shut down the operating system and turn the keyswitch to Off. 2. Ground yourself to the cabinet with an antistatic wrist strap. 3. At the rear of the cabinet, shut the main circuit breaker off by pushing down the handle. 4. Disconnect the cables from the CCL panel. 5. Remove the four screws that hold the CCL panel to the CCL assembly. 6. Remove the CCL panel from the CCL assembly. Replacement • Reverse the steps in the Removal procedure. Verification Power up the system.
5.8 BA36R StorageWorks Shelf The StorageWorks shelf houses disk drives and a power regulator.
The StorageWorks shelf contains a power supply, StorageWorks disks, and a controller.
Removal
1. Shut down the operating system and turn the keyswitch to Off.
2. Disconnect the power cable.
3. Remove the two Phillips screws that secure the shelf to the vertical rails.
4. Slide the shelf out of the cabinet.
Replacement
• Reverse the steps in the Removal procedure.
Verification
Power up the system.
5.9 DWLPB PCI Box The DWLPB provides a complete PCI bus subsystem. It contains a KFE72 adapter which provides I/O for systems using a graphics device.
Removal
1. Shut down the operating system and turn the keyswitch to Off.
2. Ground yourself to the cabinet with an antistatic wrist strap.
3. At the rear of the cabinet, shut the main circuit breaker off by pushing down the handle.
4. Disconnect the 48 V cable and I/O hose to the DWLPB.
5. Remove the four screws securing the DWLPB (see Figure 5-13).
6. Slide the DWLPB out on its rails, release the rail locking tabs, and remove the DWLPB from the system.
5.10 Plenum Assembly
The plenum assembly houses the two blowers that cool the system. Air is drawn in through the top of the cabinet, passes through the TLSB card cage, and is exhausted at the middle of the cabinet, to the rear.
Removal 1. Shut down the operating system and turn the keyswitch to Off. 2. At the rear of the cabinet, shut the main circuit breaker off by pushing down the handle. 3. Disconnect the cables (17-04942-01) from the blowers. 4. Remove the four screws that secure the plenum assembly to the rack. 5. Remove the plenum assembly from the rack. Replacement • Reverse the steps in the Removal procedure. Verification Power up the system.
5.11 Cabinet Panels The cabinet panels and doors consist of the top and left and right cabinet panels and the front and rear doors.
Removal 1. Lift off the system cabinet cover and set aside (see ➊, Figure 5-15). 2. Open the system cabinet’s front and rear doors ➋. 3. Remove the front and rear screws holding the right panel ➌. 4. Pull the bottom of the panel away from the cabinet, lift up, and remove ➍. Repeat steps 3 and 4 on the left side to remove the left system cabinet panel. 5. To remove the front door, open it and unplug the signal cable from the rear of the OCP, located at the top inside of the front door.
5.12 Cables Figure 5-16 diagrams all the GS60E cables.
Table 5-1 Cables

Cable Number   Connects
17-04713-02    Cabinet Control Logic (CCL) panel to TLSB card cage.
17-04941-01    DC distribution module to TLSB card cage (48 V).
17-04942-01    J9, J10 of DC distribution module and CD-ROM tray to blowers.
17-04943-01    J17 of DC distribution module to OCP module.
17-04800-02    CCL panel to J6 of DC distribution module.
17-03961-10    CCL panel to J14 of DC distribution module.
17-03961-10    CCL panel to J15 of DC distribution module.
Appendix A Updating Firmware Use the Loadable Firmware Update (LFU) utility to update system firmware. LFU runs without any operating system and can update the firmware on any system module. LFU handles modules on the TLSB bus (for example, the CPU) as well as modules on the I/O buses. You are not required to specify any hardware path information, and the update process is highly automated. Both the LFU program and the firmware microcode images it writes are supplied on a CD-ROM.
A.1 Booting LFU Abstract LFU is supplied on the Alpha CD-ROM (Part Number AG– RCFB*–BE, where * is the letter that denotes the disk revision). Make sure this CD-ROM is mounted in the in-cabinet CD drive. Boot LFU from the CD-ROM. Example A–1 Booting LFU from CD-ROM P00>>> sho dev ➊ polling for units on isp0, slot 0, bus0, hose0... dka400.4.0.0.0 DKA400 RZ26L 440C polling for units on isp1, slot 1, bus0, hose0... polling for units on isp2, slot 4, bus0, hose0...
.
***** Loadable Firmware Update Utility *****
----------------------------------------------------------
Function    Description
----------------------------------------------------------
Display     Displays the system’s configuration table.
Exit        Done exit LFU (reset).
List        Lists the device, revision, firmware name, and update revision.
Lfu         Restarts LFU.
Readme      Lists important release information.
Create      Make a custom Console Grom Image.
Update      Replaces current firmware with loadable data image.
A.2 List
The list command displays the inventory of update firmware on the CD-ROM. Only the devices listed at your terminal are supported for firmware updates.
Example A–2 List Command
UPD> list
Device           Current Revision   Filename    Update Revision
cipca0           A315               cipca_fw    A420
kn7cg-ab0_arc    V5.68-0            kn7xx_arc   V5.68-0
kn7cg-ab0        G5.5-11            kn7xx_fw    V5.5-12
kn7cg-ab1_arc    V5.68-0            kn7xx_arc   V5.68-0
kn7cg-ab1        G5.5-11            kn7xx_fw    V5.5-12
                                    ccmab_fw    22
                                    cixcd_fw    7
                                    demfa_fw    2.1
                                    demna_fw    9.4
                                    dfxaa_fw    3.
UPD>
The list command shows three pieces of information for each device:
• Current revision: The revision of the device’s current firmware
• Filename: The name of the file that is recommended for updating that firmware
• Update revision: The revision of the firmware update
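When the list output has been captured from the console, the current and update revisions can be compared mechanically to see which devices the update would change. The Python sketch below is a hypothetical helper that runs on a workstation, not an LFU feature. It assumes each complete row has exactly four whitespace-separated fields and treats any difference between the two revision strings as a candidate for update, which is a simplification of however LFU itself decides.

```python
def parse_list_rows(text):
    """Return (device, current, filename, update) tuples from list output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4:          # skips the header and partial rows
            rows.append(tuple(fields))
    return rows


def devices_with_different_revision(rows):
    """Devices whose current revision string differs from the update revision."""
    return [device for device, current, _filename, update in rows
            if current != update]


# Rows copied from Example A-2.
SAMPLE = """\
Device Current Revision Filename Update Revision
cipca0 A315 cipca_fw A420
kn7cg-ab0_arc V5.68-0 kn7xx_arc V5.68-0
kn7cg-ab0 G5.5-11 kn7xx_fw V5.5-12
"""

print(devices_with_different_revision(parse_list_rows(SAMPLE)))
```

For the sample rows, the helper reports cipca0 and kn7cg-ab0, while kn7cg-ab0_arc is already at the update revision.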
A.3 Update The update command writes new firmware from the CD-ROM to the module. Then LFU automatically verifies the update by reading the new firmware image from the module into memory and comparing it with the CD-ROM image. Example A–3 Update Command UPD> update kn7cg-ab0 ➊ WARNING: updates may take several minutes to complete for each device. Confirm update on: kn7cg-ab0_arc ➋ [Y/(N)] y DO NOT ABORT! Updating to V5.68-0 .Verifying V5.
➊ This command requests a firmware update for a specific module. If you want to update more than one device, you may use a wildcard but not a list. For example, update k* updates all devices with names beginning with k, and update * updates all devices. ➋ LFU requires you to confirm the update. For processors, the first update to confirm is the AlphaBIOS firmware; the second is the SRM console firmware. In either case, the default is no. ➌ ➍ ➎ Status message reports update and verification progress.
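The wildcard behavior described in ➊ can be previewed before running the update. The Python sketch below uses the standard fnmatch module to show which device names a pattern such as k* would select. It is only an approximation: fnmatch also accepts ? and bracket patterns, which LFU may not support, and the device names are the ones that appear in this appendix's examples.

```python
from fnmatch import fnmatchcase


def select_devices(device_names, pattern):
    """Return the device names that match a shell-style wildcard pattern."""
    return [name for name in device_names if fnmatchcase(name, pattern)]


# Device names taken from the examples in this appendix.
DEVICES = ["cipca0", "kn7cg-ab0", "kzpsa0", "kzpsa1", "pfi0"]

print(select_devices(DEVICES, "k*"))   # names beginning with k
print(select_devices(DEVICES, "*"))    # every device
```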
Example A–3 Update Command (Continued) ➏ UPD> update ➐ confirm update on: kzpsa0 kzpsa1 pfi0 [Y/(N)]n ➑ UPD> update kzpsa0 -path cipca_fw WARNING: updates may take several minutes to complete for each device.
➏ When you do not specify a device name, LFU tries to update all devices. ➐ LFU lists the selected devices to update and prompts before devices are updated. ➑ In this next example, the -path option is used to update a device with different firmware from the LFU default. A network location for the firmware file can be specified with the -path option. In this example, the firmware filename is not a valid file for the device specified. CAUTION: Never abort an update operation.
A.4 Exit The exit command terminates the LFU program, causes system initialization and self-test, and returns the system to console mode. Example A–4 Exit Command ➊ UPD> exit Initializing... [self-test display appears] P00>>> ➋ UPD> update kzpsa0 WARNING: updates may take several minutes to complete for each device. Confirm update on: kzpsa0 [Y/(N)]y DO NOT ABORT! kzpsa0 Updating to A10... FAILED.
➊ At the UPD> prompt, exit causes the system to be initialized. ➋ The console prompt appears. ➌ Errors occurred during an update. ➍ Because of the errors, confirmation of the exit is required. ➎ Typing y causes the system to be initialized and the console prompt to appear.
A.5 Display and Verify Commands Display and verify commands are used in special situations. Display shows the physical configuration. Verify repeats the verification process performed by the update command.
➊ Display shows the system physical configuration. Display is equivalent to issuing the console command show configuration. Because it shows the slot for each module, display can help you identify the location of a device. ➋ Verify reads the firmware from the module into memory and compares it with the update firmware on the CD-ROM. If a module already verified successfully when you updated it, but later failed selftest, you can use verify to tell whether the firmware has become corrupted.
A.6 Create
The create command allows you to make a custom console image.
Example A–6 Create Command
UPD> create ➊
Console ARC image:
    File = obj\alpha\tl6ab
    Version = V5.68-0
    Creation time = 26-NOV-1998 05:56:28
    Image size = 70000(458752)
Console GROM image:
    File = tl6
    Version = V5.
Example A-6 Create Command (Continued) Available overlays: cixcd dac960 demna dup kfmsb kfpsa lamb_diag mc_diag xdelta xmi debug i82558 kgpsa simport defpa kdm70 kzmsa tga Included overlays: tl6 advcmd ashshell basiccmd cpu_mem cpu_tst eecmd eeprom examine fat fru galaxy isp1020 isp1020fw lfu_drivers memtest nettest nport pci_diag phase3 set show toast tulip advshell bitmap diag_tio eisa flash hpc_diag kbd mp_ex ods2 powerup show_power vga arc arccmd boot cipca diagcmd diagsupport environ ether flopp
Appendix B Console Commands and Environment Variables B.1 Console Commands Table B-1 is a summary of the console commands, showing syntax and brief descriptions. For additional information, see the Operations Manual. Table B–1 Summary of Console Commands Command Description b[oot][-flags M,PPPP][-file ] Boot the operating system. –fl[ags]—overrides the boot_osflags environment variable. M — specifies the system root to be booted from the system disk.
Table B–1 Summary of Console Commands (Continued) Command Description bu[ild] –n Initialize the CPU’s nonvolatile RAM. — KN7CG- AA Initialize a module’s serial EEPROM. — MS7CC, KFTHA, or DWLPB. Clears the selected EEPROM option.
Table B–1 Summary of Console Commands (Continued) Command Description run [-d] [-p][-s] Runs one of four ARC utility programs: rcu (RAID Configuration Utility), swxcrfw, eepromcfg, util_cli. The arc_enable environment variable must be set. — command option. — console device containing the program (default is dva0). — unit number of the PCI to configure. — optional parameters to pass to the utility (must be enclosed in quotes).
Table B–1 Summary of Console Commands (Continued) Command Description sh[ow] or show * Displays the current state of the specified environment variable. — an environment variable name (see Table B-2). Displays memory module information. Displays the names and physical addresses of all known network devices. Displays selected SEEPROM information.
B.2 Environment Variables An environment variable is a name and value association maintained by the console program. The value associated with an environment variable is an ASCII string (up to 127 characters) or an integer. Some environment variables are typically modified by the user to tailor the recovery behavior of the system on power-up and after system failures. Volatile environment variables are initialized by a system reset; others are nonvolatile across system failures.
Table B–2 Environment Variables (Continued) Variable Attribute Function boot_reset Nonvolatile Resets system and displays self-test results during booting. Default value is off. console Nonvolatile The type of terminal being used for the console, either serial (default) for a standard video terminal or graphics for a graphics display. If the terminal is a graphics display, the system must have a PCI with a standard I/O module and a TGA graphics controller.
Table B–2 Environment Variables (Continued) Variable Attribute Function enable_audit Nonvolatile If set to on (default), enables the generation of audit trail messages. If set to off, audit trail messages are suppressed. Console initialization sets this to on. graphics_ switch Nonvolatile Overrides the screen resolution setting. The variable is an integer from 0 to 15, as described in Table B-3. interleave Nonvolatile The memory interleave specification.
Table B-3 Settings for the graphics_switch Environment Variable

Setting   Pixel Frequency (MHz)   Monitor Resolution (Pixels)   Refresh Rate (Hz)
0         130                     1280 x 1024                   72
1         119                     1280 x 1024                   66
2         108                     1280 x 1024                   60
3         104                     1152 x 900                    72
4         93                      1152 x 900                    66
5         75                      1024 x 768                    70
6         74                      1024 x 768                    72
7         69                      1024 x 864                    60
8         65                      1024 x 768                    60
9         50                      800 x 600                     72
10        40                      800 x 600                     60
11        32                      640 x 480                     72
12        25                      640 x 480                     60
13        135                     1280 x 1024                   75
14        110                     1280 x 1024                   60
15        Reserved                -                             -
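Table B-3 is a plain lookup from the graphics_switch setting to a display mode, so it can be used in reverse to find the setting for a wanted resolution and refresh rate. The Python sketch below copies the table values; the function names are hypothetical and the code is a workstation-side aid, not a console command.

```python
# setting -> (pixel frequency in MHz, resolution, refresh rate in Hz); 15 is reserved.
GRAPHICS_SWITCH = {
    0: (130, "1280 x 1024", 72),
    1: (119, "1280 x 1024", 66),
    2: (108, "1280 x 1024", 60),
    3: (104, "1152 x 900", 72),
    4: (93, "1152 x 900", 66),
    5: (75, "1024 x 768", 70),
    6: (74, "1024 x 768", 72),
    7: (69, "1024 x 864", 60),
    8: (65, "1024 x 768", 60),
    9: (50, "800 x 600", 72),
    10: (40, "800 x 600", 60),
    11: (32, "640 x 480", 72),
    12: (25, "640 x 480", 60),
    13: (135, "1280 x 1024", 75),
    14: (110, "1280 x 1024", 60),
}


def describe_setting(setting):
    """Return the display mode for a setting, or None if reserved or out of range."""
    return GRAPHICS_SWITCH.get(setting)


def settings_for(resolution, refresh_hz):
    """Return every setting that gives the wanted resolution and refresh rate."""
    return [s for s, (_mhz, res, hz) in GRAPHICS_SWITCH.items()
            if res == resolution and hz == refresh_hz]


print(describe_setting(8))
print(settings_for("1280 x 1024", 60))
```

Two settings (2 and 14) give 1280 x 1024 at 60 Hz with different pixel frequencies, so the reverse lookup returns a list rather than a single value.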