Sun StorEdge™ 3900 and 6900 Series 2.0 Troubleshooting Guide

Sun Microsystems, Inc.
4150 Network Circle
Santa Clara, CA 95054 U.S.A.
650-960-1300

Part No. 816-5255-12
March 2003, Revision A

Send comments about this document to: docfeedback@sun.com
Copyright 2003 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved. Sun Microsystems, Inc. has intellectual property rights relating to technology embodied in the product that is described in this document. In particular, and without limitation, these intellectual property rights may include one or more of the U.S. patents listed at http://www.sun.com/patents and one or more additional patents or pending patent applications in the U.S.
Contents

Preface
    How This Book Is Organized
    Using UNIX Commands
    Typographic Conventions
    Shell Prompts
    Related Documentation
    Accessing Sun Documentation Online
    Sun Welcomes Your Comments

1. Introduction
    Predictive Failure Analysis (PFA) Capabilities

2. General Troubleshooting Procedures
    Sun StorEdge 6900 Series Multipathing Example
    Multipathing Options in the Sun StorEdge 6900 Series
    Manually Halting the I/O
    ▼ To Quiesce the I/O
    ▼ To Unconfigure the c2 Path
    ▼ To Put the c2 Path Back into Production
    ▼ To View the Dynamic Multi-Pathing (DMP) Properties

3. Troubleshooting Tools

5. Troubleshooting the Fibre Channel (FC) Links
    Troubleshooting the A1 or B1 FC Link
        Verifying the Data Host
        FRU Tests Available for the A1 or B1 FC Link Segment
        ▼ To Isolate the A1 or B1 FC Link
    Troubleshooting the A2 or B2 FC Link
        Verifying the Data Host
        Verifying the A2 or B2 FC Link
        FRU Tests Available for the A2 or B2 FC Link Segment
        ▼ To Isolate the A2 or B2 FC Link
    Troubleshooting the A3 or B3 FC Link
        Verifying the Data Host
        FRU Tests Available for the A3 or B3 FC Link Segment
        ▼ To Isolate the A3 or B3 FC Link

    ▼ To Replace the Alternate Master or Slave Monitoring Host

7. Troubleshooting Switches
    About the Switches
    Zone Modifications
    Switchless Configurations
    ▼ To Use the Switch Event Grid
    setupswitch Exit Values

    ▼ To Clear the Log
    Virtualization Engine LEDs
    Power LED Codes
    Interpreting LED Service and Diagnostic Codes
    Back Panel Features
    Ethernet Port LEDs
    FC Link Error Status Report
    ▼ To Check the FC Link Error Status Manually
    Translating Host-Device Names
    Displaying the VLUN Serial Number
    ▼ To Display Devices That Are Not Sun StorEdge Traffic Manager (MPxIO)-Enabled
    ▼ To Display Sun StorEdge Traffic Manager (MPxIO)-Enabled Devices
    Viewing the Virtualiz

    ▼ To Use the Sun StorEdge T3+ Array Failover Driver GUI
    ▼ To Use the Sun StorEdge T3+ Array Failover Driver Command Line Interface (CLI)

11. Example of Fault Isolation

A. Virtualization Engine References
    SRN Reference
    SRN/SNMP Single Point-of-Failure Descriptions
    Port Communication Numbers
    Virtualization Engine Service Codes
List of Figures

FIGURE 2-1   Sun StorEdge 6900 Series Logical View
FIGURE 2-2   Primary Data Paths to the Alternate Master
FIGURE 2-3   Primary Data Paths to the Master Sun StorEdge T3+ Array
FIGURE 2-4   Path Failure—Before the Second Tier of Switches
FIGURE 2-5   Path Failure—I/O Routed Through Both HBAs
FIGURE 3-1   Storage Automated Diagnostic Environment Example Topology
FIGURE 3-2   Microsoft Windows 2000 Event Properties System Log
FIGURE 3-3   Qlogic SANblade Manager HBA Driver and Firmware Versions
FIGURE 5-11  A4 or B4 FC Link Data-Host Notification
FIGURE 5-12  Storage Service Processor-Side Notification
FIGURE 6-1   Sample Host Event Grid
FIGURE 7-1   Switch Event Grid
FIGURE 8-1   Storage Service Processor Event
FIGURE 8-2   Virtualization Engine Alert
FIGURE 8-3   Manage Configuration Files Menu
FIGURE 8-4   Example Link Test Text Output from the Storage Automated Diagnostic Environment
FIGURE 8-5   Sun StorEdge T3+ Array Event Grid
FIGURE 9-1   Virtualization Engine Front Panel LEDs
FIGURE 11-8  Successful Switch Test Results
FIGURE 11-9  Multipath Recovery Using the Sun StorEdge T3+ Array Multipath Configurator
FIGURE 11-10 Recovered Paths
List of Tables

TABLE 1-1  Sun StorEdge 3900 and 6900 Series Configurations
TABLE 3-1  Event Grid Sorting Criteria
TABLE 5-1  FC Links
TABLE 5-2  Ax to Bx FC Links
TABLE A-5  Virtualization Engine Service Codes—400-599 Device-Side Interface Driver Errors
TABLE B-1  Virtualization Engine Error Messages
TABLE B-2  Sun StorEdge Network FC Switch Error Messages
TABLE B-3  Sun StorEdge T3+ Array Error Messages
TABLE B-4  Other SUNWsecfg Error Messages
Preface

The Sun StorEdge 3900 and 6900 Series 2.0 Troubleshooting Guide provides guidelines for isolating problems in supported configurations of the Sun StorEdge™ 3900 and 6900 series. For detailed configuration information, refer to the Sun StorEdge 3900 and 6900 Series Reference Manual.
Chapter 6 provides information on host device troubleshooting. Chapter 7 provides information on troubleshooting a Sun StorEdge Network FC switch-8 and switch-16 switch device. Chapter 8 describes how to troubleshoot the Sun StorEdge T3+ array devices. Also included in this chapter is information about the Explorer Data Collection Utility. Chapter 9 provides detailed information for troubleshooting the virtualization engines. Chapter 10 describes how to troubleshoot using Microsoft Windows 2000.
Typographic Conventions

Typeface: AaBbCc123 (monospace)
Meaning: The names of commands, files, and directories; on-screen computer output
Examples: Edit your .login file. Use ls -a to list all files. % You have mail.

Typeface: AaBbCc123 (bold monospace)
Meaning: What you type, when contrasted with on-screen computer output
Example: % su
         Password:

Typeface: AaBbCc123 (italic)
Meaning: Book titles, new words or terms, words to be emphasized
Examples: Read Chapter 6 in the User’s Guide. These are called class options. You must be superuser to do this.
Related Documentation

Product: Late-breaking news
    • Sun StorEdge 3900 and 6900 Series 2.0 Release Notes (816-5254)

Product: Sun StorEdge 3900 and 6900 series information (816-5252, 816-5253)
    • Sun StorEdge 3900 and 6900 Series 2.0 Installation Guide
    • Sun StorEdge 3900 … Compliance Manual

Product: Sun StorEdge T3 and T3+ array documentation
Product: SANbox-8/16 Segmented Loop FC Switch (875-3060)
    • SANbox-8/16 Segmented Loop Fibre Channel Switch Management User’s Manual
    • SANbox-8 Segmented Loop Fibre Channel Switch Installer’s/User’s Manual
    • SANbox-16 Segmented Loop Fibre Channel Switch Installer’s/User’s Manual

Product: Expansion cabinet (805-3067)
    • Sun StorEdge Expansion Cabinet Installation and Service Manual

Product: Storage Service Processor
    • Sun V100 Server User’s Guide
    • Netra X1 Server User’s Guide
    • Netra X1 Server Hard Disk D
Accessing Sun Documentation Online

You can view, print, or purchase a broad selection of Sun documentation, including localized versions, at:

http://www.sun.com/documentation

Sun Welcomes Your Comments

Sun is interested in improving its documentation and welcomes your comments and suggestions. You can email your comments to Sun at:

docfeedback@sun.com

Please include the part number (816-5255) of your document in the subject line of your email.
CHAPTER 1 Introduction The Sun StorEdge 3900 and 6900 series storage subsystems are complete preconfigured storage solutions. The configurations for each of the storage subsystems are shown in TABLE 1-1.
Predictive Failure Analysis (PFA) Capabilities The Storage Automated Diagnostic Environment software provides the health and monitoring functions for the Sun StorEdge 3900 and 6900 series systems. This software provides the following predictive failure analysis (PFA) capabilities: ■ FC links—Fibre Channel (FC) links are monitored at all end points using the Fibre Channel-Extended Link Service (FC-ELS) link counters. When link errors surpass the threshold values, an alert is sent.
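As a sketch of this threshold logic, the shell function below compares the growth of an FC-ELS link-error counter between two polls against a threshold. The counter values and the threshold of 40 are illustrative assumptions, not the values the Storage Automated Diagnostic Environment actually uses.

```shell
#!/bin/sh
# Sketch of PFA-style link monitoring: compare the growth of an FC-ELS
# link-error counter (for example, InvalidTxWds) between two polls
# against a threshold.  The threshold value is an assumption.

THRESHOLD=40

# check_link prev_count curr_count
# Prints "ALERT: ..." when the counter grew by more than THRESHOLD
# since the last poll, "OK: ..." otherwise.
check_link() {
    prev=$1
    curr=$2
    delta=$((curr - prev))
    if [ "$delta" -gt "$THRESHOLD" ]; then
        echo "ALERT: $delta link errors since last poll (threshold $THRESHOLD)"
    else
        echo "OK: $delta link errors since last poll"
    fi
}

# Example: a counter jumps from 349683 to 365972 between polls.
check_link 349683 365972
```

A real monitor would read the counters from the switch or HBA at each poll; only the comparison step is shown here.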
CHAPTER 2

General Troubleshooting Procedures

This chapter contains the following sections:
■ “High-Level Troubleshooting Tasks” on page 3
■ “Host-Side Troubleshooting” on page 6
■ “Storage Service Processor-Side Troubleshooting” on page 6
■ “Verifying the Configuration Settings” on page 7
■ “Sun StorEdge 6900 Series Multipathing Example” on page 11
■ “Multipathing Options in the Sun StorEdge 6900 Series” on page 16

High-Level Troubleshooting Tasks

This section lists the high-level steps you can take to troubleshoot the Sun StorEdge 3900 and 6900 series systems.
1. Discover the error by checking one or more of the following messages or files:
■ Storage Automated Diagnostic Environment alerts or email messages
■ /var/adm/messages
■ Sun StorEdge T3+ array syslog file
■ Storage Service Processor messages:
    ■ /var/adm/messages.t3 messages
    ■ /var/adm/log/SEcfglog file

2. Determine the extent of the problem by using one or more of the following methods:
■ Review the Storage Automated Diagnostic Environment topology view.
4. Check the status of the Sun StorEdge network FC switch-8 and switch-16 switches using the following tools: ■ Review the Storage Automated Diagnostic Environment device monitoring reports. ■ Run the checkswitch(1M) and showswitch(1M) commands, which check and display the Sun StorEdge FC switch configurations. ■ Review the online and offline LED status codes and POST error codes, which can be found in the Sun StorEdge SAN 4.0 and SAN 4.1 Release Installation Guide.
Note – These tests isolate the problem to a FRU that must be replaced. Follow the instructions in the Sun StorEdge 3900 and 6900 Series 2.0 Reference and Service Guide and the Sun StorEdge 3900 and 6900 Series 2.0 Installation Guide for proper FRU replacement procedures. 8. Verify the fix using the following tools: ■ Storage Automated Diagnostic Environment GUI Topology View and Diagnostic Tests ■ /var/adm/messages on the data host 9.
Verifying the Configuration Settings During the course of troubleshooting, you might need to verify configuration settings on the various components in the Sun StorEdge 3900 or 6900 series. ▼ To Verify Configuration Settings 1. Run one of the following scripts: ■ Run the runsecfg(1M) script and select the various Verify menu selections for the Sun StorEdge T3+ arrays, the Sun StorEdge network FC switch-8 and switch16 switches, and the virtualization engine components.
CODE EXAMPLE 2-1 checkdefaultconfig(1M) Output

# /opt/SUNWsecfg/checkdefaultconfig
Checking all accessible components.....
Checking switch: sw1a
Switch sw1a - PASSED
Checking switch: sw1b
Switch sw1b - PASSED
Checking switch: sw2a
Switch sw2a - PASSED
Checking switch: sw2b
Switch sw2b - PASSED
Please enter the Sun StorEdge T3+ array password :
Checking T3+: t3b0
Checking : t3b0 Configuration.......
2. If anything is marked FAIL, check the /var/adm/log/SEcfglog file for the details of the failure. For example:

Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : ----------SAVED CONFIGURATION--------------.
...
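A quick way to pull one component's entries out of the log is with grep. The sample lines below imitate the SEcfglog format; the ERROR entry and its wording are fabricated for illustration.

```shell
#!/bin/sh
# Filter an SEcfglog-style file for one component's entries.
# The sample lines imitate /var/adm/log/SEcfglog; the ERROR entry
# is a fabricated example, not real SUNWsecfg output.

cat > /tmp/SEcfglog.sample <<'EOF'
Mon Jan  7 18:07:51 PST 2002 checkt3config: t3b0 INFO : ----------SAVED CONFIGURATION--------------.
Mon Jan  7 18:07:52 PST 2002 checkt3config: t3b0 ERROR : saved and current configuration differ.
Mon Jan  7 18:08:03 PST 2002 checkswitch: sw1a INFO : configuration verified.
EOF

# Show only the error-level entries for array t3b0.
grep 't3b0 ERROR' /tmp/SEcfglog.sample
```

On a live Storage Service Processor you would point the same grep at /var/adm/log/SEcfglog itself.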
Clearing the Lock File

If you interrupt any of the Configuration Utility scripts (by typing Control-C, for example), a lock file might remain in the /opt/SUNWsecfg/etc directory, causing subsequent commands to fail. Use the following procedure to clear the lock file.

▼ To Clear the Lock File

1. Type the following command:

# /opt/SUNWsecfg/bin/removelocks
usage : removelocks [-t|-s|-v]
where:
    -t - remove all T3+ related lock files.
    -s - remove all switch related lock files.
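The behavior of removelocks can be sketched as follows. The lock-file name patterns and the temporary directory here are assumptions for illustration, not the actual names SUNWsecfg uses.

```shell
#!/bin/sh
# Sketch of removelocks-style cleanup: delete lock files by component
# type.  The patterns t3*.lock, sw*.lock, and ve*.lock are illustrative
# assumptions; a temp directory stands in for /opt/SUNWsecfg/etc.

LOCKDIR=$(mktemp -d)

# Create some stale lock files to clean up.
touch "$LOCKDIR/t3b0.lock" "$LOCKDIR/sw1a.lock" "$LOCKDIR/ve1.lock"

remove_locks() {
    case "$1" in
        -t) rm -f "$LOCKDIR"/t3*.lock ;;   # T3+ related lock files
        -s) rm -f "$LOCKDIR"/sw*.lock ;;   # switch related lock files
        -v) rm -f "$LOCKDIR"/ve*.lock ;;   # virtualization engine lock files
        *)  echo "usage: remove_locks [-t|-s|-v]" >&2; return 1 ;;
    esac
}

remove_locks -t
ls "$LOCKDIR"    # the T3+ lock is gone; switch and VE locks remain
```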
Sun StorEdge 6900 Series Multipathing Example This Sun StorEdge 6900 series multipathing example contains the following elements: ■ One Sun StorEdge T3+ array partner group ■ Two total LUNs ■ One 500-Gbyte RAID5 LUN per partner group See FIGURE 2-1 for a logical view of the Sun StorEdge 6900 series.
Currently, one 10-Gbyte VLUN is created from each physical LUN, for a total of two VLUNs. The Sun StorEdge 6900 series has four possible physical paths to each Sun StorEdge T3+ array volume (LUN). Refer to FIGURE 2-2, which illustrates primary data paths to the alternate master, and FIGURE 2-3, which illustrates the primary data paths to the master Sun StorEdge T3+ array.
FIGURE 2-2 Primary Data Paths to the Alternate Master / FIGURE 2-3 Primary Data Paths to the Master Sun StorEdge T3+ Array (diagrams: host with HBA-0 and HBA-1, two tiers of switches, virtualization engines 1 and 2 with the SAN database, MPDrive carved LUNs with masking, and the logical multipath drives on the master and alternate master arrays)
The host, using multipathing software, is presented with two primary (active) paths for each LUN, allowing the host to route I/O through either or both HBAs. If a path failure occurs before the second tier of Sun StorEdge network FC switch-8 and switch-16 switches, one of the paths is disabled—but the other path continues sending I/O as it normally would and takes over the entire load. Refer to FIGURE 2-4, which illustrates a path failure before the second tier of switches.
The virtualization engine recognizes the primary (active) and secondary (passive) pathing for the LUNs, and routes the I/O to the primary controller—unless there is a path failure to the primary path. In that case, the virtualization engine initiates a LUN failover and routes the I/O through the secondary path (which, in turn, goes through the interconnect cables). Refer to FIGURE 2-5, which illustrates a path failure where I/O is routed through both HBAs.
Multipathing Options in the Sun StorEdge 6900 Series The presence of the virtualization engine makes multipathing in a Sun StorEdge 6900 series environment challenging. Unlike Sun StorEdge T3+ array and Sun StorEdge network FC switch-8 and switch16 switch installations (which present primary and secondary pathing options), the virtualization engines present only primary pathing options to the data host.
Note that in the Class and State fields, the virtualization engines are presented as two primary ONLINE devices. The current Sun StorEdge Traffic Manager software design does not enable you to manually halt the I/O (that is, you cannot perform a failover to the secondary path) when only primary devices are present.
2. Using the Storage Automated Diagnostic Environment GUI Topology, determine which virtualization engine is in the path you need to disable. 3.
Note – To confirm that a failover is occurring, open a Telnet session to the Sun StorEdge T3+ array and check the output of port listmap. Another, but slower, method is to run the runsecfg script and verify the virtualization engine maps by polling them against a live system. Caution – During the failover, small computer systems interface (SCSI) errors will occur on the data host and a brief suspension of I/O will occur. ▼ To Put the c2 Path Back into Production 1.
▼ To View the Dynamic Multi-Pathing (DMP) Properties

1. Type:

# vxdisk list Disk_1
Device:    Disk_1
devicetag: Disk_1
type:      sliced
hostid:    diag.xxxxx.xxx.COM
disk:      name=t3dg02 id=1010283311.1163.diag.xxxxx.xxx.com
group:     name=t3dg id=1010283312.1166.diag.xxxxx.xxx.com
flags:     online ready private autoconfig nohotuse autoimport imported
pubpaths:  block=/dev/vx/dmp/Disk_1s4 char=/dev/vx/rdmp/Disk_1s4
privpaths: block=/dev/vx/dmp/Disk_1s3 char=/dev/vx/rdmp/Disk_1s3
version:   2.
2. Use the luxadm(1M) command to display further information about the underlying LUN.

# /usr/sbin/luxadm display /dev/rdsk/c20t2B000060220041F4d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c20t2B000060220041F4d0s2
  Status(Port A):       O.K.
  Vendor:               SUN
  Product ID:           SESS01
  WWN(Node):            2a000060220041f4
  WWN(Port A):          2b000060220041f4
  Revision:             080C
  Serial Num:           Unsupported
  Unformatted capacity: 102400.
▼ To Put the DMP-Enabled Paths Back into Production

1. Type:

# vxdmpadm enable ctlr=

2. Verify that the path has been reenabled by typing:

# vxdmpadm listctlr all
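To double-check the verification step, the controller state can be read out of the vxdmpadm listctlr output with awk. The sample below only approximates that output's column layout, which may differ between VERITAS Volume Manager releases.

```shell
#!/bin/sh
# Sketch: confirm a controller is back in the ENABLED state by parsing
# vxdmpadm-style output.  The sample approximates the layout of
# `vxdmpadm listctlr all`; column order is an assumption.

cat > /tmp/listctlr.sample <<'EOF'
CTLR-NAME       ENCLR-TYPE      STATE           ENCLR-NAME
===========================================================
c0              Disk            ENABLED         Disk
c2              T300            ENABLED         T3000
EOF

# ctlr_state name file -> prints the STATE column for that controller
ctlr_state() {
    awk -v c="$1" '$1 == c { print $3 }' "$2"
}

ctlr_state c2 /tmp/listctlr.sample
```

On a live host you would pipe `vxdmpadm listctlr all` into the same awk filter instead of reading a sample file.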
CHAPTER 3

Troubleshooting Tools

This chapter contains the following information related to tools used to troubleshoot the Sun StorEdge 3900 or 6900 series components.
■ “Storage Automated Diagnostic Environment 2.
Example Topology

In the Storage Automated Diagnostic Environment topology shown in FIGURE 3-1, the internal components of a Sun StorEdge 3910 system are shown. There is also a Solaris host (diag221) and the Storage Service Processor (diag156) in the view. What is missing is the Microsoft Windows 2000 host, which is also connected.

FIGURE 3-1 Storage Automated Diagnostic Environment Example Topology
Generating Component-Specific Event Grids

The Storage Automated Diagnostic Environment generates component-specific event grids that describe the severity of an event, tell whether action is required, provide a description of the event, and recommend action. Refer to Chapters 5 through 9 of this troubleshooting guide for component-specific event grids.

▼ To Customize an Event Report

1. Choose the Event Grid link on the Storage Automated Diagnostic Environment Help menu.
2.
Microsoft Windows 2000 System Errors

You can view Microsoft Windows 2000 errors through the Event Properties System Log. The types of errors that would indicate a Sun StorEdge T3+ Array Failover Driver issue have the Source “Jafo”. An example is shown in FIGURE 3-2. You should also look for other events such as any HBA driver-related events (qla2200, for example) or disk-related events.

FIGURE 3-2 Microsoft Windows 2000 Event Properties System Log
Command Line Test Examples To run a single Sun StorEdge diagnostic test from the command line rather than through the Storage Automated Diagnostic Environment interface, you must log in to the appropriate host or slave for testing the components. The following two tests, qlctest(1M) and switchtest(1M), are provided as examples. qlctest(1M) The qlctest(1M) test comprises several subtests that test the functions of the Sun StorEdge PCI dual Fibre Channel (FC) host adapter board.
switchtest(1M)

switchtest(1M) diagnoses the Sun StorEdge network FC switch-8 and switch-16 switch devices. The switchtest process also provides command-line access to switch diagnostics. switchtest supports testing on local and remote switches. switchtest runs the port diagnostic on connected switch ports. While switchtest is running, it monitors the port statistics and checks the chassis status.

CODE EXAMPLE 3-2 switchtest(1M)

# /opt/SUNWstade/Diags/bin/switchtest -v -o "dev=2:192.168.0.
Monitoring Sun StorEdge T3 and T3+ Arrays Using the Explorer Data Collection Utility The Explorer Data Collection Utility script is included on the Storage Service Processor in the /export/packages directory. The Explorer Data Collection Utility is not installed by default, but can be installed during rack setup. Customer-specific site information can be entered at that time. To find out more about the Explorer Data Collection Utility, you can access the web site with the following URL: http://webhome.
3. Before running the Explorer Data Collection Utility, make sure that the switch and Sun StorEdge T3+ array information is added to the proper /opt/SUNWexplo/etc files.

Example

Type switch information in the /opt/SUNWexplo/etc/saninput.txt file. Edit the file and add the switch information, as shown in CODE EXAMPLE 3-3.

CODE EXAMPLE 3-3 Editing Switch Information Using vi

# vi saninput.
■ You can now run /opt/SUNWexplo/bin/explorer for information about the Storage Service Processor operating system, the Sun StorEdge network FC switch-8 or switch-16 switch, and the Sun StorEdge T3+ array that you can use for troubleshooting purposes.
■ A compressed tar (tar/gzip) file is placed in the /opt/SUNWexplo/output directory. You can send the tar/gzip file to the Sun Solution Center for evaluation.
Monitoring Host Bus Adapters (HBAs) Using QLogic SANblade Manager

The most effective way to retrieve HBA status and information is by using the HBA manufacturer’s utility, such as the Qlogic SANblade Manager software provided by Qlogic for their HBAs. This software is freely downloadable from Qlogic’s website (http://www.qlogic.com).

Note – Other manufacturers’ utilities, such as Emulex’s LightPulse, are needed for other HBAs, such as Emulex HBAs.
FIGURE 3-3 Qlogic SANblade Manager HBA Driver and Firmware Versions
QLogic SANblade Manager is also useful for viewing a primitive topology and a LUN listing.

FIGURE 3-4 QLogic SANblade Manager Diagnostics

Note – Different HBA manufacturers may bundle different features with their tools. The information in this guide is written with the assumption of Qlogic software usage.
CHAPTER 4 Troubleshooting Ethernet Hubs The Sun StorEdge 3900 and 6900 series uses an Ethernet hub as the backbone for the internal service network.
CHAPTER 5

Troubleshooting the Fibre Channel (FC) Links

This chapter describes how to troubleshoot the FC links that connect the Sun StorEdge network FC components in a SAN or a direct attached storage (DAS) environment. linktest(1M), which tests the health of the FC links, is available only from the Test from Topology view of the Storage Automated Diagnostic Environment GUI.

Note – linktest tests both ends of the link segment and enters a guided isolation when a fault is detected.
FC Links

The following sections provide troubleshooting information for the basic components and FC links, listed in TABLE 5-1.

TABLE 5-1 FC Links

Link       Provides FC Link Between These Components
A1 to B1   Data host, sw1a, and sw1b
A2         sw1a and v1a*
B2         sw1b and v1b*
A3         v1a and sw2a*
B3         v1b and sw2b*
A4         Master Sun StorEdge T3+ array and the “A” path switch
B4         Alternate master Sun StorEdge T3+ array and the “B” path switch
T1 to T2   sw2a and sw2b*

* Sun StorEdge 6900 series only
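TABLE 5-1 can also be kept at hand as a small lookup helper. This sketch simply restates the table in shell form; the wording of each description is paraphrased rather than quoted from the table.

```shell
#!/bin/sh
# TABLE 5-1 expressed as a lookup helper: given an FC link name, print
# the components it connects.  Descriptions are paraphrased from the
# table; links marked "6900 series only" do not exist on a 3900.

link_endpoints() {
    case "$1" in
        A1|B1) echo "data host HBA and first-tier switch (sw1a/sw1b)" ;;
        A2)    echo "sw1a and v1a (6900 series only)" ;;
        B2)    echo "sw1b and v1b (6900 series only)" ;;
        A3)    echo "v1a and sw2a (6900 series only)" ;;
        B3)    echo "v1b and sw2b (6900 series only)" ;;
        A4)    echo "master T3+ array and the A path switch" ;;
        B4)    echo "alternate master T3+ array and the B path switch" ;;
        T1|T2) echo "sw2a and sw2b (6900 series only)" ;;
        *)     echo "unknown link: $1" >&2; return 1 ;;
    esac
}

link_endpoints A3
```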
FC Link Diagrams

FIGURE 5-1 shows the basic components and the FC links for a Sun StorEdge 3900 series system:
■ A1 to B1—HBA to Sun StorEdge network FC switch-8 and switch-16 switch link
■ A4 to B4—Sun StorEdge network FC switch-8 and switch-16 switch to Sun StorEdge T3+ array link

FIGURE 5-1 Sun StorEdge 3900 Series FC Link Diagram (host HBA-A and HBA-B connect over A1 and B1 to sw1a and sw1b, which connect over A4 and B4 to the master and alternate master T3+ arrays)
TABLE 5-2 and FIGURE 5-2 show the basic components and the FC links for a Sun StorEdge 6900 series system:

TABLE 5-2 Ax to Bx FC Links
FIGURE 5-2 Sun StorEdge 6900 Series FC Link Diagram (host HBA-A and HBA-B connect over A1 and B1 to sw1a and sw1b, over A2 and B2 to v1a and v1b, over A3 and B3 to sw2a and sw2b, which are joined by T1 and T2 and connect over A4 and B4 to the master and alternate master T3+ arrays)
Troubleshooting the A1 or B1 FC Link

The A1 or B1 link is the FC link from the HBA to the switch. What happens when an FC link fails depends on the system. If a problem occurs with the A1 or B1 FC link:
■ In a Sun StorEdge 3900 series system, the Sun StorEdge T3+ array will fail over.
■ In a Sun StorEdge 6900 series system, no Sun StorEdge T3+ array will fail over, but an error with the FC link can cause a path to go offline.
FIGURE 5-3, FIGURE 5-4, and FIGURE 5-5 are examples of A1 or B1 link notification events. Site : Source : Severity : Category : EventType: EventTime: FSDE LAB Broomfield CO diag.xxxxx.xxx.com Normal Message Key: message:diag.xxxxx.xxx.com LogEvent.driver.LOOP_OFFLINE 01/08/2002 14:34:45 Found 1 ’driver.LOOP_OFFLINE’ error(s) in logfile: /var/adm/messages on diag.xxxxx.xxx.
Site : Source : Severity : Category : EventType: EventTime: FSDE LAB Broomfield CO diag.xxxxx.xxx.com Normal Switch Key: switch:100000c0dd0057bd StateChangeEvent.X.port.6 01/08/2002 14:54:20 ’port.6’ in SWITCH diag-sw1a (ip=192.168.0.30) is now Unknown (status-state changed from ’Online’ to ’Admin’):

FIGURE 5-5 Storage Service Processor Notification

Note – An A1 or B1 FC link error can cause a port in sw1a or sw1b to change state.
Verifying the Data Host

The following example shows an error in the A1 or B1 FC link, which can cause a path to go offline in the multipathing software.

CODE EXAMPLE 5-1 luxadm(1M) Display

# /usr/sbin/luxadm display /dev/rdsk/c6t29000060220041F96257354230303052d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c6t29000060220041F96257354230303052d0s2
  Status(Port A): O.K.
  Status(Port B): O.K.
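A quick way to spot the offline condition is to scan luxadm display output for port statuses other than O.K. The sample report below is an abbreviated, hypothetical rendering of that output, not verbatim luxadm text.

```shell
#!/bin/sh
# Sketch: count ports whose status is not "O.K." in luxadm-style
# output.  The sample is abbreviated and hypothetical; a real
# `luxadm display` report contains many more fields.

cat > /tmp/luxadm.sample <<'EOF'
DEVICE PROPERTIES for disk: /dev/rdsk/c6t29000060220041F96257354230303052d0s2
  Status(Port A):  O.K.
  Status(Port B):  Offline
EOF

# Split each "Status(Port X):  value" line on the colon and compare
# the value field against "O.K.".
bad=$(awk -F': *' '/Status\(Port/ && $2 != "O.K." { n++ } END { print n+0 }' /tmp/luxadm.sample)
echo "ports not O.K.: $bad"
```

A nonzero count here would be the cue to move on to the FRU isolation steps for this link segment.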
An error in the A1 or B1 FC link can also cause a device to enter the “unusable” state in cfgadm -al, as shown in CODE EXAMPLE 5-2.
CODE EXAMPLE 5-3 switchtest(1M) Called With Options

# /opt/SUNWstade/Diags/bin/switchtest -v -o "dev=2:192.168.0.30:0"
"switchtest: called with options: dev=2:192.168.0.30:0"
"switchtest: Started."
"Testing port: 2"
"Using ip_addr: 192.168.0.30, fcaddr: 0x0 to access this port."
"Chassis Status for Device: Switch  Power: OK  Temp: OK 23.0c  Fan 1: OK  Fan 2: OK "
02/06/02 15:09:45 diag Storage Automated Diagnostic Environment MSGID 4001 switchtest.
▼ To Isolate the A1 or B1 FC Link To isolate the A1 or B1 link, which is the FC link from the HBA to the switch, follow these steps: 1. Quiesce the I/O on the A1 or B1 FC link path. 2. Run switchtest(1M) or qlctest(1M) to test the entire link. 3. Break the connection by uncabling the link. 4. Insert a loopback connector into the switch port. 5. Rerun switchtest. a. If switchtest fails, replace the gigabit interface converter (GBIC) and rerun switchtest. b. If switchtest fails again, replace the switch. 6.
Troubleshooting the A2 or B2 FC Link The A2 or B2 link is the FC link from the first switch to the virtualization engine. This link exists in the Sun StorEdge 6900 Series only. An error with the FC link can cause a path to go offline. FIGURE 5-6 and FIGURE 5-7 are examples of A2 or B2 Link Notification Events. From root Tue Jan 8 18:39:48 2002 Date: Tue, 8 Jan 2002 18:39:47 -0700 (MST) Message-Id: <200201090139.g091dlg07015@diag.xxxxx.xxx.com> From: Storage Automated Diagnostic Environment.
Site : Source : Severity : Category : EventType: EventTime: FSDE LAB Broomfield CO diag.xxxxx.xxx.com Normal Switch Key: switch:100000c0dd0061bb StateChangeEvent.X.port.1 01/08/2002 17:38:32 ’port.1’ in SWITCH diag-sw1b (ip=192.168.0.31) is now Unknown (statusstate changed from ’Online’ to ’Admin’): ---------------------------------------------------------------Site : Source : Severity : Category : EventType: EventTime: FSDE LAB Broomfield CO diag.xxxxx.xxx.
Verifying the Data Host

An error in the A2 or B2 FC link can result in a device being listed in an “unusable” state in cfgadm, but no HBAs being listed in the “unconnected” state in the luxadm output. The multipathing software will note an offline path, as shown in CODE EXAMPLE 5-4.

CODE EXAMPLE 5-4 cfgadm -al

# /usr/sbin/cfgadm -al
Ap_Id  Type      Receptacle  Occupant    Condition
c0     scsi-bus  connected   configured  unknown
...
Verifying the A2 or B2 FC Link You can check the A2 or B2 FC link using the Storage Automated Diagnostic Environment, Diagnose—Test from Topology functionality. The Storage Automated Diagnostic Environment’s implementation of diagnostic tests verifies the operation of user-selected components. Using the Topology view, you can select specific tests, subtests, and test options. FRU Tests Available for the A2 or B2 FC Link Segment ▼ ■ The linktest is not available.
5. If the switch and the GBIC show no errors, replace the remaining components in the following order: a. Replace the virtualization engine-side GBIC, recable the link, and monitor the link for errors. b. Replace the cable, recable the link, and monitor the link for errors. c. Replace the virtualization engine, restore the virtualization engine settings, recable the link, and monitor the link for errors.
Troubleshooting the A3 or B3 FC Link The A3 or B3 link is the FC link from the virtualization engine to the backend switch. The A3 or B3 FC link exists in a Sun StorEdge 6900 Series only. An error with the FC link can cause a path to go offline. FIGURE 5-8, FIGURE 5-9, and FIGURE 5-10 are examples of A3 or B3 link notification events. Site : Source : Severity : Category : EventType: EventTime: FSDE LAB Broomfield CO diag.xxxxx.xxx.com Normal Message Key: message:diag.xxxxx.xxx.com LogEvent.driver.
Site : Source : Severity : Category : EventType: EventTime: FSDE LAB Broomfield CO diag.xxxxx.xxx.com Normal Switch Key: switch:100000c0dd0057bd StateChangeEvent.M.port.1 01/08/2002 18:28:38 ’port.1’ in SWITCH diag-sw1a (ip=192.168.0.30) is now Not-Available (status-state changed from ’Online’ to ’Offline’): Info: A port on the switch has logged out of the fabric and gone offline Action: 1. Verify cables, GBICs and connections along FC path 2.
Verifying the Data Host An error in the A3 or B3 FC link results in a device being listed as in an “unusable” state in cfgadm, but no HBAs are listed as in the “unconnected” state in luxadm output. The multipathing software will note an offline path.
CODE EXAMPLE 5-6 DMP Error Message

Jul 8 18:26:38 diag.xxxxx.xxx.com vxdmp: [ID 619769 kern.notice] NOTICE: dmp: Path failure on 118/0x1f8
Jul 8 18:26:38 diag.xxxxx.xxx.com vxdmp: [ID 997040 kern.notice] NOTICE: vxvm:vxdmp: disabled path 118/0x1f8 belonging to the dmpnode 231/0xd0

Verifying the Storage Service Processor-Side

You can check the A3 or B3 FC link using the Storage Automated Diagnostic Environment’s Test from Topology functionality.
▼ To Isolate the A3 or B3 FC Link

To isolate the A3 or B3 link, which is the FC link from the virtualization engine to the back-end switch, follow these steps:

Note – The A3 or B3 FC link exists in a Sun StorEdge 6900 series only.

1. Quiesce the I/O on the A3 or B3 FC link path (refer to “Quiescing the I/O on the A3 or B3 Link” on page 59).
2. Break the connection by uncabling the link.
3. Insert the loopback connector into the switch port.
4. Run switchtest:
a.
Quiescing the I/O on the A3 or B3 Link 1. Determine the path you want to disable. 2. Disable the path by typing the following: # /usr/bin/vxdmpadm disable ctlr= 3. Verify that the path is disabled: # /usr/bin/vxdmpadm listctlr all Steps 1 and 2 halt I/O only up to the A3 to B3 link. I/O continues to move over the T1 and T2 paths, as well as the A4 to B4 links to the Sun StorEdge T3+ array.
Troubleshooting the A4 or B4 FC Link The A4 or B4 link is the FC link from the switch to the Sun StorEdge T3+ array. If a problem occurs with the A4 or B4 FC link: ■ In a Sun StorEdge 3900 series system, the Sun StorEdge T3+ array will fail over. ■ In a Sun StorEdge 6900 series system, no Sun StorEdge T3+ array will fail over, but an error with the FC link can cause a path to go offline. FIGURE 5-11 and FIGURE 5-12 are examples of A4 or B4 Link Notification Events.
Site : Source : Severity : Category : DeviceId : EventType: EventTime: FSDE LAB Broomfield CO diag Warning Switch switch:100000c0dd0061bb LogEvent.MessageLog 01/29/2002 14:25:05 Change in Port Statistics on switch diag-sw1b (ip=192.168.0.31): Port-1: Received 16289 ’InvalidTxWds’ in 0 mins (value=365972 ) ---------------------------------------------------------------------Site : FSDE LAB Broomfield CO Source : diag Severity : Warning Category : T3message DeviceId : t3message:83060c0c EventType: LogEvent.
Verifying the Data Host

A problem in the A4 or B4 FC link appears differently on the data host, depending on whether the array is a Sun StorEdge 3900 series or a Sun StorEdge 6900 series device.

Sun StorEdge 3900 Series

In a Sun StorEdge 3900 series device, the data host multipathing software is responsible for initiating the failover, and reports it in /var/adm/messages with messages like those reported by the Storage Automated Diagnostic Environment email notifications.
To verify the failover, you can use the luxadm display command; the failed path is marked "offline," as shown in CODE EXAMPLE 5-7.

CODE EXAMPLE 5-7 Failed Path Marked Offline

# /usr/sbin/luxadm display /dev/rdsk/c26t60020F200000644>
DEVICE PROPERTIES for disk: /dev/rdsk/c26t60020F20000064433C3352A60003E82Fd0s2
Status(Port A): O.K.
Status(Port B): O.K.
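A saved luxadm display listing can be scanned for the offline marker mechanically. A small sketch; the two status lines below are illustrative rather than a full capture.

```shell
# Count paths that luxadm display reports as offline
OUT=/tmp/luxadm_display.out
cat > "$OUT" <<'EOF'
Status(Port A):    O.K.
Status(Port B):    offline
EOF

offline=$(grep -ci 'offline' "$OUT")
echo "offline paths: $offline"
```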
CODE EXAMPLE 5-8 Failed Path Marked Unusable

# cfgadm -al
Ap_Id                    Type        Receptacle  Occupant      Condition
ac0:bank0                memory      connected   configured    ok
ac0:bank1                memory      empty       unconfigured  unknown
c1                       scsi-bus    connected   configured    unknown
c16                      scsi-bus    connected   unconfigured  unknown
c18                      scsi-bus    connected   unconfigured  unknown
c19                      scsi-bus    connected   unconfigured  unknown
c1::dsk/c1t6d0           CD-ROM      connected   configured    unknown
c20                      fc-private  connected   unconfigured  unknown
c21                      fc-fabric   connected   configure
c21::50020f2300006355
5. Rerun switchtest.
a. If switchtest fails, replace the GBIC and rerun switchtest.
b. If the test fails again, replace the switch.
6. If switchtest passes, assume that the suspect components are the cable and the Sun StorEdge T3+ array controller.
a. Replace the cable.
b. Rerun switchtest.
7. If the test fails again, replace the Sun StorEdge T3+ array controller.
8. Return the path to production.
9. Return the Sun StorEdge T3+ array LUNs to the correct controllers, if a failover occurred.
66 Sun StorEdge 3900 and 6900 Series 2.
CHAPTER 6 Troubleshooting Host Devices This chapter describes how to troubleshoot components associated with a Sun StorEdge 3900 or 6900 series host.
FIGURE 6-1 Sample Host Event Grid
TABLE 6-1 lists all the host events in the Storage Automated Diagnostic Environment.

TABLE 6-1 Storage Automated Diagnostic Environment Event Grid for the Host

Component: hba
Action Required: Y
Description: The status of hba /devices/sbus@9,0/SUNW,qlc@0,30000/fp@0,0:devctl on diag.xxxxx.xxx.com changed from not connected to connected.
Information: Monitors changes in the output of luxadm -e port.

Component: hba
Description: The status of hba /devices/sbus@9,0/SUNW,qlc@0,30000/fp@0,0:devctl on diag.
TABLE 6-1 Storage Automated Diagnostic Environment Event Grid for the Host (Continued)

Component: ifptest
Event Type: Diagnostic Test-
Severity: Red
Action Required: Y
Description: ifptest (diag240) on the host failed. Check Test Manager for failure details.

Component: qlctest
Event Type: Diagnostic Test-
Severity: Red
Description: qlctest (diag240) on the host failed. Check Test Manager for failure details.

Component: socaltest
Event Type: Diagnostic Test-
Severity: Red
Description: socaltest (diag240) on the host failed. Check Test Manager for failure details.

Component: enclosure
Replacing the Master, Alternate Master, and Slave Monitoring Host The following procedures are a high-level overview of the procedures that are detailed in the Storage Automated Diagnostic Environment User’s Guide. Follow these procedures when replacing a master, alternate master, or slave monitoring host. Note – The procedures for replacing the master host are different from the procedures for replacing an alternate master or slave monitoring host.
7. Choose Maintenance -> General Maintenance -> Start/Stop Agent to start the agent on the master host. ▼ To Replace the Alternate Master or Slave Monitoring Host 1. Choose Maintenance -> General Maintenance -> Maintain Hosts. Refer to the maintenance section in Chapter 3 of the Storage Automated Diagnostic Environment User’s Guide. 2. In the Maintain Hosts window, from the Existing Hosts list, select the host to be replaced and click Delete. 3. Install the new host.
CHAPTER 7 Troubleshooting Switches This chapter describes how to troubleshoot the 1 Gbit and 2 Gbit switch components associated with a Sun StorEdge 3900 or 6900 series system.
The Sun StorEdge network FC switches in a Sun StorEdge 3900 or 6900 configuration now support the Sun StorEdge SAN 4.1 Release. You can upgrade the switches to support the 402xx 2 Gbit-compatible firmware. Caution – Use caution when upgrading back-end switches to the 2 Gbit-compatible firmware. Use only the setswitchflash command, which performs the upgrade and creates the zone configuration in a controlled manner (refer to the Sun StorEdge 3900 and 6900 Series 2.
Switchless Configurations

In a switchless configuration (Sun StorEdge 3900SL, 6910SL, or 6960SL series system), you can upgrade the switches that are connected to the Solaris server to the Sun StorEdge SAN 4.1 Release firmware. For a list of the supported switches, visit the http://www.sun.com web site. Direct attachment to the Sun StorEdge 3900 and 6900 series arrays with 1 Gbit or 2 Gbit HBAs requires no changes.
4. To restore the configuration from the saved map file back to the default switch configuration, type: # restoreswitch -s switch For detailed diagnostic and troubleshooting procedures for the Sun StorEdge network FC switch-8 and switch-16 switch hardware, refer to the Sun StorEdge SAN 4.1 Release Field Troubleshooting Guide. This document covers the Sun StorEdge network FC switch-8 and switch-16 switch and the interconnections (HBA, GBIC, and cables) on either side of the switch. The Sun StorEdge SAN 4.
Using the Switch Event Grid The Storage Automated Diagnostic Environment Switch Event Grid enables you to sort switch events by component, category, or event type. The Storage Automated Diagnostic Environment GUI displays an event grid that describes the severity of the event, tells whether action is required, provides a description of the event, and gives the recommended action. Refer to the Storage Automated Diagnostic Environment User’s Guide for more information. ▼ To Use the Switch Event Grid 1.
TABLE 7-1 lists the switch events for Sun StorEdge network FC switch-8 and switch-16 1 Gbit switches.

TABLE 7-1 Storage Automated Diagnostic Environment Event Grid for 1 Gbit Switches
Note: Text within quotation marks (" ") is exactly as it appears on the Event Grid.

Component: port
Severity: Yellow
Action Required: Y
Description: "Change in port statistics on switch diag156-sw1b (ip=192.168.0.31)"
Information: The switch has reported a change in an error counter. This could indicate a failing component in the link.
TABLE 7-1 Storage Automated Diagnostic Environment Event Grid for 1 Gbit Switches (Continued)
Note: Text within quotation marks (" ") is exactly as it appears on the Event Grid.

Component: chassis.

Component: zone
Severity: Yellow
Description: "Switch sw1a was rezoned"
Information: This event reports changes in the zoning of a switch.

Component: enclosure
Event Type: Audit
Description: "Auditing a new switch called ras d2-swb1 (ip=xxx.0.0.
Component: enclosure
Event Type: Discovery
Description: "Discovered a new switch called ras d2-swb1 (ip=xxx.0.0.41) 10002000007a609"
Information: Discovery events occur the very first time the agent probes a storage device. The agent creates a detailed description of the device monitored and sends it using any active notifier, such as the Sun™ Remote Services (SRS) Net Connect service or email.

Component: enclosure
Event Type: Location Change
Description: "Location of switch rasd2swb0 (ip xxx.0.0.40) was changed"
TABLE 7-1 Storage Automated Diagnostic Environment Event Grid for 1 Gbit Switches (Continued)
Note: Text within quotation marks (" ") is exactly as it appears on the Event Grid.

Component: port
Event Type: State Change+
Description: "port.1 in SWITCH diag185 (ip=xxx.20.67.185) is now Available (status-state changed from offline to online)"
Information: The port on the switch is now available.

Component: port
Event Type: State Change-
Severity: Red
Action Required: Y
Description: "port.1 in SWITCH diag185 (ip=xxx.20.67.
TABLE 7-2 lists the switch events for Sun StorEdge network FC switch-8 and switch-16 2 Gbit switches.

TABLE 7-2 Storage Automated Diagnostic Environment Event Grid for 2 Gbit Switches
Note: Text within quotation marks (" ") is exactly as it appears on the Event Grid.

Component: chassis.fan
Event Type: Alarm-
Severity: Yellow
Action Required: Y
Description: "chassis.fan.1 status changed from OK"
Action: None.

Component: chassis.
TABLE 7-2 Storage Automated Diagnostic Environment Event Grid for 2 Gbit Switches (Continued)
Note: Text within quotation marks (" ") is exactly as it appears on the Event Grid.

Component: oob
Event Type: Comm_Lost
Severity: Down
Action Required: Y
Description: "Lost communication with sw1a (ip=xxx.20.67.213)"
Information: Ethernet connectivity to the switch has been lost.
Action: 1. Check Ethernet connectivity to the switch. 2.

Component: switch2test
Event Type: Diagnostic Test-
Severity: Red

Component: enclosure
Event Type: Discovery
TABLE 7-2 Storage Automated Diagnostic Environment Event Grid for 2 Gbit Switches (Continued)
Note: Text within quotation marks (" ") is exactly as it appears on the Event Grid.

Component: port
Event Type: State Change+
Description: "port.1 in SWITCH diag185 (ip=xxx.20.67.185) is now Available (status-state changed from offline to online)"
Information: The port on the switch is now available.
setupswitch Exit Values

TABLE 0-1 lists the setupswitch exit values. The associated messages are logged in the /var/adm/log/SEcfglog file.

TABLE 0-1 setupswitch Exit Values

Exit Value  Message Type  Meaning
0           INFO          All switch settings are properly set. The switch setting matches the default configuration.
1           ERROR         Errors occurred while trying to set the proper switch settings. The switch setting does not match the default configuration or any valid alternatives.
Note – If multiple systems are connected to a switch, the switch settings might not match the default settings.
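A script that calls setupswitch can branch on the exit values in TABLE 0-1. A sketch; the function name is illustrative and only the two documented values are mapped.

```shell
# Translate a setupswitch exit value into its logged meaning
explain_setupswitch() {
    case "$1" in
        0) echo "INFO: switch settings match the default configuration" ;;
        1) echo "ERROR: settings do not match the default configuration or any valid alternative" ;;
        *) echo "UNKNOWN: exit value $1; review /var/adm/log/SEcfglog" ;;
    esac
}

explain_setupswitch 0
explain_setupswitch 1
```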
CHAPTER 8 Troubleshooting the Sun StorEdge T3+ Array Devices The Sun StorEdge T3+ array is a high-performance, modular, scalable storage device that contains an internal RAID controller and disk drives with FC connectivity to the data host. In the Sun StorEdge 3900 and 6900 series, the Sun StorEdge T3+ array is used as a building block, configured in various ways to provide a storage solution optimized to the host application.
Troubleshooting the T1 or T2 Data Path

When you are troubleshooting the T1 or T2 data path, note the following:

■ Two T port links provide redundancy. If one of the two links is lost, no Sun StorEdge T3+ array LUN failover occurs and no pathing failures are detected.
■ If both T port links fail, a Sun StorEdge T3+ array LUN failover occurs, as one of the virtualization engines takes control of the I/O operations.
Notification Events

FIGURE 8-1 shows a typical port failure event.

Site     : Lab 3286 - DSQA1 Broomfield
Source   : diag.xxxxx.xxx.com
Severity : Error (Actionable)
Category : Switch
DeviceId : switch:100000c0dd00b682
EventType: StateChangeEvent.M.port.8
EventTime: 01/30/2002 11:17:22
'port.8' in SWITCH diag209-sw2a (ip=192.168.0.32) is now Not-Available (status-state changed from 'Online' to 'Offline'):
INFORMATION: A port on the switch has logged out of the fabric and gone offline
PROBABLE-CAUSE: 1.
If both T ports go offline, you might see a message like the following. The virtualization engine event alerts you to the LUN failover.

Site     : Lab 3286 - DSQA1 Broomfield
Source   : diag.xxxxx.xxx.com
Severity : Warning (Actionable)
Category : Ve
DeviceId : ve:6257335A-30303142
EventType: AlarmEvent.
...continued from previous page...
----------------------------------------------------------------------
Site     : Lab 3286 - DSQA1 Broomfield
Source   : diag.xxxxx.xxx.com
Severity : Warning
Category : Message
DeviceId : message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.Fabric_Warning
EventTime: 01/30/2002 11:50:07
Found 1 'driver.Fabric_Warning' warning(s) in logfile: /var/adm/messages on diag.xxxxx.xxx.com (id=809f76b4):
INFORMATION: Fabric warning
Jan 30 11:46:37 WWN:2b00006022004186 diag.xxxxx.xxx.
▼ To Verify the Storage Service Processor

1. Run the Sun StorEdge T3+ array port listmap command to see the failover event.

t3b0:/:<1>port listmap
port  targetid  addr_type  lun  volume  owner  access
u1p1  0         hard       0    vol1    u1     primary
u1p1  0         hard       1    vol2    u1     failover
u2p1  1         hard       0    vol1    u1     failover
u2p1  1         hard       1    vol2    u1     primary

2. Compare the virtualization engine configuration to a saved configuration by running runsecfg(1M).
3. Choose Verify Virtualization Engine Map.
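The port listmap output in step 1 can be summarized mechanically to show which port currently holds primary access to each volume. A sketch against a captured listing; the column layout is taken from the example above.

```shell
# Report the primary access path for each volume in a saved
# 'port listmap' capture (columns: port targetid addr_type lun
# volume owner access)
MAP=/tmp/listmap.out
cat > "$MAP" <<'EOF'
port targetid addr_type lun volume owner access
u1p1 0 hard 0 vol1 u1 primary
u1p1 0 hard 1 vol2 u1 failover
u2p1 1 hard 0 vol1 u1 failover
u2p1 1 hard 1 vol2 u1 primary
EOF

awk 'NR > 1 && $7 == "primary" { print $5, "has primary access on", $1 }' "$MAP"
```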
FRU Tests Available for the T1 or T2 Data Path

Running the tests from the Storage Automated Diagnostic Environment GUI guides you in discovering the failed FRU. Refer to Chapter 5 of the Storage Automated Diagnostic Environment User's Guide for instructions on how to run tests.

■ Run switchtest to test the switches.
■ Run linktest to test the T1 or T2 connections.

After a test has completed its run, an email message similar to the message in FIGURE 8-4 is sent to the specified email recipient.
■ When you insert a loopback connector into the T port, no green light appears to indicate a proper insertion. However, the test will run and be valid.
■ If only one of the links has failed and the I/O is traveling over the remaining link, I/O is automatically routed over the repaired link by the switch after the failed link is replaced and recabled. No manual intervention is required.
Sun StorEdge T3+ Array Event Grid The Storage Automated Diagnostic Environment Event Grid enables you to sort Sun StorEdge T3+ array events by component, category, or event type. The Storage Automated Diagnostic Environment GUI displays an event grid that describes an event and its severity, and tells what, if any, action should be taken. Refer to the Storage Automated Diagnostic Environment User’s Guide for more information. ▼ To Use the Sun StorEdge T3+ Array Event Grid 1.
TABLE 8-1 lists all of the events for the Sun StorEdge T3+ array.

TABLE 8-1 Storage Automated Diagnostic Environment Event Grid for the Sun StorEdge T3+ Array

Component: sysvolslice
Event Type: Alarm
Severity: Yellow
Action Required: Y
Description: The vol slice feature is available in Sun StorEdge T3+ array firmware version 2.1 and above. This option enables volume slicing, up to 16 LUNs per single Sun StorEdge T3+ array or partner group. This feature also enables the LUN masking (HBA zoning) features. This option is disabled by default.
Action Required: Y
Description: The Sun StorEdge T3+ array has reported that a loopcard is in a failed state.

Possible Drive Status Messages:
Value  Description
0      Drive mounted
2      Drive present
3      Drive is spun up
4      Drive is disabled
5      Drive has been replaced
7      Invalid system area on drive
9      Drive not present
D      Drive disabled; drive is being reconstructed
S      Drive substituted

Component: power.battery
Event Type: Alarm-
Severity: Red
Action Required: Y
Description: The state of the batteries in the Sun StorEdge T3+ array is not optimal.
Component: power.fan
Event Type: Alarm-
Severity: Red
Action Required: Y
Description: The state of a fan on the Sun StorEdge T3+ array is not optimal.
Action:
1. Open a Telnet session to the affected Sun StorEdge T3+ array.
2. Verify the fan state with fru stat.
3. Replace the power cooling unit, if necessary.

Component: power.output
Event Type: Alarm-
Severity: Red
Action Required: Y
Description: The state of the power in the Sun StorEdge T3+ array power cooling unit is not optimal.
Action:
1. Open a Telnet session to the affected Sun StorEdge T3+ array.
2. Verify power cooling unit state in fru stat.
3.
TABLE 8-1 Storage Automated Diagnostic Environment Event Grid for the Sun StorEdge T3+ Array (Continued)

Component: enclosure
Event Type: Audit
Description: Auditing a new Sun StorEdge T3+ array. Audits occur every week. The Storage Automated Diagnostic Environment sends a detailed description of the enclosure to the Sun Network Storage Command Center (NSCC).
Component: oob
Event Type: Comm_Lost
Severity: Down
Action Required: Y
Description: OutOfBand (oob) means that the Sun StorEdge T3+ array failed to answer a ping or failed to return its tokens. This OutOfBand problem can be caused by a very slow network, or because the Ethernet connection to this Sun StorEdge T3+ array was lost.
Action:
1. Check the Ethernet connectivity to the affected Sun StorEdge T3+ array.
2.
Component: enclosure
Event Type: Discovery
Description: The Storage Automated Diagnostic Environment discovered a new Sun StorEdge T3+ array. Discovery events occur the first time the Storage Automated Diagnostic Environment probes a storage device.
Component: enclosure
Event Type: QuiesceStart

Component: controller
Event Type: Topology-
Severity: Red
Action Required: Y
Description: The Sun StorEdge T3+ array has reported that a controller was removed from the chassis. Replace the controller within the 30-minute power shutdown timeframe.

Component: disk
Event Type: Topology-
Severity: Red
Action Required: Y
Description: The Sun StorEdge T3+ array has reported that a disk has been removed from the chassis.
Component: power
Event Type: State Change+
Description: The status of the PCU has changed from ready-disable to ready-enable.

Component: controller
Event Type: State Change+

Component: disk
Event Type: State Change+

Component: interface.loopcard
Event Type: State Change+

Component: volume
Event Type: State Change+
Description: The Sun StorEdge T3+ array has reported that a LUN has changed state.

Component: controller
Event Type: State Change-
Component: disk
Event Type: State Change-
Severity: Red
Action Required: Y
Description: The Sun StorEdge T3+ array has reported that a disk has failed.
Action:
1. Open a Telnet session to the affected Sun StorEdge T3+ array.
2. Verify the disk state with vol_stat, fru_stat, and fru_list.
Component: volume
Event Type: State Change-
Severity: Red
Action Required: Y
Action:
1. Open a Telnet session to the affected Sun StorEdge T3+ array.
2. Verify the status of the LUNs with vol_mode or vol_stat.
CHAPTER 9 Troubleshooting Virtualization Engine Devices This chapter describes how to troubleshoot the virtualization engine component of a Sun StorEdge 6900 series system.
Virtualization Engine Diagnostics The virtualization engine monitors the following components: ■ ■ ■ Virtualization engine router Sun StorEdge T3+ array Cabling between the router and the storage Service Request Numbers (SRNs) SRNs are used to inform the user of storage subsystem activities. Service and Diagnostic Codes The virtualization engine’s service and diagnostic codes inform the user of subsystem activities. The codes are presented as a light-emitting diode (LED) readout.
Error Log Analysis Commands ▼ To Display the Log Files and Retrieve SRNs ● Type # /opt/svengine/sduc/sreadlog Errors that need action are returned in the following format: TimeStamp:nnn:Txxxxx.uuuuuuuu SRN=mmmmm TimeStamp:nnn:Txxxxx.uuuuuuuu SRN=mmmmm TimeStamp:nnn:Txxxxx.uuuuuuuu SRN=mmmmm A description of the errors follows.
Example # /opt/svengine/sduc/sreadlog -d v1 2002:Jan:3:10:13:05:v1.29000060-220041F9.SRN=70030 2002:Jan:3:10:13:31:v1.29000060-220041F9.SRN=70030 2002:Jan:3:10:17:10:v1.29000060-220041F9.SRN=70030 2002:Jan:3:10:17:37:v1.29000060-220041F9.SRN=70030 2002:Jan:3:10:22:26:v1.29000060-220041F9.SRN=70030 2002:Jan:3:10:25:54:v1.29000060-220041F9.SRN=70030 ▼ To Clear the Log ● Type # /opt/svengine/sduc/sclrlog Virtualization Engine LEDs TABLE 9-1 describes the LEDs on the back of the virtualization engine.
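The sreadlog output above repeats the same SRN many times, so it helps to summarize a captured log by SRN before looking the codes up. A sketch; the log line format is taken from the example output.

```shell
# Tally SRN occurrences in a saved sreadlog capture
LOG=/tmp/sreadlog.out
cat > "$LOG" <<'EOF'
2002:Jan:3:10:13:05:v1.29000060-220041F9.SRN=70030
2002:Jan:3:10:13:31:v1.29000060-220041F9.SRN=70030
2002:Jan:3:10:17:10:v1.29000060-220041F9.SRN=70030
EOF

awk -F'SRN=' '{ count[$2]++ }
    END { for (s in count) print "SRN", s, "logged", count[s], "times" }' "$LOG"
```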
Power LED Codes

The virtualization engine LEDs are shown in FIGURE 9-1.

FIGURE 9-1 Virtualization Engine Front Panel LEDs (Status, Power, and Fault)

Interpreting LED Service and Diagnostic Codes

The Status LED communicates the status of the virtualization engine in decimal numbers. Each decimal number is represented by a number of blinks, followed by a medium-duration pause (two seconds) with no LED display. TABLE 9-2 lists the status LED code descriptions.
Back Panel Features The back panel of the virtualization engine contains the Sun StorEdge network FC switch-8 or switch-16 switches, a socket for the AC power input, and various data ports and LEDs.
FC Link Error Status Report

The virtualization engine's host-side and device-side interfaces provide statistical data for the counts listed in TABLE 9-4.

TABLE 9-4 Virtualization Engine Statistical Data

Count Type: Link failure count
Description: The number of times the virtualization engine's frame manager detects a nonoperational state or other failure of N port initialization protocol.
Note – If t3ofdg(1M) is running while you perform these steps, the following error message is displayed:

Daemon error: check the SLIC router.
Note – The Serial Loop IntraConnect (SLIC) daemon must be running for the svstat(1M) -d v1 command to work. Translating Host-Device Names You can translate host-device names to VLUN, disk pool, and physical Sun StorEdge T3+ array LUNs. The luxadm output for a host device, shown in CODE EXAMPLE 9-2, does not include the unique VLUN serial number that is needed to identify this LUN. The procedure to obtain the VLUN serial number is detailed next.
Displaying the VLUN Serial Number ▼ To Display Devices That are Not Sun StorEdge Traffic Manager (MPxIO)-Enabled 1. Use the format -e command. 2. Type the number of the disk on which you are working at the format prompt. 3. Type inquiry at the scsi prompt. 4. Find the VLUN serial number in the Inquiry displayed list. # format -e c4t2B00006022004186d0 format> scsi ...
▼ To Display Sun StorEdge Traffic Manager (MPxIO)-Enabled Devices If the devices support the Sun StorEdge Traffic Manager software, you can use this shortcut. ● Type: # /usr/sbin/luxadm display /dev/rdsk/c6t29000060220041956257334B30303148d0s2 DEVICE PROPERTIES for disk: /dev/rdsk/ c6t29000060220041956257334B30303148d0s2 Status(Port A): O.K. Status(Port B): O.K.
Viewing the Virtualization Engine Map The virtualization engine map is stored on the Storage Service Processor. 1. To view the virtualization engine map, type: # /opt/SUNWsecfg/showvemap -n v1 -f VIRTUAL LUN SUMMARY Disk pool VLUN Serial MP Drive VLUN VLUN Size SLIC Zones Number Target Target Name GB -------------------------------------------------------------------------------t3b00 6257334F30304148 T49152 T16384 VDRV000 55.0 t3b00 6257334F30304149 T49152 T16385 VDRV001 55.
2. Optionally open a Telnet session to the virtualization engine and run the runsecfg utility to poll a live snapshot of the virtualization engine map. Refer to “To Failback the Virtualization Engine” on page 120 for instructions about how to open a Telnet session. Determining the virtualization engine pairs on the system .........
▼ To Failback the Virtualization Engine In the event of a Sun StorEdge T3+ array LUN failover, the virtualization engine will route all I/O through the failover port on the Sun StorEdge T3+ array. After you isolate and check the cause of the failover, the virtualization engine continues to send I/O through the failover path. To restore the I/O to the primary path and fail the LUN back to its original controller, use the following procedure: 1.
a. If no failures occur, the command exits with no output.
b. If failures occur, you might see one of the following messages:

CODE EXAMPLE 9-5 Sun StorEdge T3+ Array Failure Codes

# /opt/SUNWsecfg/bin/failbackt3path -n t3b0
MultiPath failback command failed. Returned Result = 513
# /opt/SUNWsecfg/bin/failbackt3path -n t3b0
MultiPath failback command failed. Returned Result = 586

The message return code 513 indicates that the Sun StorEdge T3+ array did not require a failback.
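A wrapper around failbackt3path can react to the returned result. A sketch; only code 513 (no failback required) has a documented meaning in the text above, so every other code is flagged for manual follow-up.

```shell
# Interpret the 'Returned Result' value from failbackt3path
interpret_failback_result() {
    case "$1" in
        513) echo "no failback was required" ;;
        *)   echo "result $1: investigate the path state before retrying" ;;
    esac
}

interpret_failback_result 513
interpret_failback_result 586
```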
CODE EXAMPLE 9-6 Error-Free Online Switch Ports # showswitch -s sw2a ...
6. If either port 1 or port 2 is offline, check the GBICs and cables. 7. If a Sun StorEdge T3+ array switch port is offline, log in to the Sun StorEdge T3+ array and look at the status of the controllers and the port list, as shown in CODE EXAMPLE 9-7.
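Steps 6 and 7 can be narrowed by first scanning a saved port listing for anything not online. A sketch; the two-column layout below is an illustrative simplification, not actual showswitch output.

```shell
# Flag switch ports that are not online in a saved port listing
SW=/tmp/ports.out
cat > "$SW" <<'EOF'
Port State
1    Online
2    Offline
EOF

awk 'NR > 1 && $2 != "Online" { print "port " $1 " is " $2 ": check the GBIC and cable" }' "$SW"
```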
▼ To Reset the SAN Database on Both Virtualization Engines

1. Type:

# resetsandb -n vepair
# restorevemap -n vepair

You do not need to manually open a Telnet session to the virtualization engines unless an ERROR HALT 50 state is detected. Although you might need to power cycle the virtualization engine, first attempt to reset the virtualization engines using the following steps.

2. To disable the switch ports associated with the vehostname, type:

# /opt/SUNWsecfg/flib/setveport -n vehostname -d

3.
▼ To Reset the SAN Database on a Single Virtualization Engine

1. To disconnect the virtualization engine's device-side FC cables, type:

# setveport -v virtualization-engine-name -d

2. Open a Telnet session to the virtualization engine specified in Step 1.
3. Enter the password. The User Service Utility Menu is displayed.
4. Type 9 to clear the SAN database.
■ A successful command displays the message
■ An unsuccessful command results in the service code 051. If this occurs, repeat Steps 1 through 3.
Restarting the slicd Daemon Follow this procedure to restart the slicd daemon if the SLIC daemon becomes unresponsive, or if a message similar to the following is displayed: connect: Connection refused or Socket error encountered.. ▼ To Restart the slicd Daemon 1. Check whether the slicd daemon is running: # ps -ef | grep slicd 2.
4. To restart the slicd for the v1 virtualization engine, type:

# /opt/SUNWsecfg/bin/startslicd -n v1 (or v2, depending on the configuration)

5. Confirm that the slicd daemon is running:

# ps -ef | grep slicd
root 16132 16130 0 11:45:00 ?     0:00 ./slicd
root 16135 16130 0 11:45:00 ?     0:00 ./slicd
root 16130     1 0 11:45:00 ?     0:00 ./slicd
root 16131 16130 0 11:45:00 ?     0:00 ./slicd
root 16189 15877 0 11:48:49 pts/1 0:00 grep slicd
root 16143 16130 0 11:45:00 ?     0:00 .
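When confirming the daemon in step 5, it is easy to count the grep command itself as a slicd process. A sketch that counts only real slicd entries in a saved ps listing; the process table contents are illustrative.

```shell
# Count slicd processes, excluding the 'grep slicd' line itself
PS=/tmp/ps_slicd.out
cat > "$PS" <<'EOF'
root 16132 16130 0 11:45:00 ?     0:00 ./slicd
root 16135 16130 0 11:45:00 ?     0:00 ./slicd
root 16189 15877 0 11:48:49 pts/1 0:00 grep slicd
EOF

count=$(grep -c '\./slicd$' "$PS")
echo "slicd processes running: $count"
```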
Product Type       : FC-FC-3 SVE H FC-FC-3 router H
Firmware Revision  : 8.017 Vicom(release), Apr 11 2002 17:49:16
Loader Revision    : 2.02.42
Unique ID          : 00000060-2200418A
Unit Serial Number : 00250339
PCB Number         : 00166425
MAC address        : 0.60.22.3.D1.E3
DIP SW1 = 00000000   DIP SW2 = 00000011
          76543210             76543210
Official Release (1 = down ; 0 = up)
Error: None
Diagnosing a creatediskpools(1M) Failure When modifying the Sun StorEdge T3+ array configuration on a Sun StorEdge 6900 series, the system should automatically create disk pools. If the virtualization engine cannot find two paths to all Sun StorEdge T3+ array LUNs, however, the multipath drives cannot be created. If this happens, the following procedure can help troubleshoot the problem: 1. Inspect the SUNWsecfg log file (/var/adm/log/SEcfglog) to see if any errors are indicated.
2. Run the showswitch(1M) command for sw2a and sw2b. Refer to the Sun StorEdge 3900 and 6900 Series 2.0 Reference and Service Manual to see to which switch ports the Sun StorEdge T3+ array and virtualization engine should be attached. In this example, the Sun StorEdge T3+ array (t3b0) should be attached to port 2 of sw2a and sw2b and the virtualization engine should be attached to port 1. All ports should be online.
3. After corrective action has been successfully completed, run the following command: # creatediskpools -n t3b0 The SEcfglog file should display the following message: Thu May 30 17:40:23 MDT 2002 creatediskpools: t3b0 ENTER: /opt/SUNWsecfg/ bin/creatediskpools -n t3b0. Thu May 30 17:40:24 MDT 2002 checkslicd: v1 ENTER /opt/SUNWsecfg/bin/ checkslicd -n v1. Thu May 30 17:40:28 MDT 2002 checkslicd: v1 EXIT.
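Step 1 above, inspecting /var/adm/log/SEcfglog for errors, can be scripted as a quick pass/fail check. A sketch against a captured copy of the log; the sample entries come from the example output, and matching on the word ERROR is an assumption based on the setupswitch message types.

```shell
# Scan a saved copy of the SEcfglog for ERROR entries
LOG=/tmp/SEcfglog.copy
cat > "$LOG" <<'EOF'
Thu May 30 17:40:23 MDT 2002 creatediskpools: t3b0 ENTER: /opt/SUNWsecfg/bin/creatediskpools -n t3b0.
Thu May 30 17:40:24 MDT 2002 checkslicd: v1 ENTER /opt/SUNWsecfg/bin/checkslicd -n v1.
Thu May 30 17:40:28 MDT 2002 checkslicd: v1 EXIT.
EOF

errors=$(grep -c 'ERROR' "$LOG")
if [ "$errors" -eq 0 ]; then
    echo "no errors logged"
else
    echo "$errors error line(s) found; review the log"
fi
```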
Virtualization Engine Event Grid The Storage Automated Diagnostic Environment Event Grid enables you to sort virtualization engine events by component, category, or event type. The Storage Automated Diagnostic Environment GUI displays an event grid that describes the severity of the event, tells whether action is required, provides a description of the event, and lists the recommended action. Refer to the Storage Automated Diagnostic Environment User’s Guide Help section for more information.
TABLE 9-5 lists the virtualization engine events.

TABLE 9-5 Storage Automated Diagnostic Environment Event Grid for the Virtualization Engine

Component: volume
Event Type: Alarm
Severity: Yellow
Information: This event occurs when the virtualization engine has detected a change in status for a multipath drive or a VLUN. This usually indicates a pathing problem to a Sun StorEdge T3+ array controller, such as changes in active and passive paths.
Required Action: 1.
TABLE 9-5 Storage Automated Diagnostic Environment Event Grid for the Virtualization Engine (Continued)

Component: oob.slicd
Event Type: Comm_Lost
Severity: Down
Information: The virtualization engine failed to execute a slicd command.
Required Action:
1. Check the status of the slicd daemon.
2. Check the power on the virtualization engine.
3. Make sure the virtualization engine is booted correctly.
4. Verify that the TCP/IP settings on the virtualization engine are correct.
5.
TABLE 9-5 Storage Automated Diagnostic Environment Event Grid for the Virtualization Engine (Continued)

Component: ve_diag
Event Type: Diagnostic Test-
Severity: Red
Information: The ve_diag test on ve-1 failed.

Component: veluntest
Event Type: Diagnostic Test-
Severity: Red
Information: The veluntest failed.

Component: enclosure
Event Type: Discovery
Information: The discovery device found a new virtualization engine called v1a. Discovery events occur the first time the agent probes a storage device and creates a detailed description of the device monitored.
CHAPTER 10 Troubleshooting Using Microsoft Windows 2000

General Notes

■ Use the manufacturer's HBA utilities to monitor and diagnose the HBAs. The examples in this chapter use the QLogic SANblade Manager utility.
■ The Storage Automated Diagnostic Environment running on the Storage Service Processor is not able to monitor the host-to-switch link.
Troubleshooting Tasks Using Microsoft Windows 2000 Launching the Sun StorEdge T3+ Array Failover Driver GUI ● From the Microsoft Windows 2000 Advanced Server GUI, click Programs -> T3 StorEdge Configurator -> Configurator. FIGURE 10-1 138 Launching the Sun StorEdge T3+ Array Failover Driver Sun StorEdge 3900 and 6900 Series 2.
Checking the Version of the Sun StorEdge T3+ Array Failover Driver

● From the Microsoft Windows 2000 Advanced Server GUI, click Help -> About. The About Multipath Configurator window is displayed.

FIGURE 10-2 Sun StorEdge T3+ Array Failover Driver Versions 2.0.0.123 and 2.1.0.104

Note – In FIGURE 10-2, the example on the left shows build number 2.0.130, comprising driver version 2.0.0.123 and application version 2.0.0.125.
▼ To Use the Sun StorEdge T3+ Array Failover Driver GUI Note – The Sun StorEdge T3+ Array Failover Driver GUI is limited to the Sun StorEdge 3900 series systems. You must use the CLI for the Sun StorEdge 6900 series systems. 1. Make sure the Sun StorEdge T3+ Array failover driver is loaded. From the Microsoft Windows 2000 Advanced Server GUI, click Administrative Tools -> Computer Management -> Software Environment. 2. Ensure the "Jafo" driver is in a Running and OK state. 3.
4. Compare the healthy Sun StorEdge 3900 series system to a system that has experienced a LUN failover. A system that has experienced a LUN failover has a broken line connecting the HBA to the storage, as shown in FIGURE 10-4. FIGURE 10-4 Sun StorEdge 3900 series system with a LUN failover, shown using Multipath Configurator 5. To further check the affected Sun StorEdge T3+ array: a. Right-click the Sun StorEdge T3+ array in the failed path. b. Select Array Properties.
c. To view details about the Sun StorEdge T3+ Array paths, click the Details button. The Multipath Configurator LUN Properties detail window is displayed. FIGURE 10-6 Multipath Configurator LUN Properties Detail Note – From this example, note the Primary Path is Unknown and the Alternate Path is currently in use. ▼ To Use the Sun StorEdge T3+ Array Failover Driver Command Line Interface (CLI) Use the jafo_nutil.exe interface, which is available with Sun StorEdge T3+ Array Failover Driver version 2.
FIGURE 10-7 displays example output for a Sun StorEdge 3900 series system from the jafo_nutil.exe interface.

# E:\Program Files\Sun Microsystems\T3 Storedge Multiplatform Driver> jafo_nutil.
FIGURE 10-8 displays example output for a Sun StorEdge 6900 series system from the jafo_nutil.exe interface.
TABLE 10-1 lists some of the codes and descriptions for CLI output for a Sun StorEdge 6910 series system.

TABLE 10-1 Tips for Interpreting Sun StorEdge 6910 Series CLI Output

Component  Output Code  Description
Device     FW_REV       Firmware revision level of the virtualization engine
Device     WWN          The worldwide name of the master virtualization engine of the partner group.
CHAPTER 11 Example of Fault Isolation In the following example, a fault was injected into a running Sun StorEdge 3900 series system to show a troubleshooting flow. 1. Discover the Error One of the best ways to discover errors is by using the Storage Automated Diagnostic Environment monitoring system. The Storage Automated Diagnostic Environment should be configured to email alerts and events to a local System Administrator.
In this configuration, Port 2 is shown to have gone offline. Port 2 is a Microsoft Windows 2000 host-to-switch connection. Since the Storage Automated Diagnostic Environment does not have visibility to a Microsoft Windows 2000 host, use the Sun StorEdge T3+ Array Failover Driver utility (the Multipath Configurator) and the HBA utility to troubleshoot the host side. 2.
The primary path to Drive F: failed. The alternate path is currently handling all of the I/O. 3. Check the HBA Using the HBA utility (QLogic SANblade in this example), confirm the fault. FIGURE 11-3 Fault Confirmation Using QLogic SANblade 4. Isolate the components in the path. The components in the path are the HBA, the cable, the switch-side GBIC, and the Sun StorEdge network FC switch itself.
FIGURE 11-4 Diagnostics Using QLogic SANblade In this example, the HBA-to-switch cable was removed temporarily and a loopback connector was inserted into the HBA. The QLogic SANblade loopback diagnostics were then run. The HBA passed the tests. Note – The next components that can be isolated are the switch-side GBIC and the Sun StorEdge network FC switch itself.
In the examples shown in FIGURE 11-5, FIGURE 11-6, and FIGURE 11-7, Port 2 on Switch diag156-sw1a was marked with a "Red" icon, indicating a problem. Note – All tests were run with the default values.
FIGURE 11-6 Storage Automated Diagnostic Environment Test from Topology Pull-Down Menu FIGURE 11-7 Storage Automated Diagnostic Environment Test from Topology Test Detail
The first run failed, indicating a problem with either the GBIC or with the Sun StorEdge network FC switch. To further isolate the problem, a new GBIC was inserted into the port, the loopback connector was re-inserted, and the same test was run a second time. FIGURE 11-8 Successful Switch Test Results On this pass, the test was successful. This indicates that the problem was most likely the switch-side GBIC, which was replaced. 6. Recover the problem with the GBIC or the switch. a.
FIGURE 11-9 Multipath Recovery Using the Sun StorEdge T3+ Array Multipath Configurator Note – Storage Automated Diagnostic Environment should also post an event noting that the port has gone back online. The Multipath Configurator GUI should show both paths online and handling I/O, as illustrated in FIGURE 11-10. FIGURE 11-10 Recovered Paths
APPENDIX A Virtualization Engine References This appendix contains the following information: ■ “SRN Reference” on page 155 ■ “SRN/SNMP Single Point-of-Failure Descriptions” on page 159 ■ “Port Communication Numbers” on page 160 ■ “Virtualization Engine Service Codes” on page 160 SRN Reference TABLE A-1 provides an explanation of SRNs for the virtualization engine.
TABLE A-1 SRN Reference SRN Description Corrective Action 1xxxx The SCSI Request Sense command has reported the condition of the disk drive, where xxxx is the Unit Error Code in Sense Data bytes 20 to 21. If too many check conditions are returned, check the link status. 70000 The SAN configuration has changed. No action is needed. 70001 The rebuild process has started. No action is needed. 70002 The rebuild completed without error. No action is needed.
TABLE A-1 SRN Reference SRN Description Corrective Action 70021 The drive is offline. If the change was unintentional, check the condition of the drives. 70022 The virtualization engine is offline. If the change was unintentional, check the condition of the drives. 70023 The drive is unresponsive. Check the condition of drives. 70024 For the Sun StorEdge T3+ array pack, the master virtualization engine has detected the partner virtualization engine’s IP Address. No action is needed.
TABLE A-1 SRN Reference SRN Description Corrective Action 71001 This is a generic error code for the SLIC. It signifies communication problems between the virtualization engine and the daemon. 1. Check the condition of the virtualization engine. 2. Check the cabling between the virtualization engine and daemon server. Error halt mode also forces this service request number. 71002 The SLIC was busy. Error halt mode also forces this service request number.
SRN/SNMP Single Point-of-Failure Descriptions TABLE A-2 provides Simple Network Management Protocol (SNMP) descriptions, associated Service Request Numbers (SRNs), and recommendations for corrective action. TABLE A-2 SRN/SNMP Single Point-of-Failure Table SRN after Corrective Action SRN SNMP Description Corrective Action 70020 70021 70030 70050* • The SAN topology has changed. • The Global SAN configuration has changed. • The SAN configuration has changed. • A physical device is missing.
Port Communication Numbers

TABLE A-3 Port Communication Numbers

Port                    Port                    Port Number
Daemon                  Management programs     20000
Daemon                  Daemon                  20001
Daemon                  Virtualization engine   25000
Virtualization engine   Virtualization engine   25001

Virtualization Engine Service Codes

TABLE A-4 lists the service code numbers for errors that occur on the virtualization engine, along with recommendations for corrective action:

TABLE A-4 Virtualization Engine Service Codes — 0-399 Host-Side Interface Driver Errors

Service Code Number  Cause of Error  Recommended Corrective Action
TABLE A-4 Virtualization Engine Service Codes (Continued) — 0-399 Host-Side Interface Driver Errors 050 An attempt to write a value into nonvolatile storage failed, perhaps because of a hardware failure, or because one of the databases stored in Flash memory could not accept the entry being added. • Clear the SAN database. • Cycle power to the virtualization engine. 051 The virtualization engine cannot erase Flash memory. • Replace the virtualization engine. 053 The cabling configuration is unauthorized.
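As a sketch of how the TABLE A-3 port numbers might be used during troubleshooting, the following keeps the table as data and defines a simple TCP probe. The entry labels and the probe technique (bash's /dev/tcp pseudo-device) are assumptions made for illustration; they are not part of the product.

```shell
# TABLE A-3 kept as "label port" pairs. The label names are invented here.
ports='management-to-daemon 20000
daemon-to-daemon 20001
daemon-to-ve 25000
ve-to-ve 25001'

# Probe one TCP port using bash's /dev/tcp pseudo-device (assumes bash).
probe() {  # usage: probe <host> <port>
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null && echo open || echo closed
}

# A troubleshooting pass would walk each entry, for example:
#   probe "$SSP_HOST" 25000
printf '%s\n' "$ports"
```

A closed port on one of these numbers would point at the daemon or virtualization-engine link named in the corresponding row.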
TABLE A-5 Virtualization Engine Service Codes — 400-599 Device-Side Interface Driver Errors Service Code Number Cause of Error Recommended Corrective Action 409 The FC device-side type code is invalid. • Cycle the power. • If the problem persists, replace the virtualization engine. 434 Cannot continue due to too many elastic store errors. Elastic store errors result from a clock mismatch between transmitter and receiver and indicate an unreliable link.
APPENDIX B Configuration Utility Error Messages The Sun StorEdge 3900 and 6900 Series Reference Manual lists and defines the command utilities that configure the various components of the Sun StorEdge 3900 and 6900 series storage systems. If you encounter errors with the command line utilities, refer to the recommendations for corrective action in this appendix.
Virtualization Engine Error Messages TABLE B-1 Virtualization Engine Error Messages Source of Error Message Cause of Error Message Suggested Corrective Action Common to virtualization engine Invalid virtualization engine pair name, or the virtualization engine is unavailable. This is usually because the savevemap command is running. Run ps -ef | grep savevemap or listavailable -v (which returns the status of individual virtualization engines) to confirm that the configuration locks are set.
TABLE B-1 Virtualization Engine Error Messages (Continued) Source of Error Message Cause of Error Message Suggested Corrective Action Common to virtualization engine After resetting the virtualization engine, the $VENAME is unreachable. Check the IP address and netmask that has been assigned to the virtualization engine hardware. The hardware might be faulty. Be aware that the machine takes approximately 30 seconds to boot after a reset.
TABLE B-1 Virtualization Engine Error Messages (Continued) Source of Error Message Cause of Error Message Suggested Corrective Action checkvemap Cannot establish communication with ${vepair} 1. Run the checkvemap command again. 2. If this fails, check the status of both virtualization engines. 3. If there is an error condition, see Appendix A for corrective action. createvezone An invalid WWN ($wwn) is on the $vepair initiator ($init), or the virtualization engine is unavailable.
TABLE B-1 Virtualization Engine Error Messages (Continued) Source of Error Message Cause of Error Message Suggested Corrective Action restorevemap • The import zone data failed. • The restore physical and logical data failed. • The restore zone data failed. 1. Check the status of both virtualization engines. 2. If an error condition exists, refer to Appendix A for corrective action. 3. Run the restorevemap command again.
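The "is savevemap still running" check suggested in TABLE B-1 can be scripted before retrying a failed virtualization-engine utility. This is a minimal sketch of that check; the status labels are invented for illustration.

```shell
# Check whether savevemap may still hold the configuration lock.
# The "grep -v grep" filter drops the grep process itself from the listing.
if ps -ef | grep savevemap | grep -v grep > /dev/null; then
  status=locked    # savevemap still running: wait, then retry the command
else
  status=clear     # no lock visible: rerun the failed utility
fi
echo "savevemap lock status: $status"
```

On a locked system, waiting for savevemap to finish and rerunning the original command is usually sufficient.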
Switch Error Messages TABLE B-2 Sun StorEdge Network FC Switch Error Messages Source of Error Message Cause of Error Message Suggested Corrective Action Common to all Sun StorEdge network FC switches The Sun StorEdge system type entered (${cab_type}) does not match the system type discovered (${boxtype}). Either call the command with the -f force option to force the series type, or do not specify the cabinet type (no -c option).
TABLE B-2 Sun StorEdge Network FC Switch Error Messages (Continued) Source of Error Message Cause of Error Message Suggested Corrective Action checkswitch • The current configuration on $switch does not match the defined configuration. • One of the predefined static switch configuration parameters that can be overridden for special configurations (such as NT connect or cascaded switches) is set incorrectly. 1. Select View Logs or see $LOGFILE for more details. 2.
TABLE B-2 Sun StorEdge Network FC Switch Error Messages (Continued) Source of Error Message Cause of Error Message Suggested Corrective Action setswitchflash Invalid flash file $flashfile. Check the number of ports on switch $switch. You might be attempting to download a flash file for an 8-port switch to a 16-port switch. Check showswitch -s $switch and look for “number of ports.” Ensure that this matches the second and third characters of the flash file name.
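The setswitchflash check above (port count versus the second and third characters of the flash file name) can be sketched in shell. The file name and port count below are made-up examples of the convention the table describes, not real firmware image names.

```shell
# Made-up flash file name and port count, illustrating the convention above.
flashfile='m16_40203.fls'   # hypothetical image name for a 16-port switch
ports=16                    # as reported by: showswitch -s $switch

# cut -c2-3 extracts the second and third characters of the file name.
encoded=$(printf '%s' "$flashfile" | cut -c2-3)
if [ "$encoded" = "$ports" ]; then
  match=yes
else
  match=no
fi
echo "flash file matches switch port count: $match"
```

A mismatch here means the flash image was built for a switch with a different port count and should not be downloaded.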
Sun StorEdge T3+ Array Partner Group Error Messages Caution – Running restoret3config(1M) or modifyt3config(1M) destroys all data on the Sun StorEdge T3+ array. TABLE B-3 Sun StorEdge T3+ Array Error Messages Source of Error Message Cause of Error Message Suggested Corrective Action Common to Sun StorEdge T3+ array • The current configuration does not match the reference (standard) configurations. • This particular configuration is not a standard, supported type. 1.
TABLE B-3 Sun StorEdge T3+ Array Error Messages (Continued) Source of Error Message Cause of Error Message Suggested Corrective Action Common to Sun StorEdge T3+ array • The Sun StorEdge T3+ array is not of T3B type, so it aborts operations. • t3config utilities are supported only in the Sun StorEdge T3+ array; the t3config utilities are not supported on Sun StorEdge T3+ arrays with 1.xx firmware. 1. Refer to the T3 default/custom configuration table in the Sun StorEdge 3900 and 6900 Series 2.
TABLE B-3 Sun StorEdge T3+ Array Error Messages (Continued) Source of Error Message Cause of Error Message Suggested Corrective Action checkt3mount • The $lun status reported a bad or nonexistent LUN. • While checking the configuration using the showt3 -n command, operations abort. 1. Run the showt3 -n command to verify that the requested LUN exists on the Sun StorEdge T3+ array. 2. Confirm that the Sun StorEdge T3+ array configuration matches standard configurations.
TABLE B-3 Sun StorEdge T3+ Array Error Messages (Continued) Source of Error Message Cause of Error Message Suggested Corrective Action restoret3config • $LUN configuration failed to restore. • The force option tried unsuccessfully to reinitialize. 1. Check the Sun StorEdge T3+ configuration with the showt3 -n t3_name command. 2. Refer to the Sun StorEdge T3 and T3+ documentation. restoret3config • $LUN configuration is not found in the $restore_file. • Cannot restore $LUN. 1.
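The first checkt3mount step in TABLE B-3 (verifying with showt3 -n that the requested LUN exists) can be sketched against captured output. The listing format below is invented for illustration; real showt3 -n output formatting may differ.

```shell
# Invented showt3 -n style listing ("slice lun"); real formatting may differ.
showt3_output='volslice0 0
volslice1 1'
want_lun=1

# Look for the requested LUN number in column 2 of the captured listing.
if printf '%s\n' "$showt3_output" |
   awk -v lun="$want_lun" '$2 == lun { found = 1 } END { exit !found }'; then
  lun_state=present
else
  lun_state=missing
fi
echo "LUN $want_lun is $lun_state"
```

If the LUN is missing, compare the array's configuration against the standard configurations before retrying the mount.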
Other Error Messages TABLE B-4 Other SUNWsecfg Error Messages Source of Error Message Cause of Error Message Suggested Corrective Action Common to all components If the Sun StorEdge 3900 or 6900 series has more than two failures (for example, both virtualization engines and two switches are down), the getcabinet tool might not determine the correct cabinet type.
Abbreviations and Acronyms This list contains definitions for acronyms used in this troubleshooting guide.
PFA predictive failure analysis
POST power on self test
RAID redundant array of independent disks
RARP reverse address resolution protocol
RFE request for enhancement
RSS Remote Storage Services
SAN storage area network
SCSI small computer system interface
SLIC Serial Loop IntraConnect
SNMP simple network management protocol
SPOF single point of failure
SRN Service Request Number
SRS Sun Remote Services
SSP Storage Service Processor
SVE storage virtualization engine
TCP/IP trans
Index NUMERICS C 2 Gbit switch error messages, 168 3Com Ethernet hubs, 35 c2 path returning to production, 19 unconfiguring, 17 cfgadm verifying functionality, 4 checkdefaultconfig verifying functionality, 4 command line test example qlctest(1M), 27 switchtest(1M), 28 communication loss event, 3 configuration settings, 23 verifying, 7 creatediskpools(1M) failure diagnosing, 129 A A1 or B1 link verifying, 45 A2 or B2 link isolating, 52 Storage Service Processor Side Event, 50 verifying, 51 A2/B2 link FR
DMP-enabled paths returning to production, 22 documentation organization, XV shell prompts, XVII using UNIX commands, XVI dynamic multipathing (DMP), 20 E error discovery, 4 error messages other SUNWsecfg, 175 Sun StorEdge network FC switch, 168 Sun StorEdge T3+ array, 171 virtualization engine, 164 error status checking Fibre Channel link manually, 113 error status report Fibre Channel link, 113 Ethernet hubs 3Com related documentation, 35 troubleshooting, 35 ethernet hubs related documentation, 35 event
installations Sun StorEdge Traffic Manager, 5 VERITAS VxDMP, 5 isolating A1 or B1 FC link, 48 A2 or B2 FC link, 52 A3 or B3 link, 58 isolating FRUs, 5 isolation procedures A1 or B1 FC link, 48 for A2/B2 link, 52 N notification Storage Service Processor, 44 used in PFA, 2 notification events A1 or B1, 43 A2 or B2, 49 A3 or B3, 54 A4 or B4, 60 T1 or T2, 89 P L LED service and diagnostic codes reading virtualization engine, 111 LEDs Ethernet port, 112 ethernet port, 111 power status, 111 virtualization engin
switches, 74 SAN database manually clearing, 123 manually restoring, 123 resetting, 124 service codes interpreting, 111 overview, 108 retrieving, 108 virtualization engine, 111, 156 service processor troubleshooting, 6 service request numbers for virtualization engine, 155 retrieving, 109 virtualization engine, 108 setswitchflash to upgrade switches, 74 settings configuration, 7 SLIC daemon communication with virtualization engine, 108 killing and restarting, 126 statistical data FC link errors, 113 status
test examples command line, 27 qlctest(1M), 27 switchtest(1M), 28 testing FRUs, 5 tests how to run, 5 Sun StorEdge T3+ arrays, 5 thresholds used in PFA, 2 tools troubleshooting, 23 troubleshooting broad steps, 3 check status of Sun StorEdge T3+ array, 4 check status of the Sun StorEdge network FC switch-8 and switch-16 switch, 5 check status of the virtualization engine, 5 determine extent of the problem, 4 discovering the error, 4 Ethernet hubs, 35 event grid tool, 95 general procedures, 3 host side, 6 qui
Z zone modifications, 74