Sun StorEdge™ 5310 NAS Troubleshooting Guide

Sun Microsystems, Inc.
www.sun.com

Part No. 817-7513-11
August 2004, Revision A

Submit comments about this document at: http://www.sun.
Copyright 2004 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved. Sun Microsystems, Inc. has intellectual property rights relating to technology that is described in this document. In particular, and without limitation, these intellectual property rights may include one or more of the U.S. patents listed at http://www.sun.com/patents and one or more additional patents or pending patent applications in the U.S. and in other countries.
Contents

1. Troubleshooting Overview
2. NAS Head
FTP Server 40
Updating the OS on the Sun StorEdge 5310 NAS 40
Sun StorEdge 5310 NAS Firmware 40
Operating System 40
Common Problems Encountered on the Sun StorEdge 5310 NAS 42
CIFS/SMB/Domain 43
NFS Issues 61
Network Issues 66
NIC speed and duplex negotiation issues
Macintosh Connectivity 146
Miscellaneous Log Messages 147
Direct Attached Tape Libraries 148
SCSI ID Settings 148
StorEdge File Replicator 149
StorEdge File Replicator Issues 152
3.
Pseudo Real-time Mirroring 3
StorEdge File Replicator 3
Mirroring Variations 7
Operational State 9
Mirror Creation 10
Mirror Replication 11
Mirror Sequencing 12
Link Down and Idle Conditions 12
Cracked and Broken Mirrors 12
Cannot perform first-time synchronization of mirror system 13
Filesystem errors, such as run check, directory broken, etc. 13
Error messages, panics or hang condition when enabling mirror 13
5. Clustering 1
Overview 1
6.
Opening the Front Bezel 5
Memory 6
Power Supply Unit 7
Fan Module 9
High Profile Riser PCI Cards 12
Gigabit Ethernet Card 13
Low Profile Riser PCI Cards 15
Qlogic HBA Removal and Replacement 16
LCD Display Module 17
Flash Disk Module 18
System FRU (Super FRU) 22
Array FRU Replacement Procedures 23
Replacing a Controller 23
Replacing a Controller Battery 29
Replacing a Drive 36
Replacing a Fan 39
Replacing a Power Supply 41
Replacing an SFP Transceiver 44

Tables

TABLE 1-1 List of Adapters 16
TABLE 1-2 Routing Table 16
TABLE A-3 UPS Error Messages 22
TABLE A-4 File System Errors 24
TABLE A-5 PEMS Error Messages
TABLE 2-1 Index to Problems
TABLE 2-2 Bootup Beep Codes 6
TABLE 2-3 Server LEDs
TABLE 2-4 Front Panel LEDs
TABLE 2-5 Front Panel Pushbuttons 15
TABLE 2-6 Rear Panel LEDs
TABLE 2-7 System Status LED States
TABLE 2-8 Power Supply Status LED States 20
TABLE 2-9 Standard POST Error Messages and Codes 24
TABLE 2-10 Extended POST Error Messages and Codes
TABLE 2-16 POST Progress LED Code Table (Port 80h Codes)
TABLE 2-17 Status LED Indicators 87
TABLE 2-18 Supported Tape Libraries and Tape Drives 148
TABLE 3-1 Lights on the Back of a Command Module 14
TABLE 3-2 Lights on the Front of a Command Module 23
TABLE 3-3 Lights on the Back of a Command Module 24
TABLE 3-4 Enterprise Management Window Menus
TABLE 3-5 Enterprise Management Window Toolbar Buttons 49
TABLE 3-6 Array Management Window Tabs 52
TABLE 3-7 Array Management Window Menus

Figures

FIGURE 2-1 Front Panel Pushbuttons and LEDs 13
FIGURE 2-2 Rear Panel LEDs
FIGURE 2-3 Location of Front-Panel System Status LED 18
FIGURE 2-4 Location of Rear-Panel Power Supply Status LEDs
FIGURE 2-5 Fault and Status LEDs on the Server Board
FIGURE 2-6 Location of Front-Panel ID Pushbutton and LED 23
FIGURE 2-7 Examples of POST LED Coding
FIGURE 2-8 The Update Software Panel 41
FIGURE 3-1 Controller
FIGURE 3-2 Label Locations on the Controller
FIGURE 3-3 Battery Charging/Charg
FIGURE 3-14 Lights on the Front of a Command Module 23
FIGURE 3-15 Lights on the Back of a Command Module 24
FIGURE 3-16 Power Supply Switches
FIGURE 3-17 Setting the Tray ID Switch 30
FIGURE 3-18 Verifying the Link Rate Setting
FIGURE 3-19 Removing and Installing a Drive 35
FIGURE 3-20 Power Supply Switches
FIGURE 3-21 Removing and Installing a Drive 38
FIGURE 3-22 Removing and Installing a Drive 42
FIGURE 3-23 Enterprise Management Window 45
FIGURE 3-24 Array Management Window 45
FIGURE 4-2 Write ordering on the Mirror 5
FIGURE 4-3 Lost transaction handling on the Mirror
FIGURE 4-4 The Mirror Log and Primary Journal 7
FIGURE 6-1 Physical and Logical Volume Relationship 2
FIGURE 6-2 The Copy-On-Write Mechanism for Checkpoints
FIGURE 6-3 Mappings for Block n Before Modification 5
FIGURE 6-4 Mappings for Block n After Modification 6
FIGURE 6-5 Creating a hardlink when a volume is checkpointed and has active checkpoints
FIGURE 6-6 Mappings for Block n After Deleting ck
FIGURE 7-15 Removing the SFP Transceiver and Fibre Optic Cable 31
FIGURE 7-16 Removing and Replacing a Controller
FIGURE 7-17 Removing the Controller Cover (Upside Down View)
FIGURE 7-18 Removing and Installing the Controller Battery
FIGURE 7-19 Label Locations on the Controller
FIGURE 7-20 Drive Link, Host Link, Battery, and Fault Lights
FIGURE 7-21 Replacing a Drive 38
FIGURE 7-22 Replacing a Fan 40
FIGURE 7-23 Replacing a Power Supply 43
FIGURE 7-24 Replacing an SFP Transceiver
Preface

This Troubleshooting Guide provides information on how to identify, isolate, and fix problems with the Sun StorEdge™ 5310 NAS. It also explains how to remove and replace certain key server components.

Who Should Use This Book

The intended audience for this book is Sun field service personnel who are responsible for maintaining the Sun StorEdge 5310 NAS.

Related Documentation

These documents contain information related to the tasks described in this book:

■ Sun StorEdge 5310 NAS Quick Reference Manual
■ Sun StorEdge 5310 NAS Hardware Installation, Configuration, and User Guide
■ Sun StorEdge 5310 NAS Software Installation, Configuration, and User Guide
■ Sun StorEdge 5310 NAS Setup Poster

Ordering Sun Documents

The SunDocsSM program provides more than 250 manuals from Sun Microsystems, Inc.

Shell Prompts in Command Examples

The following table shows the default system prompt and superuser prompt for the C, Bourne, and Korn shells.

TABLE P-2 Shell Prompts

Shell                                         Prompt
Bourne shell and Korn shell prompt            machine name$
Bourne shell and Korn shell superuser prompt  machine name#

Sun Welcomes Your Comments

Sun is interested in improving its documentation and welcomes your comments and suggestions. You can email your comments to Sun at: docfeedback@sun.com
CHAPTER 1

Troubleshooting Overview

This chapter provides an overview of the diagnostic functions and tools needed for troubleshooting the Sun StorEdge 5310 NAS. This chapter contains the following sections:

■ "How to Use This Manual" on page 1-1
■ "Important Notices and Information on the Sun StorEdge 5310 NAS" on page 1-2
■ "Diagnostic Information Sources" on page 1-8

1.1 How to Use This Manual

Before delving into this manual, check the following to ensure that common problems have been resolved.

1.2 Important Notices and Information on the Sun StorEdge 5310 NAS

Caution – Do not plug a USB keyboard into the front USB connector. Doing so will cause the system to crash.

Caution – Do not power on the Sun StorEdge 5310 NAS until two minutes after the JBOD has been powered up, to ensure that the disk drives have finished spinning up.

Caution – The /dvol/etc folder contains configuration information and needs to be backed up to ensure that all configuration information is available after a failure.
1.3 Troubleshooting Tools

1.3.0.1 Storage Automated Diagnostic Environment (StorAde)

If you have the Storage Automated Diagnostic Environment installed on the host, check the internal status of the array with this tool. See the documentation for this tool for further information. All that you need to use the Storage Automated Diagnostic Environment is web browser access to the host where it is installed.

A variety of software logging tools monitor the various branches of the storage network. When an error is detected, the error's severity level is categorized and classified. Errors are reported or logged according to severity level.

1.3.0.6 Log Message Severity Levels

■ Emergency—Specifies emergency messages. These messages are not distributed to all users. Emergency priority messages are logged into a separate file for reviewing.
■ Alert—Specifies important messages that require immediate attention.

■ Review the Storage Automated Diagnostic Environment topology view.
■ Using the Storage Automated Diagnostic Environment revision checking functionality, determine whether the package or patch is installed.

■ Sun StorEdge 5310 NAS messages, found in the syslog file, indicating a problem. See the Error Messages section for more information about array-generated messages.
■ Host-generated messages, found in the /var/adm/messages file. CIFS clients may have errors on their monitors or in the event log.

1.5 Troubleshooting Flow Charts

Use the flow charts below to diagnose problems.
Follow the steps below to diagnose hardware problems.
Follow the steps below to diagnose software problems.

1.6 Diagnostic Information Sources

1.6.1 StorEdge Diagnostic Email

The diagnostic email includes information about the StorEdge system configuration, disk subsystem, file system, network configuration, SMB shares, backup/restore information, /etc information, system log, environment data, and administrator information. The diagnostics are a primary tool for checking configuration and troubleshooting.

To collect diagnostics, proceed as follows:

1. Access the StorEdge via Telnet or serial console.
2. Press Enter at the [menu] prompt and enter the administrator password.
3. Press the spacebar until "Diagnostics" is displayed under "Extensions" at the lower right.
4. Select the letter corresponding to "Diagnostics".
5. Wait a few seconds while the StorEdge builds the diagnostic.
6. Select option "2", Send Email.
7. Select option "1", Edit problem description.
8. Enter a precise description of the problem.
9.

This functionality is also available through the StorEdge Web Admin. To access these settings, log in, and click the envelope icon on the top taskbar. All of the options described above are available.

1.6.2 Data Collection for Escalations

1.6.2.1 Collecting Information from the Sun StorEdge 5310 NAS

The following are important considerations for data collection. Data collection is critical in cases that require escalation.
■ Version(s) of software on client system(s)
■ Version(s) of software on server system(s)
■ Network topology
■ Steps and/or sequence of events leading to the failure
■ What was the user doing or attempting to do when the failure occurred?
■ Problem symptom (error codes, failed operation, crash)
■ Syslog data
■ Network traces
■ Diagnostic email

1.6.2.5 Check Remote Access Capabilities

In some cases, it is useful for one of your escalation resources to directly access the system.

■ Software version, including any service packs, options, or minor revisions
■ Client configuration information: mount options, NIC configuration, platform, etc.
■ Network information: topology, switch and router information, path from client to StorEdge
■ Server information: detailed information on any application or authentication servers, including all of the above details
■ An exact set of steps to reproduce the problem
Cacls

For issues with access to a file or directory, collect the output of the cacls command. This command is available from the CLI. At the CLI, enter "cacls <full pathname>". The full pathname should begin with the volume name, as in this example: "cacls /vol1/testfile.txt". Cacls output contains the following information: first, the basic mode information and the UID/GID of the owner are displayed.

Proc Filesystem

The /proc filesystem is a virtual filesystem used to collect system data. The locations of some of the more useful data are listed below. To collect the data, copy the file, or use the "cat" CLI command to dump it to the screen while logging the terminal session.

/proc/cifs/DOMAIN.USER.6789ABCD…

These are user access tokens. They may be useful in troubleshooting SMB issues. These file names begin with the domain name, then the username, then some hexadecimal digits.

3. Press the spacebar until "Packet Capture" is displayed under "Extensions" at the lower right.
4. Select the letter corresponding to "Packet Capture".
5. Select option "1", Edit Fields. The available options are as follows:
■ Capture File: Where to save the capture file.
■ Frame Size (B): Size in bytes of each frame to capture. The default is normally used.
■ IP Packet Filter: "No" captures all traffic; "Yes" allows you to filter what is received.
1.6.2.9 TCP/IP Connectivity Problems

A good tool to investigate network connectivity problems is the netstat command. This command is available from the StorEdge CLI. Simply type "netstat" at the CLI and a list of all network interfaces and routes is displayed, along with several useful statistics. Two tables are displayed, as follows:

TABLE 1-1 List of Adapters

Name  Mtu   Netmask      Address    Ipackets  Ierr  Opackets  Oerr  Coll
lo0   1536  255.0.0.0    127.0.0.1  77        0     77        0     0
fxp1  1500  255.255.255.
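Once the adapter table is captured in a terminal log, its error counters can be scanned mechanically. The sketch below assumes the column layout shown in TABLE 1-1; the fxp1 row values are fabricated for illustration and are not from a real unit.

```shell
# Scan a netstat-style adapter table for nonzero error counters.
# Column layout per TABLE 1-1:
#   Name Mtu Netmask Address Ipackets Ierr Opackets Oerr Coll
# The fxp1 row below is fabricated for illustration.
adapters='Name Mtu Netmask Address Ipackets Ierr Opackets Oerr Coll
lo0 1536 255.0.0.0 127.0.0.1 77 0 77 0 0
fxp1 1500 255.255.255.0 10.10.5.12 1001 0 998 2 0'

# Print any interface whose Ierr ($6), Oerr ($8), or Coll ($9) is nonzero.
bad=$(echo "$adapters" | awk 'NR > 1 && ($6 > 0 || $8 > 0 || $9 > 0) { print $1 }')
echo "$bad"
```

Nonzero Ierr, Oerr, or Coll counts usually point at cabling, duplex mismatch, or switch problems, which ties in with the NIC speed and duplex negotiation issues covered elsewhere in this guide.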
These are the result of an ICMP message from another router or firewall, typically due to misconfiguration of that device. It is also possible to configure the StorEdge to ignore ICMP requests to change the default gateway.

■ Check the "Use" statistic in the routing table. This statistic indicates how many times a route has been used. If you have defined a route for a specific purpose, such as mirroring, and this counter is not incrementing, then the route was most likely not defined correctly.
Other processes / High CPU Utilization When performance is low, one possible reason is that the system is busy with other processes. One way to check this is to observe the CPU utilization. This is best viewed from the activity monitor screen in the telnet interface. The CPU utilization can be found in the lower right corner, listed as a percentage. The rest of the activity monitor screen may also be helpful, as it may give an indication of the source of the demand on resources.
■ rateread: Read the contents of a file and report performance. The file read does not use the network connection. This can determine whether an issue is disk or network related, and whether the problem is in reading or writing data.

usage: rateread FILENAME [+OFFSET] TOTALKB [BLOCKSIZE]

example:
support > rateread /vol1/testfile 8192
1024000000 bytes (976.5M) in 0.877 seconds 1.086GB/sec

■ ratecopy: Copy a file to test the performance of a file copy from source to target.
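The reported rate can be sanity-checked by hand: bytes divided by seconds, divided by 2^30 for GB/sec. Recomputing the example output above gives a figure close to what the tool prints (small differences come from rounding):

```shell
# Recompute the throughput from the rateread example output:
# 1024000000 bytes in 0.877 seconds, expressed in GB/sec (1 GB = 2^30 bytes).
rate=$(awk 'BEGIN { printf "%.3f", 1024000000 / 0.877 / (1024 * 1024 * 1024) }')
echo "$rate GB/sec"
```

The same arithmetic applied to a ratecopy run lets you compare read-only and copy throughput to decide whether the bottleneck is on the read or write side.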
Components of an Event Message

05/23/04 05:55:30 C sysmon[63]: Disk drive at enclosure 1 row 0 column 2 failed.

■ Time/date: The time and date of the event (05/23/04 05:55:30)
■ Severity: One of the severity levels listed below (C)
■ Facility: The system module that reported the message (sysmon)
■ FID: The kernel ID of the Facility (63)
■ Message body: The contents of the message

Severity Level Definitions (highest to lowest)

■ Emergency—Specifies emergency messages.
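Because the layout is fixed, an event message can be split into these components mechanically. A minimal sketch using the sample message above (field positions are assumed from that one example; real messages may vary):

```shell
# Split the sample event message into the components described above.
msg='05/23/04 05:55:30 C sysmon[63]: Disk drive at enclosure 1 row 0 column 2 failed.'

evdate=$(echo "$msg" | awk '{ print $1 }')                            # Time/date (date part)
evsev=$(echo "$msg"  | awk '{ print $3 }')                            # Severity
evfac=$(echo "$msg"  | awk '{ print $4 }' | sed 's/\[.*//')           # Facility
evfid=$(echo "$msg"  | awk '{ print $4 }' | sed 's/.*\[//; s/\].*//') # FID
echo "date=$evdate severity=$evsev facility=$evfac fid=$evfid"
```

Splitting the facility name from the FID this way makes it easy to grep a captured syslog for all messages from a given module, such as sysmon.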
SysMon, the monitoring thread in the Sun StorEdge 5310 NAS, monitors the status of RAID devices, UPSs, file systems, head units, enclosure subsystems, and environmental variables. Monitoring and error messages vary depending on model and configuration. In the tables in this section, table columns with no entries have been deleted.

About SysMon Error Notification

SysMon, the monitoring thread in the Sun StorEdge 5310 NAS, captures events generated as a result of subsystem errors.
UPS Subsystem Errors

Refer to Table A-3 for descriptions of UPS error conditions.

TABLE A-3 UPS Error Messages

Event: Power Failure
E-Mail Subject: Text: AC Power Failure: AC power failure. System is running on UPS battery. Action: Restore system power. Severity = Error
SNMP Trap: EnvUpsOnBattery
LCD Panel: U20 on battery
Log: UPS: AC power failure. System is running on UPS battery.

Event: Power Restored
E-Mail Subject: Text: AC power restored: AC power restored. System is running on AC power.

Event: Write-back cache is disabled
E-Mail Subject: Text: Controller Cache Disabled: Either AC power or UPS is not charged completely. Action: 1. If AC power has failed, restore system power. 2. If after a long time the UPS is not charged completely, check the UPS. Severity = Warning
LCD Panel: Cache Disabled
Log: write-back cache for ctrl x disabled

Event: Write-back cache is enabled
E-Mail Subject: Text: Controller Cache Enabled: System AC power and UPS are reliable again.
File System Errors

File system error messages occur when file system usage exceeds a defined usage threshold. The default usage threshold is 95%.

TABLE A-4 File System Errors

Event: File System Full
E-Mail Subject: Text: File system full: File system is xx% full. Action: 1. Delete any unused or temporary files, OR 2. Extend the partition by using an unused partition, OR 3. Add additional disk drives and extend the partition after creating a new partition.
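The threshold logic is simple to mirror in an external monitoring script. A minimal sketch: the 95% figure is the default from this section, while the usage value is hard-coded for illustration (a real script would parse it from df output or the diagnostic email):

```shell
threshold=95   # default file-system usage threshold from this section
usage=97       # illustrative value; a real script would measure this

alert=""
if [ "$usage" -ge "$threshold" ]; then
  # Same wording as the SysMon e-mail subject above.
  alert="File system is ${usage}% full"
fi
echo "$alert"
```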
TABLE A-5 PEMS Error Messages

Event: Fan Error
E-Mail Subject: Text: Fan Failure: Blower fan xx has failed. Fan speed = xx RPM. Action: The fan must be replaced as soon as possible. If the temperature begins to rise, the situation could become critical. Severity = Error
SNMP Trap: envFanFail
LCD Panel: P11 Fan xx failed
Log: Blower fan xx has failed!

Event: Power Supply Module Failure
E-Mail Subject: Text: Power supply failure: The power supply unit xx has failed.
1.7 Maintenance Precautions

The sections that follow provide subassembly-level removal and installation guidelines. After completing all necessary removal and replacement procedures, verify that all components are working properly.

1.7.0.1 Tools Required

To service the Sun StorEdge 5310 NAS, you need:

■ Phillips screwdriver
■ Flat-head screwdriver

1.7.0.2 Electrostatic Discharge Information

Static electricity can cause damage to static-sensitive devices and/or microcircuitry.

5. Disconnect all other external peripheral devices from the Sun StorEdge 5310 NAS server, if applicable.
6. Disconnect all optical fibre and network interface cables from the Sun StorEdge 5310 NAS server and the storage enclosure.
7. Remove the Sun StorEdge 5310 NAS and the storage enclosure from the rack.

1.8 Static Electricity Precautions

1.8.0.1 Grounding Procedure

You must maintain reliable grounding of this equipment.

2. Wear a wrist strap, and always be properly grounded when touching static-sensitive equipment/parts. If a wrist strap is not available, touch any unpainted metal surface on the Sun StorEdge 5310 NAS (and optional Sun StorEdge 5310 NAS Expansion Unit) back panel to dissipate static electricity. Repeat this procedure several times during installation.
3. Avoid touching exposed circuitry, and handle components by their edges only.
CHAPTER 2

NAS Head

This chapter addresses frequently asked questions for the Sun StorEdge 5310 NAS. The chapter contains these sections:

■ "Hardware" on page 2-1
■ "OS Operations" on page 2-36
■ "Updating the OS on the Sun StorEdge 5310 NAS" on page 2-40
■ "Sun StorEdge 5310 NAS Firmware" on page 2-40
■ "Common Problems Encountered on the Sun StorEdge 5310 NAS" on page 2-42
■ "Frequently Asked Questions" on page 2-92

2.1 Hardware
Spain Tel: +011 3491 767 6000 See the following link for US, Europe, South America, Africa, and APAC local country telephone numbers: http://www.sun.com/service/contacting/solution.html For general support and documentation on the servers, see the following link: http://www.sun.com/supporttraining/ 2.2.1 Problems With Initial System Startup Problems that occur at initial system startup are usually caused by incorrect installation or configuration. Hardware failure is a less frequent cause. 2.2.1.
■ Are there any POST beep codes? If so, check "POST Error Beep Codes" on page 2-27.

2.2.2 Resetting the Server

Quite often, a problem can be solved merely by resetting the server or by shutting it down and powering it back up. You may restart or shut down the Sun StorEdge 5310 NAS using software or hardware.

2.2.2.1 Shutdown Commands for Software Menu

To shut down the system using the menu:

1. Use the Web Administrator or Telnet to the Sun StorEdge 5310 NAS to shut down the server.
2.

2.2.3 Preparing the System for Diagnostic Testing

Caution – Turn off devices before disconnecting cables. Before disconnecting any peripheral cables from the system, turn off the system and any external peripheral devices. Failure to do so can cause permanent damage to the system and/or the peripheral devices.

1. Turn off the system and all external peripheral devices. Disconnect all of them from the system, except the keyboard and video monitor.
2.

Problems Starting Up

If the server does not start up properly, use the information in this section to diagnose problems.

Server Does Not Power On

If the server does not power on, check the following:

■ Does the main server board have power? Open the chassis lid and check the 5V Standby LED on the baseboard to see if it is illuminated. If your server is plugged in, this LED should be green. See Figure 2-5, "Fault and Status LEDs on the Server Board," on page 2-21 for the location of this LED.
Note – A corded PS/2 keyboard (not a wireless one) must be plugged into the keyboard/mouse connector at the back of the server. When the front panel is locked, the lights on the keyboard flash, but the server is still fully functional. Server Beeps at Power On or When Booting The server indicates problems with “beep codes” during Power-On Self Test (POST) in the event there is no displayed video. A complete list of beep codes is given in “POST Error Beep Codes” on page 2-27.
TABLE 2-2 Bootup Beep Codes

Beeps    Reason
1-5-4-2  Power fault
1-5-4-3  Chipset control failure
1-5-4-4  Power control failure

Server Starts Booting Automatically at Power On

The server board saves the last known power state in the event of a power failure. If you remove power before powering down the system using the power switch on the front panel, your system might automatically attempt to restore itself back to the state it was in after you restore power.
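Beep patterns like these map naturally onto a table lookup, which is handy in a field-notes script. A minimal sketch using only the three codes from Table 2-2 (bash associative arrays assumed; this runs on a technician's workstation, not on the NAS):

```shell
#!/usr/bin/env bash
# Map bootup beep patterns (Table 2-2) to their meanings.
declare -A beep_reason=(
  ["1-5-4-2"]="Power fault"
  ["1-5-4-3"]="Chipset control failure"
  ["1-5-4-4"]="Power control failure"
)

code="1-5-4-3"   # pattern heard during bootup
echo "${beep_reason[$code]:-Unknown beep code}"
```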
3. If you do not press one of the keys mentioned above and do NOT have a device with an operating system loaded, the boot process continues and the system beeps once. The following message is displayed:

Operating System not found

4. At this time, pressing any key causes the system to attempt a reboot. The system searches all removable devices in the order defined by the boot priority.

During POST, the server BIOS presents screen messages to indicate error conditions.

■ Did you press the power button?
■ Is the power-on light illuminated?
■ Have any of the fan motors stopped (use the server management subsystem to check the fan status)?
■ Are the fan power connectors properly connected to the server board?
■ Is the cable from the front panel board connected to the server board?
■ Are the power supply cables properly connected to the server board?
■ Are there any shorted wires caused by pinched cables or power connector plugs forced into power connector sockets the wrong way?

2.2.3.6 Other Problems

If the preceding information does not fix the problem with your server, try the following:

■ Check for proper processor installation. Systems with a single processor must have the CPU installed in CPU socket 1. If two processors are installed, the processors must be of the same speed and voltage (and within one stepping). Do not attempt to overclock the processors or other components on this system.

2.4.1 LEDs

You can use the diagnostic LED indications to isolate faults. See "LEDs and Pushbuttons" on page 2-11.

2.4.2 Beep Codes

A built-in server speaker indicates failures with audible beeps. See "POST Error Beep Codes" on page 2-27.

2.4.3 POST Screen Messages

For many failures, the BIOS sends error codes and messages to the screen. See "POST Screen Messages" on page 2-24.

2.5 LEDs and Pushbuttons

Note – This section addresses the LEDs and pushbuttons on the Sun StorEdge 5310 NAS.
TABLE 2-3 Server LEDs

Memory DIMM fault (1-6)
Function: Identifies a failing DIMM module
Location: At the front of each DIMM location on the main board
Color: Amber
Status: On = fault

POST LEDs (1-4)
Function: Displays boot Port-80 POST codes
Location: Left rear of the main board
Status: Each LED can be off, green, red, or amber. See "POST Progress Code LED Indicators" on page 2-30 for POST code LED details.

Fan fault (1-4)
Function: Identifies a Sun StorEdge 5310 NAS fan failure

FIGURE 2-1 Front Panel Pushbuttons and LEDs (callouts: NIC1 and NIC2 activity LEDs, Power/Sleep pushbutton, Power/Sleep LED, system status LED, ID LED, ID pushbutton, hard disk status LED, reset pushbutton, NMI pushbutton)

2.5.1.1 Front Panel LEDs

The front panel LEDs are summarized in Table 2-4.

TABLE 2-4 Front Panel LEDs

Power (Green): This LED is controlled by software. It turns steady when the server is powered up and is off when the system is off or in sleep mode.

System Status/Fault (Green/Amber): This LED can assume different states (green, amber, steady, blinking) to indicate critical, non-critical, or degraded server operation.
Steady green: Indicates the system is operating normally.
Blinking green: Indicates the system is operating in a degraded condition.
Blinking amber: Indicates the system is in a non-critical condition.
Steady amber: Indicates the system is in a critical or non-recoverable condition.

2.5.1.2 Front Panel Pushbuttons

The front panel pushbuttons are summarized in Table 2-5.

TABLE 2-5 Front Panel Pushbuttons

Power/Sleep: This pushbutton is used to toggle the system power on and off. This button is also used as a sleep button for operating systems that follow the ACPI specification. Linux, for example, configures the power button to the instant-off mode. There is no ACPI support for the Solaris OS.

Reset: Depressing this pushbutton reboots and initializes the system.
2.5.2 Rear Panel LEDs The rear panel contains the LEDs shown in Figure 2-2.
TABLE 2-6 Rear Panel LEDs LED Color Function System ID Blue This LED is located on the Main Board and is visible through holes in the rear panel. It can provide a mechanism for identifying one system out of a group of identical systems. This can be particularly useful if the server is used in a rackmount chassis in a high-density, multiple-system application.
2.5.3 Front-Panel System Status LED

The front-panel system status LED is located as shown in Figure 2-3.

FIGURE 2-3 Location of Front-Panel System Status LED

The front-panel system status LED has the states indicated in Table 2-7.

TABLE 2-7 System Status LED States

System Status LED State  System Condition
CONTINUOUS GREEN         Indicates the system is operating normally.
BLINKING GREEN           Indicates the system is operating in a degraded condition.

■ Power subsystem failure. The Baseboard Management Controller (BMC) asserts this failure whenever it detects a power control fault (for example, the BMC detects that the system power is remaining on even though the BMC has deasserted the signal to turn off power to the system).
■ The system is unable to power up due to incorrectly installed processor(s), or processor incompatibility.
2.5.4 Rear Panel Power Supply Status LED The rear-panel power supply status LEDs are located as shown in Figure 2-4. Power Supply Status LEDs (Redundant Power Supplies) FIGURE 2-4 Location of Rear-Panel Power Supply Status LEDs The rear-panel power supply status LED has the states indicated in Table 2-8.
2.5.5 Server Main Board Fault LEDs

There are several fault and status LEDs built into the server board (see Figure 2-5). Some of these LEDs are visible only when the chassis cover is removed. The LEDs are explained in this section.

FIGURE 2-5 Fault and Status LEDs on the Server Board (callouts: ID LED, system status LED, POST LEDs, DIMM fault LEDs (6), CPU 2 fault LED, CPU 1 fault LED, 5V system standby LED)

The fault LEDs are summarized below.
■ POST LEDs: To help diagnose POST failures, a set of four bi-color diagnostic LEDs is located on the back edge of the baseboard. Each of the four LEDs can have one of four states: Off, Green, Red, or Amber. During the POST process, each light sequence represents a specific Port-80 POST code. If a system should hang during POST, the diagnostic LEDs present the last test executed before the hang. When reading the lights, the LEDs should be observed from the back of the system.
2.5.6 System ID LEDs A pair of blue LEDs, one at the rear of the server, and one on the front panel, can be used to easily identify the server when it is part of a large stack of servers. A single blue LED located at the back edge of the server board next to the backup battery is visible through the rear panel. The two LEDs mirror each other and can be illuminated by the Baseboard Management Controller (BMC) either by pressing a button on the chassis front panel or through server-management software.
2.6 Power-On Self Test (POST) The BIOS indicates the current testing phase during POST by writing a hex code to the Enhanced Diagnostic LEDs, located on the rear of the server main board and visible through the back of the chassis. If errors are encountered, error messages or codes will either be displayed to the video screen, or if an error has occurred prior to video initialization, errors will be reported through a series of audible beep codes. POST errors are logged in to the System Event Log (SEL).
TABLE 2-9 Standard POST Error Messages and Codes (Continued)

Error Code  Error Message             Pause On Boot
104         CMOS options not set      Yes
105         CMOS checksum failure     Yes
106         CMOS display error        Yes
107         Insert key pressed        Yes
108         Keyboard locked message   Yes
109         Keyboard stuck key        Yes
10A         Keyboard interface error  Yes
10B         System memory size error  Yes
10E         External cache failure    Yes
113         Hard disk 0 error         Yes
114         Hard disk 1 error         Yes
115         Hard disk 2 error         Yes
116         Hard disk 3 error         Yes

TABLE 2-10 Extended POST Error Messages and Codes

Error Code  Error Message                      Pause On Boot
8100        Processor 1 failed BIST            No
8101        Processor 2 failed BIST            No
8110        Processor 1 internal error (IERR)  No
8111        Processor 2 internal error (IERR)  No
8120        Processor 1 thermal trip error     No
8121        Processor 2 thermal trip error     No
8130        Processor 1 disabled               No
8131        Processor 2 disabled               No
8140        Processor 1 failed FRB-3 timer     No
8141        Processor 2 failed FRB-3 timer     No
8150        Processor 1 fail
TABLE 2-11 BMC-Generated POST Beep Codes

Beep Code  Description
2-1-2-3    Check ROM copyright notice.
2-2-3-1    Test for unexpected interrupts.

TABLE 2-13 Memory 3-Beep and LED POST Error Codes

Beep Code  Port 80h  Diagnostic LED Decoder, MSB to LSB (G = green, R = red, A = amber)  Meaning
3          00h       Off Off Off Off
3          01h       Off Off Off G
3          02h       Off Off G Off
3          03h       Off Off G G      First row memory test failure
3          04h       Off G Off Off    Mismatched DIMMs in a row
3          05h       Off G Off G
3          06h       Off G G Off
3          07h       Off G G G
           08h       G Off Off Off
           09h       G Off Off G
           0Ah       G Off G Off
           0Bh       G Off G G

TABLE 2-14 BIOS Recovery Beep Codes

Beep Code                                Port 80h  Error Message      Description
1                                                  Recovery started   Start recovery process.
Series of long low-pitched single beeps  EEh       Recovery failed    Unable to process valid BIOS recovery images. BIOS already passed control to OS and flash utility.
Two long high-pitched beeps              EFh       Recovery complete  BIOS recovery succeeded, ready for power-down, reboot.
FIGURE 2-7 Examples of POST LED Coding

The POST LEDs are read from the back of the server, high bits on the left and low bits on the right. Each bi-color LED encodes one bit of the POST code's upper nibble (its high bit) and one bit of the lower nibble (its low bit):

Off   = high 0, low 0
Green = high 0, low 1
Red   = high 1, low 0
Amber = high 1, low 1

Example: POST code 95h (upper nibble 1001 = 9h, lower nibble 0101 = 5h) reads Red, Green, Off, Amber.
Example: POST code CAh (upper nibble 1100 = Ch, lower nibble 1010 = Ah) reads Amber, Red, Green, Off.

During the POST process, each light sequence represents a specific Port-80 POST code.
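Because the nibble-to-LED mapping is mechanical, it can be expressed as a short script, which is handy when transcribing LED readings in the field. This is an illustration for a POSIX shell on a workstation, not a command on the NAS; the color encoding (Off=00, Green=01, Red=10, Amber=11) is taken from Figure 2-7.

```shell
#!/bin/sh
# Decode a Port-80 POST code into the four LED colors of Figure 2-7.
# Each LED shows two bits: one from the upper nibble (its high bit)
# and one from the lower nibble (its low bit).
decode_post() {
  code=$(( $1 ))
  hi=$(( (code >> 4) & 0xF ))   # upper nibble
  lo=$(( code & 0xF ))          # lower nibble
  out=""
  for i in 3 2 1 0; do          # MSB first, as viewed from the back
    h=$(( (hi >> i) & 1 ))
    l=$(( (lo >> i) & 1 ))
    case $(( h * 2 + l )) in
      0) out="${out:+$out }Off" ;;
      1) out="${out:+$out }Green" ;;
      2) out="${out:+$out }Red" ;;
      3) out="${out:+$out }Amber" ;;
    esac
  done
  echo "$out"
}

decode_post 0x95   # first example in Figure 2-7: Red Green Off Amber
decode_post 0xCA   # second example: Amber Red Green Off
```

Running it against the two codes from Figure 2-7 reproduces the LED sequences shown there, which is a quick way to check a transcription against Tables 2-15 and 2-16.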
TABLE 2-15 POST Code Boot Block POST Progress LED Code Table (Port 80h Codes) (Continued) Diagnostic LED Decoder (G = green, R = red, A = amber) Description 12h Off Off G R Get start of initialization code and check BIOS header. 13h Off Off G A Memory sizing. 14h Off G Off R Test base 512K of memory. Return to real mode. Execute any OEM patches and set up the stack. 15h Off G Off A Pass control to the uncompressed code in shadow RAM.
TABLE 2-16 POST Progress LED Code Table (Port 80h Codes) (Continued)
Diagnostic LED decoder (G = green, R = red, A = amber)

POST Code  LEDs         Description
2Ah        G G Off A Off... 2Ah  G Off A Off  Go to Big Real mode.
2Ch        G G   R Off  Decompress INT13 module.
2Eh        G G   A Off  Keyboard controller test: the keyboard controller input buffer is free. Next, the BAT command will be issued to the keyboard controller.
30h        Off Off R R  Swap keyboard and mouse ports, if needed.
TABLE 2-16 POST Progress LED Code Table (Port 80h Codes) (Continued)

POST Code  LEDs         Description
5Ah        G R G   R    8254 timer test on channel 2.
5Ch        G A Off R    Enable 8042. Enable timer and keyboard IRQs. Set video mode: initialization before setting the video mode is complete. Configuring the monochrome mode and color mode settings next.
5Eh        G A G   R    Initialize PCI devices and motherboard devices. Pass control to video BIOS.
TABLE 2-16 POST Progress LED Code Table (Port 80h Codes) (Continued)

POST Code  LEDs         Description
84h        R G Off Off  Check stuck key enable keyboard: the keyboard controller interface test is complete. Writing the command byte and initializing the circular buffer next.
86h        R G G   Off  Disable parity NMI: the command byte was written and global data initialization has completed. Checking for a locked key next.
TABLE 2-16 POST Progress LED Code Table (Port 80h Codes) (Continued)

POST Code  LEDs             Description
ACh        A G R   Off      Prepare USB controllers for operating system.
AEh        A G A   Off      One beep to indicate end of POST. No beep if silent boot is enabled.
000h       Off Off Off Off  POST completed. Passing control to INT 19h boot loader next.

2.7 OS Operations
At the CLI, enter “fsck <volume name>”. You are then prompted whether repairs should be made if errors are found. Generally, the answer should be “y” for “yes”. The other potentially useful option is “n” for “no”, which runs a check against the volume without writing the repairs. As noted above, this can be used to decide whether to run a repairing filesystem check. If the filesystem check reports errors, repeat the check until a pass completes with no errors.
6. Select option “1”, Edit Fields. The available options are as follows:
■ Capture File—Where to save the capture file, in the format /volumename/directory/filename.
■ Frame Size (B)—Size in bytes of each frame to capture. The default is normally used.
■ IP Packet Filter—”No” captures all traffic; “Yes” allows you to filter what is received.
A filter allows you to select which IP address or addresses you will capture traffic from. You can also filter on a particular TCP or UDP port.
Listed next are Creation time, FS Creation time, and FS mtime. These are timestamps associated with the file and the filesystem, generally only useful for troubleshooting timestamp issues. Next is the Windows security descriptor. In its simplest form, it will read “No security descriptor”. This means that no Windows security is present, and that Windows will simulate security based on the above NFS permissions.
A list of all trusted domains, their related SIDs, and the local machine and local domain SIDs. 2.7.6 FTP Server To use the built-in FTP server, you must load the FTP daemon from the command line. The command is as follows: load ftpd This allows you to FTP files to and from the Sun StorEdge 5310 NAS.
Important – After the reboot, the system may take as long as five minutes to complete the software upgrade and return to service. There is no visual indication that this process is taking place. The StorEdge LCD displays “…booting…” during this process. If it is necessary to check the status of the upgrade, connect a display to the StorEdge. To update the operating system via the Web interface: 1. To use the Web Admin, connect with a Web browser to http://<server name or IP address>. 2.
10. When the update process is complete, click Yes to reboot, or No to continue without rebooting. The update does not take effect until the system is rebooted. To update the operating system via file copy: 1. Access the StorEdge via SMB or NFS. 2. Via SMB, access the share c$. You must be a member of the local Administrators group to access this share. Via NFS, mount to /cvol. By default, this is only possible from a trusted host. 3. In either case, copy the operating system image to the root of /cvol. 4.
2.11 CIFS/SMB/Domain Changes to Windows group membership do not take effect. Changes to user mapping do not take effect. Windows clients use a data structure called an access token to carry user identity and group membership. This token is assigned when the client connects to the StorEdge. Any changes to this token are not applied until the next time the user connects. To cause any changes to take effect immediately, ensure that the user closes all sessions with the StorEdge.
3. Press the spacebar until “Diagnostics” is displayed under “Extensions” at the lower right. 4. Select the letter corresponding to “Diagnostics.” “Please wait…” is then displayed in the upper left. After a short time, the system diagnostics are displayed. 5. Scroll through the diagnostics with the [spacebar] and [b] keys. 6. Under the heading “NETBIOS Cache”, look for an entry with a <1D> tag. <1D> identifies the segment’s master browser. 7. Verify that this entry matches your domain name and IP subnet. 8.
8. If no applicable messages are found, repeat the attempt to join the domain, and check the log again. The system log is also available through the StorEdge Web Admin. To access it, log in, and navigate to: Notification and Monitoring/View System Log. You can scroll through the log, or save it as a file. To check the NetBIOS cache, proceed as follows: 1. Access the StorEdge via Telnet. 2. Press enter at the [menu] prompt and enter the administrator password. 3.
Using these two information sources, you can begin to diagnose the problem. The following are the most common possible problems along with their indicating symptoms. Wrong password / insufficient permissions: This is usually indicated by a logon failure or access denied message in the system log. The user account that is entered into the StorEdge Domain configuration screen must have the correct password, and must have the authority to create computer accounts.
Multiple subnets connected to StorEdge: Care must be taken when StorEdge is connected to multiple subnets, particularly when the subnets are disjoint, i.e. not connected to one another. A common example of this is a direct connection to a backup or database server. The problem created by the disjoint subnets is that StorEdge registers each of its IP addresses via NetBIOS broadcast and/or WINS.
6. Reboot the StorEdge. Note that the above changes do not take effect until after the reboot. This action removes the undesired entries in almost every case. The only case where the entries may persist is in a multiple server WINS environment using replication. In this case, consult the provider of the WINS server operating system for removal instructions.
Assuming that the difficulty connecting to the Domain Controller is temporary, and related to network load, it should not be necessary to save this variable with the savevars command. Doing so will limit the ability of StorEdge to find an alternate Domain Controller in the case that this one fails. Cannot connect or authenticate to Windows 2003 Domain Controller. By default Windows 2003 is configured to require signed digital communications from clients. This is also known as SMB packet signing.
5. Select the letter corresponding to “Domain Configuration”. 6. Use the [Enter] or [Tab] key to navigate to the User name field. 7. Enter a user name for the listed domain with the rights to add a computer account. 8. Press [Enter] to move to the “Password” field. 9. Enter the password for this user. 10. Select option “7”, Save Changes. If the attempt to join the domain is unsuccessful, proceed according to the instructions in the Troubleshooting Guide: “Cannot join Windows Domain”.
StorEdge has same files in 2 different shares. This is caused by creating multiple share names that point to the same directory or volume. Shares always point to a directory. Root level shares will always contain all files on the volume, regardless of how many shares are created to this volume. View shares as pointers, with the understanding that many of these pointers may exist to a single location. User maps are incorrect. User maps are not automatically created.
Another way to resolve this, for users with primary group assignments in the passwd file, is to use the “Map to Primary Group” policy. Can’t copy files larger than 4 GB from Windows to StorEdge. This problem may be seen on Windows 2000 and prior versions. On Windows 2000, it can be fixed by applying the latest service pack. On older versions there is no fix available, though you may be able to work around the problem with the Windows backup utility or a similar third-party solution.
■ Primary Group: The group name or SID of the group owner. ■ Discretionary Access Control List (DACL): A list of users who have access to the file, by SID. A SID is a number that uniquely identifies a user or group. The data to the right of the final dash identifies the user within the domain; the rest of the number indicates the domain and the type of account. This user-specific portion is known as the RID (relative ID). The RID is the number used for user mapping.
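As a sketch of the SID layout described above, the RID can be read off as the value after the final dash. The SID string below is a made-up example, and `rid_from_sid` is an illustrative helper, not a StorEdge function:

```python
# Extract the RID (relative ID) from a Windows SID string.
# The RID is the value after the final dash; the rest of the string
# identifies the domain and the type of account.

def rid_from_sid(sid: str) -> int:
    return int(sid.rsplit("-", 1)[1])

# Hypothetical example SID; the trailing 1003 is the RID used for mapping.
sid = "S-1-5-21-917267712-1342860078-1792151419-1003"
print(rid_from_sid(sid))  # 1003
```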
Listed next are Creation time, FS Creation time, and FS mtime. These are timestamps associated with the file and the filesystem, generally only useful for troubleshooting timestamp issues. Next is the Windows security descriptor. In its simplest form, it will read “No security descriptor”. This means that no Windows security is present, and that Windows will simulate security based on the above NFS permissions.
3. After setting any variables on the StorEdge, i.e. anytime the “set” command is used, the command savevars must be entered at the command line in order for the settings to persist through future server reboots. CIFS/SMB share created to /cvol is not visible or accessible. StorEdge does not allow the export of /cvol by default. The /cvol volume exists on compact flash memory which is very space limited and contains sensitive operating system files.
2. At the CLI, enter “set srvsvc.netshare.enable yes”. After setting any variables on the StorEdge, i.e. anytime the “set” command is used, the command "savevars" must be entered at the command line in order for the settings to persist through future server reboots. CIFS/SMB disconnects from MS SQL Server. CIFS/SMB disconnects from MS Access. Though we do not provide support for either of these environments, we have discovered a setting to improve operations.
Server:
Ipaddr:
3. To force the StorEdge to a preferred domain controller, set the smb.pdc variable to the IP address of your preferred DC and (re)join the domain. From the command line interface, type:
set smb.pdc 192.168.200.136 (IP address of domain controller)
savevars
menu
4. Press the space bar until the “SMB/CIFS Setup” option is displayed in the extension section at the lower right. 5. Select the letter of that option. 6. Enter “1” to edit. 7. Enter the domain information. 8. Enter “7” to save. 9.
Connection to SAMBA domain controller fails. Although support for Samba to act as a primary domain controller has recently been announced (http://www.samba.org), the current implementation has several problems, as outlined below. Use of StorEdge with a Samba domain controller is not recommended at this time. The Samba PDC implementation is an ASCII-only implementation, i.e. it does not support Unicode, which impacts foreign language support. All Windows domain controllers support Unicode.
137/tcp  NETBIOS Name Service
137/udp  NETBIOS Name Service
138/tcp  NETBIOS Datagram Service
138/udp  NETBIOS Datagram Service
139/tcp  NETBIOS Session Service
139/udp  NETBIOS Session Service
Windows local groups cannot be added to Access Control List. Windows Local Groups cannot be used to assign security on remote systems. Local groups are not stored in the Domain SAM database. They exist in the database of individual computers, for use on that computer only.
4. First, select option “N”, “No”. This allows the database to be checked read-only. 5. If errors are reported, run dbck again as above, and select “Y”, “Yes” to perform the repairs. Can’t create new DTQ. A DTQ cannot be defined for existing regular directories. A DTQ must be created using the StorEdge administration interfaces (Telnet, GUI, or command line), which automatically create a new directory as part of the DTQ setup. The maximum number of DTQs per volume is 255.
2.12 NFS Issues NFS root user doesn't have appropriate access. StorEdge implements a feature known as “root squash”. When a user connects as root (UID 0) from an NFS client, StorEdge causes the UID to be mapped to UID 60001, the “nobody” account. In order for an NFS client to have root access to StorEdge, you must create a trusted hosts entry, or explicitly define root access for a particular export.
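The root-squash behavior described above amounts to a simple mapping rule, sketched here in Python. The helper name `effective_uid` and its parameters are hypothetical illustrations, not the StorEdge implementation:

```python
# Sketch of "root squash": requests from UID 0 are remapped to 60001
# ("nobody") unless the client is a trusted host or the export
# explicitly grants root access.

NOBODY_UID = 60001

def effective_uid(uid, client, trusted_hosts, root_exports, export):
    if uid != 0:
        return uid                 # non-root users pass through unchanged
    if client in trusted_hosts or export in root_exports:
        return 0                   # root access explicitly granted
    return NOBODY_UID              # root is squashed to "nobody"

print(effective_uid(0, "10.0.0.5", set(), set(), "/vol1"))            # 60001
print(effective_uid(0, "10.0.0.5", {"10.0.0.5"}, set(), "/vol1"))     # 0
print(effective_uid(500, "10.0.0.5", set(), set(), "/vol1"))          # 500
```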
Modifying the system policy is also done at the CLI. Access the CLI as above, and enter “set acl.overwrite.allowed”. After setting any variables on the StorEdge, i.e. anytime the “set” command is used, the command "savevars" must be entered at the command line in order for the settings to persist through future server reboots. This particular variable setting will not take effect until the next reboot of the StorEdge. Trusted host does not have root access.
4. Finally, type “approve update”. This causes the new file to become active. Now, you should be able to mount StorEdge from a trusted host via NFS. Local NIS files are no longer updating. The first step is to check the system log. This will tell us if there is a problem connecting to the NIS server. 1. To do so, access the StorEdge via Telnet. 2. Type “admin” at the [menu] prompt and enter the administrator password. 3. At the CLI, enter “menu”. 4. Select option “2”, Show Log.
Windows created files are root owned when viewed via NFS. (In Workgroup mode) Workgroup mode assigns ownership per share, based on the UID and GID settings configured when the share was defined. By default, this is set to UID and GID 0, leaving the files root owned. The best way to manage ownership of files in Workgroup mode is to have each user access StorEdge via a unique share, and define the UID/GID settings accordingly. International NFS filenames are garbled or cannot be read from Windows.
3. Type the administrator password to access the administration interface. 4. Navigate to System Operations/Assign Language. Select the desired language, and click the “Apply” button. GID for new NFS objects is incorrect. StorEdge doesn’t recognize the set GID bit. The StorEdge software supports three ways of setting the group ID of new files and directories. The default is to inherit GID from the parent directory in all cases. This behavior is configurable only at the Command Line Interface (CLI). 1.
2.13 Network Issues When is it necessary to add a TCP/IP route? By default, StorEdge creates a route for each connected subnet. StorEdge also allows for the configuration of a default gateway. The local routes are used to send packets to the attached subnets, and packets to all other IP addresses are sent via the default gateway. This configuration works for the vast majority of networks.
8. Select option 7, “Save Changes”. 9. Press [Esc] to return to the menu, or proceed as above to define another route. 2.13.1 NIC speed and duplex negotiation issues. StorEdge is reporting Ethernet transmit and receive errors on a switched network. By default, the StorEdge Ethernet driver is set to auto-negotiate speed and duplex. This works well in the great majority of cases, but occasionally there is a problem in the negotiation between the switch and NIC.
2. To view the current negotiated rate, enter "em show all". The name of each NIC and its current speed, duplex, and link status are displayed. 3. To force a particular rate (and disable auto-negotiation), enter “em set <NIC name> duplex=<duplex> speed=<speed>”. Replace <NIC name> with the name of the NIC found in the above show command, <duplex> with either “full” or “half”, and
1. To access the StorEdge CLI, connect to the StorEdge via Telnet, and type “admin” at the [menu] prompt and enter the administrator password. 2. To disable RIP, enter “set routed.active no” at the CLI. Press the [Enter] key. After setting any variables on the StorEdge, i.e. anytime the “set” command is used, the command "savevars" must be entered at the command line in order for the settings to persist through future server reboots. StorEdge is only reachable from systems on the local subnet.
This unnecessarily limits network bandwidth. The easiest solution is to link the cards at a lower level via port aggregation. 2.14 File System Issues Note – A full backup should be done before performing the following procedures. File system inaccessible (mount failure) Under certain circumstances, volumes may fail to mount. This will typically manifest as an “access denied” message returned to users attempting to access the data on the affected volume.
Before making this diagnosis, it is very important to ensure that you are checking the bootlog containing the first unsuccessful mount attempt. Regardless of what the original problem is, the second attempt and all subsequent unsuccessful attempts to mount the volume will always log the “previous mount did not complete” message. If you are 100% certain that the problem is an interrupted mount attempt, you should be able to correct the problem by entering “mount –f <volume name>” at the CLI.
RAID configuration changed. If syslogd logging was enabled, these results should be included as well. Find out whether remote access is available to the site, and if appropriate, provide the necessary details to accomplish this. After reviewing the case, engineering may make specific recommendations and modifications, or they may recommend that you proceed with the filesystem repair.
After reviewing the case, engineering may make specific recommendations and modifications, or they may recommend that you proceed with the filesystem repair. For instructions on how to complete a filesystem repair, see “Filesystem check procedure” under Diagnostic Procedures at the end of this document.
checkpoints. It may be necessary to disable the “Use checkpoints for backup” option in the StorEdge Volume Configuration screen in order to perform the prerequisite backup. 2.15 Drive Failure Messages Note – Check the WebAdmin and system log to ensure the drive rebuild is completed before performing the following procedures. The light on one of the hard drives is red. This is an indication of a failed drive. Check the system log to ensure the drive rebuild has completed.
A drive has failed; how do I replace it? Failed drives are usually evident by the following:
■ Red light is on the drive.
■ Log will display “Failed drive at slot #”.
■ Diagnostic email will list the drive as failed.
■ LUN will be reported as degraded in the log and in the diagnostic email.
■ Controller alarm may be beeping.
Some or all of these symptoms may exist. The process is the same regardless of how the failed drive is reporting. This functionality is only available through the StorEdge Web Admin. 1.
Are there any messages in the log other than “LUN Critical”? (Drive failed, etc.) The controller alarm may be beeping. Follow the instructions on drive replacement. Once the drive has been replaced, the system will rebuild the drive and the LUN will go from critical to online. The alarm will continue to beep until the rebuild is complete. The alarm can be silenced from the RAID page of the GUI or from the menu. 2.16 File and Volume Operations StorEdge doesn’t allow creation of volumes larger than 256 GB.
After deleting files, volume free space remains the same. The most likely cause of this is the checkpoint feature. Checkpoints store deleted and changed data for a defined period of time so that customers can retrieve deleted files and prior versions for data security. This means that the data is not removed from disk until the checkpoint is expired, a maximum of two weeks. If you are deleting data to free disk space, it will be necessary to remove or disable checkpoints. Can’t delete a file.
Log message “mbtowc[0xXX]: invalid first byte”, or “: invalid sequence”. This message is generated when StorEdge receives a filename or network name with a character that is unreadable. ASCII (plain text) characters are expressed with a byte value of 0x7F or below. Values above this range are expected to be Unicode encoded, per the UTF-8 specification. The encoding requires multiple bytes per character. In order for the character to be valid, the first byte must be in the range 0xC0 through 0xFD.
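The byte ranges described above can be expressed as a small check. This is an illustration of the rule, not the StorEdge mbtowc code, and `valid_first_byte` is a hypothetical helper name:

```python
# A byte at or below 0x7F is plain ASCII. Anything above must begin a
# UTF-8 multi-byte sequence, whose first byte falls in 0xC0-0xFD; other
# values would trigger the "invalid first byte" log message.

def valid_first_byte(b: int) -> bool:
    if b <= 0x7F:              # plain ASCII character
        return True
    return 0xC0 <= b <= 0xFD   # legal UTF-8 lead byte per the text above

print(valid_first_byte(0x41))  # True  ('A', ASCII)
print(valid_first_byte(0xC3))  # True  (lead byte of a 2-byte sequence)
print(valid_first_byte(0x8E))  # False (continuation byte in first position)
```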
Verify Java client version. Version 1.3.1.1 or newer of the Java client is required. If no Java Client is installed on the client connecting to the StorEdge, you will normally be prompted for installation. If not, the client can be found at http://www.java.com. If you receive error messages related to the Java Client (or JRE), uninstall and reinstall the client. If problems continue, one of the most helpful troubleshooting steps is to try another workstation or two.
Web GUI session aborted while performing administration. For integrity reasons, only one user is permitted in the StorEdge Web Admin at a given time. A second connection to the Web Admin will terminate the first. This implementation is necessary to allow recovery from client hangs. How do I reset the StorEdge administrator password? The management administrator password can only be reset via a direct connection to the StorEdge.
PROMPT> rpcinfo -p
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    ...
 805898577    1   udp    693  webadmin
In the above example, UDP port 693 would have to be opened. The port is always in the range 600 to 1023 but may vary based on system parameters. The keyboard arrow keys do not respond properly when using Telnet to the StorEdge. The arrow keys generally do not work within StorEdge Telnet menus, and often cause an immediate exit from the current screen.
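The rpcinfo output shown above can also be scanned programmatically to find which UDP port to open for webadmin. The program number and port below come from the example; real output will differ, and `webadmin_udp_port` is a hypothetical helper:

```python
# Pick the webadmin UDP port out of `rpcinfo -p` output so the matching
# firewall rule can be added. Sample output modeled on the example above.

rpcinfo_output = """\
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
 805898577    1   udp    693  webadmin
"""

def webadmin_udp_port(text: str):
    for line in text.splitlines():
        fields = line.split()
        # data rows have five fields: program, vers, proto, port, service
        if len(fields) == 5 and fields[4] == "webadmin" and fields[2] == "udp":
            return int(fields[3])
    return None

print(webadmin_udp_port(rpcinfo_output))  # 693
```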
6. Navigate to System Operations/Shut down the server. 7. Select the “Reboot previous version” radio button. 8. Click Apply. After software upgrade and reboot, system appears to hang, LCD displays “…booting”. The software upgrade process may take as long as five minutes. No indication of progress is available. This is standard; the system will boot normally in a few minutes. If the process takes longer than five minutes, connect a VGA display to StorEdge to check status. Software Upgrade is not working.
Once a StorEdge volume reaches 90% disk space utilization, StorEdge will cease to create scheduled checkpoints. Once the volume reaches 95% disk space utilization, StorEdge will delete checkpoints, beginning with the oldest. Another possible reason for checkpoint creation failure is that the checkpoint limit for a particular volume has been reached.
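The utilization thresholds described above amount to a simple policy, sketched here. The function and return labels are hypothetical illustrations of the described behavior, not StorEdge code:

```python
# Checkpoint policy per the text above: at 90% utilization scheduled
# checkpoints stop being created; at 95% checkpoints are deleted,
# beginning with the oldest.

def checkpoint_action(pct_used: float) -> str:
    if pct_used >= 95:
        return "delete-oldest"    # reclaim space from oldest checkpoints
    if pct_used >= 90:
        return "skip-scheduled"   # stop creating scheduled checkpoints
    return "create"               # normal operation

print(checkpoint_action(85))  # create
print(checkpoint_action(92))  # skip-scheduled
print(checkpoint_action(97))  # delete-oldest
```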
This means that the email could not be sent. It is always accompanied by a more specific error. Error in tcp_open for servername This means that StorEdge could not open a TCP connection to the configured mail server. Possible reasons include a name-resolution failure, an incorrect IP address, or the IP address being unreachable due to a network problem. To correct this issue, enter the mail server by IP address, and make sure the IP address is correct and reachable.
If AC power has been restored for 30 minutes, and the UPS is not charged completely, check the UPS battery. Log message: Low Battery. After StorEdge receives the Low Battery notification from the UPS, this message is logged and the shutdown process is initiated in order to protect customer data. Log message: Controller write-back cache is enabled. System AC power and UPS have returned to a reliable state. Write-back cache is enabled. Log Message: Blower Fan has failed.
And then the following message. 07/08/04 11:00:38 I sysmon[51]: Ctlr0: write-back cache disabled If the power is returned to the UPS, you will see the following message: 07/08/04 11:01:09 I sysmon[51]: UPS: AC power restored Front LCD panel message: “P21 Power 1 Failed”. This is an indication that one of the two power supplies has failed. Verify that one or more of the following indicators are present: The system status light on the front of the StorEdge is flashing green.
What do the Status LED indicators on the front panel indicate? LED status indicators on the front panel signal current activities taking place in the system.
TABLE 2-17 Status LED Indicators
Power LED           A continuous green LED indicates the system is powered on. No light indicates the system is off.
Built-in NIC 1 LED  A green LED indicates network activity via the built-in NIC port 1.
Built-in NIC 2 LED  A green LED indicates network activity via the built-in NIC port 2.
2.20 Backup Issues Tape library not recognized. Check the following:
■ Make sure the tape drive is on the list of supported tape units.
■ Set the SCSI ID of the tape library to 0 and the ID of the tape drive to 5.
■ Verify that the SCSI card recognizes the drive at system boot, i.e. that the card can talk to the tape drive: during boot, enter the SCSI card BIOS and run the scan utility.
■ If no device is found, check cables and termination, or try another tape drive.
Network backup fails due to .attic$ directory. The .
NDMP backup fails: access denied message. NDMP software must authenticate to the StorEdge in order to backup files and directories. Each NDMP software solution has a place to configure a username and password for a device. For StorEdge, the username is “administrator”, and should be accompanied by the console password. NDMP: Can’t browse backup history. Certain NDMP backup utilities are not able to browse backup history without a configuration change on the StorEdge.
2.21 Direct Attached Tape Libraries Tape library not recognized. Check the following settings:
■ Make sure the tape drive is on the list of supported tape units.
■ Set SCSI ID of tape library to 0, ID of tape drive to 5.
■ Does the SCSI card recognize the drive on system boot up? See if the card can talk to the tape drive: on boot, get into the SCSI card BIOS and run the scan utility.
■ If no device is found, check cables and termination. Try another tape drive.
Network backup fails due to .attic$ directory. The .
Local backup or restore fails with “PNReduce error” in log. This message indicates that StorEdge could not read the pathname (PN) provided for backup or restore. The local backup and restore utilities require a full path, they are case sensitive, and they do not allow the use of wildcards (“*” or “?”). Specifying a directory causes it to be backed up in its entirety. NDMP: Can’t browse backup history.
2.22 Frequently Asked Questions This section addresses frequently asked questions for the Sun StorEdge 5310 NAS.
5. Select the letter corresponding to “Local Groups”. 6. This takes you to a menu where you will see a list of all currently configured groups. By default, in Domain mode, the “Administrators”, “Power Users” and “Backup Operators” local groups exist. To add a group, press “8”, Add a Group, from this screen. To edit group settings, or to delete a group, press the letter to the left of the group name.
Note – Workgroup mode refers not only to the lack of domain membership, but the use of share-level security. For more information on this topic, refer to the Sun StorEdge 5310 NAS Software Installation, Configuration, and User Guide. How do I share files with SMB users? How do I create SMB shares? To share files via SMB, shares must be created. A share allows access to a particular location in the directory tree. To access this functionality, access the StorEdge via Telnet or serial console. 1.
■ Umask—In Workgroup mode, these NFS permission bits will be cleared when creating new files.
Workgroup mode settings are ignored when Windows Domain Security is enabled. Workgroup mode on the Sun StorEdge 5310 NAS also implies use of what Microsoft calls “Share-level Security.” This functionality is also available through the StorEdge Web Admin. 1. To use the Web Admin, connect with a Web browser to http://<server name or IP address>. 2.
What does the umask setting do? The umask setting allows the permissions of new files and directories to be specified on a per-share basis, which is consistent with the per-share UID and GID specification. A umask is a file creation mask. It defines the permission bits to turn off when creating a file. Bits that are set in the umask are cleared in the mode of a newly created file.
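The rule above is a bitwise operation: the new file's mode is the requested mode ANDed with the complement of the umask. A small sketch (the helper name is illustrative):

```python
# A umask clears permission bits on newly created files:
#   mode = requested_mode & ~umask

def apply_umask(requested_mode: int, umask: int) -> int:
    return requested_mode & ~umask

# A 022 umask clears the group and other write bits:
print(oct(apply_umask(0o666, 0o022)))  # 0o644
# A 077 umask leaves owner-only access:
print(oct(apply_umask(0o666, 0o077)))  # 0o600
```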
4. Navigate to Windows Configuration/Manage SMB CIFS Mapping/Configure User Mapping. There you will see radio buttons for each of three user mapping options and each of three group mapping options. The user mapping options are as follows: ■ No mapping: This is the default setting. When a new user connects, a new UID is generated by StorEdge. This UID will be one larger than the largest current UID found on the StorEdge. Any desired mapping of SMB users to NFS users must be done manually.
■ MAP_FULLNAME—Map by Full Name
Valid options for smb.map.groups:
■ MAP_NONE—No Mapping
■ MAP_GROUPNAME—Map by Group Name
■ MAP_UNIXGID—Map to Primary Group
Example: "set smb.map.users MAP_USERNAME" will define the mapping rule for users to Map by User Name. Note – All variable names and values are case sensitive. After setting any variables on the StorEdge, i.e.
on the Windows Domain Controllers. Note that changing a user’s RID in the StorEdge administration interface is not possible. Modifying the value collected from the Domain Controller will simply invalidate the mapping. On the right side of the screen, you will see the NFS username, which may or may not have been automatically generated based on the defined mapping rule. Option “7” refreshes the list of mappings, adding any new users.
How do I set up the SMB Autohome directory feature? Autohome shares are temporary shares that are created when a user logs on to the system and removed when the user logs off. The autohome path defines the base directory path for the shares. For example, if a user's home directory is /usr/home/john, then the autohome path should be set to /usr/home. The temporary share will be named john. It is assumed that the user's home directory name is the same as the user's logon name.
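The naming rule described above can be sketched as follows, using the /usr/home/john example from the text. The helper `autohome_share` is a hypothetical illustration, not a StorEdge API:

```python
# Autohome rule: the temporary share is named after the user's logon
# name and resolves to that name under the configured autohome path.

import posixpath

def autohome_share(autohome_path: str, logon_name: str):
    """Return (share name, resolved home directory) for a logon."""
    return logon_name, posixpath.join(autohome_path, logon_name)

print(autohome_share("/usr/home", "john"))  # ('john', '/usr/home/john')
```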
2. Press [Enter] at the [menu] prompt and enter the administrator password. 3. Press the spacebar until “CIFS/SMB Configuration” is displayed under “Extensions” at the lower right. 4. Select the letter corresponding to “CIFS/SMB Configuration”. 5. Select the letter corresponding to “Domain Configuration”. Therein, you will see a list of options as follows: ■ Domain—Name of Windows domain. ■ Scope—SMB scope, this is typically left blank. ■ Description—This is displayed in the network browse list.
ADS relies on the Internet Domain Name System (DNS) to provide name resolution services. The DNS provided with ADS supports the ability for clients to dynamically update their entries in the DNS database; this is known as dynamic DNS. To configure ADS, proceed as follows: 1. Access the StorEdge via Telnet or serial console. 2. Press [Enter] at the [menu] prompt and enter the administrator password. 3. Press the spacebar until “ADS setup” is displayed under “Extensions” at the lower right. 4.
After you have successfully configured these settings, you will be able to publish shares to ADS using the SMB/CIFS shares menu. Please refer to the FAQ “How do I create SMB shares?” for details on this procedure. This functionality is also available through the StorEdge Web Admin. 1. To use the Web Admin, connect with a Web browser to http://<server name or IP address>. 2.
Does StorEdge support DFS? DFS (distributed file system) is a hierarchical file system that allows files to be stored across multiple servers and managed as a single group. StorEdge can serve as a DFS target. This means that DFS referrals can redirect clients to StorEdge, but StorEdge does not provide referrals and cannot be configured as a root replica.
■ Broadcast—Enables or disables broadcast search for NIS servers. ■ Server—IP address of the NIS server. ■ Files (Hosts, Users, Groups, Netgroups)—Select which files should be imported from NIS with “Y”. ■ Check Rate (minutes)—How often to check the NIS server for changes. Network Information Service Plus (NIS+) ■ Enable—Enables or disables NIS+. ■ NIS+ Domain—Defines the NIS+ domain. ■ Broadcast—Enables or disables broadcast search for NIS+ servers. ■ Home Domain Server—IP address of the primary NIS+ server.
1. To use the Web Admin, connect with a Web browser to http://<StorEdge IP address>. 2. Click “Grant” or “Yes” to accept any Java software authorization windows and you will reach the login screen. 3. Type the administrator password to access the administration interface. 4. Navigate to Unix Configuration/Configure Name Service. Therein you will find tabs for users, hosts, hostgrps and netgroups. 5.
Any type of switch may be used for High Availability port bonding. Each NIC can be connected to a separate switch, and the switch hardware need not be similar. The only requirement is that all switches used for the HA bond are connected to the same subnet. How do I set up port aggregation? 1. Access the StorEdge via Telnet or serial console. 2. Press [Enter] at the [menu] prompt and enter the administrator password. 3. Select option “A”, “Host Name & Network” under the configuration section. 4.
What is IP aliasing? IP Aliasing is a networking feature that allows you to assign multiple IP addresses to a single NIC port. This is useful when StorEdge is replacing multiple servers. All of the IP aliases for the selected NIC port must be on the same physical network and share the same netmask and broadcast address as the first, or primary IP address specified for the selected adapter. Up to nine alias IP addresses can be added to the primary IP address of each NIC port.
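The same-subnet requirement for aliases can be checked mechanically. The sketch below is illustrative only (it is not how StorEdge validates aliases); the addresses are invented examples.

```python
import ipaddress

MAX_ALIASES = 9  # the text allows up to nine aliases per NIC port

def valid_alias(primary: str, netmask: str, alias: str) -> bool:
    """An alias must fall in the same subnet as the primary address,
    i.e. share its netmask and broadcast address."""
    net = ipaddress.ip_network(f"{primary}/{netmask}", strict=False)
    return ipaddress.ip_address(alias) in net

print(valid_alias("192.168.10.5", "255.255.255.0", "192.168.10.20"))  # True
print(valid_alias("192.168.10.5", "255.255.255.0", "192.168.11.20"))  # False
```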
How do I configure Jumbo Frames support? Currently this is not supported by the StorEdge software. Can I set more than one default gateway? No. The default gateway is the gateway used when a TCP/IP client needs to send data to a network to which it does not have a specific route. After checking the destination network against the routing table and finding no match, the data is sent to the default gateway. There is no provision for TCP/IP to choose between default gateways.
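The routing behavior described above — check the routing table first, then fall back to the single default gateway — can be sketched as follows. This is a generic TCP/IP illustration under our own assumptions, not the StorEdge routing implementation.

```python
import ipaddress

def next_hop(dest, routes, default_gateway):
    """Return the gateway for dest: longest-prefix match over the routing
    table; with no match, the single default gateway is used."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, gw) for net, gw in routes if addr in net]
    if matches:
        return max(matches, key=lambda m: m[0].prefixlen)[1]
    return default_gateway

routes = [(ipaddress.ip_network("10.1.0.0/16"), "10.1.0.1")]
print(next_hop("10.1.2.3", routes, "192.168.0.1"))    # 10.1.0.1
print(next_hop("172.16.0.9", routes, "192.168.0.1"))  # 192.168.0.1
```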
The primary interface for quota administration is the StorEdge Web Admin. To use the Web Admin, connect with a Web browser to http://<StorEdge IP address>. Click “Grant” or “Yes” to accept any Java software authorization windows and you will reach the login screen. Type the administrator password to access the administration interface. First, navigate to File Volume Operations/Edit Properties. There you will find a checkbox to enable quotas.
Hard Limits/Soft Limits: Same as KB Limits above, but applies to the number of files rather than their size. To define a quota, locate the desired user on the list, and double click the user entry. This will pop up a window which will allow you to define hard and soft KB limits, as well as hard and soft file limits. You can choose Default, No Limit, or Custom via radio buttons. Default applies the quotas defined for the default user, if any.
repquota /volumename: This lists all quotas for a given volume. The output is formatted into ten columns, as follows: Column 1: UID/GID or user/group name. Column 2: By default, this appears as two dashes (“--”); if the listed user has exceeded the soft or hard quota, the respective dash turns to a "+". Column 3: Current disk usage in blocks for this user. Column 4: Current soft block quota for this user. Column 5: Current hard block quota for this user.
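A hypothetical parser for the first five columns described above makes the flag convention concrete. The sample line is invented for illustration; field widths and exact formatting of real repquota output may differ.

```python
# Parse one repquota-style line: name, over-quota flags, block usage,
# soft block limit, hard block limit (columns 1-5 described in the text).
def parse_repquota_line(line: str) -> dict:
    fields = line.split()
    return {
        "name": fields[0],
        "over_soft": fields[1][0] == "+",  # dash flips to "+" when exceeded
        "over_hard": fields[1][1] == "+",
        "blocks_used": int(fields[2]),
        "soft_block_limit": int(fields[3]),
        "hard_block_limit": int(fields[4]),
    }

sample = "john +- 5200 5000 6000"   # hypothetical: over soft, under hard
print(parse_repquota_line(sample)["over_soft"])  # True
```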
Note – The DTQ creation interface only allows the creation of a DTQ on a new directory. The primary interface for configuring DTQs is the StorEdge Web Admin. 1. To use the Web Admin, connect with a Web browser to http://<StorEdge IP address>. 2. Click “Grant” or “Yes” to accept any Java software authorization windows and you will reach the login screen. 3. Type the administrator password to access the administration interface. 4.
■ dtq rename volume=volume-name from=dtq-name to=dtq-name This changes the name of the DTQ. Note that this does not change the name of the directory. ■ dtq set volume=volume-name name=dtq-name [flimit=N] [slimit=N] This modifies the file limit (flimit) or size limit in MB (slimit) for an existing DTQ. ■ dtq status [volume=volume-name] [name=dtq-name] [file=file_path] This shows detail on an existing DTQ. “dtq status volume=volume-name” returns the status of all DTQs on the volume.
Important – This will delete all previously defined quotas. If quotas are enabled in the future, all quotas must be redefined. Note – Deleting directories also deletes the DTQ set for that directory. This functionality is also available from the StorEdge CLI. 1. To access the StorEdge CLI, connect to the StorEdge via Telnet or serial console. 2. Type admin at the [menu] prompt and enter the administrator password. 3.
the checkpoints. A large amount of space and system memory is required for checkpoints. The more checkpoints there are on a system, the more they affect the system’s performance. Checkpoints are generated from changed blocks; if system usage involves a large number of file changes, the next checkpoint will be based on those changed blocks.
Checkpoints can be set up automatically or manually. If an automatic schedule is selected then day and time information must be configured for these checkpoints to occur. Checkpoints have a negative effect on system performance. This effect increases as checkpoints are added. Use them judiciously. To create automatic checkpoints, select “Yes” to “Automatic” and fill in each of the fields listed below: ■ Description – Enter a short description of the checkpoint. This is a mandatory field.
How do I manage checkpoints from the command line? To view all checkpoints on a volume: chkpntls volume-name A list of all checkpoints on the volume is displayed. To create a single checkpoint, enter the following command from the CLI: chkpntmk volume-name checkpoint-name volume-name is the volume that will be checkpointed; checkpoint-name is the name of the checkpoint to be created.
5. Enter the number of the volume that contains the checkpoints no longer required. 6. Select option “6”, “Checkpoints”, to configure checkpoints on the selected volume. 7. Select option “1”, “Edit fields”. 8. Select option “N”, “No”, to disable checkpoints on the selected volume. 9. Select option “7”, “Save Changes”. All existing checkpoints will be deleted and the volume will no longer create new checkpoints.
2.28 Volume Creation and Expansion How do I create a volume? The first step is to scan for new disks. The Scan for New Disks option on the Create File Volume panel allows you to scan for new disks that may have been recently added to the system. To scan for new disks: 1. Access the StorEdge via Telnet or serial console. 2. Press [Enter] at the [menu] prompt and enter the administrator password. 3. Select option “D”, “Disks and Volumes”. 4. Select option “9”, “scan for new disk”.
To access these settings, log in, and navigate to File Volume Operations/Create File Volumes. All of the options described above are available. How do I extend the size of an existing volume? A primary file volume is limited to 256GB; however, its size can be extended by attaching segments to it. Up to 63 segments can be attached to a single primary file volume. The first step is to create the segment: 1. Access the StorEdge via Telnet or serial console. 2.
You are about to add this new extension segment to this file volume. Doing so will increase the free space of the volume. This will only take a moment during which the file volume will remain in operation. ONCE THE EXTENSION IS ATTACHED TO THE FILE VOLUME, IT CANNOT BE DETACHED. THIS IS AN IRREVERSIBLE OPERATION. BE SURE! 17. Select option “1”, “Edit choice” to return to the previous screen, or option “7”, “Proceed” to continue with the attachment.
3. Select option “D”, “Disks and Volumes”. 4. Enter the number of the volume that is to be deleted. 5. Select option “8”, “Delete”. 6. As a sanity check, the system will prompt for the volume name. This is case sensitive and must be typed exactly as it was entered when the volume was created. 7. Select option “7”, “Proceed with delete”. 8. When complete, press the [Esc] key to return to the menu. This functionality is also available through the StorEdge Web Admin.
What is the function of the /cvol volume? /cvol is a DOS file system volume, located on the flash memory from which the StorEdge boots. It contains the operating system, the most recent previous version of the operating system, a base version of the operating system, and configuration and log information. It is strongly recommended that you do not write data to /cvol or modify data on /cvol.
Note – This screen provides access to the host group @general, which by default includes everyone who can reach the StorEdge.
How do I authorize an entire subnet as trusted hosts? This can only be done by directly editing the configuration file, /dvol/etc/hostgrps. 1. Access this file via NFS or SMB, and open it with an editor. 2. Edit the line which begins with the word “trusted”. Entries in hostgrps are plain text, separated by spaces. To allow trusted access to the entire 192.168.0.0/16 network, you would add the entry “192.168.*”. The finished hostgrps file should look something like this: general * trusted host1 host2 192.168.*
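The “192.168.*” entry behaves as a wildcard pattern. The sketch below shows the matching behavior with Python's fnmatch; it illustrates the concept and is not the actual StorEdge implementation.

```python
from fnmatch import fnmatch

# Patterns as they would appear on the "trusted" line of hostgrps.
trusted = ["host1", "host2", "192.168.*"]

def is_trusted(client: str) -> bool:
    """True if the client name or address matches any trusted entry."""
    return any(fnmatch(client, pattern) for pattern in trusted)

print(is_trusted("192.168.4.17"))  # True  (matches 192.168.*)
print(is_trusted("10.0.0.5"))      # False
```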
Selecting an export and clicking the remove button will remove the export. Selecting an export and clicking the edit button, or double-clicking an existing export, will bring up the edit screen. Only the “access” field may be changed in this screen. Other changes must be made by deleting the export and recreating it, or by editing the configuration files manually. Also, please note that it is not currently possible to change the order of exports in the Web Admin.
Security is the security setting for this export. Here’s an example: files / @trusted access=rw uid0=0 This entry gives access to all files ( / ) to the hostgrps entry trusted (the “@” symbol defines it as a hostgrps entry) with read/write (rw) security. The last entry, “uid0=0”, is a special entry which disables root squash for an export. Root squash is the default; it causes all mounts done as UID 0 to be done as UID 60001, the “nobody” account.
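The root-squash mapping just described reduces to a one-line rule. This sketch is illustrative only; the function name is ours, and only the UID values (0 and 60001) come from the text.

```python
NOBODY_UID = 60001  # the "nobody" account mentioned above

def effective_uid(request_uid: int, uid0: int = NOBODY_UID) -> int:
    """Root squash: UID 0 is remapped to the uid0 value, which defaults to
    60001; an export with uid0=0 disables the squash."""
    return uid0 if request_uid == 0 else request_uid

print(effective_uid(0))          # 60001  (default root squash)
print(effective_uid(0, uid0=0))  # 0      (uid0=0 disables squash)
print(effective_uid(1004))       # 1004   (non-root UIDs unchanged)
```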
8. Select option “7”, “Save Changes”. This functionality is also available through the StorEdge Web Admin. 1. To use the Web Admin, connect with a Web browser to http://<StorEdge IP address>. 2. Click “Grant” or “Yes” to accept any Java software authorization windows and you will reach the login screen. 3. Type the administrator password to access the administration interface. 4. Navigate to System Operations/Set Administrator Password. In this menu, there is no enable/disable field.
2. Select option “1”, “Edit path”. 3. Enter a valid path name in the path box. The format is /<volume>/<directory>/<filename>. 4. Select option “2”, “Save diagnostics file”. 5. StorEdge will respond “Diagnostic saved”. 6. Access the volume that you saved the file to via SMB or NFS. 7. Copy the file to a local workstation. This functionality is also available through the StorEdge Web Admin. 1. To use the Web Admin, connect with a Web browser to http://<StorEdge IP address>. 2.
8. Select option “7”, “Save Changes”. This will return you to the menu. 9. Access the StorEdge via Telnet or serial console. 10. Press [Enter] at the [menu] prompt and enter the administrator password. 11. Press the space bar until the “Email Configuration” option is displayed under “Extensions” at the lower right. 12. Select the letter corresponding to “Email Configuration”. 13. Select option “1”, “Edit fields”. 14. Press [Tab] or [Enter] to navigate through the fields. 15.
2. Press [Enter] at the [menu] prompt and enter the administrator password. 3. Select option “H”, “DNS & SYSLOGD” in the configuration section to set up remote logging. 4. Select option “1”, “Edit fields”. 5. Use [Tab] or [Enter] to navigate through the fields. 6. Select option “Y”, “Yes”, to enable SYSLOGD. 7. Enter the IP address of the SYSLOGD server that will receive the StorEdge system log (if applicable). 8. Select the appropriate Facility.
1. To use the Web Admin, connect with a Web browser to http://<StorEdge IP address>. 2. Click “Grant” or “Yes” to accept any Java software authorization windows and you will reach the login screen. 3. Type the administrator password to access the administration interface. 4. Navigate to Monitoring and Notification/View System Events/Set Up Logging. All of the options described above are available.
■ Max Polling Interval - Maximum polling rate for NTP messages. Note – The above two fields are expressed as powers of two, in seconds; for example, an entry of 4 sets the interval to 2^4 = 16 seconds. The valid range is 4 to 17. ■ Broadcast Client Enabled - Allows StorEdge to respond to NTP broadcasts. ■ Require Server Authentication - Allows NTP communication only with authentication. To set up RDATE, proceed as follows: 1. Press [Esc] to return to the menu. 2.
3. At the CLI, enter “load netm”. Then type “menu” to return to the menu, where you can configure and capture packets. 4. Press the spacebar until “Packet Capture” is displayed under “Extensions” at the lower right. 5. Select the letter corresponding to “Packet Capture”. 6. Select option “1”, “Edit Fields”. The available options are as follows: ■ Capture File - Where to save the capture file. ■ Frame Size (B) - Size in bytes of each frame to capture. The default is normally used.
The command line can be edited using the following key bindings. There is no overwrite mode. Characters will always be inserted at the cursor position. [Ctrl] + t can be used to display the key bindings at any time. The current command line will be redisplayed following the key list. From the command line: The StorEdge supports standard keyboard functionality for command history. The cursor movement keys can be used to select the following: ■ Previous commands (using up arrow).
■ To repeat the last command, enter [Ctrl] + p. ■ To back up and edit the current line, use [Ctrl] + b. ■ To delete the character under the cursor, enter [Ctrl] + d. How do I delete files from the StorEdge administration utilities? The operating system has some CLI commands available to perform advanced system administration. Caution must be exercised, as these commands can change data paths and structures. Under certain circumstances, a mistyped command can result in downtime or data loss.
Remove the directory '/vol1/dir1' if it is empty. SE5310 > rm /vol1/dir1 Remove the file hierarchy rooted at '/vol1/dir1' displaying each file as it is removed. SE5310 > rm -r -v /vol1/dir1 This removes all files and the directory. Note – All paths must be absolute paths from the root directory. To delete directories from the CLI using rmdir The rmdir utility removes the directory entry specified by each directory argument, provided it is empty. Arguments are processed in the order given.
Currently, the rdel utility cannot be unloaded, and therefore remains in memory until the next reboot. How do I set up an FTP server on StorEdge? StorEdge has a built-in FTP server. Before using it, you must load it via the CLI (command line interface). 1. To access the StorEdge CLI, connect to the StorEdge via Telnet or serial console. 2. Enter “admin” at the [menu] prompt and enter the administrator password. 3. At the CLI, enter “load ftpd” to initialize the FTP service. 4.
ftpd 2. Next, this file must be copied to the StorEdge /dvol/etc directory. Access this directory via NFS or SMB and copy the file. On future system reboots, the inetload service will read and act on the file automatically at boot time. How do I enable server-to-server FTP copying on the StorEdge? The FXP protocol for FTP allows for server-to-server file transfers. The environment variables “ftp.fxp.<user-class>” must be set to yes to enable FXP for the appropriate user class.
Users allowed explicit access by one of the following environment variables will not be prompted for a password. Variables are set at the StorEdge CLI (command line interface). 1. To access the StorEdge CLI, connect to the StorEdge via Telnet or serial console. 2. Type “admin” at the [menu] prompt, and enter the administrator password. The syntax is as follows: set rshd.allow.<command>.<user> yes The <user> parameter is optional. If it is not used, it allows rsh execution of the specified command for all users.
Help is available for each of these commands at the prompt by typing “help <command>”. Additionally, man pages are available for mkdir, rmdir, cp and rm. Access these by entering “man <command>”. How do I enable or disable ftp, tftp, rlogin, rsh, telnet, ssh, smb or the Web Admin? By default, all of these services are enabled, with the exception of tftp. Enabling or disabling of these services is done via the netserv command at the CLI (command line interface). 1.
3. At the CLI, enter “show file.hosts”. This will return the location of the active hosts file. You can safely assume that the active passwd and group files are located in the same directory. 4. Next, run the following commands from the CLI: “cleari /<volume>/etc/hosts”, “cleari /<volume>/etc/passwd”, and “cleari /<volume>/etc/group”. Press the [Enter] key after each of these. 5. Next, copy the updated versions of these files to the /etc directory located above via NFS or SMB.
Define Directory Tree Quotas: It is best to define DTQs before migrating data to the StorEdge. There are difficulties associated with setting DTQs on existing data which can be completely avoided by planning ahead. Set an administrator password: Administrator access allows many powerful options, including deletion of volumes and override of security settings. Define a secure password to protect your data.
5. Enter the number corresponding to the volume that requires checkpoints. 6. Select option “6”, “Checkpoints”. 7. Select option “1”, “Edit fields”. 8. Use [Tab] or [Enter] to navigate through fields. 9. Select option “Y”, “Yes” to enable checkpoints on the selected volume. 10. Select option “Y”, “Yes” to use the checkpoint for backups. Backing up from a checkpoint avoids all issues related to backing up open files. 11. Select option “7”, “Save Changes”. This will take you to the main menu.
How do I move files from one disk to another with NDMP? Currently this function is not supported in the StorEdge software. What tape libraries and drives have been tested for compatibility with the StorEdge? See Section 2.35, “Direct Attached Tape Libraries,” for the list of supported tape libraries and tape drives. 2.33 Macintosh Connectivity How do I share files with Mac Users? StorEdge does not support AppleTalk networking. Macintosh clients require an SMB or NFS client in order to connect to the StorEdge. Apple OS X has built-in NFS client software.
2.34 Miscellaneous Log Messages Why can’t I see system log information prior to most recent reboot? By default, the system log is stored only in memory. Therefore, it is lost upon reboot. StorEdge offers the option to save syslog data locally, or to send it to a syslogd server. For additional information on this topic, please refer to the FAQ “How do I set up local or remote logging?” System log message: “nfsd: error 'prog unavail' x.x.x.
2.35 Direct Attached Tape Libraries The Sun StorEdge 5310 NAS supports specific Sun branded tape libraries. For the updated list, refer to the Sun web pages for the Sun StorEdge 5310 NAS. The following Sun tape libraries and tape drives are supported: TABLE 2-18 Supported Tape Libraries and Tape Drives — Tape Libraries: L8, L25, L100, L180; Tape Drives: Ultrium LTO1, Ultrium LTO2, SDLT 320. 2.35.
2.36 StorEdge File Replicator This section provides the following information: ■ How does File Replicator work? ■ The applications for File Replicator ■ How do I set up File Replicator? How Does File Replicator Work? Replication allows you to duplicate any or all of the file volumes of one StorEdge server onto another StorEdge server. The source server is referred to as the active server and the target server is referred to as the mirror server.
Backup A File Replicator target volume may be dedicated for backing up source volumes. File Replicator enhances operations by moving backup I/O to the remote volume. This shadow processing capability reduces CPU load on the production StorEdge, streamlining operations. Data Distribution For businesses with remote locations, File Replicator simplifies data distribution.
4. Select option "1", "Edit fields". 5. Navigate through the fields with [Tab] or [Enter] until the "Role" field of the NIC that will be used for the mirror is highlighted. 6. Select option "4", "Mirror" to change the role to mirror. To create the host file (both systems), do the following: 1. From the main menu, enter "F", "Hosts". 2. Create a new host entry for the mirror interface selected above. For each system, choose a name similar to the hostname, such as "host-M".
9. Enter the desired size of the Mirror Buffer. The mirror buffer stores file system write transactions while they are being transferred to the mirror server. The size of the mirror buffer depends on a variety of factors, but must be at least 100 MB. You may want to create a mirror buffer that is approximately 10% of the size of the file volume you are replicating. The size you choose should depend on how much information is being written to the file volume rather than the size of the file volume.
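The sizing rule of thumb above — at least 100 MB, and roughly 10% of the replicated volume as a starting point — can be sketched as a small helper. This is illustrative only; as the text says, the right size depends on write load, not just volume size.

```python
MIN_BUFFER_MB = 100  # the mirror buffer must be at least 100 MB

def mirror_buffer_mb(volume_mb: int) -> int:
    """Starting-point mirror buffer size: 10% of the volume,
    never below the 100 MB minimum."""
    return max(MIN_BUFFER_MB, volume_mb // 10)

print(mirror_buffer_mb(500))     # 100  (minimum applies)
print(mirror_buffer_mb(50_000))  # 5000 (10% of a ~50 GB volume)
```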
Mirror Promoted on host Once a volume has been promoted, the mirror cannot continue. A promoted volume cannot be a mirror again; however, it can function as a master. Waiting on host, link is down This is typically a connection problem between the two systems. Check the cables and the network connectivity. The master system will continue to send changed data to the buffer. Once the buffer fills, the mirror will crack.
From the CLI, enter "fsctl frags <volume>". A screen similar to the following is displayed:

/vol1: Analyzing volume. Press ESC to abort ...
EXT 0 FREE PAGES 1244306
RANGE     FRAGS  LOW/HIGH         PAGES     SIZE(MB)
1-8       5      1/7              19        0
9-31      0      0/0              0         0
32-1024   20     39/63            1220      4
1024      2      16399/1226668    1243067   4855
=========================================================
TOTAL     27                      1244306   4860

The key data on this page are the third and fourth entries in the SIZE column.
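The SIZE(MB) column is consistent with the PAGES column if a 4 KB file system page is assumed (an assumption on our part; the page size is not stated in this guide). A quick check:

```python
PAGE_KB = 4  # assumed file system page size; not stated in the guide

def pages_to_mb(pages: int) -> int:
    """Convert a page count to whole megabytes under the 4 KB-page assumption."""
    return pages * PAGE_KB // 1024

# Free-page figures from the fsctl frags output above:
print(pages_to_mb(1243067))  # 4855 (the largest-extent row)
print(pages_to_mb(1244306))  # 4860 (the TOTAL free pages)
```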
CHAPTER 3 Storage Arrays This chapter instructs you on how to solve specific storage array problems with the Sun StorEdge 5310 NAS. It contains the following sections: ■ “Array Overview” on page 3-1 ■ “Using the Array” on page 3-8 ■ “Troubleshooting and Recovery” on page 3-22 ■ “Relocating a Command Module” on page 3-31 ■ “Raid Storage Manager (RSM)” on page 3-44 3.1 Fibre Channel (FC) 3.1.1 Array Overview This section describes the array command module and its components. 3.1.1.
■ ■ Front bezel - Molded frame containing drive and global indicator lights and a mute button for the optional audible alarm feature Drives - Fourteen removable disk drives The back of the command module contains the following components: ■ ■ ■ 3.1.1.2 Fans - Two removable fan housings, containing two fans each Controller - Two removable controllers Power supplies - Two removable power supplies Controllers The command module supports two controllers.
The Battery Charging/Charged light flashes during the startup self-test and when the battery is charging. It turns on and does not flash when the battery is fully charged, and turns off if the battery fails. Figure 3-2 shows the controller labels. Each controller has a media access control (MAC) address label, located on the top or the front of the controller, and a battery label, located on top of the controller, which lists the battery installation and expiration dates. FIGURE 3-2 3.1.1.
FIGURE 3-3 Battery Charging/Charged and Cache Active Lights 3.1.1.5 Drives The command module supports up to 14 removable Fibre Channel drives internally, plus up to seven expansion drive modules containing 14 drives each, for a total of 112 drives. Figure 3-4 shows the command module drive and its lights. Note that the drives in your command module may differ slightly in appearance from those shown. The variation will not affect the function of the drives.
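The drive-count arithmetic above can be sketched as follows. The constants are the figures from the text; the function is illustrative only.

```python
INTERNAL_DRIVES = 14       # removable Fibre Channel drives in the command module
DRIVES_PER_EXPANSION = 14  # drives per expansion drive module
MAX_EXPANSIONS = 7         # maximum expansion drive modules

def max_drives(expansions: int) -> int:
    """Total drive count for a given number of expansion modules."""
    assert 0 <= expansions <= MAX_EXPANSIONS
    return INTERNAL_DRIVES + expansions * DRIVES_PER_EXPANSION

print(max_drives(7))  # 112, the maximum stated in the text
print(max_drives(0))  # 14, command module alone
```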
FIGURE 3-5 Drive Numbering – Rackmount Module 3.1.1.6 Fans Each module has two removable fan housings. Each fan housing contains two fans. The fans provide redundant cooling, which means that if one of the fans in either fan housing fails, the remaining fans will continue to provide sufficient cooling to operate the command module. Figure 3-6 shows a set of fans in a fan housing.
3.1.1.7 Power Supplies Each module contains two removable power supplies. The power supplies provide power to the internal components by converting incoming AC voltage to DC voltage. If one of the power supplies is turned off or malfunctions, the other power supply can maintain electrical power to the command module. Figure 3-7 shows the power supplies, which are interchangeable by reversing the locking levers. FIGURE 3-7 3.1.1.
FIGURE 3-8 SFP Transceiver and Fibre Optic Cable 3.1.1.9 Tray ID Switch Note – Each module in the storage array must have a unique tray ID. The Tray ID switch is located between the power supplies. It lets you assign each module a unique tray ID, which is required for proper operation of the storage array. The settings for each digit (X10 and X1) in the tray ID range from 0 through 7. Recommended unique ID numbers range from 01 through 77. Figure 3-9 shows the tray ID switch.
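The tray ID constraints above (each digit 0 through 7, every module's ID unique) can be expressed as a small check. This is an illustration under the rules stated in the text, not the array firmware's logic.

```python
def valid_tray_id(x10: int, x1: int) -> bool:
    """Each tray ID digit (X10 and X1) must be in the range 0-7."""
    return 0 <= x10 <= 7 and 0 <= x1 <= 7

def all_unique(tray_ids) -> bool:
    """Every module in the storage array must have a unique tray ID."""
    return len(set(tray_ids)) == len(tray_ids)

print(valid_tray_id(7, 7))           # True  (77 is the top of the range)
print(valid_tray_id(8, 0))           # False (digits only go up to 7)
print(all_unique([(0, 1), (0, 2)]))  # True  (IDs 01 and 02 do not collide)
```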
FIGURE 3-9 Tray ID Switch 3.1.2 Using the Array This section provides general operating procedures for the array command module. 3.1.2.1 Removing and Replacing the Back Cover Caution – Potential damage to cables. To prevent degraded performance or damaged cables, do not bend or pinch the cables between the module and the back cover. Back covers are available only on deskside modules. If you have a rackmount module, you must open the hinged door or remove the access panel of the rackmount cabinet.
FIGURE 3-10 Removing and Replacing a Deskside Module Back Cover 1. Remove the back cover. Push the back cover up from the bottom, and pull the cover away from the module. 2. Replace the back cover. a. Hold the back cover next to the back of the command module, and carefully route all cables through the opening at the bottom of the cover. If the opening is too small for all cabling, route some cables through the gap between the bottom of the module and the floor. b.
Note – Turn off both power switches on all modules in the configuration before connecting power cords or turning on the main circuit breakers. 1. Remove the back cover, if needed. 2. Are the main circuit breakers in the cabinet turned on? ■ ■ Yes - Turn off both power switches on all modules that you intend to connect to the power. No - Turn off both power switches on all modules in the cabinet. 3. Connect the power cords to the power supplies on each module.
a. Note the status of the lights on the front and the back of each module. A green light indicates a normal status; an amber light indicates a hardware fault. b. Open the Array Management Window for the storage array. c. To view the status of its components, select the appropriate component button for each module in the Physical View of the Array Management Window. The status for each component will be either Optimal or Needs Attention. 7.
1. Stop I/O activity to all modules. 2. Remove the front cover from the command module, if applicable. 3. Determine the status of each module and its components. a. Note the status of the lights on the front and the back of each module. A green light indicates a normal status; an amber light indicates a hardware fault. b. Open the Array Management Window for the storage array. c.
6. Check the lights on the front and the back of each drive module, and verify that all drive Active lights are on but not flashing. If one or more drive Active lights are flashing, then data is being written to or read from the disks. Wait for all drive Active lights to stop flashing, and then go to step 7. 7. Check the Cache Active light, then choose one of the following steps, based on the status of the light. Figure 3-12 and Table 3-1 show the locations of the status lights.
TABLE 3-1 Lights on the Back of a Command Module (Location / Component / Light Color / Normal Status / Problem Status / Procedure)
Controller:
1 - Host Connector 1 Link Indicator / Green / Normal: On / Problem: Off
2 - Host Connector 1 Speed Indicator / Green / On = 2 Gb/s data rate, Off = 1 Gb/s data rate / Not Applicable
3 - Host Connector 2 Link Indicator / Green / Normal: On / Problem: Off
4 - Host Connector 2 Speed Indicator / Green / On = 2 Gb/s data rate, Off = 1 Gb/s data rate / Not Applicable
5 - Ethernet link indicator / Green / Normal: On - Connection a
8 - Cache Active / Green / Normal: On / Problem: Off (if cache enabled)
9 - Fault / Amber / Normal: Off / Problem: On
10 - Drive Link / Green / Normal: On / Problem: Off
11 - Expansion Port Bypass / Amber / Normal: Off / Problem: On / Procedure: “Replacing a Power Supply” on page 7-41
Fan:
12 - Fan Fault / Amber / Normal: Off / Problem: On / Procedure: “Replacing a Fan” on page 7-39
Power Supply:
13 - Power / Green / Normal: On / Problem: Off
14 - Fault / Amber / Normal: Off / Problem: On / Procedure: “Replacing a Power Supply” on page 7-41
Link Rate:
15
9. Turn off both power switches on the back of each drive module. End Of Procedure 3.1.2.4 Turning Off Power for an Unplanned Shutdown Storage array modules are designed to run continuously, 24 hours a day. Certain situations, however, may require you to shut down all storage array modules quickly. These situations might include a power failure or an emergency such as a fire, a flood, extreme weather conditions, some other hazardous circumstance, or a power supply shutdown caused by overheating.
a. To run the Recovery Guru, select the Recovery Guru toolbar button in the Array Management Window. b. Complete the recovery procedure. If the Recovery Guru directs you to replace a failed component, use the individual lights on the modules to locate the failed component. For troubleshooting procedures, refer to “Troubleshooting and Recovery” on page 3-22. Note – If a fault requires you to power off an attached module, you may need to cycle the power on all remaining modules in the storage array.
8. Check the lights on the back of the command module, and then choose one of the following steps, based on the status of the lights. The Host Link and Host Speed lights as well as the Power light are illuminated; all others are off - The module Link or 100BT light might be on if the command module is using an Ethernet connection. Go to step 9. One or more amber lights are on - Do not continue with the power off procedure until you have corrected the fault.
3. Verify that both power switches on all modules in the cabinet are turned off. 4. Are the main circuit breakers in the cabinet turned off? Yes - Turn on the main circuit breakers in the cabinet. No - Reset the main circuit breakers in the cabinet. 5. Connect the power cables to both power supplies in each module.
a. To run the Recovery Guru, select the Recovery Guru toolbar button in the Array Management Window. b. Complete the recovery procedure. If the Recovery Guru directs you to replace a failed component, use the individual lights on the modules to locate the specific failed component. For troubleshooting procedures, refer to “Troubleshooting and Recovery” on page 3-22. c.
Use the following procedure to turn off the alarm and to identify the problem that caused the alarm to sound. 1. Locate the module with the alarm sounding and the amber Global Fault light illuminated. 2. Press the Mute button to turn off the alarm. If another fault occurs, the alarm will sound again. 3. Remove the back cover, if needed. 4. Determine the status of each module and its components. a. Note the status of the lights on the front and the back of each module.
End Of Procedure 3.2 Troubleshooting and Recovery This section provides procedures for diagnosing and correcting problems with the array command module. 3.2.1 Troubleshooting the Module The storage management software provides the best way to monitor the modules, diagnose problems, and recover from hardware failures. You should run the storage management software continuously and check the status of the storage array frequently.
■ No - You are finished with this procedure. If you are still experiencing a problem with this storage array, go to step 10. 7. Remove the cover. 8. If needed, turn off the alarm. 9. Check all of the lights on the front and the back of each module. Figure 3-14 and Figure 3-15 show the locations of indicator lights. Table 3-2 and Table 3-3 refer you to the appropriate procedures for various fault indicators.
TABLE 3-2 Lights on the Front of a Command Module

Location | Component | Light Color | Normal Status | Problem Status | Procedure
3 | Global Power | Green | On | Off | "Recovering from an Overheated Power Supply" on page 3-26
4 | Global Fault | Amber | Off | On | Recovery Guru procedure

FIGURE 3-15 Lights on the Back of a Command Module

TABLE 3-3 Lights on the Back of a Command Module

Location | Component | Light Color | Normal Status | Problem Status | Procedure
1 | Host Connector 1 Link Indicator | Green | On | Off | "Replacing a Drive" on page 7-36
2 | Host Connector 1 Speed Indicator | Green | On - 2 Gb/s data rate; Off - 1 Gb/s data rate | Not Applicable |
3 | Host Connector 2 Link Indicator | Green | On | Off |
4 | Host Connector 2 Speed Indicator | Green | On - 2 Gb/s data rate; Off - 1 Gb/s data rate | Not Applicable |
5 | Ethernet link indicator | Green | On | |
12 | Fan Fault | Amber | Off | On | "Replacing a Fan" on page 7-39
13 | Power Supply Power | Green | On | Off | "Replacing a Power Supply" on page 7-41
14 | Power Supply Fault | Amber | Off | On | "Replacing a Power Supply" on page 7-41
If event monitoring is enabled, and if event notification is configured, the software also issues one or both of the following critical problem notifications:
■ If one power supply shuts down, the storage management software will display a Needs Attention status in the Array Management Window.
■ If both power supplies shut down, the module will shut down, and the storage management software will display a Not Responding status in the Array Management Window.
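The notification rule above can be sketched as a small decision function. This is a minimal illustration only: the status names come from this guide, but the `module_status` helper and its signature are hypothetical and not part of the storage management software.

```python
def module_status(ps_a_running: bool, ps_b_running: bool) -> str:
    """Map the two power-supply states to the status shown in the
    Array Management Window, as described in the notifications above."""
    if ps_a_running and ps_b_running:
        return "Optimal"
    if ps_a_running or ps_b_running:
        # One supply down: the module keeps running on the remaining supply.
        return "Needs Attention"
    # Both supplies down: the module itself shuts down and stops responding.
    return "Not Responding"

print(module_status(True, False))  # Needs Attention
```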
■ Yes - Go to step 9.
■ No - Go to step 8.
8. Turn on both power switches on the back of each drive module connected to the command module. The drive modules will not spin up until they receive a Start Unit command from the controller. While the drive modules power up, the lights on the fronts and backs of the modules will flash intermittently. Depending on your configuration, it can take between 20 seconds and several minutes for the drive modules to power up. FIGURE 3-16 Power Supply Switches 9.
a. To run the Recovery Guru, select the Recovery Guru toolbar button in the Array Management Window. b. Complete the recovery procedure. If the Recovery Guru directs you to replace a failed component, use the individual lights on the modules to locate the specific failed component. Figure 3-14 on page 3-23 and Figure 3-15 on page 3-24 show the locations of indicator lights. Table 3-2 and Table 3-3 refer you to the appropriate procedures for various fault indicators.
6. If applicable, repeat for all other modules in the storage array. End of Procedure

FIGURE 3-17 Setting the Tray ID Switch

3.2.4 Verifying the Link Rate Setting

Use the following procedure to verify the Link Rate setting if a link rate problem is indicated. Figure 3-18 shows the location of the Link Rate switch.

Caution – Electrostatic discharge damage to sensitive components.
4. Replace the switch cover and tighten the screw to secure it into place.
5. If applicable, repeat for all other modules in the storage array. End of Procedure

FIGURE 3-18 Verifying the Link Rate Setting

3.3 Relocating a Command Module

This section provides procedures for upgrading an E2600 command module for greater storage capacity and guidelines for relocating an E2600 command module.
Caution – Potential data loss or data corruption. Never insert drives into a command module without first confirming the drive firmware level. Inserting a drive with the incorrect firmware level may cause data loss or data corruption. For information on supported drive firmware levels, contact technical support. Caution – Potential command module failure. Use of nonsupported drives in the command modules can cause the command module to fail. Caution – Potential data loss or data corruption.
array, the amount of time you can afford to keep the command module offline, and the method that most closely matches the upgrade procedure recommended in the storage management software and this guide. Replace All Drives at the Same Time If you are upgrading drives containing RAID 0 volumes, you must use this method. This method requires you to back up the command module and turn off the power to the storage array before replacing the drives.
Caution – Potential volume group failure. The command module supports a maximum of eight drive modules per loop (112 drives maximum). Do not install new drives into the empty drive slots in the command module enclosure if the module is already at the maximum configuration. Doing so will exceed the fibre channel protocol limit and cause volume groups to fail. Caution – Electrostatic discharge damage to sensitive components.
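The configuration limit in the caution above (at most eight drive modules per loop, 112 drives maximum) can be expressed as a simple pre-check. The function below is an illustrative sketch, not part of any Sun or SANtricity tool; the names and constants are taken from the caution text.

```python
# Limits stated in the caution above.
MAX_DRIVE_MODULES_PER_LOOP = 8
MAX_DRIVES = 112

def can_add_drives(current_drives: int, new_drives: int,
                   modules_on_loop: int) -> bool:
    """Return True only if adding new_drives keeps the storage array
    within the Fibre Channel protocol limits described above."""
    if modules_on_loop > MAX_DRIVE_MODULES_PER_LOOP:
        return False
    return current_drives + new_drives <= MAX_DRIVES

print(can_add_drives(100, 12, 8))  # True
print(can_add_drives(100, 13, 8))  # False
```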
6. Repeat step 4 and step 5 to install each new drive.

FIGURE 3-19 Removing and Installing a Drive

7. Based on the status of the Active and Fault lights, choose one of the following steps:
■ Active lights are on while Fault lights are off - Go to step 9.
■ Active lights are off while Fault lights are off - The drive may be installed incorrectly. Remove the drive, wait 30 seconds, and then reinstall it. Go to step 8.
■ Fault lights are on - The new drive may be defective.
a. Complete the recovery procedure. If the Recovery Guru directs you to replace a failed component, use the individual lights on the modules to locate the specific failed component. For troubleshooting procedures, refer to “Troubleshooting and Recovery” on page 3-22. b. Select Recheck in the Recovery Guru to re-run the Recovery Guru and to ensure that the problem has been corrected. c. If the problem persists, contact technical support. 12. Configure the new drives using the storage management software. 13.
3. Read all information provided in “Replace Existing Drives with Greater Capacity Drives” on page 3-32, particularly the paragraphs explaining the differences between the two possible upgrade procedures. 4. Compare the SANtricity Storage Manager Product Release Notes with this procedure to determine if you need to modify this procedure, based on more recent information. 5. Determine the status of each module and its components.
FIGURE 3-20 Power Supply Switches Note – IMPORTANT If you accidentally remove an active drive, wait at least 30 seconds and then reinstall it. For recovery procedures, refer to your storage management software. 12. Lift the locking lever on the drive and remove it from the slot. 13. Slide the new drive all the way into the empty slot and close the drive lever. 14. Repeat step 12 and step 13 for each drive you are replacing. FIGURE 3-21 Removing and Installing a Drive 15.
■ Active lights are on while Fault lights are off - Go to step 18.
■ Active lights are off while Fault lights are off - The drive may be installed incorrectly. Remove the drive, wait 30 seconds, and then reinstall it. Go to step 17.
■ Fault lights are on - The new drive may be defective. Replace it with another new drive, and then go to step 17.
17. Did this correct the problem?
■ Yes - Go to step 18.
Caution – Potential data loss. Using the wrong drive upgrade procedure can cause data loss. If you are upgrading drives containing RAID 0 volumes, you must use the procedure for replacing all of the drives at once. If you are upgrading drives containing RAID 1, 3, or 5 volumes, you may use either upgrade procedure. Caution – Potential data loss. When replacing a drive, make sure the new drive has a storage capacity equal to or greater than the old drive.
5. Determine the status of all modules and their components in the storage array. Note the status of the indicator lights on the front and the back of each module. A green light indicates a normal status; an amber light indicates a hardware fault. d. Select the appropriate component button for each module in the Physical View of the Array Management Window to view the status of all its components. The status for each component will be either Optimal or Needs Attention. 6.
FIGURE 3-22 Removing and Installing a Drive 11. Slide the new drive all the way into the empty slot and close the locking lever. As the drive spins up, the Fault lights may flash intermittently. The new drive should begin reconstructing automatically after you install it in the drive slot. During reconstruction, the drive's Fault light may come on for a few minutes, and then turn off when the Active light begins flashing. A flashing Active light indicates that data is being restored to the new drive.
15. To view the status of its components, select the appropriate component button for each module in the Physical View of the Array Management Window. The status for each component will be either Optimal or Needs Attention. Do all module components have an Optimal status?
■ Yes - Go to step 17.
■ No - Go to step 16.
16. To run the Recovery Guru, select the Recovery Guru toolbar button in the Array Management Window.
a. Complete the recovery procedure.
Drives, drive modules, or command modules that are part of a volume group configuration should not be moved. If you need to move storage array components, call technical support for detailed procedures. Technical support may direct you to complete several storage array preparation tasks before undertaking the relocation.
3.3.6.1 Client Software Windows The client software has two main windows: the Enterprise Management Window (Figure 3-23) and the Array Management Window (Figure 3-24).
3.3.6.2 The Enterprise Management Window

The Enterprise Management Window is the first window to appear when you start the software. It is used to:
■ Detect and add the storage arrays you want to manage.
■ View the status of all the storage arrays detected or added.
■ Execute scripts to perform batch management tasks on a particular storage array using the Script Editor. For example, scripts may be run to create new volumes or download new controller firmware.
Device Tree The Enterprise Management Window Device Tree provides a hierarchical view of all the host-agent and directly managed storage arrays (Figure 3-26). The storage management station node is the root node and sends the storage management commands. When storage arrays are added to the Enterprise Management Window, they are shown in the Device Tree as child nodes of the storage management station node.
Device Table The Device Table lists the name, type of managed device, status, management type (direct network attached for directly managed storage arrays, or host-agent attached for host-agent managed storage arrays), and comments entered for storage arrays (Figure 3-25). Enterprise Management Window Menus The Enterprise Management Window menus on the menu bar are described in Table 3-4.
Enterprise Management Window Toolbar

The Enterprise Management Window toolbar buttons are described in Table 3-5.

TABLE 3-5 Enterprise Management Window Toolbar Buttons

Toolbar Button | Description
Automatically detect new devices | Activates the Automatic Discovery option that detects hosts and storage arrays on the local subnetwork and adds them to the Enterprise Management Window.
Rescan selected host for new devices | Rescans the highlighted host for any newly attached storage arrays.

3.3.6.3 The Array Management Window
The Array Management Window is specific to an individual storage array; therefore, you can manage only a single storage array within an Array Management Window. However, you can start other Array Management Windows from the Enterprise Management Window to simultaneously manage multiple storage arrays. The storage management software supports firmware version 5.40 and all firmware versions 4.x and 5.x. For maximum system stability, the recommended minimum is firmware version 4.01.02.30.
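The firmware support rule above (4.x and 5.x supported, 4.01.02.30 recommended minimum) amounts to a dotted-version comparison. The helpers below are an illustrative sketch; they are not part of the storage management software.

```python
def parse_version(v: str) -> tuple:
    """Turn a dotted firmware version string into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def firmware_supported(v: str) -> bool:
    """4.x and 5.x firmware versions are supported, per the text above."""
    return parse_version(v)[0] in (4, 5)

def meets_recommended_minimum(v: str) -> bool:
    """4.01.02.30 is the recommended minimum for maximum system stability."""
    return parse_version(v) >= parse_version("4.01.02.30")

print(meets_recommended_minimum("5.40"))        # True
print(meets_recommended_minimum("4.01.02.10"))  # False
```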
FIGURE 3-27 Array Management Window Chapter 3 Storage Arrays 3-51
TABLE 3-6 Array Management Window Tabs Tabs Description Logical/Physical View The Array Management Window Logical/Physical View contains two panes: the Logical View and the Physical View. The Logical View (left pane of Figure 3-27 on page 3-51) provides a tree-structured view of logical nodes. This view shows the organization of storage array capacity into volume groups and volumes.
TABLE 3-7 Array Management Window Menus (1 of 2)

Menu | Description
Storage Array | Contains options to perform the following storage array management operations: locating functions (locating the storage array or a specific drive channel by flashing indicator lights), configuring the storage array, enabling premium features, starting Recovery Guru, monitoring performance, downloading firmware and NVSRAM files, changing various settings, setting controller clocks, redistributing volumes, running Read Link S
TABLE 3-7 Array Management Window Menus (2 of 2)

Menu | Description
Drive | Contains options to perform the following storage management operations on drives: locating a drive, assigning or unassigning a hot spare, failing, reconstructing, reviving, or initializing a drive, or viewing drive properties. Note – These menu options are only available when a drive is selected.
Advanced | Presents maintenance options which should only be used under the guidance of technical support.
TABLE 3-8 Array Management Window Toolbar Buttons

Toolbar Button | Description
Monitor performance | Opens the Performance Monitor which provides information about how the storage array is functioning.
Recover from failures | Initiates the Recovery Guru which is used to help troubleshoot storage array problems. Note – If the storage array is in a Needs Attention state, the icon on the Recovery Guru toolbar button flashes.
Find node in tree |
Volume Copy

The Volume Copy premium feature is used to copy data from one volume (the source) to another volume (the target) in a single storage array. The source volume is a standard volume in a volume copy that accepts host I/O requests and stores application data. The target volume is a standard volume in a volume copy that maintains a copy of the data from the source volume.
operations overlap and performance improves. If a disk drive in a volume group fails, the redundant or parity data can be used to regenerate the user data on replacement disk drives. RAID relies on a series of configurations, called levels, to determine how user and redundancy data is written and retrieved from the drives. Each level provides different performance and protection features. The storage management software offers four formal RAID level configurations: RAID levels 0, 1, 3, and 5.
TABLE 3-9 RAID Level Configurations

RAID Level | Short Description | Detailed Description
RAID 0 | Non-Redundant, Striping Mode | • Used for high performance needs, but does not provide data redundancy. • Stripes data across all drives in the volume group. • Not recommended for high data availability needs. RAID 0 is better for non-critical data. • A single drive failure causes all associated volumes to fail, and data loss can occur.
RAID 1 | Striping/Mirroring Mode | • Also called RAID 10 or 0+1. • A minimum of two drives is required for RAID 1; one for the user data and one for the mirrored data. • Offers high performance and the best data availability. Data is written to two duplicate disks simultaneously.
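The striping and mirroring modes in Table 3-9 can be illustrated with a short data-placement sketch. This models block placement only, at a conceptual level; real controllers operate on raw disk blocks, and the function names here are illustrative.

```python
def raid0_stripe(blocks, num_drives):
    """RAID 0: round-robin blocks across all drives in the volume group.
    High performance, but a single drive failure loses the data."""
    drives = [[] for _ in range(num_drives)]
    for i, block in enumerate(blocks):
        drives[i % num_drives].append(block)
    return drives

def raid1_mirror(blocks):
    """RAID 1: every block is written to two duplicate disks simultaneously,
    so either copy can serve reads if the other drive fails."""
    return [list(blocks), list(blocks)]

print(raid0_stripe(["b0", "b1", "b2", "b3"], 2))  # [['b0', 'b2'], ['b1', 'b3']]
```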
3.3.6.7 Hardware Redundancy Data protection strategies provided by the storage system hardware include cache memory, hot spare drives, background media scans, and channel protection. Controller Cache Memory Caution – Sometimes write caching is disabled when batteries are low or discharged. If a parameter called Write caching without batteries is enabled on a volume, write caching continues even when batteries in the command module or array module are discharged.
to the cache memory of both controllers. It is, therefore, important to change the command module and array module batteries at the recommended time intervals. The controllers in the storage array keep track of the age (in days) of the battery. After replacing the battery, the age must be reset so that you will receive an accurate critical alert notification when the battery is nearing expiration and when it has expired.
Channel Protection In a Fibre Channel environment, channel protection is usually present for any volume group candidate because, when the storage array is properly cabled, there are two redundant Fibre Channel Arbitrated Loops for each drive. 3.3.6.8 I/O Data Path Protection I/O data path protection to redundant controllers in a storage array is accomplished with the Auto-Volume Transfer (AVT) feature and a host multi-path driver.
When AVT is enabled and used in conjunction with a host multi-path driver, it helps ensure an I/O data path is available for the storage array volumes. The AVT feature changes the ownership of the volume receiving the I/O to the alternate controller. After the I/O data path problem is corrected, the preferred controller will automatically reestablish ownership of the volume as soon as the multi-path driver detects the path is normal again.
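The Auto-Volume Transfer behavior described above can be sketched as a tiny state machine: I/O arriving on the alternate controller moves volume ownership there, and the preferred controller reclaims ownership once the path recovers. The class and method names below are illustrative, not part of any real driver API.

```python
class Volume:
    """Minimal model of AVT volume ownership, per the description above."""

    def __init__(self, name, preferred_controller):
        self.name = name
        self.preferred = preferred_controller
        self.owner = preferred_controller

    def handle_io(self, receiving_controller):
        # AVT: ownership transfers to whichever controller received the I/O.
        if receiving_controller != self.owner:
            self.owner = receiving_controller

    def path_restored(self):
        # The preferred controller automatically re-establishes ownership.
        self.owner = self.preferred

v = Volume("HResources", "A")
v.handle_io("B")   # path to controller A failed; multi-path driver used B
print(v.owner)     # B
v.path_restored()
print(v.owner)     # A
```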
Password Failure Reporting and Lockout

For storage arrays with a password and alert notifications configured, any attempts to access the storage array without the correct password will be reported. If a password is incorrectly entered, an informational major event log (MEL) event is logged, indicating that an invalid password or no password has been entered. If the password is incorrectly entered 10 times within 10 minutes, both controllers will enter lockout mode.
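The lockout rule above (10 bad passwords within 10 minutes) can be sketched with a sliding window of failure timestamps. This is an illustrative model only: the `PasswordGuard` class is hypothetical, and the 10-minute lockout duration is an assumption, since the text does not state how long lockout mode lasts.

```python
from collections import deque

LOCKOUT_THRESHOLD = 10       # failures that trigger lockout (from the text)
WINDOW_SECONDS = 10 * 60     # 10-minute window (from the text)
LOCKOUT_SECONDS = 10 * 60    # assumed lockout duration (not in the text)

class PasswordGuard:
    def __init__(self):
        self.failures = deque()   # timestamps of recent failed attempts
        self.locked_until = 0.0

    def attempt(self, now: float, password_ok: bool) -> bool:
        """Return True if the request is accepted."""
        if now < self.locked_until:
            return False          # lockout mode: refuse even valid passwords
        if password_ok:
            return True
        self.failures.append(now)  # the real array also logs a MEL event here
        # Keep only failures within the sliding 10-minute window.
        while self.failures and now - self.failures[0] > WINDOW_SECONDS:
            self.failures.popleft()
        if len(self.failures) >= LOCKOUT_THRESHOLD:
            self.locked_until = now + LOCKOUT_SECONDS
        return False
```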
■ Standard volume - A logical structure created on a storage array for data storage. A standard volume is created using the Create Volume Wizard. If the premium feature is not enabled for snapshot volumes or Remote Volume Mirroring, then only standard volumes will be created. Standard volumes are also used in conjunction with creating snapshot volumes and remote mirror volumes.
■ Snapshot volume - A point-in-time image of a standard volume.
To create a volume group, two parameters must be specified: RAID level and capacity (how large you want the volume group). For the capacity parameter, you can either choose the automatic choices provided by the software or select the manual method to indicate the specific drives to include in the volume group. The automatic method should be used whenever possible, because the software provides the best selections for drive groupings. FIGURE 3-28 3.3.6.
Specifying Volume Parameters from Free Capacity Note – IMPORTANT The free capacity, unconfigured capacity, or unassigned drives selected when starting the Wizard determine the default initial capacity selections. After the Wizard begins, the capacity can be changed by selecting a different free capacity node location for the volume, or by selecting different unassigned drives for the volume group.
■ Automatic - If you are not using SANshare Storage Partitioning, specify this setting. The Automatic setting specifies that a logical unit number (LUN) be automatically assigned to the volume using the next available LUN within the default group. This setting grants volume access to host groups or hosts that have no specific volume-to-LUN mappings (designated by the Default Group node in the Topology View).
TABLE 3-10 Mappings View Tab

View | Description
Topology | Shows defined topological elements (host groups, hosts, and host ports), undefined mappings (volumes that have been created but do not have a defined volume-to-LUN mapping), and the Default Group.
Defined Mappings | Displays the volume-to-LUN mappings in a storage array in table form. Information is displayed about the volumes: topological entities that can access the volume, volume name, volume capacity, and LUN number associated with the volume.
TABLE 3-11 Volume-to-LUN Terminology

Term | Description
Host Port | The physical connection that allows a host to gain access to the volumes in the storage array. When the host bus adapter only has one physical connection (host port), the terms host port and host bus adapter are synonymous. Host ports can be automatically detected by the storage management software after the storage array has been connected and powered up.
TABLE 3-11 Volume-to-LUN Terminology (continued)

Term | Description
Logical Unit Number (LUN) | The number a host uses to access a volume on a storage array. Each host has its own LUN address space. Therefore, the same LUN may be used by different hosts to access different volumes on the storage array. However, a volume can only be mapped to a single LUN. A volume cannot be mapped to more than one host group or host.
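The mapping rules in Table 3-11 can be captured in a small validation sketch: each host has its own LUN address space, a volume maps to exactly one LUN, and a volume cannot be mapped to more than one host group or host. The `MappingTable` class below is hypothetical, written only to make those constraints concrete; the host and volume names reuse the example in this section.

```python
class MappingTable:
    """Toy model of volume-to-LUN mapping constraints from Table 3-11."""

    def __init__(self):
        self.by_host = {}     # (host, lun) must be unique within each host
        self.by_volume = {}   # each volume has at most one mapping

    def map_volume(self, host, lun, volume):
        if volume in self.by_volume:
            raise ValueError(f"{volume} is already mapped")
        if (host, lun) in self.by_host:
            raise ValueError(f"LUN {lun} is already in use on {host}")
        self.by_host[(host, lun)] = volume
        self.by_volume[volume] = (host, lun)

m = MappingTable()
m.map_volume("Host KC-B", 5, "Financial")
m.map_volume("Host KC-A", 5, "Legal")  # same LUN, different host: allowed
```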
The host-based SMdevices utility (if available for your operating system) is used to associate the physical device name and the volume name. Refer to "SANshare Storage Partitioning" on page 3-72 for more information on assigning volume-to-LUN mappings.
The first partition is composed of Volume Financial. This volume is accessed by Host KC-B using LUN 5. Even though Host KC-B is part of the logical Host Group Kansas City, Host KC-A cannot access this volume because the volume-to-LUN mapping was created with Host KC-B rather than the Host Group Kansas City. The second partition consists of Volumes Legal and Engineering. This volume-to-LUN mapping was created using Host Group Kansas City.
■ Create volumes on the storage array. As part of the volume creation, specify one of two volume-to-LUN mapping settings: Automatic - If you are not using SANshare Storage Partitioning, specify this setting. The Automatic setting specifies that a LUN be automatically assigned to the volume using the next available LUN within the Default Group. This setting will grant volume access to host groups or hosts that have no specific volume-to-LUN mappings (designated by the Default Group in the Topology View).
Heterogeneous Hosts Example Note – IMPORTANT Heterogeneous host settings are only available with SANshare Storage Partitioning enabled. In Figure 3-32 on page 3-76, the SANshare Storage Partitioning feature is enabled. In a heterogeneous environment, you must set each host type to the appropriate operating system during host port definition (Figure 3-31 on page 3-75). By doing this, the firmware on each controller can respond correctly for that host's operating system.
FIGURE 3-32 Heterogeneous Hosts Example

Snapshot Volumes

This is a premium feature of the storage management software and must be enabled either by you or your storage vendor. The Snapshot Volume feature is used to create a logical point-in-time image of another volume. Typically, you create a snapshot so that an application, for example a backup application, can access the snapshot and read the data while the base volume remains online and user-accessible.
only data blocks that are physically stored in the snapshot repository volume are those that have changed since the time the snapshot volume was created, the snapshot volume uses less disk space than a full physical copy. The storage management software provides a warning message when the snapshot repository volume nears a user-specified threshold (a percentage of its full capacity; the default is 50%).
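The copy-on-write behavior described above — only blocks changed since the snapshot was taken are stored in the repository volume — can be sketched conceptually as follows. The `Snapshot` class is a hypothetical illustration operating on a dict of blocks, not the actual controller implementation.

```python
class Snapshot:
    """Conceptual copy-on-write snapshot of a base volume."""

    def __init__(self, base):
        self.base = base          # the live base volume: block -> data
        self.repository = {}      # original data of blocks changed since snapshot

    def write_base(self, block, data):
        # Before the base volume overwrites a block for the first time,
        # preserve its old contents in the snapshot repository volume.
        if block not in self.repository:
            self.repository[block] = self.base.get(block)
        self.base[block] = data

    def read_snapshot(self, block):
        # Changed blocks come from the repository; unchanged blocks are
        # read from the base volume itself, so the snapshot stays small.
        if block in self.repository:
            return self.repository[block]
        return self.base.get(block)

base = {0: "alpha", 1: "beta"}
snap = Snapshot(base)
snap.write_base(0, "gamma")
print(base[0], snap.read_snapshot(0))  # gamma alpha
```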
If you have a snapshot volume that you no longer need, instead of deleting it, you can reuse it (and its associated repository volume) to create a different point-in-time image of the same base volume. Re-creating a snapshot volume takes less time than creating a new one. When you re-create a snapshot volume:
■ The snapshot volume must have either an optimal or a disabled state.
■ All copy-on-write data previously on the snapshot repository volume is deleted.
Dynamic Volume Expansion (DVE) is a modification operation used to increase the capacity of standard or snapshot repository volumes. The increase in capacity can be achieved by using any free capacity available on the volume group of the standard or snapshot repository volume. Data will be accessible on volume groups, volumes, and disk drives throughout the entire modification operation.
Prior to creating a mirror relationship, the Remote Volume Mirroring feature must be enabled and activated on both the primary and secondary storage arrays. The primary volume is the volume that accepts host I/O and stores application data. When you create a remote volume mirror, a mirrored volume pair is created and consists of a primary volume at the primary storage array and a secondary volume at the secondary storage array. Data from the primary volume is copied in its entirety to the secondary volume.
Data Replication Data replication between the primary volume and the secondary volume is managed by the controllers and is transparent to host machines and applications. When the controller owner of the primary volume receives a write request from a host, the controller first logs information about the write to a mirror repository volume, then writes the data to the primary volume.
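The write ordering described above can be sketched as follows. The text confirms the first two steps (log the write to a mirror repository volume, then write the primary volume); the remaining steps — forwarding the data to the secondary and clearing the log record after the remote write is acknowledged — follow the usual Remote Volume Mirroring flow and are assumptions here, as are all of the names.

```python
class RemoteVolume:
    """Stand-in for the secondary volume on the remote storage array."""

    def __init__(self):
        self.blocks = {}

    def write(self, block, data):
        # Returns only after the remote controller acknowledges the write.
        self.blocks[block] = data

def mirrored_write(repository_log, primary, remote, block, data):
    repository_log.append(block)   # 1. log intent before touching any data
    primary[block] = data          # 2. write the data to the primary volume
    remote.write(block, data)      # 3. copy the data to the secondary volume
    repository_log.remove(block)   # 4. remote ack received: clear the log entry

log, primary = [], {}
remote = RemoteVolume()
mirrored_write(log, primary, remote, 7, "payload")
print(primary[7], remote.blocks[7], log)  # payload payload []
```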
Target Volume Caution – A volume copy will overwrite all data on the target volume. Ensure that you no longer need the data or have backed up the data on the target volume before starting a volume copy. A target volume maintains a copy of the data from the source volume, and can be a standard volume, the base volume of a Failed or Disabled snapshot volume, or a Remote Volume Mirror primary volume in an active mirrored pair.
3.3.6.16 Managing Persistent Reservations Caution – The Persistent Reservations option should be used only under the guidance of a technical support representative. The Persistent Reservations option enables you to view and clear volume reservations and associated registrations. Persistent reservations are configured and managed through the cluster server software, and prevent other hosts from accessing particular volumes.
3.3.6.17 Maintaining and Monitoring Storage Arrays This section describes methods for maintaining storage arrays, including troubleshooting storage array problems, recovering from a storage array problem using the Recovery Guru, and configuring alert notifications using the event monitor. For additional conceptual information and detailed procedures for the options described in this section, refer to Learn About Monitoring Storage Arrays in the Enterprise Management Window online help.
FIGURE 3-35 Monitoring Storage Array Health Using the Enterprise Management Window

Storage Array Status Icons

Table 3-12 provides information about the storage array status icons that display:
■ In the Device Tree, Device Table, and Overall Health Status area in the Enterprise Management Window
■ As the Root Node in the Logical View Tree in the Array Management Window
TABLE 3-12 Storage Array Status Icon Quick Reference

Status | Description
Optimal | Indicates every component in the storage array is in the desired working condition.
Needs Attention | Specifies a problem on a storage array that requires intervention to correct. To correct the problem, start the Array Management Window for that particular storage array, and then use Recovery Guru to pinpoint the cause of the problem and obtain appropriate procedures.
The event monitor is a separate program bundled with the client software and must be installed with the storage management software. The client/event monitor is installed on a storage management station or host connected to the storage arrays. For continuous monitoring, install the event monitor on a computer that runs 24 hours a day.
You must have administrative permissions to install software on the computer where the event monitor will reside. After the storage management software has been installed, the icon shown in Figure 3-37 on page 3-88 will be present in the lower-left corner of the Enterprise Management Window. FIGURE 3-37 Event Monitor Example 2. Set up the alert destinations for the storage arrays you want to monitor from the Enterprise Management Window.
■ Name of the affected storage array
■ Host IP address (only for a storage array managed through a host-agent)
■ Host name/ID (shown as directly managed if the storage array is managed through each controller's Ethernet connection)
■ Event error type related to an Event Log entry
■ Date and time when the event occurred
■ Brief description of the event

Note – IMPORTANT To set up alert notifications using SNMP traps, you must copy and compile a management information base (MIB) file on the designated network
1. Create a text file containing the contact information you want to send to the customer support group. For example, include the names and pager numbers of the system administrators.
2. Name the file userdata.txt and save it in the home directory (for example, Winnt\profiles\) on the client machine you are using to manage the storage array. (This may be your host machine if you installed the client software on the host.)
3. Configure the alert notifications using e-mail or SNMP trap destinations.
FIGURE 3-38 Problem Notification in the Array Management Window Storage Array Problem Recovery When you suspect a storage array problem, launch the Recovery Guru. The Recovery Guru is a component of the Array Management Window that will diagnose the problem and provide the appropriate procedure to use for recovery. The Recovery Guru can be displayed by selecting the Recovery Guru toolbar button in the Array Management Window, shown in Figure 3-39 on page 3-92.
FIGURE 3-39 Displaying the Recovery Guru Window Recovery Guru Example The Recovery Guru window is divided into three views: Summary, Details, and Recovery Procedures. The Summary view presents a list of storage array problems. The Details view displays information about the selected problem in the Summary area. The Recovery Procedure view lists the appropriate steps to follow for the selected problem in the Summary view.
FIGURE 3-40 Recovery Guru Window Example As you follow the recovery procedure to replace the failed drive, the storage array status changes to Fixing, the associated volume (HResources) status changes from failed to Degraded-Copyback in Progress, and the replaced drive status changes to Replaced. The data that was reconstructed to the hot spare drive is now being copied back to the replaced drive. These changes are shown in Figure 3-41 on page 3-94.
FIGURE 3-41 Status Changes During an Example Recovery Operation

When the copyback operation is finished, the statuses change to reflect the optimal status of the components, as shown in Figure 3-42 on page 3-95.
FIGURE 3-42 Status Changes When the Example Recovery Operation Is Completed

After you replace the failed drive in the drive module:
■ The storage array status in the Logical View returns to Optimal.
■ The storage array status in the Enterprise Management Window changes from Needs Attention to Optimal.
■ The Recovery Guru button stops blinking.

Note – For the Recovery Guru button to register Optimal status, the failed battery must be replaced as well.

3.4 Updating Controller Firmware
1. Start the SMclient. A window similar to the one below is displayed after a short delay.
a. Select Edit>>Add Device and add the storage array by entering the IP address for the A controller, and then click Add.
b. When the Add Device window returns, enter the IP address for the B controller and click Add.
c. When the Add Device window returns, click Close.
2. Double-click the array name in the right window. The Array Management Window (AMW) is displayed.
3. Select Advanced>>Maintenance>>Download>>Controller Firmware. The Download Firmware screen is displayed.

4. Click the Browse button and navigate to the directory where the firmware is located (for example, c:\temp\rsmfw\). Select SNAP_0610xxxx.dlp.

5. If upgrading NVSRAM, check the box marked "Download NVSRAM file with firmware" and, using the Browse button, navigate to the directory where the NVSRAM file is located (for example, c:\temp\rsmfw\). Select N2882-610843-5xx.dlp.

6.
7. After selecting the firmware and NVSRAM files, click OK and then Yes to begin the firmware upgrade. The downloading screen is displayed.

3.5 Updating ESM Firmware

To update the ESM (CSM100_E_FC_S) firmware, do the following:

1. Select Advanced>>Maintenance>>Download>>ESM firmware.
2. Select the drive tray(s) to upgrade.
3. Browse to where the file is located (for example, c:\temp\rsmfw\).
4.
5. Select Start to begin the upgrade.
CHAPTER 4 StorEdge File Replicator

This chapter provides an overview of the StorEdge File Replicator.

4.1 Overview

The Sun StorEdge 5310 NAS data appliances provide fault-tolerant features such as redundant hardware devices, extensive monitoring and notification of both software and hardware components, checkpointing, and controller and server failover. StorEdge File Replicator extends these capabilities to include mirroring. This section provides an overview of the StorEdge File Replicator feature.
TABLE 4-1 Standard Terms

Master                   The system that is being mirrored, or the source system
Mirror                   The system that is being used to mirror the Master system, or the target system
Checkpoint / Snapshot    A static image of the file system at a fixed point in time
Client                   A network computer that initiates a read or write request
Delta                    The filesystem blocks that have changed during a fixed period of time, usually between successive checkpoints
Disaster Recovery (DR)   The act of recovering from a disaster
4.1.1 Real-time Mirroring

Real-time mirroring is the simplest to describe, and the most difficult and expensive to implement. The requirement and guarantee of real-time mirroring is that data is committed in a persistent manner on both the Master and Mirror prior to reflecting transaction complete to the client. If the mirroring is remote, e.g.
1) A client issues a write for transaction TXID 1
2) Sun StorEdge 5310 NAS receives transaction TXID 1
3) Transaction TXID 1 is committed to the Master system's journal
4) Transaction complete is reflected back to the client for TXID 1
5) Transaction TXID 1 is queued to the Mirror system
6) Transaction TXID 1 is sent over the network to the Mirror system
7) TXID 1 is received by the Mirror system
8) TXID 1 is committed to the Mirror system's journal
9) Transaction complete is reflected to the Master system for TXID 1
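The sequence above can be sketched as a minimal simulation. All class, method, and attribute names here are illustrative, not part of the StorEdge software; the point is only the ordering: the client is acknowledged after the local journal commit, and the transaction is then queued and shipped to the Mirror.

```python
# Minimal sketch of the journaled mirroring flow described above.
# All names are illustrative; the real StorEdge implementation differs.
from collections import deque

class MirrorSystem:
    def __init__(self):
        self.journal = []                     # Mirror's persistent journal

    def receive(self, txid, data):
        self.journal.append((txid, data))     # step 8: commit on the Mirror

class MasterSystem:
    def __init__(self, mirror):
        self.journal = []                     # Master's persistent journal
        self.queue = deque()                  # transactions queued to the Mirror
        self.mirror = mirror

    def handle_write(self, txid, data):
        self.journal.append((txid, data))     # step 3: commit locally
        self.queue.append((txid, data))       # step 5: queue for the Mirror
        return "complete"                     # step 4: ack the client now

    def flush_to_mirror(self):
        while self.queue:                     # steps 6-9: ship queued work
            txid, data = self.queue.popleft()
            self.mirror.receive(txid, data)
```

A usage pass would call `handle_write()` for each client write and `flush_to_mirror()` as the network allows; the gap between the two is exactly the delta by which the Mirror may lag the Master.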
This architecture provides clear advantages over Checkpoint or Snapshot Mirroring, in that the state of the Mirror's data is typically only seconds behind that of the Master's filesystem. Extensive effort has gone into ensuring not only that the performance of the Mirror does not lag that of the Master, thereby maintaining the Mirror in a high state of synchronization with the Master, but that data integrity (including write ordering) is preserved (Figure 4-2 on page 4-5).
FIGURE 4-3 Lost Transaction Handling on the Mirror

In the event an out-of-order transaction is received, the Master is notified and it resends the missing transaction (Figure 4-3 on page 4-6). The out-of-order transaction (TXID 4 in this example) is not committed until the missing transaction(s) is (are) received and committed (TXID 3). This is crucial, as there are many applications (e.g., databases) that require write ordering to be preserved.
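The write-ordering rule just described can be modeled as a small sequencer: an out-of-order transaction is held back, a resend of the missing TXIDs is requested, and nothing is committed until every predecessor has been committed. This is a sketch under stated assumptions, and the names are hypothetical, not the actual StorEdge code.

```python
# Sketch of the write-ordering rule described above: an out-of-order
# transaction is held back until every missing predecessor arrives.
class MirrorSequencer:
    def __init__(self):
        self.next_txid = 1        # next transaction we may commit
        self.pending = {}         # out-of-order transactions held back
        self.committed = []       # journal, always in TXID order
        self.resend_requests = [] # TXIDs the Master is asked to resend

    def receive(self, txid, data):
        if txid != self.next_txid:
            # Hold the transaction and ask the Master for the missing ones.
            self.pending[txid] = data
            self.resend_requests.extend(range(self.next_txid, txid))
            return
        self._commit(txid, data)
        # Drain any held transactions that are now in order.
        while self.next_txid in self.pending:
            self._commit(self.next_txid, self.pending.pop(self.next_txid))

    def _commit(self, txid, data):
        self.committed.append(txid)
        self.next_txid = txid + 1

seq = MirrorSequencer()
seq.receive(1, "a")
seq.receive(2, "b")
seq.receive(4, "d")   # TXID 3 is missing: 4 is held, resend of 3 requested
seq.receive(3, "c")   # 3 commits, then the held 4 commits
```

After the final call, the committed journal reads 1, 2, 3, 4: exactly the TXID-4/TXID-3 scenario in Figure 4-3.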
FIGURE 4-4 The Mirror Log and Primary Journal

The Primary Journal floats through the Mirror Log, functioning in much the same manner as it otherwise does. Sun StorEdge 5310 NAS maintains hints in the filesystem that enable it to quickly locate its position in both the Primary Journal and Mirror Log if an outage (e.g., a complete power loss, component failure, etc.) is experienced.
If one of the remote systems fails or is lost, the data is preserved at the central, corporate site, providing a fallback position for the remote locations and expediting the recovery of the remote site. Sun StorEdge 5310 NAS supports Many-to-One Mirroring.

4.1.4.2 One-to-Many Mirroring

One-to-Many Mirroring refers to the ability for a Master system to mirror simultaneously to multiple Mirror locations. It may seem a simple variation on the mirroring theme, but it actually introduces a number of complexities to mirroring.
4.1.4.4 Bi-directional Mirroring

Bi-directional Mirroring refers to the ability for systems at sister locations to mirror to each other. For instance, a system in Los Angeles may be configured to mirror its volumes to a sister system in Houston, which in turn and simultaneously mirrors its volumes to the Los Angeles system. In the event either site experiences a problem, the data is readily available at the sister site.
opaque to the mirror service. Connections are made with standard UDP and TCP protocols, so Sun StorEdge 5310 NAS servers can be mirrored across any reachable network. Since StorEdge File Replicator operates at the disk block level, the mirror system is an exact replica of the master system. However, since mirroring operations are not strictly real time, the mirror system may lag the master by a time delta dependent on the speed and quality of the network.
constituted with the same structure as that on the master system. File system segments, or extents, are created on the mirror system in numbers and sizes matching those on the master system. In the case of a master reboot, this state consists of simply validating each extent on the mirror system. The mirror volume is identical to that of the master, except that the partition type is reported as NBD, and all access to the volume outside the mirror service is limited to read-only.
4.2.3 Mirror Sequencing

Once a mirror has been created and fully replicated, it enters the INSYNC state. The mirror volume is finally mounted, read-only, and declared to be of partition type NBD. Then, the mirror begins sequencing. It is at this point that the mirror buffer begins to be actively used. Each file system transaction written to the mirror buffer on the master system is sent, or sequenced, to the mirror system.
To stop mirroring on a volume, the user must "break" the mirror. The mirror can be broken from either the master or mirror system. When the mirror is broken, it enters the BREAKING state until the last file system transactions in transit have been acknowledged, and the break request has been communicated to both master and mirror. Data transfer between master and mirror is then stopped, the mirror definition is removed from the mirror service, and the mirror buffer is removed on the master volume.
4-14 Sun StorEdge 5310 NAS Troubleshooting Guide • December 2004
CHAPTER 5 Clustering

5.1 Overview

This section will be updated when the information is available.
CHAPTER 6 Checkpoints/Snapshots

This chapter provides insight into how checkpoints are created, maintained, and deleted.

6.1 Overview

The goal of checkpointing is to minimize the number of block copies made when a checkpoint is created. This document discusses what happens to a checkpoint from the time it is created to when it is removed. Checkpoint lifetime can be divided into three main stages:

1. Creation
2. Active as a pseudo filesystem
3. Deletion

These stages are described in the following sections.
Checkpoints of a volume are accessed through a separate fs_online. This volume corresponds to the virtual checkpoint volume created when checkpoints are enabled on a volume.

FIGURE 6-1 Physical and Logical Volume Relationship (physical volume /vol1; logical volumes vol1 and vol1.chkpnt, the latter with the checkpoint database)

As shown in Figure 6-1, the existence of the checkpoint database distinguishes the checkpoint volume, or CFS (Checkpoint File System), from the main volume, or LFS (Live File System).
6.1.2 Checkpoint Lifecycle

Checkpoints are created and managed using the filesystem's fs_chkpntcl( ) function, which is sfs2_chkpntctl( ) for the SFS2 filesystem. The checkpoint management interfaces use this call to create, delete, and deactivate checkpoints and checkpointing on SFS2 filesystems. Checkpoints have three states:

■ Active: while checkpoints are active, they can be accessed for most of the read-only file system operations.
FIGURE 6-2 The Copy-On-Write Mechanism for Checkpoints (mappings for block n across ckpti-2 through ckpti+1)

6.1.2.2 Active Checkpoint

Each CFS has a mapping function for all of the blocks on the LFS. This mapping function returns a value for each page of the LFS. Currently, there can be no more than 16 active checkpoints on a filesystem. This means the checkpoint filesystem can hold at most 16 mappings for each block. Also, not all of the blocks in the LFS are mapped.
Figure 6-2 shows part of the array of mappings for block n. In this example, ckpti+1 is the most recent checkpoint and thus the last entry in the mappings, and ckpti-2 is the oldest checkpoint. Assume a filesystem operation on checkpoint ckpti-2 accesses block n. The mapping function first checks the mapping for ckpti-2 and, given that it is empty, moves forward and checks the mapping for ckpti-1. It finds m and will use block m instead of block n.
In Figure 6-4, there are two active checkpoints, ckpti-2 and ckpti-1. If block n is going to be modified, a new block m will be allocated and the old content of n copied to it. The address of block m will be inserted into the list of mappings for block n in the latest checkpoint, which is ckpti-1. From this point, any request for block n in ckpti-2 and ckpti-1 will be redirected to block m. Thus, the mapping table tells us when blocks were modified relative to the active checkpoints.
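The mapping-array behavior described above can be sketched in a few lines: writes save the old block content under the newest checkpoint (copy-on-write), and reads from a checkpoint walk forward through newer checkpoints until a saved copy is found, falling back to the live block. This is an illustrative model only, not the actual SFS2 data structures.

```python
# Illustrative model of the per-block mapping array and copy-on-write
# behavior described above (not the actual SFS2 on-disk structures).
class CheckpointedBlockMap:
    def __init__(self, live_blocks):
        self.live = dict(live_blocks)   # live filesystem: block number -> data
        self.checkpoints = []           # ordered oldest -> newest
        self.maps = {}                  # block -> {checkpoint: saved data}

    def create_checkpoint(self, name):
        self.checkpoints.append(name)   # no block copying at creation time

    def write(self, block, data):
        # Copy-on-write: save the old content under the newest checkpoint,
        # but only on the first write since that checkpoint was taken.
        if self.checkpoints:
            newest = self.checkpoints[-1]
            block_map = self.maps.setdefault(block, {})
            if newest not in block_map:
                block_map[newest] = self.live[block]
        self.live[block] = data

    def read(self, block, checkpoint):
        # Walk forward from the requested checkpoint to newer ones; the
        # first mapping found holds the block as it was at that checkpoint.
        start = self.checkpoints.index(checkpoint)
        block_map = self.maps.get(block, {})
        for ckpt in self.checkpoints[start:]:
            if ckpt in block_map:
                return block_map[ckpt]
        return self.live[block]         # never copied: still the live block

fs = CheckpointedBlockMap({7: "old"})
fs.create_checkpoint("ckpt1")
fs.write(7, "new")   # old content of block 7 is saved under ckpt1
```

The forward walk in `read()` is exactly the lookup described for Figure 6-2: empty mappings are skipped until a saved block (m) is found.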
6.1.2.3 Translation of File System Objects in Checkpoints As previously discussed, because the checkpointing mechanism is applied to filesystem blocks rather than filesystem objects, there is no special consideration for the type of object that is checkpointed. Hardlinks and symlinks in checkpoints will continue to have the same semantics they had in the live filesystem at the time the checkpoint is created.
FIGURE 6-5 Creating a Hardlink When a Volume Is Checkpointed and Has Active Checkpoints
Another example is when a hard link is created in a directory named dst to a file residing in a directory called src. First, a new directory entry is inserted in dst. However, since the volume is checkpointed, a copy of the corresponding disk block is made first, and then the disk block is modified. When accessing the checkpoint version (and thus accessing the copied block), the old directory content (without the new entry) is seen.
FIGURE 6-6 Mappings for Block n After Deleting ckpti-1

Another example assumes checkpoint ckpti+1 is to be removed. Because ckpti has no mapping for itself, p is copied to the entry for ckpti. Because there is no entry after ckpti, nothing else needs to be done. The result is depicted in Figure 6-7. Yet another example, deleting ckpti-2, is exactly like deleting ckpti-1, with the difference that in Figure 6-6, ckpti-2 is replaced with ckpti-1. Checkpoints are deleted using the sfs2cp_dirop( ) function.
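The deletion rule in these examples can be sketched as follows: before a checkpoint is removed, each mapping it holds is promoted to its older neighbor if that neighbor has no mapping of its own (older checkpoints may still need the saved block); otherwise the saved block can simply be freed. This helper is illustrative only; the real code path is the sfs2cp_dirop( ) function mentioned above.

```python
# Sketch of checkpoint deletion as described above: a deleted checkpoint's
# saved blocks are promoted to the older neighbor when that neighbor has
# no mapping of its own; otherwise they can be freed.
def delete_checkpoint(checkpoints, maps, victim):
    idx = checkpoints.index(victim)
    older = checkpoints[idx - 1] if idx > 0 else None
    for block_map in maps.values():
        if victim in block_map:
            saved = block_map.pop(victim)
            if older is not None and older not in block_map:
                block_map[older] = saved   # older checkpoints still need it
            # else: the saved block can be freed
    checkpoints.remove(victim)

# The ckpti+1 example: ckpti has no mapping, so p moves to ckpti's entry.
checkpoints = ["ckpt_i-2", "ckpt_i-1", "ckpt_i", "ckpt_i+1"]
maps = {"n": {"ckpt_i-1": "m", "ckpt_i+1": "p"}}
delete_checkpoint(checkpoints, maps, "ckpt_i+1")
```

After the call, block n's mapping array matches the text: p now sits in ckpti's entry, and ckpti+1 is gone.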
6.1.2.5 Checkpoint Scheduling

Checkpoints can be created in two ways: automatic and manual. If the user selects automatic checkpoints, checkpoints are created and removed based on the schedule that the user specifies. This scheduling is enforced by a checkpoint manager thread.
Accessing Checkpointed Data

Access to the checkpointed versions of directories and files is achieved through the provision of a hidden, virtual directory named .chkpnt within each live directory. Changing the current directory to the virtual .chkpnt directory enables users to access checkpointed, or prior, versions of filesystem objects. Note that the .chkpnt directories are hidden to prevent problems with applications that search through directory hierarchies, e.g., backup or virus scanning applications.
FIGURE 6-8 Accessing .chkpnt in UNIX

In order to access the ".chkpnt" directory, UNIX clients use standard file system commands (as shown in Figure 6-8). The operation from a Windows client is similarly simple; Windows clients can either provide the complete explicit path name or use the "Go to" option in Windows Explorer. This option can be found in the Tools menu (see Figure 6-9 and Figure 6-10).
Once the user has navigated into a .chkpnt directory at any point in the directory hierarchy, there are no further .chkpnt subdirectories below it; i.e., there are no nested .chkpnt directories. Users can access checkpointed versions of the entire volume by navigating to the .chkpnt directory at the root of the volume.

Compatibility Issues

In order to avoid name conflicts between user applications and existing files, each .chkpnt entry possesses a number of special characteristics.
FIGURE 6-9 Accessing ".chkpnt" in Windows
FIGURE 6-10 Viewing ".chkpnt" in Windows Explorer

6.1.3 Object Checkpoint Restore

Checkpoints are read-only point-in-time images of a volume. They can be created manually or scheduled to be created (and subsequently removed) by the system without user intervention. By definition, objects within the checkpointed volume cannot be modified; this is the very nature of checkpointing and desirable behavior.
■ Overhead is imposed at each layer of the operation.
■ The copy operation is a block-for-block operation because the client system is not cognizant of the structure of the StorEdge filesystem, and therefore of the underlying relationship between the blocks in the live and checkpointed versions of the filesystem.

The new StorEdge internal cp command was engineered to confine the operation to the filer, resulting in far greater efficiency and speed.
When invoked with the "-c" option, the checkpointed file source will be restored to the destination. If the destination is omitted, the system will try to restore to the original (non-checkpointed) path. For example:

cp -c /v6.chkpnt/ckpt1/docs/sample.doc

will restore to:

/v6/docs/sample.doc

The cp command is also available as part of the chkpnt command line operation.
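The path mapping implied by the example above can be sketched as a small helper: strip the .chkpnt suffix from the volume component and drop the checkpoint name that follows it. The helper is hypothetical, for illustration only; it is not part of the StorEdge CLI.

```python
# Sketch of how a live-filesystem destination can be derived from a
# checkpointed source path when "cp -c" is given no destination.
# This helper is hypothetical; it only illustrates the path mapping.
def restore_destination(checkpoint_path):
    # /v6.chkpnt/ckpt1/docs/sample.doc -> /v6/docs/sample.doc
    parts = checkpoint_path.lstrip("/").split("/")
    volume, rest = parts[0], parts[1:]
    if not volume.endswith(".chkpnt") or not rest:
        raise ValueError("not a checkpoint path: " + checkpoint_path)
    live_volume = volume[: -len(".chkpnt")]
    # rest[0] is the checkpoint name (e.g. ckpt1); drop it.
    return "/" + "/".join([live_volume] + rest[1:])
```

Running it on the documented example reproduces the documented result, /v6/docs/sample.doc.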
FIGURE 6-12 Windows File Copy Error Message During a Checkpoint Restore Operation

FIGURE 6-13 Windows Excel Open Error Message During a Checkpoint Restore Operation

The CLI copy (cp) command is a general command and can be used to copy files on the StorEdge whether or not the intention is to restore a checkpointed version of an object to the live filesystem. For instance, the cp command can be used to copy a file from one StorEdge volume to another volume on the same filer.
Finally, should a checkpoint restore operation be interrupted for any reason (e.g., a power failure), the operation will be resumed automatically on restart. Other checkpoint-related operations are also suspended for the duration of the checkpoint restore.
CHAPTER 7 FRU/CRU Replacement Procedures

This chapter describes how to replace components in the Sun StorEdge 5310 NAS after they have been set up.
7.2 Determining a Faulty Component

To determine and isolate a faulty component, refer to “Troubleshooting the Server Using Built-In Tools” on page 2-10. This section can help you isolate a faulty component using the following methods:
FIGURE 7-1 Removing the Cover

2. Set the cover aside and away from the immediate work area.

Note – A non-skid surface or a stop behind the chassis may be needed if attempting to remove the top cover on a flat surface. Sliding the server chassis on a wooden surface may mar the surface (there are no rubber feet on the bottom of the chassis).
7.6.1 Opening the Front Bezel

To access the system controls and peripherals when a front bezel is installed, grasp the bezel at the finger hole on the left side and gently pull it towards you, unhinging it at the right, until it unsnaps from the chassis. Replace the bezel using the reverse process (see Figure 7-2).

FIGURE 7-2 Sun StorEdge 5310 NAS Bezel Replacement (1: Chassis Handle; 2: Bezel Locating Tab)

The EU front bezel uses a key system to lock the bezel. To remove the front bezel:

1.
3. Rotate the bezel towards the front.

4. Remove the bezel by pressing the hinges in and pulling it out.

FIGURE 7-3 Sun StorEdge 5310 NAS Expansion Unit

7.6.2 Memory

Caution – Before touching or replacing any component inside the server, disconnect all external cables and follow the instructions in “Safety: Before You Remove the Cover” on page 7-2 and “Removing and Replacing the Cover” on page 7-2. Always place the server on a grounded ESD pad and wear a properly grounded antistatic wrist strap.
■ Memory scrubbing: Error correction is performed on data being read from memory. The correction is then passed to the requestor, and at the same time the error is “scrubbed” (corrected) in main memory. Memory scrubbing prevents the accumulation of single-bit errors in main memory that would otherwise become unrecoverable multiple-bit errors.
6. When the module is nearly all of the way in, the handle will rotate up. At this time, push firmly on the front of the handle to lock the latch.
7.6.4 Fan Module

Caution – Before touching or replacing any component inside the Sun StorEdge 5310 NAS, disconnect all external cables and follow the instructions in “Safety: Before You Remove the Cover” on page 7-2 and “Removing and Replacing the Cover” on page 7-2. Always place the server on a grounded ESD pad and wear a properly grounded antistatic wrist strap.

7.6.4.1 Sun StorEdge 5310 NAS Fan Module Removal

The fans in the Sun StorEdge 5310 NAS are individually replaceable.
FIGURE 7-5 Removing the Fan Module (1: Front Panel USB Ribbon Cable; 2: Cable Retention Clip; 3: Floppy/FP/IDE Cable)
7.6.4.2 Sun StorEdge 5310 NAS Fan Module Replacement

Replacing the fan module is essentially the reverse of the procedure described in “Sun StorEdge 5310 NAS Fan Module Removal” on page 7-9.

1. Note the raised tabs on the chassis floor and the corresponding notches in the bottom of the fan module.
2. Lower the fan module until it is just above the chassis floor.
3. Align the notches in the fan module with the raised tabs on the chassis and lower the fan module onto the floor.
4.
7.6.5 High Profile Riser PCI Cards

Note – Add-in cards must be replaced while the riser board is removed from the chassis. The server supports 3V-only and Universal PCI cards. It does not support 5V-only cards.

Caution – Before touching or replacing any component inside the Sun StorEdge 5310 NAS, disconnect all external cables and follow the instructions in “Safety: Before You Remove the Cover” on page 7-2 and “Removing and Replacing the Cover” on page 7-2.
7.6.6 Gigabit Ethernet Card

Note – Add-in cards must be replaced while the riser board is removed from the chassis. The server supports 3V-only and Universal PCI cards. It does not support 5V-only cards. Alternatively, a 10/100 Ethernet card may be used for the cluster heartbeat. The procedure for replacing the 10/100 Ethernet card is the same as for the fibre Ethernet card.
11. Replace the chassis cover if you have no additional work to do inside the chassis.
7.6.7 Low Profile Riser PCI Cards

Note – Add-in cards must be replaced while the riser board is removed from the chassis. The server supports 3V-only and Universal PCI cards. It does not support 5V-only cards.

Caution – Before touching or replacing any component inside the Sun StorEdge 5310 NAS, disconnect all external cables and follow the instructions in “Safety: Before You Remove the Cover” on page 7-2 and “Removing and Replacing the Cover” on page 7-2.
7.6.8 Qlogic HBA Removal and Replacement

Note – Add-in cards must be replaced while the riser board is removed from the chassis. The server supports 3V-only and Universal PCI cards. It does not support 5V-only cards.

Caution – Before touching or replacing any component inside the Sun StorEdge 5310 NAS, disconnect all external cables and follow the instructions in “Safety: Before You Remove the Cover” on page 7-2 and “Removing and Replacing the Cover” on page 7-2.
7.6.9 LCD Display Module

Note – The LCD Display must be replaced while the cover is removed from the chassis.

Caution – Before touching or replacing any component inside the Sun StorEdge 5310 NAS, disconnect all external cables and follow the instructions in “Safety: Before You Remove the Cover” on page 7-2 and “Removing and Replacing the Cover” on page 7-2. Always place the server on a grounded ESD pad and wear a properly grounded antistatic wrist strap.

To replace the LCD Display, follow these steps:

1.
15. Replace the chassis cover if you have no additional work to do inside the chassis.

FIGURE 7-7 Connecting the LCD Display

7.6.10 Flash Disk Module

7.6.10.1 Backup of /dvol/etc

Assuming that the flash disk and the /etc directory are still accessible and in usable condition, the /dvol/etc directory should be backed up. This backup saves some configuration steps.

1. Telnet to the StorEdge and access the CLI.
2. Type load unixtools
3.
7.6.10.2 Replacing the Flash Disk

Note – The Flash Disk must be replaced while the cover is removed from the chassis.

Caution – Before touching or replacing any component inside the Sun StorEdge 5310 NAS, disconnect all external cables and follow the instructions in “Safety: Before You Remove the Cover” on page 7-2 and “Removing and Replacing the Cover” on page 7-2. Always place the server on a grounded ESD pad and wear a properly grounded antistatic wrist strap.
8. Replace the chassis cover if you have no additional work to do inside the chassis.

FIGURE 7-8 The Flash Disk

Note – After completing the flash disk replacement, you must recover the configuration information to bring the system back online.

7.6.10.3 Upgrade and Configuration Recovery

In this step, you will restore the system configuration and upgrade from the base operating system to a full version of the OS.
http:///.BUILT-IN/upgrade/ 4. Ensure that the model field reads “NOMODEL”. 5. Enter 0 (zero) for the serial number field. (A blank serial number may NOT be used, and may cause this procedure to fail.) 6. Download the full version of the operating system to the local workstation. Contact Technical Support to obtain this file. 7. Click the browse button and navigate to the OS upgrade file on the workstation. Click on Install to copy the image to the server.
5. Type reboot at the CLI to reboot the system and ensure that the new settings are in effect.

If a backup of /dvol/etc is not available, the following must be reconfigured manually: user maps, local group members and permissions, ssh keys, exports and related security, and local users/groups/hostgrps entries. Please see the software manual or the FAQ for setup instructions for these items.
7.7 Array FRU Replacement Procedures

This section provides procedures for replacing failed components in an array command module. Before using the procedures in this section, perform the appropriate troubleshooting steps described in Chapter 3, "Troubleshooting and Recovery," and in the Recovery Guru. The replacement procedures described in this section can be performed as hot swap procedures.
7.7.1.2 Procedure

1. If needed, use the storage management software to create, save, and print a new storage array profile.

2. Did the Recovery Guru direct you to replace a failed controller?
■ Yes - Go to step 3.
■ No - Run the Recovery Guru to identify the failed component. Go to step 3.

3. Remove the back cover.

4. If needed, turn off the alarm.

Caution – Electrostatic discharge damage to sensitive components.
8. Disconnect the SFP transceivers and all attached interface cables from the failed controller. Label all cables so that you can reconnect them correctly to the new controller.

FIGURE 7-9 Removing an SFP Transceiver and Fibre Optic Cable

FIGURE 7-10 Removing and Replacing a Controller

9. Remove the failed controller. Figure 7-10 illustrates the following steps:
a. Push down on the latch.
b. Open the levers.
c. Remove the controller.

10.
■ Yes - Go to step 14.
■ No - If the battery in the failed controller is still viable, you have the option of using that battery in the replacement controller. Go to step 11.

11. Are you using the battery from the old controller?
■ Yes - Go to step 12.
■ No - Unpack the new battery. Set the new battery on a dry, level surface. Save all packing materials in case you need to return the battery, and then go to step 13.

12. Remove the battery from the failed controller.

a.
c. Replace the controller cover and secure the screws. Figure 7-11 on page 7-26 shows these screws.

FIGURE 7-12 Replacing the Controller Battery

14. Update the following information on the controller labels. Figure 7-13 on page 7-28 shows the location of the labels.
■ Date of Installation - Enter today's date.
■ Replacement Date - Enter the date two years from now.
■ MAC Address - Record the MAC address for the new controller. You will need this information in step 17.

15.
17. Change the bootstrap protocol (BOOTP) server configuration to the MAC address you recorded in step 14. For detailed information on the configuration procedure, refer to your specific operating system administrator's guide.

FIGURE 7-13 Label Locations for the Controller

18. Wait approximately 60 seconds for the storage management software to recognize the new controller, and then go to step 19.

Caution – Potential data loss.
21. Is the problem corrected?
■ Yes - Go to step 22.
■ No - Contact technical support.

22. Remove the antistatic protection.

23. If needed, replace the back cover.

24. Use the Array Management Window to check the status of each module.

25. Do any components have a Needs Attention status?
■ Yes - Select the Recovery Guru toolbar button in the Array Management Window and complete the recovery procedure. If the problem persists, contact technical support.
■ No - Go to step 26.

26.
7.7.2.2 Procedure

1. Use the storage management software to create, save, and print a new storage array profile.

2. Did the Recovery Guru direct you to replace a failed controller battery?
■ Yes - Go to step 3.
■ No - Run the Recovery Guru to identify the failed component. Go to step 3.

3. Remove the back cover.

4. If needed, mute the alarm.

Caution – Electrostatic discharge damage to sensitive components.
8. Disconnect the SFP transceivers and all attached interface cables from the failed controller. Label all cables so that you can reconnect them correctly to the new controller. Figure 7-15 illustrates disconnecting a cable.

FIGURE 7-15 Removing the SFP Transceiver and Fibre Optic Cable

9. Remove the failed controller. Figure 7-16 illustrates the following steps:
a. Push down on the latch.
b. Open the levers.
c. Remove the controller.

FIGURE 7-16 Removing and Replacing a Controller

10.
b. Remove the single screw securing the battery bracket, slide the bracket sideways to clear the lugs, and lift the bracket up. Figure 7-18 shows the bracket in relation to the controller.
c. Disconnect the battery harness from its controller board connector.
d. Remove the battery from the controller. You may need to hold the controller close above a flat surface and let the battery fall out. Do not let the battery pull on the battery harness.
e.
■ Replacement Date - If a new battery is used, enter the date two years from now. If the battery from the old controller is used, copy the date from the battery label on the old controller.
FIGURE 7-19 Label Locations on the Controller

12. Slide the replacement controller all the way into the empty slot and lock the levers into place. Figure 7-16 on page 7-31 illustrates installing a controller.

13. Reinstall the SFP transceivers to their original connectors, and attach the host and drive interface cables to their respective SFP transceivers. Figure 7-15 on page 7-31 illustrates installing an SFP transceiver and cable.

14.
17. Is the problem corrected?
■ Yes - Go to step 18.
■ No - Contact technical support.

18. Remove the antistatic protection and replace the back cover, if needed.

19. Check the status of all modules.

20. Do any components have a Needs Attention status?
■ Yes - Select the Recovery Guru toolbar button in the Array Management Window and complete the recovery procedure. If the problem persists, contact technical support.
■ No - Go to step 21.

21. Create, save, and print a new storage array profile.
23. After 24 hours, check the host link, drive link, fault, and battery lights to ensure the battery is working properly. Figure 7-20 shows the locations of these lights. If the battery has a fault, use the storage management software to check the command module status and obtain the recovery procedure.

End Of Procedure

FIGURE 7-20 Drive Link, Host Link, Battery, and Fault Lights

7.7.3 Replacing a Drive

Use the following procedure to replace a drive in a command module.
Caution – Mixed configuration speed requirements. In configurations involving various models of command modules or drive modules, all modules must be operating at the same speed. Refer to the Product Release Notes for any model-specific restrictions.

Caution – Risk of data loss and permanent damage. Magnetic fields will destroy all data on a disk drive and cause irreparable damage to its circuitry.
Caution – Potential data loss. Removing a drive that has not failed can cause data loss. To prevent data loss, remove only a failed drive that has a Fault (amber) light on or a Failed status in the storage management software. 6. Check the Fault lights on the front of the module. If a fault is detected, the amber Fault light will be on. Note – IMPORTANT If you remove an active drive accidentally, wait 30 seconds and then reinstall it. Refer to your storage management software for the recovery procedure. 7.
10. Choose one of the following steps, based on the status of the Active and Fault lights:
■ Active light is off - The drive may be installed incorrectly. Remove the drive, wait 30 seconds, and then reinstall it. When finished, go to step 12.
■ Fault light is illuminated - The new drive may be defective. Replace it with another new drive, and then go to step 12.
■ Active lights are on and Fault lights are off - Go to step 13.

12. Is the problem corrected?
■ Yes - Go to step 13.
■ Yes - Go to step 3.
■ No - Run the Recovery Guru to identify the failed component. Go to step 3.

3. Remove the back cover.

4. If needed, mute the alarm.

Caution – Electrostatic discharge damage to sensitive components. To prevent electrostatic discharge damage to the module, use proper antistatic protection when handling the module components.

5. Put on antistatic protection.

6. Unpack the new fan. Set the new fan on a dry, level surface. Save all packing materials in case you need to return the fan.
■ Fault light is illuminated - The fan may be installed incorrectly. Reinstall the fan and then go to step 13.
■ Fault light is off - Go to step 14.

13. Is the problem corrected?
■ Yes - Go to step 14.
■ No - Contact technical support.

14. Remove the antistatic protection, and replace the back cover, if needed.

15. Complete the remaining Recovery Guru procedures, if needed.

16. Check the status of all the modules in the storage array.

17.
■ Yes - Go to step 3.
■ No - Run the Recovery Guru to identify the failed component, and then go to step 3.

3. Mute the alarm, and remove the back cover, if needed.

4. Put on antistatic protection.

5. Unpack the new power supply. Set the new power supply on a dry, level surface near the command module. Save all packing materials in case you need to return it.

6. Turn off the power switch on the new power supply.

7. Check the Fault lights to locate the failed power supply.
13. Check the Power and Fault lights on the new power supply.

FIGURE 7-23 Replacing a Power Supply

14. Choose one of the following steps, based on the status of the Power and Fault lights:
■ Power light is off or Fault light is illuminated - The power supply may be installed incorrectly. Reinstall the power supply, and then go to step 15.
■ Power light is illuminated and Fault light is off - Go to step 16.

15. Is the problem corrected?
■ Yes - Go to step 16.
■ No - Contact technical support.

16.
■ No - Go to step 20.

20. Create, save, and print a new storage array profile.

End Of Procedure

7.7.6 Replacing an SFP Transceiver

Use the following procedure to replace a Small Form-factor Pluggable (SFP) transceiver in a command module. The SFP transceiver shown in this procedure may look different from those you are using, but the difference will not affect transceiver performance. Figure 7-24 illustrates connecting an SFP and a cable.

7.7.6.1 Tools and Equipment

7.7.6.2 Procedure
Caution – Potential data loss or degraded performance. To prevent data loss or damage to a cable, do not twist, fold, pinch, or step on a fibre optic cable, and do not bend a cable tighter than a 2-inch radius. Caution – Potential data loss. Removing an SFP transceiver that has not failed can cause data loss. To prevent data loss, remove only the component that has a Fault light on or a failed status in the storage management software. 7. Disconnect the interface cable from the SFP transceiver. 8.
■ Yes - Go to step 14.
■ No - Contact technical support.

14. Remove the antistatic protection, and replace the back cover, if needed.

15. Complete any remaining Recovery Guru procedures, if needed.

16. Check the status of each module.

17. Do any components have a Needs Attention status?
■ Yes - Select the Recovery Guru toolbar button in the Array Management Window and complete the recovery procedure. If the problem persists, contact technical support.
■ No - Go to step 18.

18.