SGI® Total Performance 9100 (2Gb TP9100) Storage System User’s Guide 007-4522-003
CONTRIBUTORS
Written by Matt Hoy
Illustrated by Kelly Begley
Production by Glen Traefald and Karen Jacobson
Engineering contributions by Terry Fliflet, David Lucas, Van Tran, Michael Raskie, and Ted Wood

COPYRIGHT © 2002-2003, Silicon Graphics, Inc. All rights reserved; provided portions may be copyright in third parties, as indicated elsewhere herein.
Record of Revision

Version  Description
001      August 2002. Original printing.
002      February 2003. Engineering revisions.
003      October 2003. Firmware and hardware revisions.
Contents

Figures  ix
Tables  xi
About This Guide  xiii
    Audience  xiii
    Structure of This Document  xiii
    Related Publications
1. Storage System Overview
    Enclosure Components  8
        Operators (Ops) Panel  11
        PSU/Cooling Module  12
        RAID LRC I/O Modules  14
        RAID Loopback LRC I/O Modules  16
        JBOD LRC I/O Module  18
2. Connecting to a Host and Powering On and Off
3. Features of the RAID Controller
    RAID Disk Topologies  52
        Simplex Single-port RAID Topology  53
        Duplex Single-Port RAID Topology  54
        Simplex Dual-Port RAID Topology  55
        Duplex Dual-Port RAID Topology  56
        Dual-port Duplex Two-Host RAID Configuration  57
4. Using the RAID Controller
5. Troubleshooting
    Solving Storage System Temperature Issues  92
        Thermal Control  92
        Thermal Alarm  93
    Care and Cleaning of Optical Cables  94
6. Installing and Replacing Drive Carrier Modules
A. Technical Specifications
B. Regulatory Information

Figures

Figure 1-1  Front View of Rackmount Enclosure  7
Figure 1-2  Rear View of Rackmount Enclosure  8
Figure 1-3  Front View of Enclosure Components  9
Figure 1-4  RAID (Base) Enclosure Components, Rear View  10
Figure 1-5  JBOD (Expansion) Enclosure Components, Rear View  11
Figure 1-6  Ops Panel  12
Figure 1-7  PSU/Cooling Module
Figure 2-3  Rack Power Cabling  41
Figure 2-4  Rackmount Enclosure ESI/Ops Panel Indicators and Switches  44
Figure 3-1  Simplex Single-port RAID Topology  53
Figure 3-2  Duplex Single-port RAID Topology  54
Figure 3-3  Simplex Dual-port Dual-host RAID Topology  55
Figure 3-4  Duplex Dual-port RAID Configuration

Tables

Table i    Document Conventions  xv
Table 4-1  Supported RAID Levels  60
Table 4-2  RAID Level Maximum Capacity  62
Table 4-3  Array Operating Conditions  63
Table 4-4  RAID Levels and Availability  64
Table 4-5  RAID Levels and Performance
About This Guide This guide explains how to operate and maintain the SGI 2 Gb Total Performance 9100 (2 Gb TP9100) Fibre Channel storage system. As part of the SGI Total Performance Series of Fibre Channel storage, this storage system provides compact, high-capacity, high-availability RAID and JBOD (“just a bunch of disks”) storage for supported SGI servers.
• Chapter 4, "Using the RAID Controller," introduces software tools for the controller, gives configuration information, and explains RAID levels and criteria for selecting them, storage system drives and drive state management, and automatic rebuild.
• Chapter 5, "Troubleshooting," describes storage system problems and suggests solutions. It explains how to use storage system LEDs and the storage system alarm for troubleshooting.
Conventions Used in This Guide

Table i contains the conventions used throughout this guide.

Table i Document Conventions

Command: This fixed-space font denotes literal items such as commands, files, routines, path names, signals, messages, and programming language structures.
variable: Italic typeface denotes variable entries and words or concepts being defined.
user input: Fixed-space font denotes literal items that the user enters in interactive sessions.
Reader Comments

If you have comments about the technical accuracy, content, or organization of this document, please contact SGI. Be sure to include the title and document number of the manual with your comments. (Online, the document number is located in the front matter of the manual. In printed manuals, the document number can be found on the back cover.)

You can contact us in any of the following ways:
• Send e-mail to the following address: techpubs@sgi.com
Chapter 1. Storage System Overview

The SGI 2 Gb Total Performance 9100 (2 Gb TP9100) Fibre Channel storage system provides you with a high-capacity, high-availability Fibre Channel storage solution. The storage system can be configured for JBOD ("just a bunch of disks") or RAID ("redundant array of inexpensive disks") operation, and is available in both rackmount and tower formats. The modular design of the 2 Gb TP9100 expands easily to meet your needs.
RAID Configuration Features

• 64-drive maximum configuration
• 32 logical units maximum

Release 6.0 Features

Version 9.03 FFx2 controller firmware introduces the following new features over the 8.40 and 8.50 releases:
The second controller is then updated with the new firmware image. When the second controller is restarted, I/O is again routed to its normal path.

Note: Rolling upgrades are supported only on SGI IRIX platforms (as listed in "Supported Platforms" on page 5) and SGI Altix series servers running an SGI Linux environment of 7.2 or later with SGI ProPack 2.1, or SGI Advanced Linux Environment 2.1 or later with SGI ProPack 2.2 or later.
Note: Upgrading from 8.40 or 8.50 to 9.03 or later does not cause the new drive sizing algorithm to be used. The drive sizing algorithm only applies to new configurations where no logical units or previous configurations have existed.
• Automatic firmware flashing
In a dual-controller configuration, the firmware of the replacement controller is automatically flashed to match the firmware of the surviving controller.

Features supported by release 6.0:
Note: Rolling upgrades are supported only on SGI IRIX platforms (as listed in "Supported Platforms" on page 5) and SGI Altix series servers running an SGI Linux environment of 7.2 or later with SGI ProPack 2.1, or SGI Advanced Linux Environment 2.1 or later with SGI ProPack 2.2 or later.

Compatibility Note: Copper Fibre Channel host bus adapters (HBAs) are not supported by the TP9100 (2 Gb TP9100).
Storage System Enclosure

An enclosure that contains a RAID controller is a RAID (base) enclosure. An enclosure without a RAID module is a JBOD or expansion enclosure. The expansion enclosure can be cabled to a RAID enclosure and provides additional disk modules. The RAID controller can address up to 64 disk drives; thus, three 16-drive expansion enclosures can be cabled to it. Enclosures can be installed in industry-standard 19-in. racks or be configured as a stand-alone tower.

Figure 1-1 Front View of Rackmount Enclosure
Figure 1-2 shows the rear view of a rackmount enclosure.

Figure 1-2 Rear View of Rackmount Enclosure
Enclosure Components

Note: In simplex RAID configurations, the enclosure will contain a RAID loopback LRC module in place of one of the RAID LRC I/O modules.

• Up to 16 disk drive carrier modules
• Dummy drive carrier modules

Figure 1-3 shows a front view of the enclosure components.

Figure 1-3 Front View of Enclosure Components
Figure 1-4 shows a rear view of the RAID (base) enclosure components.

Figure 1-4 RAID (Base) Enclosure Components, Rear View
Enclosure Components JBOD LRC/IO JBOD LRC/IO module A module B Operators panel PSU/cooling module RS232 PSU/cooling module ID FC-AL Loops FC-AL Loops RS232 Figure 1-5 JBOD (Expansion) Enclosure Components, Rear View These components are discussed in the following sections: • “Operators (Ops) Panel” on page 11 • “PSU/Cooling Module” on page 12 • “RAID LRC I/O Modules” on page 14 • “RAID Loopback LRC I/O Modules” on page 16 • “JBOD LRC I/O Module” on page 18 • “Drive Carrier Module” on p
Figure 1-6 shows the ops panel and identifies its components. For more information about the LEDs and configuration switches, see "ESI/Ops Panel LEDs and Switches" in Chapter 5.

Figure 1-6 Ops Panel
Enclosure Components Figure 1-7 PSU/Cooling Module Four LEDs mounted on the front panel of the PSU/cooling module (see Figure 1-8) indicate the status of the power supply and the fans. Module replacement must be completed within 10 minutes after removal of the failed module. For more information, see “Power Supply/Cooling Module LEDs” in Chapter 5.
Figure 1-8 PSU/Cooling Module Switches and LEDs

RAID LRC I/O Modules

The storage system enclosure includes two loop resiliency circuit (LRC) I/O modules with optional integrated RAID controllers. There are two RAID LRC I/O modules available: a dual-port version and a single-port version (see Figure 1-9 and Figure 1-10).
Enclosure Components RS 232 ult Fa Host 1 b 2G Expansion Host 0 b 2G Figure 1-9 007-4522-003 Dual-port RAID LRC I/O Module 15
1: Storage System Overview Expansion 2 Gb Host 0 RS 232 Fault Figure 1-10 Single-port RAID LRC I/O Module The RAID LRC I/O modules can address up to 64 disk drives. A maximum of two fully populated JBOD expansion enclosure can be cabled to a RAID base enclosure. The disk drives in each enclosure can be of different capacities, but all of the disk drives in an individual LUN must be of the same capacity.
Enclosure Components Host 0 Expansion 2 Gb NO RAID INSTALLED RS 232 Fault Note: The RAID LRC I/O modules in an enclosure must both be single-port controllers, or they must both be dual-port controllers. SGI does not support single-port and dual-port controllers in the same enclosure.
1: Storage System Overview RS 232 Host 0 b 2G NO RAID INSTALLED ult Fa Expansion Host 1 b 2G Figure 1-12 Dual-port RAID Loopback LRC I/O Module JBOD LRC I/O Module The JBOD LRC/IO module uses a Fibre Channel arbitrated loop (FC-AL) to interface with the host computer system. The FC-AL backplane incorporates two independent loops formed by port bypass circuits within the LRC I/O modules.
Enclosure Components FC-AL Loops RS232 Figure 1-13 JBOD LRC I/O Module For information about the LEDs on the rear of the JBOD LRC I/O module, see “RAID Loopback LRC I/O Module LEDs” in Chapter 5. Drive Carrier Module The disk drive carrier module consists of a hard disk drive mounted in a die-cast aluminum carrier. The carrier protects the disk drive from radio frequency interference, electromagnetic induction, and physical damage and provides a means for thermal conduction.
1: Storage System Overview Disk drive Carrier Handle Latch Carrier lock Note: Ensure that the handle always opens from the left. Figure 1-14 Drive Carrier Module Drive Carrier Handle The drive carrier module has a handle integrated into its front face.
Enclosure Components Figure 1-15). For more information about operating the anti-tamper lock, see “Replacing a Drive Carrier Module” on page 98. Indicator aperature Anti-tamper lock Locked Figure 1-15 Unlocked Anti-tamper Lock For information about the drive carrier module LEDs, see “Drive Carrier Module LEDs” in Chapter 5. Dummy Drive Carrier Modules Dummy drive carrier modules must be installed in all unused drive bays.
1: Storage System Overview Figure 1-16 Dummy Drive Carrier Module Enclosure Bay Numbering This section contains information about enclosure bay numbering in the following sections: • “Rackmount Enclosure Bay Numbering” on page 22 • “Tower Enclosure Bay Numbering” on page 24 Rackmount Enclosure Bay Numbering The rackmount enclosure is 4 bays wide and 4 bays high, and the bays are numbered as follows: • The disk drive bays, located in front, are numbered 1 to 4 from left to right and 1 to 4 from top
Figure 1-17 shows the enclosure bay numbering convention and the location of modules in the rackmount enclosure.
Tower Enclosure Bay Numbering

The tower enclosure is 4 bays wide by 4 bays high, and the bays are numbered as follows:
• The disk drive bays, located in front, are numbered 1 to 4 from right to left and 1 to 4 from top to bottom. Drives in bays 1/1 and 4/4 are required for storage system management; these bays must always be occupied.
• The rear bays are numbered 1 to 5 from top to bottom.
Enclosure Components 2 x 8 drive configuration Column 4 3 2 1 1 x 16 drive configuration Column 4 3 2 1 TP9100 Rear view TP9100 Drive 1 FC-AL Loops 3 Drive 2 Drive 7 Drive 3 5 RS232 RS232 Drive 6 Drive 1-3 Row 4 Drive 11 Drive 1-2 Drive 1-7 Drive 10 Drive 1-6 Drive 2-4 4 Drive 15* Drive 2-5 Drive 2-0* Drive 14 Drive 2-1 Row 3 Ops panel ID Drive 0* Drive 5 2 PSU/cooling module FC-AL Loops Drive 4 Drive 1-1 Drive 9 Drive 1-0* Drive 1-5 Drive 8 Drive 1-4 Drive 2-6 1
Storage System Rack

This section contains information about the 2 Gb TP9100 storage system rack in the following sections:
• "Rack Structure" on page 26
• "Power Distribution Units (PDUs)" on page 29
• "Opening and Closing the Rear Rack Door" on page 31

Rack Structure

The 2 Gb TP9100 rack is 38U high and is divided into 12 bays.
Figure 1-19 Example of 2 Gb TP9100 Rack (Front View)

Figure 1-20 is a rear view of the 2 Gb TP9100 rack.
1: Storage System Overview FC-AL Loops RS232 ID FC-AL Loops RS232 ID FC-AL Loops RS232 ID Figure 1-20 28 Example of 2 Gb TP9100 Rack (Rear View) 007-4522-003
Power Distribution Units (PDUs)

The power distribution units (PDUs) mounted in the rear of the rack provide power to the enclosure and switch bays. The breakers on the PDUs also provide a power on/off point for the rack and enclosures. See Figure 1-21 for socket and breaker locations and functions. All sockets in the PDUs are rated at 200 to 240 VAC, with a maximum load per bank of outlet sockets of 8 A, and are labeled as such.
Figure 1-21 PDU Locations and Functions
Opening and Closing the Rear Rack Door

To open the rear rack door, follow these steps:
1. Locate the latch on the rear rack door.
2. Push up the top part of the latch, as shown in the second panel of Figure 1-22.
3. Press the button as shown in the third panel of Figure 1-22. This action releases the door lever.
4. Pull the door lever and swing the door open.

Figure 1-22 Opening the Rack Rear Door
Storage System Tower

The tower (deskside) version of the storage system houses one RAID enclosure. The tower is mounted on four casters for easy movement. The enclosure in the tower system is rotated 90 degrees from the rackmount orientation. Figure 1-23 shows the front of the tower.
Figure 1-24 shows a rear view of the tower.

Figure 1-24 Rear View of Tower

The tower storage system receives power from standard electrical sockets. Figure 1-25 shows the power cords attached to the rear of the tower.
Figure 1-25 Tower Storage System Power Cords

The tower enclosure can be adapted for rackmounting; contact your service provider for more information.
Chapter 2. Connecting to a Host and Powering On and Off
When the storage system is configured as a host-attached JBOD enclosure, the copper cable/SFP assembly can be replaced with optical SFPs and optical cables. To connect the storage system to a host, insert an optical cable (with SFP) into the connector labeled "Host 0." Connect the other end of the optical cable to the FC-AL port on the host.
This transparent flexibility protects investments in existing infrastructure, enhances storage area network (SAN) robustness, and simplifies SAN configuration management. The 2 Gb TP9100 with FFx-2 RAID controller features a host-side hub function, which is configured by the switches on the ops panel. When the system is in hub mode, FC-AL is the only supported topology.
Connecting the Power Cords and Powering On the 2 Gb TP9100 Tower

Figure 2-1 Power Cords for the Tower

Caution: Use the power cords supplied with the storage system or power cords that match the specification shown in Table A-7 on page 106. Geography-specific power cords are available from SGI.

To install the power cords and power on the storage system, follow these steps:
1.
Caution: Some electrical circuits could be damaged if external signal cables are present during the grounding checks. Do not connect any signal cables to the enclosure until you have completed the ground test.

4. Connect the AC power cords to properly grounded outlets.
5. Turn the power switch on each PSU/cooling module to the "on" position ("I" = on, "O" = off).
Other modules in the storage system also have LEDs, which are described in "Using Storage System LEDs for Troubleshooting" on page 78.

Connecting the Power Cords and Powering On the 2 Gb TP9100 Rack

The rack requires 220 V and is shipped with a country-specific power cord for each power distribution unit (PDU) that the rack contains.
Figure 2-3 Rack Power Cabling
Checking Grounding for the Rack

If necessary, follow these steps to ensure that a safe grounding system is provided:
1. Note the information in "Grounding Issues" on page 37.
2. For the grounding check, ensure that the rack PDU power cords are not plugged in to a power source.

Caution: Some electrical circuits could be damaged if external signal cables or power control cables are present during the grounding checks.

3.
Warning: The rack PDUs must be connected only to power sources that have a safe electrical earth connection. For safety reasons, this earth connection must be in place at all times. Be careful not to touch the pins on the PDU plug when you insert it into a power source.

4. Press the rack breaker switch at the bottom of each PDU so that the word ON shows.
5.
Figure 2-4 Rackmount Enclosure ESI/Ops Panel Indicators and Switches

At power-on, check the ESI/ops panel LEDs for system status. Under normal conditions, the "Power on" LED should illuminate constant green. If a problem is detected, the ESI processor in the ops panel illuminates the "System/ESI fault" LED in amber.
Powering Off

Powering Off the 2 Gb TP9100 Rack

Besides the main breaker switch at the bottom of each PDU, the rack PDUs have breaker switches at each 12U of space, so that you can power off the enclosures in groups of four and leave the others powered on. Figure 2-3 shows their locations. To power off the entire rack, follow these steps:
1. Ensure that users are logged off of the affected systems.
2.
Powering Off the 2 Gb TP9100 Tower or a Single Enclosure

Besides the main breaker switch at the bottom of each PDU, the rack PDUs have breaker switches at each 12U of space, so that you can power off three enclosures and leave others powered on. To power off a single enclosure or tower storage system, follow these steps:
1. Ensure that users are logged off of the affected systems.
2.
Chapter 3. Features of the RAID Controller
• Power supply status
• Cooling element status
• Storage system temperature

The LEDs on the ESI/ops panel show the status of these components.

Configuration on Disk (COD)

Configuration on disk (COD) retains the latest version of the saved configuration at a reserved location on every physical drive. The RAID controller in the 2 Gb TP9100 (Mylex FFx-2) uses COD version 2.1. Previous versions of the TP9100 use COD version 1.0. Controller firmware versions prior to 7.
COD plays a significant role during the power-on sequence after a controller is replaced. The replacement controller tests the validity of any configuration currently present in its NVRAM. Then, it tests the validity of the COD information on all disk drives in the storage system. The final configuration is determined by the following rules:
1. The controller will use the most recent COD information available, no matter where it is stored.
2. If the COD information on a replacement disk drive is questionable or invalid, the disk drive will be labeled unconfigured, offline, or dead.

If a drive fails in a RAID level that uses a hot spare, drive roaming allows the controller to keep track of the new hot spare, which is the replacement for the failed drive.

Caution: Mixing controllers or disk drives from systems running different versions of firmware presents special situations that may affect data integrity.
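The arbitration in rule 1 can be pictured as a simple comparison of configuration generations. The following sketch is illustrative only; the field names and the use of a sequence number are assumptions, not the controller's documented data layout.

    # Illustrative sketch of COD arbitration at power-on (assumed data
    # model; the actual firmware logic is internal to the controller).
    def select_configuration(nvram_config, disk_cods):
        """Pick the most recent valid configuration, wherever it is stored."""
        candidates = [c for c in [nvram_config] + disk_cods
                      if c is not None and c.get("valid")]
        if not candidates:
            return None  # no usable configuration anywhere
        # Rule 1: the most recent COD information wins, regardless of location.
        return max(candidates, key=lambda c: c["sequence"])

    # Example: the drives' COD areas hold a newer configuration than NVRAM.
    nvram = {"valid": True, "sequence": 41, "source": "controller NVRAM"}
    drives = [{"valid": True, "sequence": 42, "source": "disk COD"},
              {"valid": False, "sequence": 99, "source": "disk COD"}]
    print(select_configuration(nvram, drives)["source"])  # -> disk COD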
Data Caching

With write-back caching enabled, the controller can report a write as complete before this data is written to disk. During this interval there is risk of data loss in the following situations:
• If only one controller is present and this controller fails.
• If power to the controller is lost and its internal battery fails or is discharged.
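The exposure comes from when the write is acknowledged relative to when the data reaches disk. The toy model below contrasts the two acknowledgment policies; it is a conceptual sketch, not controller firmware, and the class and method names are invented for illustration.

    # Conceptual sketch of write-back versus write-through acknowledgment.
    class CacheModel:
        def __init__(self, write_back=True):
            self.write_back = write_back
            self.cache = []  # data held only in controller memory
            self.disk = []   # data safely on disk

        def write(self, block):
            self.cache.append(block)
            if not self.write_back:
                self.flush()  # write-through: ack only after the disk write
            return "ack"      # write-back: ack before data reaches disk

        def flush(self):
            self.disk.extend(self.cache)
            self.cache.clear()

    wb = CacheModel(write_back=True)
    wb.write("block0")
    # Acknowledged, but the data is still only in cache -- the at-risk interval:
    assert wb.cache and not wb.disk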
RAID Disk Topologies

The 2 Gb TP9100 RAID enclosure can be configured with any of the following topologies:
• "Simplex Single-port RAID Topology" on page 53
• "Duplex Single-Port RAID Topology" on page 54
• "Simplex Dual-Port RAID Topology" on page 55
• "Duplex Dual-Port RAID Topology" on page 56
• "Dual-port Duplex Two-Host RAID Configuration" on page 57
• "Dual-Port Duplex RAID Configuration" on page 58
Simplex Single-port RAID Topology

Figure 3-1 illustrates a simplex single-port RAID configuration that uses a single host.

Duplex Single-Port RAID Topology

Figure 3-2 illustrates a duplex single-port RAID configuration.

Simplex Dual-Port RAID Topology

Figure 3-3 illustrates a simplex dual-port RAID configuration using two hosts.

Duplex Dual-Port RAID Topology

Figure 3-4 illustrates a duplex dual-port RAID configuration using two hosts and two controllers.

Dual-port Duplex Two-Host RAID Configuration

Figure 3-5 illustrates a dual-port, duplex, dual-path RAID configuration that uses two hosts.

Dual-Port Duplex RAID Configuration

Figure 3-6 illustrates a dual-port, quad-path attached duplex RAID configuration.
Chapter 4. Using the RAID Controller
RAID Levels

RAID stands for "redundant array of inexpensive disks." In a RAID storage system, multiple disk drives are grouped into arrays. Each array is configured as a single system drive consisting of one or more disk drives. Correct installation of the disk array and the controller requires a proper understanding of RAID technology and concepts. The controllers implement several versions of the Berkeley RAID technology, as summarized in Table 4-1.
CAP Strategy for Selecting a RAID Level

Selecting a RAID level means weighing three characteristics:
• Disk capacity utilization (number of disk drives)
• Data redundancy (fault tolerance)
• Disk performance

The controllers make the RAID implementation and the disk drives' physical configuration transparent to the host operating system. This transparency means that the host operating system logical drivers and software utilities are unchanged, regardless of the RAID level selected.
It is impossible to configure an array that optimizes all of these characteristics; that is a limitation of the technology. For example, maximum capacity and maximum availability cannot exist in a single array. Some of the disk drives must be used for redundancy, which reduces capacity. Similarly, configuring a single array for both maximum availability and maximum performance is not an option. The best approach is to prioritize requirements.
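The capacity side of the trade-off can be made concrete with the standard usable-capacity arithmetic for each level. These are the general RAID formulas, not TP9100-specific figures.

    # Usable capacity for n drives of size s (GB), using the standard formulas.
    def usable_capacity(level, n, s):
        if level == "0":           # striping only, no redundancy
            return n * s
        if level in ("1", "0+1"):  # mirroring halves the capacity
            return n * s / 2
        if level in ("3", "5"):    # one drive's worth of capacity for parity
            return (n - 1) * s
        raise ValueError("unknown RAID level")

    # Eight 36-GB drives:
    for level in ("0", "1", "0+1", "3", "5"):
        print("RAID", level, usable_capacity(level, 8, 36), "GB")
    # RAID 0: 288 GB; RAID 1 and 0+1: 144 GB; RAID 3 and 5: 252 GB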
Configuring for Maximum Capacity

RAID 0 provides the greatest capacity, because no disk space is dedicated to fault tolerance. RAID 3 and RAID 5 give the next best capacity, followed by RAID 1 and RAID 0+1.

Configuring for Maximum Availability

Table 4-3 presents definitions of array operating conditions.

Table 4-3 Array Operating Conditions

Normal (online): The array is operating in a fault-tolerant mode, and can sustain a disk drive failure without data loss.
Controller Cache and Availability

The RAID controller has a write cache of 512 MB. This physical memory is used to increase the performance of data retrieval and storage operations. The controller can report to the operating system that a write is complete as soon as the controller receives the data.
Table 4-4 RAID Levels and Availability (continued)

RAID 0+1
  Fault tolerance type: Mirrored and striped
  Availability: Data is striped across multiple disk drives, and written to a mirrored set of disk drives.

JBOD
  Fault tolerance type: None
  Availability: This configuration offers no redundancy and is not recommended for applications requiring fault tolerance.

Configuring for Maximum Performance

Table 4-5 presents the relative performance advantages of each RAID level.
The disk drive modules are dual-ported. A RAID controller sees 16 to 32 drives on each loop (A and B), because it finds both ports of each drive. Via the I/O modules, it alternates allocation of the drives between channels, so that the drive addresses are available for failover. At startup, half the drives are on channel 0 via their A port and the other half are on channel 1 via their B port; each I/O module controls a separate loop of half the drives.
Disk Topologies Channel 1 target 20 loop ID 20 Channel 0 target 21 loop ID 21 Channel 1 target 22 loop ID 22 Channel 0 target 23 loop ID 23 Channel 1 target 24 loop ID 24 Channel 0 target 25 loop ID 25 Channel 1 target 26 loop ID 26 Channel 0 target 27 loop ID 27 Channel 1 target 28 loop ID 28 Channel 0 target 29 loop ID 29 Channel 1 target 30 loop ID 30 Channel 0 target 31 loop ID 31 Channel 1 target 32 loop ID 32 Channel 0 target 33 loop ID 33 Channel 1 target 34 loop ID 34 Channel 0 targe
The default controller parameter settings can be modified for the intended application; see the documentation for the management software included with the storage system for information on controller parameters.

Note: Changes to the controller parameter settings take effect after the controller is rebooted.

System Drives

System drives are the logical devices that are presented to the operating system.
System Drive Affinity and Programmable LUN Mapping

System drive affinity and programmable LUN mapping are configuration features that work together to define how the host accesses the available storage space.
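As a schematic illustration of how the two features combine (the table below is hypothetical and is not the syntax of the TP9100 management software), affinity selects the owning controller for each system drive, while LUN mapping selects the LUN number under which that system drive appears on a host port:

    # Hypothetical illustration of affinity plus LUN mapping; not a real
    # configuration syntax for the TP9100 management software.
    system_drives = {
        # system drive id: (owning controller, {host port: LUN number})
        0: ("C0", {"Host 0": 0}),
        1: ("C1", {"Host 0": 1, "Host 1": 0}),
    }

    def lun_for(host_port, system_drive):
        controller, mapping = system_drives[system_drive]
        return mapping.get(host_port)  # None if not presented on this port

    print(lun_for("Host 1", 1))  # -> 0: system drive 1 appears as LUN 0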
Drive State Reporting

A disk drive without valid configuration information is considered unconfigured, and the operational state is marked unconfigured, offline, or dead. If a configured disk drive is removed or fails, and a new disk drive replaces the failed disk drive at the same location, the new disk drive is set to online spare. This allows the automatic rebuild operation to function with replaced drives. When a disk drive is inserted into the system, the controller recognizes that the drive has been replaced.
Table 4-6 Physical Disk Drive States (continued)

Online rebuild: The disk drive is in the process of being rebuilt. (In a RAID 1 or 0+1 array, data is being copied from the mirrored disk drive to the replacement disk drive. In a RAID 3 or 5 array, data is being regenerated by the exclusive OR (XOR) algorithm and written to the replacement disk drive.)
Unconfigured: This location is unconfigured.
Environmental: An environmental device is present at this address.

Automatic Rebuild
Note: The priority of rebuild activity can be adjusted through the Rebuild and Check Consistency Rate controller parameter.

In order to use the automatic rebuild feature, you must maintain an online spare disk drive in the system. The number of online spare disk drives in a system is limited only by the maximum number of disk drives available on each drive channel.
The rebuild procedure begins after a REBUILD has been started or power has been cycled to the controllers. Cycling the power also removes the "ghost drive" from the configuration.
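Putting the drive states and the rebuild behavior together, the automatic rebuild sequence can be sketched as follows. This is a simplified model of the documented behavior, not firmware code.

    # Simplified model of the automatic rebuild flow described above.
    def on_drive_replaced(array, slot):
        """A new drive is inserted at the address of a failed, configured drive."""
        array[slot] = "online spare"    # controller recognizes the replacement
        start_rebuild(array, slot)

    def start_rebuild(array, slot):
        array[slot] = "online rebuild"  # data copied (RAID 1/0+1) or
                                        # regenerated by XOR (RAID 3/5)
        # ...rebuild runs at the configured rebuild/check-consistency rate...
        array[slot] = "online"          # array returns to fault-tolerant mode

    array = {1: "online", 2: "dead", 3: "online"}
    on_drive_replaced(array, 2)
    print(array[2])  # -> online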
Chapter 5. Troubleshooting

The 2 Gb TP9100 storage system includes a processor and associated monitoring and control logic that allows it to diagnose problems within the storage system's power, cooling, and drive systems. SES (SCSI enclosure services) communications are used between the storage system and the RAID controllers. Status information on power, cooling, and thermal conditions is communicated to the controllers and is displayed in the management software interface.
RAID Guidelines

RAID stands for "redundant array of independent disks." In a RAID system, multiple disk drives are grouped into arrays. Each array is configured as system drives consisting of one or more disk drives. A small but important set of guidelines should be followed when connecting devices and configuring them to work with a controller. Follow these guidelines when configuring a RAID system:
• Distribute the disk drives equally among all the drive channels on the controller.
• There is a storage system fault. See "ESI/Ops Panel LEDs and Switches" on page 79.
• There are mixed single-port and dual-port modules within an enclosure. Only one type of module may be installed in an enclosure.

If the SGI server does not recognize the storage system, check the following:
• Ensure that the device driver for the host bus adapter board has been installed.
Using Storage System LEDs for Troubleshooting

This section summarizes LED functions and gives instructions for solving storage system problems in these subsections:
• "ESI/Ops Panel LEDs and Switches" on page 79
• "Power Supply/Cooling Module LEDs" on page 84
• "RAID LRC I/O Module LEDs" on page 85
• "RAID Loopback LRC I/O Module LEDs" on page 89
• "Drive Carrier Module LEDs" on page 90
ESI/Ops Panel LEDs and Switches

Figure 5-1 shows details of the ESI/ops panel.
Table 5-1 ESI/Ops Panel LEDs (continued)

System/ESI fault
  Description: This LED illuminates amber and the audible alarm sounds when the ESI processor detects an internal problem. This LED flashes when an over- or under-temperature condition exists.
  Corrective action: Contact your service provider.

PSU/cooling/temperature fault
  Description: This LED illuminates amber if an over- or under-temperature condition exists. This LED flashes if there is an ESI communications failure.
Table 5-2 (continued)

Switches 5 and 6: RAID host hub speed select
  Sw 5 Off, Sw 6 Off: Force 1 Gb/s
  Sw 5 On, Sw 6 Off: Force 2 Gb/s
  Sw 5 Off, Sw 6 On: Reserved
  Sw 5 On, Sw 6 On: Auto loop speed detect based on LRC port signals (Note: this feature is not supported)

Switches 7 and 8:
  Sw 7 Off, Sw 8 Off: Force 1 Gb/s
  Sw 7 On, Sw 8 Off: Force 2 Gb/s
  Sw 7 Off, Sw 8 On: Speed selected by EEPROM bit
  Sw 7 On, Sw 8 On: Auto loop speed detect based on LRC port signals (Note: this feature is not supported)
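Read as a truth table, the switch 5 and 6 settings decode as follows. The function below is only a sketch of the table above; switch polarity and the resulting modes are taken directly from Table 5-2.

    # Decode the RAID host hub speed select switches (5 and 6) per Table 5-2.
    def hub_speed(sw5_on, sw6_on):
        if not sw5_on and not sw6_on:
            return "force 1 Gb/s"
        if sw5_on and not sw6_on:
            return "force 2 Gb/s"
        if not sw5_on and sw6_on:
            return "reserved"
        return "auto loop speed detect (not supported)"

    print(hub_speed(False, False))  # both switches Off -> force 1 Gb/s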
Table 5-3 Ops Panel Configuration Switch Settings for RAID

Switch 1 (On): Loop select, single (1x16) or dual (2x8)
  Off: LRC operates as two loops of 8 drives
  On: LRC operates as 1 loop of 16 drives (1x16 loop mode)

Switch 2 (On): Loop terminate mode
  Off: If no signal is present on the external FC port, then the loop is closed internally
  On: If no signal is present on the external FC port, then the loop is left open

Switch 3 (Off): Hub mode select (RAID only)
Table 5-3 Ops Panel Configuration Switch Settings for RAID (continued)

Switches 9 and 10: Drive addressing mode
  Sw 9 On, Sw 10 On: Mode 0 - Single loop, base 16, offset of 4, 7 address ranges
  Sw 9 Off, Sw 10 On: Mode 1 - Single loop, base 20, 6 address ranges
  Sw 9 On, Sw 10 Off: Mode 2 - JBOD, dual loop, base 8, 15 address ranges
  Sw 9 Off, Sw 10 Off: Mode 3 (not used)

Switch 11 (On): Soft select
  Selects switch values stored in
Power Supply/Cooling Module LEDs

Figure 5-2 shows the meanings of the LEDs on the power supply/cooling module.

Figure 5-2 Power Supply/Cooling Module LEDs

If the green "PSU good" LED is not lit during operation, or if the power/cooling LED on the ESI/ops panel is amber and the alarm is sounding, contact your service provider.
RAID LRC I/O Module LEDs

Figure 5-3 shows the LEDs on the dual-port RAID LRC I/O module.
Table 5-4 explains what the LEDs in Figure 5-3 indicate.

Table 5-4 Dual-port RAID LRC I/O Module LEDs

ESI fault
  Description: This LED illuminates amber and the audible alarm sounds when the ESI processor detects an internal problem.
  Corrective action: Check for mixed single-port and dual-port modules within an enclosure. Also check the drive carrier modules and PSU/cooling modules for faults. If the problem persists, contact your service provider.
Figure 5-4 shows the LEDs on the single-port RAID LRC I/O module.

Figure 5-4 Single-port RAID LRC I/O Module LEDs

Table 5-5 explains what the LEDs in Figure 5-4 indicate.
Table 5-5 Single-port RAID LRC I/O Module LEDs (continued)

RAID activity
  Description: This LED flashes green when the RAID controller is active.
  Corrective action: N/A

Cache active
  Description: This LED flashes green when data is read into the cache.
  Corrective action: N/A

Host port signal good
  Description: This LED illuminates green when the port is connected to a host.
  Corrective action: Check both ends of the cable and ensure that they are properly seated. If the problem persists, contact your service provider.
RAID Loopback LRC I/O Module LEDs

The LEDs on the rear of the RAID loopback LRC I/O module function similarly to those on the RAID LRC I/O modules. See "RAID LRC I/O Module LEDs" on page 85 for more information.

JBOD LRC I/O Module LEDs

Figure 5-5 shows the JBOD LRC I/O module LEDs.
Table 5-6 explains what the LEDs in Figure 5-5 indicate.

Table 5-6 JBOD LRC I/O Module LEDs

ESI fault
  Description: This LED illuminates amber and the audible alarm sounds when the ESI processor detects an internal problem.
  Corrective action: Check the drive carrier modules and PSU/cooling modules. If the problem persists, contact your service provider.

FC-AL signal present
  Description: These LEDs illuminate green when the port is connected to an FC-AL.
  Corrective action: Check the cable connections.
Table 5-7 Disk Drive LED Function (continued)

Green LED blinking, amber LED off
  State: Disk drive is active. (LED might be off during power-on.)
  Remedy: N/A

Green LED flashing at 2-second intervals, amber LED on
  State: Disk drive fault (SES function).
  Remedy: Contact your service provider for a replacement drive and follow instructions in Chapter 6.

Amber LED flashing at half-second intervals
  State: Disk drive identify (SES function).
Solving Storage System Temperature Issues

This section explains storage system temperature conditions and problems in these subsections:
• "Thermal Control" on page 92
• "Thermal Alarm" on page 93

Thermal Control

The storage system uses extensive thermal monitoring and ensures that component temperatures are kept low and acoustic noise is minimized. Airflow is from front to rear of the storage system.
Thermal Alarm

The four types of thermal alarms and the associated corrective actions are described in Table 5-8.
Care and Cleaning of Optical Cables

Warning: Never look into the end of a fiber optic cable to confirm that light is being emitted (or for any other reason). Most fiber optic laser wavelengths (1300 nm and 1550 nm) are invisible to the eye and cause permanent eye damage. Shorter wavelength lasers (for example, 780 nm) are visible and can cause significant eye damage. Use only an optical power meter to verify light output.
Chapter 6. Installing and Replacing Drive Carrier Modules
Warning: The disk drive handle might have become unlatched in shipment and might spring open when you open the bag. As you open the bag, keep it a safe distance from your face.

3. Place the drive carrier module on an antistatic work surface and ensure that the anti-tamper lock is disengaged (unlocked). A disk drive module cannot be installed if its anti-tamper lock is activated outside the enclosure.
Figure 6-2 Opening the Module Handle

5. Orient the module so that the hinge of the handle is on the right. Then slide the disk carrier module into the chassis until it is stopped by the camming lever on the right of the module (see Figure 6-3).
6. Swing the drive handle shut and press it to seat the drive carrier module. The camming lever on the right of the module will engage with a slot in the chassis. Continue to push firmly until the handle fully engages with the module cap. You should hear a click as the latch engages and holds the handle closed.
7. Repeat steps 2 through 6 for all drive modules to be installed.
8.
Replacing a Drive Carrier Module

This section contains the following topics:
• "LUN Integrity and Drive Carrier Module Failure" on page 99
• "Replacing the Disk Drive Module" on page 100

LUN Integrity and Drive Carrier Module Failure

When a disk drive fails in a RAID 5, 3, 1, or 0+1 LUN, the amber LEDs on all disks in the LUN (except the failed one) alternate on and off every 1.2 seconds until the fault condition is cleared. The amber LED on the failed disk remains lit.
Replacing the Disk Drive Module

If an LED indicates that a disk drive is defective, follow these steps to remove the faulty drive:
1. Make sure enough disk drives and dummy drives are available to occupy all bays.
2. Ensure that users are logged off of the affected systems; back up data if necessary.

Note: Replace disk drive modules one at a time.

3.
Figure 6-6 Removing the Drive Carrier Module

6. Withdraw the module from the drive bay. Replace it immediately; follow instructions in "Adding a Drive Carrier Module" on page 95.
7. If you are replacing a module in a LUN that uses a hot spare, note the location of the replacement module; it is the new hot spare.
Appendix A. Technical Specifications
Table A-2 shows the weights of various component modules.

Table A-2 Weights

Enclosure, fully populated: Rackmount 32.3 kg (71 lb); Tower 42.3 kg (93.0 lb)
Enclosure, empty: Rackmount 17.9 kg (39.4 lb); Tower 12 kg (26.4 lb)
Power supply/cooling module: 3.6 kg (7.9 lb)
Disk carrier module with 36-GB drive: 0.88 kg (1.9 lb)
LRC I/O module: 1.2 kg (2.6 lb)
Environmental Requirements Environmental Requirements Table A-4 provides temperature and humidity requirements for both the rack and tower storage systems.
Power Requirements

Table A-6 provides minimum storage system power requirements.
LRC I/O Module Specifications LRC I/O Module Specifications Table A-8 provides specifications for the LRC I/O module.
Disk Drive Module Specifications

Consult your supplier for details of disk drives supported for use with the RAID storage system. Table A-9 provides specifications for a typical drive carrier module.

Table A-9 Drive Carrier Module Specifications (1.6-inch 36-GB Drive)

Dimensions: Height 2.91 cm (1.1 in.); Width 10.65 cm (4.2 in.); Depth 20.7 cm (8.1 in.)
Weight: 0.88 kg (1.9 lb)
Appendix B. Regulatory Information

The SGI 2 Gb Total Performance 9100 (2 Gb TP9100) conforms to Class A specifications.

Note: This equipment is for use with Information Technology Equipment only.

FCC Warning

This equipment has been tested and found compliant with the limits for a Class A digital device, pursuant to Part 15 of the FCC rules. These limits are designed to provide reasonable protection against harmful interference when the equipment is operated in a commercial environment.
TUV geprüfte Sicherheit and NRTL/C safety marks

International Special Committee on Radio Interference (CISPR)

This equipment has been tested to and is in compliance with the Class A limits per CISPR publication 22, Limits and Methods of Measurement of Radio Interference Characteristics of Information Technology Equipment; and Japan's VCCI Class 1 limits.
Class A Warning for Taiwan