HP StorageWorks Scalable File Share System User Guide Version 2.2 Product Version: HP StorageWorks Scalable File Share Version 2.2
© Copyright 2005, 2006 Hewlett-Packard Development Company, L.P. Lustre® is a registered trademark of Cluster File Systems, Inc. Linux is a U.S. registered trademark of Linus Torvalds. Quadrics® is a registered trademark of Quadrics, Ltd. Myrinet® and Myricom® are registered trademarks of Myricom, Inc. InfiniBand® is a registered trademark and service mark of the InfiniBand Trade Association. Microsoft and Windows are U.S. registered trademarks of Microsoft Corporation.
Contents
About this guide ............................................................ xi
Safety considerations ....................................................... xv
1 Overview
1.1 Product overview ........................................................ 1-2
1.1.1 Overview of the Lustre file system ....................................
4 Viewing system information
4.1 Viewing server information .............................................. 4-2
4.1.1 Server information after system parameter is changed ................. 4-6
4.2 Viewing file system information ......................................... 4-6
4.3 Viewing OST service information .........................................
5.7 Managing quotas ......................................................... 5-37
5.7.1 Understanding quota tuning ............................................ 5-37
5.7.2 Configuring quotas on a file system ................................... 5-38
5.7.2.1 Step 1: Verifying that the quota option is enabled on the HP SFS system ...
6.2.5 Modifying email alerts ................................................ 6-44
6.2.6 Creating email alerts ................................................. 6-44
6.2.7 Viewing email alerts .................................................. 6-45
6.2.8 Disabling and enabling email alerts ...................................
8.1.27 Replacing a Voltaire InfiniBand switch ............................... 8-15
8.1.28 Relocating an InfiniBand cable to a different port on the InfiniBand switch ... 8-15
8.1.29 Replacing a Power Distribution Unit (PDU) on a rack .................. 8-16
8.1.30 Replacing a Power Distribution Module (AC power strip) on a rack .....
8.1.31
9.25.1.1 Determining whether Voltaire InfiniBand interconnect is loaded ..... 9-16
9.25.1.2 Starting, stopping, and auto-starting the Voltaire InfiniBand interconnect ... 9-17
9.25.1.2.1 Using the ib-setup utility ....................................... 9-17
9.25.1.2.2 From the command line ............................................ 9-18
9.25.
9.43 Recovering after using the ifdown command .............................. 9-70
9.44 Password needed for ssh access ......................................... 9-70
9.45 The syscheck command reports that the Insight Manager Agents are not running ... 9-70
A CLI commands
A.1 General information .....................................................
A.20.9 show log ............................................................. A-27
A.20.10 show lun ............................................................ A-29
A.20.11 show network ........................................................ A-29
A.20.12 show ost ............................................................
About this guide This guide describes how to operate the HP SFS system and perform routine system administration tasks. This guide describes only the software and hardware components of the HP SFS product. It does not document standard Linux® administrative tasks or the functions provided by standard Linux tools and commands; it provides only administrative information and instructions for tasks specific to HP SFS. Audience This guide is intended for experienced Linux system administrators.
• Appendix A contains a description of the HP SFS CLI commands. • Appendix B contains details of expected performance figures, based on tests carried out by HP. • Appendix C provides examples of file system configurations. • Appendix D provides a guide to the estimated time that it takes to rebuild a LUN on an SFS20 array following a disk failure. • Appendix E provides information on HP SFS specifications.
mount(8)    A cross-reference to a manpage includes the appropriate section number in parentheses. For example, mount(8) indicates that you can find information on the mount command in Section 8 of the manpages. Using this example, the command to display the manpage is: # man 8 mount or # man mount

.
.
.           A vertical ellipsis indicates that a portion of an example is not shown.

[ ]         In syntax definitions, brackets indicate items that are optional.
Safety considerations To avoid bodily harm and damage to electronic components, read the following warning before performing any maintenance on a server in your HP StorageWorks Scalable File Share system. WARNING! For your safety, never perform any maintenance on a ProLiant DL server in the HP StorageWorks Scalable File Share system without first disconnecting the server’s power cord from the power outlet. See below for more information.
1 Overview This chapter provides an overview of the HP StorageWorks Scalable File Share product. The chapter is organized as follows: • Product overview (Section 1.1) • Supported operating system software (Section 1.2) • Supported hardware (Section 1.3)
1.1 Product overview HP StorageWorks Scalable File Share Version 2.2 (based on Lustre® technology) is a product from HP that uses the Lustre File System (from Cluster File Systems, Inc.). An HP StorageWorks Scalable File Share (HP SFS) system is a set of independent servers and storage subsystems combined through system software and networking technologies into a unified system that provides a storage system for standalone servers and/or compute clusters.
Figure 1-1 Logical overview of the Lustre file system (the figure shows a Configuration Management Server, which supplies configuration information, network connection details, and security management to the Lustre Client; the Lustre Client performs directory, meta-data, and concurrency operations against the MDS Server, and file I/O and file locking against the Object Storage Servers; recovery, file status, and file creation traffic passes between the MDS Server and the Object Storage Servers)

A typical Lustre file system consists of multiple Object Storage Servers that have storage attached to them.
All of the software that is needed to operate the HP SFS system is installed from the HP StorageWorks Scalable File Share System Software CD-ROM. You do not need to, nor should you, install any other software on an HP SFS system. 1.1.
• A server pair consisting of the administration server and the MDS server, and pairs of Object Storage Servers In the event of failure of the MDS server, the administration server is capable of taking over the function of serving file system meta-data (in addition to continuing to serve the administration functions). Similarly, in the event of failure of the administration server, the administration functions automatically fail over to the MDS server.
1.1.3.2.2 SFS20 storage arrays An SFS20 storage array is a RAID array that can be attached to two servers. A system configured with SFS20 arrays is configured to have high availability, and Object Storage Servers are organized in pairs. While each array is attached to two servers, an array is normally accessed by one of the servers only.
1.1.3.3 Network connections The servers in the HP SFS system are connected to networks inside and outside the system as follows: • The administration server and the MDS server each have a connection to the site network. The site network connection provides management access to the HP SFS system. • All servers within the HP SFS system are connected to the system interconnect (there can be a number of interconnects).
1.1.3.3.2 Management network

Figure 1-6 shows how the servers in the HP SFS system are connected to the management network.

Figure 1-6 HP SFS management network (the figure shows the administration server, the MDS server, and Object Storage Servers 3 through n, each connected through its iLO port to the private management network; the administration and MDS servers also connect to the site network. Legend: Admin: Administration Server; MDS: Meta-data Server; OSS: Object Storage Server; iLO: Integrated Lights Out)

The management network is a private network.
1.3 Supported hardware Table 1-1 lists the supported hardware devices for HP SFS systems.
2 The sfsmgr utility This chapter provides an overview of the sfsmgr(8) utility, and is organized as follows: • Overview (Section 2.1) • Starting the SFS CLI (Section 2.2) • Running sfsmgr commands (Section 2.3) • Troubleshooting the sfsmgr command (Section 2.4)
2.1 Overview The HP SFS system is not a general purpose system; instead, it is dedicated to running MDS and OST services. Unless instructed to do so by your HP Customer Support representative, do not install any software package on the system. Such action is not supported and may impact the correct operation of the system. The HP SFS system is managed and operated through the sfsmgr(8) utility. All changes to system configuration or server configuration are executed by this utility.
When you first enter the sfsmgr command when installing or upgrading a system, the output displayed is one of the following: • If the system software has never been installed, either at the factory or on site, the following menu is displayed. Enter 1 to select the option to install the system: # sfsmgr . . .
You must enter enough letters to make the abbreviated command unique; if you do not, the command will not work (the usage for the command will be displayed). For example, in the show commands, you cannot abbreviate the word show to sh, as this could be confused with the shutdown command; you can, however, abbreviate it to sho. Similarly, you can abbreviate the show lun and the show log commands to sho lu and sho lo. 2.3.
========================= H e a r t b e a t  S t a t u s ==================
Name                           Type       Status
------------------------------ ---------- -----------
south1 <--> south2             network    ONLINE

========================= S e r v i c e  S t a t u s ======================
                                        Last              Monitor  Restart
Service        Status   Owner          Transition        Interval Count
-------------- -------- -------------- ----------------- -------- -------
admin          started  south1         10:49:09 Aug 04   30       0

Under normal circumstances, the status of the administration service
3 Operating the system This chapter contains instructions for operating the HP SFS system. The chapter is organized as follows: • Booting the system (Section 3.1) • Shutting down the system (Section 3.2) • Booting a server (Section 3.3) • Booting multiple servers (Section 3.4) • Shutting down an Object Storage Server or the MDS server (Section 3.5) • Shutting down the administration server (Section 3.6) • Stopping a file system (Section 3.7) • Starting a file system (Section 3.8)
3.1 Booting the system Before booting the system, ensure that all of the system components other than the servers—that is, the storage arrays, management network, and so on—are turned on. To boot the system, perform the following steps: 1. Turn on the power to the administration server. The server boots automatically. 2. Log in to the administration server as root user. 3. Start the SFS CLI by entering the following command: # sfsmgr . . .
. . . Command has finished: south[3,5,7] -- *** Server States *** Success: south[3,5,7] 6. Boot the second server in each server pair by entering the command shown in the following example: sfs> boot server south[4,6,8] Command id 136 16:29:59 south4 -- . . . 16:30:02 . . . south4 -- 16:30:03 16:30:14 . . . south4 -- Checking server status south4 -- Powering on server 16:30:02 . . . south6 -- 16:30:03 16:30:55 . . .
3.2 Shutting down the system HP recommends that you stop all file systems (using the stop filesystem command) before shutting down the system. This is so that when you next reboot the system, the MDS and OST services that comprise the file system do not start until all servers are fully booted and the start filesystem command is used. The stop filesystem command is described in Section 3.7. The shutdown server command is used to shut down and turn off the power to the servers in the system.
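For example, on a system named south with a single file system named data, the shutdown sequence might look like the following (an illustrative sketch; the file system and server names are examples, and the administration server itself is shut down separately, as described in Section 3.6):

sfs> stop filesystem data
sfs> shutdown server south[2-8]
Are you sure you wish to shutdown the server(s) "south[2-8]"? [no]: yes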
. . . Command has finished: south3 -- *** Server States *** Success: south3 3.4 Booting multiple servers To boot more than one server at a time, perform the following steps: 1. Log in to the administration server as root user. 2. Start the SFS CLI by entering the following command: # sfsmgr . . . sfs> 3.
To shut down an Object Storage Server or the MDS server, enter the command shown in the following example, where server south3 is shut down. You can specify one or more servers: sfs> shutdown server south3 Are you sure you wish to shutdown the server(s) "south3"? [no]: yes You are prompted to confirm that you want to shut down the server; enter yes to confirm.
3.7 Stopping a file system When you create a file system using the create filesystem command, the file system is started and ready for use. You can stop a file system by entering the stop filesystem filesystem_name command. The stop filesystem filesystem_name command stops the file system and at the same time preserves user connections. When you stop a file system, any active I/O operations are suspended.
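For example, to stop a file system named data and then confirm that its services have stopped (the file system name is an example):

sfs> stop filesystem data
sfs> show filesystem data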
If a service shows the unload-failed state, reboot the server to force the file system to unload. (For information on file system service states, see Table 4-3 on page 4-7.) If a server is booted while the file system is in the stopped or not-stopped state, the MDS and OST services that normally run on that server will not start.

3.8 Starting a file system
You can expect a starting service to go to the recovering or running state after about 1 minute. The recovering state indicates that client nodes were connected to the service before it was stopped and those client nodes have not yet reconnected.
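For example, to start the data file system again and watch its services move from starting through recovering to running (the file system name is an example):

sfs> start filesystem data
sfs> show filesystem data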
3.9 Unconfiguring storage arrays This section describes how to unconfigure storage arrays, and is organized as follows: • Unconfiguring EVA4000 arrays attached to the administration and MDS servers (Section 3.9.1) • Unconfiguring EVA4000 arrays attached to Object Storage Servers (Section 3.9.2) • Unconfiguring SFS20 arrays attached to the administration and MDS servers (Section 3.9.3) • Unconfiguring SFS20 arrays attached to Object Storage Servers (Section 3.9.4)

3.9.1 Unconfiguring EVA4000 arrays attached to the administration and MDS servers
b. In the Navigation pane, select the array you want to unconfigure, then click the Uninitialize button on the Initialized Storage System Properties page to unconfigure the array. 7. If you intend to reconfigure the storage on the EVA4000 arrays attached to the Object Storage Servers, unconfigure those arrays as described in Step 6. 8. You must now reinstall the system, as described in Chapter 6 of the HP StorageWorks Scalable File Share System Installation and Upgrade Guide.
5. Boot the servers, as shown in the following example: sfs> boot server south[3-4] 6. When the servers have booted, you will need to reconfigure the servers, as described in Chapter 6 of the HP StorageWorks Scalable File Share System Installation and Upgrade Guide.

3.9.3 Unconfiguring SFS20 arrays attached to the administration and MDS servers
7. Log in to the system as follows: login: root password: secret 8. Unconfigure the arrays attached to the administration and MDS servers, as shown in the following example, where there are two SFS20 arrays attached to the administration and MDS servers. Note that you do not use the unconfigure array command to unconfigure these arrays; instead, you use the hpacucli utility: a. Start the hpacucli utility on the administration server by entering the following command: # hpacucli . . . => b.
To unconfigure an SFS20 array, perform the following steps: 1. Determine the name of the array you want to unconfigure, as shown in the following example: sfs> show array Array ----1 2 3 4 Type ------MSA20 MSA20 MSA20 MSA20 Array (WWID) ----------------P6C8CX7MQPS7X6 P6C8CX7MQPS7TU P6C8CX7MQPSU67 P6C8CX7MQPS7XG Disks ----12 12 12 12 Free ----0 0 0 0 Server --------------south[1-2] south[1-2] south[3-4] south[3-4] 2.
6. When you have finished unconfiguring (or reconfiguring) the array, shut down the servers, enable them, and then boot them again, by entering the commands shown in the following example: sfs> shutdown server south[3-4] sfs> enable server south[3-4] sfs> boot server south[3-4] Because the servers are enabled, the services will run on the servers when the servers boot. 3.
SFSMDSCAP (MDS license)
Allows you to start a file system that is based on capacity class (SFS20 enclosure) storage.

SFSOSTENT (OST license)
Allows you to create a file system that is based on enterprise class (EVA4000) storage.

SFSOSTCAP (OST license)
Allows you to create a file system that is based on capacity class (SFS20 enclosure) storage.
INCREMENT SFSMDSCAP HPQ 1.0 permanent 1 HOSTID="000bcd505cbb \ 000bcd827644" NOTICE="License Number 7YCYHDHTEYAH" \ SIGN=F3D03152BE24 INCREMENT SFSOSTCAP HPQ 1.0 permanent 100 HOSTID="000bcd505cbb \ 000bcd827644" NOTICE="License Number 7YCYHDHTEYDE" \ SIGN=80B1511C256A INCREMENT SFSOSTCAP HPQ 1.
3.11.2 Viewing license information To view the license information on the HP SFS system, enter the show license command as shown in the following example. It takes a few seconds for the license status to be read (during which time the Reading license status message is displayed); please wait for the command to complete: sfs> show license Reading license status -- please wait... lmstat - Copyright (c) 1989-2003 by Macrovision Corporation. All rights reserved.
When you edit the /var/flexlm/license.master file, note the following rules: • Do not remove or modify the lines that start with SERVER, USE_SERVER and VENDOR. • To update a line starting with INCREMENT, insert or remove the line in its entirety. You must not change any text on these lines. In particular, you must not change the number of licensed units, specified in terabytes (TB). Doing so invalidates the license. • An INCREMENT may be split across several lines.
3. Save the license file you receive from HP in a convenient directory on a host that can be accessed by the HP SFS system. 4. At this point, it is convenient to use two windows; one to log in to the HP SFS system and the other to view the license file (in, for example, your email system). 5. On one window, log in to the administration server and use the vi(1) editor to edit the /var/flexlm/license.master file. CAUTION: Do not simply replace the existing /var/flexlm/license.
installation, make sure these files are the same on both the Administration and MDS servers: /var/flexlm/license.master /var/flexlm/license.lic 7. Copy the license files to the MDS server, as shown in the following example, where south2 is the MDS server. # scp /var/flexlm/* south2:/var/flexlm 8. On the administration server, enter the show license command and examine the output to ensure that the licenses are as expected (that is, the license information has been updated). 9.
3.12.1 Creating authorizations Authorizations allow users to access the HP SFS system remotely without a password. To create an authorization in the system database, you must have (or have access to) the public key file (id_rsa.pub or id_dsa.pub) for the user on the remote system. You can create an authorization in one of the following ways: • Copy over the public key file for the user on the remote system to the HP SFS system, as shown in the following example: [root@south1 lscli]# scp fred@16.123.123.
3.12.3 Viewing authorizations To view a list of all authorizations in the HP SFS system, enter the following command: sfs> show authorization Name ---------------------root_10@ms fred@ms sfs> Id ------------------------------------------------------ssh-rsa AA...ijoFIU1rf7E= root_10@system1.my.domain.com ssh-rsa AA...OIU9mjib0hMqr0= fred@system1.my.domain.com A short version of the public key (ID) is shown, which includes the start and the end of the ID.
3.13 Locating servers and SFS20 arrays To help you to physically locate a server or an SFS20 array, the sfsmgr utility provides two commands that turn on (or off) the blue unit identification (UID) LED on a server and the blue LEDs on an array, as follows: • To turn on the UID LED (which is visible at the front and back of the server) on a specific server, enter the set server command as shown in the following example: sfs> set server south3 locator=on Use the locator=off option to turn off the UID LED.
4 Viewing system information There are a number of commands that you can use to view information about the status and configuration of components in your HP SFS system. These commands are described in this chapter, which is organized as follows: • Viewing server information (Section 4.1) • Viewing file system information (Section 4.2) • Viewing OST service information (Section 4.3) • Viewing LUN information (Section 4.4) • Viewing array information (Section 4.5) • Viewing event logs (Section 4.6)
4.1 Viewing server information

The show server command provides information on the state of servers. If you suspect that one or more of the servers is not functioning correctly, you can use this command to determine which servers are running, and what services they are providing. The server states are described in Table 4-1.

Table 4-1 Server states

State     Description
booting   The power to the server has just been turned on, and the server is booting. This is a temporary state.
To see more information on a server, enter the show server command for that server, as shown in the following example. This example shows a server in a system where EVA4000 storage is used:

sfs> show server south1
Name:                   south1
Primary Role:           adm
Backup Server:          south2
Server Firmware Model:  29
Server Firmware Date:   05/01/2004
iLO Firmware:           1.82
Current Config. State:  Configured
Desired Config.
DIMM    Status
----    ------
01      Ok
02      Ok
03      Ok
04      Ok
05      Absent
06      Absent

Integrated Management Log (IML) events
Critical  Caution
--------  -------
0         0

sfs>
Server Firmware Date: 05/01/2004 iLO Firmware: 1.82 Current Config. State: Configured Desired Config.
Integrated Management Log (IML) events Critical Caution -------- ------0 0 4.1.1 Server information after system parameter is changed If certain system parameters (such as the Start IP address of an interconnect) are changed (using the configure server command, as described in Section 7.1), the servers are using an actual configuration that differs from the parameters in the database.
Table 4-2 File system states

State        Description
started      All file system services are running. Client nodes will be able to mount the file system and perform I/O operations.
recovering   All of the services in the file system have been started. However, one or more of the services are in the recovering state—that is, waiting for all previously connected client nodes to reconnect. When the file system is in this state, all I/O operations in progress on client nodes will stall.
Table 4-3 File system service states

State      Description
running    The service is running. The reconnection process has finished and all requests are being handled normally.
shutdown   The file system service has just shut down but has not yet completely unloaded. The state will change to stopped within a few seconds.
starting   The service has just been started.
stopped    The service is stopped. The associated file system software is unloaded.
To see more information on a file system, enter the show filesystem command for the file system, as shown in the following example:

sfs> show filesystem data
Name:               data
OSTs:               ost[10-15]
State:              started
Mountpoint:         /usr/data/
Stripe Size:        4194304
Stripe Count:       6
Interconnect:       tcp elan
MDS mount options:  acl,user_xattr
OST mount options:  extents
Lustre timeout:     200
Quota options:      quotaon=ug

MDS Information:
Name   LUN   Array   Controller   Files
----   ---   -----   ----------   -----
mds3   15            AB           2.
Size The size of the LUN, in gigabytes. Used For the MDS service: the percentage of the total number of files (shown in the Files field) used by existing files (for example, a value of 30% indicates that another 60% of the total number of files can be created). For OST services: the percentage of the total space used by files stored in the OST service (for example, a value of 80% indicates that the OST LUN is quite full). Service State The state of the MDS or OST service.
To see more information on an OST service, enter the show ost command for the OST service, as shown in the following example: sfs> show ost ost14 Name: ost14 Filesystem: data Primary Server: south5 Running on: south6 LUN (WWID): 600508b4-0000c11b-00022000-006c0000 LUN (Number): 14 Partition: 1 The output shows the server that the service normally runs on (the Primary Server field), and the server that the service is currently running on (the Running on field).
Type: Role: User: Size: Preferred Server: Path from Server ---------------south1 south2 array service south1 1024 MB Preferred --------none none Current --------fc-0/A2 fc-0/A2 Available -------------------fc-0/B2 fc-0/A2 fc-0/B2 fc-0/A2 Note the following points: • The LUN numbers are assigned as the LUNs are discovered by the system. These LUN numbers are unique across the HP SFS system. • The Preferred Controller field can contain a dash (–), A, or B.
Mirrored LUNs If the LUN being displayed is a mirrored LUN (that is, mirrored across two component LUNs), the show lun lun_number command shows details of the component LUNs, as shown in the following example: sfs> show lun 17 LUN (WWID): LUN (Number): Device: Array (WWID): Array (Number): Current Controller: Preferred Controller: Type: Role: User: Size: Preferred Server: Component LUN ------------2 3 4.
LUN  (WWID)                               Controller  Number of Paths
---  -----------------------------------  ----------  ---------------
7    600508b4-000139ed-00007000-01050000  A           2
8    600508b4-000139ed-00007000-010b0000  B           2
9    600508b4-000139ed-00007000-01110000  A           2

The following example shows the output from the show array array_number command in a system where SFS20 storage is used:

sfs> show array 2
Array:
Name:
Array Number:
Type:
Preferred Server:
Firmware:
Total Disks:
Free Disks:
Disks Per Group:
Raid Type:
Spare Disk Count:
Cache Status:
Table 4-5 Connection status values

State      Description
replaced   This status indicates that an array that was previously connected to this port on the server has been removed and a new array has been attached in its place. The information about the detached array remains in the database; you can delete the array information from the database using the delete array array_number command (see Section 3.10 for more information).

The status of each disk drive in disk bays 1 to 12 is shown for SFS20 arrays.
Table 4-7 Logical drive status values

State                          Description
parity initialization failed   An SFS20 array performs parity verification (a surface scan) on the logical drive when the system is idle. If the surface scan finds one or more inconsistent blocks during its most recent traversal of the logical drive, the drive will be given the parity initialization failed status. Usually this status is automatically cleared when a subsequent surface scan finds no inconsistencies.
You can restrict the show log command to showing a small number of events by using the recent argument, as follows: sfs> show log recent The show log command is usually used to display events that have occurred in the past. However, you can also use the command to display events as they occur, by entering the following command: sfs> show log now No output is displayed until an event occurs. To interrupt the command, press Ctrl/c. You can use filters with the show log command to limit the events shown.
4.6.1.
4.7 Viewing performance statistics Performance statistics are automatically gathered from each server in the HP SFS system. This section describes how to view those performance statistics. The section is organized as follows: • Overview of the collectl utility (Section 4.7.1) • Overview of the information gathered by the collectl utility (Section 4.7.2) • Using the Web server to view information gathered by the collectl utility (Section 4.7.
4.7.2 Overview of the information gathered by the collectl utility The collectl utility gathers the following information for each individual server: • • Memory usage • Total slab memory • Total mapped memory • Total buffer memory • Total cache memory Total or Individual CPU utilization CPU usage graphs are stacked; that is, each successive item is plotted with the value of the previous item as its y-axis 0 co-ordinate.
• • Swap space • Megabytes in use • Megabytes free Disk I/O sizes • • Per OST device or cumulative RPC sizes • Per OST device or cumulative 4.7.3 Using the Web server to view information gathered by the collectl utility CAUTION: The Web server is also used by client nodes when mounting file systems using the HTTP protocol. If you disable the Web server, client nodes will not be able to use the HTTP protocol to mount file systems.
Figure 4-1 The ColPlot Web page

The highlighted areas of the Web page shown in Figure 4-1 are as follows:
1. Dates for which graph is to be plotted (start date and end date). Enter the date in the format YYYYMMDD.
2. Time period for which graph is to be plotted (start time and end time). Enter the time in the format HH:MM.
3. Text field that allows you to type in the name of the server for which the graph is to be plotted.
4.
4.7.3.1 Viewing overall throughput to OST devices from a server To view the overall throughput to OST devices from a server, perform the following steps on the ColPlot page: 1. Enter the dates and times for which the graph is to be plotted (areas 1 and 2 in Figure 4-1). 2. In the text field that allows you to specify the server name (area 3 in Figure 4-1), enter the name of the server; for example, south3. 3.
Figure 4-3 A detailed graph showing throughput to each OST device from a server 4.7.3.3 Viewing throughput to OST devices from network devices on a server To view the throughput to each OST device from specified network devices on a server, perform the following steps on the ColPlot page: 1. Enter the dates and times for which the graph is to be plotted (areas 1 and 2 in Figure 4-1). 2.
Figure 4-4 A detailed graph showing throughput to OST devices from specified network devices 4.7.3.4 Viewing RPC transfer sizes or disk I/O transaction sizes There are four configuration files that provide information about remote procedure call (RPC) transfer sizes and disk I/O transaction sizes. The files are: • ost-brw.cfg—Provides cumulative RPC statistics, including the number of pages of data in each RPC sent or received on the server, for all OST devices served by the server. • ost-brwD.
Figure 4-5 A graph showing the information in the ost-blkD.cfg file 4.7.
You can view the disk statistics stored in the /proc/scsi/sd_iostats and /proc/driver/cciss/cciss_iostats/ directories, as shown in the following examples.
read discont pages 0: write rpcs % cum % | 0 0 0 | rpcs % cum % 39624 100 100 discont blocks 0: 1: 2: 3: read rpcs 0 0 0 0 write rpcs % cum % 34572 87 87 4941 12 99 106 0 99 5 0 100 % cum % | 0 0 | 0 0 | 0 0 | 0 0 | # cat /proc/fs/lustre/obdfilter/ost10/brw_stats snapshot_time: 1096994356:601614 (secs:usecs) pages per brw 1: 2: 4: 8: 16: 32: 64: 128: read brws 0 0 0 0 0 0 0 0 % cum % | 0 0 | 0 0 | 0 0 | 0 0 | 0 0 | 0 0 | 0 0 | 0 0 | write rpcs % cum % 12 0 0 0 0 0 25 0 0 2 0 0 222 0 0 226 0 0 429
5 Creating and modifying file systems This chapter describes how to create, modify, operate, and delete file systems, and is organized as follows: • Creating a file system — EVA4000 storage (Section 5.1) • Creating a file system — SFS20 storage (Section 5.2) • LUN Roles, used by, and preferred server changes (Section 5.3) • Operating a file system (Section 5.4) • Recovering from failure of the create filesystem command (Section 5.5) • Modifying file system attributes (Section 5.6)
5.1 Creating a file system — EVA4000 storage This section describes how to create a file system on an HP SFS system that uses EVA4000 storage. (For information on creating a file system on an HP SFS system that uses SFS20 storage, see Section 5.2.) When you configured the EVA4000 storage as described in Chapter 5 of the HP StorageWorks Scalable File Share System Installation and Upgrade Guide, you created virtual disks for use as administration, service, MDS, and OST LUNs.
5.1.2 Step 2: Matching array numbers to physical arrays — EVA4000 storage To determine what role you will assign to each LUN, you must first determine which physical storage array the LUN is on. When you installed the system, you identified the WWIDs of each EVA4000 storage array. Each controller on the arrays also has a unique serial number. You will now use this information to match the WWIDs of the arrays to the array numbers shown in the output from the show lun command.
5.1.3 Step 3: Setting roles, preferred controllers, and disk group information for LUNs — EVA4000 storage NOTE: In the example output shown in this section, the MDS role is assigned to four LUNs. This configuration allows for two file systems, with two (mirrored) MDS LUNs for each file system. NOTE: Setting the preferred controller for a LUN allows the HP SFS software to determine the appropriate HBA and path to the LUN, and this improves performance.
6 7 8 9 10 11 12 13 14 15 16 sfs> 1 2 2 2 3 3 4 4 4 5 5 mds service ost ost ost ost service ost ost ost ost south3 south5 - 290 1 290 290 290 290 1 290 290 290 290 - south[1-2] south[3-4] south[3-4] south[3-4] south[3-4] south[3-4] south[5-6] south[5-6] south[5-6] south[5-6] south[5-6] If necessary, enter the set lun command again to correct any incorrect role assignments.
Stripe size The file system stripe size is the default size for files created in the file system. However, this can be changed on a file-by-file basis. You can specify the stripe size in KB (kilobytes) or MB (megabytes). The following rules apply to the file system stripe size: • The lower limit of the default stripe size for a file system is 4MB; if you set a file system stripe size of less than this, the stripe size will default to 4MB.
14 15 16 . . . 4 5 5 B A B ost ost ost 290 290 290 south[5-6] south[5-6] south[5-6] - You must select preferred servers so that OST serving is balanced optimally. Both of the servers in a server pair are capable of serving (that is, being the preferred server for) any of the LUNs on the arrays attached to the server. In the example above, LUNs 8 through 11 are visible to (that is, presented to) the south3 and south4 servers.
HP recommends that you enable the ACL functionality on your HP SFS file systems, unless your system is running in Portals compatibility mode. CAUTION: If your system is running in Portals compatibility mode, do not enable the acl option or the user_xattrs option on a file system. Client nodes will not be able to successfully mount a file system that has these options enabled while the system is running in Portals compatibility mode.
When quotas are enabled on a file system, the itune and btune settings (specified as a percentage) are used in conjunction with the iunit and bunit settings to calculate the quantities of inodes and blocks that will be reserved and released for a user or group. If you decide to enable quotas on the file system, HP strongly recommends that you accept the default values for the quota options. The default values are as follows: • iunit: 5000 • bunit: 100 • itune: 50 • btune: 50 See Section 5.
5.1.5 Step 5: Using the create filesystem command — EVA4000 storage CAUTION: HP recommends that you do not create file systems while the HP SFS system is under load from client nodes. Before you enter the create filesystem command, ensure that all servers are running. In addition, verify that all LUNs are operating normally, by performing the tests described in Section 6.1.3 and Section 6.1.5.
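For example, you might confirm the state of the servers and LUNs from the SFS CLI before proceeding (an illustrative check; see Chapter 4 for a description of the output fields):

sfs> show server
sfs> show lun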
Depending on the number of LUNs that have an MDS role, you may now be prompted to choose MDS LUNs, as follows: • If there are no LUNs with an MDS role, the create filesystem command terminates. In this case, you must assign the MDS role to LUNs (see Section 5.1.3) and enter the create filesystem command again.
14 15 16 4 5 5 B A B ost ost ost 290 290 290 south[5-6] south[5-6] south[5-6] - Please enter the LUN you wish to use [8]: 8 Please enter the preferred server for this OST LUN [south3]: south3 You are now asked to specify the primary usage of the file system, so that the system software can optimize file system performance.
MDS LUN(s):
LUN  Array  Controller  Role  Size(GB)  Preferred Server  Backup Server
3    1      A           mds   290       south2            south1
4    1      B           mds   290       south2            south1

OST LUN(s):
LUN  Array  Controller  Role  Size (GB)  Preferred Server  Backup Server
8    2      A           ost   290        south3            south4
9    2      B           ost   290        south3            south4
10   3      A           ost   290        south4            south3
11   3      B           ost   290        south4            south3
13   4      A           ost   290        south5            south6
14   4      B           ost   290        south5            south6

Is t
5.1.7 Step 7: Backing up the system database Back up the system database, as follows: 1. Back up the database, as follows: # sfsmgr . . . sfs> create database_backup 2. Save the backup file to an external system, as shown in the following example, where the full name of the backup file is /var/hpls.local/hplsdb_20040311-1041.tar.gz, the external system host address is 16.123.123.1, and the account on the host is named fred: # scp /var/hpls.local/hplsdb_200403011-1041.tar.gz \ fred@16.123.123.
The configuration shown in this example would allow you to create an MDS service that uses two LUNs, and four OST services that use one LUN each on the arrays connected to the south3, south4, south5, and south6 servers. In this situation, you probably do not need to change the role of any of the existing LUNs and you can proceed to Section 5.2.2 to identify the file system information. However, in very small HP SFS systems, the default LUN roles that are set by the configure array command may be unsuitable.
File system name The file system name must be a maximum length of 32 characters and must not contain spaces. HP recommends that you use a shorter name (not more than 10 characters). This will simplify administration and make the displays easier to view. Mount point The mount point defined for the file system is the mount point that clients will use when mounting the file system. Each client node that mounts the file system must have a directory with the same name as the file system mount point.
Number of OST services You will be asked to specify how many OST services the file system is to use. A list of all available OST LUNs (that is, OST LUNs that are not already part of a file system) is displayed. The maximum number of OST services that the file system can have is determined by the number of available OST LUNs. For example, if the system has eight OST LUNs and you decide to mirror each OST service across two LUNs, you can create four OST services.
Mount options Underlying the OST and MDS services on the Lustre file systems, there are ldiskfs file systems. When an OST or MDS service starts, it mounts the underlying ldiskfs file system so that it can access the data. You can specify mount options on the file system services to determine what happens when the file systems are mounted. • When creating a file system, accept the default options provided by the create filesystem command for OST services.
The system-wide quota attribute must be enabled in order for any file system to use quotas; this attribute is enabled by default. On a file system, you can enable quotas for users, groups, or both, by setting the quotaon option. In HP SFS Version 2.2, the quotaon option on a file system is set to ug (that is, quotas are enabled for both users and groups) by default when the file system is created, unless the system is running in Portals compatibility mode.
Here are the available LUNs which you can assign to OSTs:

LUN  Array  Controller  Role  Size (GB)  Visible to  Preferred
---  -----  ----------  ----  ---------  ----------  ---------
8    3      scsi-1/1    ost   1050       south[3-4]  south3
13   5      scsi-2/1    ost   1050       south[3-4]  south3
11   4      scsi-1/2    ost   1050       south[3-4]  south4
15   6      scsi-2/2    ost   1050       south[3-4]  south4
17   7      scsi-1/1    ost   1050       south[5-6]  south5
21   9      scsi-2/1    ost   1050       south[5-6]  south5
19   8      scsi-1/2    ost   1050       south[5-6]  south6
23   10     scsi-2/2    ost   1050       south[5-6]  south6

Please enter the LU
In the following example, array1 and array 2 are connected to different Smart Array 6404 adapters on the servers. In this example, when a mirrored OST LUN is being created, the LUN connected to the scsi-1/1 port should be paired with the LUN connected to the scsi-3/1 port.
LUN  (WWID)             Logical Drive
---  -----------------  -------------
7    a44a62b2-1727001e  ok
8    a44a62bb-e9ee001f  ok
9    a44a62c0-05140020  ok
10   a44a62a4-5c200021  ok
11   a44a62a9-c6610022  ok
12   a44a62ad-a2730023  ok

As a general rule, where there are two Smart Array 6404 adapters on each server, the LUNs on the following ports should be paired:
• On the first server:
  • scsi-1/1 with scsi-3/1
  • scsi-2/1 with scsi-4/1
• On the second server:
  • scsi-1/2 with scsi-3/2
  • scsi-2/2 with scsi-4/2

5.2.
5.2.3.1 Using the create filesystem command in interactive mode — SFS20 storage When you enter the create filesystem command in interactive mode, a series of questions prompts you to enter the required information (the information you identified in Section 5.2.2). To accept a suggested value, press the Return key. The following is an example of creating a file system in an HP SFS system where SFS20 storage is used.
You are now asked to specify how many OST services the file system is to use. In this example, the OST services are not mirrored. If you are creating mirrored OST LUNs, see Section 5.2.2.2 for more information. You need to specify the OSTs to use in the filesystem.
You are now asked to enter the stripe count for the file system. The stripe count of a filesystem refers to the number of OST services a newly created file will be striped across by default. (1 to 6; 0=1) Please enter the stripe count of the filesystem [1]: You are now asked to specify the interconnects that are to be used for file system I/O operations. Lustre can use various interconnects for file systems I/O.
5.2.3.2 Using the create filesystem command in scripted mode — SFS20 storage You can create a file system in scripted mode by entering the create filesystem command with all of the parameters for the file system.
The following example shows that LUNs 3 and 5 are components of a mirrored MDS service (LUN 30) for the data file system, and LUN 8 is an OST LUN used by the ost1 service: sfs> show lun 5.4 LUN --. . . Array ----- Role ----- Used by ---------- Size(GB) -------- Preferred Server ---------------- Visible to ---------- 3 5 . . . 1 2 mirror mirror 30 30 1050 1050 south2 south2 south[1-2] south[1-2] 8 . . .
If the create filesystem command fails, you need to determine whether the file system was partially created or not. Enter the show filesystem command; if the file system is not shown at all, this indicates that no component of the file system has been created (that is, the create filesystem command failed entirely); if this happens, you must correct the original problem that caused the command to fail, and then use the create filesystem command again.
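For example, you might first list the file systems known to the system (an illustrative check; the output will vary):

sfs> show filesystem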
5.6 Modifying file system attributes You can use the modify filesystem command to modify the following attributes of a file system: • The number of OST services used for the file system. Adding OST services to the file system increases the size of the file system. The file system must be unmounted on all clients before OST services are added. There are constraints on the amount of space that can be used on the new OST services. For information on how to add OST services to file systems, see Section 5.6.1.
for the quotas options and that you do not change the values unless you are requested to do so by your HP Customer Support representative. You can also use the modify filesystem command to rewrite the LDAP configuration data for a file system. The following sections provide instructions for modifying various file system attributes: • Section 5.6.1 describes how to add OST services to a file system. • Section 5.6.2 describes how to change the preferred server for an OST or MDS service. • Section 5.6.
4. Select the Add OSTs option, as follows: Enter your choice [c]: 1 You are then prompted to select and configure the new OST services for the file system; the process is similar to the process for selecting OST services when you create a file system (see Section 5.2.3). 5. When you have finished adding OST services to the file system, restart the file system, as shown in the following example: sfs> start filesystem data 6. 5.6.1.
5. Select the service and then select the preferred server for the service. Repeat this step for each service where you want to change the preferred server. When you have made your changes, press the Return key without selecting a service. 6. Restart the file system, as shown in the following example: sfs> start filesystem data 7. When the file system restarts, the client nodes can remount the file system. If a client node fails to remount the file system, reboot the client node. 5.6.
3. When the file system has stopped, enter the modify filesystem filesystem_name command, as shown in the following example: sfs> modify filesystem data Select an option: 1) 2) 3) 4) 5) 6) 7) 8) 9) w) c) Add OSTs Change stripe size Change stripe count Change mount point Change preferred servers Change LUN mount options Change interconnect Change Lustre timeout Change quota options Rewrite LDAP configuration data Cancel Enter your choice [c]: 4. Select the LUN Mount Options menu item, as follows.
To add or remove an interconnect for a file system, perform the following steps: 1. Unmount the file system on all client nodes. 2. Stop the file system as shown in the following example, where the data file is system is stopped: sfs> stop filesystem data Do not proceed to the next step (running the modify filesystem command) until the file system is stopped; the file system is stopped when all of the file system services are in the stopped or down state. See Section 3.
5.6.5 Changing other file system attributes NOTE: Systems running in Portals compatibility mode cannot use the quota functionality. You will not be able to change the quota options on a file system while the HP SFS system is running in Portals compatibility mode. To change file system attributes, perform the following steps: 1. Unmount the file system on all client nodes. 2.
5.6.6 Rewriting LDAP configuration data To rewrite the LDAP configuration data, perform the following series of steps. In the example shown here, the name of the file system to be reconfigured is data: 1. Unmount the file system on all client nodes. 2. Stop the file system, as shown in the following example, where the data file system is stopped: sfs> stop filesystem data 3. Enter the show filesystem command and verify that all MDS and OST services used by the file system are stopped. 4.
5.7 Managing quotas Quotas allow you to control how many blocks or inodes a user or group can use in a file system. In this release of the HP SFS software, only hard limits are supported. Quotas are managed on both the HP SFS system and on a client system. You can manage the client portion of the quotas functionality on any HP SFS client system that has mounted the file system that is being configured to use quotas. Section 5.7.1 describes how quota tuning works on Lustre file systems. Section 5.7.
As the user continues to access the OST device, the amount of unused quota reservation continues to drop. When the amount of unused quota reservation drops below 50MB (that is, 50% of the block reservation), the quota management system reserves a further 100MB of the user’s quota for use by the OST device. Note that when a user deletes files from the file system, blocks and inodes are released. A user may be denied permission to write data before the real block limit is reached.
5.7.2.2 Step 2: Enabling quota functionality on a file system NOTE: Unless you have enabled quotas on the file system (and the system-wide quota option is also enabled), quotas will not work even if you attempt to enable quotas when mounting a file system on a client node. Quotas are enabled or disabled on a per-filesystem basis—that is, it is possible to enable quota functionality on one file system and disable quota functionality on another file system.
• If the file system was created on a system running HP SFS Version 2.2, proceed to Section 5.7.2.5 to activate quota functionality on the file system.

5.7.2.4 Step 4: Configuring user and group information on a file system

This step applies only to file systems that were originally created when the system was running a version of the HP SFS software earlier than Version 2.2; if your file system was created on a system running HP SFS Version 2.2, skip this step.
You must run the command on a client node that has been configured as described in Section 5.7.2.3, and where the file system is mounted. Note also that the lfs setquota command will not succeed if the lfs quotacheck command has not been run first to activate quota functionality on the file system, as described in Section 5.7.2.5.
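The following client-side sketch shows the order of operations described above for a file system mounted at /usr/data; the user name and limits are examples only, and because only hard limits are supported, the soft limits are set to 0. The exact lfs argument order can vary between Lustre releases, so check the lfs manpage on your client before using these commands:

# lfs quotacheck -ug /usr/data
# lfs setquota -u fred 0 10000000 0 50000 /usr/data

In this sketch, quotacheck activates quota accounting for users and groups on the mounted file system, and setquota then sets a block hard limit (in KB) and an inode hard limit for the user fred.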
5.7.4 Disabling quota functionality on one or all file systems You can disable quota functionality on a specific file system by entering the modify filesystem command, selecting the Change quota options menu option, and then disabling quotas on the file system (see Section 5.6.5 for more information on changing file system attributes). To disable quota functionality for all file systems on a client node, remove the lquota option from the /etc/modprobe.conf.lustre file or the /etc/modules.conf.
2. Stop all file systems, by entering the stop filesystem filesystem_name command for each file system. Do not proceed to the next step (defining the system nickname) until all file systems are stopped; a file system is stopped when all of the file system services are in the stopped or down state. See Section 3.7 for more information on the stop filesystem command. 3.
You can manage space on OST services as follows: • By changing the threshold at which the alert triggers, as described in Section 5.11.1. • By deleting files when one or more OST services become too full. Use the information provided in Section 5.11.2 to help you to identify which files have components on a specific OST service. (Note that you can delete files from an OST service even if the service has been deactivated.) • By deactivating an OST service that is becoming full.
3. Start the file system(s). When new files are created, ost3 and ost7 will not be used to store new files. However, access to existing files on the OST service (for both read and write operations) is unaffected, and the OST service can continue to grow in size. You can rebalance storage across OST services by copying any large files to files with new names and then deleting the old files. The new files will not use the deactivated OST services.
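For example, on a client node where the file system is mounted, a large file can be rewritten so that the new copy is striped only over the OST services that are still active (the file name and path are examples):

# cp /usr/data/results/bigfile /usr/data/results/bigfile.new
# rm /usr/data/results/bigfile
# mv /usr/data/results/bigfile.new /usr/data/results/bigfile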
6 Verifying, diagnosing, and maintaining the system This chapter describes how to verify the system configuration and how to diagnose possible problems. It also provides information on tasks that you can perform to prevent problems in the HP SFS system. The chapter is organized as follows: • Verifying the system (Section 6.1) • Managing email alerts (Section 6.2) • Backing up and restoring system data (Section 6.3) • Backing up and restoring file system data (Section 6.4)
6.1 Verifying the system This section describes how to verify that the system has been installed correctly and is operating successfully. You can perform these tests at any time. To verify the system installation and operation, perform the following tasks: 1. Verify the system configuration using the syscheck command, as described in Section 6.1.1. 2. Verify the operation of the power management, as described in Section 6.1.2. 3.
• Specify the severity levels to be included in the report. There are four severity levels: • Critical conditions Critical conditions are failures that will severely impact the operation of the system; for example, an SFS20 array that is offline. • Warnings Warnings identify failures that may not prevent the system from operating, but are nevertheless serious; for example, a LUN that is visible to only one server.
Table 6-1 Arguments for the syscheck command Argument Description severity Specifies the severity levels that are to be reported from the tests. Valid values are 1, 2, 3, 4. You can use this option to control the report generation and limit the output from the tests. You can specify one severity level, a comma-separated list of levels, and/ or a range of levels; for example, 1,3-4. For example, specifying severity=2 limits the reporting to warnings only. The default is severity=1-3.
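As an illustration of the severity argument (this example assumes the command is entered at the SFS CLI prompt; it restricts the report to critical conditions and warnings):

sfs> syscheck severity=1-2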
Table 6-2 Components, tests, and levels for syscheck command

Component  Test                                                           Level
storage    Check that adm LUN exists and is visible to administration     1
           and MDS servers.
           Check that there is a service LUN for each server pair.        1
           Check that each LUN is presented to two servers.               1
           Check that MDS and OST LUNs are balanced between the A and B   1
           controllers (EVA storage only).
           Check the RAID state of each LUN.
----------------------------------- Warnings -----------------------------------
server
------
south[4]   Server state: non-configured

storage
-------
south1     LUN 34 is not visible to south4
south1     MSA20 array 4 is not visible to south4

----------------------------- Configuration Issues ------------------------------
server
------
south4     Configuration State: Db_Moved
           Reason: Failed to configure clumanager: Failed to configure and start
           the cluster: Failed to configure the Service LUN (/dev/hpls/dev15a):
           Unable to de
c. Look at the server and confirm that the power has been turned off. Only one server in the system should be turned off at this point. d. Turn on the power to the server, as follows: sfs> boot server south2 e. Use the show server command to verify that the power to the server has been turned on, as follows: sfs> show server south2 Name: south2 Primary Role: mds Backup Server: south1 . . . Power: on . . . Repeat these steps for each server except the administration server. 3.
6.1.3 Verifying EVA4000 storage failover configuration This section only applies to systems where EVA4000 storage arrays are used and are configured with redundant Fibre Channel fabric (that is, systems where two Fibre Channel switches are used). Testing the Fibre Channel fabric failover configuration involves turning off one side of the Fibre Channel fabric, and verifying that the system continues to operate.
normally, and that the problem is related to array 3. Examine the controller and switch connections, and correct any faults before proceeding to the next step. 7. When you have corrected any faults in the controller and switch connections, turn on the power to all Fibre Channel switches. 8. Reboot all servers, as described in Steps 2 through 5 above. 9. Use the show lun command as described in Step 6 above to verify that each LUN is visible to two servers in a server pair. 10.
In the output from the hpacucli utility, the SFS20 arrays are shown as MSA20. Take note of the array serial numbers, and proceed to Section 6.1.4.4 to view the information on the arrays. Note that the serial number of an SFS20 array is used by the HP SFS CLI to identify the array. You can use the show array command to map the serial number to the array number. 6.1.4.2 Reviewing SFS20 array information To review SFS20 array information using the hpacucli utility, perform the following steps: 1.
3. Look at the status of each logical disk on the SFS20 array, as follows: => ld all show MSA20 at 01 array A logicaldrive 1 (1.99 GB, RAID 5, OK) logicaldrive 2 (1.45 TB, RAID 5, OK) The status for each logical disk should be set to OK. If there has been a disk failure, the state of the logical disk may be set to Rebuilding, as shown in the following example: => ld all show . . . array A logicaldrive 1 (1.99 GB, RAID 5, Rebuilding) 4.
7.
• Use the following table to determine the drive_number for the array; the drive number is derived from the bay number. For example, the drive number for the disk in bay 3 is 130.
This example shows that there are errors in the disk’s log. Such errors do not necessarily mean that the disk has failed or is about to fail; however, HP recommends that you periodically review the status of any disk that is logging errors. Repeat these steps for each SFS20 array attached to the server. 6.1.4.
• The Battery Status field should be set to OK. • The Firmware Version and Hardware Revision fields should be set to the correct version, as specified in the HP StorageWorks Scalable File Share Release Notes. If you need to upgrade the firmware on the Smart Array 6404 controller, see Section 8.2.2 for more information. Repeat these steps for each Smart Array 6404 HBA controller on the server. 6.1.
The syntax of the command is as follows: all_ost_raw_lun_check.bash [-h] | [-v] [-r path] [-f filesystem] -s|-p Where: -r --raw_lun_check path Specifies a path other than the default path for the raw_lun_check.bash command. -f --filesystem filesystem Specifies that only the OST LUNs in the specified file system are to be tested. If no file system is specified, all OST LUNs used in file systems are tested.
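As an illustration of this syntax, the following invocation tests only the OST LUNs of one file system in serial mode. The file system name data is a placeholder, and the path shown assumes the script is installed in the same diagnostics directory as the other check scripts described in this chapter:

# /usr/opt/hpls/diags/bin/all_ost_raw_lun_check.bash -f data -s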
6.1.5.2 Verifying the performance of LUNs on a single server CAUTION: Do not use the procedure described here to test the performance of a LUN that is a mirrored LUN or a component of a mirrored LUN (MDS LUNs are usually mirrored). If the output of the show lun command shows the LUN role as mirror, do not run the raw_lun_check.bash command on the LUN. If you do so, the contents of the LUN may be destroyed. The /usr/opt/hpls/diags/bin/raw_lun_check.
To verify the performance of LUNs on one server, perform the following steps: 1. Enter the show lun lun_number command and determine whether the LUN is part of a file system or not: sfs> show lun 8 . . . User: data . . . • If the User field contains the name of a server, the LUN is not an MDS or OST LUN; in this case, do not proceed with this test. • If the User field is empty, that is, it shows a dash (-), the LUN is not used by a file system. Although you can use the raw_lun_check.
Speed for read to /dev/hpls/dev8a: 152 MB/s Cleaning up test space Resetting lun features for /dev/hpls/dev7a... Resetting lun features for /dev/hpls/dev8a... • To run the command in parallel mode, enter the command using the following syntax, where lun_device is the name of a device name to be tested. Run the command from the administration server: /usr/opt/hpls/diags/bin/raw_lun_check.bash --lun "lun_device [lun_device ...
6.1.6 Verifying the management network To verify that the servers are correctly connected to the management network, perform the following steps for each server in turn: 1. Enter the command shown in the following example. Note that, to address the management network, you must append -adm to the server name: # ssh south3-adm hostname south3.my.domain.com 2. Verify that the host name returned by the command is correct for the server. Repeat these steps for each server. 6.1.
6.1.8 Interconnect diagnostics
This section is organized as follows:
• Testing Gigabit Ethernet interconnect performance (Section 6.1.8.1)
• Examining the Myrinet interconnect (Section 6.1.8.2)
• Examining the Quadrics adapter and interconnect link (Section 6.1.8.3)
• Examining the Voltaire InfiniBand interconnect HCA adapter and interconnect link (Section 6.1.8.
The output from the command displays the speed the link is running at, as shown in the following example:
# ./net_test.bash --serial --net tcp --server "10.128.0.1 10.128.0.2"
== Testing 2 servers in serial ==
10.128.0.1 Throughput 91.40 MBytes/sec
10.128.0.2 Throughput 91.73 MBytes/sec
== Test Finished ==
Compare the results of this test with the expected performance details provided in Appendix B.
NOTE: To ensure that an accurate test is performed where a dual Gigabit Ethernet interconnect is used, order the client nodes so that they are matched with a server IP address that they can communicate with. Where there is only a single Gigabit Ethernet link on the client nodes, this means that each client must be matched to a server address on the same subnet as itself.
2. Run the gm_board_info program, by entering the following command: # /opt/gm/bin/gm_board_info 3. If the gm_board_info program returns a message similar to the following, this indicates that the Myrinet adapter was not found: *** No boards found or all ports busy *** Look to see if the board is physically present. If the board is physically present, enter the following commands and then examine the log output to see whether there were any software issues when the server booted: # sfsmgr . . .
6. Test the PCI bandwidth to confirm whether the PCI interface is correctly detected as 64-bit/132MHz, by entering the following command:
# /opt/gm/bin/gm_debug -L
Opening board 0, port 0
LANai is running at 224 MHz.
GM-2 DMA rate for 4096 Byte transfers (64bit / 132MHz bus)
Timing 32 segments.
Parallel test To run the net_test.bash command to test connections between a number of servers and a number of clients in parallel, enter the command on one of the client nodes, using the following syntax. Specify the Myrinet host name of each server and the host name of each client to be tested: /usr/opt/hpls/diags/bin/net_test.bash --parallel --net gm --server "server_address1 [server_address2 ...]" --client "client_name1 [client_name2 ...
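As an illustration of the syntax above, the following run tests two servers against two clients in parallel over the Myrinet interconnect. The host names are placeholders; substitute the Myrinet host names of your own servers and the host names of your own client nodes:

# /usr/opt/hpls/diags/bin/net_test.bash --parallel --net gm --server "south3-gm south4-gm" --client "client1 client2"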
6.1.8.2.3 Testing Myrinet interconnect performance using the gm_allsize command You can use the gm_allsize tool as an alternative to the net_test.bash command described in Section 6.1.8.2.2. If you have already performed the tests in Section 6.1.8.2.2, you do not need to perform the tests in this section. You can use the gm_allsize tool as a basic connectivity test that operates in loopback mode. You can also use the gm_allsize command to measure interconnect one-way latency and bandwidth.
The output from the command consists of two columns of data: the first column lists the message size (in bytes), and the second column lists the latency (in microseconds). Unidirectional bandwidth test Unidirectional bandwidth is a ping-ping test that measures the startup and throughput of a single message sent between two processes, where outgoing messages are obstructed by incoming messages.
*** program interrupted by user *** Total queued = 0. Total sent = 0. Total received = 93751. Bidirectional (summed) bandwidth The bidirectional test causes packets to stream in both directions (both nodes are always sending). The gm_allsize command reports the sum of the send and receive bandwidths.
6.1.8.3 Examining the Quadrics adapter and interconnect link This section describes how to identify and isolate problems on the Quadrics interconnect. The procedures described in this section apply only to the portion of the interconnect that connects the server in the HP SFS system to the switch infrastructure of the Quadrics interconnect. Specifically, diagnosing problems with links within or between switches is not covered in this manual.
CAUTION: Using NFS on an Object Storage Server is not generally supported, as it may seriously interfere with Lustre operation. Do not use NFS while Lustre is operating, and do not use NFS as a general file exchange mechanism between servers. To run the qselantest diagnostics on an Object Storage Server, perform the following steps: 1.
6.1.8.3.3 Example output from a qselantest test The following example shows typical output from a qselantest test that detected no errors: qselantest: testing device(s) 0 Device 0: -------------------------------------------------------------------------------qsnet2_regtest dev_info 14fc 1 ...
BARs[5]: 00000000 CISPointer: 00000000 SubVendorID: 0000 SubDeviceID: 0000 ROMBAR: 00000000 Capabilities: 60 InterruptLine: 0f InterruptPin: 01 MinGnt: ff MaxLat: 00 ParityPhysAddrLo: 3878a800 ParityPhysAddrHi: 00000000 ParityMasterType: 000cc001 ElanControlRegister: 800f0faa InvertMSIPriority: 0 EnableWrBursts: 1 SoftIntReset: 0 EnableRd2_2Bursts: 1 ConfigInEBusRom: 0 ReducedPciDecode: 1 ExtraMADBits: 0 EnableLatencyCntRst: 1 DisableCouplingTest: 0 ReadHighPriTime: 0f WriteHighPriTime: 0f RecvdSplitCompErr
-------------------------------------------------------------------------------qsnet2_dmatest -f latency -S ...STARTED -------------------------------------------------------------------------------latency set/reset 1.19 uSec qsnet2_dmatest -f latency -S ...PASSED -------------------------------------------------------------------------------qsnet2_dmatest -f pcispeed -de 8k -S ...
6.1.8.4 Examining the Voltaire InfiniBand interconnect HCA adapter and interconnect link This section describes how to identify and isolate problems on the Voltaire InfiniBand interconnect. The procedures described in this section apply only to the portion of the interconnect that connects the server in the HP SFS system to the switch infrastructure of the InfiniBand interconnect. Specifically, diagnosing problems with links within or between switches is not covered in this manual.
2. On the second node in the test, enter the command shown in the following example: # perf_main -a192.168.xx.xx where 192.168.xx.xx is the InfiniBand IP address of the first node in the test. The output should be as follows: ******************************************** ********* perf_main version 9.6 ********* ********* CPU is: 3600.
When you have ensured that no other processes are accessing the file system, enter the ost_perf_check.bash command on the client node, using the following syntax: /usr/opt/hpls/diags/bin/ost_perf_check.bash --serial --mount-point mount_point --remote-shell remote_shell where mount_point is the directory where the file system is mounted on the client node.
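For example, a serial run against a file system mounted at /mnt/data on the client node, using ssh as the remote shell, might be entered as follows (the mount point and remote shell shown are placeholder values, not defaults defined by the command):

# /usr/opt/hpls/diags/bin/ost_perf_check.bash --serial --mount-point /mnt/data --remote-shell ssh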
The command tests the speed at which each client node can read from and write to a single OST service in the file system, by creating a file of a single stripe on a single OST service and using the IOR utility to read and write data to a file for each client node in parallel. If a dual Gigabit Ethernet interconnect is used on the servers in the HP SFS system and a single Gigabit Ethernet interconnect is used on client nodes, the order in which the client nodes are specified to the ost_perf_check.
6.1.9.3 Overall file system performance tests The overall file system performance test measures the read and write speeds to and from a specified file system for a specified set of client nodes, in parallel. The results of the test indicate the maximum possible throughput that can be obtained on a file system. The file system performance test uses the client_perf_check.bash command from the hpls-diags-client RPM file.
• Viewing email alerts (Section 6.2.7) • Disabling and enabling email alerts (Section 6.2.8) • Deleting email alerts (Section 6.2.9) • Verifying email alert operation (Section 6.2.10) 6.2.1 Configuring default email alert addresses By default, all alert emails are sent to the system default email alert address(es). HP strongly recommends that you configure one or more default email addresses other than (or in addition to) the root user, which is the default value.
6.2.3 Constructing email alert filters Email alert filters use the syntax that is used for queries in the show log command (see Section 4.6 for more information about the show log command). However, some of the options that are supported by the show log command are not supported for email alerts. Specifically, you cannot filter events by time (time=), age (age=), or by server name (server=) for email alerts. Table 6-3 shows the valid attributes and operators that can be used to create an email alert filter.
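As an illustration of the filter syntax, the following expression (not one of the default alerts) matches kernel events of severity higher than err whose event data contains the string SCSI; it uses only the facility, severity, and data attributes and the && and contains operators that also appear in the default alert filters in Table 6-4:

facility=kern && severity>err && data contains "SCSI"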
Table 6-4 Default email alerts

Email Alert Name: array_failure
Purpose: Alerts you when a server cannot access an SFS20 storage array.
Email Alert Filter: data contains "array" && severity>err
Action Required: Power cycle the array and then reboot the servers attached to the array.

Email Alert Name: bond_link_down
Purpose: Alerts you when one of the links in a bonded ethernet interface has been disconnected.
Table 6-4 Default email alerts

Email Alert Name: lustre_bug
Purpose: Alerts you when a fault occurs in the Lustre software.
Email Alert Filter: facility=kern && data contains "LustreError" && data contains "LBUG"
Action Required: The server where the fault occurred normally reboots automatically. If this does not happen, reboot the server. See also Section 9.39 for information on handling LBUG errors on the MDS node.
Table 6-4 Default email alerts

Email Alert Name: unload-failure
Purpose: Alerts you when an OST or MDS service fails to unload when a file system is being stopped. Since the service cannot stop, it will not start on another server.
Email Alert Filter: facility=lustre && severity>warning && data contains "unload-failed"
Action Required: Reboot the server where the service is in the unload-failed state.

6.2.
Enter filter: facility=server && data contains "Down" . . . Enter email address(s): root@16.123.123.20 . . . Enter the throttle period (minutes) for the alert [0]: 0 The following alert will be created: Alert: Filter: E-mail: Throttle: server_down facility=server && data contains "Down" root@16.123.123.20 0 Is this correct? (yes/no/cancel): yes sfs> 6.2.
6.2.9 Deleting email alerts To delete an email alert, enter the delete alert alert_name command as shown in the following example: sfs> delete alert server_down To delete all email alerts, enter the following command: sfs> delete alert all 6.2.10 Verifying email alert operation To verify that email alerts are working, use the hplsLog command to generate a fake event that will trigger one of the default alerts, as shown in the following example.
In addition to the backups created by the create database_backup command, the following snapshot backups are also created and stored: Monthly Snapshot Backups A snapshot backup is taken and stored once each month. The first twelve monthly snapshot backups are stored; after the first twelve months, the oldest monthly snapshot backup is deleted each time a new monthly snapshot backup is taken. Yearly Snapshot Backups A snapshot backup is taken and stored once a year.
6.3.3 Restoring a system database backup When the database is backed up automatically at a set interval, it is backed up on both the administration server and on the MDS server (assuming that both servers are running; if either the administration server or the MDS server is down, the database is only backed up on the other server). The backup files on the two servers may not be identical, as they are not created simultaneously.
To delete a system database backup, perform the following steps:
1. Enter the show database_backups command to list the available backups, as follows. If necessary, enter the command on both the administration server and the MDS server:
sfs> show database_backups
Copy   Made At
----   ----------
1      10:41 2004-03-10
2      10:41 2004-03-11
2. When you have identified the number of the backup that you want to delete, enter the command shown in the following example.
6.5 Removing log files There is an automated process to clear up log files. However, examine the usage of the system directories and file systems on each server from time to time. The system directories and file systems are as follows: • Administration and MDS servers On these servers, the operating system image is stored on an internal disk that is divided into two partitions: a partition for the / file system and a partition for the /var file system.
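To check the usage of these partitions and of the log directories on a server, you can use standard Linux commands such as the following (a quick sketch; run them on each server whose disk usage you want to review):

# df -h / /var
# du -sh /var/log/*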
6.6 Viewing and clearing the Integrated Management Log 6.6.1 Overview The Integrated Management Log (IML) on each server contains information about hardware events, including power-on self-test (POST) events, on that server. When you enter the show server server_name command, the monitoring software counts the number of critical and caution events for the previous 30 days and displays the total in the output from the command.
7 Changing the system parameters This chapter contains instructions for reconfiguring the HP SFS system after initial installation. The chapter is organized as follows: • Changing system parameters (Section 7.1) • Using the configure system command (Section 7.2) • Using the configure server command (Section 7.3) • Adding, changing, or removing the Alias IP on a Gigabit Ethernet network (Section 7.4) • Changing the IP address of an individual server (Section 7.
7.1 Changing system parameters After the HP SFS system has been installed and configured, it is possible to change some of the configured parameters; however, there are some parameters that can only be changed by reinstalling the system. This section identifies the parameters that can be changed, and provides instructions for changing them. In general, the process for changing a parameter is as follows: 1.
Table 7-1 Procedures for changing parameters Parameter Can be Changed? Procedure for Changing Networks/Gigabit Ethernet Networks Parameters Type of network No N/A Start IP Yes This procedure describes how to change the Start IP address on a Gigabit Ethernet network. Note that you can also change the IP address on individual servers. For information on how to make this change, see Section 7.5. 1. Unmount the file systems on all client nodes. 2.
Table 7-1 Procedures for changing parameters Parameter Can be Changed? Procedure for Changing Configure the network device on all servers Yes 1. Unmount the file systems on all client nodes. 2. Log in to the administration server and stop the file systems by entering the stop filesystem filesystem_name command for each file system. 3. Enter the show filesystem command and verify that all MDS and OST services used by the file systems are stopped. 4.
Table 7-1 Procedures for changing parameters Parameter Can be Changed? Procedure for Changing Start IP Yes This procedure describes how to change the Start IP address on a bonded Gigabit Ethernet network. Note that you can also change the IP address on individual servers. For information on how to make this change, see Section 7.5. 1. Unmount the file systems on all client nodes. 2.
Table 7-1 Procedures for changing parameters Parameter Can be Changed? Procedure for Changing Networks/InfiniBand Interconnect Parameters Configure an InfiniBand Interconnect Yes 1. Unmount the file systems on all client nodes. 2. Log in to the administration server and stop the file systems by entering the stop filesystem filesystem_name command for each file system. 3. Enter the show filesystem command and verify that all MDS and OST services used by the file systems are stopped. 4.
Table 7-1 Procedures for changing parameters Parameter Can be Changed? Procedure for Changing Networks/Quadrics Interconnect Parameters Quadrics Interconnect Machine ID Yes 1. Run the configure system command to change the parameter value (see Section 7.2). 2. Run the configure server command for all servers (see Section 7.3). Network Services Parameters Gateway IP Yes 1. Run the configure system command to change the parameter value (see Section 7.2).
Table 7-1 Procedures for changing parameters Parameter Can be Changed? Procedure for Changing iLO Parameters ILO User Name Yes See Section 7.9. ILO Password Yes See Section 7.9.
2. The Configure Networks menu is displayed. Select the option to configure Gigabit Ethernet networks. Configure Networks ------------------------------------------------------------. . . 1) 2) 3) a) n) p) c) Configure Gigabit Ethernet Networks Configure InfiniBand Interconnect Configure Quadrics Interconnect All of the above Next Section Previous Section Cancel Enter your choice [a]: 1 3. The Configure Gigabit Ethernet Networks menu is displayed.
• If both the administration and the MDS servers are affected, enter the configure server command for each server, as shown in the following example, where south1 is the administration server, and south2 is the MDS server: sfs> configure server south1 sfs> configure server south2 • If all servers are affected, enter the configure server command for the servers in the order shown in the following example, where south1 is the administration server, south2 is the MDS server, and south3 and south4 are the Ob
2. Use the set nic command to change the IP address, as shown in the following example. In this example, the IP address of the south3 server on the network is changed from the default value that was assigned to it when the network was first configured to 16.123.123.110: sfs> set nic network=nic2 server=south3 ip=16.123.123.110 7.6 Changing the attributes of bonded Gigabit Ethernet networks The attributes of bonded Gigabit Ethernet interconnects are stored in the bonding_options attribute.
5. The Configure Networks menu is displayed. Enter 2 to select the option to configure the InfiniBand interconnect, as follows: Configure Networks ------------------------------------------------------------. . . 1) 2) 3) a) n) p) c) Configure Gigabit Ethernet Networks Configure InfiniBand Interconnect Configure Quadrics Interconnect All of the above Next Section Previous Section Cancel Enter your choice [a]: 2 6. The Configure InfiniBand Interconnect menu is displayed.
5) a) n) p) c) Partition Key All of the above Next Section Previous Section Cancel None Enter your choice [a]: Enter your choice [a]: Enter the Type of network [vib]: Once you specify the IP address of the Admin server the remaining servers are assigned an IP address by incrementing this IP address. Enter the Start IP (e.g. 1.2.3.4 or None) [None]: 172.32.1.2 Enter the Netmask [255.255.255.0]: Enter the MTU [1500]: Enter the partition key used when creating the InfiniBand switch partition.
4. Enter the configure system command, and then enter 2 to select the Configure Networks menu, as follows: sfs> configure system 1) 2) 3) 4) 5) 6) a) e) q) System Configure Networks Network Services Management Network iLO Access Storage All of the above Exit (save updates) Quit (discard updates) Enter your choice [a]: 2 5. The Configure Networks menu is displayed.
Deleting interface ipoib1, are you sure (yes/no)? [no]: yes ipoib1 deleted. Delete IB Interface -----------------------------------------------------------This section is used to delete an InfiniBand Interface. Interface ipoib0 is not included an an interface that can be deleted. c) Cancel Enter the interface to delete [c]: c 8. Repeat Step 7 for each interface you want to delete. 9.
4. Turn on the power to the administration server, by entering the command shown in the following example: sfs> boot server south1 5. Log in to the administration server, then shut down the MDS server by entering the command shown in the following example (where south2 is the MDS server): sfs> shutdown server south2 6. Turn on the power to the MDS server, by entering the command shown in the following example: sfs> boot server south2 The system will now be using the new iLO user name and password.
8 Replacing, adding, and removing hardware, and upgrading firmware This chapter describes the software configuration issues that arise as a result of replacing and adding hardware components. The chapter is organized as follows: • Replacing hardware components (Section 8.1) • Upgrading firmware (Section 8.2) • Adding and removing components (Section 8.
8.1 Replacing hardware components In this section, the term replace means to remove a component and put a new component in its place. NOTE: This guide does not describe how to replace the hardware components; only the software implications of replacing hardware components are addressed here. The following sections describe the configuration impact of replacing hardware components and outline the processes involved.
• Replacing a Voltaire InfiniBand switch (Section 8.1.27) • Relocating an InfiniBand cable to a different port on the InfiniBand switch (Section 8.1.28) • Replacing a Power Distribution Unit (PDU) on a rack (Section 8.1.29) • Replacing a Power Distribution Module (AC power strip) on a rack (Section 8.1.30) • Replacing a Power Distribution Unit (PDU) on an HP ProCurve Switch 2650 or HP ProCurve Switch 2626 (Section 8.1.31) 8.1.
8.1.2 Replacing an Object Storage Server or the motherboard component in an Object Storage Server The process described here assumes that when a server is replaced, the PCI adapters and the internal disk are moved from the old server to the new server. Configuration impact Changes the MAC address of the server on the management network. Process 1.
To recover from failure of both internal disks in the administration server or the MDS server, perform the following steps: 1. Replace both disks. 2. Open the integrated keyboard and flat-panel monitor on the server. 3. Select the console of the server. 4. Turn on the power to the server. 5. Insert the HP StorageWorks Scalable File Share System Software CD-ROM into the disk drive. 6.
8.1.6 Replacing a Storage Host Bus Adapter on an EVA4000 array (HP Part Number FCA2214 DC)
Configuration impact
Changes the WWID of the ports.
Process
1. Update the presentation of virtual disks on each EVA. Refer to Chapter 5 of the HP StorageWorks Scalable File Share System Installation and Upgrade Guide for information on presenting virtual disks.
2. Reboot the server.
8.1.7 Replacing a Fibre Channel cable (from server to Fibre Channel switch)
Configuration impact
None.
8.1.10 Replacing a disk in an SFS20 array TIP: Section 9.34 provides useful information on how to determine whether you need to replace a disk in an SFS20 array. Configuration impact When a disk fails, the SFS20 array automatically rebuilds the logical drives to use the spare disk (if one is available). It is unusual for multiple disks in an SFS20 array to fail at the same time.
2. To reduce the amount of time that the read operation will take, bind each LUN on the array to a raw device, as follows (binding the LUN to a raw device increases the block size from 128KB to 256KB and approximately halves the run time for the read operation): a.
disk drive as soon as possible, because if a second disk drive fails, it will cause the entire logical drive to fail. • If the status of any logical drive on the array is set to recovery in progress, this indicates that the logical drive is being rebuilt. Do not proceed to Step 2 while any logical drive on the array shows the recovery in progress status.
• When a controller module is replaced in an SFS20 array, the battery in the array starts to recharge, and this can take some time (on average, about two hours, but possibly up to six and a half hours). During this time, the array works in an extremely degraded mode. Process 1. Back up any file system that uses the LUNs on the array where you are replacing the controller module (see Section 6.4 for information on backing up file system data). 2. Stop any file systems that use the LUNs on the array. 3.
Configuration impact None. NOTE: Replace failed cache batteries as soon as possible. The batteries are not a redundant component— when one of the batteries fails, the array cache is immediately disabled. This severely degrades the performance of the OST services on the array. Process 1. Shut down the servers attached to the SFS20 array. 2. Replace the hardware. 3. Reboot the servers. When you shut down the Object Storage Servers, the file systems they serve will appear to stall.
8.1.15 Replacing Smart Array 6404 adapter Configuration impact None. Process 1. Use the show server command to determine whether the firmware version on the new adapter is as specified in the HP StorageWorks Scalable File Share Release Notes. 2. If necessary, upgrade the firmware version. 8.1.16 Replacing a Quadrics network adapter Configuration impact None. Process Boot the server. 8.1.
7. Start the file system(s). 8. When file systems restart, client nodes can remount the file systems. If a client node fails to mount a file system, reboot the client node. 8.1.18 Replacing a Quadrics cable Configuration impact None. Process Shut down the server before replacing the cable. 8.1.19 Replacing a Myrinet network adapter Configuration impact Changes the Network ID of the server on the Myrinet interconnect.
8.1.20 Relocating a Myrinet cable to a different port on a Myrinet network Configuration impact None. Process Shut down the server before relocating the cable. 8.1.21 Replacing a Myrinet cable Configuration impact None. Process Shut down the server before replacing the cable. 8.1.22 Replacing a dual-port Gigabit Ethernet adapter (for dual or bonded Gigabit Ethernet) Configuration impact None. Process Shut down the server before replacing the adapter. 8.1.
8.1.25 Replacing a Voltaire HCA adapter Configuration impact None. Process Boot the server. 8.1.26 Replacing an InfiniBand interconnect cable (from server to InfiniBand switch) Configuration impact None. Process Shut down the server before replacing the cable. 8.1.27 Replacing a Voltaire InfiniBand switch Configuration impact None. Process Reboot the servers attached to the switch but stagger the rebooting so that services are not lost, as follows: 1.
8.1.29 Replacing a Power Distribution Unit (PDU) on a rack Configuration impact File system is unavailable while PDU is down. Process Shut down the servers in the rack before replacing the PDU. The servers automatically reboot when the PDU is replaced. 8.1.30 Replacing a Power Distribution Module (AC power strip) on a rack Configuration impact None. Process No action required. 8.1.
8.2.1.1 Upgrading online using the OnlineROM Flash Component executable To upgrade the HP Integrated Lights-Out Management Controller firmware or the BIOS - System ROM firmware online, perform the following steps: 1. Download the Online ROM Flash Component executable file from the Web site specified in the HP StorageWorks Scalable File Share Release Notes. 2. From the administration server, use the ssh utility to log in to the server to be upgraded. 3. Run the executable file on the server.
8.2.2 Upgrading firmware on Smart Array 6404 adapters and SFS20 arrays The HP StorageWorks Scalable File Share Release Notes provide details of the correct firmware version for the SFS20 array as well as details of where you can obtain firmware source files. This section provides instructions for upgrading the firmware on all SFS20 arrays and all Smart Array 6404 adapters in your HP SFS system. • If you want to upgrade only the firmware on the SFS20 arrays, skip Steps 4, 5, and 12.
NOTE: The firmware files are currently placed in the /usr/mst/fw-23108-rel-3.0.0/ directory. However, if this directory does not exist, or if the names of the firmware files in that directory do not correspond to the firmware versions recommended in the HP StorageWorks Scalable File Share Release Notes, please contact your HP Customer Support representative for details of which firmware files you must use.
5. When the command prompt returns, the upgrade is complete. Reboot the server to bring the new firmware into effect.
6. When the server has rebooted, enter the vstat command and verify that the firmware has been correctly upgraded.
8.3 Adding and removing components
This section describes how to add and remove components in an HP SFS system, and is organized as follows:
• Adding Object Storage Servers (Section 8.3.1)
• Removing Object Storage Servers (Section 8.3.
e) Exit (save updates) q) Quit (discard updates) Enter your choice [a]: 1 System ---------------------------------------------------------------------This section specifies the name of the system and the number of servers currently present. 1) 2) 3) 4) 5) a) n) c) System name DNS domainname Number of servers Number of the Admin (first) server Timezone All of the above Next Section Cancel south my.domain.
7. Boot the new Object Storage Servers, as shown in the following example: sfs> boot server south[5-6] 8. Configure the storage for the new Object Storage Servers and complete the configuration of the servers, as described in the following sections of the HP StorageWorks Scalable File Share System Installation and Upgrade Guide: • For EVA4000 storage, refer to Sections 6.18.2 through 6.18.6. • For SFS20 arrays, refer to Sections 7.15.2 through 7.15.4.
At any stage you can enter ? to get more help on the topic. To update data select one or all of the following sections: 1) 2) 3) 4) 5) 6) a) e) q) System Configure Networks Network Services Management Network iLO Access Storage All of the above Exit (save updates) Quit (discard updates) Enter your choice [a]: 1 System ---------------------------------------------------------------------This section specifies the name of the system and the number of servers currently present.
6. Reconfigure the administration and MDS servers and the remaining Object Storage Servers in the system by entering the configure server command, as shown in the following example: sfs> configure servers south[1-4] 7. Delete the storage that was attached to the removed Object Storage Servers from the database, as shown in the following example, where arrays 5 and 6 were attached to the south5 and south6 servers: sfs> delete array 5 sfs> delete array 6 8.3.
To upgrade a single Gigabit Ethernet interconnect in an existing HP SFS system to become a dual or a bonded Gigabit Ethernet interconnect, perform the following steps: 1. Unmount the file systems on all client nodes. 2. Stop all file systems, as shown in the following example, where the data file system is stopped: sfs> stop filesystem data Repeat this command for each file system. 3. Enter the show filesystem command and verify that all MDS and OST services used by the file systems are stopped. 4.
8.3.6 Removing a dual or a bonded Gigabit Ethernet interconnect In an HP SFS system where a dual or a bonded Gigabit Ethernet interconnect is used, you can downgrade the system to use a single Gigabit Ethernet interconnect. The entire HP SFS system must be downgraded, as described here (that is, you cannot downgrade some servers to use a single Gigabit Ethernet interconnect and leave other servers using a dual or a bonded Gigabit Ethernet interconnect).
9 Troubleshooting This chapter provides information for troubleshooting possible problems in the HP SFS system. (For information on troubleshooting problems on client nodes, refer to the HP StorageWorks Scalable File Share Client Installation and User Guide.) The topics covered in this chapter include the following: • Server fails to boot during installation (Section 9.1) • Server fails to boot (Section 9.2) • Server stops responding (Section 9.
• Accessing the iLO component (Section 9.31) • Troubleshooting licenses (Section 9.32) • Troubleshooting failed SFS20 arrays (Section 9.33) • Handling Disk Errors on SFS20 storage (Section 9.34) • Recovering degraded MDS services on systems using EVA4000 storage (Section 9.35) • System log files (Section 9.36) • Administration service restarts every one minute (attempting to start the evlogd daemon) (Section 9.
9.1 Server fails to boot during installation If a server fails to boot during the installation process, it may be as a result of incorrect firmware versions on the server. Make sure that the firmware versions on the HP Integrated Lights-Out Management Controller and the BIOS - System ROM on the server meet the specifications listed in the HP StorageWorks Scalable File Share Release Notes. In addition, check that the MAC address of each server is correct.
This forces the server to crash and reboot, and the output of the crash dump is captured when the server reboots. The crash dump file will be located at /var/crash/nnn (where nnn is the IP address of the local loopback interface (127.0.0.1), the date and the time of the crash) on the server that crashed. Copy all of the files in this directory to the administration server using the scp command. • The server kernel, /boot/vmlinux-version on the server, and the corresponding /boot/System.map file.
When the configure system command attempted to add this route, the command failed and returned the RTNETLINK answers: File exists message. This failure occurred because a route to this network already existed, as is shown in the output from the following command (run on the administration server):
# netstat -nr
Kernel IP routing table
Destination     Gateway         Genmask
10.128.0.0      0.0.0.0         255.255.0.0
169.254.0.0     0.0.0.0         255.255.0.0
192.168.0.0     0.0.0.0         255.255.0.
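If you confirm that the pre-existing route is stale and no longer needed, one way to remove it before rerunning the configure system command is the standard route command shown below. This is an illustration only; verify the network and netmask against your own netstat -nr output before deleting any route:

# route del -net 192.168.0.0 netmask 255.255.0.0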
3. Connect to the remote console of the server, as follows: # hpls_console --server server_name --remote 4. During the power on/start-up phase on the server, enter the ROM-Based Setup Utility (RBSU) by pressing the F9 key when prompted. 5. The MAC addresses for NIC 1 and NIC 2 on the server are displayed on the right-hand side of the screen; record the value for NIC 1. 6. To update the database, enter the command shown in the following example at the sfs> prompt.
mysql> select * from hpls_object_states where type=’Luns’; +------+------+--------------+--------------+--------+------------+-------+ | name | type | current_state| desired_state| action | command_id | status| +------+------+--------------+--------------+--------+------------+-------+ | 4 | Luns | Configured | NULL | NULL | NULL | NULL | | 10 | Luns | Configured | NULL | NULL | NULL | NULL | +------+------+---------------+-------------+--------+------------+-------+ 2 rows in set (0.00 sec) 3.
In this example, the service LUN is LUN 11, and LUN 13 is a spare service LUN on an array attached to servers south3 and south4. 3. For the existing service LUN, set the user value to an empty string and the role of the LUN to svcspare, by entering the commands shown in the following example: sfs> set lun 11 user="" sfs> set lun 11 role=svcspare 4.
=========================  M e m b e r   S t a t u s  ==========================
Member         Status     Node Id    Power Switch
-------------- ---------- ---------- ------------
south1         Up         0          Good
south2         Up         1          Good

====================  H e a r t b e a t   S t a t u s  ====================
Name                           Type       Status
------------------------------ ---------- ------------
south1 <--> south2             network    ONLINE

=========================  S e r v i c e   S t a t u s  ========================
                                            Last
Service        Status     Owner            Transition
--------------
9.15.
9.18 Emergency clustat events occur during configure server command When the configure server command is running on a server for the first time, emergency clustat events such as the following may appear in the event log: south1 clustat[4867]: diskLseekRawReadChecksum: bad check sum, part = 1 offset = 8704 len = 24 These messages are benign and the events can be ignored. 9.
If the table has not been repaired, restore an earlier version of the database.
9.23 Troubleshooting the Quadrics interconnect This section provides some useful tips for investigating and solving potential problems with a Quadrics interconnect. In this section, the term node is used to refer to any system that is connected to the interconnect: a node can be a server in the HP SFS system, or a client node. The term nodeset is used to describe the collection of nodes that are on an interconnect. 9.23.
9.23.2 Nodeset and Node ID information To determine what nodes are visible to a node on the Quadrics interconnect, enter the following command: # cat /proc/qsnet/ep/rail0/nodeset [7-10] In this example, the output indicates that nodes 7, 8, 9, and 10 are visible to the node. Note that the /proc/qsnet area is created when the qsnet module is loaded. Similarly, the /proc/qsnet/elan area is created when the elan module is loaded, the /proc/qsnet/ep area is created when the ep module is loaded, and so on.
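To confirm which of these modules are currently loaded on a node, you can list the loaded modules or simply list the /proc/qsnet area, for example as follows (the exact module names may vary slightly with the driver version in use):

# lsmod | egrep '^(qsnet|elan|ep)'
# ls /proc/qsnet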
9.24 Troubleshooting the Myrinet interconnect This section provides some useful tips for investigating and solving potential problems with a Myrinet interconnect. In this section, the term node is used to refer to any system that is connected to the interconnect: a node can be a server in the HP SFS system, or a client node. The term nodeset is used to describe the collection of nodes that are on an interconnect. 9.24.
Board number 0: lanai_cpu_version = 0x0a00 (LANai10.0) lanai_sram_size = 0x001fe000 (2040K bytes) ROM settings: MAC=00:60:dd:48:f9:5e SN=200151 PC=M3F2-PCIXE-2 PN=09-02746 LANai time is 0x205a627547 ticks, or about 1035 minutes since reset. Packet Interface 0: Mapper is 00:60:dd:48:f9:5e. Map version is 3070933. 1 hosts. Network is fully configured. Packet Interface 1: Mapper is 00:60:dd:48:fa:ea. Map version is 1025328. 33 hosts. Network is fully configured.
ats devucm q_mng ibat cm_2 gsi adaptor-tavor vverbs-base mlog repository hadump bmod_ib_mgt mod_vapi mod_vipkl mod_thh mod_hh mod_vapi_common mod_mpga mosal 36416 13944 18768 62572 72416 56392 136480 49960 1 2 0 5 0 6 1 0 8628 0 77628 0 1488 75892 127456 203424 260640 16632 63880 24960 112276 0 0 0 0 0 0 0 0 0 [sdp] [sockets ipoib-ud devucm] [sdp devucm] [ipoib-ud ats ibat cm_2] [sdp ipoib-ud ats devucm q_mng ibat cm_2 gsi adaptor-tavor] [sockets sdp ipoib-ud ats devucm q_mng ibat cm_2 gsi adaptor-
1) Auto-start 2) IPoIB 3) Fabric => 4) Firmware-update 5) Start 6) Stop 7) MPI 8) Exit Note the following points: • The HBA Ports status field for port 1 must be set to PORT_ACTIVE. • The IPoIB (active) column must contain the same data as the IPoIB (config) column. 9.25.1.2.
Subsystem: Mellanox Technology MT23108 InfiniHost Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 32 Memory at f7e00000 (64-bit, non-prefetchable) [size=1M] Memory at f5000000 (64-bit, prefetchable) [size=8M] Memory at e8000000 (64-bit, prefetchable) [size=128M] Capabilities: [40] #11 [001f] Capabilities: [60] Message Signalled Interrupts: 64bit+ Queue=0/5 Capabilities: [70] PCI-X non-bridge device. Enable- If the output is not as shown here, check the adapter as follows: 1.
9.25.4 Connection and data transfer problems If a server has connection or data transfer problems over a Voltaire InfiniBand interconnect, check whether the rx/tx/error counters are correct by entering the following command on the server: # /usr/mellanox/bin/get_pcounter -p 1 If the rx/tx/error counters are correct, the output will be as follows.
9.26 Troubleshooting file systems This section provides information for troubleshooting problems with file systems, and is organized as follows: • Problems creating a file system (Section 9.26.1) • Identifying servers serving OST services (Section 9.26.2) • The start filesystem command may fail twice (Section 9.26.3) • Troubleshooting the stop filesystem command (Section 9.26.4) • Using the MPI Lustre repair utility to repair file systems (Section 9.26.
Shut down the Object Storage Servers that served the file system by entering the following commands on the administration server: 3. # sfsmgr . . . sfs> shutdown servers server_names From the administration server, reboot the MDS server by entering the command shown in the following example, where south2 is the MDS server: 4.
9.26.4 Troubleshooting the stop filesystem command If the stop filesystem command fails, attempt to correct the problem as follows: 1. Identify the servers where the command failed; they may have been down when the command was run. If this is the case, reboot the servers. 2. Enter the stop filesystem command again. If the command fails again, continue with the remaining steps. 3.
• mdc • obdclass • lvfs • lnet If only one file system is being served, and one or more of the above modules still exists on the server (indicating that the Lustre modules have not been unloaded cleanly), reboot the Object Storage Server. Repeat these steps on each Object Storage Server that has OST services serving the file system. 6.
Note the following points: • You must not run more than one file system repair session at any one time. • The version of the MPI Lustre repair utility that is provided with HP SFS Version 2.2 is intended for use only while the file system is stopped—that is, the file system is not being used to create, delete, or modify any files contained within it. • HP recommends that you only use the MPI Lustre repair utility if you suspect that a file system has become catastrophically corrupted.
NOTE: When a new file system is created or an existing file system is modified, you can run the repair-lfsck script so that a shell script will be available if the file system later needs to be repaired. If you want to repair a file system that has been changed since a shell script was last generated for the file system (for example, new OST services have since been added to the file system), you must run the repair-lfsck script again to generate a new shell script for the file system.
2. Stop the file system and ensure that the file system devices are not being used. Check the /proc/ mounts file and the /proc/fs/lustre/obdfilter/mntdev (or /proc/fs/lustre/mds/ mntdev) files to ensure that the file system devices are not mounted on any mount point. 3. Ensure that the administration server can connect to all of the other servers used by the file systems using the ssh utility without a password. 4.
2. Unmount the file system that uses the device on all client nodes. The /proc/fs/lustre/mds/*/num_exports counter on the MDS server must be 0 (zero). This counter is set internally when client nodes mount and unmount file systems. 3. Stop the file system by entering the command shown in the following example, where the file system is called test: sfs> stop filesystem test 4. Verify that the file system is stopped by entering the show filesystem command.
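As a quick way of checking the num_exports counter mentioned in step 2 above, you can read it directly on the MDS server; it should report 0 once all client nodes have unmounted the file system:

# cat /proc/fs/lustre/mds/*/num_exports
0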
NOTE: An MDS service is considered to be a client of an OST service; as a result, the number of recoverable clients shown in the message should be the total number of client nodes plus one for the MDS service. However, it is possible for this count to exceed this value; for example, when a client node is reset or crashes while the Lustre file system is mounted, and then later mounts the Lustre file system again, Lustre counts the rebooted client mount operation as a completely new client connection. 2.
9.26.8 Rebalancing file system services Sometimes, file system services do not run on their preferred server. This happens after a temporary failure of a server or storage component. During the initial failure, the service fails over to the backup server. When you reboot the failed server, the services sometimes remain on the backup server; they do not automatically relocate back to their usual (preferred) server. You can see this by examining the output from the show server and show filesystem commands.
9.26.9 Troubleshooting supplementary groups access If a user receives unexpected access denied errors when using supplementary groups, use the following procedures and information to troubleshoot the problem: • Ensure that ssh access is set up correctly (refer to the Configuring supplementary groups section in Chapter 9 of the HP StorageWorks Scalable File Share System Installation and Upgrade Guide for more information).
9.27 Troubleshooting file system performance In some circumstances, the performance of a Lustre file system may seem to be less than optimal. There are a number of tests that you can run to assess whether the system is performing within the prescribed limits; these tests, and the steps you can take to remedy possible problems, are described in this section. This section is organized as follows: • Performance troubleshooting (Section 9.27.1) • Verifying file striping (Section 9.27.
Table 9-1 Performance troubleshooting tasks

Task 2: Determine whether the performance anomaly can be reproduced.
Process:
1. Rerun the client application.
2. Ensure that the timing measurements are being made correctly, by synchronizing the time on the client nodes and on the servers in the HP SFS system. Use the NTP functionality to ensure accurate synchronization.
Table 9-1 Performance troubleshooting tasks

Task 6: Verify that the interconnect is performing correctly.
Process:
1.
Table 9-1 Performance troubleshooting tasks

Task 7: Verify that the storage is configured correctly.
Process:
1. Examine the HP SFS system database to ensure that the file system is correctly configured and all of the services are correctly distributed (see Chapter 4).
2. (EVA4000 storage only) Using the Command View EVA Web interface, view events for each EVA4000 array and look for any hardware errors.
9.27.2 Verifying file striping If a file is not being striped across all OST services, the full bandwidth may not be available.
Then enter the show filesystem filesystem_name command to view details for the file system, as follows: sfs> show filesystem scratch Name: scratch . . . Stripe count: 2 . . . In this example, the stripe count is shown as 2. Remedial action Correct the problem as follows: • If the striping on the file is incorrect, ensure that the striping was not assigned through default directory or file system striping, then recreate the file using the correct striping (see Section 9.27.2.1).
09:31:28 09:31:35 09:31:35 09:31:35 south2 south2 south2 south2 -- -- -- Server is already configured to the desired state -- Command has finished: south2 -- *** Server States *** Completed: south2 Note: The MDS configuration will be rewritten when the file system is restarted. sfs> 5.
2. Rename the new file to the original name using the mv command, as shown in the following example: # mv data.new data.1 mv: overwrite ’data.1’? # lfs getstripe data.1 OBDS: 0: ost1_UUID 1: ost2_UUID 2: ost3_UUID 3: ost4_UUID ./data.1 obdidx 0 1 2 3 # 9.27.
Remedial action See Section 9.26.8 for information on how to rebalance file system services. 9.27.4 Checking for unbalanced controllers in EVA4000 arrays File system performance may be affected if all of the storage on an EVA4000 array is served through a single controller. This situation may arise in the following circumstances: • One of the pair of controllers on the EVA4000 array has failed. • All storage on the EVA4000 array is presented through a single controller in error.
Remedial action See Section 9.29 for information on troubleshooting LUN presentation. 9.27.5 Examining the system logs for errors Events that occur on any server in the system are sent to the administration server (or to the MDS server if the administration server has failed) and logged in the system event log.
9.27.6 Examining EVA4000 storage subsystems for errors To examine the EVA4000 storage subsystems for errors, perform the following checks: 1. Log in to the Command View EVA Web interface. 2. View the events for each EVA4000 array to determine whether there are errors on the storage hardware. Remedial action Refer to the Command View EVA documentation or to the online help available on the switch interface for help with interpreting the event log and correcting errors on the storage hardware. 1.
9.27.8 Examining the interconnect switch for errors To find information on errors on a Myrinet interconnect, a Quadrics interconnect, or a Voltaire InfiniBand interconnect, refer to the manufacturer’s documentation. To find information on errors on a Gigabit Ethernet interconnect, look at all of the switches through which traffic between servers and clients will be routed and check for errors or misconfigurations. The data available varies depending on the type of switch that is used.
pg44lab1                                                     11-Feb-1990 18:15:50
==========================- TELNET - OPERATOR MODE -============================
Status and Counters - Port Counters

Port       Total Bytes    Total Frames
------     ------------   ------------
35         +259,835,435   15,981,828
36         +086,639,838   13,521,325
37         473,938        6915
38         68,393         811
39         843,200,329    +460,165,122
40         +777,558,393   +818,355,966
41         832,640,584    492,295,586
42         +933,868,570   344,202,370
43         +280,011,919   15,679,849
44         +205,240,000   15,362,166
45-Trk1    98,273,964     393,671,331

Actions->  Back   Show
Use the telnet(1) command to access the Fibre Channel switch, as shown in the following example: # telnet 192.168.32.1 The throughput statistics for the ports on a switch are refreshed once every second. To view details of the speed that each port on the switch is running at, log in to the switch and enter the following command.
To display the WWIDs of all HBAs in the HP SFS system, enter the following command:
# lsdbquery system 'select * from hpls_hbas'
210000e08b0eac9c   0   fc   Null   down1
210100e08b2eac9c   1   fc   Null   down1
9.27.10 Troubleshooting slow commit messages
If messages relating to slow commit operations are reported, they may be a result of the file system being under a particularly heavy load. However, such messages may also be an indication of an underlying hardware issue.
• There may have been a battery failure on the SFS20 array, causing the cache to be disabled. See Section 8.1.12 for information about replacing a failed battery pack and cache on an SFS20 array. 9.28 Troubleshooting EVA4000 array connectivity If you are encountering difficulties in accessing storage from an EVA4000 controller pair, make sure that the presentations to your server are correct.
If your system cannot see the controllers (see Example 9-2), check the Fibre Channel network. If your system can see the controllers but cannot see the storage, check the presentations of the LUNs on the Storage Management Appliance (refer to Chapter 5 and Chapter 6 of the HP StorageWorks Scalable File Share System Installation and Upgrade Guide for information on storage configuration, and see Section 9.29 of this guide for information on troubleshooting LUN presentation).
( ( ( ( ( ( ( ( ( 2: 2: 2: 2: 3: 4: 4: 4: 5: 2): 3): 4): 5): 0): 0): 1): 2): 0): Total Total Total Total Total Total Total Total Total reqs reqs reqs reqs reqs reqs reqs reqs reqs 24, Pending reqs 0, flags 0x2, 0:0:84 00 24, Pending reqs 0, flags 0x2, 0:0:83 00 24, Pending reqs 0, flags 0x2, 0:0:84 00 24, Pending reqs 0, flags 0x2, 0:0:83 00 3, Pending reqs 0, flags 0x0, 0:0:84 0c 3, Pending reqs 0, flags 0x0, 0:0:85 0c 24, Pending reqs 0, flags 0x2, 0:0:85 00 28, Pending reqs 0, flags 0x2, 0:0:86 00 3
The following is an example of the output from the command where LUNs from an EVA4000 array are (incorrectly) visible to both controllers in the controller pair: # /usr/opt/hpls/diags/bin/controller_check.
9.30 Accessing consoles You can connect to the console of a server through the iLO, as shown in the following example (where south6 is the server being accessed): # hpls_console --server south6 By default, the connection is to the iLO serial port emulation, which is used to access the Linux console (configured as ttyS1 (COM2)). If you want to connect to the Remote Console Port, which is used to access the BIOS and iLO menus, use the --remote argument with the command.
NOTE: If you get a message similar to the following, see Section 9.31.2 for information on how to troubleshoot the problem: Requested service is unavailable, it is already in use by a different client.Connection closed by foreign host. 2. If the server whose console you want to access is currently running, shut down the server by entering the commands shown in the following example, where the south3 server is shut down. Enter the commands on the administration server: # sfsmgr . . .
Other than another user accessing a console, there is another known cause for a locked up console: if the administration server crashes while the powerd daemon is using the telnet(1) command to access the iLO component, the iLO component keeps the socket open. This means that the MDS server or other servers will be unable to access the iLO component. When the administration server is next rebooted, the iLO component will release the socket and allow other hosts to access it.
8. You can directly test whether an increment is licensed, as follows. For the SFSMDSCAP or SFSMDSENT licenses, enter the following command: # /usr/opt/hpls/bin/hpls_license SFSMDSCAP 1 granted {} For the SFSOSTCAP or SFSOSTENT licenses, you can specify the number of license units (terabytes of storage) as shown in the following examples.
• When you reboot the affected servers, the show array command shows a failed status. Because an array is attached to two servers, there are actually two status values, one for each server. The following shows an example of the show array command:
sfs> show array 2
. . .
Connected to    HBA/Port    Status
------------    --------    ------
south3          scsi-1/2    failed
south4          scsi-1/2    online
. . .
In this example, the south4 server appears to show a status of online.
9.33.2 Recovering from a temporary SFS20 array failure If an SFS20 array has had a temporary failure such as loss of power or inadvertent disconnection of one of its UltraSCSI cables, operation can be restored by correcting the failure and rebooting the affected servers. To recover from a temporary failure of an SFS20 array, perform the following tasks: 1. Correct the cause of the failure (for example, restore power to the array). 2.
2. Identify the LUNs that the MDS or OST service is based on, by using the show filesystem filesystem_name command as follows:
sfs> show filesystem data
. . .
MDS Information:
Name   LUN   Array   Controller   Files   Used   Service State   Running on
----   ---   -----   ----------   -----   ----   -------------   ----------
mds8   41                         2.3M    20%    running         south2

OST Information:
Name    LUN   Array   Controller   Size(GB)   Used   Service State   Running on
-----   ---   -----   ----------   --------   ----   -------------   ----------
ost29   . . .
. . .
Number   Major   Minor   RaidDevice   State
0        105     96      0            active sync   /dev/cciss/c1d6
1        105     32      -1           faulty        /dev/cciss/c1d2
# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [multipath]
read_ahead 1024 sectors
md0 : active raid1 cciss/c1d2[1](F) cciss/c1d6[0]
      10485504 blocks [2/1] [U_]
unused devices:
If you have stopped and restarted the file system or rebooted the south2 server, the status may indicate that the failed component is missing, instead of failed.
When the resynchronization is complete, the status information will change, as shown in the following example:
# mdadm --detail /dev/md0
/dev/md0:
. . .
    State : clean
. . .
Number   Major   Minor   RaidDevice   State
0        105     96      0            active sync   /dev/cciss/c1d6
1        105     32      1            active sync   /dev/cciss/c1d2
You can check the progress of the resynchronization process by examining the event log as follows:
sfs> show log facility=storage && age < "5m"
. . .
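You can also watch the rebuild directly on the server that is running the service by reading /proc/mdstat. While the resynchronization is in progress, the kernel reports a recovery line; the percentage, block counts, and speed shown below are illustrative:
# cat /proc/mdstat
md0 : active raid1 sdc[1] sdb[0]
      10485504 blocks [2/1] [U_]
      [===>.................]  recovery = 18.5% (1940608/10485504) finish=7.9min speed=18012K/sec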
(For information on configuring email alerts, see Section 6.2.)
If disk errors occur, take action as described in the following sections:
• Disks showing the removed/failed state (Section 9.34.1)
• Disks showing the predict fail state (Section 9.34.2)
• Disks showing the logging errors state (Section 9.34.3)
9.34.
For more information on reviewing SFS20 array information, see Section 6.1.4.2. If, after further investigation, you decide to replace a disk that is logging URE errors, see Section 8.1.10 for more information. TIP: You can use the diskinfo wrapper script for the hpls_cciss_info command to report the drive status on all SFS20 arrays attached to a server.
3. Identify the component LUNs that are used to mirror the LUN, using the show lun lun_number command as shown in the following example:
sfs> show lun 41
LUN (WWID): raid_41
. . .
Component LUN   Size (GB)
-------------   ---------
19              1900
22              1900
In this example, LUNs 19 and 22 are components of LUN 41.
4. As displayed by the show filesystem command in Step 2, the mds8 service is running on the south2 server.
6. In the example in the previous step, the /dev/sdc component is shown to have failed, while the /dev/sdb component is operational. You can determine which LUN the /dev/sdc component is associated with by entering the command shown in the following example: # ls -l /dev/hpls | grep sdc$ lrwxrwxrwx 1 root root 9 May 4 01:49 /dev/hpls/dev22a -> /dev/sdc In this example, LUN 22 (/dev/hpls/dev22a) is the failed mirror component. 7.
When the resynchronization is complete, the /proc/mdstat information (as captured in the event log) indicates this, as shown in the following example:
sfs> show log facility=storage && age < "5m"
. . .
2004/11/02 10:56:41 storage n south2: mds8: /proc/mdstat: md0 : active raid1 sdc[1] sdb[0] 10485504 blocks [2/2] [UU]
. . .
9.36 System log files
The following log files are present on each server and may be useful:
• Configuration system: /var/log/configd/configd.log
• Power management: /var/log/powerd/powerd.
9.38 The MDS service fails with an ASSERTION(ino == inode->i_ino) message
In rare circumstances, the MDS service encounters a bug (caused by a client node) during the recovery process. This causes the server where the MDS service is running (normally the MDS server) to crash with an LBUG error. When this happens, events similar to the following are displayed in the event log:
LustreError: 11691:0:(mds_open.c:1013:mds_open()) uuid 92ea84f7-3a40-ac0a-74b5-4e0be3d3e3a3
LustreError: 11691:0:(mds_open.
5. If you know which file system is causing the problem, start that file system; otherwise, start all file systems, but wait until each file system goes to the started state before starting the next file system. If all file systems start normally, you can skip the remaining steps. If the MDS service crashes again after you start a file system, continue with the remaining steps. 6.
When the disks have been replaced, the error messages shown above may continue to be generated because the logical drive still needs to be rebuilt. You can rebuild the logical drive using the hpacucli utility on one of the servers to which the array is attached. In the following example, array 42 is the array where the logical drive failed and the disks have been replaced. The ld all show command in the hpacucli utility shows the status of the logical drives on the array as Failed.
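A minimal status check of this kind in the hpacucli interactive shell might look like the following sketch; the slot number is a placeholder, so identify the correct controller for the affected array from the output of the first command:
# hpacucli
=> ctrl all show
=> ctrl slot=2 ld all show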
9.41 Determining if the Network ID of a server on a Quadrics or Myrinet interconnect has been changed If you relocate a server to a different port on a Quadrics switch, or replace a Myrinet interconnect adapter on a server, the Network ID of the server is changed. The procedures for dealing with these changes are described in Section 8.1.17 and Section 8.1.19. If you do not perform the documented steps, errors similar to the following will occur on the server where the Network ID changed.
9.42 Troubleshooting client mount failures When a node is booting, you can monitor the progress on the console.
When this happens, the SFS service prints a message similar to the following in the /var/ log/messages file: sfsmount: Waiting for IB to be ready. If this state persists for a long time, investigate the InfiniBand network. (You can log in and run the vstat command to confirm the port state.) • If some of the services are in the recovering state, the mount operation may be delayed for any of the reasons described above (for the scenario where all file systems are in the running state).
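For the InfiniBand waiting condition described above, and assuming the client node logs through syslog to /var/log/messages, a quick check of the message and of the port state is shown below (the exact fields reported by vstat vary with the InfiniBand stack version):
# grep sfsmount /var/log/messages | tail
# vstat | grep -i state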
A CLI commands This appendix lists the HP SFS CLI commands.
A.1 General information • To start the CLI, enter the sfsmgr command. • In commands where it is appropriate, you can use either the singular or the plural form of the component to which the command applies. For example, you can enter the configure server command in either of the following formats: sfs> configure server server_name or sfs> configure servers server_names • Most of the CLI commands can be abbreviated.
A.2 accept license command The accept license command accepts and installs license information. For information about license file concepts, see Section 3.11.1. Syntax accept license A.3 activate ost command The activate ost command activates OST services that were previously deactivated. Before an OST service can be activated, the file system that uses the service must be stopped.
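For illustration, assuming the command takes the name of the OST service to activate (ost4 is a hypothetical name) and that the file system using the service has already been stopped, an invocation might look like this:
sfs> activate ost ost4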
A.5 configure commands
A.5.1 configure array
The configure array command configures an SFS20 array. This command cannot be used to configure an EVA4000 array.
Syntax
configure array array_numbers [adm_lun=yes|no] [adm_lun_size=size] [service_lun=yes|no] [luns_size=size|size%[,size|size%,...
raid Optional. Specify the redundancy level that is to be used for the array. The default value is ADG. Valid values are ADG, 6, 5, 1+0, 1, and 0. In HP SFS Version 2.2, only ADG (also called RAID6) and RAID5 redundancy are supported. For 500GB disks, only ADG redundancy is supported. If the MDS LUNs are mirrored, both of the component LUNs must be configured with the same type of redundancy.
When high priority is specified, expansion or rebuild occurs at the expense of normal I/O operations. Although system performance is affected, this setting provides better data protection because the array is vulnerable to additional drive failures for a shorter time. For more information on rebuild priority settings, refer to Appendix J of the HP StorageWorks Scalable File Share System Installation and Upgrade Guide.
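As an illustration of the arguments described above (the array number is arbitrary), the following command configures an array with ADG redundancy:
sfs> configure array 4 raid=ADG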
redo Optional. Specify the configuration state to which you want the servers returned. Valid values are Unconfigured, Prepared, Booted, Db_Moved, Clusterized, and Configured. The state specified here must be an earlier state than the current state of the servers that are being configured.
Syntax create attribute attribute_name=attribute_value Arguments attribute_name Required. Specify the name and value of a new or existing attribute. If the attribute value contains spaces, enclose the value in double quotes. Example The following example creates the nickname attribute and sets its value to west: sfs> create attribute nickname=west See Section A.19.2 for a list of the system attributes that can be changed. A.6.
A.6.5 create filesystem The create filesystem command creates a new Lustre file system. Syntax Interactive mode create filesystem Scripted mode You can create a file system in scripted mode by entering the create filesystem command with all of the parameters for the file system. The syntax for the script is as follows: create filesystem filesystem_name parameter_list The parameters are described in Table A-1.
A.7 deactivate ost command The deactivate ost command deactivates OST services. When an OST service is deactivated, no new files will be created on the service. However, access to existing files on the OST service (for both read and write operations) is unaffected. Before an OST service can be deactivated, the file system that uses the service must be stopped.
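For illustration, assuming the command takes the name of the OST service to deactivate (ost4 is a hypothetical name) and that the file system using the service has been stopped first:
sfs> deactivate ost ost4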
A.8.2 delete array The delete array command deletes array information from the database. CAUTION: Deleting the array information from the database makes the array unusable. Use this command only when an array is being removed from the system. Syntax delete array array_name|array_number|array_wwid Example The following example deletes array 4: sfs> delete array 4 A.8.3 delete attribute The delete attribute command deletes a system attribute. You can only delete certain attributes.
A.8.5 delete database_backup The delete database_backup command deletes the specified system database backup copy. Syntax delete database_backup backup_number Arguments backup_number Required. Specify the number of the system database backup copy to be deleted. Example The following example deletes system database backup copy 4: sfs> delete database_backup 4 A.8.6 delete filesystem The delete filesystem command deletes the specified file system.
Example The following example disables the server_down alert: sfs> disable alert server_down A.9.2 disable server The disable server command is used to specify that no services are to run on a server the next time the server is booted. There are times when you want to bring up a server without any MDS or OST services running on the server. To do this, you shut down the server and then enter the disable server server_name command. When you start the server again, no services start on the server.
A.10 enable commands A.10.1 enable alert The enable alert command enables email alerts so that email messages are sent when events occur that match the filters on the alerts. If an email alert is already enabled when you enter the enable alert command, no error message is reported—the alert simply remains enabled. Syntax enable alert alert_name|all Arguments alert_name|all Required. Specify the email alert to be enabled, or all to enable all email alerts.
A.11 help command The help command shows information about commands. Syntax help [command_name] Arguments command_name Optional. Specify the command you want to find information about. If you do not specify a command, the help command displays a list of all commands. Example The following example displays information about the shutdown server command: sfs> help shutdown server A.12 kill command command The kill command command kills the specified command.
A.13 modify commands A.13.1 modify alert The modify alert command allows you to modify email alerts. When the HP SFS system software is installed, a number of email alerts are created by default. After the system has been installed, you can use the modify alert command to change the email addresses on the default alerts. In addition, you can use the modify alert command to modify any email alerts that you create on the system.
on OST services. HP recommends that you enable the extents mount option on file systems underlying OST services only—do not enable this mount option on MDS services.
• The setting for the mballoc option is determined by the file I/O pattern specified for the file system (when a file system is created in HP SFS Version 2.2). Do not change the setting for the mballoc option on an existing file system.
Table A-2 Parameters for scripted modify filesystem command
Parameter          Value Format                      Examples
add_ost            lun[:lun2][/server][,...]         5/south3   5:6,7:8
mount_point        pathname                          /mnt/here
stripe_size1       nXB                               4MB   1GB
stripe_count       n                                 1   8
preferred_server   service/server[,...]              mds/south1
mount_opts         mds_mount_opts;ost_mount_opts     "";extents   acl;extents
mds_mount_opt      option[,option]...                acl
ost_mount_opt      option[,option]...                extents
interconnect       net[,...
A.16 quit command The quit command closes the CLI. (You can also use the exit alias to close the CLI.) Syntax quit A.17 restore database_backup command The restore database_backup command restores the specified system database backup copy. Syntax restore database_backup backup_number Arguments backup_number Required. Specify the number of the system database backup copy to be restored. You can use the show database_backup command (see Section A.20.
A.19 set commands A.19.1 set array The set array command turns on/off the blue LEDs on the SATA drives in the SFS20 array, and sets the preferred server, rebuild priority, and surface scan delay for the array. Turning array LEDs on or off can help you to physically locate the array in the event of a cabling or setup error. Syntax set array array_number [preferred_server=server_name] [locator=on|off] [rebuild_priority=low|medium|high] [surface_scan_delay=delay_period] Arguments array_number Required.
A.19.2 set attribute The set attribute command creates a system attribute and/or sets the value of the attribute. You can only set the values of certain attributes. In most cases, the new attribute or value does not come into effect until you configure servers or reconfigure file systems. The set attribute command prints a message describing the action that is needed to bring the new attribute or value into effect. TIP: The set attribute command is the same as the create attribute command.
Table A-3 System attributes
Attribute           Notes
nickname            This attribute is used to give the HP SFS system a nickname; nicknames are needed if two or more HP SFS systems have the same short system name. See Section 5.9 (of this document) for more information.
ost_critical_size   This attribute is used for monitoring OST service space usage. See Section 5.11 (of this document) for more information.
ost_warning_size    This attribute is used for monitoring OST service space usage. See Section 5.
A.19.3
The following information applies to EVA4000 arrays only: The recommended way to specify path information for a LUN is to set the preferred controller for each LUN (see Section 5.1.3). However, it is also possible to change the path information for a LUN by using the set lun command to set the preferred controller to none or to set an explicit path. When the preferred controller is set to none, the Fibre Channel driver chooses the path. However, the path may change each time a server is booted.
A.19.5 set password The set password command allows you to change the password for a user, including the root user. When you enter the command, you are asked to enter the new password—you do not have to enter the old password. You are then asked to enter the new password a second time for confirmation. The password entries are not echoed to the screen.
A.20 show commands A.20.1 show alert The show alert command displays details of email alerts. Syntax show alert [alert_name] Arguments alert_name Optional. Specify the name of the email alert whose details you want to display. If you do not specify an attribute, the show alert command displays a list of all email alerts in the system.
Arguments attribute_name Optional. Specify the name of the attribute whose details you want to display. If you do not specify an attribute, the show attribute command displays a list of all attributes in the system. Example The following example displays details of the nickname attribute: sfs> show attribute nickname A.20.4 show authorization The show authorization command displays details of authorizations for remote users.
A.20.6 show database_backups The show database_backups command displays details of all system database backups. Syntax show database_backups A.20.7 show filesystem The show filesystem command displays details of file systems. Syntax show filesystem [filesystem_name] [-v] Arguments filesystem_name Optional. Specify the name of the file system whose details you want to display. If you do not specify a file system, the show filesystem command displays a list of all file systems.
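For illustration, the following command displays details of a file system named data (the name is hypothetical), using the -v option for verbose output:
sfs> show filesystem data -v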
Attribute   Operators              Valid Values
facility    =, !=                  kern daemon cron auth server storage local4 lustre
severity    =, !=, <, <=, >, >=    debug info notice warn err crit emerg
time        =, !=, <, <=, >, >=    Specified in time_t format, based on seconds since the standard epoch of 1/1/1970.
age         =, !=, <, <=, >, >=    Examples: "10m" "5h" "2d"
Note that you must use a space before the double quotes surrounding the argument.
A.20.10 show lun The show lun command displays details of LUNs. Syntax show lun [wwid|device|lun_number] Arguments wwid|device|lun_number Optional. Specify the WWID, or the device name, or the LUN number of the LUN whose details you want to display. If you do not specify a LUN, the show lun command displays a list of all LUNs.
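For illustration, the following command displays details of LUN 41 (the LUN number is arbitrary):
sfs> show lun 41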
Example The following example displays details of the ost4 OST service: sfs> show ost ost4 A.20.13 show server The show server command displays details of servers. Syntax show server [server_name] Arguments server_name Optional. Specify the name of the server whose details you want to display. If you do not specify a server, the show server command displays a list of all servers. Example The following example displays details of the south8 server: sfs> show server south8 A.20.
A.21 shutdown server command The shutdown server command shuts down and turns off servers. The shutdown server command sends a command to each server telling the server to shut down and turn off its power. If a server has not shut down and turned off its power after a default period of 90 seconds, the shutdown server command turns off the power to the server through the iLO connection. Syntax shutdown server server_name|all [wait=wait_time] Arguments server_name|all Required.
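For illustration (the server name and wait value are arbitrary), either of the following commands could be used:
sfs> shutdown server south5
sfs> shutdown server all wait=120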
A.23 stop filesystem command The stop filesystem command stops a file system while preserving user connections. When you stop a file system, any active I/O operations are suspended. The application that was performing the I/O operation is blocked until the file system is next restarted and the client node has reconnected. While the file system is stopped, any new access (mount, unmount, I/O operation) is blocked until the file system is next restarted and the client node has reconnected.
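For illustration, assuming the command takes the file system name as its argument (the name data is hypothetical):
sfs> stop filesystem data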
Note that specifying level=2 limits the testing to level 2 tests; it does not run level 1 tests. To run both level 1 and level 2 tests, specify level=1-2. The default is level=1-4. severity Specifies the severity levels that are to be reported from the tests. Valid values are 1, 2, 3, 4. You can use this option to control the report generation and limit the output from the tests. You can specify one severity level, a comma-separated list of levels, and/or a range of levels; for example, severity=1,3-4.
Arguments array_numbers Required. Specify the number of the array that you want to restore to an unconfigured state. You can specify a single array, a comma-separated list of arrays, or a range of arrays. force Optional. Use this option to force the unconfigure array command to ignore the status of the array’s cache. The default is no. The unconfigure array command performs status checks on the array before proceeding to restore the array to an unconfigured state.
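For illustration, the following command restores arrays 3 and 4 to an unconfigured state (the array numbers are arbitrary):
sfs> unconfigure array 3,4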
B Performance figures This appendix contains details of approximate expected performance figures for an HP SFS system, and is organized as follows: • I/O performance (Section B.1) • Network performance (Section B.2) • SFS20 array configuration (Section B.3) • SFS20 RAID5 and ADG performance (Section B.4) • Default file system stripe count (Section B.5) • Bandwidth variation—number of OST services and number of client nodes (Section B.6) • Single client node bandwidth (Section B.
B.1 I/O performance This section provides expected I/O performance figures for a single server in the HP SFS system. These figures are based on tests carried out by HP. Note the following points in regard to these figures: • The raw_lun_check.bash script was used to obtain the raw performance figures. This script tests the speed of reading and writing 4GB of data. The devices are unmounted and remounted between the write and read tests.
SFS20 storage — 250GB disks
Table B-2 provides details of expected performance figures for one Object Storage Server in systems using SFS20 storage. These figures are based on the following configuration:
• SFS20 array firmware at Version 1.92.
• One 2TB LUN per disk group (per array).
• InfiniBand interconnect for Lustre figures.
• For ADG (also called RAID6) redundancy configurations: One disk group consisting of eleven 250GB disks and a single spare disk.
B.2 Network performance This section provides approximate expected network performance figures for each interconnect type. Figures for Gigabit Ethernet, Myrinet 2XP, and Voltaire InfiniBand interconnects are based on tests carried out by HP. The figure for the Quadrics (QsNetII) interconnect is the expected throughput and is not a measured value. Table B-4 shows the approximate expected network performance figures.
B.3 SFS20 array configuration
The graph in Figure B-1 shows the variation in the performance for write operations on an SFS20 array when the array is configured with either one or two OST LUNs. In each case, the array is populated with twelve 250GB disks configured with ADG (RAID6) redundancy, and eight client nodes are writing to the LUNs over a high-performance interconnect (that is, not a Gigabit Ethernet interconnect).
B.4 SFS20 RAID5 and ADG performance The graphs in Figure B-2 and Figure B-3 show the streaming write and read performance for HP SFS configurations using ADG (RAID6) redundancy. The graphs also compare the measurements for ADG redundancy with the measurements for RAID5 redundancy.
• Each array populated with twelve 250GB disks, configured as one 2TB LUN with ADG (RAID6) redundancy
• Voltaire InfiniBand interconnect
RAID5 configuration:
• Four SFS20 arrays attached to each Object Storage Server
• Each array populated with eleven 250GB disks, configured as one 2TB LUN with RAID5 redundancy
• Voltaire InfiniBand interconnect
All results are based on tests using new, empty file systems with default MDS and OST mount options and without quotas enabled.
B.5 Default file system stripe count
Figure B-4 Aggregate bandwidth versus file stripe count
[Figure B-4 plots "Bandwidth vs Stripe count (64 OSTs)": MB/sec against stripe counts of 1, 4, 8, and 16, with separate curves for 64, 128, and 192 clients.]
The graph in Figure B-4 shows the variation in aggregate file system bandwidth for write operations as the file stripe count used by the client nodes is varied.
B.6 Bandwidth variation—number of OST services and number of client nodes
The graph in Figure B-5 shows how the aggregate file system bandwidth varies as the number of client nodes is increased and the file stripe count used by the client nodes also increases.
Figure B-5 Bandwidth variation—increasing numbers of client nodes and OST services
[Figure B-5 plots "Bandwidth vs Clients": MB/sec against the number of client nodes, for different numbers of OST services.]
B.7 Single client node bandwidth The graph in Figure B-6 shows the variation of bandwidth available to a single client node as a file is extended to incorporate additional OST services.
B.8 Gigabit Ethernet bandwidth
The graphs in Figure B-7 through Figure B-10 show how the addition of dual Gigabit Ethernet links or a bonded Gigabit Ethernet interconnect on a server in the HP SFS system improves file system throughput. Figure B-9 and Figure B-10 show how Gigabit and dual Gigabit Ethernet links perform with an MTU of 9000. (It is not currently possible to use an MTU greater than 1500 for bonded Gigabit Ethernet links.)
Figure B-9 9000 MTU average write MB/sec
[Figure B-9 plots "9000 MTU Average Write MB/sec": MB/sec against 1, 2, 4, and 8 clients, with curves for 1 link per server/1 link per client, 2 links per server/1 link per client, and 2 links per server/2 links per client.]
Figure B-10 9000 MTU average read MB/sec
[Figure B-10 plots "9000 MTU Average Read MB/sec" for the same three link configurations.]
B.9 Meta-data operations from a single client node
Figure B-11 shows the numbers of meta-data operations that are possible to a Lustre file system from a single client node.
Figure B-11 Meta-data operations from a single client node
[Figure B-11 plots "HP SFS Meta-data Operations (3M files)": operations/sec against directory populations of 100000, 500000, 1 million, and larger, with curves for mknods, unlinks (mknod), creats, unlinks (creat), and stats.]
the HP StorageWorks Scalable File Share Client Installation and User Guide (specifically, the section titled Using Lustre file systems — performance hints) for information on how to optimize performance of the rm command for large numbers of files.
• stat() operations are impacted by the ability of the client node and the server to cache file attributes after the file data has already been accessed. As a result, the rate of these operations declines as the directory population increases.
C File system configuration examples This appendix provides examples of file system configurations, and is organized as follows: • EVA4000 storage examples (Section C.1) • SFS20 storage examples (Section C.
C.1 EVA4000 storage examples Example C-1 shows a file system that is not optimally configured. The example is taken from a system where EVA4000 storage is used. The configuration has the following problems: • There is an uneven number of OST services (5); this will lead to unbalanced OST serving. • One of the OST LUNs is very small (ost8 is only 2GB). Again, this will result in an imbalance in OST serving. HP recommends that all of the OST services in a file system are the same size.
Interconnect:       elan gm tcp
MDS mount options:  acl,user_xattr
OST mount options:  extents
Lustre timeout:     200
Quota options:      quotaon=ug

MDS Information:
Name   LUN   Array   Controller   Files   Used   Service State   Running on
----   ---   -----   ----------   -----   ----   -------------   ----------
mds2   5     1       a            123M    0%     stopped         south28*

OST Information:
Name    LUN   Array   Controller   Size(GB)   Used   Service State   Running on
-----   ---   -----   ----------   --------   ----   -------------   ----------
. . .                 a            555        5%     stopped         south29*
. . .                 b            555        5%     stopped         south29*
. . .                 a            555        5%     stopped
. . .                 b            555        5%     stopped
C.2 SFS20 storage examples Example C-3 shows a file system that is not optimally configured. The example is taken from a system where SFS20 storage is used. The configuration has the following problems: • There are two LUNs on each of two of the arrays. HP does not recommend creating more than one OST LUN on an array attached to Object Storage Servers, as such a configuration results in poor performance. • The OST LUNs are of varying sizes. It is better to have LUNs of the same size in the file system.
Interconnect:       elan gm tcp
MDS mount options:  acl,user_xattr
OST mount options:  extents
Lustre timeout:     200
Quota options:      quotaon=ug

MDS Information:
Name   LUN   Array   Controller   Files   Used   Service State   Running on
----   ---   -----   ----------   -----   ----   -------------   ----------
mds2   5                          469M    0%     running         south2

OST Information:
Name    LUN   Array   Controller   Size(GB)   Used   Service State   Running on
-----   ---   -----   ----------   --------   ----   -------------   ----------
. . .                 scsi-1/1     2048       5%     running         south3
. . .                 scsi-2/1     2048       5%     running         sou
. . .                 scsi-1/2     2048       5%     running
. . .                 scsi-2/2     2048       5%     running
D RAID rebuild timing information This appendix provides a guide to the estimated time that it takes to rebuild a LUN on an SFS20 array following a disk failure.
D.1 RAID rebuild information
The time taken for a RAID rebuild operation on an SFS20 array is divided into two parts:
• The time taken to rebuild to the spare disk
• The time subsequently taken to rebuild to a replacement disk (after the replacement disk has been added)
NOTE: All SFS20 arrays are configured with a spare disk; as a result, the rebuild to the spare disk begins immediately.
E HP SFS specifications This appendix provides information on HP SFS software and system specifications: • Supported number of Object Storage Servers (Section E.1) • Supported number of OST services (Section E.2) • Supported number of file systems (Section E.3) • File and file system limits (Section E.4) • Maximum file system default stripe size is 1024MB (Section E.5) • File system default stripe size and client page size (Section E.6) • File stripe size and client page size (Section E.
E.1 Supported number of Object Storage Servers HP SFS supports a maximum of 64 Object Storage Servers. This means that the maximum number of servers in a system is 66, including 64 Object Storage Servers, one administration server, and one MDS server. In this maximum configuration, the administration and MDS servers must not run OST services. E.2 Supported number of OST services The maximum number of OST services supported in a file system is 256. E.
E.6 File system default stripe size and client page size The stripe size that you set as the default for files in a file system must be a minimum of 4MB and must also be a multiple of the largest page size on the client nodes that will be mounting the file system. Page sizes are 4KB on 32–bit systems; on 64–bit systems, page sizes are 4KB, 8KB, or 16KB.
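For example, if the largest page size among the client nodes is 16KB, a default stripe size of 4MB is acceptable because it meets the 4MB minimum and 4MB = 256 × 16KB; values such as 8MB or 64MB are also valid multiples of 16KB, whereas a value that is not a whole multiple of 16KB cannot be used.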
Glossary administration server The ProLiant DL server that the administration service runs on. Usually the first server in the system. See also administration service administration service The software functionality that allows you to configure and administer the HP SFS system. See also administration server ARP Address Resolution Protocol. ARP is a TCP/IP protocol that is used to get the physical address of a client node or server.
internet protocol See IP IP Internet Protocol. The network layer protocol for the Internet protocol suite that provides the basis for the connectionless, best-effort packet delivery service. IP includes the Internet Control Message Protocol (ICMP) as an integral part. The Internet protocol suite is referred to as TCP/IP because IP is one of the two most fundamental protocols. IP address See Internet address Jumbo packets Ethernet packets that are larger than the Ethernet standard of 1500 bytes.
OST service The Object Storage Target software subsystem that provides object services in a Lustre file system. See also Object Storage Server Portals A message passing interface API used in HP SFS versions up to and including Version 2.1-1. Python Python is an interpreted, interactive, object-oriented programming language from the Python Software Foundation (refer to www.python.org). reboot To bring the system down to the firmware level and restart the operating system.
Index A AC power strip, replacing on a rack 8-16 accessing consoles 9-51 iLO component 9-51 adding a dual Gigabit Ethernet interconnect 8-24 components 8-20 licenses 3-18 Object Storage Servers 8-20 OST services to file systems 5-30 SFS20 arrays 8-24 administration server booting 3-4 replacing 8-3 replacing motherboard 8-3 shutting down 3-6 alerts, See email alerts alias IP address, changing 7-10 arrays rebuild priority A-5, A-20 viewing information 4-13 assigning MDS role to LUN 5-4 OST role to LUN 5-4 att
F F1 key prompt 9-12 Fibre Channel cable, replacing 8-6 Fibre Channel switch, replacing 8-6 file systems backing up and restoring data 6-49 changing preferred server for services 5-31 configuration examples C-1 creating in systems using EVA4000 storage 5-2 creating in systems using SFS20 storage 5-14 deleting 5-43 modifying attributes 5-29 modifying interconnects 5-33 modifying mount options 5-32 operating 5-27 quota options 5-8, 5-18 repairing 9-24 repair-lfsck script 9-25 starting 3-8 stopping 3-7 trouble
testing Myrinet interconnect performance using the gm_allsize command 6-27 testing Myrinet interconnect performance using the net_test.
stopping file systems 3-7 storage EVA4000 arrays 1-5 SFS20 arrays 1-6 types supported 1-5 stripe count 5-8, 5-18 stripe size 5-6, 5-16 supported hardware 1-9 supported operating systems 1-8 syscheck command 6-2 system booting 3-2 checking with syscheck command 6-2 configurations 1-4 creating nickname 5-42 shutting down 3-4 system alias, setting up access 3-23 system database See database system installation, verifying 6-2 system parameters, changing 7-2 T testing Gigabit Ethernet interconnect performance 6