HP Scalable File Share User's Guide G3.
© Copyright 2009 Hewlett-Packard Development Company, L.P. Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license. The information contained herein is subject to change without notice.
Table of Contents
About This Document
    Intended Audience
    New and Changed Information in This Edition
    Typographic Conventions
1 What's In This Version
2 Installing and Configuring MSA Arrays
3 Installing and Configuring HP SFS Software on Server Nodes
4 Installing and Configuring HP SFS Software on Client Nodes
    4.1 Installation Requirements
        4.1.1 Client Operating System and Interconnect Software Requirements
        4.1.2 InfiniBand Clients
        4.1.3 10 GigE Clients
5 Using HP SFS Software
6 Licensing
7 Known Issues and Workarounds
A HP SFS G3 Performance
    A.1 Benchmark Platform
    A.2 Single Client Performance
    A.3 Throughput Scaling

List of Figures
1-1 Platform Overview
1-2 Server Pairs
A-1 Benchmark Platform
A-2 Storage Configuration

List of Tables
1-1 Supported Configurations
3-1 Minimum Firmware Versions
About This Document This document provides installation and configuration information for HP Scalable File Share (SFS) G3.1-0. Overviews of installing and configuring the Lustre® File System and MSA2000 Storage Arrays are also included in this document. Pointers to existing documents are provided where possible. Refer to those documents for related information. Intended Audience This document is intended for anyone who installs and uses HP SFS.
{}        The contents are required in syntax. If the contents are a list separated by |, you must choose one of the items.
...       The preceding element can be repeated an arbitrary number of times.
\         Indicates the continuation of a code example.
|         Separates items in a list of choices.
WARNING   A warning calls attention to important information that if not understood or followed will result in personal injury or nonrecoverable system problems.
For SFS Gen 3 Cabling Tables, see http://docs.hp.com/en/storage.html and click the Scalable File Share (SFS) link.
For SFS V2.3 Release Notes, see HP StorageWorks Scalable File Share Release Notes Version 2.3.
For documentation of previous versions of HP SFS, see:
• HP StorageWorks Scalable File Share Client Installation and User Guide Version 2.2 at:
http://docs.hp.com/en/8957/HP_StorageWorks_SFS_Client_V2_2-0.
1 What's In This Version
1.1 About This Product
HP SFS G3.1-0 uses the Lustre File System on MSA2000fc hardware to provide a storage system for standalone servers or compute clusters. Starting with this release, HP SFS servers can be upgraded. If you are upgrading from one version of HP SFS G3 to a more recent version, see the instructions in “Upgrade Installation” (page 32).
IMPORTANT: If you are upgrading from HP SFS version 2.3 or older, you must contact your HP SFS 2.
Table 1-1 Supported Configurations (continued)
Component                        Supported
Storage Array Drives             SAS, SATA
ProLiant Support Pack (PSP)      8.10 and later
1 CentOS 5.2 is available for download from the HP Software Depot at: http://www.hp.com/go/softwaredepot
Figure 1-1 Platform Overview 1.
Figure 1-2 Server Pairs
Figure 1-2 shows typical wiring for server pairs.
1.3.1.1 Fibre Channel Switch Zoning
If your configuration has a single Fibre Channel switch connected to more than one server node failover pair and their associated MSA2000 storage devices, you must set up zoning on the Fibre Channel switch. Most configurations are expected to require this zoning.
so will limit or eliminate user access to the servers, thereby reducing potential security threats and the need to apply security updates. For information on how to modify validation of user credentials, see “Configuring User Credentials” (page 31). HP provides security updates for all non-operating-system components delivered by HP as part of the HP SFS G3 product distribution. This includes all RPMs delivered in /opt/hp/sfs.
2 Installing and Configuring MSA Arrays
This chapter summarizes the installation and configuration steps for MSA2000fc arrays used in HP SFS G3.1-0 systems.
2.1 Installation
For detailed instructions on how to set up and install the MSA2000fc, see Chapter 4 of the HP StorageWorks 2012fc Modular Smart Array User Guide on the HP website at:
http://bizsupport.austin.hp.com/bc/docs/support/SupportManual/c01394283/c01394283.pdf
2.3.2 Creating New Volumes
To create new volumes on a set of MSA2000 arrays, follow these steps:
1. Power on all the MSA2000 shelves.
2. Define an alias. One way to execute commands on a set of arrays is to define a shell alias that calls /opt/hp/sfs/msa2000/msa2000cmd.pl for each array. The alias defines a shell for-loop which is terminated with ; done. For example:
# alias forallmsas='for NN in `seq 101 2 119` ; do \
./msa2000cmd.pl 192.168.16.
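A complete form of such an alias might look like the following sketch. The completed management addresses are an assumption (192.168.16.101 through 192.168.16.119, the odd values produced by seq 101 2 119); substitute the addresses used at your site.
# alias forallmsas='for NN in `seq 101 2 119` ; do \
/opt/hp/sfs/msa2000/msa2000cmd.pl 192.168.16.$NN'
With the alias defined, append the array command and the terminating ; done, for example:
# forallmsas show vdisks ; done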
• MSA2212fc Controller
Disks are identified by SCSI ID. The first enclosure has disk IDs 0-11, the second has 16-27, the third has 32-43, and the fourth has 48-59.
• MSA2312fc Controller
Disks are specified by enclosure ID and slot number. Enclosure IDs increment from 1. Disk IDs increment from 1 in each enclosure. The first enclosure has disk IDs 1.1-12, the second has 2.1-12, the third has 3.1-12, and the fourth has 4.1-12.
a. Create vdisks in the MGS and MDS array. The following example assumes the MGS and MDS do not have attached disk enclosures and creates one vdisk for the controller enclosure.
# formdsmsas create vdisk level raid10 disks 1.1-4:1.5-9 assigned-to a spare 1.11-12 mode offline vdisk1; done
Creating vdisks in offline mode is faster, but the vdisk initialization must finish before you can create volumes on it. Use the show vdisks command to check the status.
3 Installing and Configuring HP SFS Software on Server Nodes
This chapter provides information about installing and configuring HP SFS G3.1-0 software on the Lustre file system server. The following list is an overview of the installation and configuration procedure for file system servers and clients. These steps are explained in detail in the following sections and chapters.
1. Update firmware.
2. Installation Phase 1
a. Choose an installation method.
3.1 Supported Firmware
Follow the instructions in the documentation included with each hardware component to ensure that you are running the latest qualified firmware versions. The associated hardware documentation includes instructions for verifying and upgrading the firmware. For the minimum firmware versions supported, see Table 3-1, and upgrade the firmware if necessary. You can download firmware from the HP IT Resource Center on the HP website at:
http://www.itrc.hp.
3.2 Installation Requirements
A set of HP SFS G3.1-0 file system server nodes should be installed and connected by HP in accordance with the HP SFS G3.1-0 hardware configuration requirements. The file system server nodes use CentOS 5.2 as the base operating system. The installation is driven by the CentOS 5.2 Kickstart process, which ensures that the required CentOS 5.2 RPMs are installed on the system.
NOTE: CentOS 5.2 is available for download from the HP Software Depot at: http://www.hp.
## Template ADD
network --bootproto static --device %{prep_ext_nic} \
--ip %{prep_ext_ip} --netmask %{prep_ext_net} --gateway %{prep_ext_gw} \
--hostname %{host_name}.%{prep_ext_search} --nameserver %{prep_ext_dns}
%{prep_ext_nic} must be replaced by the Ethernet interface name. eth1 is recommended for the external interface and eth0 for the internal interface. %{prep_ext_ip} must be replaced by the interface IP address. %{prep_ext_net} must be replaced by the interface netmask.
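As an illustration only, a template line filled in with placeholder values (eth1 as the external interface, addresses and names that are not from any real site) might look like this:
network --bootproto static --device eth1 \
--ip 192.0.2.10 --netmask 255.255.255.0 --gateway 192.0.2.1 \
--hostname sfs1.example.com --nameserver 192.0.2.53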
Please insert the HP SFS G3.1-0 DVD and enter any key to continue:
After you insert the HP SFS G3.1-0 DVD and press enter, the Kickstart installs the HP SFS G3.1-0 software onto the system in the directory /opt/hp/sfs. Kickstart then runs the /opt/hp/sfs/scripts/install1.sh script to perform the first part of the software installation.
NOTE: The output from Installation Phase 1 is contained in /var/log/postinstall.log.
After the Kickstart completes, the system reboots.
3.3.3 Network Installation Procedure
As an alternative to the DVD installation described above, experienced users may choose to install the software over a network connection. A complete description of this method is not provided here; it should be attempted only by administrators familiar with the procedure. See your specific Linux system documentation to complete the process.
NOTE: The DL380 G5 servers must be set up to network boot for this installation option.
3.4.1 Patch Download and Installation Procedure
To download and install HP SFS patches from the ITRC website, follow this procedure:
1. Create a temporary directory for the patch download.
# mkdir /home/patches
2. Go to the ITRC website:
http://www.itrc.hp.com/
3. If you have not previously registered for the ITRC, choose Register from the menu on the left. You will be assigned an ITRC User ID upon completion of the registration process.
Description: Node Port1 Port2 Sys image
GUIDs: 001a4bffff0cd124 001a4bffff0cd125 001a4bffff0cd126 001a4bffff0
MACs: 001a4b0cd125 001a4b0cd126
Board ID: (HP_09D0000001)
VSD:
PSID: HP_09D0000001
# mstflint -d 08:00.0 -i fw-25408-2_6_000-448397-B21_matt.bin -nofs burn
To ensure the correct firmware version and files for your boards, obtain firmware files from your HP representative. Run the following script:
# /opt/hp/sfs/scripts/install10GbE.
3.5.3 Configuring pdsh
The pdsh command enables parallel shell commands to be run across the file system cluster. The pdsh RPMs are installed by the HP SFS G3.1-0 software installation process, but some additional steps, listed here and sketched below, are needed to enable passwordless pdsh and ssh access across the file system cluster.
1. Put all host names in /opt/hptc/pdsh/nodes.
2. Verify the host names are also defined with their IP addresses in /etc/hosts.
3. Append /root/.ssh/id_rsa.pub from the node where pdsh is run to /root/.
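A minimal sketch of the passwordless setup follows. It assumes the standard OpenSSH locations for the root key and that ssh-copy-id is available; these are assumptions rather than details from this guide.
# ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa
# for h in `cat /opt/hptc/pdsh/nodes` ; do ssh-copy-id -i /root/.ssh/id_rsa.pub root@$h ; done
# pdsh -a uptime
The final command should return output from every node without prompting for a password.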
3.5.6 Verifying Digital Signatures (optional)
Verifying digital signatures is an optional procedure for customers to verify that the contents of the ISO image are supplied by HP. This procedure is not required. Two keys can be imported on the system. One key is the HP Public Key, which is used to verify the complete contents of the HP SFS image. The other key is imported into the rpm database to verify the digital key signatures of the signed rpms.
IMPORTANT: All existing file system data must be backed up before attempting an upgrade. HP is not responsible for the loss of any file system data during an upgrade. The safest and recommended method for performing an upgrade is to first unmount all clients, then stop all file system servers before updating any software. Depending on the specific upgrade instructions, you may need to save certain system configuration files for later restoration.
• /etc/modprobe.conf
• /etc/ntp.conf
• /etc/resolv.conf
• /etc/sysconfig/network
• /etc/sysconfig/network-scripts/ifcfg-ib0
• /etc/sysconfig/network-scripts/ifcfg-eth*
• /opt/hptc/pdsh/nodes
• /root/anaconda-ks.cfg
• /var/lib/heartbeat/crm/cib.xml
• /var/lib/multipath/bindings
• The CSV file containing the definition of your file system as used by the lustre_config and gen_hb_config_files.pl programs.
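Before making any changes, these files can be collected into a single archive for safekeeping. The following is only a sketch; the archive path and the testfs.csv file name are placeholders.
# tar czvf /root/sfs-config-backup.tar.gz \
/etc/modprobe.conf /etc/ntp.conf /etc/resolv.conf \
/etc/sysconfig/network /etc/sysconfig/network-scripts/ifcfg-ib0 \
/etc/sysconfig/network-scripts/ifcfg-eth* /opt/hptc/pdsh/nodes \
/root/anaconda-ks.cfg /var/lib/heartbeat/crm/cib.xml \
/var/lib/multipath/bindings /root/testfs.csv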
10. Edit the newly created cib.xml files for each failover pair and increase the value of epoch_admin to be 1 larger than the value listed in the active cib.xml.
11. Install the new cib.xml file using the following command:
# cibadmin -R -x
12. Run the crm_mon utility on both nodes of the failover pair and verify that no errors are reported.
13. Verify that the file system is operating properly.
14. Repeat the process with the other member of the failover pair.
4 Installing and Configuring HP SFS Software on Client Nodes
This chapter provides information about installing and configuring HP SFS G3.1-0 software on client nodes running CentOS 5.2, RHEL5U2, SLES10 SP2, and HP XC V4.0.
4.1 Installation Requirements
HP SFS G3.1-0 software supports file system clients running CentOS 5.2/RHEL5U2 and SLES10 SP2, as well as the HP XC V4.0 cluster clients. Customers using HP XC V4.0 clients should obtain HP SFS client software and instructions from the HP XC V4.
Configure the selected Ethernet interface with an IP address that can access the HP SFS G3.1-0 server using one of the methods described in “Configuring Ethernet and InfiniBand or 10 GigE Interfaces” (page 30).
4.2 Installation Instructions
The following installation instructions are for a CentOS 5.2/RHEL5U2 system. The other systems are similar, but use the correct Lustre client RPMs for your system type from the HP SFS G3.1-0 software tarball /opt/hp/sfs/lustre/client directory.
7. Repeat steps 1 through 6 for additional client nodes, using the appropriate node replication or installation tools available on your client cluster.
8. After all the nodes are rebooted, the Lustre file system is mounted on /testfs on all nodes.
9. You can also mount and unmount the file system on the clients using the mount and umount commands. For example:
# mount /testfs
# umount /testfs
1. Install the Lustre source RPM as provided on the HP SFS G3.1-0 software tarball in the /opt/hp/sfs/SRPMS directory. Enter the following command on one line:
# rpm -ivh lustre-source-1.6.7-2.6.18_92.1.17.el5_lustre.1.6.7smp.x86_64.rpm
2. Change directories:
# cd /usr/src/linux-xxx
3. Copy in the /boot/config-xxx for the running/target kernel, and name it .config.
4. Run the following:
# make oldconfig
5. Change directories:
# cd /usr/src/lustre-xxx
6. Configure the Lustre build.
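The remaining build and packaging steps depend on your kernel and Lustre release; a typical sequence, given here only as a sketch with placeholder paths, points configure at the prepared kernel source and then builds RPMs:
# cd /usr/src/lustre-xxx
# ./configure --with-linux=/usr/src/linux-xxx
# make rpms
The resulting client RPMs can then be installed with rpm -ivh.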
5 Using HP SFS Software
This chapter provides information about creating, configuring, and using the file system.
5.1 Creating a Lustre File System
The first required step is to create the Lustre file system configuration. At the low level, this is achieved through the use of the mkfs.lustre command. However, HP recommends the use of the lustre_config command as described in section 6.1.2.3 of the Lustre 1.6 Operations Manual.
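For reference, formatting a single target directly with mkfs.lustre looks roughly like the following sketch; the device path and NIDs shown are placeholders, and on an HP SFS G3 system the lustre_config method recommended above performs this step for all targets from one CSV file:
# mkfs.lustre --fsname=testfs --ost \
--mgsnode=172.31.80.1@o2ib --mgsnode=172.31.80.2@o2ib \
--failnode=172.31.80.4@o2ib /dev/mapper/mpath2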
To see the multipath configuration, use the following command. Output will be similar to the example shown below:
# multipath -ll
mpath7 (3600c0ff000d547b5b0c95f4801000000) dm-5 HP,MSA2212fc
[size=4.1T][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=20][active]
 \_ 0:0:3:5 sdd 8:48 [active][ready]
 \_ 1:0:3:5 sdh 8:112 [active][ready]
mpath6 (3600c0ff000d548aa1cca5f4801000000) dm-4 HP,MSA2212fc
[size=4.
node3,options lnet networks=o2ib0,/dev/mapper/mpath6,/mnt/ost4,ost,testfs,icnode1@o2ib0:icnode2@o2ib0,,,,"_netdev,noauto",icnode4@o2ib0
node4,options lnet networks=o2ib0,/dev/mapper/mpath7,/mnt/ost5,ost,testfs,icnode1@o2ib0:icnode2@o2ib0,,,,"_netdev,noauto",icnode3@o2ib0
node4,options lnet networks=o2ib0,/dev/mapper/mpath8,/mnt/ost6,ost,testfs,icnode1@o2ib0:icnode2@o2ib0,,,,"_netdev,noauto",icnode3@o2ib0
node4,options lnet networks=o2ib0,/dev/mapper/mpath9,/mnt/ost7,ost,testfs,icnode1@o2ib0:icnode2@o2ib0
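With the CSV file in place, lustre_config formats and registers every target it describes. A sketch of the invocation is shown below; verify the exact options against section 6.1.2.3 of the Lustre 1.6 Operations Manual referenced earlier, and note that the CSV file name is an example:
# lustre_config -v testfs.csv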
2. Start the file system manually and test for proper operation before configuring Heartbeat to start the file system. Mount the MGS mount-point on the MGS node:
# mount /mnt/mgs
3. Mount the MDT on the MDS node:
# mount /mnt/mds
4. Mount the OSTs served from each OSS node. For example:
# mount /mnt/ost0
# mount /mnt/ost1
# mount /mnt/ost2
# mount /mnt/ost3
5. Mount the file system on a client node according to the instructions in Chapter 4 (page 37).
# mount /testfs
implementation sends these messages using IP multicast. Each failover pair uses a different IP multicast group. When a node determines that its partner has failed, it must ensure that the other node in the pair cannot access the shared disk before it takes over. Heartbeat can usually determine whether the other node in a pair has been shut down or powered off. When the status is uncertain, you might need to power cycle a partner node to ensure it cannot access the shared disk.
# gen_hb_config_files.pl -i ilos.csv -v -e -x testfs.csv
Descriptions are included here for reference, or so they can be generated by hand if necessary. For more information, see http://www.linux-ha.org/Heartbeat.
• /etc/ha.d/ha.cf
Contains basic configuration information.
• /etc/ha.d/haresources
Describes the resources (in this case file systems corresponding to Lustre servers) managed by Heartbeat.
• /etc/ha.d/authkeys
Contains information used for authenticating clusters.
node6 Filesystem::/dev/mapper/mpath8::/mnt/ost13::lustre
node6 Filesystem::/dev/mapper/mpath9::/mnt/ost14::lustre
node6 Filesystem::/dev/mapper/mpath10::/mnt/ost15::lustre
The haresources files are identical for both nodes of a failover pair. Each line specifies the preferred node (node5), LUN (/dev/mapper/mpath8), mount-point (/mnt/ost8), and file system type (lustre).
authkeys
The etc/ha.
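In a standard Heartbeat installation, /etc/ha.d/authkeys holds an authentication method and a shared secret and must be readable by root only (mode 0600). A minimal sketch with a placeholder secret:
auth 1
1 sha1 ReplaceWithSiteSecret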
5.2.5 Starting Heartbeat
IMPORTANT: You must start the Lustre file system manually in the following order: MGS, MDT, OST, and verify proper file system behavior on sample clients before attempting to start the file system using Heartbeat. For more information, see “Creating a Lustre File System” (page 41).
Use the mount command to mount all the Lustre file system components on their respective servers, and also to mount the file system on clients.
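Once the manual checks pass, Heartbeat is started on each failover pair, beginning with the pair serving the MGS and MDS. The node names below are placeholders, and the commands are only a sketch consistent with the OSS example later in this section:
# pdsh -w mgsnode,mdsnode service heartbeat start
# crm_mon -1
crm_mon should show the MGS and MDT resources started on their preferred nodes before you continue with the OSS pairs.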
The destination host name is optional, but if it is not specified, crm_resource forces the resource to move by creating a rule for the current location with the value -INFINITY. This prevents the resource from running on that node again until the constraint is removed with crm_resource -U. If you cannot start a resource on a node, check that node for values of -INFINITY in /var/lib/heartbeat/crm/cib.xml. There should be none. For more details, see the crm_resource manpage.
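As an illustration only (the resource and node names are placeholders, and the option letters should be confirmed against the crm_resource manpage for your Heartbeat version), moving a resource to a named node and later clearing the resulting constraint might look like:
# crm_resource -M -r resource_ost8 -H node6
# crm_resource -U -r resource_ost8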
4. Start the Heartbeat service on the remaining OSS nodes:
# pdsh -w oss[1-n] service heartbeat start
5. After the file system has started, HP recommends that you set the Heartbeat service to automatically start on boot:
# pdsh -a chkconfig --level 345 heartbeat on
This automatically starts the file system component defined to run on the node when it is rebooted.
5.4 Stopping the File System
Before the file system is stopped, unmount all client nodes.
Use the following command to show the Lustre network connections that the node is aware of, some of which might not be currently active.
# cat /proc/sys/lnet/peers
nid                refs  state
0@lo               1     ~rtr
172.31.97.2@o2ib   1     ~rtr
172.31.64.1@o2ib   1     ~rtr
172.31.64.2@o2ib   1     ~rtr
172.31.64.3@o2ib   1     ~rtr
172.31.64.4@o2ib   1     ~rtr
172.31.64.6@o2ib   1     ~rtr
172.31.64.
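To test a path to a specific peer actively, rather than relying on this cached list, the standard lctl utility can be used; the NID below is taken from the example output above:
# lctl list_nids
# lctl ping 172.31.64.1@o2ib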
# debugfs -c -R 'dump CONFIGS/testfs-client /tmp/testfs-client' /dev/mapper/mpath0
debugfs 1.40.7.sun3 (28-Feb-2008)
/dev/mapper/mpath0: catastrophic mode - not reading inode or group bitmaps
# llog_reader /tmp/testfs-client
Header size : 8192
Time : Fri Oct 31 16:50:52 2008
Number of records: 20
Target uuid : config_uuid
-----------------------
#01 (224)marker 3 (flags=0x01, v1.6.6.
c. To prevent the file system components and the Heartbeat service from automatically starting on boot, enter the following command:
# pdsh -a chkconfig --level 345 heartbeat off
This forces you to manually start the Heartbeat service and the file system after a file system server node is rebooted.
3. Verify that the Lustre mount-points are unmounted on the servers.
# pdsh -a "df | grep mnt"
4. Run the following command on the MGS node:
# tunefs.lustre --writeconf /dev/mapper/mpath[mgs]
# lfs check mds
testfs-MDT0000-mdc-ffff81012833ec00 active
Use the following command to check OSTs or servers for both the MDS and OSTs. This shows the Lustre view of the file system. You should see an MDT connection, and all expected OSTs showing a total of the expected space.
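A hedged sketch of such checks using the standard lfs utility is shown below; lfs check servers queries both the MDS and OST connections, and lfs df reports the space seen on each target:
# lfs check servers
# lfs df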
6 Licensing
A valid license is required for normal operation of HP SFS G3.1-0. HP SFS G3.1-0 systems are preconfigured with the correct license file at the factory, making licensing transparent for most HP SFS G3.1-0 users. No further action is necessary if your system is preconfigured with a license, or if you have an installed system. However, adding a license to an existing system is required when upgrading a G3.0-0 server to G3.1-0.
1. Stop Heartbeat on the MGS and the MDS.
2. Copy the license file into /var/flexlm/license.lic on the MGS and the MDS.
3. Run the following command on the MGS and the MDS:
# service sfslmd restart
4. Restart Heartbeat. This restarts Lustre. The cluster status follows:
hpcsfsd1:root> crm_mon -1
...
7 Known Issues and Workarounds
The following items are known issues and workarounds.
7.1 Server Reboot
After the server reboots, it checks the file system and reboots again.
/boot: check forced
You can ignore this message.
7.2 Errors from install2
You might receive the following errors when running install2.
NOTE: Use the appropriate device in place of /dev/mapper/mpath?
b. For example, if the --dryrun command returned:
Parameters: mgsnode=172.31.80.1@o2ib mgsnode=172.31.80.2@o2ib failover.node=172.31.80.1@o2ib
Run:
tunefs.lustre --erase-params --param="mgsnode=172.31.80.1@o2ib mgsnode=172.31.80.2@o2ib failover.node=172.31.80.1@o2ib mdt.group_upcall=NONE" --writeconf /dev/mapper/mpath?
4. Manually mount mgs on the MGS node:
# mount /mnt/mgs
A HP SFS G3 Performance
A.1 Benchmark Platform
Performance data in this appendix is based on HP SFS G3.0-0. Performance analysis of HP SFS G3.1-0 is not available at the time of this edition. However, HP SFS G3.1-0 performance is expected to be comparable to HP SFS G3.0-0. Look for updates to performance testing in this document at http://www.docs.hp.com/en/storage. HP SFS G3.
The Lustre servers were DL380 G5s with two quad-core processors and 16 GB of memory, running RHEL v5.1. These servers were configured in failover pairs using Heartbeat v2. Each server could see its own storage and that of its failover mate, but mounted only its own storage until failover. Figure A-2 shows more detail about the storage configuration. The storage comprised a number of HP MSA2212fc arrays. Each array had a redundant pair of RAID controllers with mirrored caches supporting failover.
Figure A-3 shows single stream performance for a single process writing and reading a single 8 GB file. The file was written in a directory with a stripe width of 1 MB and stripe count as shown. The client cache was purged after the write and before the read. Figure A-3 Single Stream Throughput For a file written on a single OST (a single RAID volume), throughput is in the neighborhood of 200 MB per second. As the stripe count is increased, spreading the load over more OSTs, throughput increases.
The test shown in Figure A-5 did not use direct I/O. Nevertheless, it shows the cost of client cache management on throughput. In this test, two processes on one client node each wrote 10 GB. Initially, the writes proceeded at over 1 GB per second. The data was sent to the servers, and the cache filled with the new data. At the point (14:10:14 in the graph) where the amount of data reached the cache limit imposed by Lustre (12 GB), throughput dropped by about a third.
Figure A-6 Multi-Client Throughput Scaling In general, Lustre scales quite well with additional OSS servers if the workload is evenly distributed over the OSTs, and the load on the metadata server remains reasonable. Neither the stripe size nor the I/O size had much effect on throughput when each client wrote to or read from its own OST. Changing the stripe count for each file did have an effect as shown in Figure A-7.
A.4 One Shared File
Frequently in HPC clusters, a number of clients share one file either for read or for write. For example, each of N clients could write 1/N'th of a large file as a contiguous segment. Throughput in such a case depends on the interaction of several parameters including the number of clients, number of OSTs, the stripe size, and the I/O size.
Another way to measure throughput is to average only over the time while all the clients are active. This is represented by the taller, narrower box in Figure A-8. Throughput calculated this way shows the system's capability, and the stragglers are ignored. This alternate calculation method is sometimes called "stonewalling". It is accomplished in a number of ways. The test run is stopped as soon as the fastest client finishes. (IOzone does this by default.)
For workloads that require a lot of disk head movement relative to the amount of data moved, SAS disk drives provide a significant performance benefit. Random writes present additional complications beyond those involved in random reads. These additional complications are related to Lustre locking and the type of RAID used. Small random writes to a RAID6 volume require a read-modify-write sequence to update a portion of a RAID stripe and compute a new parity block.