HP XC System Software Release Notes Version 3.2.
© Copyright 2007, 2008 Hewlett-Packard Development Company, L.P. Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license. The information contained herein is subject to change without notice.
Table of Contents
About This Document
   Intended Audience
   Typographic Conventions
   HP XC and Related HP Products Information
   6.3 Upgrade Process Installs the OFED InfiniBand Software Stack by Default
7 System Administration, Management, and Monitoring
   7.1 Perform a Dry Run Before Using the si_updateclient Utility to Update Nodes
8 Load Sharing Facility and Job Management
   8.1 Load Sharing Facility
About This Document This document contains release notes for HP XC System Software Version 3.2.1. This document contains important information about firmware, software, or hardware that might affect the system. An HP XC system is integrated with several open source software components. Some open source software components are being used for underlying technology, and their deployment is transparent.
Variable    The name of a placeholder in a command, function, or other syntax display that you replace with an actual value.
[]          The contents are optional in syntax. If the contents are a list separated by |, you can choose one of the items.
{}          The contents are required in syntax. If the contents are a list separated by |, you must choose one of the items.
...         The preceding element can be repeated an arbitrary number of times.
|           Separates items in a list of choices.
WARNING
CAUTION
IMPORTANT
NOTE
HP XC Program Development Environment The Program Development Environment home page provides pointers to tools that have been tested in the HP XC program development environment (for example, TotalView® and other debuggers, compilers, and so on). http://h20311.www2.hp.com/HPC/cache/276321-0-0-0-121.html HP Message Passing Interface HP Message Passing Interface (HP-MPI) is an implementation of the MPI standard that has been integrated in HP XC systems.
Standard LSF is also available as an alternative resource management system (instead of LSF-HPC with SLURM) for HP XC. This is the version of LSF that is widely discussed on the Platform website.
• http://linuxvirtualserver.org
  Home page for the Linux Virtual Server (LVS), the load balancer running on the Linux operating system that distributes login requests on the HP XC system.
• http://www.macrovision.com
  Home page for Macrovision®, developer of the FLEXlm™ license management utility, which is used for HP XC license management.
• http://sourceforge.
Compiler Web Sites
• http://www.intel.com/software/products/compilers/index.htm
  Website for Intel® compilers.
• http://support.intel.com/support/performancetools/
  Website for general Intel software development information.
• http://www.pgroup.com/
  Home page for The Portland Group™, supplier of the PGI® compiler.

Debugger Web Site
http://www.etnus.com
Home page for Etnus, Inc., maker of the TotalView® parallel debugger.

Software RAID Web Sites
• http://www.tldp.org/HOWTO/Software-RAID-HOWTO.
HP Encourages Your Comments HP encourages comments concerning this document. We are committed to providing documentation that meets your needs. Send any errors found, suggestions for improvement, or compliments to: feedback@fc.hp.com Include the document title, manufacturing part number, and any comment, error found, or suggestion for improvement you have concerning this document.
1 New and Changed Features

This chapter describes the new and changed features delivered in HP XC System Software Version 3.2.1.

1.1 Base Distribution and Kernel

The following table lists information about the base distribution and kernel for this release as compared to the last HP XC release.

HP XC Version 3.2.1                            HP XC Version 3.2
Enterprise Linux 4 Update 5                    Enterprise Linux 4 Update 4
HP XC kernel version 2.6.9-55.9hp.4sp.XCsmp    HP XC kernel version 2.6.9-42.9hp.XC
Based on Red Hat kernel version 2.
1.3 OpenFabrics Enterprise Distribution for InfiniBand Version 1.2.5 HP XC System Software installs the OpenFabrics Enterprise Distribution (OFED) InfiniBand software stack Version 1.2 by default. Support for OFED Version 1.2.5 is available in the form of a patch that you can download from the HP IT Resource Center (ITRC) website: http://www.itrc.hp.com/ IMPORTANT: You must install the OFED Version 1.2.5 patch if your hardware configuration contains ConnectX HCA cards. OFED Version 1.2.
In the previous example, --maxnodes= specifies the total number of nodes in the hardware configuration, including the planned nodes. For example, if the current hardware configuration contains 100 nodes and you plan to add 96 compute nodes in the future, specify --maxnodes=196.

--single
The --single option was added to the enclosure-based discover command. You must include this option on the command line when the hardware configuration contains only one HP BladeSystem model c3000 or model c7000 enclosure.
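The --maxnodes arithmetic described above can be sketched in a few lines of Python. This is an illustrative helper only and is not part of the HP XC software; the function name maxnodes_option and its parameters are hypothetical.

```python
def maxnodes_option(current_nodes: int, planned_nodes: int) -> str:
    """Build the --maxnodes argument from the current and planned node counts."""
    if current_nodes < 0 or planned_nodes < 0:
        raise ValueError("node counts must be non-negative")
    # The option takes the total: existing nodes plus nodes to be added later.
    return f"--maxnodes={current_nodes + planned_nodes}"


# Example from the text: 100 existing nodes plus 96 planned compute nodes.
print(maxnodes_option(100, 96))  # prints --maxnodes=196
```

The helper only formats the option value; how the option is combined with the rest of the discover command line is described in the HP XC installation documentation.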
2 Important Release Information This chapter contains information that is important to know for this release. 2.1 Firmware Versions The HP XC System Software is tested against specific minimum firmware versions. Follow the instructions in the accompanying hardware documentation to ensure that all hardware components are installed with the latest firmware version. The master firmware tables for this release are available at the following website: http://www.docs.hp.com/en/linuxhpc.
3 Hardware Preparation Hardware preparation tasks are documented in the HP XC Hardware Preparation Guide. This chapter contains information that was not included in that document at the time of publication. 3.1 Upgrading BMC Firmware on HP ProLiant DL140 G2 and DL145 G2 Nodes This note applies only if the hardware configuration contains HP ProLiant DL140 G2 or DL145 G2 nodes and you are upgrading an existing HP XC system from Version 3.0, 3.1, or 3.2 to Version 3.2.1.
4 Software Installation on the Head Node This chapter contains notes that apply to the HP XC System Software Kickstart installation session. 4.1 Notes to Read Before the Kickstart Installation Session Read the notes in this section before starting the Kickstart installation session. 4.1.
5 System Discovery, Configuration, and Imaging This chapter contains information about configuring the system. Notes that describe additional configuration tasks are mandatory and have been organized chronologically. Perform these tasks in the sequence presented in this chapter. The HP XC system configuration procedure is documented in the HP XC System Software Installation Guide.
You must correct this mapping if you find that upon the HP XC kernel reboot, eth0 and eth1 are the tg3 devices, and eth2 and eth3 are the e1000 devices. To get the external network connection working, perform this procedure from a locally-connected terminal before invoking the cluster_prep utility:

1. Unload the tg3 and e1000 drivers:
      # rmmod e1000
      # rmmod tg3
2. Use the text editor of your choice to edit the /etc/modprobe.conf file to correct the mapping of drivers to devices.
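As a rough aid for step 2, the following Python sketch scans modprobe.conf-style text for alias lines and reports interfaces whose driver differs from the expected one. The expected mapping used here (eth0 and eth1 on e1000, eth2 and eth3 on tg3) is an assumption inferred from the description of the incorrect mapping, and the function name and sample text are hypothetical. Confirm the correct mapping for your hardware before editing the file.

```python
# Assumed target mapping (inferred, not stated explicitly in these notes).
EXPECTED = {"eth0": "e1000", "eth1": "e1000", "eth2": "tg3", "eth3": "tg3"}


def find_mismatches(conf_text: str, expected: dict = EXPECTED) -> dict:
    """Return {interface: (found_driver, expected_driver)} for wrong aliases."""
    mismatches = {}
    for line in conf_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, then tokenize
        # Lines of interest look like: alias eth0 e1000
        if len(fields) == 3 and fields[0] == "alias" and fields[1] in expected:
            iface, driver = fields[1], fields[2]
            if driver != expected[iface]:
                mismatches[iface] = (driver, expected[iface])
    return mismatches


# Hypothetical file contents showing the incorrect mapping described above.
sample = """\
alias eth0 tg3
alias eth1 tg3
alias eth2 e1000
alias eth3 e1000
"""
for iface, (found, wanted) in sorted(find_mismatches(sample).items()):
    print(f"{iface}: found {found}, expected {wanted}")
```

The sketch only reads text and reports differences; it does not modify /etc/modprobe.conf or load any drivers.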
5.4.1 Scalable File Share Mount Problems With Mixed HCAs A Scalable File Share (SFS) share might not mount properly if the head node and compute nodes have different types of HCA cards. For example, the problem can occur with a memfull HCA on the head node and memfree HCAs, including ConnectX HCAs, on the compute nodes. HP does not support a mixture of ConnectX and non-ConnectX HCAs, so this situation should rarely be encountered. Follow this procedure to work around the problem before you run the cluster_config utility: 1. 2.
5.4.3 Benign RRDtool Message On c3000 Enclosures With a Nortel Switch

The following warning might be displayed during cluster_config processing on a c3000 Enclosure that has a Nortel switch:

   Executing C52xcgraph gconfigure
   WARNING - "rrdtool returned ERROR: you must define at least one Data Source

In this release, Nortel switches are not monitored by the RRD framework, so you can safely ignore this message.

5.
6 Software Upgrades This chapter contains notes about upgrading the HP XC System Software from a previous release to this release. Installation release notes described in Chapter 4 (page 21) and system configuration release notes described in Chapter 5 (page 23) also apply when you upgrade the HP XC System Software from a previous release to this release. Therefore, when performing an upgrade, make sure you also read and follow the instructions in those chapters. 6.
7 System Administration, Management, and Monitoring This chapter contains notes about system administration, management, and monitoring. 7.1 Perform a Dry Run Before Using the si_updateclient Utility to Update Nodes The si_updateclient utility can leave nodes in an unbootable state in certain situations. You can still use si_updateclient to deploy image changes to nodes.
8 Load Sharing Facility and Job Management This chapter addresses the following topics: • Load Sharing Facility (page 31) 8.1 Load Sharing Facility This section contains notes about LSF-HPC with SLURM on HP XC and standard LSF. If a hardware configuration contains systems with multi-core CPUs (dual-core or quad-core), and you are using standard LSF, you must add the following entry to the lsf.
9 Programming and User Environment This chapter contains information that applies to the programming and user environment. 9.1 InfiniBand Multiple Rail Support HP-MPI provides multiple rail support on OpenFabrics through the MPI_IB_MULTIRAIL environment variable. This environment variable is ignored by all other interconnects. In multi-rail mode, a rank can use up to all cards on its node, but it is limited to the number of cards on the node to which it is connecting.
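The card limit described above amounts to taking the smaller of the two card counts. The following Python sketch illustrates only that rule; the function name usable_rails is hypothetical, it is not part of HP-MPI, and it does not model any further limit that the value of MPI_IB_MULTIRAIL might impose.

```python
def usable_rails(local_cards: int, remote_cards: int) -> int:
    """Upper bound on the cards a rank can use for one connection.

    A rank can use up to all cards on its own node, but no more than the
    number of cards on the node to which it is connecting.
    """
    if local_cards < 1 or remote_cards < 1:
        raise ValueError("each node needs at least one card")
    return min(local_cards, remote_cards)


# A rank on a node with 4 cards connecting to a node with 2 cards.
print(usable_rails(4, 2))  # prints 2
```

The bound is symmetric, so a rank on the 2-card node connecting to the 4-card node is limited to 2 cards as well.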
10 Cluster Platform 3000 At the time of publication, no release notes are specific to Cluster Platform 3000 systems.
11 Cluster Platform 4000 At the time of publication, no release notes are specific to Cluster Platform 4000 systems.
12 Cluster Platform 6000 This chapter contains information that applies only to Cluster Platform 6000 systems. 12.1 Network Boot Operation and Imaging Failures on HP Integrity rx2600 Systems An underlying issue in the kernel is causing MAC addresses on HP Integrity rx2600 systems to be set to all zeros (for example, 00.00.00.00.00), which results in network boot and imaging failures. To work around this issue, enter the following commands on the head node to network boot and image an rx2600 system: 1.
13 Interconnects This chapter contains information that applies to the supported interconnect types: • InfiniBand Interconnect (page 41) • Myrinet Interconnect (page 41) • QsNetII Interconnect (page 42) 13.1 InfiniBand Interconnect The notes in this section apply to the InfiniBand interconnect. 13.1.1 enable Password Problem with Voltaire Switch Version 4.
13.2.1 Myrinet Monitoring Line Card Can Become Unresponsive A Myrinet monitoring line card can become unresponsive some time after it has been assigned an IP address through DHCP. This is a problem known to Myricom. For more information, see the following: http://www.myri.com/fom-serve/cache/321.html If the line card becomes unresponsive, re-seat the line card by sliding it out of its chassis slot and then sliding it back in.
In addition to the previous problem, the IP address of a switch module might be incorrectly populated in the switch_modules table, and you might see the following message:

   # qsctrl
   qsctrl: failed to parse module name 172.20.66.2
   . . .

Resolve this issue by deleting the IP address from the switch_modules table and restarting the swmlogger service:

   # mysql -u root -p qsnet
   mysql> delete from switch_modules where name="172.20.66.
14 Documentation This chapter describes known issues and omissions in the HP XC System Software Documentation Set and HP XC manpages. 14.1 HP XC System Software Installation Guide The notes in this section apply to the HP XC System Software Installation Guide. The following information is missing from Chapter 6, Reinstalling HP XC System Software Version 3.2.