HP XC System Software Release Notes
October 5, 2005
Product Version: HP XC System Software Version 2.1
This document contains release notes that apply to HP XC System Software Version 2.1 and its accompanying documentation set.
© Copyright 2003–2005 Hewlett-Packard Development Company, L.P. AMD and AMD Opteron are trademarks or registered trademarks of Advanced Micro Devices, Inc. FLEXlm is a trademark of Macrovision Corporation. InfiniBand is a registered trademark and service mark of the InfiniBand Trade Association. Intel, the Intel logo, Itanium, Xeon, and Pentium are trademarks or registered trademarks of Intel Corporation in the United States and other countries. Linux is a U.S. registered trademark of Linus Torvalds.
Contents
About This Document
1 New Features
1.1 Base Distribution and Kernel
1.2 Additional Hardware Models Supported
1.3 The discover Command
1.4 Cluster Configuration
6 System Administration and Management Notes
6.1 Running the dgemm Utility
6.2 Log Files Must Be Rotated and Compressed
6.3 Recommended NFS Mount Options for External Connections
7 Programming and User Environment Notes
7.1 Notes About the HP Math Library
12.3.5 The qsnet Database May Contain Entries to Nonexistent Switch Modules
13 Documentation Notes
13.1 HP XC System Software Administration Guide
Figure 3-1: HP Integrity rx4640 Server Rear View
Table 3-1: IP Addresses for MP Power Management Devices
About This Document
This document contains release notes for HP XC System Software Version 2.1, including important information about firmware, software, or hardware that may affect your system.
An HP XC system is integrated with several open source software components. Some of these components are used for underlying technology, and their deployment is transparent.
• Chapter 8 contains notes that apply to the Load Sharing Facility (LSF®) and interactive job management commands.
• Chapter 9 contains notes that apply only to Xeon® with EM64T-based systems.
• Chapter 10 contains notes that apply only to AMD Opteron™-based systems.
• Chapter 11 contains notes that apply only to Intel® Itanium®-based systems.
• Chapter 12 contains notes that apply to the interconnects.
http://www.hp.com/techservers/clusters/xc_clusters.html
HP XC Program Development Environment
The following URL provides pointers to tools that have been tested in the HP XC program development environment (for example, TotalView® and other debuggers, compilers, and so on):
ftp://ftp.compaq.com/pub/products/xc/pde/index.html
HP Message Passing Interface
HP Message Passing Interface (MPI) is an implementation of the MPI standard for HP systems. The home page is located at the following URL:
http://www.hp.
For your convenience, the following Platform LSF documents are shipped on the HP XC documentation CD in PDF format. The Platform LSF documents are also available on the XC Web site.
• Administering Platform LSF
• Administration Primer
• Platform LSF Reference
• Quick Reference Card
• Running Jobs with Platform LSF
http://www.llnl.
Manpages
Manpages provide online reference and command information from the command line. Manpages are supplied with the HP XC system for standard HP XC components, Linux user commands, LSF commands, and other software components that are distributed with the HP XC system. Manpages for third-party vendor software components may be provided as a part of the deliverables for that component.
Contains the official MPI standards documents, errata, and archives of the MPI Forum. The MPI Forum is an open group with representatives from many organizations that define and maintain the MPI standard.
• http://www-unix.mcs.anl.gov/mpi/
A comprehensive site containing general information, such as the specification and FAQs, and pointers to a variety of other resources, including tutorials, implementations, and other MPI-related sites.
Related Compiler Web Sites
• http://www.intel.
$ and #
In command examples, a dollar sign ($) represents the system prompt for the bash shell and also shows that a user is in non-root mode. A pound sign (#) indicates that the user is in root or superuser mode.
[ ]
In command syntax and examples, brackets ([ ]) indicate that the contents are optional. If the contents are separated by a pipe character ( | ), you must choose one of the items.
{ }
In command syntax and examples, braces ({ }) indicate that the contents are required.
HP Encourages Your Comments
HP welcomes your comments on this document. Please provide your comments and suggestions at the following URL:
http://docs.hp.com/en/feedback.
1 New Features
This chapter describes the new features delivered in HP XC System Software Version 2.1.
1.1 Base Distribution and Kernel
The following table lists the changes made to the base distribution and kernel.
XC Version 2.1 | XC Version 2.0A
Enterprise Linux 3 Update 4 | Enterprise Linux 3 Update 2
Base Red Hat kernel 2.4.21-27.0.2.EL | Base Red Hat kernel 2.4.21-15.0.4.EL
Quadrics driver kit Version 4.30 | Quadrics driver kit Version 4.24
QLogic FC SAN driver 7.01.01 | QLogic FC SAN driver 7.00.
The compute node configuration in the slurm.conf file is updated automatically.
• The genelanhosts script has been replaced by the spconfig script, which must be run after the startsys command regardless of interconnect type. This change is documented in the HP XC System Software Installation Guide.
1.5 The startsys Command
The --max_at_once option has been added to the startsys command. This option allows you to specify the maximum number of nodes to image simultaneously.
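The effect of a limit such as --max_at_once can be pictured as imaging nodes in waves of bounded size. The following Python sketch models only that idea: it does not invoke startsys, the node names are hypothetical, and the real command may schedule nodes differently (for example, starting a new node as soon as another finishes rather than in fixed waves).

```python
from typing import Iterable, Iterator, List


def imaging_batches(nodes: Iterable[str], max_at_once: int) -> Iterator[List[str]]:
    """Yield successive groups of at most max_at_once node names.

    Illustrative model only: it does not call startsys and does not
    reproduce its actual scheduling.
    """
    if max_at_once < 1:
        raise ValueError("max_at_once must be at least 1")
    batch: List[str] = []
    for node in nodes:
        batch.append(node)
        if len(batch) == max_at_once:
            yield batch  # this wave is full; hand it off for imaging
            batch = []
    if batch:
        yield batch  # final, possibly smaller, wave


# Example: 5 hypothetical nodes imaged at most 2 at a time.
for wave in imaging_batches([f"n{i}" for i in range(1, 6)], 2):
    print(wave)
```

With a limit of 2, the five nodes are grouped as n1 and n2, then n3 and n4, then n5 alone; a smaller limit lowers the peak load on the head node and network at the cost of more waves.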
- qsnet2diagscommon-1.0.12-2.1hptc
- qsnet2diags-1.0.12-3.1hptc
- qsnetdiags-1.0.2-14.1hptc
1.9 Serviceability
New versions of the following utilities were added:
• hpasm Version hpasm-7.1.1b-95
• collectl Version hp-collectl-1.3.1-1
1.10 Documentation Changes
All manuals in the HP XC System Software Documentation Set were revised to incorporate the new functionality delivered in this release.
• A new table in Chapter 1 summarizes all XC commands. Previously, each XC command was briefly described in Chapter 2. For this release, refer to the appropriate manpage for XC command descriptions.
• Chapter 1 includes a new table of recommended administrative tasks, which was formerly Appendix A.
• A new chapter describes how to mount a file system using csys.
• A new procedure describes how to add a service to an XC system.
2 General Notes
This chapter contains general information that applies to the XC system as a whole.
2.1 XC System Naming
Throughout the HP XC System Software Documentation Set, the following terms are used to denote an HP Cluster Platform on which HP XC System Software has been installed:
XC Name | Cluster Platform (CP) Model | Chip Architecture
XC3000 | Cluster Platform 3000 | Xeon with EM64T
XC4000 | Cluster Platform 4000 | AMD Opteron
XC6000 | Cluster Platform 6000 | Intel Itanium 2
2.
3 Hardware Preparation Notes
Hardware preparation tasks are documented in the HP XC Hardware Preparation Guide. This chapter contains information that was not included in the manual at the time of publication.
The following topics are included in this chapter:
• Configuring disks into the smart array (Section 3.1)
• Incorrect instruction for preparing HP ProLiant DL145 G2 Nodes (Section 3.2)
• Preparing HP Integrity rx4640 servers (Section 3.3)
• Preparing HP Integrity rx2620 servers (Section 3.
Figure 3-1: HP Integrity rx4640 Server Rear View
1 The port labeled MP LAN is the MP connection to the ProCurve Console Switch.
2 The port labeled LAN Gb connects to the Administrative Switch (branch or root).
3 This unlabeled port is used for an external connection.
Perform the following hardware preparation tasks on each HP Integrity rx4640 server in your CP6000 system:
1. For each node in the XC system, ensure that the power cord is connected but that the CPU is not turned on.
• Subnet mask address 255.0.0.0.
In this example, IP addresses for additional nodes are assigned as shown in Table 3-1.
Table 3-1: IP Addresses for MP Power Management Devices
Node | IP Address
First node after the head node is n15 | 172.21.0.15
Second node after the head node is n14 | 172.21.0.14
Third node after the head node is n13 | 172.21.0.13
... | ...
n3 | 172.21.0.3
n2 | 172.21.0.2
Last node is n1 | 172.21.0.1
6. Enter XD to apply your changes. Enter R to restart the MP.
7.
14. Perform this step on all nodes, including the head node:
a. Choose the Select Input Console option to enable console messages to be displayed on the screen when you turn on the system:
i. Enable the Acpi(HWP0002,0)/Pci(1|1)/Uart(9600 N81)/VenMsg(Vt100+) option.
ii. Enter Y to save the entry to NVRAM.
iii. Choose Exit to return to the menu.
b. Choose the Select Output Console option to enable console messages to be displayed on the screen when you turn on the system:
i.
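In the example in Table 3-1, node nN is assigned MP address 172.21.0.N with subnet mask 255.0.0.0. The following Python sketch regenerates such a table so you can check your entries before applying them with XD. It assumes the base address and the "last octet equals node number" rule from that example, which may not match your site's addressing plan; the function names are hypothetical and are not part of the XC software.

```python
import ipaddress
import re
from typing import List, Tuple

# Assumption: base address and numbering rule are taken from the example
# in Table 3-1; adjust them for your own system.
BASE = ipaddress.IPv4Address("172.21.0.0")
NETMASK = "255.0.0.0"


def mp_address(node_name: str) -> str:
    """Return the example MP IP address for a node name such as 'n15'."""
    match = re.fullmatch(r"n(\d+)", node_name)
    if match is None:
        raise ValueError(f"unexpected node name: {node_name!r}")
    number = int(match.group(1))
    if not 1 <= number <= 254:
        raise ValueError("this simple scheme only covers n1 through n254")
    return str(BASE + number)  # IPv4Address supports integer offsets


def table_3_1(highest: int) -> List[Tuple[str, str]]:
    """List (node, address) pairs from n<highest> down to n1, as in Table 3-1."""
    return [(f"n{i}", mp_address(f"n{i}")) for i in range(highest, 0, -1)]


for node, addr in table_3_1(15):
    print(f"{node:>4}  {addr}  netmask {NETMASK}")
```

Generating the list from the node count, rather than typing each address by hand, makes a mistyped address easy to spot when you compare it with what you entered on each MP.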
4 Installation Notes
This chapter contains notes that apply to the XC software installation process.
4.1 Notes to Read Before the Kickstart Installation Process
Read the notes in this section before starting the Kickstart installation process.
4.1.1 Prepare Previously Installed Nodes for a Reinstallation
If you are reinstalling an XC system that is already running an early (advance) version of this release, you must first prepare the nodes to boot from the network before shutting down the system.
5 Configuration Notes
This chapter contains information about configuring the system. The additional configuration tasks described in this chapter are mandatory and are presented in chronological order. Perform these tasks in the sequence presented in this chapter.
The XC system configuration procedure is documented in Chapter 4 of the HP XC System Software Installation Guide.
5.
Stopping the daemon causes the gnome-settings-daemon to restart itself, which enables you to log in to the head node.
6 System Administration and Management Notes
This chapter contains notes about system administration and management commands and tasks. Perform these tasks only when necessary.
6.1 Running the dgemm Utility
The dgemm utility does not run on all supported interconnects. Therefore, it is preferable to run the dgemm utility on the Administrative Network instead of the interconnect.
• nodenaming_prefix represents the node naming prefix defined in the /opt/hptc/config/discover_data.ini file.
• max_size_of_file is calculated as follows:
max_size_of_file = (30% * /hptc_cluster partition size in MB) / (5 * (number of nodes in the cluster + number of syslogng_forward servers + 1))
- The number of syslogng_forward servers represents the number of management aggregators that have been defined for the system. A syslogng_forward service is allocated for each assigned management hub role.
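As a worked example of the max_size_of_file formula, the following Python sketch reads the divisor as the whole product 5 * (nodes + syslogng_forward servers + 1), since the size allowed for each file should shrink as the number of log sources grows; this grouping is an interpretation of the printed formula. The partition size and counts in the usage line are hypothetical.

```python
def max_size_of_file(partition_mb: float, num_nodes: int, num_forwarders: int) -> float:
    """Return max_size_of_file in MB.

    Reads the formula as 30% of the /hptc_cluster partition size divided by
    5 * (nodes + syslogng_forward servers + 1); the grouping of the divisor
    is an interpretation of the printed formula.
    """
    if partition_mb <= 0 or num_nodes < 1 or num_forwarders < 0:
        raise ValueError("invalid partition size, node count, or forwarder count")
    return 0.30 * partition_mb / (5 * (num_nodes + num_forwarders + 1))


# Hypothetical system: 10,000 MB partition, 100 nodes, and 2 management hubs
# (one syslogng_forward server each).
print(f"{max_size_of_file(10_000, 100, 2):.2f} MB")
```

With these values the divisor is 5 * 103 = 515, so each log file may grow to about 5.83 MB before rotation.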
7 Programming and User Environment Notes
This chapter contains information that applies to the programming and user environment.
7.1 Notes About the HP Math Library
The following notes apply to Intel compilers and the HP Math Library (MLIB):
• After installation, MLIB directory information is located in the /opt/mlib/README file.
• MLIB requires the Intel Fortran Compiler.
• When using /opt/mlib/intel_7.1/hpmpi_2.0, use the Intel Version 7 compilers.
• When using /opt/mlib/intel_8.0/hpmpi_2.
For more information, refer to the MLIB User’s Guide, which is located at the following URL and on the XC documentation CD:
http://www.hp.com/go/mlib
7.3 Configuring the Intel Trace Collector and Analyzer with HP MPI on XC
The Intel Trace Collector was formerly known as VampirTrace. The Intel Trace Analyzer was formerly known as Vampir.
7.3.1 Installation Notes
The following are installation-related notes:
• Installation kits:
- ITC-IA64-LIN-MPICH-PRODUCT.4.0.2.1.tar.gz
- ITA-IA64-LIN-AS21-PRODUCT.4.
cannot open shared object file: No such file or directory
MPI Application rank 0 exited before MPI_Init() with status 127
mpirun exits with status: 127
[n1]/nis.home/sballe/xc_PDE_work/ITC_examples_xc6000 >
For more information, go to the following URL:
http://support.intel.com/support/performancetools/c/linux/sb/CS010097.htm
Running Your Program
Both the C and Fortran runs were successful when the -static-libcxa flag was added. This will only work if you use mpirun.mpich to launch your program.
88 Difference is 2.381154327036583E-005
90 Difference is 2.018142964565221E-005
92 Difference is 1.710475838933507E-005
94 Difference is 1.449714388058985E-005
96 Difference is 1.228707004052045E-005
98 Difference is 1.041392661369357E-005
[0] Intel Trace Collector INFO: Writing tracefile vtjacobif.stf in /nis.home/user_name/xc_PDE_work/ITC_examples_xc6000
mpirun exits with status: 0
Across Nodes (using LSF)
# bsub -n4 -I mpirun.mpich -np 2 .
8 Load Sharing Facility and Job Management Notes
This chapter contains notes about the following topics:
• Load Sharing Facility (LSF) (Section 8.1)
• Job management with SLURM (Section 8.2)
8.1 LSF
The Load Sharing Facility (LSF), developed by Platform Computing, is available for use in this release.
9 Cluster Platform 3000 Notes
This chapter contains information that applies only to Cluster Platform 3000 systems.
9.1 Remote Console Logins Do Not Work on HP ProLiant DL140 G2 Nodes
Logging in remotely to a console, either through the XC console command or by a telnet session to the lights-out 100i (LO-100i) remote management processor, does not work on HP ProLiant DL140 G2 nodes.
10 Cluster Platform 4000 Notes
This chapter contains information that applies only to Cluster Platform 4000 systems.
10.1 Remote Console Logins Do Not Work on HP ProLiant DL145 G2 Nodes
Logging in remotely to a console, either through the XC console command or by a telnet session to the lights-out 100i (LO-100i) remote management processor, does not work on HP ProLiant DL145 G2 nodes.
11 Cluster Platform 6000 Notes
This chapter contains information that applies only to CP6000 systems.
11.1 Excessive Boot Time with Unzoned SAN Volume Connected Through an A6824A HBA
When a SAN volume is connected to an HP Integrity rx2600 system that has an A6824A (dual-channel fibre) host bus adapter (HBA) installed, excessive boot times (in the 2-4 hour range) have been observed. The solution to this problem is to implement zoning on the SAN switch.
duplicate LABELS
2. It is not currently possible to determine which device to select at installation time. The SAN volume is shown twice in the list of devices; it may appear as /dev/sdb and /dev/sdc (the exact device names depend on the number of SCSI hard disks in the system). During the installation, select a device, and if the installation process is unable to use it, restart the installation and select the other device.
3.
12 Interconnect Notes
This chapter contains generic information that applies to the supported interconnect types:
• InfiniBand® interconnect (Section 12.1)
• Myrinet® interconnect (Section 12.2)
• QsNetII® interconnect (Section 12.3)
12.1 InfiniBand Interconnect
At the time of publication, there are no release notes specific to the InfiniBand interconnect.
12.2 Myrinet Interconnect
The following release notes are specific to the Myrinet interconnect.
12.2.
have been updated as nodes are booted or shut down. The mapper can be set correctly while the map version is incorrect, so check both. There is one mapper per port, so there will be two lines of output for Myrinet XP and four lines of output for Myrinet 2XP.
This problem is caused by a node that has crashed, hung, or otherwise become unresponsive but is still powered on. To correct this problem, reboot the unresponsive node.
12.
12.3.4 ELAN TRAP Queue Error Seen on Some Quadrics MPI Applications
Some QsNetII MPI applications that generate many concurrent DMA operations might encounter the following error:
ELAN TRAP -0- Unknown - Queue Error
This error, which terminates the program, is believed to be caused by high rates of ELAN PutGet operations. To work around this problem, set the LIBELAN_PUTGET_THROTTLE environment variable to a value lower than its default value of 32.
12.3.
13 Documentation Notes
This chapter contains notes that apply to the HP XC System Software Documentation Set.
13.1 HP XC System Software Administration Guide
There are two omissions in the procedure in Section 17.2 that describes how to replace a node.
• In step 8, the cluster_config utility prompts you to regenerate ssh keys. Do not regenerate the ssh keys; answer n (no) to this prompt.
Index A A6824A host bus adapter, 11-1 C clear_counters command, 12-1 CP3000 system, 9-1 CP4000 system, 10-1 Supermon sensor information, 10-1 CP6000 system, 11-1 installing to SAN device, 11-1 SIGUSR2 signal, 12-2 D dgemm utility, 6-1 documentation how-to documents, 2-1 Web site, 2-1 documentation notes, 13-1 G GNOME desktop hang, 5-1 gnome-settings-daemon, 5-1 H hardware preparation, 3-1 HP Integrity rx2620, 3-4 HP Integrity rx4640, 3-1 head node hang when logging in, 5-1 running LSF, 8-1 how-to docume
Quadrics QsNet interconnect, 12-2 R reinstalling system, 4-1 remote console login, 9-1, 10-1 replace node procedure, 13-1 S SAN device, 4-1 booting from, 11-1 installing to, 11-1 unzoned volume, 11-1 SATA disks, 4-1 signal Quadrics QsNet, 12-2 SLURM, 8-1 smart array, 3-1 Supermon service sensor information, 10-1 U unzoned SAN volume, 11-1 URL XC documentation, 2-1 V Vampir, 7-2 W Web site XC documentation, 2-1 X XC3000 system, 9-1 XC4000 system, 10-1 defined, 2-1 XC6000 system, 11-1 defined,