HP XC System Software User’s Guide

Part Number: AA-RWJVB-TE
June 2005
Product Version: HP XC System Software Version 2.1

This document provides information about the HP XC user and programming environment.
© Copyright 2003–2005 Hewlett-Packard Development Company, L.P. UNIX® is a registered trademark of The Open Group. Linux® is a U.S. registered trademark of Linus Torvalds. LSF, Platform Computing, and the LSF and Platform Computing logos are trademarks or registered trademarks of Platform Computing Corporation. Intel®, the Intel logo, Itanium®, Xeon™, and Pentium® are trademarks or registered trademarks of Intel Corporation in the United States and other countries.
Contents

About This Document
1  Overview of the User Environment
2  Using the System
3  Developing Applications
4  Debugging Applications
5  Tuning Applications
6  Using SLURM
7  Using LSF
8  Using HP-MPI
9  Using HP MLIB
10 Advanced Topics
A  Examples
Glossary
About This Document This manual provides information about using the features and functions of the HP XC System Software and describes how the HP XC user and programming environments differ from standard Linux® system environments.
• Chapter 9 describes how to use MLIB on the HP XC system. • Appendix A provides examples of HP XC applications. • The Glossary provides definitions of the terms used in this manual. HP XC Information The HP XC System Software Documentation Set includes the following core documents. All XC documents, except the HP XC System Software Release Notes, are shipped on the XC documentation CD.
HP Message Passing Interface HP Message Passing Interface (MPI) is an implementation of the MPI standard for HP systems. The home page is located at the following URL: http://www.hp.com/go/mpi HP Mathematical Library The HP math libraries (MLIB) support application developers who are looking for ways to speed up development of new applications and shorten the execution time of long-running technical applications. The home page is located at the following URL: http://www.hp.
• http://www.nagios.org/ Home page for Nagios®, a system and network monitoring application. Nagios watches specified hosts and services and issues alerts when problems occur and when problems are resolved. Nagios provides the monitoring capabilities on an XC system. • http://supermon.sourceforge.net/ Home page for Supermon, a high-speed cluster monitoring system that emphasizes low perturbation, high sampling rates, and an extensible data protocol and programming interface.
Related Information This section provides pointers to the Web sites for related software products and provides references to useful third-party publications. The location of each Web site or link to a particular topic is subject to change without notice by the site provider. Related Linux Web Sites • http://www.redhat.com Home page for Red Hat®, distributors of Red Hat Enterprise Linux Advanced Server, a Linux distribution with which the HP XC operating environment is compatible. • http://www.linux.
• Linux Administration Unleashed, by Thomas Schenk, et al.
• Managing NFS and NIS, by Hal Stern, Mike Eisler, and Ricardo Labiaga (O’Reilly)
• MySQL, by Paul DuBois
• MySQL Cookbook, by Paul DuBois
• High Performance MySQL, by Jeremy Zawodny and Derek J. Balling (O’Reilly)
• Perl Cookbook, Second Edition, by Tom Christiansen and Nathan Torkington
• Perl in a Nutshell: A Desktop Quick Reference, by Ellen Siever, et al.
discover(8) A cross-reference to a manpage includes the appropriate section number in parentheses. For example, discover(8) indicates that you can find information on the discover command in Section 8 of the manpages. Ctrl/x In interactive command examples, this symbol indicates that you hold down the first named key while pressing the key or button that follows the slash ( / ). When it occurs in the body of text, the action of pressing two or more keys is shown without the box.
1 Overview of the User Environment The HP XC system is a collection of computer nodes, networks, storage, and software built into a cluster that work together to present a single system. It is designed to maximize workload and I/O performance, and provide efficient management of large, complex, and dynamic workloads. The HP XC system provides a set of integrated and supported user features, tools, and components which are described in this chapter.
different roles that can be assigned to a client node, the following roles contain services that are of special interest to the general user: login role The role most visible to users is on nodes that have the login role. Nodes with the login role are where you log in and interact with the system to perform various tasks. For example, once logged in to a node with login role, you can execute commands, build applications, or submit jobs to compute nodes for execution.
choose to use either the HP XC Administrative Network, or the XC system Interconnect, for NFS operations. The HP XC system interconnect can potentially offer higher performance, but only at the potential expense of the performance of application communications. For high-performance or high-availability file I/O, the Lustre file system is available on HP XC. The Lustre file system uses POSIX-compliant syntax and semantics.
nodes of the system. The system interconnect network is a private network within the HP XC. Typically, every node in the HP XC is connected to the system interconnect. The HP XC system interconnect can be based on either Gigabit Ethernet or Myrinet-2000 switches. The types of system interconnects that are used on HP XC systems are: • Myricom Myrinet on HP Cluster Platform 4000 (ProLiant/Opteron servers), also referred to as XC4000 in this manual.
1.2.3.1 Linux Commands The HP XC system supports the use of standard Linux user commands and tools. Standard Linux commands are not described in this document. You can access descriptions of Linux commands in Linux documentation and manpages. Linux manpages are available by invoking the Linux man command with the Linux command name. 1.2.3.2 LSF Commands HP XC supports LSF-HPC and the use of standard LSF commands, some of which operate differently in the HP XC environment from standard LSF behavior.
1.4 Run-Time Environment In the HP XC environment, LSF-HPC, SLURM, and HP-MPI work together to provide a powerful, flexible, extensive run-time environment. This section describes LSF-HPC, SLURM, and HP-MPI, and how these components work together to provide the HP XC run-time environment. 1.4.1 SLURM SLURM (Simple Linux Utility for Resource Management) is a resource management system that is integrated into the HP XC system. SLURM is suitable for use on large and small Linux clusters.
request. LSF-HPC always tries to pack multiple serial jobs on the same node, with one CPU per job. Parallel jobs and serial jobs cannot coexist on the same node. After the LSF-HPC scheduler allocates the SLURM resources for a job, the SLURM allocation information is recorded with the job. You can view this information with the bjobs and bhist commands. When LSF-HPC starts a job, it sets the SLURM_JOBID and SLURM_NPROCS environment variables in the job environment.
supported as part of the HP XC. The tested software packages include, but are not limited to, the following:
• Intel Fortran 95, C, C++ Compiler Version 7.1 and 8.0, including OpenMP, for Itanium (includes the idb debugger)
• gcc version 3.2.3 (included in the HP XC distribution)
• g77 version 3.2.3 (included in the HP XC distribution)
• Portland Group PGI Fortran 90, C, C++ Version 5.
2 Using the System This chapter describes tasks and commands that the general user must know to use the system. It contains the following topics: • Logging in to the system (Section 2.1) • Setting up the user environment (Section 2.2) • Launching and managing jobs (Section 2.3) • Performing some common user tasks (Section 2.4) • Getting help (Section 2.5) 2.1 Logging in to the System Logging in to an HP XC system is similar to logging in to any standard Linux system.
environment variables, such as PATH and MANPATH, to enable access to various installed software. One of the key features of using modules is to allow multiple versions of the same software to be used in your environment in a controlled manner. For example, two different versions of the Intel C compiler can be installed on the system at the same time – the version used is based upon which Intel C compiler modulefile is loaded. The HP XC software provides a number of modulefiles.
of shared objects. If you have multiple compilers (perhaps with incompatible shared objects) installed, it is probably wise to set MPI_CC (and others) explicitly to the commands made available by the compiler’s modulefile. The contents of the modulefiles in the modulefiles_hptc RPM use the vendor-intended location of the installed software. In many cases, this is under the /opt directory, but in a few cases (for example, the PGI compilers and TotalView) this is under the /usr directory.
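For example (a minimal sketch; the compiler command name depends on which compiler modulefile you load, and the source file name is illustrative), you can point HP-MPI's mpicc utility at the Intel C compiler explicitly:

$ module load intel/8.1 mpi/hp
$ export MPI_CC=icc
$ mpicc -o myapp myapp.c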
Table 2-1: Supplied Modulefiles (cont.)

Modulefile           Sets the HP XC User Environment:
intel/8.1            For Intel Version 8.1 compilers.
mlib/intel/7.1       For MLIB and Intel Version 7.1 compilers.
mlib/intel/8.0       For MLIB and Intel Version 8.0 compilers.
mlib/pgi/5.1         For MLIB and PGI Version 5.1 compilers.
mpi/hp               For HP-MPI.
pgi/5.1              For PGI Version 5.1 compilers.
pgi/5.2              For PGI Version 5.2 compilers.
idb/7.3              To use the Intel IDB debugger.
totalview/default    For the TotalView debugger.
If you encounter a modulefile conflict when loading a modulefile, you must unload the conflicting modulefile before you load the new modulefile. Refer to Section 2.2.8 for further information about modulefile conflicts. 2.2.6.1 Loading a Modulefile for the Current Session You can load a modulefile for your current login session as needed.
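For example, the following sequence lists the available modulefiles, loads the HP-MPI modulefile shown in Table 2-1, and then verifies which modulefiles are loaded:

$ module avail
$ module load mpi/hp
$ module list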
ifort/8.0(19):ERROR:102: Tcl command execution failed: conflict ifort/8.1

In this example, the user attempted to load the ifort/8.0 modulefile, but after issuing the command to load the modulefile, an error message occurred indicating a conflict between this modulefile and the ifort/8.1 modulefile, which is already loaded. When a modulefile conflict occurs, unload the conflicting modulefile(s) before loading the new modulefile. In the above example, you should unload the ifort/8.1 modulefile before loading the ifort/8.0 modulefile.
2.3 Launching and Managing Jobs Quick Start This section provides a brief description of some of the many ways to launch jobs, manage jobs, and get information about jobs on an HP XC system. This section is intended only as a quick overview about some basic ways of running and managing jobs. Full information and details about the HP XC job launch environment are provided in the SLURM chapter (Chapter 6) and the LSF chapter (Chapter 7) of this manual. 2.3.1 Introduction As described in Section 1.
• The LSF lshosts command displays machine-specific information for the LSF execution host node. $ lshosts Refer to Section 7.3.2 for more information about using this command and a sample of its output. • The LSF lsload command displays load information for the LSF execution host node. $ lsload Refer to Section 7.3.3 for more information about using this command and a sample of its output. 2.3.
2.3.5.2 Submitting a Non-MPI Parallel Job Submitting non-MPI parallel jobs is discussed in detail in Section 7.4.4. The LSF bsub command format to submit a simple non-MPI parallel job is: bsub -n num-procs [bsub-options] srun [srun-options] executable [executable-options] The bsub command submits the job to LSF-HPC. The -n num-procs parameter specifies the number of processors requested for the job. This parameter is required for parallel jobs.
Example 2-3: Submitting a Non-MPI Parallel Job to Run One Task per Node $ bsub -n4 -ext "SLURM[nodes=4]" -I srun hostname Job <22> is submitted to default queue <> <> n1 n2 n3 n4 2.3.5.3 Submitting an MPI Job Submitting MPI jobs is discussed in detail in Section 7.4.5.
Example 2-5: Running an MPI Job with LSF Using the External Scheduler Option (cont.)

Hello world! I’m 2 of 4 on host2
Hello world! I’m 3 of 4 on host3
Hello world! I’m 4 of 4 on host4

2.3.5.4 Submitting a Batch Job or Job Script

Submitting batch jobs is discussed in detail in Section 7.4.6. The bsub command format to submit a batch job or job script is:

bsub -n num-procs [bsub-options] script-name

The -n num-procs option specifies the number of processors the job requests.
2.3.6 Getting Information About Your Jobs You can obtain information about your running or completed jobs with the bjobs and bhist commands. bjobs Checks the status of a running job (Section 7.5.2) bhist Gets brief or full information about finished jobs (Section 7.5.3) The components of the actual SLURM allocation command can be seen with the bjobs -l and bhist -l LSF commands. 2.3.7 Stopping and Suspending Jobs You can suspend or stop your jobs with the bstop and bkill commands.
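For example (using the hypothetical job ID 24), the following commands check a job's status, review its history, and then remove it:

$ bjobs -l 24
$ bhist -l 24
$ bkill 24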
distributed with the HP XC cluster, such as HP-MPI. Manpages for third-party vendor software components may be provided as a part of the deliverables for that software component. To access manpages, type the man command with the name of a command. For example: $ man sinfo This command accesses the manpage for the SLURM sinfo command.
3 Developing Applications

This chapter discusses topics associated with developing applications in the HP XC environment. Before reading this chapter, you should read and understand Chapter 1 and Chapter 2. This chapter discusses the following topics:
• HP XC application development environment overview (Section 3.1)
• Using compilers (Section 3.2)
• Checking nodes and partitions before running jobs (Section 3.3)
• Getting system information (Section 3.4)
• Setting debugging options (Section 3.
3.2 Using Compilers You can use compilers acquired from other vendors on an HP XC system. For example, HP XC supports Intel C/C++ and Fortran compilers for the 64-bit architecture, and Portland Group C/C++ and Fortran compilers for the XC4000 platform. You can use other compilers and libraries on the HP XC system as on any other system, provided they contain single-processor routines and have no dependencies on another message-passing system. 3.2.
3.2.4 Pathscale Compilers Compilers in the Pathscale EKOPath Version 2.1 Compiler Suite are supported on HP XC4000 systems only. See the following Web site for more information: http://www.pathscale.com/ekopath.html. 3.2.5 MPI Compiler The HP XC System Software includes MPI. The MPI library on the HP XC system supports HP MPI 2.1. 3.3 Checking Nodes and Partitions Before Running Jobs Before launching an application, you can determine the availability and status of the system’s nodes and partitions.
• Section 3.6.1 describes the serial application programming model. • Section 3.6.2 discusses how to build serial applications. For further information about developing serial applications, refer to the following sections: • Section 4.1 describes how to debug serial applications. • Section 6.4 describes how to launch applications with the srun command. • Section A.1 provides examples of serial applications. 3.6.
• Launching applications with the srun command (Section 6.4) • Advanced topics related to developing parallel applications (Section 3.9) • Debugging parallel applications (Section 4.2) 3.7.1 Parallel Application Build Environment This section discusses the parallel application build environment on an HP XC system.
Compilers from GNU, Intel and PGI provide a -pthread switch to allow compilation with the Pthread library. Packages that link against Pthreads, such as MKL and MLIB, require that the application is linked using the -pthread option. The Pthread option is invoked with the following compiler-specific switches: GNU -pthread Intel -pthread PGI -lpgthread For example: $ mpicc object1.o ... -pthread -o myapp.exe 3.7.1.
The HP XC cluster comes with a modulefile for HP-MPI. The mpi modulefile is used to set up the necessary environment to use HP-MPI, such as the values of the search paths for header and library files. Refer to Chapter 8 for information and examples that show how to build and run an HP-MPI application. 3.7.1.8 Intel Fortran and C/C++Compilers Intel Fortran compilers (Version 7.x and greater) are supported on the HP XC cluster. However, the HP XC cluster does not supply a copy of Intel compilers.
3.7.1.15 Reserved Symbols and Names

The HP XC system reserves certain symbols and names for internal use. Reserved symbols and names should not be included in user code. If a reserved symbol or name is used, errors could occur.

3.7.2 Building Parallel Applications

This section describes how to build MPI and non-MPI parallel applications on an HP XC system.
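As a sketch of the typical build-and-run cycle for an MPI program (the file and program names are illustrative; the modulefile name comes from Table 2-1):

$ module load mpi/hp
$ mpicc -o hello_world hello_world.c
$ bsub -n4 -I mpirun -srun ./hello_world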
3.8 Developing Libraries This section discusses developing shared and archive libraries for HP XC applications. Building a library generally consists of two phases: • Compiling sources to objects • Assembling the objects into a library - Using the ar archive tool for archive (.a) libraries - Using the linker (possibly indirectly by means of a compiler) for shared (.so) libraries. For sufficiently small shared objects, it is often possible to combine the two steps.
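A minimal sketch of those two phases, using gcc and the libmystuff naming that appears in Example 3-1 below (the compiler and options are illustrative):

$ gcc -fPIC -c mystuff.c -o mystuff.o
$ ar rcs libmystuff.a mystuff.o
$ gcc -shared -o libmystuff.so mystuff.o

The -fPIC option produces position-independent code, which is required for objects that are placed in a shared library.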
has /opt/mypackage/lib in it, which will then be able to handle both 32-bit and 64-bit binaries that have linked against libmystuff.so. Example 3-1: Directory Structure /opt/mypackage/ include/ mystuff.h lib/ i686/ libmystuff.a libmystuff.so x86_64/ libmystuff.a libmystuff.so If you have an existing paradigm using different names, HP recommends introducing links with the above names. An example of this is shown in Example 3-2. Example 3-2: Recommended Directory Structure /opt/mypackage/ include/ mystuff.
single compilation line, so it is common to talk about concurrent compilations, though GNU make is more general. On non-cluster platforms or command nodes, matching concurrency to the number of processors often works well. It also often works well to specify a few more jobs than processors so that one job can proceed while another is waiting for I/O. On an HP XC system, there is the potential to use compute nodes to do compilations, and there are a variety of ways to make this happen.
srcdir = . HYPRE_DIRS =\ utilities\ struct_matrix_vector\ struct_linear_solvers\ test all: @ \ for i in ${HYPRE_DIRS}; \ do \ if [ -d $$i ]; \ then \ echo "Making $$i ..."; \ (cd $$i; make); \ echo ""; \ fi; \ done clean: @ \ for i in ${HYPRE_DIRS}; \ do \ if [ -d $$i ]; \ then \ echo "Cleaning $$i ..."; \ (cd $$i; make clean); \ fi; \ done veryclean: @ \ for i in ${HYPRE_DIRS}; \ do \ if [ -d $$i ]; \ then \ echo "Very-cleaning $$i ..."; \ (cd $$i; make veryclean); \ fi; \ done 3.9.1.
By modifying the makefile to reflect the changes illustrated above, we will now process each directory serially and parallelize the individual makes within each directory. The modified Makefile is invoked as follows:

$ make PREFIX='srun -n1 -N1' MAKE_J='-j4'

3.9.1.2 Example Procedure 2

Go through the directories in parallel and have the make procedure within each directory be serial. For the purpose of this exercise we are only parallelizing the “make all” component.
utilities/libHYPRE_utilities.a: $(PREFIX) $(MAKE) $(MAKE_J) -C utilities The modified Makefile is invoked as follows: $ make PREFIX=’srun -n1 -N1’ MAKE_J=’-j4’ 3.9.2 Local Disks on Compute Nodes The use of a local disk for private, temporary storage may be configured on the compute nodes of your HP XC system. Contact your system administrator to find out about the local disks configured on your system. A local disk is a temporary storage space and does not hold data across execution of applications.
3.9.4 Communication Between Nodes On the HP XC system, processes in an MPI application run on compute nodes and use the system interconnect for communication between the nodes. By default, intranode communication is done using shared memory between MPI processes. Refer to Chapter 8 for information about selecting and overriding the default system interconnect.
4 Debugging Applications This chapter describes how to debug serial and parallel applications in the HP XC development environment. In general, effective debugging of applications requires the applications to be compiled with debug symbols, typically the -g switch. Some compilers allow -g with optimization. 4.1 Debugging Serial Applications Debugging a serial application on an HP XC system is performed the same as debugging a serial application on a conventional Linux operating system.
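For example, compile the program with debug symbols and run it under a standard Linux debugger such as gdb (the program and file names are illustrative):

$ gcc -g -o myprog myprog.c
$ gdb ./myprog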
4.2.1 Debugging with TotalView You can purchase the TotalView debugger, from Etnus, Inc., for use on the HP XC cluster. TotalView is a full-featured, GUI-based debugger specifically designed to meet the requirements of parallel applications running on many processors. TotalView has been tested for use in the HP XC environment. However, it is not included with the HP XC software and technical support is not provided by HP XC. If you install and use TotalView, and have problems with it, contact Etnus, Inc.
3. If TotalView is not installed, have your administrator install it. Then either you or your administrator should set up your environment, as described in the next step. Set the DISPLAY environment variable of the system that hosts TotalView to display on your local system. Also, run the xhost command to accept data from the system that hosts TotalView; see the X(7X) manpage for more information.
4. Set up your environment to run TotalView.
4.2.1.5 Starting TotalView for the First Time This section tells you what you must do when running TotalView for the first time — before you begin to use it to debug an application. The steps in this section assume that you have already set up your environment to run TotalView, as described in Section 4.2.1.2. The first time you use TotalView, you should set up preferences. For example, you need to tell TotalView how to launch TotalView processes on all of the processors.
2. Select Preferences from the File pull-down menu of the TotalView Root Window. A Preferences window is displayed, as shown in Figure 4-2.
3. In the Preferences window, click on the Launch Strings tab.
4. In the Launch Strings tab, ensure that the Enable single debug server launch button is selected. 5. In the Launch Strings table, in the area immediately to the right of Command:, assure that the default command launch string shown is the following string: %C %R -n "%B/tvdsvr -working_directory %D -callback %L -set_pw %P -verbosity %V %F" If it is not the above string, you may be able to obtain this setting by pressing the Defaults button.
6. In the Preferences window, click on the Bulk Launch tab. Make sure that Enable debug server bulk launch is not selected. 7. Click on the OK button at the bottom-left of the Preferences window to save these changes. The file is stored in the .totalview directory in your home directory. As long as the file exists, you can omit the steps in this section for subsequent TotalView runs. 8. Exit TotalView by selecting Exit from the File pulldown menu.
3. The TotalView main control window, called the TotalView root window, is displayed. It displays the following message in the window header: Etnus TotalView Version# 4. The TotalView process window is displayed (Figure 4-3). This window contains multiple panes that provides various debugging functions and debugging information. The name of the application launcher that is being used (either srun or mpirun) is displayed in the title bar. Figure 4-3: TotalView Process Window Example 5.
7. Click Yes in this pop-up window. The TotalView root window appears and displays a line for each process being debugged. If you are running Fortran code, another pop-up window may appear with the following warning: Sourcefile initfdte.f was not found, using assembler mode. Click OK to close this pop-up window . You can safely ignore this warning. 8. 9. You can now set a breakpoint somewhere in your code. The method to do this may vary slightly between versions of TotalView. For TotalView Version 6.
5. In a few seconds, the TotalView Process Window will appear, displaying information on the srun process. In the TotalView Root Window, click Attached (Figure 4-5). Double-click one of the remote srun processes to display it in the TotalView Process Window. Figure 4-5: Attached Window 6. At this point, you should be able to debug the application as in Step 8 of Section 4.2.1.6. 4.2.1.8 Exiting TotalView It is important that you make sure your job has completed before exiting TotalView.
5 Tuning Applications This chapter discusses how to tune applications in the HP XC environment. 5.1 Using the Intel Trace Collector/Analyzer This section describes how to use the Intel Trace Collector (ITC) and Intel Trace Analyzer (ITA) with HP-MPI on an HP XC system. The Intel Trace Collector/Analyzer were formerly known as VampirTrace and Vampir, respectively. The following topics are discussed in this section: • Building a Program (Section 5.1.1) • Running a Program (Section 5.1.
CLDFLAGS = -static-libcxa -L$(VT_ROOT)/lib $(TLIB) -lvtunwind \
           -ldwarf -lnsl -lm -lelf -lpthread
FLDFLAGS = -static-libcxa -L$(VT_ROOT)/lib $(TLIB) -lvtunwind \
           -ldwarf -lnsl -lm -lelf -lpthread

In the cases where Intel compilers are used, add the -static-libcxa option to the link line. Otherwise the following type of error will occur at run-time:

$ mpirun.mpich -np 2 ~/examples_directory/vtjacobic
~/examples_directory/vtjacobic: error while loading shared libraries: libcprts.so.
6 Using SLURM 6.1 Introduction HP XC uses the Simple Linux Utility for Resource Management (SLURM) for system resource management and job scheduling. SLURM is a reliable, efficient, open source, fault-tolerant, job and compute resource manager with features that make it suitable for large-scale, high performance computing environments. SLURM can report on machine status, perform partition management, job management, and job scheduling.
Table 6-1: SLURM Commands (cont.) Command Function sinfo Reports the state of partitions and nodes managed by SLURM. It has a wide variety of filtering, sorting, and formatting options. sinfo displays a summary of available partition and node (not job) information (such as partition names, nodes/partition, and CPUs/node). scontrol Is an administrative tool used to view or modify the SLURM state. Typically, users do not need to access this command.
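For example, a typical first session might launch a trivial job and then check the queue and partition state (the hostname command stands in for a real application):

$ srun -n4 hostname
$ squeue
$ sinfo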
6.4.1.1 srun Roles

srun options allow you to submit a job by:
• Specifying the parallel environment for your job, such as the number of nodes to use, partition, distribution of processes among nodes, and maximum time.
• Controlling the behavior of your parallel job as it runs, such as by redirecting or labeling its output, sending it signals, or specifying its reporting verbosity.
This command forwards the standard output and error messages from the running job with SLURM ID 6543 to the attaching srun command to reveal the job’s current status, and (with -j) also joins the job so that you can send it signals as if this srun command had initiated the job. Omit -j for read-only attachments. Because you are attaching to a running job whose resources have already been allocated, the srun resource-allocation options (such as -N) are incompatible with -a.
If you specify a script at the end of the srun command line (not as an argument to -A), the spawned shell executes that script using the allocated resources (interactively, without a queue). See the -b option for script requirements. If you specify no script, you can then execute other instances of srun interactively, within the spawned subshell, to run multiple parallel jobs on the resources that you allocated to the subshell.
Each partition’s node limits supersede those specified by -N. Jobs that request more nodes than the partition allows never leave the PENDING state. To use a specific partition, use the srun -p option. Combinations of -n and -N control how job processes are distributed among nodes according to the following srun policies: -n/-N combinations srun infers your intended number of processes per node if you specify both the number of processes and the number of nodes for your job.
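For example, assuming two-processor compute nodes, the first command below lets SLURM decide how many nodes to use for eight processes, while the second explicitly spreads the eight processes across four nodes, two per node (hostname stands in for a real application):

$ srun -n8 hostname
$ srun -n8 -N4 hostname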
6.4.5 srun Control Options srun control options determine how a SLURM job manages its nodes and other resources, what its working features (such as job name) are, and how it gives you help. Separate "constraint" options and I/O options are available and are described in other sections of this chapter. The following types of control options are available: • Node management • Working features • Resource control • Help options 6.4.5.
-J jobname (--job-name=jobname) The -J option specifies jobname as the identifying string for this job (along with its system-supplied job ID, as stored in SLURM_JOBID) in responses to your queries about job status (the default jobname is the executable program’s name). -v (--verbose) The -v option reports verbose messages as srun executes your job. The default is program output with only overt error messages added. Using multiple -v options further increases message verbosity. 6.4.5.
commands let you choose from among any of five I/O redirection alternatives (modes) that are explained in the next section. -o mode (--output=mode) The -o option redirects standard output stdout for this job to mode, one of five alternative ways to display, capture, or subdivide the job’s I/O, explained in the next section. By default, srun collects stdout from all job tasks and line buffers it to the attached terminal.
You can use a parameterized "format string" to systematically generate unique names for (usually) multiple I/O files, each of which receives some job I/O depending on the naming scheme that you choose. You can subdivide the received I/O into separate files by job ID, step ID, node (name or sequence number), or individual task. In each case, srun opens the appropriate number of files and associates each with the appropriate subset of tasks.
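As a sketch, the following command writes each task's standard output to its own file; it assumes the %j and %t format specifiers expand to the job ID and task ID, as described in the srun(1) manpage:

$ srun -n4 -o out.%j.%t ./a.out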
--contiguous=yes|no The --contiguous option specifies whether or not your job requires a contiguous range of nodes. The default is YES, which demands contiguous nodes, while the alternative (NO) allows noncontiguous allocation. --mem=size The -mem option specifies a minimum amount of real memory per node, where size is an integer number of megabytes. See also -vmem. --mincpus=n The -mincpus option specifies a minimum number n of CPUs per node.
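For example, the constraint options described above can be combined on one command line. The following illustrative request asks for nodes that have at least two CPUs and 1024 MB of memory, allocated as a contiguous range:

$ srun -n8 --mincpus=2 --mem=1024 --contiguous=yes ./a.out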
6.4.8 srun Environment Variables Many srun options have corresponding environment variables. An srun option, if invoked, always overrides (resets) the corresponding environment variable (which contains each job feature’s default value, if there is a default). In addition, srun sets the following environment variables for each executing task on the remote compute nodes: SLURM_JOBID Specifies the job ID of the executing job. SLURM_NODEID Specifies the relative node ID of the current node.
The squeue command can report on jobs in the job queue according to their state; valid states are: pending, running, completing, completed, failed, timeout, and node_fail. Example 6-3 uses the squeue command to report on failed jobs.

Example 6-3: Reporting on Failed Jobs in the Queue

$ squeue --state=FAILED
JOBID  PARTITION  NAME      USER  ST  TIME  NODES  NODELIST
59     amt1       hostname  root  F   0:00  0

6.6 Killing Jobs with the scancel Command

The scancel command cancels a pending or running job or job step.
Example 6-8: Reporting Reasons for Downed, Drained, and Draining Nodes

$ sinfo -R
REASON           NODELIST
Memory errors    dev[0,5]
Not Responding   dev8

6.8 Job Accounting

HP XC System Software provides an extension to SLURM for job accounting. The sacct command displays job accounting data in a variety of forms for your analysis. Job accounting data is stored in a log file; the sacct command filters that log file to report on your jobs, jobsteps, status, and errors.
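For example, to see the accounting records for a single job, pass its SLURM job ID to sacct (the job ID shown is illustrative; see the sacct manpage for the full set of reporting options):

$ sacct -j 1234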
7 Using LSF The Load Sharing Facility (LSF) from Platform Computing Corporation is a batch system resource manager used on the HP XC system. LSF is included with HP XC, and is an integral part of the HP XC environment. On an HP XC system, a job is submitted to LSF, which places the job in a queue and allows it to run when the necessary resources become available. In addition to launching jobs, LSF provides extensive job management and information capabilities.
SLURM views the LSF-HPC system as one large computer with many resources available to run jobs. SLURM does not provide the same amount of information that can be obtained via standard LSF. But on HP XC systems, where the compute nodes have the same architecture and are expected to be allocated solely through LSF on a per-processor or per-node basis, the information provided by SLURM is sufficient and allows the LSF-HPC design to be more scalable and generate less overhead on the compute nodes.
To illustrate how the external scheduler is used to launch an application, consider the following command line, which launches an application on ten nodes with one task per node: $ bsub -n 10 -ext "SLURM[nodes=10]" srun my_app The following command line launches the same application, also on ten nodes, but stipulates that node n16 should not be used: $ bsub -n 10 -ext "SLURM[nodes=10;exclude=n16]" srun my_app 7.1.
queue contains the job starter script, but the unscripted queue does not have the job starter script configured. Example 7-1: Comparison of Queues and the Configuration of the Job Starter Script $ bqueues -l normal | grep JOB_STARTER JOB_STARTER: /opt/hptc/lsf/bin/job_starter.sh $ bqueues -l unscripted | grep JOB_STARTER JOB_STARTER: $ bsub -Is hostname Job <66> is submitted to the default queue . <> <
Figure 7-1: How LSF-HPC and SLURM Launch and Manage a Job

(The figure traces a job submitted with bsub -n4 -ext "SLURM[nodes=4]" -o output.out ./myscript from a login node to the LSF execution host lsfhost.localdomain, where the job_starter.sh script uses srun to start myscript with SLURM_JOBID=53 and SLURM_NPROCS=4; the srun and mpirun -srun commands inside myscript then run tasks on compute nodes n1 through n4.)
4. LSF-HPC prepares the user environment for the job on the LSF-HPC execution host node and dispatches the job with the job_starter.sh script. This user environment includes standard LSF environment variables and two SLURM-specific environment variables: SLURM_JOBID and SLURM_NPROCS. SLURM_JOBID is the SLURM job ID of the job. Note that this is not the same as the LSF jobID. SLURM_NPROCS is the number of processors allocated.
• LSF does not support chunk jobs. If a job is submitted to chunk queue, SLURM will let the job pend. • LSF does not support topology-aware advanced reservation scheduling. 7.1.6 Notes About Using LSF in the HP XC Environment This section provides some additional information that should be noted about using LSF in the HP XC Environment. 7.1.6.1 Job Startup and Job Control When LSF starts a SLURM job, it sets SLURM_JOBID to associate the job with the SLURM allocation.
The following example shows the output from the bhosts command:

$ bhosts
HOST_NAME            STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
lsfhost.localdomain  ok      -     16   0      0    0      0      0

Of note in the bhosts output:
• The HOST_NAME column displays the name of the LSF execution host.
• The MAX column displays the total processor count (usable CPUs) of all available computer nodes in the lsf partition.
• The STATUS column shows the state of LSF and displays a status of either ok or closed.
See the OUTPUT section of the lsload manpage for further information about the output of this example. In addition, refer to the Platform Computing Corporation LSF documentation and the lsload manpage for more information about the features of this command. 7.3.4 Checking LSF System Queues All jobs on the HP XC system that are submitted to LSF-HPC are placed into an LSF job queue.
The basic synopsis of the bsub command is: bsub [ bsub_options] jobname [ job_options] The HP XC system has several features that make it optimal for running parallel applications, particularly (but not exclusively) MPI applications. You can use the bsub command’s -n to request more than one CPU for a job. This option, coupled with the external SLURM scheduler, discussed in Section 7.4.2, gives you much flexibility in selecting resources and shaping how the job is executed on those resources.
additional capabilities at the job level and queue level by allowing the inclusion of several SLURM options in the LSF command line. Refer to Section 7.4.2.

7.4.2 LSF-SLURM External Scheduler

An important option that can be included when submitting parallel jobs with LSF is the external scheduler option. The external scheduler option provides application-specific external scheduling capabilities for jobs and enables the inclusion of several SLURM options in the LSF command line.
Example 7-2: Using the External Scheduler to Submit a Job to Run on Specific Nodes $ bsub -n4 -ext "SLURM[nodelist=n6,n8]" -I srun hostname Job <70> is submitted to default queue . <> <> n6 n6 n8 n8 In the previous example, the job output shows that the job was launched from the LSF execution host lsfhost.localdomain, and it ran on four nodes using the specified nodes n6 and n8 as two of the four nodes.
This example runs the job exactly the same as in Example 2, but additionally requests that node n3 not be used to run the job. Note that this command could have been written to exclude additional nodes.

7.4.3 Submitting a Serial Job

The synopsis of the bsub command to submit a serial (single-CPU) job to LSF-HPC is:

bsub [bsub-options] [srun [srun-options]] jobname [job-options]

The bsub command launches the job.
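For example, the following command submits the hostname command as a simple interactive serial job; LSF-HPC allocates one processor and srun runs the command on it:

$ bsub -I srun hostname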
The srun command, used by the mpirun command to launch the MPI tasks in parallel, determines the number of tasks to launch from the SLURM_NPROCS environment variable that was set by LSF-HPC. Recall that the value of this environment variable is equivalent to the number provided by the -n option of the bsub command. Consider an HP XC system configuration in which lsfhost.localdomain is the LSF execution host and nodes n[1-10] are compute nodes in the lsf partition.
7.4.6.1 Examples Consider an HP XC system configuration in which lsfhost.localdomain is the LSF execution host and nodes n[1-10] are compute nodes in the lsf partition. All nodes contain 2 processors, providing 20 processors for use by LSF jobs. Example 7-8 displays, then runs, a simple batch script. Example 7-8: Submitting a Batch Job Script $ cat ./myscript.sh #!/bin/sh srun hostname mpirun -srun hellompi $ bsub -n4 -I ./myscript.sh Job <78> is submitted to default queue . <
Example 7-11: Submitting a Batch job Script That Uses the srun --overcommit Option $ bsub -n4 -I ./myscript.sh "-n8 -O" Job <81> is submitted to default queue . <> <
The following example shows this resource requirement string in an LSF command: $ bsub -R "type=SLINUX64" -n4 -I srun hostname 7.5 Getting Information About Jobs There are several ways you can get information about a specific job after it has been submitted to LSF. This section briefly describes some of the commands that are available under LSF to gather information about a job. This section is not intended as complete information about this topic.
EXTERNAL MESSAGES: MSG_ID FROM 0 1 lsfadmin POST_TIME MESSAGE ATTACHMENT date and time stamp SLURM[nodes=4] N In particular, note the node and job allocation information provided in the above output: date and time stamp: Started on 4 Hosts/Processors <4*lsfhost.localdomain>; date and time stamp: slurm_id=22;ncpus=8;slurm_alloc=n[5-8]; 7.5.1.
Example 7-14: Using the bjobs Command (Long Output) $ bjobs -l 24 Job <24>, User ,Project ,Status , Queue , Interactive pseudo-terminal shell mode, Extsched , Command date and time stamp: Submitted from host , CWD <$HOME>, 4 Processors Requested, Requested Resources ; date and time stamp: Started on 4 Hosts/Processors <4*lsfhost.
To get detailed information about a finished job, add the -l option to the bhist command, shown in Example 7-16. The -l option specifies that the long format is requested.
$ bsub -Is -n4 -ext "SLURM[nodes=4]" /usr/bin/xterm Job <101> is submitted to default queue . <> <> n1 At this time an xterm terminal window appears on your display. The xterm program runs on the first node in the allocation. You can execute multiple srun and mpirun commands from this terminal; they will make use of the resources that were reserved by LSF-HPC. The following examples are from an interactive session.
Example 7-20: View Job Details in LSF (cont.) , 4 Processors Requested; date and time stamp: Dispatched to 4 Hosts/Processors <4*lsfhost.
comfortable interactive session, but every job submitted to this queue is executed on the LSF execution host instead of the first allocated node. Example 7-23 shows this subtle difference. Note that the LSF execution host in this example is n20: Example 7-23: Submitting an Interactive Shell Program on the LSF Execution Host $ bsub -Is -n4 -ext "SLURM[nodes=4]" -q noscript /bin/bash Job <96> is submitted to default queue
Table 7-2: LSF Equivalents of SLURM srun Options (cont.)

-w, --nodelist=node1,..nodeN
Request a specific list of nodes. The job will at least contain these nodes. The list may be specified as a comma-separated list of nodes or a range of nodes. By default, the job does not require specific nodes.
LSF equivalent: -ext "SLURM[nodelist=node1,..nodeN]"

-x, --exclude=node1,..nodeN
Requests that a specific list of hosts not be included in the resources allocated to this job.

-r, --relative=n
Run a job step relative to node n of the current allocation. It is about placing tasks within an allocation. Use when launching parallel tasks.

-D, --chdir=path
Specify the working directory of the job. The job will be started in the job submission directory by default.

-k, --no-kill
Do not automatically terminate a job if one of the nodes it has been allocated fails.
8 Using HP-MPI This chapter describes how to use HP-MPI in the HP XC environment. The main focus of this chapter is to help you to quickly get started using HP-MPI on an HP XC system. In this chapter, the basics of getting started are demonstrated. The semantics of building and running a simple MPI program are described for single-host and multiple-host systems. In addition, you are shown how to configure your environment before running your program.
HP-MPI on the HP XC system, last-minute changes to HP-MPI functionality, and known problems and work-arounds, refer to the HP-MPI Release Notes, which are included with the HP XC documentation.

8.2 HP-MPI Directory Structure

All HP-MPI files are stored in the /opt/hpmpi directory. The directory structure is organized as described in Table 8-1. If you move the HP-MPI installation directory from its default location in /opt/hpmpi, set the MPI_ROOT environment variable to point to the new location.
parallelism. For information about running more complex applications, refer to the HP-MPI user documentation. 8.3.2.1 Example Application hello_world To quickly become familiar with compiling and running HP-MPI programs, start with the C version of a familiar hello_world program. This program is called hello_world.c and prints out the text string “Hello world! I’m r of s on host”.
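The copy of hello_world.c shipped with HP-MPI may differ in detail, but a minimal MPI program that produces this kind of output looks like the following sketch:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    MPI_Get_processor_name(name, &len);     /* host running this rank */

    printf("Hello world! I'm %d of %d on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}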
Hello world! I’m 1 of 4 on host1
Hello world! I’m 3 of 4 on host2
Hello world! I’m 0 of 4 on host1
Hello world! I’m 2 of 4 on host2

8.3.3 Using srun with HP-MPI

The SLURM srun utility (srun command) is used with the mpirun command to launch an MPI job on an HP XC system. The following is the general format of the mpirun command with srun:

mpirun [mpirun options] -srun [srun options]

This method runs with no restrictions on MPI-2 functionality.
• The following command runs a.out with four ranks, two ranks per node, ranks are block allocated, and two nodes are used:

$ mpirun -srun -n4 ./a.out
host1 rank1
host1 rank2
host2 rank3
host2 rank4

• The following command runs a.out with six ranks (oversubscribed), three ranks per node, ranks are block allocated, and two nodes are used:

$ mpirun -srun -n6 -O -N2 -m block ./a.out
host1 rank1
host1 rank2
host1 rank3
host2 rank4
host2 rank5
host2 rank6

• The following example runs a.
Example 8-1 displays how to perform a system interconnect selection.

Example 8-1: Performing System Interconnect Selection

% export MPI_IC_ORDER="elan:TCP:gm:itapi"
% export MPIRUN_SYSTEM_OPTIONS="-subnet 192.168.1.1"
% export MPIRUN_OPTIONS="-prot"
% mpirun -srun -n4 ./a.out

The command line for the above will appear to mpirun as:

$ mpirun -subnet 192.168.1.1 -prot -srun -n4 ./a.out

The system interconnect decision will look for the presence of Elan and use it if found.
Example 8-5: Allocating 12 Processors on 6 Nodes

$ bsub -I -n12 $MPI_ROOT/bin/mpirun -srun -n6 -N6 ./a.out

Note that LSF jobs can be submitted without the -I (interactive) option.

8.3.5 MPI Versioning

The mpirun command includes an option to print the version number. The -version option used with mpirun displays the major and minor version numbers. The mpi.h header includes matching constants as HP_MPI and HP_MPI_MINOR.
If you would like to see the effects of using the TCP/IP protocol over a higher-speed system interconnect, use the -TCP option and omit the -subnet option. Generally, performance as measured by Pallas will be roughly 40% to 50% slower using TCP/IP over Elan, GM, or IT-API. 8.4.
8.8 The mpirun Command Options HP-MPI on the HP XC system provides the following additional mpirun command line options: -srun The -srun option is required in mpirun command in the HP XC environment. The preferred method for startup for HP XC is: mpirun mpirun options -srun srun options Starting up directly from srun is not supported. In this context, mpirun sets a few environment variables and invokes /opt/hptc/bin/srun.
8.9 Environment Variables HP-MPI on HP XC provides the following additional environment variables: 8.9.1 MPIRUN_OPTIONS MPIRUN_OPTIONS is a mechanism for specifying additional command line arguments to mpirun. If this environment variable is set, then any mpirun command will behave as if the arguments in MPIRUN_OPTIONS had been specified on the mpirun command line. For example: % export MPIRUN_OPTIONS="-v -prot" % $MPI_ROOT/bin/mpirun -np 2 /path/to/program.
for the purpose of determining how much memory to pin for RDMA message transfers on InfiniBand and Myrinet GM. The value determined by HP-MPI can be displayed using the -dd option. If HP-MPI specifies an incorrect value for physical memory, this environment variable can be used to specify the value explicitly: % export MPI_PHYSICAL_MEMORY=1048576 The above example specifies that the system has 1GB of physical memory. 8.9.
% export MPI_USE_LIBELAN=0 8.9.10 MPI_USE_LIBELAN_SUB The use of Elan’s native collective operations may be extended to include communicators which are smaller than MPI_COMM_WORLD by setting the MPI_USE_LIBELAN_SUB environment variable to “TRUE”. By default, this functionality is disabled due to the fact that libelan memory resources are consumed and may eventually cause run-time failures when too many sub-communicators are created.
Run the resulting prog.x under MPICH. However, various problems will be encountered. First, the MPICH installation will need to be built to include shared libraries and a soft link would need to be created for libmpich.so, since their libraries might be named differently. Next an appropriate LD_LIBRARY_PATH setting must be added manually since MPICH expects the library path to be hard coded into the executable at link time by -rpath.
8.12 Additional Information, Known Problems, and Work-arounds

For additional information, as well as information about known problems and work-arounds, refer to the HP-MPI V2.1 for HP XC4000 and HP XC6000 Clusters Release Note. This document is provided on the HP XC Documentation CD.
9 Using HP MLIB The information in this section describes how to use HP MLIB Version 1.5 in the HP XC environment on HP XC4000 and HP XC6000 clusters. These are discussed in separate sections in this chapter. 9.1 Overview HP MLIB is the mathematical library supported on the HP XC system. It is installed by default.
9.1.2 MLIB and Module Files For building and running an application built against MLIB, you must have a consistent environment. Modulefiles can make it easier to access a package; therefore, if you use modulefiles, be sure to use a consistent set of modulefiles. In particular, modulefiles can be used to select a compiler, both making its command available in the PATH environment variable and making its shared objects available in the LD_LIBRARY_PATH environment variable.
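For example, before compiling against MLIB you might load a consistent set of modulefiles from Table 2-1, together with the compiler modulefile that corresponds to the MLIB build you intend to link (the versions shown are illustrative):

$ module load mpi/hp mlib/intel/8.0
$ module list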
9.2.4 Modulefiles and MLIB When building or running an application built against MLIB, it is crucial that the environment is consistent. Modulefiles can make it easier to access a package. Therefore if modulefiles are used, it is necessary to use a consistent set of modulefiles. In particular, modulefiles can be used to select a compiler, both making its command available in $PATH as well as making its shared objects available in $LD_LIBRARY_PATH.
$ mpi90 [options] file ... /opt/mlib/[intel_7.1\intel_8.0]/hpmpi_2.1/lib/64/libscalapack.a \ -openmp $ mpicc [options] file ... /opt/mlib/[intel_7.1\intel_8.0]/hpmpi_2.1/lib/64/libscalapack.a \ -openmp 9.2.6.4 Linking SuperLU_DIST For programs that link SuperLU_DIST, you can specify the entire path of the library file on the compiler command line. You can use the following commands to link SuperLU_DIST: $ mpi90 [options] file ... /opt/mlib/[intel_7.1\intel_8.0]/hpmpi_2.1/lib/64/libsuperlu_dist.
9.3.3 MPI Parallelism Internal parallelism in ScaLAPACK and SuperLU_DIST is implemented using MPI — a portable, scalable programming model that gives distributed-memory parallel programmers a simple and flexible interface for developing parallel applications. 9.3.4 Modulefiles and MLIB When building or running an application built against MLIB, it is crucial that the environment is consistent. Modulefiles can make it easier to access a package.
$ mpicc [options] file ... /opt/mlib/pgi_5.1/hpmpi_2.1/lib/64/libscalapack.a -mp -lpgf90 -lpgf90_rpml -lpgf902 -lpgf90rtl -lpgftnrtl 9.3.5.4 Linking SuperLU_DIST For programs that link SuperLU_DIST, you can specify the entire path of the library file on the compiler command line. You can use the following commands to link SuperLU_DIST: $ mpi90 [options] file ... /opt/mlib/pgi_5.1/hpmpi_2.1/lib/64/libsuperlu_dist.a -mp $ mpicc [options] file ... /opt/mlib/pgi_5.1/hpmpi_2.1/lib/64/libsuperlu_dist.
10 Advanced Topics This chapter covers topics intended for the advanced user. The following topics are discussed: • Enabling remote execution with OpenSSH (Section 10.1) • Running an X terminal session from a remote node (Section 10.2) 10.1 Enabling Remote Execution with OpenSSH To reduce the risk of network attacks and increase the security of your HP XC system, the traditional rsh, rlogin, and telnet tools are disabled by default, and OpenSSH is provided instead.
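The details depend on your site's security policy, but a common way to enable password-free ssh between nodes looks like the following sketch (the node name n15 is only an example; consult your system administrator before changing authentication settings):

$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
$ ssh n15 hostname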
Next, get the name of the local machine serving your display monitor: $ hostname mymachine Then, use the host name of your local machine to retrieve its IP address: $ host mymachine mymachine has address 14.26.206.134 Step 2. Logging in to HP XC System Next, you need to log in to a login node on the HP XC system. For example: $ ssh user@xc-node-name Once logged in to the HP XC system, you can start an X terminal session using SLURM or LSF. Both methods are described in the following sections. Step 3.
Step 4. Running an X terminal Session Using LSF This section shows how to create an X terminal session on a remote node using LSF. In this example, suppose that you want to use LSF to reserve 4 processors (2 nodes) and start an X terminal session on one of them. First, check the available nodes on the HP XC system.
A Examples This appendix provides examples that illustrate how to build and run applications on the HP XC system. The examples in this section show you how to take advantage of some of the many methods available, and demonstrate a variety of other user commands to monitor, control, or kill jobs. The examples in this section assume that you have read the information in previous chapters describing how to use the HP XC commands to build and run parallel applications.
steps through a series of commands that illustrate what occurs when you launch an interactive shell. Check LSF execution host information: $ bhosts HOST_NAME lsfhost.
View the job: $ bjobs -l 8 Job <8>, User , Project , Status , Queue , Interactive mode, Extsched , Command date and time stamp: Submitted from host , CWD <$HOME>, 2 Processors Requested; date and time stamp: Started on 2 Hosts/Processors <2*lsfhost.localdomain>; date and time stamp: slurm_id=24;ncpus=4;slurm_alloc=n[13-14]; date and time stamp: Done successfully. The CPU time used is 0.0 seconds.
steps through a series of commands that illustrate what occurs when you launch an interactive shell. Check LSF execution host information: $ bhosts HOST_NAME STATUS lsfhost.
Exit from the shell: $ exit exit Check the finished job’s information: $ bhist -l 124 Job <124>, User , Project , Interactive pseudo-terminal shell mode, Extsched , Command date and time stamp: Submitted from host , to Queue , CWD <$HOME>, 4 Processors Requested, Requested Resources ; date and time stamp: Dispatched to 4 Hosts/Processors <4*lsfhost.
<> <>
n14
n14
n16
n16
Linux n14 2.4.21-15.3hp.XCsmp #2 SMP date and time stamp ia64 ia64 ia64 GNU/Linux
Linux n14 2.4.21-15.3hp.XCsmp #2 SMP date and time stamp ia64 ia64 ia64 GNU/Linux
Linux n16 2.4.21-15.3hp.XCsmp #2 SMP date and time stamp ia64 ia64 ia64 GNU/Linux
Linux n16 2.4.21-15.3hp.XCsmp #2 SMP date and time stamp ia64 ia64 ia64 GNU/Linux
Run some commands from the pseudo-terminal: $ srun hostname n13 n13 n14 n14 n15 n15 n16 n16 $ srun -n3 hostname n13 n14 n15 Exit the pseudo-terminal: $ exit exit View the interactive jobs: $ bjobs -l 1008 Job <1008>, User smith, Project , Status , Queue , Interactive pseudo-terminal mode, Command date and time stamp: Submitted from host n16, CWD <$HOME/tar_drop1/test>, 8 Processors Requested; date and time stamp: Started on 8Hosts/Processors<8*lsfhost.
Show the environment: $ lsid Platform LSF HPC 6.0 for SLURM, Sep 23 2004 Copyright 1992-2004 Platform Computing Corporation My cluster name is penguin My master name is lsfhost.localdomain $ sinfo PARTITION lsf $ lshosts HOST_NAME lsfhost.loc AVAIL up TIMELIMIT infinite type SLINUX6 NODES 4 STATE alloc NODELIST n[13-16] model cpuf ncpus maxmem maxswp server RESOURCES DEFAULT 1.0 8 1M Yes (slurm) $ bhosts HOST_NAME STATUS lsfhost.
date and time stamp: Submitted from host , to Queue ,CWD <$HOME>, 6 Processors Requested; date and time stamp: Dispatched to 6 Hosts/Processors <6*lsfhost.localdomain>; date and time stamp: slurm_id=22;ncpus=6;slurm_alloc=n[13-15]; date and time stamp: Starting (Pid 11216); date and time stamp: Done successfully. The CPU time used is 0.
Glossary

A

Administrative Network
The private network within the XC system that is used for administrative operations.

admin branch
The half (branch) of the Administrative Network that contains all of the general-purpose admin ports to the nodes of the XC system.

B

base image
The collection of files and directories that represents the common files and configuration data that are applied to all nodes in an XC system.

branch switch
A component of the Administrative Network.

extensible firmware interface
See EFI.

external network node
A node that is connected to a network external to the XC system.

F

fairshare
An LSF job-scheduling policy that specifies how resources should be shared by competing users. A fairshare policy defines the order in which LSF attempts to place jobs that are in a queue or a host partition.

FCFS
First come first served.

image server
A node specifically designated to hold images that will be distributed to one or more client systems. In a standard XC installation, the head node acts as the image server and golden client.

Integrated Lights Out
See iLO.

interconnect
The private network within the XC system that is used primarily for user file access and for communications within applications. Provides high-speed connectivity between the nodes.

LSF master host
The overall LSF coordinator for the system. The master load information manager (LIM) and master batch daemon (mbatchd) run on the LSF master host. Each system has one master host to do all job scheduling and dispatch. If the master host goes down, another LSF server in the system becomes the master host.

LVS
Linux Virtual Server. Provides a centralized login capability for system users. LVS handles incoming login requests and directs them to a node with a login role.

P

parallel application
An application that uses a distributed programming model and can run on multiple processors. An HP XC MPI application is a parallel application. That is, all interprocessor communication within an HP XC parallel application is performed through calls to the MPI message passing library.

PXE
Preboot Execution Environment. A standard client/server interface that allows networked computers that are not yet installed with an operating system to be configured and booted remotely.

symmetric multiprocessing
See SMP.