HP XC System Software User's Guide Version 3.
© Copyright 2003, 2005, 2006 Hewlett-Packard Development Company, L.P. Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license. The information contained herein is subject to change without notice.
Table of Contents
About This Document (page 13)
Intended Audience (page 13)
Document Organization (page 13)
HP XC Information
3 Configuring Your Environment with Modulefiles
Overview of Modules (page 31)
Supplied Modulefiles (page 32)
Modulefiles Automatically Loaded on the System (page 33)
Viewing Available Modulefiles
Running Preexecution Programs (page 51)
6 Debugging Applications
Debugging Serial Applications (page 53)
Debugging Parallel Applications (page 53)
Debugging with TotalView
Getting Information About the lsf Partition (page 76)
Submitting Jobs (page 77)
Summary of the LSF bsub Command Format (page 77)
LSF-SLURM External Scheduler
List of Figures
9-1 How LSF-HPC and SLURM Launch and Manage a Job
List of Tables
1-1 Determining the Node Platform (page 20)
1-2 HP XC System Interconnects (page 22)
3-1 Supplied Modulefiles (page 33)
4-1 Compiler Commands
List of Examples
4-1 Directory Structure (page 44)
4-2 Recommended Directory Structure (page 44)
5-1 Submitting a Serial Job Using Standard LSF (page 46)
5-2 Submitting a Serial Job Using LSF-HPC
About This Document This document provides information about using the features and functions of the HP XC System Software. It describes how the HP XC user and programming environments differ from standard Linux® system environments.
• Chapter 10: Advanced Topics (page 91) provides information on remote execution, running an X terminal session from a remote node, and I/O performance considerations. • Appendix A: Examples (page 99) provides examples of HP XC applications. • The Glossary provides definitions of the terms used in this document. HP XC Information The HP XC System Software Documentation Set includes the following core documents.
Documentation for the HP Integrity and HP ProLiant servers is available at the following URL: http://www.docs.hp.com/ For More Information The HP Web site has information on this product. You can access the HP Web site at the following URL: http://www.hp.com Supplementary Information This section contains links to third-party and open source components that are integrated into the HP XC System Software core technology.
• http://supermon.sourceforge.net/ Home page for Supermon, a high-speed cluster monitoring system that emphasizes low perturbation, high sampling rates, and an extensible data protocol and programming interface. Supermon works in conjunction with Nagios to provide HP XC system monitoring. • http://www.llnl.gov/linux/pdsh/ Home page for the parallel distributed shell (pdsh), which executes commands across HP XC client nodes in parallel. • http://www.balabit.
Related Linux Web Sites • http://www.redhat.com Home page for Red Hat®, distributors of Red Hat Enterprise Linux Advanced Server, a Linux distribution with which the HP XC operating environment is compatible. • http://www.linux.org/docs/index.html Home page for the Linux Documentation Project (LDP). This Web site contains guides covering various aspects of working with Linux, from creating your own Linux system from scratch to bash script writing.
• Perl Cookbook, Second Edition, by Tom Christiansen and Nathan Torkington
• Perl in a Nutshell: A Desktop Quick Reference, by Ellen Siever, et al.
Typographic Conventions
This document uses the following typographical conventions:
%, $, or #            A percent sign represents the C shell system prompt.
audit(5)
Command
Computer output
Ctrl+x
ENVIRONMENT VARIABLE
ERROR NAME
Key
Term
User input
Variable
[]
{}
...
|
WARNING
CAUTION
IMPORTANT
NOTE
1 Overview of the User Environment The HP XC system is a collection of computer nodes, networks, storage, and software, built into a cluster, that work together. It is designed to maximize workload and I/O performance, and to provide the efficient management of large, complex, and dynamic workloads.
Table 1-1 Determining the Node Platform
Platform    Partial Output of /proc/cpuinfo
CP3000      processor  : 0
            vendor_id  : GenuineIntel
            cpu family : 15
            model      : 3
            model name : Intel(R) Xeon(TM)
CP4000      processor  : 0
            vendor_id  : AuthenticAMD
            cpu family : 15
            model      : 5
            model name : AMD Opteron(tm)
CP6000      processor  : 0
            vendor     : GenuineIntel
            arch       : IA-64
            family     : Itanium 2
            model      : 1
Note The /proc/cpuinfo file is dynamic.
Node Specialization
The HP XC system is implemented as a sea-of-nodes.
SAN Storage The HP XC system uses the HP StorageWorks Scalable File Share (HP StorageWorks SFS), which is based on Lustre technology and uses the Lustre File System from Cluster File Systems, Inc. This is a turnkey Lustre system from HP. It supplies access to Lustre file systems through Lustre client/server protocols over various system interconnects. The HP XC system is a client to the HP StorageWorks SFS server.
Be aware of the following information about the HP XC file system layout: • Open source software that by default would be installed under the /usr/local directory is instead installed in the /opt/hptc directory. • Software installed in the /opt/hptc directory is not intended to be updated by users. • Software packages are installed in directories under the /opt/hptc directory under their own names. The exception to this is third-party software, which usually goes in /opt/r.
free -m
Disk Partitions
Use the following command to display the disk partitions and their sizes:
cat /proc/partitions
Swap
Use the following command to display the swap usage summary by device:
swapon -s
Cache
Use the following commands to display the cache information; this is not available on all systems:
cat /proc/pal/cpu0/cache_info
cat /proc/pal/cpu1/cache_info
User Environment
This section introduces some general information about logging in, configuring, and using the HP XC environment.
SLURM commands HP-MPI commands Modules commands Documentation CD contains XC LSF manuals from Platform Computing. LSF manpages are available on the HP XC system. HP XC uses the Simple Linux Utility for Resource Management (SLURM) for system resource management and job scheduling. Standard SLURM commands are available through the command line. SLURM functionality is described in Chapter 8. Using SLURM . Descriptions of SLURM commands are available in the SLURM manpages.
by default for LSF-HPC batch jobs. The system administrator has the option of creating additional partitions. For example, another partition could be created for interactive jobs. Load Sharing Facility (LSF-HPC) The Load Sharing Facility for High Performance Computing (LSF-HPC) from Platform Computing Corporation is a batch system resource manager that has been integrated with SLURM for use on the HP XC system.
SLURM     Allocates nodes for jobs as determined by LSF-HPC. It CONTROLS task/rank distribution within the allocated nodes. SLURM also starts the executables on each host as requested by the HP-MPI mpirun command.
HP-MPI    Determines HOW the job runs. It is part of the application, so it performs communication. HP-MPI can also pinpoint the processor on which each rank runs.
HP-MPI
HP-MPI is a high-performance implementation of the Message Passing Interface (MPI) standard and is included with the HP XC system.
2 Using the System This chapter describes the tasks and commands that the general user must know to use the system. It addresses the following topics: • Logging In to the System (page 27) • Overview of Launching and Managing Jobs (page 27) • Performing Other Common User Tasks (page 29) • Getting System Help and Information (page 30) Logging In to the System Logging in to an HP XC system is similar to logging in to any standard Linux system. Logins are performed on nodes that have the login role.
Introduction As described in Run-Time Environment (page 24), SLURM and LSF-HPC cooperate to run and manage jobs on the HP XC system, combining LSF-HPC's powerful and flexible scheduling functionality with SLURM's scalable parallel job-launching capabilities. SLURM is the low-level resource manager and job launcher, and performs core allocation for jobs. LSF-HPC gathers information about the cluster from SLURM.
$ lsload For more information about using this command and a sample of its output, see Getting Host Load Information (page 76). Getting Information About System Partitions You can view information about system partitions with the SLURM sinfo command. The sinfo command reports the state of all partitions and nodes managed by SLURM and provides a wide variety of filtering, sorting, and formatting options.
Getting System Help and Information
In addition to the hardcopy documentation described in the preface of this document (About This Document), the HP XC system also provides system help and information in the form of online manpages. Manpages provide online reference and command information from the system command line.
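For example, to read the reference page for the srun command:
$ man srun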
3 Configuring Your Environment with Modulefiles
The HP XC system supports the use of Modules software to make it easier to configure and modify your environment. Modules software enables dynamic modification of your environment by the use of modulefiles.
could cause inconsistencies in the use of shared objects. If you have multiple compilers (perhaps with incompatible shared objects) installed, it is probably wise to set MPI_CC (and others) explicitly to the commands made available by the compiler's modulefile. The contents of the modulefiles in the modulefiles_hptc RPM use the vendor-intended location of the installed software.
Table 3-1 Supplied Modulefiles
Modulefile    Sets the HP XC User Environment to Use:
icc/8.0       Intel C/C++ Version 8.0 compilers.
icc/8.1       Intel C/C++ Version 8.1 compilers.
icc/9.0       Intel C/C++ Version 9.0 compilers.
ifort/8.0     Intel Fortran Version 8.0 compilers.
ifort/8.1     Intel Fortran Version 8.1 compilers.
ifort/9.0     Intel Fortran Version 9.0 compilers.
intel/7.1     Intel Version 7.1 compilers.
intel/8.0     Intel Version 8.0 compilers.
intel/8.1     Intel Version 8.1 compilers.
intel/9.
you are attempting to load conflicts with a currently loaded modulefile, the modulefile will not be loaded and an error message will be displayed. If you encounter a modulefile conflict when loading a modulefile, you must unload the conflicting modulefile before you load the new modulefile. See Modulefile Conflicts (page 34) for further information about modulefile conflicts. Loading a Modulefile for the Current Session You can load a modulefile for your current login session as needed.
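For example, a session that checks which modulefiles are available, loads one, and confirms the result might look like the following (the modulefile name is taken from Table 3-1; the output is abbreviated and illustrative):
$ module avail
$ module load ifort/9.0
$ module list
Currently Loaded Modulefiles:
  1) ifort/9.0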
When a modulefile conflict occurs, unload the conflicting modulefile before loading the new modulefile. In the previous example, you should unload the ifort/8.0 modulefile before loading the ifort/8.1 modulefile. For information about unloading a modulefile, see Unloading a Modulefile (page 34). Note To avoid problems, HP recommends that you always unload one version of a modulefile before loading another version.
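For example, to switch from one Fortran compiler version to another (module names follow Table 3-1):
$ module unload ifort/8.0
$ module load ifort/8.1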
4 Developing Applications
This chapter discusses topics associated with developing applications in the HP XC environment. Before reading this chapter, you should read and understand Chapter 1. Overview of the User Environment and Chapter 2. Using the System.
Table 4-1, “Compiler Commands” displays the compiler commands for Standard Linux, Intel, and PGI compilers for the C, C++, and Fortran languages.
Table 4-1 Compiler Commands
Type             C     C++   Fortran   Notes
Standard Linux   gcc   g++   g77       All HP XC platforms. The HP XC System Software supplies these compilers by default.
Intel            icc   icc   ifort     Version 9.0 compilers. For use on the Intel-based 64–bit platform.
Intel            icc   icc   ifort     Version 8.
The Ctrl/C key sequence will report the state of all tasks associated with the srun command. If the Ctrl/C key sequence is entered twice within one second, the associated SIGINT signal will be sent to all tasks. If a third Ctrl/C key sequence is entered, the job will be terminated without waiting for remote tasks to exit. The Ctrl/Z key sequence is ignored.
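For example, a minimal serial build-and-launch session might look like the following (the source file, program name, and compiler choice are illustrative):
$ gcc -o hello hello.c
$ srun -n1 ./hello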
Developing Parallel Applications This section describes how to build and run parallel applications.
Pthreads POSIX Threads (Pthreads) is a standard library that programmers can use to develop portable threaded applications. Pthreads can be used in conjunction with HP-MPI on the HP XC system. Compilers from GNU, Intel and PGI provide a -pthread switch to allow compilation with the Pthread library. Packages that link against Pthreads, such as MKL, require that the application is linked using the -pthread option.
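For example, a Pthreads-based program might be compiled and linked as follows (a sketch; the source file name is illustrative, and the same -pthread switch applies to the GNU, Intel, and PGI compilers):
$ gcc -pthread -o mythreads mythreads.c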
http://www.pathscale.com/ekopath.html. GNU Parallel Make The GNU parallel Make command is used whenever the make command is invoked. GNU parallel Make provides the ability to do a parallel Make; however, all compiling takes place on the login node. Therefore, whether a parallel make improves build time depends upon how many cores are on the login node and the load on the login node. Information about using the GNU parallel Make is provided in "Using the GNU Parallel Make Capability" .
Examples of Compiling and Linking HP-MPI Applications The following examples show how to compile and link your application code by invoking a compiler utility. If you have not already loaded the mpi compiler utilities module , load it now as follows: $ module load mpi To compile and link a C application using the mpicc command: $ mpicc -o mycode hello.c To compile and link a Fortran application using the mpif90 command: $ mpif90 -o mycode hello.
recommends an alternative method. The dynamic linker, during its attempt to load libraries, will suffix candidate directories with the machine type. The HP XC system on the CP4000 platform uses i686 for 32-bit binaries and x86_64 for 64-bit binaries. HP recommends structuring directories to reflect this behavior.
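A minimal sketch of such a layout, using a hypothetical package installed under /opt/mypackage that provides both 32-bit and 64-bit builds of a library:
/opt/mypackage/lib/i686/libmystuff.so      (32-bit build)
/opt/mypackage/lib/x86_64/libmystuff.so    (64-bit build)
If the library search path (for example, LD_LIBRARY_PATH) points at /opt/mypackage/lib, the dynamic linker suffixes that directory with the machine type as described above and finds the build that matches the binary.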
5 Submitting Jobs This chapter describes how to submit jobs on the HP XC system; it addresses the following topics: • Overview of Job Submission (page 45) • Submitting a Serial Job Using Standard LSF (page 46) • Submitting a Serial Job Using LSF-HPC (page 46) • Submitting a Non-MPI Parallel Job (page 48) • Submitting a Parallel Job That Uses the HP-MPI Message Passing Interface (page 48) • Submitting a Batch Job or Job Script (page 49) • Running Preexecution Programs (page 51) Overview of Job
Submitting a Serial Job Using Standard LSF Example 5-1 Submitting a Serial Job Using Standard LSF Use the bsub command to submit a serial job to standard LSF. $ bsub hostname Job <61> is submitted to default queue . <> <> n1 Submitting a Serial Job Using LSF-HPC There are various methods for submitting a serial job on the HP XC system: • Using the LSF bsub command alone.
Example 5-3 Submitting an Interactive Serial Job Using LSF-HPC only $ bsub -I hostname Job <73> is submitted to default queue . <> <> n1 Example 5-4 uses the LSF-SLURM External Scheduler to submit a job to run on four cores on two specific compute nodes.
The output for this command could also have been 1 core on each of 4 compute nodes in the SLURM allocation. Submitting a Non-MPI Parallel Job Use the following format of the LSF bsub command to submit a parallel job that does not make use of HP-MPI: bsub -n num-procs [bsub-options] srun [srun-options] jobname [job-options] The bsub command submits the job to LSF-HPC. The -n num-procs parameter, which is required for parallel jobs, specifies the number of cores requested for the job.
to the number provided by the -n option of the bsub command. Any additional SLURM srun options are job specific, not allocation-specific. The mpi-jobname is the executable file to be run. The mpi-jobname must be compiled with the appropriate HP-MPI compilation utility. Refer to the section titled Compiling applications in the HP-MPI User's Guide for more information. Example 5-7 shows an MPI job that runs a hello world program on 4 cores on 2 compute nodes.
In Example 5-9, a simple script named myscript.sh, which contains two srun commands, is displayed then submitted. Example 5-9 Submitting a Job Script $ cat myscript.sh #!/bin/sh srun hostname mpirun -srun hellompi $ bsub -I -n4 myscript.sh Job <29> is submitted to default queue . <> <
Example 5-12 Submitting a Batch job Script That Uses the srun --overcommit Option $ bsub -n4 -I ./myscript.sh Job <81> is submitted to default queue . <> <
program should pick up the SLURM_JOBID environment variable. The SLURM_JOBID has the information LSF-HPC needs to run the job on the nodes required by your preexecution program. The following items provide the information you need to run the preexecution program on the resource manager node, on the first allocated node, or on all the allocated nodes: Table 5-1 To run a preexecution program on the resource manager node: This is the default behavior. Run the pre-execution program normally.
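For the remaining cases, the general pattern is to attach an srun command to the allocation that LSF-HPC created for the job. The following is a minimal sketch of a pre-execution script that runs a setup command as a single task in that allocation (the setup script path is hypothetical):
#!/bin/sh
# Pre-execution sketch: SLURM_JOBID identifies the allocation that
# LSF-HPC created for this job; -n1 runs the setup step as a single
# task within that allocation.
srun --jobid=$SLURM_JOBID -n1 /path/to/setup-script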
6 Debugging Applications This chapter describes how to debug serial and parallel applications in the HP XC development environment. In general, effective debugging of applications requires the applications to be compiled with debug symbols, typically the -g switch. Some compilers allow -g with optimization.
This section provides only minimum instructions to get you started using TotalView. Instructions for installing TotalView are included in the HP XC System Software Installation Guide. Read the TotalView documentation for full information about using TotalView; the TotalView documentation set is available directly from Etnus, Inc. at the following URL: http://www.etnus.
Using TotalView with LSF-HPC HP recommends the use of xterm when debugging an application with LSF-HPC. You also need to allocate the nodes you will need.
4. The TotalView process window opens. This window contains multiple panes that provide various debugging functions and debugging information. The name of the application launcher that is being used (either srun or mpirun) is displayed in the title bar. 5. Set the search path if you are invoking TotalView from a directory that does not contain the executable file and the source code. If TotalView is invoked from the same directory, you can skip to step 6. Set the search path as follows: a.
Exiting TotalView It is important that you make sure your job has completed before exiting TotalView. This may require that you wait a few seconds from the time your job has completed until srun has completely exited. If you exit TotalView before your job is completed, use the squeue command to ensure that your job is not still on the system.
7 Tuning Applications
This chapter discusses how to tune applications in the HP XC environment.
Using the Intel Trace Collector and Intel Trace Analyzer
This section describes how to use the Intel Trace Collector (ITC) and Intel Trace Analyzer (ITA) with HP-MPI on an HP XC system. The Intel Trace Collector and Intel Trace Analyzer were formerly known as VampirTrace and Vampir, respectively.
Example 7-1 The vtjacobic Example Program For the purposes of this example, the examples directory under /opt/IntelTrace/ITC is copied to the user's home directory and renamed to examples_directory. The GNU makefile looks as follows: CC F77 CLINKER FLINKER IFLAGS CFLAGS FFLAGS LIBS CLDFLAGS -ldwarf FLDFLAGS -ldwarf = mpicc.mpich = mpif77.mpich = mpicc.mpich = mpif77.
/ITA/doc/Intel_Trace_Analyzer_Users_Guide.
8 Using SLURM HP XC uses the Simple Linux Utility for Resource Management (SLURM) for system resource management and job scheduling.
The srun command has a significant number of options that give you close control over the execution of your application. However, you can use it for a simple launch of a serial program, as Example 8-1 shows.
Example 8-1 Simple Launch of a Serial Program
$ srun hostname
n1
The srun Roles and Modes
The srun command submits jobs to run under SLURM management. The srun command can perform many roles in launching and managing your job.
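For example, the following launch requests four tasks spread across two nodes (node names in the output are illustrative):
$ srun -n4 -N2 hostname
n1
n1
n2
n2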
The squeue command can report on jobs in the job queue according to their state; possible states are: pending, running, completing, completed, failed, timeout, and node_fail. Example 8-3 uses the squeue command to report on failed jobs.
Example 8-3 Reporting on Failed Jobs in the Queue
$ squeue --state=FAILED
JOBID PARTITION     NAME USER ST  TIME NODES NODELIST
   59      amt1 hostname root  F  0:00     0
Terminating Jobs with the scancel Command
The scancel command cancels a pending or running job or job step.
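For example, a job can be canceled by its SLURM job ID, or you can cancel all of your own pending jobs at once (the job ID and username are placeholders; see the scancel(1) manpage for the full option list):
$ scancel 59
$ scancel --state=PENDING --user=username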
# chmod a+r /hptc_cluster/slurm/job/jobacct.log You can find detailed information on the sacct command and job accounting data in the sacct(1) manpage. Fault Tolerance SLURM can handle a variety of failure modes without terminating workloads, including crashes of the node running the SLURM controller. User jobs may be configured to continue execution despite the failure of one or more nodes on which they are executing.
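For example, a parallel job can be launched so that it is not automatically terminated when one of its allocated nodes fails (a sketch; the application name is illustrative, and the --no-kill option is described in the srun(1) manpage):
$ srun -n16 --no-kill ./my_app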
9 Using LSF The Load Sharing Facility (LSF) from Platform Computing Corporation is a batch system resource manager used on the HP XC system. LSF is an integral part of the HP XC environment. On an HP XC system, a job is submitted to LSF, which places the job in a queue and allows it to run when the necessary resources become available. In addition to launching jobs, LSF provides extensive job management and information capabilities.
job management and information capabilities. LSF-HPC schedules, launches, controls, and tracks jobs that are submitted to it according to the policies established by the HP XC site administrator. This section describes the functionality of LSF-HPC in an HP XC system, and discusses how to use some basic LSF commands to submit jobs, manage jobs, and access job information.
Differences Between LSF-HPC and Standard LSF LSF-HPC for the HP XC environment supports all the standard features and functions that standard LSF supports, except for those items described in this section, in "Notes About Using LSF-HPC in the HP XC Environment" , and in the HP XC release notes for LSF-HPC. • By LSF standards, the HP XC system is a single host. Therefore, all LSF “per-host” configuration and “per-host” options apply to the entire HP XC system.
• All HP XC nodes are dynamically configured as “LSF Floating Client Hosts” so that you can execute LSF commands from any HP XC node. When you do execute an LSF command from an HP XC node, an entry in the output of the lshosts command acknowledges that the node is licensed to run LSF commands. In the following example, node n15 is configured as an LSF Client Host, not the LSF execution host.
Serial jobs are allocated a single CPU on a shared node with minimal capacities that satisfies other allocation criteria. LSF-HPC always tries to run multiple serial jobs on the same node, one CPU per job. Parallel jobs and serial jobs cannot run on the same node. Pseudo-parallel job A job that requests only one slot but specifies any of these constraints: • mem • tmp • nodes=1 • mincpus > 1 Pseudo-parallel jobs are allocated one node for their exclusive use.
• exclude= list-of-nodes • contiguous=yes The srun(1) manpage provides details on these options and their arguments. The following are interactive examples showing how these options can be used on an HP XC system.
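For example, the following sketches request four cores while excluding a particular node, and four cores on contiguous nodes, respectively (node names are illustrative; the SLURM[...] form follows the examples elsewhere in this chapter):
$ bsub -I -n4 -ext "SLURM[exclude=n3]" srun hostname
$ bsub -I -n4 -ext "SLURM[contiguous=yes]" srun hostname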
• Use the bjobs command to monitor job status in LSF-HPC. • Use the bqueues command to list the configured job queues in LSF-HPC. How LSF-HPC and SLURM Launch and Manage a Job This section describes what happens in the HP XC system when a job is submitted to LSF-HPC. Figure 9-1 illustrates this process. Use the numbered steps in the text and depicted in the illustration as an aid to understanding the process. Consider the HP XC system configuration shown in Figure 9-1, in which lsfhost.
This bsub command launches a request for four cores (from the -n4 option of the bsub command) across four nodes (from the -ext "SLURM[nodes=4]" option); the job is launched on those cores. The script, myscript, which is shown here, runs the job: #!/bin/sh hostname srun hostname mpirun -srun ./hellompi 3. LSF-HPC schedules the job and monitors the state of the resources (compute nodes) in the SLURM lsf partition.
Preemption
LSF-HPC uses the SLURM "node share" feature to facilitate preemption. When a low-priority job is preempted, job processes are suspended on allocated nodes, and LSF-HPC places the high-priority job on the same node. After the high-priority job completes, LSF-HPC resumes suspended low-priority jobs.
Determining the LSF Execution Host
The lsid command displays the name of the HP XC system, and the name of the LSF execution host, along with some general LSF-HPC information.
The following example shows the output from the lshosts command:
$ lshosts
HOST_NAME     type     model     cpuf  ncpus  maxmem  maxswp  server  RESOURCES
lsfhost.loc   SLINUX6  Itanium2  16.0  12     3456M           Yes     (slurm)
n7            UNKNOWN  UNKNOWN_  1.0                          No      ()
n8            UNKNOWN  UNKNOWN_  1.0                          No      ()
n2            UNKNOWN  UNKNOWN_  1.0                          No      ()
Of note in the lshosts output:
• The HOST_NAME column displays the name of the LSF execution host, lsfhost.
$ sinfo -p lsf
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
lsf       up    infinite    128  idle n[1-128]
Use the following command to obtain more information on the nodes in the lsf partition:
$ sinfo -p lsf -lNe
NODELIST  NODES PARTITION STATE CPUS MEMORY TMP_DISK WEIGHT FEATURES REASON
n[1-128]    128 lsf        idle    2   3456        1      1 (null)   none
Refer to "Getting System Information with the sinfo Command" and the sinfo(1) manpage for further information about using the sinfo command.
LSF-HPC node allocation (compute nodes). LSF-HPC node allocation is created by -n num-procs parameter, which specifies the number of cores the job requests. The num-procs parameter may be expressed as minprocs[,maxprocs] where minprocs specifies the minimum number of cores and the optional value maxprocs specifies the maximum number of cores. Refer to "Submitting a Non-MPI Parallel Job" for information about running jobs. Refer to "Submitting a Batch Job or Job Script" for information about running scripts.
Refer to the LSF bsub command manpage for additional information about using the external scheduler (-ext) option. See the srun manpage for more details about the above options and their arguments. Consider an HP XC system configuration in which lsfhost.localdomain is the LSF execution host and nodes n[1-10] are compute nodes in the lsf partition. All nodes contain two cores, providing 20 cores for use by LSF-HPC jobs. Example 9-1 shows one way to submit a parallel job to run on a specific node or nodes.
Getting Information About Jobs There are several ways you can get information about a specific job after it has been submitted to LSF-HPC. This section briefly describes some of the commands that are available under LSF-HPC to gather information about a job. This section is not intended as complete information about this topic.
Job Allocation Information for a Finished Job The following is an example of the output obtained using the bhist -l command to obtain job allocation information about a job that has run: $ bhist -l 24 Job <24>, User , Project , Interactive pseudo-terminal shell mode, Extsched , Command date and time stamp: Submitted from host , to Queue , CWD <$HOME>, 4 Processors Requested, Requested Resources ; date and time stamp: Dispatched to 4 Hosts/
Example 9-5 Using the bjobs Command (Long Output) $ bjobs -l 24 Job <24>, User ,Project ,Status , Queue , Interactive pseudo-terminal shell mode, Extsched , Command date and time stamp: Submitted from host , CWD <$HOME>, 4 Processors Requested, Requested Resources ; date and time stamp: Started on 4 Hosts/Processors <4*lsfhost.
Example 9-7 Using the bhist Command (Long Output) $ bhist -l 24 Job <24>, User , Project , Interactive pseudo-terminal shell mode, Extsched , Command date and time stamp: Submitted from host , to Queue , CWD <$HOME>, 4 Processors Requested, Requested Resources ; date and time stamp: Dispatched to 4 Hosts/Processors <4*lsfhost.
$ sacct -j 123
Jobstep      Jobname       Partition  Ncpus  Status     Error
----------   -----------   ---------  -----  ---------  -----
123          hptclsf@99    lsf            8  CANCELLED      0
123.0        hptclsf@99    lsf            0  COMPLETED      0
The status of a completed job handled by LSF-HPC is always CANCELLED because LSF-HPC destroys the allocation that it creates for the job after the user job completes. LSF-HPC performs the following steps:
• Creates the allocation in SLURM.
• Submits the user job to SLURM.
• Waits for the user job to finish.
Be sure to unset the SLURM_JOBID when you are finished with the allocation, to prevent a previous SLURM JOBID from interfering with future jobs: $ unset SLURM_JOBID The following examples illustrate launching interactive MPI jobs. They use the hellompi job script introduced in Section (page 48).
confirm an expected high load on the nodes. The following is an example of this; the LSF JOBID is 200 and the SLURM JOBID is 250: $ srun --jobid=250 uptime If you are concerned about allocating the resources too long or leaving them allocated long after you finished using them, you could submit a simple sleep job to limit the allocation time, as follows: $ bsub -n4 -ext "SLURM[nodes=4]" -o %J.out sleep 300 Job <125> is submitted to the default queue .
Table 9-2 LSF-HPC Equivalents of SLURM srun Options srun Option Description LSF-HPC Equivalent -n Number of processes (tasks) to run. bsub -n num --ntasks=ntasks -c --processors-per-task=ntasks -N --nodes=min[-max] Specifies the number of cores per task. Min HP XC does not provide this option because the processors per node = MAX(ncpus, mincpus) meaning of this option can be covered by bsub -n and mincpus=n.
srun Option Description LSF-HPC Equivalent --uid=user Root attempts to submit or run a job as normal user. You cannot use this option. LSF-HPC uses it to create allocation. -t Establish a time limit to terminate the job after specified number of minutes. bsub -W runlimit --gid=group Root attempts to submit or run a job as group. You cannot use this option. LSF-HPC uses this option to create allocation. -A Allocate resource and spawn a shell. You cannot use this option.
srun Option Description LSF-HPC Equivalent -W How long to wait after the first task terminates before terminating all remaining tasks. Use as an argument to srun when launching parallel tasks. Quit immediately on single SIGINT. Meaningless under LSF-HPC. Suppress informational message. Use as an argument to srun when launching parallel tasks. --core=type Adjust corefile format for parallel job. Use as an argument to srun when launching parallel tasks. -a Attach srun to a running job.
10 Advanced Topics This chapter covers topics intended for the advanced user.
$ hostname mymachine Then, use the host name of your local machine to retrieve its IP address: $ host mymachine mymachine has address 14.26.206.134 Step 2. Logging in to HP XC System Next, you need to log in to a login node on the HP XC system. For example: $ ssh user@xc-node-name Once logged in to the HP XC system, you can start an X terminal session using SLURM or LSF-HPC. Both methods are described in the following sections. Step 3.
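For the SLURM method, a minimal sketch is to launch an xterm on a single compute node and point it back at the display address obtained in Step 1 (the address shown is the example address from above):
$ srun -n1 xterm -display 14.26.206.134:0.0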
Determine the address of your monitor's display server, as shown at the beginning of "Running an X Terminal Session from a Remote Node" . You can start an X terminal session using this address information in a bsub command with the appropriate options. For example: $ bsub -n4 -Ip srun -n1 xterm -display 14.26.206.134:0.0 Job <159> is submitted to default queue . <> <
Further, if the recursive make is run remotely, it can be told to use concurrency on the remote node. For example: $ cd subdir; srun -n1 -N1 $(MAKE) -j4... This can cause multiple makes to run concurrently, each building their targets concurrently. The -N1 option is used to reserve the entire node, because it is intended to be used for multiple compilations. The following examples illustrate these ideas. In GNU make, a $(VARIABLE) that is unspecified is replaced with nothing.
@ \ for i in ${HYPRE_DIRS}; \ do \ if [ -d $$i ]; \ then \ echo "Cleaning $$i ..."; \ (cd $$i; make clean); \ fi; \ done veryclean: @ \ for i in ${HYPRE_DIRS}; \ do \ if [ -d $$i ]; \ then \ echo "Very-cleaning $$i ..."; \ (cd $$i; make veryclean); \ fi; \ done Example Procedure 1 Go through the directories serially and have the make procedure within each directory be parallel. For the purpose of this exercise we are only parallelizing the “make all” component.
struct_matrix_vector/libHYPRE_mv.a: $(PREFIX) $(MAKE) -C struct_matrix_vector struct_linear_solvers/libHYPRE_ls.a: $(PREFIX) $(MAKE) -C struct_linear_solvers utilities/libHYPRE_utilities.a: $(PREFIX) $(MAKE) -C utilities The modified Makefile is invoked as follows: $ make PREFIX='srun -n1 -N1' MAKE_J='-j4' Example Procedure 3 Go through the directories in parallel and have the make procedure within each directory be parallel.
Shared File View Although a file opened by multiple processes of an application is shared, each core maintains a private file pointer and file position. This means that if a certain order of input or output from multiple cores is desired, the application must synchronize its I/O requests or position its file pointer such that it acts on the desired file location. Output requests to standard output and standard error are line-buffered, which can be sufficient output ordering in many cases.
Appendix A Examples This appendix provides examples that illustrate how to build and run applications on the HP XC system. The examples in this section show you how to take advantage of some of the many methods available, and demonstrate a variety of other user commands to monitor, control, or kill jobs. The examples in this section assume that you have read the information in previous chapters describing how to use the HP XC commands to build and run parallel applications.
Examine the local host information: $ hostname n2 Examine the job information: $ bjobs No unfinished job found Run the LSF bsub -Is command to launch the interactive shell: $ bsub -Is -n1 /bin/bash Job <120> is submitted to default queue . <> <
SCHEDULING PARAMETERS: r15s r1m r15m ut pg io ls it tmp swp mem loadSched - - - - loadStop - - - - - EXTERNAL MESSAGES: MSG_ID FROM POST_TIME 0 1 lsfadmin date and time MESSAGE SLURM[nodes=2] ATTACHMENT N Example 2. Four cores on Two Specific Nodes This example submits a job that requests four cores on two specific nodes, on an XC system that has three compute nodes. Submit the job: $ bsub -I -n4 -ext "SLURM[nodelist=n[14,16]]" srun hostname Job <9> is submitted to default queue .
Examine the partition information: $ sinfo PARTITION AVAIL TIMELIMIT NODES lsf up infinite 6 STATE NODELIST idle n[5-10] Examine the local host information: $ hostname n2 Examine the job information: $ bjobs No unfinished job found Run the LSF bsub -Is command to launch the interactive shell: $ bsub -Is -n4 -ext "SLURM[nodes=4]" /bin/bash Job <124> is submitted to default queue . <> <
Examine the finished job's information: $ bhist -l 124 Job <124>, User , Project , Interactive pseudo-terminal shell mode, Extsched , Command date and time stamp: Submitted from host , to Queue , CWD <$HOME>, 4 Processors Requested, Requested Resources ; date and time stamp: Dispatched to 4 Hosts/Processors <4*lsfhost.
n16 n16 Linux n14 2.4.21-15.3hp.XCsmp #2 SMP date ia64 Linux n14 2.4.21-15.3hp.XCsmp #2 SMP date ia64 Linux n16 2.4.21-15.3hp.XCsmp #2 SMP date ia64 Linux n16 2.4.21-15.3hp.XCsmp #2 SMP date ia64 and time stamp ia64 ia64 GNU/Linux and time stamp ia64 ia64 GNU/Linux and time stamp ia64 ia64 GNU/Linux and time stamp ia64 ia64 GNU/Linux Submitting an Interactive Job with LSF-HPC This example shows how to submit a batch interactive job to LSF-HPC with the bsub -Ip command.
n15 n15 n16 n16 $ srun -n3 hostname n13 n14 n15 Exit the pseudo-terminal: $ exit exit View the interactive jobs: $ bjobs -l 1008 Job <1008>, User smith, Project , Status , Queue , Interactive pseudo-terminal mode, Command date and time stamp: Submitted from host n16, CWD <$HOME/tar_drop1/test>, 8 Processors Requested; date and time stamp: Started on 8Hosts/Processors<8*lsfhost.
Copyright 1992-2004 Platform Computing Corporation
My cluster name is penguin
My master name is lsfhost.localdomain
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
lsf       up    infinite      4 alloc n[13-16]
$ lshosts
HOST_NAME   type    model   cpuf ncpus maxmem maxswp server RESOURCES
lsfhost.loc SLINUX6 DEFAULT  1.0     8     1M        Yes    (slurm)
$ bhosts
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
lsfhost.
6 Processors Requested; date and time stamp: Dispatched to 6 Hosts/Processors <6*lsfhost.localdomain>; date and time stamp: slurm_id=22;ncpus=6;slurm_alloc=n[13-15]; date and time stamp: Starting (Pid 11216); date and time stamp: Done successfully. The CPU time used is 0.
Glossary A administration branch The half (branch) of the administration network that contains all of the general-purpose administration ports to the nodes of the HP XC system. administration network The private network within the HP XC system that is used for administrative operations. B base image The collection of files and directories that represents the common files and configuration data that are applied to all nodes in an HP XC system. branch switch A component of the Administration Network.
FCFS First-come, first-served. An LSF job-scheduling policy that specifies that jobs are dispatched according to their order in a queue, which is determined by job priority, not by order of submission to the queue. first-come, first-served See FCFS. G global storage Storage within the HP XC system that is available to all of the nodes in the system. Also known as shared storage. golden client The node from which a standard file system image is created.
L Linux Virtual Server See LVS. load file A file containing the names of multiple executables that are to be launched simultaneously by a single command. Load Sharing Facility See LSF-HPC with SLURM. local storage Storage that is available or accessible from one node in the HP XC system. LSF execution host The node on which LSF runs. A user's job is submitted to the LSF execution host. Jobs are launched from the LSF execution host and are executed on one or more compute nodes.
Network Information Services See NIS. NIS Network Information Services. A mechanism that enables centralization of common data that is pertinent across multiple machines in a network. The data is collected in a domain, within which it is accessible and relevant. The most common use of NIS is to maintain user account information across a set of networked hosts. NIS client Any system that queries NIS servers for NIS database information.
SMP Symmetric multiprocessing. A system with two or more CPUs that share equal (symmetric) access to all of the facilities of a computer system, such as the memory and I/O subsystems. In an HP XC system, the use of SMP technology increases the number of CPUs (amount of computational power) available per unit of space. ssh Secure Shell. A shell program for logging in to and executing commands on a remote computer.
Index A ACML library, 42 application development, 37 building parallel applications, 42 building serial applications, 39 communication between nodes, 97 compiling and linking parallel applications, 42 compiling and linking serial applications, 39 debugging parallel applications, 53 debugging serial applications, 53 debugging with TotalView, 53 determining available resources for, 75 developing libraries, 43 developing parallel applications, 40 developing serial applications, 39 examining core availability,
configuring local disk, 96 core availability, 38 interrupting jobs, 38 intranode communication, 97 D J DDT, 53 debugger TotalView, 53 debugging DDT, 53 gdb, 53 idb, 53 pgdbg, 53 TotalView, 53 debugging options setting, 39 debugging parallel applications, 53 debugging serial applications, 53 determining LSF execution host, 75 developing applications, 37 developing libraries, 43 developing parallel applications, 40 developing serial applications, 39 job examining status, 81 getting information about, 80
submitting jobs, 77 summary of bsub command, 77 using srun with, 64 viewing historical information of jobs, 82 LSF-SLURM external scheduler, 45 lshosts command, 75 examining host resources, 75 lsid command, 29, 75 lsload command, 76 LVS routing login requests, 27 M Makefile (see GNU make) manpage HP XC, 30 Linux, 30 third-party vendor, 30 MANPATH environment variable setting with a module, 32 math library, 42 MKL library building parallel applications, 42 module commands avail command, 33 list command, 33
examples of, 99 programming model, 39 shared file view, 97 signal sending to a job, 65 Simple Linux Utility for Resource Management (see SLURM) sinfo command, 65, 76 SLURM, 63 fault tolerance, 66 interaction with LSF-HPC, 73 job accounting, 65 lsf partition, 75 security model, 66 SLURM_JOBID environment variable, 80, 83 SLURM_NPROCS environment variable, 80 submitting a serial job, 47 utilities, 63 SLURM commands sacct, 65 scancel, 65 sinfo, 65, 76 squeue, 64 srun, 38, 63 squeue command, 64 srun, 63 used wi