HP-MPI User's Guide
11th Edition
Manufacturing Part Number: B6060-96024
September 2007
© Copyright 1979-2007 Hewlett-Packard Development Company, L.P.
Table 1 Revision history

Edition     MPN           Description
Eleventh    B6060-96024   Released with HP-MPI Windows V1.1, September 2007
Tenth       B6060-96022   Released with HP-MPI V2.2.5, June 2007
Ninth       B6060-96018   Released with HP-MPI V2.1, April 2005
Eighth      B6060-96013   Released with HP MPI V2.0, September 2003
Seventh     B6060-96008   Released with HP MPI V1.8, June 2002
Sixth       B6060-96004   Released with HP MPI V1.7, March 2001
Fifth       B6060-96001   Released with HP MPI V1.6, June 2000
Notice Reproduction, adaptation, or translation without prior written permission is prohibited, except as allowed under the copyright laws. The information contained in this document is subject to change without notice. Hewlett-Packard makes no warranty of any kind with regard to this material, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose.
Preface This guide describes the HP-MPI (version 2.2.5) implementation of the Message Passing Interface (MPI) standard. The guide helps you use HP-MPI to develop and run parallel applications.
You should already have experience developing UNIX applications. You should also understand the basic concepts behind parallel processing, be familiar with MPI, and with the MPI 1.2 and MPI-2 standards (MPI: A Message-Passing Interface Standard and MPI-2: Extensions to the Message-Passing Interface, respectively). You can access HTML versions of the MPI 1.2 and 2 standards at http://www.mpi-forum.org. This guide supplements the material in the MPI standards and MPI: The Complete Reference.
Platforms supported

Table 2 Supported platforms, interconnects, and operating systems

Interconnect          Operating System
TCP/IP                Red Hat Enterprise Linux AS 3.0 and 4.0; SuSE Linux Enterprise Server 9, 9.1, 9.2, 9.3, 10
Myrinet GM-2 and MX   Red Hat Enterprise Linux AS 3.0 and 4.0; SuSE Linux Enterprise Server 9, 9.1, 9.2, 9.3, 10
InfiniBand            Red Hat Enterprise Linux AS 3.0 and 4.0; SuSE Linux Enterprise Server 9, 9.1, 9.2, 9.3, 10

Platform: Intel Itanium-based
TCP/IP                Red Hat Enterprise Linux AS 3.0 and 4.0; SuSE Linux Enterprise Server 9, 9.1, 9.2, 9.3, 10; Windows CCS; HP-UX 11i; HP-UX 11i V2
QsNet Elan4           Red Hat Enterprise Linux AS 3.0 and 4.0; SuSE Linux Enterprise Server 9, 9.1, 9.2, 9.3, 10
InfiniBand            Red Hat Enterprise Linux AS 3.0 and 4.0; SuSE Linux Enterprise Server 9, 9.1, 9.2, 9.3, 10

TCP/IP                Red Hat Enterprise Linux AS 3.0 and 4.0; SuSE Linux Enterprise Server 9, 9.1, 9.2, 9.3, 10
Myrinet GM-2 and MX   Red Hat Enterprise Linux AS 3.0 and 4.0; SuSE Linux Enterprise Server 9, 9.1, 9.2, 9.3, 10
InfiniBand            Red Hat Enterprise Linux AS 3.0 and 4.0; SuSE Linux Enterprise Server 9, 9.1, 9.2, 9.3, 10
QsNet Elan4           Red Hat Enterprise Linux AS 3.0 and 4.0; SuSE Linux Enterprise Server 9, 9.1, 9.2, 9.3, 10

Platform: HP XC4000 Clusters
Myrinet GM-2 and MX; TCP/IP; InfiniBand     HP XC Linux

Platform: HP XC6000 Clusters
TCP/IP; QsNet Elan4; InfiniBand             HP XC Linux

Platform: HP Cluster Platforms
TCP/IP; InfiniBand                          Microsoft Windows Compute Cluster Pack (CCP)

Platform: PA-RISC
TCP/IP                                      HP-UX

1 Supported on HP InfiniBand solutions for HP-UX.
Notational conventions

This section describes notational conventions used in this book.

Table 3 Typographic conventions

bold monospace   In command examples, bold monospace identifies input that must be typed exactly as shown.
monospace        In paragraph text, monospace identifies command names, system calls, and data structures and types. In command examples, monospace identifies command output, including error messages.
italic           In paragraph text, italic identifies titles of documents.
Documentation resources

Documentation resources include:

• HP-MPI product information available at http://www.hp.com/go/hpmpi
• MPI: The Complete Reference (2 volume set), MIT Press
• MPI 1.2 and 2.0 standards available at http://www.mpi-forum.org:
  — MPI: A Message-Passing Interface Standard
  — MPI-2: Extensions to the Message-Passing Interface
• TotalView documents available at http://www.totalviewtech.com
Credits HP-MPI is based on MPICH from Argonne National Laboratory and LAM from the University of Notre Dame and Ohio Supercomputer Center. HP-MPI includes ROMIO, a portable implementation of MPI I/O developed at the Argonne National Laboratory.
xxiv
1 Introduction

This chapter provides a brief introduction to basic Message Passing Interface (MPI) concepts and to the HP implementation of MPI.
Introduction This chapter contains the syntax for some MPI functions. Refer to MPI: A Message-Passing Interface Standard for syntax and usage details for all MPI standard functions. Also refer to MPI: A Message-Passing Interface Standard and to MPI: The Complete Reference for in-depth discussions of MPI concepts.
The message passing model

Programming models are generally categorized by how memory is used. In the shared memory model each process accesses a shared address space, while in the message passing model an application runs as a collection of autonomous processes, each with its own local memory. In the message passing model processes communicate with other processes by sending and receiving messages.
MPI concepts

The primary goals of MPI are efficient communication and portability. Although several message-passing libraries exist on different systems, MPI is popular for the following reasons:

• Support for full asynchronous communication—Process communication can overlap process computation.
• Group membership—Processes may be grouped based on context.
Table 1-1 Six commonly used MPI routines

MPI routine      Description
MPI_Init         Initializes the MPI environment
MPI_Finalize     Terminates the MPI environment
MPI_Comm_rank    Determines the rank of the calling process within a group
MPI_Comm_size    Determines the size of the group
MPI_Send         Sends messages
MPI_Recv         Receives messages

You must call MPI_Finalize in your application to conform to the MPI Standard.
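A minimal C program that uses four of these routines looks like the following. This is a sketch in the spirit of the hello_world.c example shipped in $MPI_ROOT/help, not the shipped source itself:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                /* initialize the MPI environment */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

    printf("Hello world! I'm %d of %d\n", rank, size);

    MPI_Finalize();                        /* terminate the MPI environment */
    return 0;
}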
Introduction MPI concepts Point-to-point communication Point-to-point communication involves sending and receiving messages between two processes. This is the simplest form of data transfer in a message-passing model and is described in Chapter 3, “Point-to-Point Communication” in the MPI 1.0 standard. The performance of point-to-point communication is measured in terms of total transfer time.
Introduction MPI concepts number assigned to each member process from the sequence 0 through (size-1), where size is the total number of processes in the communicator.
Introduction MPI concepts 2. The application does some computation. 3. The application calls a completion routine (for example, MPI_Test or MPI_Wait) to test or wait for completion of the send operation. Blocking communication Blocking communication consists of four send modes and one receive mode. The four send modes are: Standard (MPI_Send) The sending process returns when the system can buffer the message or when the message is received and the buffer is ready for reuse.
MPI_Recv(void *buf, int count, MPI_Datatype dtype, int source, int tag, MPI_Comm comm, MPI_Status *status);

where

buf      Specifies the starting address of the buffer.
count    Indicates the number of buffer elements.
dtype    Denotes the datatype of the buffer elements.
source   Specifies the rank of the source process in the group associated with the communicator comm.
tag      Denotes the message label.
comm     Designates the communication context that identifies a group of processes.
status   Returns information about the received message.
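For example, the following fragment (a sketch with error handling omitted, assuming rank has already been obtained with MPI_Comm_rank) passes one integer from rank 0 to rank 1 using the blocking calls described above:

int msg = 42;
MPI_Status status;

if (rank == 0) {
    /* standard-mode blocking send of one int to rank 1, tag 0 */
    MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
} else if (rank == 1) {
    /* blocking receive from rank 0; status describes the received message */
    MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
}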
Table 1-2 MPI blocking and nonblocking calls

Blocking mode    Nonblocking mode
MPI_Bsend        MPI_Ibsend
MPI_Ssend        MPI_Issend
MPI_Rsend        MPI_Irsend
MPI_Recv         MPI_Irecv

Nonblocking calls have the same arguments, with the same meaning as their blocking counterparts, plus an additional argument for a request.
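The following fragment (a sketch, with rank and msg set up as in the previous example) shows the nonblocking forms used to overlap communication with computation:

MPI_Request request = MPI_REQUEST_NULL;
MPI_Status status;

if (rank == 0) {
    MPI_Isend(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &request);
} else if (rank == 1) {
    MPI_Irecv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &request);
}

/* ... do computation that does not touch msg ... */

MPI_Wait(&request, &status);   /* complete the pending send or receive */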
Introduction MPI concepts Collective operations consist of routines for communication, computation, and synchronization. These routines all specify a communicator argument that defines the group of participating processes and the context of the operation. Collective operations are valid only for intracommunicators. Intercommunicators are not allowed as arguments. Communication Collective communication involves the exchange of data among all processes in a group.
To code a broadcast, use

MPI_Bcast(void *buf, int count, MPI_Datatype dtype, int root, MPI_Comm comm);

where

buf      Specifies the starting address of the buffer.
count    Indicates the number of buffer entries.
dtype    Denotes the datatype of the buffer entries.
root     Specifies the rank of the root.
comm     Designates the communication context that identifies a group of processes.

For example, the compute_pi example program distributes its input with a broadcast, as sketched below.
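A broadcast of a single integer from rank 0 to every process in MPI_COMM_WORLD looks like the following fragment (a sketch loosely modeled on the broadcast in the compute_pi example; the shipped example may differ in detail):

int n;                  /* for example, the number of intervals to use */

if (rank == 0)
    n = 1000;           /* only the root needs to set the value */

/* after this call, every rank in MPI_COMM_WORLD has the same value of n */
MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);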
All-reduce       Returns the result of a reduction at all nodes.
Reduce-Scatter   Combines the functionality of reduce and scatter operations.
Scan             Performs a prefix reduction on data distributed across a group.

Section 4.9, "Global Reduction Operations" in the MPI 1.0 standard describes each of these functions in detail. Reduction operations are binary and are only valid on numeric data. Reductions are always associative but may or may not be commutative.
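For example, the following fragment (a sketch) sums one double-precision value from every rank into a single result on rank 0:

double partial = 0.0, total = 0.0;

/* ... each rank computes its own value of partial ... */

/* combine the partial values with MPI_SUM; only the root (rank 0) receives total */
MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);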
Introduction MPI concepts Synchronization Collective routines return as soon as their participation in a communication is complete. However, the return of the calling process does not guarantee that the receiving processes have completed or even started the operation. To synchronize the execution of processes, call MPI_Barrier. MPI_Barrier blocks the calling process until all processes in the communicator have called it.
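For example (a sketch), a barrier can separate a phase in which every rank writes its own file from a phase in which rank 0 reads those files:

/* ... each rank writes its per-rank output file here ... */

MPI_Barrier(MPI_COMM_WORLD);   /* no rank continues until all ranks reach this point */

if (rank == 0) {
    /* it is now safe for rank 0 to read the files written by the other ranks */
}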
Introduction MPI concepts Provide MPI_Pack and MPI_Unpack functions so that a sending process can pack noncontiguous data into a contiguous buffer and a receiving process can unpack data received in a contiguous buffer and store it in noncontiguous locations. Using derived datatypes is more efficient than using MPI_Pack and MPI_Unpack. However, derived datatypes cannot handle the case where the data layout varies and is unknown by the receiver, for example, messages that embed their own layout description.
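As an illustration of a derived datatype (a sketch; dest and tag are assumed to be defined elsewhere), the following sends every other element of an array in a single message without copying the data into a temporary buffer:

double data[10];
MPI_Datatype strided;

/* 5 blocks, each containing 1 MPI_DOUBLE, separated by a stride of 2 elements */
MPI_Type_vector(5, 1, 2, MPI_DOUBLE, &strided);
MPI_Type_commit(&strided);

MPI_Send(data, 1, strided, dest, tag, MPI_COMM_WORLD);

MPI_Type_free(&strided);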
Introduction MPI concepts Multilevel parallelism By default, processes in an MPI application can only do one task at a time. Such processes are single-threaded processes. This means that each process has an address space together with a single program counter, a set of registers, and a stack. A process with multiple threads has one address space, but each process thread has its own counter, registers, and stack. Multilevel parallelism refers to MPI processes that have multiple threads.
2 Getting started This chapter describes how to get started quickly using HP-MPI. The semantics of building and running a simple MPI program are described, for single- and multiple-hosts.
It also explains how to configure your environment before running your program. You become familiar with the file structure in your HP-MPI directory, and the HP-MPI licensing policy is explained. The goal of this chapter is to demonstrate the basics of getting started with HP-MPI. It is separated into two major sections: Getting started using HP-UX or Linux, and Getting started using Windows.
— Building and running on a Windows 2003/XP cluster using appfiles
— Directory structure for Windows
— Windows man pages
— Licensing Policy for Windows
Getting started using HP-UX or Linux

Configuring your environment

Setting PATH

If you move the HP-MPI installation directory from its default location in /opt/mpi for HP-UX, and /opt/hpmpi for Linux:

• Set the MPI_ROOT environment variable to point to the location where MPI is installed.
• Add $MPI_ROOT/bin to PATH.
• Add $MPI_ROOT/share/man to MANPATH.

MPI must be installed in the same directory on every execution host.
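For example, for sh-style shells such as bash or ksh (csh users would use setenv), and assuming a hypothetical new installation location of /usr/local/hpmpi:

% export MPI_ROOT=/usr/local/hpmpi
% export PATH=$MPI_ROOT/bin:$PATH
% export MANPATH=$MPI_ROOT/share/man:$MANPATH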
Getting started Getting started using HP-UX or Linux HP-MPI allows users to specify the remote execution tool to use when HP-MPI needs to start processes on remote hosts. The tool specified must have a call interface similar to that of the standard utilities: rsh, remsh and ssh.
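For example, to select ssh as the remote execution tool, one common approach is to set the MPI_REMSH environment variable before launching (shown here as an assumption; consult the runtime environment variable descriptions for the authoritative mechanism):

% export MPI_REMSH=ssh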
Getting started Getting started using HP-UX or Linux Compiling and running your first application To quickly become familiar with compiling and running HP-MPI programs, start with the C version of a familiar hello_world program. This program is called hello_world.c and prints out the text string "Hello world! I’m r of s on host" where r is a process’s rank, s is the size of the communicator, and host is the host on which the program is run. The processor name is the host name for this implementation.
Getting started Getting started using HP-UX or Linux Step 2. Compile the hello_world executable file: % $MPI_ROOT/bin/mpicc -o hello_world \ $MPI_ROOT/help/hello_world.c Step 3. Run the hello_world executable file: % $MPI_ROOT/bin/mpirun -np 4 hello_world where -np 4 specifies 4 as the number of processes to run. Step 4. Analyze hello_world output. HP-MPI prints the output from running the hello_world executable in non-deterministic order.
Getting started Getting started using HP-UX or Linux Step 5. Analyze hello_world output. HP-MPI prints the output from running the hello_world executable in non-deterministic order. The following is an example of the output: Hello Hello Hello Hello world! world! world! world! I'm I'm I'm I'm 1 3 0 2 of of of of 4 4 4 4 n01 n02 n01 n02 Refer to “LSF on non-XC systems” on page 77 for examples using LSF.
If you move the HP-MPI installation directory from its default location in /opt/mpi, set the MPI_ROOT environment variable to point to the new location. Refer to "Configuring your environment" on page 20.

Table 2-1 Directory Structure for HP-UX and Linux

Subdirectory    Contents
bin             Command files for the HP-MPI utilities and the gather_info script
help            Source files for the example programs
include         Header files
lib/pa2.
Getting started Getting started using HP-UX or Linux page, MPI.1, that is an overview describing general features of HP-MPI. The compilation and run-time man pages are those that describe HP-MPI utilities. Table 2-2 describes the three categories of man pages in the man1 subdirectory that comprise man pages for HP-MPI utilities. Table 2-2 HP-UX and Linux man page categories Category man pages Description General MPI.1 Describes the general features of HP-MPI Compilation mpicc.1 mpiCC.1 mpif77.
Getting started Getting started using HP-UX or Linux HP-MPI has an Independent Software Vendor (ISV) program that allows participating ISVs to freely distribute HP-MPI with their applications. When the application is part of the HP-MPI ISV program, there is no licensing requirement for the end user. The ISV provides a licensed copy of HP-MPI. Contact your application vendor to find out if they participate in the HP-MPI ISV program.
Getting started Getting started using HP-UX or Linux If the license needs to be placed in another location which would not be found by the above search, the user may set the environment variable LM_LICENSE_FILE to explicitly specify the location of the license file. For more information, see http://licensing.hp.com. Installing License Files A valid license file contains the system hostid and the associated license key. License files can be named either as license.dat or any name with extension of *.
Getting started Getting started using HP-UX or Linux SERVER myserver 0014c2c1f34a DAEMON HPQ INCREMENT HP-MPI HPQ 1.0 permanent 8 9A40ECDE2A38 \ NOTICE="License Number = AAAABBBB1111" SIGN=E5CEDE3E5626 SERVER myserver 0014c2c1f34a DAEMON HPQ INCREMENT HP-MPI HPQ 1.0 permanent 16 BE468B74B592 \ NOTICE="License Number = AAAABBBB2222" SIGN=9AB4034C6CB2 The result is a valid license for 24 ranks. Version identification To determine the version of an HP-MPI installation, use the what command on HP-UX.
Getting started Getting started using Windows Getting started using Windows Configuring your environment The default install directory location for HP-MPI for Windows is one of the following directories: On 64-bit Windows: C:\Program Files (x86)\Hewlett-Packard\HP-MPI On 32-bit Windows: C:\Program Files \Hewlett-Packard\HP-MPI The default install will define the system environment variable MPI_ROOT, but will not put "%MPI_ROOT%\bin" in the system path or your user path.
Getting started Getting started using Windows The source code for hello_world.c is stored in %MPI_ROOT%\help and can be seen in “Compiling and running your first application” on page 22. Building and running on a single host The example teaches you the basic compilation and run steps to execute hello_world.c on your local host with four-way parallelism. To build and run hello_world.c on a local host named mpiccp1: Step 1. Change to a writable directory. Step 2. Open a Visual Studio command window.
Getting started Getting started using Windows Building and running multihost on Windows CCS clusters The following is an example of basic compilation and run steps to execute hello_world.c on a cluster with 16-way parallelism. To build and run hello_world.c on a CCS cluster: Step 1. Change to a writable directory on a mapped drive. The mapped drive should be to a shared folder for the cluster. Step 2. Open a Visual Studio command window.
Getting started Getting started using Windows > job add 4288 /numprocessors:1 ^ /stdout:\\node\path\to\a\shared\file.out ^ /stderr:\\node\path\to\a\shared\file.err ^ "%MPI_ROOT%\bin\mpirun" -ccp \\node\path ^ \to\hello_world.exe Step 6. Submit the job. The machine resources are allocated and the job is run.
Getting started Getting started using Windows X:\Demo> "%MPI_ROOT%\bin\mpicc" /mpi64 server.c Microsoft (R) C/C++ Optimizing Compiler Version 14.00.50727.762 for x64 Copyright (C) Microsoft Corporation. All rights reserved. server.c Microsoft (R) Incremental Linker Version 8.00.50727.762 Copyright (C) Microsoft Corporation. All rights reserved. /out:server.exe "/libpath:C:\Program Files (x86)\Hewlett-Packard\HP-MPI\lib" /subsystem:console libhpmpi64.lib libmpio64.lib server.
Getting started Getting started using Windows Step 6. Submit the job using appfile mode: X:\work> "%MPI_ROOT%\bin\mpirun" -ccp -f appfile.txt This will submit the job to the scheduler, allocating the nodes indicated in the appfile. Output and error files will default to appfile--.out and appfile--.err respectively. These file names can be altered using the -ccpout and -ccperr flags. Step 7. Check your results. Assuming the job submitted was job ID 98, the file appfile-98.1.
Getting started Getting started using Windows NOTE This is the default location on 64-bit machines. The location for 32-bit machines is %ProgramFiles%\Hewlett-Packard\HP-MPI The MPI application can now be built with HP-MPI. The property page sets the following fields automatically, but can also be set manually if the property page provided is not used: • C/C++ — Additional Include Directories Set to "%MPI_ROOT%\include\[32|64]" • Linker — Additional Dependencies Set to libhpmpi32.lib or libhpmpi64.
Getting started Getting started using Windows > "%MPI_ROOT%\bin\mpirun" -cache -f appfile Password for MPI runs: When typing, the password is not echoed to the screen. The HP-MPI Remote Launch service must be registered and started on the remote nodes. (Refer to “Remote Launch service for Windows 2003/XP” on page 112 for details on how to register and start.) mpirun will authenticate with the service and create processes using your encrypted password to obtain network resources.
Table 2-3 Directory Structure for Windows (Continued)

Subdirectory    Contents
include\32      32-bit header files
include\64      64-bit header files
lib             HP-MPI libraries
man             HP-MPI man pages in HTML format
devtools        Windows HP-MPI services
licenses        Repository for HP-MPI license file
doc             Release notes, Debugging with HP-MPI Tutorial

Windows man pages

The man pages are located in the "%MPI_ROOT%\man\" subdirectory for Windows.
Getting started Getting started using Windows Table 2-4 Windows man page categories (Continued) Category Runtime man pages mpidebug.1 mpienv.1 mpimtsafe.1 mpirun.1 mpistdio.1 autodbl.1 Description Describes runtime utilities, environment variables, debugging, thread-safe and diagnostic libraries Licensing Policy for Windows HP-MPI for Windows uses FLEXlm licensing technology. A license is required to use HP-MPI for Windows. Licenses can be purchased from HP’s software depot at http://www.hp.
Getting started Getting started using Windows The hostname can be obtained using the control panel by following Control Panel -> System -> Computer Name tab. The default search path used to find an MPI license file is: "%MPI_ROOT%\licenses:.". If the license needs to be placed in another location which would not be found by the above search, the user may set the environment variable LM_LICENSE_FILE to explicitly specify the location of the license file. For more information, see http://licensing.hp.com.
3 Understanding HP-MPI This chapter provides information about the HP-MPI implementation of MPI.
Understanding HP-MPI • Compilation wrapper script utilities — Compiling applications — Fortran 90 — C command line basics for Windows systems — Fortran command line basics for Windows systems • C++ bindings (for HP-UX and Linux) — Non-g++ ABI compatible C++ compilers • Autodouble functionality • MPI functions • 64-bit support — HP-UX — Linux — Windows • Thread-compliant library • CPU binding • MPICH object compatibility for HP-UX and Linux • Examples of building on HP-UX and Linux • Runn
Understanding HP-MPI — mpijob — mpiclean • Interconnect support — Protocol-specific options and information — Interconnect selection examples • Running applications on Windows — Running HP-MPI from CCS — Running HP-MPI on Windows 2003/XP — Submitting jobs — Submitting jobs from the CCS GUI — Running HP-MPI from command line on CCS systems — Automatic job submittal — Running on CCS with an appfile — Running with a hostfile using CCS — Running with a hostlist using CCS — Building and running on a Windows
• Native language support
Understanding HP-MPI Compilation wrapper script utilities Compilation wrapper script utilities HP-MPI provides compilation utilities for the languages shown in Table 3-1. In general, if a particular compiler is desired, it is best to set the appropriate environment variable such as MPI_CC. Without such a setting, the utility script will search the PATH and a few default locations for a variety of possible compilers.
HP-MPI offers a -show option to the compiler wrappers. When compiling by hand, run mpicc -show and a line is printed showing exactly what the compilation command would have done (the actual build is skipped).

Fortran 90

In order to use the 'mpi' Fortran 90 module, you must create the module file by compiling /opt/hpmpi/include/64/module.F for 64-bit compilers. For 32-bit compilers, compile /opt/hpmpi/include/32/module.F.
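For example, a 64-bit module file can typically be generated with the compiler wrapper (a sketch; the flags your Fortran compiler needs to build and locate module files may differ):

% $MPI_ROOT/bin/mpif90 -c $MPI_ROOT/include/64/module.F

The resulting module can then be referenced from application code with a "use mpi" statement.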
Understanding HP-MPI Compilation wrapper script utilities C command line basics for Windows systems The utility "%MPI_ROOT%\bin\mpicc" is included to aid in command line compilation. To compile with this utility, set the MPI_CC environment variable to the path of the command line compiler you want to use. Specify -mpi32 or -mpi64 to indicate if you are compiling a 32- or 64-bit application. Specify the command line options that you would normally pass to the compiler on the mpicc command line.
Understanding HP-MPI Compilation wrapper script utilities To compile C code and link against HP-MPI without utilizing the mpicc tool, start a command prompt that has the appropriate environment settings loaded for your compiler, and use it with the compiler option: /I"%MPI_ROOT%\include\[32|64]" and the linker options: /libpath:"%MPI_ROOT%\lib" /subsystem:console ^ [libhpmpi64.lib|libhpmpi32.lib] Specify bitness where indicated. The above assumes the environment variable MPI_ROOT is set.
Understanding HP-MPI Compilation wrapper script utilities In order to construct the desired compilation command, the mpif90 utility needs to know what command line compiler is to be used, the bitness of the executable that compiler will produce, and the syntax accepted by the compiler. These can be controlled by environment variables or from the command line.
Understanding HP-MPI C++ bindings (for HP-UX and Linux) C++ bindings (for HP-UX and Linux) HP-MPI supports C++ bindings as described in the MPI-2 Standard. (See “Documentation resources” on page xxii.) If compiling and linking with the mpiCC command, no additional work is needed to include and use the bindings. You can include either mpi.h or mpiCC.h in your C++ source files. The bindings provided by HP-MPI are an interface class, calling the equivalent C bindings.
2. Compile and create the libmpiCC.a library.

% make CXX=pgCC MPI_ROOT=$MPI_ROOT
pgCC -c intercepts.cc -I/opt/hpmpi/include -DHPMP_BUILD_CXXBINDING
PGCC-W-0155-Nova_start() seen (intercepts.cc:33)
PGCC/x86 Linux/x86-64 6.2-3: compilation completed with warnings
pgCC -c mpicxx.cc -I/opt/hpmpi/include -DHPMP_BUILD_CXXBINDING
ar rcs libmpiCC.a intercepts.o mpicxx.o

3. Using a testcase, test that the library works as expected.
Autodouble functionality

HP-MPI supports Fortran programs compiled 64-bit with any of the following options (some of which are not supported on all Fortran compilers):

For HP-UX:

• +i8        Sets the default KIND of integer variables to 8.
• +r8        Sets the default size of REAL to 8 bytes.
• +autodbl4  Same as +i8 and +r8.
• +autodbl   Same as +i8 and +r8, and sets the default size of REAL to 16 bytes.

For Linux:

• -i8        Sets the default KIND of integer variables to 8.
Understanding HP-MPI Autodouble functionality NOTE This autodouble feature is supported in the regular and multithreaded MPI libraries, but not in the diagnostic library. For Windows: /integer_size:64 /4I8 -i8 /real_size:64 /4R8 /Qautodouble -r8 If these flags are given to the mpif90.bat script at link time, then the application will be linked enabling HP-MPI to interpret the datatype MPI_REAL as 8 bytes (etc. as appropriate) at runtime.
Understanding HP-MPI MPI functions MPI functions The following MPI functions accept user-defined functions and require special treatment when autodouble is used: • MPI_Op_create() • MPI_Errhandler_create() • MPI_Keyval_create() • MPI_Comm_create_errhandler() • MPI_Comm_create_keyval() • MPI_Win_create_errhandler() • MPI_Win_create_keyval() The user-defined callback passed to these functions should accept normal-sized arguments.
Understanding HP-MPI 64-bit support 64-bit support HP-MPI provides support for 64-bit libraries as shown in Table 3-4. More detailed information about HP-UX, Linux, and Windows systems is provided in the following sections.
When you use mpif90, compile with the +DD64 option to link the 64-bit version of the library. Otherwise, mpif90 links the 32-bit version. For example, to compile the program myprog.f90 and link the 64-bit library, enter:

% mpif90 +DD64 -o myprog myprog.f90

If you're using a third-party compiler on HP-UX, you must explicitly pass the -mpi32 or -mpi64 option to the compiler wrapper.
Understanding HP-MPI Thread-compliant library Thread-compliant library HP-MPI provides a thread-compliant library. By default, the non thread-compliant library (libmpi) is used when running HP-MPI jobs. Linking to the thread-compliant library is now required only for applications that have multiple threads making MPI calls simultaneously. In previous releases, linking to the thread-compliant library was required for multithreaded applications even if only one thread was making a MPI call at a time.
Understanding HP-MPI CPU binding CPU binding The mpirun option -cpu_bind binds a rank to an ldom to prevent a process from moving to a different ldom after startup. The binding occurs before the MPI application is executed. To accomplish this, a shared library is loaded at startup which does the following for each rank: • Spins for a short time in a tight loop to let the operating system distribute processes to CPUs evenly.
Understanding HP-MPI CPU binding ll — least loaded (ll) Bind each rank to the CPU it is currently running on. For NUMA-based systems, the following options are also available: ldom — Schedule ranks on ldoms according to packed rank id. cyclic — Cyclic dist on each ldom according to packed rank id. block — Block dist on each ldom according to packed rank id. rr — round robin (rr) Same as cyclic, but consider ldom load average. fill — Same as block, but consider ldom load average.
• MPI_FLUSH_FCACHE can be set to a threshold percentage of memory (0-100); if the file cache currently in use meets or exceeds this threshold, a flush attempt is initiated after binding, essentially before the user's MPI program starts. Refer to "MPI_FLUSH_FCACHE" on page 144 for more information.

• MPI_THREAD_AFFINITY controls thread affinity. Possible values are:

none — Schedule threads to run on all cores/ldoms. This is the default.
Understanding HP-MPI CPU binding -cpu_bind=MASK_CPU:1,4,6 # map rank 0 to cpu 0 (0001), rank 1 to cpu 2 (0100), rank 2 to cpu 1 or 2 (0110). A rank binding on a clustered system uses the number of ranks and the number of nodes combined with the rank count to determine the CPU binding. Cyclic or blocked launch is taken into account. On a cell-based system with multiple users, the LL strategy is recommended rather than RANK. LL allows the operating system to schedule the computational ranks.
Understanding HP-MPI MPICH object compatibility for HP-UX and Linux MPICH object compatibility for HP-UX and Linux The MPI standard specifies the function prototypes for the MPI functions, but does not specify the types of the MPI opaque objects like communicators or the values of the MPI constants. Hence an object file compiled using one vendor's MPI will generally not function correctly if linked against another vendor's MPI library. There are some cases where such compatibility would be desirable.
Understanding HP-MPI MPICH object compatibility for HP-UX and Linux libmpich.so then libmpi.so which are added by the mpicc.mpich compiler wrapper script. Thus libVT.a sees only the MPICH compatible interface to HP-MPI. In general, object files built with HP-MPI's MPICH mode can be used in an MPICH application, and conversely object files built under MPICH can be linked into an HP-MPI app using MPICH mode.
Understanding HP-MPI Examples of building on HP-UX and Linux Examples of building on HP-UX and Linux This example shows how to build hello_world.c prior to running. Step 1. Change to a writable directory that is visible from all hosts on which the job will run. Step 2. Compile the hello_world executable file. % $MPI_ROOT/bin/mpicc -o hello_world \ $MPI_ROOT/help/hello_world.c This example uses shared libraries, which is recommended.
Understanding HP-MPI Running applications on HP-UX and Linux Running applications on HP-UX and Linux This section introduces the methods to run your HP-MPI application on HP-UX and Linux. Using one of the mpirun methods is required. The examples below demonstrate six basic methods. Refer to “mpirun” on page 74 for all the mpirun command line options. HP-MPI includes -mpi32 and -mpi64 options for the launch utility mpirun on Opteron and Intel64.
Understanding HP-MPI Running applications on HP-UX and Linux Some features like mpirun -stdio processing are unavailable. Rank assignments within HP-MPI are determined by the way prun chooses mapping at runtime. The -np option is not allowed with -prun.
Understanding HP-MPI Running applications on HP-UX and Linux This uses 4 ranks on 4 nodes from the existing allocation. Note that we asked for block. n00 rank1 n00 rank2 n02 rank3 n03 rank4 • Use mpirun with -srun on HP XC clusters. For example, % $MPI_ROOT/bin/mpirun -srun \ Some features like mpirun -stdio processing are unavailable. The -np option is not allowed with -srun.
Understanding HP-MPI Running applications on HP-UX and Linux % srun -A -n4 This allocates 2 nodes with 2 ranks each and creates a subshell. % $MPI_ROOT/bin/mpirun -srun ./a.out This runs on the previously allocated 2 nodes cyclically. n00 rank1 n00 rank2 n01 rank3 n01 rank4 • Use XC LSF and HP-MPI HP-MPI jobs can be submitted using LSF. LSF uses the SLURM srun launching mechanism. Because of this, HP-MPI jobs need to specify the -srun option whether LSF is used or srun is used.
Understanding HP-MPI Running applications on HP-UX and Linux Including and excluding specific nodes can be accomplished by passing arguments to SLURM as well. For example, to make sure a job includes a specific node and excludes others, use something like the following. In this case, n9 is a required node and n10 is specifically excluded: % bsub -I -n8 -ext "SLURM[nodelist=n9;exclude=n10]" \ mpirun -srun ./hello_world Job <1892> is submitted to default queue . <
Understanding HP-MPI Running applications on HP-UX and Linux On non-XC systems, to invoke the Parallel Application Manager (PAM) feature of LSF for applications where all processes execute the same program on the same host: % bsub pam -mpi mpirun \ program In this case, LSF assigns a host to the MPI job. For example: % bsub pam -mpi $MPI_ROOT/bin/mpirun -np 4 compute_pi requests a host assignment from LSF and runs the compute_pi application with four processes.
Understanding HP-MPI Running applications on HP-UX and Linux Host assignments are returned for the two symbolic links voyager and enterprise. When requesting a host from LSF, you must ensure that the path to your executable file is accessible by all machines in the resource pool. More information about appfile runs This example teaches you to run the hello_world.c application that you built in Examples of building on HP-UX and Linux (above) using two hosts to achieve four-way parallelism.
Understanding HP-MPI Running applications on HP-UX and Linux The -f option specifies the filename that follows it is an appfile. mpirun parses the appfile, line by line, for the information to run the program. In this example, mpirun runs the hello_world program with two processes on the local machine, jawbone, and two processes on the remote machine, wizard, as dictated by the -np 2 option on each line of the appfile. Step 5. Analyze hello_world output.
Understanding HP-MPI Running applications on HP-UX and Linux The appfile for the example application contains the two lines shown below (refer to “Creating an appfile” on page 78 for details). -np 1 poisson_master -np 4 poisson_child To build and run the example application, use the following command sequence: % $MPI_ROOT/bin/mpicc -o poisson_master poisson_master.c % $MPI_ROOT/bin/mpicc -o poisson_child poisson_child.
Understanding HP-MPI Running applications on HP-UX and Linux % module unload hp-mpi unload the hp-mpi module Modules are only supported on Linux. NOTE On XC Linux, the HP-MPI module is named mpi/hp/default and can be abbreviated 'mpi'. Runtime utility commands HP-MPI provides a set of utility commands to supplement the MPI library routines. These commands are listed below and described in the following sections: • mpirun • mpirun.all (see restrictions under “mpirun.
Understanding HP-MPI Running applications on HP-UX and Linux • LSF on non-XC systems Single host execution • To run on a single host, the -np option to mpirun can be used. For example: % $MPI_ROOT/bin/mpirun -np 4 ./a.out will run 4 ranks on the local host. Appfile execution • For applications that consist of multiple programs or that run on multiple hosts, here is a list of the most common options.
Understanding HP-MPI Running applications on HP-UX and Linux • Use the -prun option for applications that run on the Quadrics Elan interconnect. When using the -prun option, mpirun sets environment variables and invokes prun utilities. Refer to “Runtime environment variables” on page 131 for more information about prun environment variables. The -prun argument to mpirun specifies that the prun command is to be used for launching. All arguments following -prun are passed unmodified to the prun command.
Understanding HP-MPI Running applications on HP-UX and Linux % $MPI_ROOT/bin/mpirun -srun \ The -np option is not allowed with srun. Some features like mpirun -stdio processing are unavailable. % $MPI_ROOT/bin/mpirun -srun -n 2 ./a.out launches a.out on two processors. % $MPI_ROOT/bin/mpirun -prot -srun -n 6 -N 6 ./a.out turns on the print protocol option (-prot is an mpirun option, and therefore is listed before -srun) and runs on 6 machines, one CPU per node.
Creating an appfile

The format of entries in an appfile is line oriented. Lines that end with the backslash (\) character are continued on the next line, forming a single logical line. A logical line starting with the pound (#) character is treated as a comment. Each program, along with its arguments, is listed on a separate logical line. The general form of an appfile entry is:

[-h remote_host] [-e var[=val] [...]] [-np #] program [args]
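For example, the following appfile runs two ranks of hello_world on each of two hosts and sets an environment variable for the ranks on the second host (hosta, hostb, and MY_ENV_VAR are placeholder names):

# four-way run across two hosts
-h hosta -np 2 /path/to/hello_world
-h hostb -np 2 -e MY_ENV_VAR=1 /path/to/hello_world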
• mpirun [mpirun_options] -f appfile \
    [-- extra_args_for_appfile]

• bsub [lsf_options] pam -mpi mpirun [mpirun_options] -f appfile \
    [-- extra_args_for_appfile]

The -- extra_args_for_appfile option is placed at the end of your command line, after appfile, to add options to each line of your appfile.

CAUTION Arguments placed after -- are treated as program arguments, and are not processed by mpirun.
Understanding HP-MPI Running applications on HP-UX and Linux For example, if your appfile contains -h voyager -np 10 send_receive -h enterprise -np 8 compute_pi HP-MPI assigns ranks 0 through 9 to the 10 processes running send_receive and ranks 10 through 17 to the 8 processes running compute_pi. You can use this sequential ordering of process ranks to your advantage when you optimize for performance on multihost systems.
Understanding HP-MPI Running applications on HP-UX and Linux This places ranks 0 and 2 on hosta and ranks 1 and 3 on hostb. This placement allows intrahost communication between ranks that are identified as communication hot spots. Intrahost communication yields better performance than interhost communication.
Understanding HP-MPI Running applications on HP-UX and Linux • Applications must be linked statically • Start-up may be slower • TotalView is unavailable to executables launched with mpirun.all • Files will be copied to a temporary directory on target hosts • The remote shell must accept stdin mpirun.all is not available on HP-MPI for Linux or Windows. mpiexec The MPI-2 standard defines mpiexec as a simple method to start MPI applications.
Understanding HP-MPI Running applications on HP-UX and Linux gives the same results as in the second example, but using the -configfile option (assuming the file cfile contains -n 4 ./myprog.x -host host2 -n 4 -wdir /some/path ./myprog.x) where mpiexec options are: -n maxprocs Create maxprocs MPI ranks on the specified host. -soft range-list Ignored in HP-MPI. -host host Specifies the host on which to start the ranks. -arch arch Ignored in HP-MPI. -wdir dir Working directory for the created ranks.
Understanding HP-MPI Running applications on HP-UX and Linux USER User name of the owner. NPROCS Number of processes. PROGNAME Program names used in the HP-MPI application. By default, your jobs are listed by job ID in increasing order. However, you can specify the -a and -u options to change the default behavior. An mpijob output using the -a and -u options is shown below listing jobs for all users and sorting them by user name.
Understanding HP-MPI Running applications on HP-UX and Linux where -help Prints usage information for the utility. -v Turns on verbose mode. -m Cleans up your shared-memory segments. -j id Kills the processes of job number id. You can specify multiple job IDs in a space-separated list. Obtain the job ID using the -j option when you invoke mpirun. You can only kill jobs that are your own.
Table 3-5 Interconnect command line options (Continued)

Command line option   Protocol specified              Applies to OS
-mx / -MX             MX—Myrinet                      Linux, Windows
-gm / -GM             GM—Myrinet                      Linux
-elan / -ELAN         Quadrics Elan3 or Elan4         Linux
-itapi / -ITAPI       ITAPI—InfiniBand                HP-UX
-ibal / -IBAL         IBAL—Windows IB Access Layer    Windows
-TCP                  TCP/IP                          All

The interconnect names used in MPI_IC_ORDER are like the command line options above, but without the dash.
Understanding HP-MPI Running applications on HP-UX and Linux The default value of MPI_IC_ORDER is specified there, along with a collection of variables of the form MPI_ICLIB_XXX__YYY MPI_ICMOD_XXX__YYY where XXX is one of the interconnects (IBV, VAPI, etc.) and YYY is an arbitrary suffix. The MPI_ICLIB_* variables specify names of libraries to be dlopened. The MPI_ICMOD_* variables specify regular expressions for names of modules to search for.
Understanding HP-MPI Running applications on HP-UX and Linux This means any of those three names will be accepted as evidence that VAPI is available. Each of the three strings individually is a regular expression that will be grepped for in the output from /sbin/lsmod. In many cases, if a system has a high-speed interconnect that is not found by HP-MPI due to changes in library names and locations or module names, the problem can be fixed by simple edits to the hpmpi.conf file.
Understanding HP-MPI Running applications on HP-UX and Linux The example above uses the max locked-in-memory address space in KB units. The recommendation is to set the value to half of the physical memory. Machines can have multiple InfiniBand cards. By default each HP-MPI rank selects one card for its communication, and the ranks cycle through the available cards on the system, so the first rank uses the first card, the second rank uses the second card, etc.
Understanding HP-MPI Running applications on HP-UX and Linux the application to hang due to lack of message progression while inside the Elan collective. This is actually a rather uncommon situation in real applications. But if such hangs are observed, then the use of Elan collectives can be disabled by using the environment variable MPI_USE_LIBELAN=0. ITAPI On HP-UX InfiniBand is available by using the ITAPI protocol, which requires MLOCK privileges.
Understanding HP-MPI Running applications on HP-UX and Linux % export MPIRUN_OPTIONS="-prot" % $MPI_ROOT/bin/mpirun -srun -n4 ./a.out The command line for the above will appear to mpirun as $MPI_ROOT/bin/mpirun -netaddr 192.168.1.0/24 -prot -srun -n4 ./a.out and the interconnect decision will look for IBV, then VAPI, etc. down to TCP/IP. If TCP/IP is chosen, it will use the 192.168.1.* subnet. If TCP/IP is desired on a machine where other protocols are available, the -TCP option can be used.
Understanding HP-MPI Running applications on HP-UX and Linux Host 0 -- ip 172.20.0.6 -- ranks 0 Host 1 -- ip 172.20.0.7 -- ranks 1 Host 2 -- ip 172.20.0.8 -- ranks 2 host | 0 1 2 ======|================ 0 : SHM TCP TCP 1 : TCP SHM TCP 2 : TCP TCP SHM Hello world! I'm 0 of 3 on opte6 Hello world! I'm 1 of 3 on opte7 Hello world! I'm 2 of 3 on opte8 • This uses TCP/IP over the Elan subnet using the -TCP option in combination with the -netaddr option for the Elan interface 172.22.x.
Understanding HP-MPI Running applications on HP-UX and Linux frame:0 TX packets:135950325 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:24498382931 (23363.4 Mb) TX bytes:29823673137 (28442.
Understanding HP-MPI Running applications on Windows Running applications on Windows Running HP-MPI from CCS There are two ways to run HP-MPI: command line and scheduler GUI. Both approaches can be used to access the functionality of the scheduler. The command line scheduler options are similar to the GUI options. The following instructions are in the context of the GUI, but equivalent command line options are also listed.
Understanding HP-MPI Running applications on Windows Figure 3-1 Job Allocation To run an MPI application, submit the mpirun command to the scheduler. HP-MPI uses the environment of the task and job where mpirun is executing to launch the required mpids that start the ranks. It is important that mpirun uses only a single processor for its task within the job so the resources can be used by the other processes within the MPI application.
Understanding HP-MPI Running applications on Windows In order for remote processes to have access to network resources (such as file shares), a domain password must be provided. This user password will be used to create processes on the remote notes. The password is SSPI encrypted before being sent across the network. Passwords are provided using the -pass or -cache flags. The user will be prompted for a password when the -pass or -cache flags are used. Cached passwords are stored in an encrypted format.
Understanding HP-MPI Running applications on Windows moving on to the next node. Only valid when the -ccp option is used. May not be used with the -f, -hostfile, or -hostlist options. -ccpcyclic Uses cyclic scheduling to place ranks on allocated nodes. Nodes will be processed in the order they were allocated by the scheduler, with one rank allocated per node on each cycle through the node list. The node list will be traversed as many times as is necessary to reach the total rank count requested.
Understanding HP-MPI Running applications on Windows From the GUI, use the Task Properties window, Environment tab to set the desired environment variable. Environment variables should be set on the mpirun task. Environment variables can also be set using the flag /env. For example: > job add JOBID /numprocessors:1/env: ^ "MPI_ROOT=\\shared\alternate\location" ... Submitting jobs from the CCS GUI To execute an HP-MPI job from the CCS GUI: 1. Bring up the Compute Cluster Job Manager.
Understanding HP-MPI Running applications on Windows NOTE Chapter 3 Examples were generated using CCP V1.0.
Understanding HP-MPI Running applications on Windows 4. On the Processors tab, select the total number of processors to allocate to the job (usually the number of ranks).
5. Select the Tasks tab and enter the 'mpirun' command as the task. Then highlight the task and select edit. In the above example, the following line has been added into the "Command Line:" by selecting the text box and clicking Add.

"%MPI_ROOT%\bin\mpirun.exe" -ccp -netaddr 172.16.150.0 ^
-TCP \\share\dir\pallas.exe

NOTE Unselecting "Use job's allocated processors" and setting the processors count to 1 now will eliminate Step 7.
Understanding HP-MPI Running applications on Windows 6. Specify stdout, stderr, and stdin (if necessary) on the Tasks tab. In the above example, the stderr and stdout files are specified using CCP environment variables defined by the job. This is an easy way to create output files unique for each task. \\share\dir\%CCP_JOBNAME%-%CCP_TASKCONTEXT%.
Understanding HP-MPI Running applications on Windows 7. On the Task Properties window, select the Processors tab and set to one processor for the mpirun task. NOTE In Step 5, you can unselect the "Use job’s allocated processors" box and set the processors count to 1. This eliminates setting the processor count in the task window as shown here in Step 7. 8. To set environment variables for the MPI job, use the Environment tab in the Task Properties window. 9. Select OK on the Task Properties window.
Understanding HP-MPI Running applications on Windows 10. If you want to restrict the run to a set of machines, on the Submit Job window select the Advanced tab and set the desired machines. NOTE This step is not necessary. The job will select from any available processors if this step is not done. 11. To run the job, select the Submit on the Submit Job window. For convenience, generic templates can be created and saved using Save As Template in the Submit Job window.
Understanding HP-MPI Running applications on Windows Running HP-MPI from command line on CCS systems To perform the same steps via command line, execute 3 commands: 1. job new [options] 2. job add JOBID mpirun [mpirun options] 3. job submit /id:JOBID For example: > job new /jobname:[example job]/numprocessors:12 ^ /projectname:HPMPI Job Queued, ID: 242 This will create a job resource and return a jobid, but not submit it. > job add 242 /stdout:"\\shared\dir\%CCP_JOBNAME%-^ %CCPTASKCONTEXT%.
Understanding HP-MPI Running applications on Windows -hostfile Indicates what nodes to use for the job. This filename should contain a list of nodes separated by spaces or new lines. -hostlist Indicates what nodes to use for the job. This hostlist may be delimited with spaces or commas. If spaces are used as delimiters anywhere in the hostlist, it may be necessary to place the entire hostlist inside quotes to prevent the command shell from interpreting it as multiple options.
The following example changes the directory to a share drive, and uses the current directory as the work directory for the submitted job:

C:\Documents and Settings\smith> s:
S:\> cd smith
S:\smith> "%MPI_ROOT%\bin\mpirun.exe" -ccp -np 6 ^
-hostlist mpiccp1,mpiccp2 HelloWorld.
Understanding HP-MPI Running applications on Windows Hello Hello Hello Hello world! world! world! world! I'm I'm I'm I'm 2 1 0 3 of of of of 4 4 4 4 on on on on n02 n01 n01 n02 Running on CCS with an appfile - advanced usage Another method for running with an appfile using CCS is to write a submission script that uses mpi_nodes.exe to dynamically generate an appfile based on the CCS allocation.
Understanding HP-MPI Running applications on Windows mpirun -ccp -f generated-appfile [other HP-MPI options] Refer to “More information about appfile runs” on page 71. Running with a hostfile using CCS Perform Steps 1 and 2 from “Building and running on a single host” on page 31. Step 1. Change to a writable directory on a mapped drive. The mapped drive should be to a shared folder for the cluster. Step 2. Create a file "hostfile" containing the list of nodes on which to run: n01 n02 n03 n04 Step 3.
Understanding HP-MPI Running applications on Windows Running with a hostlist using CCS Perform Steps 1 and 2 from “Building and running on a single host” on page 31. Step 1. Change to a writable directory on a mapped drive. The mapped drive should be to a shared folder for the cluster. Step 2. Submit the job to CCS, including the list of nodes on the command line. X:\Demo> "%MPI_ROOT%\bin\mpirun" -ccp -hostlist ^ n01,n02,n03,n04 -np 8 hello_world.
Understanding HP-MPI Running applications on Windows This example uses the -hostlist flag to indicate which nodes to run on. Also note that the MPI_WORKDIR will be set to your current directory. If this is not a network mapped drive, HP-MPI will not be able to convert this to a Universal Naming Convention (UNC) path, and you will need to specify the full UNC path for hello_world.exe. Step 2. Analyze hello_world output.
Understanding HP-MPI Running applications on Windows Step 2. Submit the job to CCS without adding any tasks. > job submit /id:4288 Job 4288 has been submitted. Step 3. Run the application(s) as a task in the allocation, optionally waiting for each to finish before starting the following one. > "%MPI_ROOT%\bin\mpirun" -ccp -ccpwait -jobid 4288 ^ \\node\share\hello_world.
Understanding HP-MPI Running applications on Windows To run the service manually, you must register and start the service. To register the service manually, run the service executable with the -i option. To start the service manually, run the service after it is installed with the -start option. The service executable is located at "%MPI_ROOT%\sbin\HPMPIWin32Service.exe". For example: C:\> "%MPI_ROOT%\sbin\HPMPIWin32Service.exe" -i Creating Event Log Key 'HPMPI'... Installing service 'HP-MPI SMPID'...
Understanding HP-MPI Running applications on Windows -? | -h | -help show command usage -s | -status show service status -k | -removeeventkey remove service event log key -r | -removeportkey remove default port key -t | -setportkey remove default port key -i | -install [] remove default port key NOTE -start start an installed service -stop stop an installed service -restart restart an installed service It is very important that all remote services use the same port.
Understanding HP-MPI Running applications on Windows -et Authenticates with the remote service and performs a simple echo test, returning the string. -sys Authenticates with the remote service and returns remote system information, including node name, CPU count, and username. -ps [username] Authenticates with the remote service, and lists processes running on the remote system. If a username is included, only that user’s processes are listed.
Understanding HP-MPI Running applications on Windows X:\Demo> "%MPI_ROOT%\bin\mpidiag" -s winbl16 -at connect() failed: 10061 Cannot establish connection with server. SendCmd(): send() sent a different number of bytes than expected: 10057 Here the machine cannot connect to the service on the remote machine.
rdpclip.exe     user1   2952   0.046875    5488
explorer.exe    user1   1468   1.640625   17532
reader_sl.exe   user1   2856   0.078125    3912
cmd.exe         user1    516   0.031250    2112
ccApp.exe       user1   2912   0.187500    7580
CMD Finished successfully.

Here you can see Pallas.exe was killed, and HP-MPI cleaned up the remaining HP-MPI processes.
Understanding HP-MPI Running applications on Windows dir.pl exportedpath.reg FileList.txt h1.xml HelloWorld-HP64-2960.1.err HelloWorld-HP64-2960.1.out HelloWorld-HP64-2961.1.err HelloWorld-HP64-2961.1.
MPI options

The following sections provide definitions of mpirun options and runtime environment variables.

mpirun options

This section describes the specific options included in all of the preceding examples.
Understanding HP-MPI MPI options -udapl/-UDAPL Explicit command line interconnect selection to use uDAPL. The lower and upper case options are analogous to the Elan options (explained above). Dynamic linking is required with uDAPL. Do not link -static. -psm/-PSM Explicit command line interconnect selection to use QLogic InfiniBand. The lower and upper case options are analogous to the Elan options (explained above). -mx/-MX Explicit command line interconnect selection to use Myrinet MX.
-commd    Routes all off-host communication through daemons rather than between processes.

Local host communication method

-intra=mix    Use shared memory for small messages. The default is 256k bytes, or what is set by MPI_RDMA_INTRALEN. For larger messages, the interconnect is used for better bandwidth. This option does not work with TCP, Elan, MX, or PSM. This same functionality is available through the environment variable MPI_INTRA which can be set to shm, nic, or mix.
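For example, either of the following invocations selects the mixed mode (sketches; the rank count and executable are illustrative):

% $MPI_ROOT/bin/mpirun -intra=mix -np 8 ./a.out
% $MPI_ROOT/bin/mpirun -e MPI_INTRA=mix -np 8 ./a.out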
Understanding HP-MPI MPI options representing how many bits are to be matched. So, for example, a mask of "11" would be equivalent to a mask of "255.224.0.0". If an IP and mask are given, then it is expected that one and only one IP will match at each lookup. An error or warning is printed as appropriate if there are no matches, or too many. If no mask is specified, then the IP matching will simply be done by the longest matching prefix.
Understanding HP-MPI MPI options option is not allowed with -prun. Any arguments on the mpirun command line that follow -prun are passed down to the prun command. Options for SLURM users -srun Enables start-up on XC clusters. Some features like mpirun -stdio processing are unavailable. The -np option is not allowed with -srun. Any arguments on the mpirun command line that follow -srun are passed to the srun command. Start-up directly from the srun command is not supported.
Understanding HP-MPI MPI options -np # Specifies the number of processes to run. Generally used in single host mode, but also valid with -hostfile, -hostlist, -lsb_hosts, and -lsb_mcpu_hosts. -stdio=[options] Specifies standard IO options. Refer to “External input and output” on page 210 for more information on standard IO, as well as a complete list of stdio options. This applies to appfiles only.
Understanding HP-MPI MPI options -ck Behaves like the -p option, but supports two additional checks of your MPI application; it checks if the specified host machines and programs are available, and also checks for access or permission problems. This option is only supported when using appfile mode. -d Debug mode. Prints additional information about application launch. -j Prints the HP-MPI job ID. -p Turns on pretend mode.
Understanding HP-MPI MPI options underlying interconnect does not use an RDMA transfer mechanism, or if the deferred deregistration is managed directly by the interconnect library. Occasionally deferred deregistration is incompatible with a particular application or negatively impacts performance. Use -ndd to disable this feature if necessary. Deferred deregistration of memory on RDMA networks is not supported on HP-MPI for Windows. -ndd Disable the use of deferred deregistration.
Understanding HP-MPI MPI options Environment control options -e var[=val] Sets the environment variable var for the program and gives it the value val if provided. Environment variable substitutions (for example, $FOO) are supported in the val argument. In order to append additional settings to an existing variable, %VAR can be used as in the example in “Setting remote environment variables” on page 79. -sp paths Sets the target shell PATH environment variable to paths.
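For example, using the %VAR form described above to append to an existing variable (a sketch; the library path and appfile name are illustrative, and the exact substitution behavior is the one described on page 79):

% $MPI_ROOT/bin/mpirun -e LD_LIBRARY_PATH=%LD_LIBRARY_PATH:/opt/mylib -f appfile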
Understanding HP-MPI MPI options -ccpin Assigns the job’s standard input file to the given filename when starting a job through the Windows CCP automatic job scheduler/launcher feature of HP-MPI. This flag has no effect if used for an existing CCP job. -ccpout Assigns the job’s standard output file to the given filename when starting a job through the Windows CCP automatic job scheduler/launcher feature of HP-MPI. This flag has no effect if used for an existing CCP job.
Understanding HP-MPI MPI options -headnode This option is used on Windows CCP to indicate the headnode to submit the mpirun job. If omitted, localhost is used. This option can only be used as a command line option when using the mpirun automatic submittal functionality. -hosts This option used on Windows CCP allows you to specify a node list to HP-MPI. Ranks are scheduled according to the host list. The nodes in the list must be in the job allocation or a scheduler error will occur.
-token -tg Authenticates to this token with the HP-MPI Remote Launch service. Some authentication packages require a token name. The default is no token.

-pass Prompts the user for his domain account password. Used to authenticate and create remote processes. A password is required to allow the remote process to access network resources (such as file shares). The password provided is encrypted using SSPI for authentication.
Understanding HP-MPI MPI options Runtime environment variables Environment variables are used to alter the way HP-MPI executes an application. The variable settings determine how an application behaves and how an application allocates internal resources at runtime. Many applications run without setting any environment variables.
Understanding HP-MPI MPI options The hpmpi.conf file search is performed in three places and each one is parsed, which allows the last one parsed to overwrite values set by the previous files. The three locations are: • $MPI_ROOT/etc/hpmpi.conf • /etc/hpmpi.conf • $HOME/.hpmpi.conf This feature can be used for any environment variable, and is most useful for interconnect specifications.
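A sketch of what such a file might contain, assuming simple variable = value lines (both the syntax shown and the values are illustrative, not a definitive format):

MPI_IC_ORDER = ibv:udapl:psm:mx:gm:elan:itapi:TCP
MPI_FLAGS = y40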
Understanding HP-MPI MPI options From the GUI, use the Task Properties window, Environment tab to set the desired environment variable. NOTE These environment variables should be set on the mpirun task. Environment variables can also be set using the flag /env. For example: > job add JOBID /numprocessors:1 /env:"MPI_ROOT=\\shared\alternate\location" ...
Understanding HP-MPI MPI options List of runtime environment variables The environment variables that affect the behavior of HP-MPI at runtime are described in the following sections categorized by the following functions: • General • CPU bind • Miscellaneous • Interconnect • InfiniBand • Memory usage • Connection related • RDMA • prun/srun • TCP • Elan • Rank ID All environment variables are listed below in alphabetical order.
Understanding HP-MPI MPI options Chapter 3 • MPI_IB_PKEY • MPI_IBV_QPPARAMS • MPI_IC_ORDER • MPI_IC_SUFFIXES • MPI_INSTR • MPI_LOCALIP • MPI_MAX_REMSH • MPI_MAX_WINDOW • MPI_MT_FLAGS • MPI_NETADDR • MPI_NO_MALLOCLIB • MPI_NOBACKTRACE • MPI_PAGE_ALIGN_MEM • MPI_PHYSICAL_MEMORY • MPI_PIN_PERCENTAGE • MPI_PRUNOPTIONS • MPI_RANKMEMSIZE • MPI_RDMA_INTRALEN • MPI_RDMA_MSGSIZE • MPI_RDMA_NENVELOPE • MPI_RDMA_NFRAGMENT • MPI_RDMA_NONESIDED • MPI_RDMA_NSRQRECV • MPI_R
Understanding HP-MPI MPI options • MPI_SPAWN_SRUNOPTIONS • MPI_SRUNOPTIONS • MPI_TCP_CORECVLIMIT • MPI_USE_LIBELAN • MPI_USE_LIBELAN_SUB • MPI_USE_MALLOPT_AVOID_MMAP • MPI_USEPRUN • MPI_USEPRUN_IGNORE_ARGS • MPI_USESRUN • MPI_USESRUN_IGNORE_ARGS • MPI_VAPI_QPPARAMS • MPI_WORKDIR • MPIRUN_OPTIONS • TOTALVIEW General environment variables MPIRUN_OPTIONS MPIRUN_OPTIONS is a mechanism for specifying additional command line arguments to mpirun.
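For example, to make every mpirun invocation print the interconnect protocol table (a sketch; -prot is described with the testing examples later in this guide):

% export MPIRUN_OPTIONS="-prot"
% $MPI_ROOT/bin/mpirun -np 4 ./a.out

The second command then behaves as if -prot had been given on its command line.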
Understanding HP-MPI MPI options MPI_FLAGS MPI_FLAGS modifies the general behavior of HP-MPI. The MPI_FLAGS syntax is a comma separated list as follows: [edde,][exdb,][egdb,][eadb,][ewdb,][l,][f,][i,] [s[a|p][#],][y[#],][o,][+E2,][C,][D,][E,][T,][z] where Chapter 3 edde Starts the application under the dde debugger. The debugger must be in the command search path. See “Debugging HP-MPI applications” on page 197 for more information. exdb Starts the application under the xdb debugger.
Understanding HP-MPI MPI options Setting the l option may decrease application performance. f Forces MPI errors to be fatal. Using the f option sets the MPI_ERRORS_ARE_FATAL error handler, ignoring the programmer’s choice of error handlers. This option can help you detect nondeterministic error problems in your code. If your code has a customized error handler that does not report that an MPI call failed, you will not know that a failure occurred.
Understanding HP-MPI MPI options Generating a UNIX signal introduces a performance penalty every time the application processes are interrupted. As a result, while some applications will benefit from it, others may experience a decrease in performance. As part of tuning the performance of an application, you can control the behavior of the heart-beat signals by changing their time period or by turning them off.
Understanding HP-MPI MPI options process relinquishes the CPU to other processes. Do this in your appfile, by setting y[#] to y0 for the process in question. This specifies zero milliseconds of spin (that is, immediate yield). If you are running an application stand-alone on a dedicated system, the default setting which is MPI_FLAGS=y allows MPI to busy spin, thereby improving latency. To avoid unnecessary CPU consumption when using more ranks than cores, consider using a setting such as MPI_FLAGS=y40.
Understanding HP-MPI MPI options /opt/mpi/bin/mpirun -np 16 -e MPI_FLAGS=o ./a.
(Example output: each of the 16 ranks prints its cartesian coordinates, its up/down/left/right neighbors, and the inbuf values received from those neighbors.)
Understanding HP-MPI MPI options D Dumps shared memory configuration information. Use this option to get shared memory values that are useful when you want to set the MPI_SHMEMCNTL flag. E[on|off] Function parameter error checking is turned off by default. It can be turned on by setting MPI_FLAGS=Eon. T Prints the user and system times for each MPI rank. z Enables zero-buffering mode. Set this flag to convert MPI_Send and MPI_Rsend calls in your code to MPI_Ssend, without rewriting your code.
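For example, to check an application for reliance on system buffering without editing its source (a sketch; the rank count and executable are illustrative):

% $MPI_ROOT/bin/mpirun -e MPI_FLAGS=z -np 4 ./a.out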
Understanding HP-MPI MPI options The single, fun, serial, and mult options are mutually exclusive. For example, if you specify the serial and mult options in MPI_MT_FLAGS, only the last option specified is processed (in this case, the mult option). If no runtime option is specified, the default is mult. For more information about using MPI_MT_FLAGS with the thread-compliant library, refer to “Thread-compliant library” on page 57. MPI_ROOT MPI_ROOT indicates the location of the HP-MPI tree.
Understanding HP-MPI MPI options MPI_FLUSH_FCACHE set to 0, fcache pct = 22, attempting to flush fcache on host opteron2 MPI_FLUSH_FCACHE set to 10, fcache pct = 3, no fcache flush required on host opteron2 Memory is allocated with mmap, then munmap'd afterwards. MP_GANG MP_GANG enables gang scheduling on HP-UX systems only. Gang scheduling improves the latency for synchronization by ensuring that all runable processes in a gang are scheduled simultaneously.
Understanding HP-MPI MPI options Miscellaneous environment variables MPI_2BCOPY Point-to-point bcopy() is disabled by setting MPI_2BCOPY to 1. Valid on PA-RISC only. MPI_MAX_WINDOW MPI_MAX_WINDOW is used for one-sided applications. It specifies the maximum number of windows a rank can have at the same time. It tells HP-MPI to allocate enough table entries. The default is 5. % export MPI_MAX_WINDOW=10 The above example allows 10 windows to be established for one-sided communication.
Understanding HP-MPI MPI options dumpf:prefix Dumps (formatted) all sent and received messages to prefix.msgs.rank where rank is the rank of a specific process. xNUM Defines a type-signature packing size. NUM is an unsigned integer that specifies the number of signature leaf elements. For programs with diverse derived datatypes the default value may be too small. If NUM is too small, the diagnostic library issues a warning during the MPI_Finalize operation.
Understanding HP-MPI MPI options prefix[:l][:nc][:off] where prefix Specifies the instrumentation output file prefix. The rank zero process writes the application’s measurement data to prefix.instr in ASCII. If the prefix does not represent an absolute pathname, the instrumentation output file is opened in the working directory of the rank zero process when MPI_Init is called. l Locks ranks to CPUs and uses the CPU’s cycle counter for less invasive timing. If used with gang scheduling, the :l is ignored.
Understanding HP-MPI MPI options TOTALVIEW When you use the TotalView debugger, HP-MPI uses your PATH variable to find TotalView. You can also set the absolute path and TotalView specific options in the TOTALVIEW environment variable. This environment variable is used by mpirun.
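For example (the TotalView installation path is an assumption):

% export TOTALVIEW=/opt/totalview/bin/totalview
% $MPI_ROOT/bin/mpirun -tv -np 2 a.out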
MPI_IC_SUFFIXES    When HP-MPI is determining the availability of a given interconnect on Linux, it tries to open libraries and find loaded modules based on a collection of variables of this form. This is described in more detail in “Interconnect support” on page 85. The use of interconnect environment variables MPI_ICLIB_ELAN, MPI_ICLIB_GM, MPI_ICLIB_ITAPI, MPI_ICLIB_MX, MPI_ICLIB_UDAPL, MPI_ICLIB_VAPI, and MPI_ICLIB_VAPIDIR has been deprecated.
% setenv MPI_IB_CARD_ORDER <card#>[:port#]

Where:
card# ranges from 0 to N-1
port# ranges from 0 to 1

Card:port can be a comma separated list which drives the assignment of ranks to cards and ports within the cards. Note that HP-MPI numbers the ports on a card from 0 to N-1, whereas utilities such as vstat display ports numbered 1 to N.

Examples:
To use the 2nd IB card:
% mpirun -e MPI_IB_CARD_ORDER=1 ...
Understanding HP-MPI MPI options By default, HP-MPI will search the unique full membership partition key from the port partition key table used. If no such pkey is found, an error is issued. If multiple pkeys are found, all such pkeys are printed and an error message is issued. If the environment variable MPI_IB_PKEY has been set to a value, either in hex or decimal, the value is treated as the pkey, and the pkey table is searched for the same pkey. If the pkey is not found, an error message is issued.
Understanding HP-MPI MPI options d RNR retry count before error is issued. Minimum is 0. Maximum is 7. Default is 7 (infinite). Memory usage environment variables MPI_GLOBMEMSIZE MPI_GLOBMEMSIZE=e Where e is the total bytes of shared memory of the job. If the job size is N, then each rank has e/N bytes of shared memory. 12.5% is used as generic. 87.5% is used as fragments. The only way to change this ratio is to use MPI_SHMEMCNTL.
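For example, to give a 4-rank job 256 Mbytes of shared memory in total, so that each rank gets 64 Mbytes (values are illustrative):

% export MPI_GLOBMEMSIZE=268435456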
from MPI_GLOBMEMSIZE, which is the total shared memory across all the ranks on the host. MPI_RANKMEMSIZE takes precedence over MPI_GLOBMEMSIZE if both are set. Both MPI_RANKMEMSIZE and MPI_GLOBMEMSIZE are mutually exclusive to MPI_SHMEMCNTL. If MPI_SHMEMCNTL is set, then the user cannot set the other two, and vice versa.

MPI_PIN_PERCENTAGE

MPI_PIN_PERCENTAGE communicates the maximum percentage of physical memory (see MPI_PHYSICAL_MEMORY) that can be pinned at any time.
Understanding HP-MPI MPI options generic Specifies the size in bytes of the generic-shared memory region. The default is 12.5 percent of shared memory after mailbox and envelope allocation. The generic region is typically used for collective communication. MPI_SHMEMCNTL=a,b,c where: a The number of envelopes for shared memory communication. The default is 8. b The bytes of shared memory to be used as fragments for messages.
Understanding HP-MPI MPI options variable MPI_MAX_REMSH. When the number of daemons required is greater than MPI_MAX_REMSH, mpirun will create only MPI_MAX_REMSH number of remote daemons directly. The directly created daemons will then create the remaining daemons using an n-ary tree, where n is the value of MPI_MAX_REMSH. Although this process is generally transparent to the user, the new startup requires that each node in the cluster is able to use the specified MPI_REMSH command (e.g.
Understanding HP-MPI MPI options An alternate remote execution tool, such as ssh, can be used on HP-UX by setting the environment variable MPI_REMSH to the name or full path of the tool to use: % export MPI_REMSH=ssh % $MPI_ROOT/bin/mpirun -f HP-MPI also supports setting MPI_REMSH using the -e option to mpirun: % $MPI_ROOT/bin/mpirun -e MPI_REMSH=ssh -f \ HP-MPI also supports setting MPI_REMSH to a command which includes additional arguments: % $MPI_ROOT/bin/mpirun -e
Understanding HP-MPI MPI options c Long message fragment size. If the message is greater than b, the message is fragmented into pieces up to c in length (or actual length if less than c) and the corresponding piece of the user’s buffer is pinned directly. The default is 4194304 bytes, but on Myrinet GM and IBAL the default is 1048576 bytes.
Understanding HP-MPI MPI options % setenv MPI_SRUNOPTIONS
Understanding HP-MPI MPI options % setenv MPI_SRUNOPTION --label % bsub -I -n4 -ext "SLURM[nodes=4]" \ $MPI_ROOT/bin/mpirun -stdio=bnone -f appfile -- pingpong Job <369848> is submitted to default queue . <> <> /opt/hpmpi/bin/mpirun unset MPI_USESRUN;/opt/hpmpi/bin/mpirun -srun ./pallas.x -npmin 4 pingpong MPI_PRUNOPTIONS Allows prun specific options to be added automatically to the mpirun command line.
Elan environment variables

MPI_USE_LIBELAN

By default when Elan is in use, the HP-MPI library uses Elan’s native collective operations for performing MPI_Bcast and MPI_Barrier operations on MPI_COMM_WORLD sized communicators. This behavior can be changed by setting MPI_USE_LIBELAN to “false” or “0”, in which case these operations will be implemented using point-to-point Elan messages.
Understanding HP-MPI MPI options MPI_LOCALNRANKS This is set to the number of ranks on the local host. MPI_LOCALRANKID This is set to the rank number of the current process relative to the local host (0.. MPI_LOCALNRANKS-1). Note that these settings are not available when running under srun or prun. However, similar information can be gathered from the variables set by those systems; such as SLURM_NPROCS and SLURM_PROCID.
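A minimal C sketch of how an application might read these variables, for example to make per-host decisions early in its startup (the printed use is illustrative):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* HP-MPI sets these in each rank's environment (not under srun/prun). */
    const char *id = getenv("MPI_LOCALRANKID");
    const char *n  = getenv("MPI_LOCALNRANKS");
    int local_id = id ? atoi(id) : 0;
    int local_n  = n  ? atoi(n)  : 1;

    printf("local rank %d of %d on this host\n", local_id, local_n);
    return 0;
}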
Understanding HP-MPI Scalability Scalability Interconnect support of MPI-2 functionality HP-MPI has been tested on InfiniBand clusters with as many as 2048 ranks using the VAPI protocol. Most HP-MPI features function in a scalable manner. However, a few are still subject to significant resource growth as the job size grows.
Understanding HP-MPI Scalability Resource usage of TCP/IP communication HP-MPI has also been tested on large Linux TCP/IP clusters with as many as 2048 ranks. Because each HP-MPI rank creates a socket connection to each other remote rank, the number of socket descriptors required increases with the number of ranks. On many Linux systems, this requires increasing the operating system limit on per-process and system-wide file descriptors.
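The exact commands are system specific; on a typical Linux node the limits might be raised along the following lines (values are illustrative, and changes of this kind should be coordinated with your system administrator):

# per-process descriptor limit for the current shell
ulimit -n 4096
# system-wide descriptor limit
echo 65536 > /proc/sys/fs/file-max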
Understanding HP-MPI Scalability To use daemon communication, specify the -commd option in the mpirun command. Once you have set the -commd option, you can use the MPI_COMMD environment variable to specify the number of shared-memory fragments used for inbound and outbound messages. Refer to “mpirun” on page 74 and “MPI_COMMD” on page 150 for more information. Daemon communication can result in lower application performance.
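If daemon communication is appropriate for your application anyway, the invocation itself is simple (a sketch; the appfile is whatever you already use):

% $MPI_ROOT/bin/mpirun -commd -f appfile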
Understanding HP-MPI Improved deregistration via ptmalloc (Linux only) Improved deregistration via ptmalloc (Linux only) To achieve the best performance on RDMA enabled interconnects like InfiniBand and Myrinet, the MPI library must be aware when memory is returned to the system in malloc() and free() calls. To enable more robust handling of that information, HP-MPI contains a copy of the ptmalloc implementation and uses it by default.
Understanding HP-MPI Signal Propagation (HP-UX and Linux only) Signal Propagation (HP-UX and Linux only) HP-MPI supports the propagation of signals from mpirun to application ranks.
Understanding HP-MPI Signal Propagation (HP-UX and Linux only) The HP-MPI library also changes the default signal handling properties of the application in a few specific cases. When using the -ha option to mpirun, SIGPIPE is ignored. When using MPI_FLAGS=U, an MPI signal handler for printing outstanding message status is established for SIGUSR1. When using MPI_FLAGS=sa, an MPI signal handler used for message propagation is established for SIGALRM.
Understanding HP-MPI Dynamic Processes Dynamic Processes HP-MPI provides support for dynamic process management, specifically the spawn, join, and connecting of new processes. MPI_Comm_spawn() starts MPI processes and establishes communication with them, returning an intercommunicator. MPI_Comm_spawn_multiple() starts several different binaries (or the same binary with different arguments), placing them in the same comm_world and returning an intercommunicator.
Understanding HP-MPI Dynamic Processes Keys interpreted in the info argument to the spawn calls: • host -- We accept standard host.domain strings and start the ranks on the specified host. Without this key, the default is to start on the same host as the root of the spawn call. • wdir -- We accept /some/directory strings. • path -- We accept /some/directory:/some/other/directory:.. A mechanism for setting arbitrary environment variables for the spawned ranks is not provided.
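A minimal C sketch of a spawn call that uses the host key (the worker binary name, target host, process count, and message exchange are illustrative; error handling is omitted):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm children;
    MPI_Info info;
    int myrank, msg = 42;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    MPI_Info_create(&info);
    MPI_Info_set(info, "host", "hostB");   /* start the spawned ranks on hostB */

    /* Collective over MPI_COMM_WORLD; returns an intercommunicator. */
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 2, info, 0,
                   MPI_COMM_WORLD, &children, MPI_ERRCODES_IGNORE);

    /* Ranks in the remote group are addressed directly through 'children';
       the spawned program would post the matching receive. */
    if (myrank == 0)
        MPI_Send(&msg, 1, MPI_INT, 0, 0, children);

    MPI_Info_free(&info);
    MPI_Comm_free(&children);
    MPI_Finalize();
    return 0;
}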
Understanding HP-MPI MPI-2 name publishing support MPI-2 name publishing support HP-MPI supports the MPI-2 dynamic process functionality MPI_Publish_name, MPI_Unpublish_name, MPI_Lookup_name, with the restriction that a separate nameserver must be started up on a server. The service can be started as: % $MPI_ROOT/bin/nameserver and it will print out an IP and port.
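A hedged sketch of the publish/lookup pairing itself (the service name and the surrounding accept/connect logic are illustrative):

#include <mpi.h>

void publish_side(void)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm client;

    MPI_Open_port(MPI_INFO_NULL, port);
    MPI_Publish_name("my_service", MPI_INFO_NULL, port);
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);
    /* ... exchange data over the intercommunicator 'client' ... */
    MPI_Unpublish_name("my_service", MPI_INFO_NULL, port);
    MPI_Close_port(port);
}

void lookup_side(void)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm server;

    MPI_Lookup_name("my_service", MPI_INFO_NULL, port);
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &server);
    /* ... exchange data over the intercommunicator 'server' ... */
}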
Understanding HP-MPI Native language support Native language support By default, diagnostic messages and other feedback from HP-MPI are provided in English. Support for other languages is available through the use of the Native Language Support (NLS) catalog and the internationalization environment variable NLSPATH. The default NLS search path for HP-MPI is $NLSPATH. Refer to the environ(5) man page for NLSPATH usage.
4 Profiling This chapter provides information about utilities you can use to analyze HP-MPI applications.
— Creating an instrumentation profile
— Viewing ASCII instrumentation data
• Using the profiling interface
— Fortran profiling interface
— C++ profiling interface
Profiling Using counter instrumentation Using counter instrumentation Counter instrumentation is a lightweight method for generating cumulative runtime statistics for your MPI applications. When you create an instrumentation profile, HP-MPI creates an output file in ASCII format. You can create instrumentation profiles for applications linked with the standard HP-MPI library. For applications linked with HP-MPI version 2.
Profiling Using counter instrumentation off Specifies counter instrumentation is initially turned off and only begins after all processes collectively call MPIHP_Trace_on. For example, to create an instrumentation profile for an executable called compute_pi: % $MPI_ROOT/bin/mpirun -i compute_pi -np 2 compute_pi This invocation creates an ASCII file named compute_pi.instr containing instrumentation profiling.
Profiling Using counter instrumentation the file as compute_pi, as you did when you created the instrumentation file in “Creating an instrumentation profile” on page 175, you would print compute_pi.instr. The ASCII instrumentation profile provides the version, the date your application ran, and summarizes information according to application, rank, and routines. Figure 4-1 on page 177 is an example of an ASCII instrumentation profile. The information available in the prefix.
-----------------------------------------------------------------
Rank    Proc Wall Time    User                 MPI
-----------------------------------------------------------------
   0    0.126335          0.008332(  6.60%)   0.118003( 93.40%)
   1    0.126355          0.008260(  6.54%)   0.118095( 93.46%)
-----------------------------------------------------------------
Rank    Proc MPI Time     Overhead             Blocking
-----------------------------------------------------------------
   0    0.118003          0.118003(100.00%)   0.000000(  0.00%)
   1    0.118095          0.
Profiling Using the profiling interface Using the profiling interface The MPI profiling interface provides a mechanism by which implementors of profiling tools can collect performance information without access to the underlying MPI implementation source code. Because HP-MPI provides several options for profiling your applications, you may not need the profiling interface to write your own routines. HP-MPI makes use of MPI profiling interface mechanisms to provide the diagnostic library for debugging.
int to, int tag, MPI_Comm comm)
{
    printf("Calling C MPI_Send to %d\n", to);
    return PMPI_Send(buf, count, type, to, tag, comm);
}

#pragma weak (mpi_send mpi_send)

void mpi_send(void *buf, int *count, int *type, int *to,
              int *tag, int *comm, int *ierr)
{
    printf("Calling Fortran MPI_Send to %d\n", *to);
    pmpi_send(buf, count, type, to, tag, comm, ierr);
}

C++ profiling interface

The HP-MPI C++ bindings are wrappers to the C calls.
5 Tuning This chapter provides information about tuning HP-MPI applications to improve performance.
Tuning • Message latency and bandwidth • Multiple network interfaces • Processor subscription • Processor locality • MPI routine selection • Multilevel parallelism • Coding considerations • Using HP Caliper The tuning information in this chapter improves application performance in most but not all cases. Use this information together with the output from counter instrumentation to determine which tuning changes are appropriate to improve your application’s performance.
Tuning Tunable parameters Tunable parameters HP-MPI provides a mix of command line options and environment variables that can be used to influence the behavior, and thus the performance of the library. The full list of command line options and environment variables are presented in the sections “mpirun options” and “Runtime environment variables” of Chapter 3.
Tuning Tunable parameters -intra The -intra command line option controls how messages are transferred to local processes and can impact performance when multiple ranks execute on a host. See “Local host communication method” on page 121 for more information. MPI_RDMA_INTRALEN, MPI_RDMA_MSGSIZE, MPI_RDMA_NENVELOPE These environment variables control various aspects of the way message traffic is handled on RDMA networks.
Tuning Message latency and bandwidth Message latency and bandwidth Latency is the time between the initiation of the data transfer in the sending process and the arrival of the first byte in the receiving process. Latency is often dependent upon the length of messages being sent. An application’s messaging behavior can vary greatly based upon whether a large number of small messages or a few large messages are sent. Message bandwidth is the reciprocal of the time needed to transfer a byte.
    }
    MPI_Waitall(size-1, requests, statuses);

Suppose that one of the iterations through MPI_Irecv does not complete before the next iteration of the loop. In this case, HP-MPI tries to progress both requests. This progression effort could continue to grow if succeeding iterations also do not complete immediately, resulting in a higher latency.
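For reference, a sketch of the kind of loop discussed above, in which one rank posts a nonblocking receive from every peer before waiting on all of them (buffer sizes, the tag, and the fixed maximum are illustrative, not the manual's original listing):

#include <mpi.h>

#define MAXPEERS 64
#define COUNT    1024
#define TAG      99

static double bufs[MAXPEERS][COUNT];

void post_receives(void)
{
    MPI_Request requests[MAXPEERS];
    MPI_Status  statuses[MAXPEERS];
    int i, size;

    MPI_Comm_size(MPI_COMM_WORLD, &size);
    for (i = 1; i < size; i++) {
        MPI_Irecv(bufs[i - 1], COUNT, MPI_DOUBLE, i, TAG,
                  MPI_COMM_WORLD, &requests[i - 1]);
    }
    MPI_Waitall(size - 1, requests, statuses);
}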
Tuning Multiple network interfaces Multiple network interfaces You can use multiple network interfaces for interhost communication while still having intrahost exchanges. In this case, the intrahost exchanges use shared memory between processes mapped to different same-host IP addresses. To use multiple network interfaces, you must specify which MPI processes are associated with each IP address in your appfile.
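A sketch of such an appfile for the two-host, two-interface layout shown in Figure 5-1 (the interface-specific host names and the executable are assumptions):

-h host0-ethernet0 -np 16 ./a.out
-h host0-ethernet1 -np 16 ./a.out
-h host1-ethernet0 -np 16 ./a.out
-h host1-ethernet1 -np 16 ./a.out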
Now, when the appfile is run, 32 processes run on host0 and 32 processes run on host1 as shown in Figure 5-1.

Figure 5-1    Multiple network interfaces
(Figure: ranks 0 - 15 and 16 - 31 run on host0, ranks 32 - 47 and 48 - 63 run on host1; interhost traffic is carried over ethernet0 and ethernet1, and same-host traffic uses shared memory.)

Host0 processes with rank 0 - 15 communicate with processes with rank 16 - 31 through shared memory (shmem).
Tuning Processor subscription Processor subscription Subscription refers to the match of processors and active processes on a host. Table 5-1 lists possible subscription types.
Processor locality

The mpirun option -cpu_bind binds a rank to a locality domain (ldom) to prevent a process from moving to a different ldom after startup. The binding occurs before the MPI application is executed. Similar results can be accomplished using "mpsched", but -cpu_bind has the advantage of a more load-based distribution, and works well in psets and across multiple machines.
Tuning MPI routine selection MPI routine selection To achieve the lowest message latencies and highest message bandwidths for point-to-point synchronous communications, use the MPI blocking routines MPI_Send and MPI_Recv. For asynchronous communications, use the MPI nonblocking routines MPI_Isend and MPI_Irecv. When using blocking routines, try to avoid pending requests.
Multilevel parallelism

There are several ways to improve the performance of applications that use multilevel parallelism:
• Use the MPI library to provide coarse-grained parallelism and a parallelizing compiler to provide fine-grained (that is, thread-based) parallelism. An appropriate mix of coarse- and fine-grained parallelism provides better overall performance.
• Assign only one multithreaded process per host when placing application processes.
Tuning Coding considerations Coding considerations The following are suggestions and items to consider when coding your MPI applications to improve performance: • Use HP-MPI collective routines instead of coding your own with point-to-point routines because HP-MPI’s collective routines are optimized to use shared memory where possible for performance. Use commutative MPI reduction operations. — Use the MPI predefined reduction operations whenever possible because they are optimized.
Tuning Using HP Caliper Using HP Caliper HP Caliper is a general-purpose performance analysis tool for applications, processes, and systems. HP Caliper allows you to understand the performance and execution of an application, and identify ways to improve runtime performance. NOTE When running HP-MPI applications under HP Caliper on Linux hosts, it may be necessary to set the HPMPI_NOPROPAGATE_SUSP environment variable to prevent application aborts.
6 Debugging and troubleshooting This chapter describes debugging and troubleshooting HP-MPI applications.
Debugging and troubleshooting — Using a single-process debugger — Using a multi-process debugger — Using the diagnostics library — Enhanced debugging output — Backtrace functionality — Debugging tutorial for Windows • Troubleshooting HP-MPI applications — Building on HP-UX and Linux — Building on Windows — Starting on HP-UX and Linux — Starting on Windows — Running on HP-UX, Linux, and Windows — Completing — Testing the network on HP-UX and Linux — Testing the network on Windows 196 Chapter 6
Debugging and troubleshooting Debugging HP-MPI applications Debugging HP-MPI applications HP-MPI allows you to use single-process debuggers to debug applications. The available debuggers are ADB, DDE, XDB, WDB, GDB, and PATHDB. You access these debuggers by setting options in the MPI_FLAGS environment variable. HP-MPI also supports the multithread, multiprocess debugger, TotalView on Linux and HP-UX for Itanium-based systems.
Debugging and troubleshooting Debugging HP-MPI applications Step 3. Run your application. When your application enters MPI_Init, HP-MPI starts one debugger session per process and each debugger session attaches to its process. Step 4. (Optional) Set a breakpoint anywhere following MPI_Init in each session. Step 5. Set the global variable MPI_DEBUG_CONT to 1 using each session’s command line interface or graphical user interface.
Debugging and troubleshooting Debugging HP-MPI applications Using a multi-process debugger HP-MPI supports the TotalView debugger on Linux and HP-UX for Itanium-based systems. The preferred method when you run TotalView with HP-MPI applications is to use the mpirun runtime utility command. For example, % $MPI_ROOT/bin/mpicc myprogram.c -g % $MPI_ROOT/bin/mpirun -tv -np 2 a.out In this example, myprogram.
Debugging and troubleshooting Debugging HP-MPI applications NOTE When attaching to a running MPI application that was started using appfiles, you should attach to the MPI daemon process to enable debugging of all the MPI ranks in the application. You can identify the daemon process as the one at the top of a hierarchy of MPI jobs (the daemon also usually has the lowest PID among the MPI jobs). Limitations The following limitations apply to using TotalView with HP-MPI applications: 1.
Debugging and troubleshooting Debugging HP-MPI applications my_appfile resides on the local machine (local_host) in the /work/mpiapps/total directory. To debug this application using TotalView (in this example, TotalView is invoked from the local machine): 1. Place your binary files in accessible locations.
Debugging and troubleshooting Debugging HP-MPI applications To disable these checks or enable formatted or unformatted printing of message data to a file, set the MPI_DLIB_FLAGS environment variable options appropriately. See “MPI_DLIB_FLAGS” on page 146 for more information. To use the diagnostics library, specify the -ldmpi option to the build scripts when you compile your application. This option is supported on HP-UX, Linux, and Windows. NOTE Using DLIB reduces application performance.
Debugging and troubleshooting Debugging HP-MPI applications Backtrace functionality HP-MPI handles several common termination signals on PA-RISC differently than earlier versions of HP-MPI.
Debugging and troubleshooting Troubleshooting HP-MPI applications Troubleshooting HP-MPI applications This section describes limitations in HP-MPI, some common difficulties you may face, and hints to help you overcome those difficulties and get the best performance from your HP-MPI applications. Check this information first when you troubleshoot problems.
Debugging and troubleshooting Troubleshooting HP-MPI applications Building on HP-UX and Linux You can solve most build-time problems by referring to the documentation for the compiler you are using. If you use your own build script, specify all necessary input libraries. To determine what libraries are needed, check the contents of the compilation utilities stored in the HP-MPI $MPI_ROOT/bin subdirectory. HP-MPI supports a 64-bit version of the MPI library on 64-bit platforms.
Debugging and troubleshooting Troubleshooting HP-MPI applications • Application binaries are available on the necessary remote hosts and are executable on those machines • The -sp option is passed to mpirun to set the target shell PATH environment variable. You can set this option in your appfile • The .cshrc file does not contain tty commands such as stty if you are using a /bin/csh-based shell Starting on Windows When starting multihost applications using Windows CCS: • Don't forget the -ccp flag.
Debugging and troubleshooting Troubleshooting HP-MPI applications • Application binaries are accessible from the remote nodes. If the binaries are located on a file share, use the UNC path (i.e. \\node\share\path) to refer to the binary, as these may not be properly mapped to a drive letter by the authenticated logon token. • If a password is not already cached, use the -cache option for your first run, or use the -pass option on all runs so the remote service can authenticate with network resources.
Debugging and troubleshooting Troubleshooting HP-MPI applications After shared-memory allocation is done, every MPI process attempts to attach to the shared-memory region of every other process residing on the same host. This shared memory allocation can fail if the system is not configured with enough available shared memory. Consult with your system administrator to change system settings. Also, MPI_GLOBMEMSIZE is available to control how much shared memory HP-MPI tries to allocate.
-h remote_host -e var=val [-np #] program [args]

Refer to “Creating an appfile” on page 78 for details. On XC systems, the environment variables are automatically propagated by srun. Environment variables can be established by the user with either setenv or export and are passed along to the MPI processes by the SLURM srun utility. Thus, on XC systems, it is not necessary to use the "-e name=value" approach to passing environment variables.
Debugging and troubleshooting Troubleshooting HP-MPI applications External input and output You can use stdin, stdout, and stderr in your applications to read and write data. By default, HP-MPI does not perform any processing on either stdin or stdout. The controlling tty determines stdio behavior in this case. This functionality is not provided when using -srun or -prun.
Debugging and troubleshooting Troubleshooting HP-MPI applications The following option is available for prepending: p Enables prepending. The global rank of the originating process is prepended to stdout and stderr output. Although this mode can be combined with any buffering mode, prepending makes the most sense with the modes b and bline.
Debugging and troubleshooting Troubleshooting HP-MPI applications Testing the network on HP-UX and Linux Often, clusters might have both ethernet and some form of higher speed interconnect such as InfiniBand. This section describes how to use the ping_pong_ring.c example program to confirm that you are able to run using the desired interconnect.
Debugging and troubleshooting Troubleshooting HP-MPI applications -h hostA -h hostB -h hostC ... -h hostZ -np 1 /path/to/pp.x -np 1 /path/to/pp.x -np 1 /path/to/pp.x -np 1 /path/to/pp.x Then run one of the following commands: % bsub pam -mpi $MPI_ROOT/bin/mpirun -prot -f appfile % bsub pam -mpi $MPI_ROOT/bin/mpirun -prot -f appfile \ -- 1000000 Note that when using LSF, the actual hostnames in the appfile are ignored.
Debugging and troubleshooting Troubleshooting HP-MPI applications [0:hostA] ping-pong 0 bytes 0 bytes: 4.24 usec/msg [1:hostB] ping-pong 0 bytes 0 bytes: 4.26 usec/msg [2:hostC] ping-pong 0 bytes 0 bytes: 4.26 usec/msg [3:hostD] ping-pong 0 bytes 0 bytes: 4.24 usec/msg ... ... ... ... The table showing SHM/VAPI is printed because of the -prot option (print protocol) specified in the mpirun command.
Debugging and troubleshooting Troubleshooting HP-MPI applications If the run aborts with some kind of error message, it's possible that HP-MPI incorrectly determined what interconnect was available. One common way to encounter this problem is to run a 32-bit application on a 64-bit machine like an Opteron or Intel64. It's not uncommon for some network vendors to provide only 64-bit libraries. HP-MPI determines which interconnect to use before it even knows the application's bitness.
Debugging and troubleshooting Troubleshooting HP-MPI applications -h hostA -np 1 \\node\share\path\to\pp.x -h hostB -np 1 \\node\share\path\to\pp.x -h hostC -np 1 \\node\share\path\to\pp.
Debugging and troubleshooting Troubleshooting HP-MPI applications Host 0 -- ip 172.16.159.3 -- ranks 0 Host 1 -- ip 172.16.150.23 -- ranks 1 Host 2 -- ip 172.16.150.24 -- ranks 2 host | 0 1 2 =====|================ 0 : SHM IBAL IBAL 1 : IBAL SHM IBAL 2 : IBAL IBAL SHM [0:mpiccp3] ping-pong 1000000 bytes ... 1000000 bytes: 1089.29 usec/msg 1000000 bytes: 918.03 MB/sec [1:mpiccp4] ping-pong 1000000 bytes ... 1000000 bytes: 1091.99 usec/msg 1000000 bytes: 915.76 MB/sec [2:mpiccp5] ping-pong 1000000 bytes ...
A Example applications This appendix provides example applications that supplement the conceptual information throughout the rest of this book about MPI in general and HP-MPI in particular. Table A-1 summarizes the examples in this appendix.
Example applications $MPI_ROOT/help subdirectory in your HP-MPI product. Table A-1 Example applications shipped with HP-MPI Name 220 Language Description -np argument send_receive.f Fortran 77 Illustrates a simple send and receive operation. -np >= 2 ping_pong.c C Measures the time it takes to send and receive data between two processes. -np = 2 ping_pong_ring.c C Confirms that an app can run using the desired interconnect -np >= 2 compute_pi.
Example applications Table A-1 Example applications shipped with HP-MPI (Continued) Name Language Description -np argument io.c C Writes data for each process to a separate file called iodatax, where x represents each process rank in turn. Then, the data in iodatax is read back. -np >= 1 thread_safe.c C Tracks the number of client requests handled and prints a log of the requests to stdout. -np >= 2 sort.C C++ Generates an array of random integers and sorts it. -np >= 1 compute_pi_spawn.
Example applications Step 2. Copy all files from the help directory to the current writable directory: % cp $MPI_ROOT/help/* . Step 3. Compile all the examples or a single example. To compile and run all the examples in the /help directory, at your prompt enter: % make To compile and run the thread_safe.
Example applications send_receive.f send_receive.f In this Fortran 77 example, process 0 sends an array to other processes in the default communicator MPI_COMM_WORLD. program main include 'mpif.h' integer rank, size, to, from, tag, count, i, ierr integer src, dest integer st_source, st_tag, st_count integer status(MPI_STATUS_SIZE) double precision data(100) call MPI_Init(ierr) call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr) call MPI_Comm_size(MPI_COMM_WORLD, size, ierr) if (size .eq.
Example applications send_receive.f st_source = status(MPI_SOURCE) st_tag = status(MPI_TAG) print *, 'Status info: source = ', st_source, + ' tag = ', st_tag, ' count = ', st_count print *, rank, ' received', (data(i),i=1,10) endif call MPI_Finalize(ierr) stop end send_receive output The output from running the send_receive executable is shown below. The application was run with -np = 10.
Example applications ping_pong.c ping_pong.c This C example is used as a performance benchmark to measure the amount of time it takes to send and receive data between two processes. The buffers are aligned and offset from each other to avoid cache conflicts caused by direct process-to-process byte-copy operations To run this example: • Define the CHECK macro to check data integrity. • Increase the number of bytes to at least twice the cache size to obtain representative bandwidth measurements.
Example applications ping_pong.c * Page-align buffers and displace them in the cache to avoid collisions. */ buf = (char *) malloc(nbytes + 524288 + (ALIGN - 1)); if (buf == 0) { MPI_Abort(MPI_COMM_WORLD, MPI_ERR_BUFFER); exit(1); } buf = (char *) ((((unsigned long) buf) + (ALIGN - 1)) & ~(ALIGN - 1)); if (rank == 1) buf += 524288; memset(buf, 0, nbytes); /* * Ping-pong. */ if (rank == 0) { printf("ping-pong %d bytes ...
Example applications ping_pong.c #ifdef CHECK for (j = 0; j < nbytes; j++) { if (buf[j] != (char) (j + i)) { printf("error: buf[%d] = %d, not %d\n",j, buf[j], j + i); break; } } #endif } stop = MPI_Wtime(); printf("%d bytes: %.2f usec/msg\n", nbytes, (stop - start) / NLOOPS / 2 * 1000000); if (nbytes > 0) { printf("%d bytes: %.2f MB/sec\n", nbytes,nbytes / 1000000.
Example applications ping_pong.c ping-pong 0 bytes ... 0 bytes: 1.
Example applications ping_pong_ring.c (HP-UX and Linux) ping_pong_ring.c (HP-UX and Linux) Often a cluster might have both regular ethernet and some form of higher speed interconnect such as InfiniBand. This section describes how to use the ping_pong_ring.c example program to confirm that you are able to run using the desired interconnect.
Example applications ping_pong_ring.
Example applications ping_pong_ring.
Example applications ping_pong_ring.c (HP-UX and Linux) if (size < 2) { if ( ! rank) printf("rping: must have two+ processes\n"); MPI_Finalize(); exit(0); } nbytes = (argc > 1) ? atoi(argv[1]) : 0; if (nbytes < 0) nbytes = 0; /* * Page-align buffers and displace them in the cache to avoid collisions.
Example applications ping_pong_ring.c (HP-UX and Linux) if (nbytes > 0) { sprintf(&str[strlen(str)], "%d bytes: %.2f MB/sec\n", nbytes, nbytes / (1024. * 1024.
Example applications ping_pong_ring.c (HP-UX and Linux) > > > > > > [1:hostB] ping-pong 0 bytes ... 0 bytes: 4.38 usec/msg [2:hostC] ping-pong 0 bytes ... 0 bytes: 4.42 usec/msg [3:hostD] ping-pong 0 bytes ... 0 bytes: 4.42 usec/msg The table showing SHM/VAPI is printed because of the "-prot" option (print protocol) specified in the mpirun command.
Example applications ping_pong_ring.c (HP-UX and Linux) % $MPI_ROOT/bin/mpirun -mpi32 ...
Example applications ping_pong_ring.c (Windows) ping_pong_ring.c (Windows) Often, clusters might have both ethernet and some form of higher-speed interconnect such as InfiniBand. This section describes how to use the ping_pong_ring.c example program to confirm that you are able to run using the desired interconnect. Running a test like this, especially on a new cluster, is useful to ensure that the appropriate network drivers are installed and that the network hardware is functioning properly.
Example applications ping_pong_ring.c (Windows) In each case above, the first mpirun uses 0 bytes per message and is checking latency. The second mpirun uses 1000000 bytes per message and is checking bandwidth. #include #include #ifndef _WIN32 #include #endif #include #include #include
Example applications ping_pong_ring.
Example applications ping_pong_ring.c (Windows) start = MPI_Wtime(); for (i = 0; i < NLOOPS; i++) { SETBUF(); SEND(1000 + i); CLRBUF(); RECV(2000 + i); CHKBUF(); } stop = MPI_Wtime(); sprintf(&str[strlen(str)], "%d bytes: %.2f usec/msg\n", nbytes, (stop - start) / NLOOPS / 2 * 1024 * 1024); if (nbytes > 0) { sprintf(&str[strlen(str)], "%d bytes: %.2f MB/sec\n", nbytes, nbytes / (1024. * 1024.
Example applications ping_pong_ring.c (Windows) Host 0 -- ip 172.16.159.3 -- ranks 0 Host 1 -- ip 172.16.150.23 -- ranks 1 Host 2 -- ip 172.16.150.24 -- ranks 2 host | 0 1 2 =====|================ 0 : SHM IBAL IBAL 1 : IBAL SHM IBAL 2 : IBAL IBAL SHM [0:mpiccp3] ping-pong 1000000 bytes ... 1000000 bytes: 1089.29 usec/msg 1000000 bytes: 918.03 MB/sec [1:mpiccp4] ping-pong 1000000 bytes ... 1000000 bytes: 1091.99 usec/msg 1000000 bytes: 915.76 MB/sec [2:mpiccp5] ping-pong 1000000 bytes ...
Example applications compute_pi.f compute_pi.f This Fortran 77 example computes pi by integrating f(x) = 4/(1 + x2). Each process: • Receives the number of intervals used in the approximation • Calculates the areas of its rectangles • Synchronizes for a global summation Process 0 prints the result of the calculation. program main include 'mpif.h' double precision PI25DT parameter(PI25DT = 3.
Example applications compute_pi.f C C Collect all the partial sums. C call MPI_REDUCE(mypi, pi, 1, MPI_DOUBLE_PRECISION, + MPI_SUM, 0, MPI_COMM_WORLD, ierr) C C Process 0 prints the result. C if (myid .eq. 0) then write(6, 97) pi, abs(pi - PI25DT) 97 format(' pi is approximately: ', F18.16, + ' Error is: ', F18.16) endif call MPI_FINALIZE(ierr) stop end compute_pi output The output from running the compute_pi executable is shown below. The application was run with -np = 10.
Example applications master_worker.f90 master_worker.f90 In this Fortran 90 example, a master task initiates (numtasks - 1) number of worker tasks. The master distributes an equal portion of an array to each worker task. Each worker task receives its portion of the array and sets the value of each element to (the element’s index + 1). Each worker task then sends its portion of the modified array back to the master. program array_manipulation include 'mpif.
Example applications master_worker.f90 call MPI_Recv(result(index), chunksize, MPI_REAL, source, 1, & MPI_COMM_WORLD, status, ierr) end do do i = 1, numworkers*chunksize if (result(i) .ne. (i+1)) then print *, 'element ', i, ' expecting ', (i+1), ' actual is ', result(i) numfail = numfail + 1 endif enddo if (numfail .ne.
Example applications cart.C cart.C This C++ program generates a virtual topology. The class Node represents a node in a 2-D torus. Each process is assigned a node or nothing. Each node holds integer data, and the shift operation exchanges the data with its neighbors. Thus, north-east-south-west shifting returns the initial data. #include #include
Example applications cart.
Example applications cart.C MPI_Barrier(comm); // Each process prints its profile printf("global rank %d: cartesian rank %d, coordinate (%d, %d)\n", grank, lrank, coords[0], coords[1]); } // Program body // // Define a torus topology and demonstrate shift operations. // void body(void) { Node node; node.profile(); node.print(); node.shift(NORTH); node.print(); node.shift(EAST); node.print(); node.shift(SOUTH); node.print(); node.shift(WEST); node.
Example applications cart.
Example applications communicator.c communicator.c This C example shows how to make a copy of the default communicator MPI_COMM_WORLD using MPI_Comm_dup. #include #include #include
Example applications communicator.c communicator output The output from running the communicator executable is shown below. The application was run with -np = 2.
Example applications multi_par.f multi_par.f The Alternating Direction Iterative (ADI) method is often used to solve differential equations. In this example, multi_par.f, a compiler that supports OPENMP directives is required in order to achieve multi-level parallelism. multi_par.
Example applications multi_par.f partitioning used for the parallelization of the first outer-loop can accommodate the other of the second outer-loop. The partitioning of the array is shown in Figure A-1. Figure A-1 Array partitioning column block 2 0 1 3 0 0 1 2 3 1 3 0 1 2 2 2 3 0 1 3 1 2 3 0 row block In this sample program, the rank n process is assigned to the partition n at distribution initialization.
Example applications multi_par.f The second outer-loop (the summations in column-wise fashion) is done in the same manner. For example, at the beginning of the second step for the column-wise summations, the rank 2 process receives data from the rank 1 process that computed the [3,0] block. The rank 2 process also sends the last column of the [2,0] block to the rank 3 process. Note that each process keeps the same blocks for both of the outer-loop computations.
Example applications multi_par.
Example applications multi_par.f c the c refer c c c c indices specify a portion (the j'th portion) of a row, and datatype cdtype(j) is created as an MPI vector datatype to to the j'th portion of a row. Note this a vector datatype because adjacent elements in a row are actually spaced nrow elements apart in memory.
Example applications multi_par.f c Scatter initial data with using derived datatypes defined above c for the partitioning. MPI_send() and MPI_recv() will find out the c layout of the data from those datatypes. This saves application c programs to manually pack/unpack the data, and more importantly, c gives opportunities to the MPI system for optimal communication c strategies. c if (comm_rank.eq.
Example applications multi_par.f c the c c block next to the computed block. Receive the last row of block that the next block being computed depends on. nrb=rb+1 ncb=mod(nrb+comm_rank,comm_size) call mpi_sendrecv(array(rbe(rb),cbs(cb)),1,cdtype(cb),dest, * 0,array(rbs(nrb)-1,cbs(ncb)),1,cdtype(ncb),src,0, * mpi_comm_world,mstat,ierr) endif enddo c c Sum up in each row. c The same logic as the loop above except rows and columns are c switched.
Example applications multi_par.f c c c c c c c c c c c Dump to a file if (comm_rank.eq.0) then print*,'Dumping to adi.out...' open(8,file='adi.
Example applications multi_par.f c************************************************************* ********* subroutine compcolumn(nrow,ncol,array,rbs,rbe,cbs,cbe) c c This subroutine: c does summations of columns in a thread.
Example applications multi_par.
Example applications multi_par.f integer nrow,ncol double precision array(nrow,ncol) c do j=1,ncol do i=1,nrow array(i,j)=(j-1.0)*ncol+i enddo enddo end multi_par.f output The output from running the multi_par.f executable is shown below. The application was run with -np1. Initializing 1000 x 1000 array... Start computation Computation took 4.
Example applications io.c io.c In this C example, each process writes to a separate file called iodatax, where x represents each process rank in turn. Then, the data in iodatax is read back. #include #include #include #include #define SIZE (65536) #define FILENAME "iodata" /*Each process writes to separate files and reads them back. The file name is “iodata” and the process rank is appended to it.
Example applications io.
Example applications thread_safe.c thread_safe.c In this C example, N clients loop MAX_WORK times. As part of a single work item, a client must request service from one of Nservers at random. Each server keeps a count of the requests handled and prints a log of the requests to stdout. Once all the clients are done working, the servers are shutdown. #include #include #include
Example applications thread_safe.
Example applications thread_safe.c client(rank, size); shutdown_servers(rank); rtn = pthread_join(mtid, 0); if (rtn != 0) { printf("pthread_join failed\n"); MPI_Abort(MPI_COMM_WORLD, MPI_ERR_OTHER); } MPI_Finalize(); exit(0); } thread_safe output The output from running the thread_safe executable is shown below. The application was run with -np = 2.
Example applications thread_safe.
Example applications sort.C sort.C This program does a simple integer sort in parallel. The sort input is built using the "rand" random number generator. The program is self-checking and can run with any number of ranks. #define NUM_OF_ENTRIES_PER_RANK100 #include #include #include #include #include #include #include // // Class declarations.
Example applications sort.
Example applications sort.C *numOfEntries_p = numOfEntries; // // Add in the left and right shadow entries. // numOfEntries += 2; // // Allocate space for the entries and use rand to initialize the values. // entries = new Entry *[numOfEntries]; for(int i = 1; i < numOfEntries-1; i++) { entries[i] = new Entry; *(entries[i]) = (rand()%1000) * ((rand()%2 == 0)? 1 : -1); } // // Initialize the shadow entries.
Example applications sort.C //BlockOfEntries::singleStepOddEntries // //Function: - Adjust the odd entries. // void BlockOfEntries::singleStepOddEntries() { for(int i = 0; i < numOfEntries-1; i += 2) { if (*(entries[i]) > *(entries[i+1]) ) { Entry *temp = entries[i+1]; entries[i+1] = entries[i]; entries[i] = temp; } } } // //BlockOfEntries::singleStepEvenEntries // //Function: - Adjust the even entries.
Example applications sort.
Example applications sort.C // // Have each rank build its block of entries for the global sort. // int numEntries; BlockOfEntries *aBlock = new BlockOfEntries(&numEntries, myRank); // // Compute the total number of entries and sort them. // numEntries *= numRanks; for(int j = 0; j < numEntries / 2; j++) { // // Synchronize and then update the shadow entries.
Example applications sort.C MPI_Wait(&sortRequest, &status); aBlock->setRightShadow(Entry(recvVal)); } // // Everyone except 0 posts for the left's leftShadow. // if (myRank != 0) { MPI_Irecv(&recvVal, 1, MPI_INT, myRank-1, MPI_ANY_TAG, MPI_COMM_WORLD, &sortRequest); } // // Everyone except numRanks-1 sends its rightEnd right. // if (myRank != (numRanks-1)) { sendVal = aBlock->getRightEnd().
Example applications sort.C } else { int recvVal; MPI_Status Status; MPI_Recv(&recvVal, 1, MPI_INT, myRank-1, 2, MPI_COMM_WORLD, &Status); aBlock->printEntries(myRank); aBlock->verifyEntries(myRank, recvVal); if (myRank != numRanks-1) { recvVal = aBlock->getRightEnd().getValue(); MPI_Send(&recvVal, 1, MPI_INT, myRank+1, 2, MPI_COMM_WORLD); } } delete aBlock; MPI_Finalize(); exit(0); } sort.C output The output from running the sort executable is shown below. The application was run with -np 4.
Example applications sort.C ... 383 383 386 386 Rank 3 386 393 393 397 ...
Example applications compute_pi_spawn.f compute_pi_spawn.f This example computes pi by integrating f(x) = 4/(1 + x**2) using MPI_Spawn. It starts with one process and spawns a new world that does the computation along with the original process. Each newly spawned process receives the number of intervals used, calculates the areas of its rectangles, and synchronizes for a global summation. The original process 0 prints the result and the time it took. program mainprog include 'mpif.
Example applications compute_pi_spawn.f C Calculate the interval size. C h = 1.0d0 / n sum = 0.0d0 do 20 i = myid + 1, n, numprocs x = h * (dble(i) - 0.5d0) sum = sum + f(x) 20 continue mypi = h * sum C C Collect all the partial sums. C call MPI_REDUCE(mypi, pi, 1, MPI_DOUBLE_PRECISION, + MPI_SUM, 0, mergedcomm, ierr) C C Process 0 prints the result. C if (myid .eq. 0) then write(6, 97) pi, abs(pi - PI25DT) 97 format(' pi is approximately: ', F18.16, + ' Error is: ', F18.
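To make the spawn-and-merge step explicit, here is a minimal C sketch of the pattern compute_pi_spawn.f is built on. It is not the shipped example: the worker count of 3, the use of argv[0] as the spawned command, and the choice of MPI_COMM_WORLD as the spawning communicator are illustrative assumptions. The merged intracommunicator plays the role of mergedcomm above, so a reduction of partial sums can cover the original process and all spawned processes.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm parent, intercomm, mergedcomm;
    int numprocs, myid;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        /* Original process: spawn additional copies of this program
           (3 is a hypothetical count) and merge the resulting
           intercommunicator so that parent and children share one
           intracommunicator. */
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 3, MPI_INFO_NULL, 0,
                       MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);
        MPI_Intercomm_merge(intercomm, 0, &mergedcomm);
    } else {
        /* Spawned process: merge with the parent's side. */
        MPI_Intercomm_merge(parent, 1, &mergedcomm);
    }

    /* All processes now compute over the merged communicator, for
       example broadcasting the interval count and reducing the partial
       sums as in the Fortran version above. */
    MPI_Comm_size(mergedcomm, &numprocs);
    MPI_Comm_rank(mergedcomm, &myid);
    printf("Rank %d of %d in merged communicator\n", myid, numprocs);

    MPI_Comm_free(&mergedcomm);
    MPI_Finalize();
    return 0;
}

Passing 0 as the "high" argument on the parent side keeps the original process at rank 0 of the merged communicator, which matches the Fortran example's convention of having process 0 print the result.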
B Appendix B Standard-flexibility in HP-MPI 279
Standard-flexibility in HP-MPI HP-MPI implementation of standard flexibility HP-MPI implementation of standard flexibility HP-MPI contains a full MPI-2 standard implementation. There are items in the MPI standard for which the standard allows flexibility in implementation. This appendix identifies HP-MPI’s implementation of many of these standard-flexible issues. Table B-1 displays references to sections in the MPI standard that identify flexibility in the implementation of an issue.
Standard-flexibility in HP-MPI HP-MPI implementation of standard flexibility Table B-1 Appendix B HP-MPI implementation of standard-flexible issues (Continued) Reference in MPI standard HP-MPI’s implementation MPI does not provide mechanisms to specify the initial allocation of processes to an MPI computation and their initial binding to physical processors. See MPI-1.2 Section 2.6.
Standard-flexibility in HP-MPI HP-MPI implementation of standard flexibility Table B-1 282 HP-MPI implementation of standard-flexible issues (Continued) Reference in MPI standard HP-MPI’s implementation Vendors may write optimized collective routines matched to their architectures, or a complete library of collective communication routines can be written using MPI point-to-point routines and a few auxiliary functions. See MPI-1.2 Section 4.1.
Standard-flexibility in HP-MPI HP-MPI implementation of standard flexibility Table B-1 HP-MPI implementation of standard-flexible issues (Continued) Reference in MPI standard The format for specifying the filename in MPI_FILE_OPEN is implementation dependent. An implementation may require that filename include a string specifying additional information about the file. See MPI-2.0 Section 9.2.1.
C Appendix C mpirun using implied prun or srun 285
mpirun using implied prun or srun Implied prun Implied prun HP-MPI provides an implied prun mode. The implied prun mode allows the user to omit the -prun argument from the mpirun command line with the use of the environment variable MPI_USEPRUN. Set the environment variable: % setenv MPI_USEPRUN 1 HP-MPI will insert the -prun argument.
mpirun using implied prun or srun Implied prun • -n, --ntasks=ntasks Specify the number of processes to run. • -N, --nodes=nnodes Request that nnodes nodes be allocated to this job. • -m, --distribution=(block|cyclic) Specify an alternate distribution method for remote processes. • -w, --nodelist=host1,host2,... or filename Request a specific list of hosts. • -x, --exclude=host1,host2,... or filename Request that a specific list of hosts not be included in the resources allocated to this job.
mpirun using implied prun or srun Implied srun Implied srun HP-MPI also provides an implied srun mode. The implied srun mode allows the user to omit the -srun argument from the mpirun command line with the use of the environment variable MPI_USESRUN. Set the environment variable: % setenv MPI_USESRUN 1 HP-MPI will insert the -srun argument.
mpirun using implied prun or srun Implied srun
% setenv MPI_USESRUN_IGNORE_ARGS -stdio=bnone
% setenv MPI_USESRUN 1
% setenv MPI_SRUNOPTION --label
% bsub -I -n4 -ext "SLURM[nodes=4]" \
    $MPI_ROOT/bin/mpirun -stdio=bnone -f appfile \
    -- pingpong
Job <369848> is submitted to default queue . <> <>
/opt/hpmpi/bin/mpirun
unset MPI_USESRUN;/opt/hpmpi/bin/mpirun -srun ./pallas.
mpirun using implied prun or srun Implied srun
Here is the appfile:
-np 1 -h foo -e MPI_FLAGS=T ./pallas.x -npmin 4
% setenv MPI_SRUNOPTION "--label"
These are required to use the new feature:
% setenv MPI_USESRUN 1
% bsub -I -n4 $MPI_ROOT/bin/mpirun -f appfile -- sendrecv
Job <2547> is submitted to default queue . <> <>
0: #---------------------------------------------------
0: # PALLAS MPI Benchmark Suite V2.
mpirun using implied prun or srun Implied srun
0:      16384   1000    293.30    293.64    293.49
0:      32768   1000    714.84    715.38    715.05
0:      65536    640   1215.00   1216.45   1215.55
0:     131072    320   2397.04   2401.92   2399.05
0:     262144    160   4805.58   4826.59   4815.46
0:     524288     80   9978.35  10017.87   9996.31
0:    1048576     40  19612.90  19748.18  19680.29
0:    2097152     20  36719.25  37786.09  37253.01
0:    4194304     10  67806.51  67920.30  67873.05
0:    8388608      5  135050.
D Frequently asked questions This section answers frequently asked questions about HP-MPI, organized into the following categories:
• General
• Installation and setup
• Building applications
• Performance problems
• Network specific
• Windows specific
Frequently asked questions General General QUESTION: Where can I get the latest version of HP-MPI? ANSWER: External customers can go to www.hp.com/go/mpi. HP Independent Software Vendors (ISVs) can go to http://www.software.hp.com/kiosk. QUESTION: Where can I get a license for HP-MPI? ANSWER: First, determine if a license is necessary. A license is not necessary if you are running on HP-UX or an HP XC system. Licenses are not necessary for supported ISV applications.
Frequently asked questions General • TNO (TASS) • UGS (NX Nastran) • University of Birmingham (Molpro) • University of Texas (AMLS) You must have a sufficiently new version of these applications to ensure the ISV licensing mechanism is used. In all other cases, a license is required. If you do need a license, then follow the instructions you received with your purchase. Go to http://licensing.hp.com and enter the information received with your order.
Frequently asked questions General ANSWER: MPI_ROOT is an environment variable that HP-MPI (mpirun) uses to determine where HP-MPI is installed and therefore which executables and libraries to use. It is particularly helpful when you have multiple versions of HP-MPI installed on a system. A typical invocation of HP-MPI on systems with multiple MPI_ROOTs installed is: % setenv MPI_ROOT /scratch/test-hp-mpi-2.2.5/ % $MPI_ROOT/bin/mpirun ... Or % export MPI_ROOT=/scratch/test-hp-mpi-2.2.
Frequently asked questions Installation and setup Installation and setup QUESTION: Do I need a license to run HP-MPI? ANSWER: A license is not necessary if you are running on HP-UX or an HP XC system. Licenses are not necessary for supported ISV applications. See “General” on page 295 for a list of currently supported ISV applications. In all other cases, a license is required. If you do need a license, then follow the instructions you received with your purchase. Go to http://licensing.hp.
Frequently asked questions Installation and setup QUESTION: Can I have multiple versions of HP-MPI installed, and how can I switch between them? ANSWER: You can install multiple versions of HP-MPI, and they can be installed anywhere, as long as each version is in the same place on every host you plan to run on. You can switch between them by setting MPI_ROOT. See “General” on page 295 for more information on MPI_ROOT.
Frequently asked questions Building applications Building applications QUESTION: Which compilers does HP-MPI work with? ANSWER: HP-MPI works well with all compilers. We explicitly test with gcc, Intel, PathScale, and Portland, as well as HP-UX compilers. HP-MPI strives not to introduce compiler dependencies. For Windows, see the Windows FAQ section. QUESTION: What MPI libraries do I need to link with when I build? ANSWER: We recommend using the mpicc, mpif90, and mpif77 scripts in $MPI_ROOT/bin to build.
Frequently asked questions Building applications a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2, dynamically linked (uses shared libraries), not stripped For more information on running 32-bit applications, see “Network specific” on page 304. For Windows, see the Windows FAQ section.
Frequently asked questions Performance problems Performance problems QUESTION: How does HP-MPI clean up when something goes wrong? ANSWER: HP-MPI uses several mechanisms to clean up job files. Note that all processes in your application must call MPI_Finalize. • When a correct HP-MPI program (that is, one that calls MPI_Finalize) exits successfully, the root host deletes the job file. • If you use mpirun, it deletes the job file when the application terminates, whether successfully or not.
Frequently asked questions Performance problems If your application still hangs after you convert MPI_Send and MPI_Rsend calls to MPI_Ssend, you know that your code is written to depend on buffering. You should rewrite it so that MPI_Send and MPI_Rsend do not depend on buffering. Alternatively, use non-blocking communication calls to initiate send operations.
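As an illustration of that last suggestion (a sketch, not taken from any shipped example; the message length COUNT and the pairing of ranks with rank ^ 1 are arbitrary assumptions), the following head-to-head exchange would be prone to deadlock if both sides called a blocking MPI_Send first, but it is safe when both transfers are initiated with non-blocking calls and completed with MPI_Waitall:

#include <stdio.h>
#include <mpi.h>

#define COUNT 100000   /* large enough that MPI_Send may not be buffered */

int main(int argc, char *argv[])
{
    static int sendbuf[COUNT], recvbuf[COUNT];
    int rank, size, peer;
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    peer = rank ^ 1;                /* pair ranks 0-1, 2-3, ... */

    if (peer < size) {
        /* Deadlock-prone version (both sides send first):
         *     MPI_Send(sendbuf, COUNT, MPI_INT, peer, 0, MPI_COMM_WORLD);
         *     MPI_Recv(recvbuf, COUNT, MPI_INT, peer, 0, MPI_COMM_WORLD,
         *              MPI_STATUS_IGNORE);
         * Safe version: initiate both transfers, then wait for both. */
        MPI_Irecv(recvbuf, COUNT, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(sendbuf, COUNT, MPI_INT, peer, 0, MPI_COMM_WORLD, &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    }

    MPI_Finalize();
    return 0;
}

For this particular exchange pattern, MPI_Sendrecv is an equally valid fix.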
Frequently asked questions Network specific Network specific QUESTION: What extra software do I need to allow HP-MPI to run on my InfiniBand hardware? ANSWER: On HP-UX, download the IB4X-00 driver from the software depot at http://www.hp.com/go/softwaredepot. Configure /etc/privgroup. (See “ITAPI” on page 90). Otherwise, consult your interconnect vendor. QUESTION: I get an error when I run my 32-bit executable on my AMD64 or Intel(R)64 system.
Frequently asked questions Network specific ANSWER: The environment variable MPI_IC_ORDER instructs HP-MPI to search in a specific order for the presence of an interconnect. The contents are a colon-separated list. For a list of the default contents, see “Interconnect support” on page 85. Alternatively, mpirun command-line options can be used; they take precedence over MPI_IC_ORDER. A lowercase selection means the interconnect is used if it is detected; otherwise, the search continues.
Frequently asked questions Windows specific Windows specific QUESTION: What versions of Windows does HP-MPI support? ANSWER: HP-MPI for Windows V1.0 supports Windows CCS. HP-MPI for Windows V1.1 supports Windows 2003 and Windows XP multinode runs with the HP-MPI Remote Launch service running on the nodes. This service is provided with V1.1. The service is not required to run in an SMP mode.
Frequently asked questions Windows specific If you are installing using command line flags, use /DIR="" to change the default location. QUESTION: Which compilers does HP-MPI for Windows work with? ANSWER: HP-MPI works well with all compilers. We explicitly test with Visual Studio, Intel, and Portland compilers. HP-MPI strives not to introduce compiler dependencies.
Frequently asked questions Windows specific To use the low-level InfiniBand protocol, use the -IBAL flag instead of -TCP. For example: R:\> mpirun -IBAL -netaddr 192.168.1.1 -ccp -np 12 rank.exe The use of -netaddr is not required when using -IBAL, but HP-MPI still uses this subnet for administration traffic. By default, it will use the TCP subnet available first in the binding order. This can be found and changed by going to the Network Connections->Advanced Settings window.
Frequently asked questions Windows specific ANSWER: The automatic job submittal will set the current working directory for the job to the current directory. (Equivalent to using -e MPI_WORKDIR=.) Because the remote compute nodes cannot access the local disks, they need a UNC path for the current directory. HP-MPI can convert the local drive to a UNC path if the local drive is a mapped network drive.
Glossary application In the context of HP-MPI, an application is one or more executable programs that communicate with each other via MPI calls. buffered send mode Form of blocking send where the sending process returns when the message is buffered in application-supplied space or when the message is received. asynchronous Communication in which sending and receiving processes place no constraints on each other in terms of completion.
Glossary derived data types derived data types User-defined structures that specify a sequence of basic data types and integer displacements for noncontiguous data. You create derived data types through the use of type-constructor functions that describe the layout of sets of primitive types in memory. Derived types may contain arrays as well as combinations of other primitive data types. determinism A behavior describing repeatability in observed parameters.
Glossary nonblocking send locality domain (ldom) Consists of a related collection of processors, memory, and peripheral resources that compose a fundamental building block of the system. All processors and peripheral devices in a given locality domain have equal latency to the memory contained within that locality domain. mapped drive In a network, drive mappings reference remote drives, and you have the option of assigning the letter of your choice.
Glossary non–determinism Nonblocking sends are useful when communication and computation can be effectively overlapped in an MPI application. multiple tasks concurrently as when overlapping computation and communication. non–determinism A behavior describing non-repeatable parameters. A property of computations which may have more than one result. The order of a set of events depends on run time conditions and so varies from run to run.
Glossary tag Security Support Provider Interface (SSPI) A common interface between transport-level applications such as Microsoft Remote Procedure Call (RPC), and security providers such as Windows Distributed Security. SSPI allows a transport application to call one of several security providers to obtain an authenticated connection. These calls do not require extensive knowledge of the security protocol’s details.
Glossary task task Uniquely addressable thread of execution. thread Smallest notion of execution in a process. All MPI processes have one or more threads. Multithreaded processes have one address space but each process thread contains its own counter, registers, and stack. This allows rapid context switching because threads require little or no memory management. thread-compliant An implementation where an MPI process may be multithreaded. If it is, each thread can issue MPI calls.
Symbols +DA2 option, 55 +DD64 option, 55 .
problems HP-UX and Linux, 205 problems Windows, 205 building applications, 64 C C bindings, 296 C examples communicator.c, 220, 249 io.c, 262 ping_pong.c, 220, 225 ping_pong_ring.c, 220, 229 thread_safe.c, 264 C++, 296 C++ bindings, 50 C++ compilers, 50 C++ examples cart.C, 220, 245 sort.C, 268 C++ profiling, 180 -cache option, 130 caliper, 194 cart.
Windows, 30 constructor functions contiguous, 15 indexed, 15 structure, 15 vector, 15 context communication, 9, 13 context switching, 189 contiguous and noncontiguous data, 14 contiguous constructor, 15 count variable, 8, 9, 10, 12 counter instrumentation, 147, 175 ASCII format, 176 create profile, 175 cpu binding, 58 -cpu_bind, 190 -cpu_bind option, 124 create appfile, 78 ASCII profile, 175 instrumentation profile, 175 D -d option, 125 daemons multipurpose, 81 number of processes, 81 -dd option, 125 DDE, 1
MPI_PRUNOPTIONS, 160 MPI_RANKMEMSIZE, 153 MPI_RDMA_INTRALEN, 157 MPI_RDMA_MSGSIZE, 157 MPI_RDMA_NENVELOPE, 158 MPI_RDMA_NFRAGMENT, 158 MPI_RDMA_NONESIDED, 158 MPI_RDMA_NSRQRECV, 158 MPI_REMSH, 156 MPI_ROOT, 144 MPI_SHMEMCNTL, 154 MPI_SOCKBUFSIZE, 160 MPI_SPAWN_PRUNOPTIONS, 158 MPI_SPAWN_SRUNOPTIONS, 158 MPI_SRUNOPTIONS, 158 MPI_TCP_CORECVLIMIT, 160 MPI_USE_LIBELAN, 161 MPI_USE_LIBELAN_SUB, 161 MPI_USE_MALLOPT_AVOID_MMAP, 155 MPI_USEPRUN, 159 MPI_USEPRUN_IGNORE_ARGS, 159 MPI_USESRUN, 159 MPI_USESRUN_IGNORE_A
global reduction, 12 global variables MPI_DEBUG_CONT, 197 gm, 89 -gm option, 120 gprof on XC, 139 group membership, 4 group size, 5 H -h option, 123 -ha option, 127 header files, 25, 38 -headnode option, 129 heart-beat signals, 139 -help option, 124 hostfile, 109 -hostfile option, 123 -hostlist, 110 hostlist, 110 -hostlist option, 123 hosts assigning using LSF, 70 multiple, 78, 78–84 -hosts option, 129 HP MPI building HP-UX and Linux, 205 building Windows, 205 change behavior, 137 completing, 211 debug, 195
J -j option, 125 job launcher options, 122 job scheduler options, 122 -jobid option, 129 K kill MPI jobs, 84 L -l option, 123 language bindings, 280 language interoperability, 138 latency, 6, 185, 191 launch spec options, 122 launching ranks, 298 LD_LIBRARY_PATH appending, 131 ldom, 190 libraries to link, 300 license installing on Linux, 28 installing on Windows, 40 merging on Linux, 28 testing on Linux, 28 testing on Windows, 40 licenses Windows, 38 licensing, 295, 298 Linux, 26 Windows, 39 linking thread-
run application on Linux cluster using appfiles, 23 run application on multiple hosts, 71 run application on single host HP-UX, Linux, 22 run application on Windows, 94 run application on XC cluster, 24 scatter operation, 11 terminate environment, 5 MPI application, starting on HP-UX, Linux, 22 MPI clean up, 302 MPI concepts, 4–16 MPI functions, 54 MPI library extensions 32-bit Fortran, 25 32-bit Linux, 25 64-bit Fortran, 25 64-bit Linux, 25 MPI library routines MPI_Comm_rank, 5 MPI_Comm_size, 5 MPI_Finali
MPI_ROOT variable HP-UX, Linux, 25 MPI_Rsend, 8 convert to MPI_Ssend, 143 MPI_Scatter, 12 MPI_Send, 5, 8 application hangs, 302 convert to MPI_Ssend, 143 high message bandwidth, 191 low message latency, 191 MPI_SHMCNTL, 142, 143 MPI_SHMEMCNTL, 135, 154 MPI_SOCKBUFSIZE, 135, 160 MPI_SPAWN_PRUNOPTIONS, 135, 158 MPI_SPAWN_SRUNOPTIONS, 136, 158 MPI_SRUNOPTIONS, 136, 158, 288 MPI_Ssend, 8 MPI_TCP_CORECVLIMIT, 136, 160 MPI_THREAD_AFFINITY, 60 MPI_THREAD_IGNSELF, 60 MPI_Unpublish _name, 171 MPI_USE_LIBELAN, 136, 1
number of MPI library routines, 4 O object compatibility, 62 ofed, 85, 88, 119 one-sided option, 126 op variable, 13 OPENMP, block partitioning, 253 operating systems supported, xvii optimization report, 140 options MPI, 119 windows 2003/xp, 129 windows ccp, 127 P -p option, 125 p2p_bcopy, 198 -package option, 129 packing and unpacking, 14 parent process, 11 -pass option, 130 PATH setting, 20 PATHDB, 197 performance collective routines, 193 communication hot spots, 80 derived data types, 193 latency/bandwid
receive buffer address, 13 data type of, 13 data type of elements, 9 number of elements in, 9 starting address, 9 recvbuf variable, 12, 13 recvcount variable, 12 recvtype variable, 12 reduce, 12 reduce-scatter, 13 reduction, 13 operation, 13 release notes, 25, 38 remote launch service, 112 remote shell, 71 remsh command, 156, 205 secure, 20, 156 remote shell launching options, 123 remsh, 20, 71 reordering, rank, 140 req variable, 10 rhosts file, 71, 205 root process, 11 root variable, 12, 13 routine selecti
MPI_RDMA_NSRQRECV, 135, 158 MPI_REMSH, 135, 156 MPI_ROOT, 135, 144 MPI_SHMCNTL, 142, 143 MPI_SHMEMCNTL, 135, 154 MPI_SOCKBUFSIZE, 135, 160 MPI_SPAWN_PRUNOPTIONS, 135, 158 MPI_SPAWN_SRUNOPTIONS, 136, 158 MPI_SRUNOPTIONS, 136, 158 MPI_TCP_CORECVLIMIT, 136, 160 MPI_USE_LIBELAN, 136, 161 MPI_USE_LIBELAN_SUB, 136, 161 MPI_USE_MALLOPT_AVOID_MMAP, 136, 155 MPI_USEPRUN, 136, 159 MPI_USEPRUN_IGNORE_ARGS, 136, 159 MPI_USESRUN, 136, 159 MPI_USESRUN_IGNORE_ARGS, 136, 159 MPI_VAPI_QPPARAMS, 136, 152 MPI_WORKDIR, 136, 1
submitting Windows jobs, 96 -subnet option, 122 subscription definition of, 189 synchronization, 14 performance, and, 193 variables, 4 synchronous send mode, 8 system test, 298 T -T option, 125 tag variable, 8, 9, 10 tcp interface options, 121 -TCP option, 120 tcp/ip, 88 terminate MPI environment, 5 test system, 298 testing the network, 215 thread multiple, 16 thread-compliant library, 57 +O3, 57 +Oparallel, 57 -tk option, 130 -token option, 130 total transfer time, 6 TOTALVIEW, 136, 149 troubleshooting, 19