HP-MPI V2.3 for Linux Release Notes
© Copyright 1979-2008 Hewlett-Packard Development Company, L.P.

Legal Notices
The information in this document is subject to change without notice. Hewlett-Packard makes no warranty of any kind with regard to this manual, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose.
Table of Contents
1 Information About This Release
  1.1 Announcement
  1.2 What’s in This Version
    1.2.1 Platforms Supported
  3.2 Installation Instructions
4 Licensing Information
  4.1 Licensing Policy
  4.2 License Installation and Testing

List of Tables
1-1 Systems Supported
1-2 Interconnect Support for V2.3
1-3 Directory Structure
1-4 MPICH Wrappers
1-5 High Availability Options
5-1
1 Information About This Release

1.1 Announcement
HP-MPI V2.3 for Linux is the November 2008 release of HP-MPI, the HP implementation of the Message Passing Interface standard for Linux. V2.3 for Linux is supported on HP ProLiant and HP Integrity servers running Red Hat Enterprise Linux AS 4 and 5, SuSE Linux Enterprise Server 9 and 10 operating systems, and HP XC3000, HP XC4000, and HP XC6000 Clusters.
Table 1-1 Systems Supported
Platform          Interconnects
IA-32             Myrinet, InfiniBand, Ethernet (1)
Itanium®          Myrinet, InfiniBand, Quadrics, Ethernet (1)
AMD Opteron™      Myrinet, InfiniBand, Quadrics, Ethernet (1)
EM64T             Myrinet, InfiniBand, Quadrics, Ethernet (1)
(1) Ethernet includes 10baseT, 100baseT, GigE, 10GbE, and 10GbE with RNIC.

Interconnect information for XC systems is available at http://docs.hp.com/en/linuxhpc.html in the release notes, installation guides, and hardware preparation guides.
Table 1-2 Interconnect Support for V2.3 (continued)
Protocol         Option   Supported Architectures   NIC Version                 Driver Version
Quadrics Elan    -ELAN    IA64, i386, x86_64        Rev 01                      Elan4
TCP/IP           -TCP     IA64, i386, x86_64        All cards that support IP   Ethernet Driver, IP
(1) OFED 1.4 is expected to be compatible with HP-MPI V2.3. However, at the time of this release, the final version of OFED 1.4 is not available for testing. Contact your interconnect provider to verify driver support.

NOTE: HP-MPI V2.
Table 1-3 Directory Structure (continued)
Subdirectory        Contents
lib/linux_ia64      HP-MPI Linux 64-bit libraries for Itanium
lib/linux_amd64     HP-MPI Linux 64-bit libraries for Opteron and EM64T
newconfig           Configuration files and release notes
share/man/man1*     manpages for the HP-MPI utilities
share/man/man3*     manpages for the HP-MPI library
doc                 Release Notes
licenses            License files

1.2.5 Documentation Changes
This release note might be updated periodically. Visit www.hp.
• License Release/Regain on Suspend/Resume (page 14)
• Expanded Functionality for -ha (page 15)
• Enhanced InfiniBand Support for Dynamic Processes (page 20)
• Singleton Launching (page 21)
• Using the -stdio=files Option (page 21)
• Using the -stdio=none Option (page 21)
• Expanded Lightweight Instrumentation (page 21)
• The api option to MPI_INSTR (page 22)
• New mpirun option -xrc (page 23)
These tests are similar to the code found in $MPI_ROOT/help/hello_world.c and $MPI_ROOT/help/ping_pong_ring.c. The ping_pong_ring test in system_check.c defaults to a message size of 4096 bytes. An optional argument to the system check application can be used to specify an alternate message size. The environment variable HPMPI_SYSTEM_CHECK can be set to run a single test.
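For reference, a minimal MPI program in the spirit of $MPI_ROOT/help/hello_world.c looks like the following (this is an illustrative sketch, not the shipped source):

    /* Minimal MPI "hello world" sketch, similar in spirit to
       $MPI_ROOT/help/hello_world.c (not the shipped source). */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size, len;
        char host[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(host, &len);
        printf("Hello world! I'm %d of %d on %s\n", rank, size, host);
        MPI_Finalize();
        return 0;
    }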
Table 1-4 MPICH Wrappers
MPICH1           MPICH2
mpirun.mpich     mpirun.mpich2
mpicc.mpich      mpicc.mpich2
mpif77.mpich     mpif77.mpich2
mpif90.mpich     mpif90.mpich2

Object files built with the HP-MPI MPICH compiler wrappers can be used by an application that uses the MPICH implementation. Applications built using MPICH-compliant libraries should be relinked to use HP-MPI in MPICH compatibility mode.

NOTE: Do not use MPICH compatibility mode to produce a single executable to run under both MPICH and HP-MPI.
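For example, relinking and running an MPICH-style application with the HP-MPI MPICH compatibility wrappers might look like the following (the paths and process count are illustrative):

    % $MPI_ROOT/bin/mpicc.mpich -o app app.o
    % $MPI_ROOT/bin/mpirun.mpich -np 4 ./app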
To enable HP-MPI support for these extensions to the MPI-2.1 standard, -non-standard-ext must be added to the command line of the HP-MPI compiler wrappers (mpiCC, mpicc, mpif90, mpif77), as in the following example:
    % /opt/hpmpi/bin/mpicc -non-standard-ext large_count_test.c
The -non-standard-ext flag must also be passed to the compiler wrapper during the link step of building an executable. For a complete list of the new large message interfaces supported, see Appendix A (page 39).
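As a hedged sketch (the buffer size and rank pairing are illustrative, and error checking is omitted), an exchange of more than 2^31 elements using the MPI_SendrecvL interface listed in Appendix A might look like this:

    /* Exchange more than 2^31 elements between paired ranks using the
       large-message MPI_SendrecvL interface; counts are MPI_Aint.
       Compile and link with the -non-standard-ext wrapper option. */
    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        MPI_Aint count = (MPI_Aint)3 * 1024 * 1024 * 1024;  /* 3G chars */
        char *sendbuf, *recvbuf;
        int rank, peer;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        peer = rank ^ 1;              /* assumes an even number of ranks */

        sendbuf = malloc(count);      /* error checking omitted */
        recvbuf = malloc(count);

        MPI_SendrecvL(sendbuf, count, MPI_CHAR, peer, 0,
                      recvbuf, count, MPI_CHAR, peer, 0,
                      MPI_COMM_WORLD, &status);

        free(sendbuf);
        free(recvbuf);
        MPI_Finalize();
        return 0;
    }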
another HP-MPI job while the first job remains suspended. When a suspended mpirun job receives a SIGCONT, the licenses are reacquired and the job continues. If the licenses cannot be reacquired from the license server, the job exits. NOTE: When a job is suspended in Linux, any memory that is pinned is not swapped to disk, and is not handled by the operating system's virtual memory subsystem. HP-MPI pins memory that is associated with RDMA message transfers.
Table 1-5 High Availability Options (continued)
Options        Descriptions
-ha:recover    Recovery of communication connections after failures. HP hardware only. For more information, see "Failure Recover (-ha:recover)" (page 17).
-ha:net        Enables Automatic Port Migration. HP hardware only. For more information, see "Network High Availability (-ha:net)" (page 19).
Using -ha:infra does not provide a convenient way to terminate all ranks associated with the application; the user is responsible for providing a mechanism for application teardown. The -ha:infra option is available only on HP hardware. Usage on non-HP hardware results in an error message.

1.2.7.7.3 Using MPI_Comm_connect and MPI_Comm_accept
MPI_Comm_connect and MPI_Comm_accept can now be used without the -spawn option to mpirun.
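A sketch of the standard connect/accept pattern follows; how the port name reaches the client (a file, the command line, a name service) is left to the application, and argv[1] is used here purely for illustration:

    /* Standard MPI-2 client/server sketch: run with no argument to act
       as the server, or pass the server's port name to act as the
       client.  No -spawn option to mpirun is required. */
    #include <stdio.h>
    #include <string.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        char port[MPI_MAX_PORT_NAME];
        MPI_Comm intercomm;

        MPI_Init(&argc, &argv);
        if (argc == 1) {                        /* server side */
            MPI_Open_port(MPI_INFO_NULL, port);
            printf("port name: %s\n", port);    /* hand this to the client */
            MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
            MPI_Close_port(port);
        } else {                                /* client side */
            strncpy(port, argv[1], MPI_MAX_PORT_NAME);
            MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm);
        }
        MPI_Comm_disconnect(&intercomm);
        MPI_Finalize();
        return 0;
    }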
IMPORTANT: A call to MPI_Comm_dup() always terminates all outstanding communications with failures on the communicator regardless of the presence or absence of errors. Therefore, the functionality of MPI_Comm_dup() when using -ha:recover is not standard-compliant in the absence of errors.
rank has called MPI_Comm_dup() on the communicator. After all ranks have called MPI_Comm_dup(), the parent communicator may again be used for point-to-point communication. MPI_Comm_dup() can be called successfully even after a failure is observed on the communicator. Because the results of a collective call can vary by rank, ensure that an application is written to avoid deadlocks. For example, using multiple communicators can be very difficult, as the sketch below illustrates:
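As an illustrative sketch (not the original example), assume each rank holds two communicators, comm_a and comm_b, and re-duplicates whichever one first reports a failure; because different ranks can observe the failure on different communicators, one rank can block in MPI_Comm_dup() on comm_a while its peer blocks in MPI_Comm_dup() on comm_b:

    /* Illustrative deadlock hazard (comm_a, comm_b, peer, and buf are
       assumed to exist).  MPI_Comm_dup() completes only after every
       rank of that communicator has called it, so ranks that branch
       differently here never meet in the same call. */
    #include <mpi.h>

    void recover(MPI_Comm comm_a, MPI_Comm comm_b, int *buf, int peer)
    {
        MPI_Comm new_a, new_b;

        if (MPI_Send(buf, 1, MPI_INT, peer, 0, comm_a) != MPI_SUCCESS) {
            MPI_Comm_dup(comm_a, &new_a);   /* waits for all ranks of comm_a */
        } else if (MPI_Send(buf, 1, MPI_INT, peer, 1, comm_b) != MPI_SUCCESS) {
            MPI_Comm_dup(comm_b, &new_b);   /* waits for all ranks of comm_b */
        }
    }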
-ha:detect, the MPI_Errhandler must be set to MPI_ERRORS_RETURN using the MPI_Comm_set_errhandler function. When an error is detected in a communication, the error class MPI_ERR_EXITED is returned for the affected communication. Shared memory is not used for communication between processes. Only IBV and TCP are supported. This mode cannot be used with the diagnostic library.
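A minimal sketch of this pattern (the peer rank and message contents are illustrative):

    /* Enable error returns as required under -ha:detect, then test a
       failed send for the MPI_ERR_EXITED error class. */
    #include <mpi.h>

    int send_with_check(int *buf, int peer)
    {
        int rc, eclass;

        MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

        rc = MPI_Send(buf, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);
        if (rc != MPI_SUCCESS) {
            MPI_Error_class(rc, &eclass);
            if (eclass == MPI_ERR_EXITED)
                return -1;   /* the peer exited; application-level recovery */
        }
        return 0;
    }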
1.2.7.9 Singleton Launching
This release supports the creation of a single rank without the use of mpirun, called singleton launching. It is only valid to launch an MPI_COMM_WORLD of size one using this approach. The single rank created in this way executes as if it were created with mpirun -np 1. HP-MPI environment variables can influence the behavior of the rank. Interconnect selection can be controlled using the environment variable MPI_IC_ORDER.
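For example (the file names are illustrative), an executable built with the HP-MPI compiler wrappers can be started directly, without mpirun, and then runs as an MPI_COMM_WORLD of size one:

    % $MPI_ROOT/bin/mpicc -o hello hello_world.c
    % ./hello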
extern int hpmp_instrument_runtime(int reset)

A call to hpmp_instrument_runtime(0) populates the output file specified by the -i option to mpirun or the MPI_INSTR environment variable with the statistics available at the time of the call. Subsequent calls to hpmp_instrument_runtime() or MPI_Finalize() will overwrite the contents of the specified file. A call to hpmp_instrument_runtime(1) populates the file in the same way, but also resets the statistics.
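A sketch of how an application might use this interface (the phase structure is illustrative; instrumentation must be enabled with -i or MPI_INSTR, and the application must be linked against HP-MPI):

    /* Dump lightweight-instrumentation statistics at known points. */
    extern int hpmp_instrument_runtime(int reset);

    void checkpoint_stats(void)
    {
        /* write the statistics gathered so far; a later call or
           MPI_Finalize() will overwrite the same file */
        hpmp_instrument_runtime(0);
    }

    void end_of_phase(void)
    {
        /* write the statistics, then reset the counters so the next
           phase is measured on its own */
        hpmp_instrument_runtime(1);
    }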
1 [0..64] 8 0 0.308 api api

The use of the api option to MPI_INSTR is available only on HP hardware. Usage on non-HP hardware results in an error message.

1.2.7.14 New mpirun option -xrc
HP-MPI V2.3 adds support for the -xrc option to mpirun. Extended Reliable Connection (XRC) is a new feature of ConnectX InfiniBand adapters.
2 Known Problems and Workarounds

• When running on iWARP hardware, users might see messages similar to the following when applications exit:
      disconnect: ID 0x2b65962b2b10 ret 22
  This is a debugging message that is printed erroneously by the uDAPL library and can be safely ignored. The message can be suppressed completely by passing the -e DAPL_DBG_TYPE=0 option to mpirun. Alternatively, you can set DAPL_DBG_TYPE=0 in the $MPI_ROOT/etc/hpmpi.
• When mapping ranks to a CPU, the ordering of the CPUs relative to the locality domain (ldom)/socket can vary depending on the architecture and operating system. This ordering is not consistent, and therefore the MAP_CPU order for one system may not be the same for a different hardware platform or operating system.
on host mpixbl01 to cpu 4
MPI_CPU_AFFINITY set to MAP_CPU, setting affinity of rank 6 pid 15807 on host mpixbl01 to cpu 2
MPI_CPU_AFFINITY set to MAP_CPU, setting affinity of rank 7 pid 15808 on host mpixbl01 to cpu 0
Hello world! I'm 1 of 8 on mpixbl01

If the operating system orders the CPUs differently relative to the ldom/socket, this mapping has different results.
• The compiler wrappers included with HP-MPI attempt to link MPI applications in such a way as to make this possible. If you choose not to link your application with the provided compiler wrappers, you must either ensure that libmpi.so precedes libc.so on the linker command line, or specify "-e LD_PRELOAD=%LD_PRELOAD:libmpi.so" on the mpirun command line.
• InfiniBand requires pages to be pinned (locked in memory) for message passing. This can become a problem when a child process is forked and a pinned page exists in both the parent's and child's address spaces. Normally a copy-on-write would occur when one of the processes touches memory on a shared page, and the virtual to physical mapping would change for that process.
• The initial release of OFED 1.2 contains a bug that causes the memory pinning function to fail after certain patterns of malloc and free. The symptom, which is visible from HP-MPI, might be any of several error messages, such as:
      > prog.x: Rank 0:1: MPI_Get: Unable to pin memory for put/get
  This bug has already been fixed in OFED 1.3, but if you are running with the initial release of OFED 1.2, the only workaround is to set MPI_IBV_NO_FORK_SAFE=1.
• When upgrading to OFED 1.
• Interval timer functionality used by HP-MPI on HP XC systems can conflict with the requirements of the gprof data collection phase. Set the following two environment variables to work around this issue:
      % export MPI_FLAGS=s0
      % export GMON_OUT_PREFIX=/tmp/app_name
  In the above example, setting MPI_FLAGS disables HP-MPI's conflicting use of interval timers. See the mpienv(1) manpage for descriptions of MPI_FLAGS settings.
• High Availability (-ha) mode and the diagnostic library are not allowed at the same time.
• MPICH mode and the diagnostic library are not allowed at the same time.
• HA with MPICH has not been tested.
• The diagnostic library strict mode is not compatible with some MPI-2 features.
• Some versions of Quadrics have a memory leak.
3 Installation Information

3.1 Installation Requirements
HP-MPI V2.3 for Linux is supported on HP ProLiant and HP Integrity servers running Red Hat Enterprise Linux AS 4 and 5, SuSE Linux Enterprise Server 9 and 10 operating systems, and HP XC3000, HP XC4000, and HP XC6000 Clusters. You must install the correct HP-MPI product for your system. HP-MPI requires a minimum of 90MB of disk space in /opt.

3.2 Installation Instructions
1. Run the su command and enter the superuser password.
2.
3.
4 Licensing Information

4.1 Licensing Policy
A license is required to use HP-MPI for Linux. You can purchase licenses from the HP software depot at http://www.hp.com/go/softwaredepot, or by contacting your HP representative. Demo licenses for HP-MPI are also available from the HP software depot. No separate HP-MPI license is required on an HP XC system.

4.2 License Installation and Testing
HP-MPI V2.3 for Linux uses FLEXnet Publisher licensing technology. A license file can be named either license.dat or any name with the extension *.lic.
If the license needs to be placed in another location that is not found by the previous search, you can set the environment variable LM_LICENSE_FILE to specify the location of the license file explicitly. For more information, see http://licensing.hp.com.

4.2.1 Installing License Files
A valid license file contains the system hostid and the associated license key. License files can be named either license.dat or any name with the extension *.lic (for example, mpi.lic).
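For example (the path shown is illustrative), a license installed outside the default search locations can be made visible by exporting:

    % export LM_LICENSE_FILE=/shared/licenses/mpi.lic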
5 Additional Product Information

5.1 Product Documentation
The HP-MPI Documentation Kit is an optional product (product number B6281AA) consisting of the following:
• MPI: The Complete Reference (2 volume set), product number B6011-96012
• HP-MPI User’s Guide (Eleventh Edition), product number B6060-96024
The HP-MPI User’s Guide and HP-MPI Release Notes are available from the following resources:
• $MPI_ROOT/doc (after you install the product)
• http://docs.hp.com
• http://www.hp.
5.2 Product Packaging
HP-MPI is packaged as an optional software product, installed in /opt/hpmpi by default.

5.3 Software Availability in Native Languages
There is no information on non-English languages for HP-MPI for Linux systems.
A HP-MPI Large Message APIs

See “Support for Large Messages” (page 13) for more information about large message APIs and how to use them.

NOTE: The Fortran and C++ APIs follow the same convention for MPI_Aint as defined in the MPI Standard. This appendix lists only the C APIs.
IN      comm        communicator (handle)
OUT     request     communication request (handle)

int MPI_IrecvL(void* buf, MPI_Aint count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Request *request)
OUT     buf         initial address of receive buffer (choice)
IN      count       number of elements in receive buffer
IN      datatype    datatype of each receive buffer element (handle)
IN      source      rank of source
IN      tag         message tag
IN      comm        communicator (handle)
OUT     request     communication request (handle)

int MPI_IrsendL(void* buf, MPI_Aint
IN      datatype    type of each element (handle)
IN      source      rank of source or MPI_ANY_SOURCE (integer)
IN      tag         message tag or MPI_ANY_TAG (integer)
IN      comm        communicator (handle)
OUT     request     communication request (handle)

int MPI_RsendL(void* buf, MPI_Aint count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
IN      buf         initial address of send buffer (choice)
IN      count       number of elements in send buffer
IN      datatype    datatype of each send buffer element (handle)
IN      dest        rank of destination
IN      tag         message tag
IN      comm        communicator (handle)
int MPI_SendrecvL(void *sendbuf, MPI_Aint sendcount, MPI_Datatype sendtype, int dest, int sendtag, void *recvbuf, MPI_Aint recvcount, MPI_Datatype recvtype, int source, int recvtag, MPI_Comm comm, MPI_Status *status)
IN      sendbuf     initial address of send buffer (choice)
IN      sendcount   number of elements in send buffer
IN      sendtype    type of elements in send buffer (handle)
IN      dest        rank of destination
IN      sendtag     send tag
OUT     recvbuf     initial address of receive buffer (choice)
IN      recvcount   number of elements in receive buffer
IN      recvtype    type of elements in receive buffer (handle)
IN      source      rank of source
IN      recvtag     receive tag
IN      comm        communicator (handle)
OUT     status      status object (status)
int MPI_AlltoallvL(void* sendbuf, MPI_Aint *sendcounts, MPI_Aint *sdispls, MPI_Datatype sendtype, void* recvbuf, MPI_Aint *recvcounts, MPI_Aint *rdispls, MPI_Datatype recvtype, MPI_Comm comm)
IN      sendbuf     starting address of send buffer (choice)
IN      sendcounts  array equal to the group size specifying the number of elements to send to each rank
IN      sdispls     array of displacements relative to sendbuf
IN      sendtype    data type of send buffer elements (handle)
OUT     recvbuf     address of receive buffer (choice)
IN      recvcounts  array equal to the group size specifying the number of elements that can be received from each rank
IN      rdispls     array of displacements relative to recvbuf
IN      recvtype    data type of receive buffer elements (handle)
IN      comm        communicator (handle)
IN      root        rank of receiving process (integer)
IN      comm        communicator (handle)

int MPI_GathervL(void* sendbuf, MPI_Aint sendcount, MPI_Datatype sendtype, void* recvbuf, MPI_Aint *recvcounts, MPI_Aint *displs, MPI_Datatype recvtype, int root, MPI_Comm comm)
IN      sendbuf     starting address of send buffer (choice)
IN      sendcount   number of elements in send buffer (non-negative integer)
IN      sendtype    data type of send buffer elements (handle)
OUT     recvbuf     address of receive buffer (choice, significant only at root)
IN      recvcounts  array equal to the group size specifying the number of elements that can be received from each rank (significant only at root)
IN      displs      array of displacements relative to recvbuf (significant only at root)
IN      recvtype    data type of receive buffer elements (significant only at root) (handle)
IN      root        rank of receiving process (integer)
IN      comm        communicator (handle)
IN      op          operation (handle)
IN      comm        communicator (handle)

int MPI_ExscanL(void *sendbuf, void *recvbuf, MPI_Aint count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
IN      sendbuf     starting address of send buffer (choice)
OUT     recvbuf     starting address of receive buffer (choice)
IN      count       number of elements in input buffer
IN      datatype    data type of elements of input buffer (handle)
IN      op          operation (handle)
IN      comm        intracommunicator (handle)

int MPI_ScatterL(void* sendbuf, MPI_Aint sendcount, MPI_Datatype send
int MPI_Get_elementsL(MPI_Status *status, MPI_Datatype datatype, MPI_Aint *count)
IN      status      return status of receive operation (status)
IN      datatype    datatype used by receive operation (handle)
OUT     count       number of received basic elements (integer)

int MPI_PackL(void* inbuf, MPI_Aint incount, MPI_Datatype datatype, void *outbuf, MPI_Aint outsize, MPI_Aint *position, MPI_Comm comm)
IN      inbuf       input buffer start (choice)
IN      incount     number of input data items (integer)
IN      datatype    datatype of each input data item (handle)
OUT     outbuf      output buffer start (choice)
IN      outsize     output buffer size, in bytes (integer)
INOUT   position    current position in buffer, in bytes (integer)
IN      comm        communicator for packed message (handle)
IN      oldtype     old datatype (handle)
OUT     newtype     new datatype (handle)

int MPI_Type_sizeL(MPI_Datatype datatype, MPI_Aint *size)
IN      datatype    datatype (handle)
OUT     size        datatype size

int MPI_Type_structL(MPI_Aint count, MPI_Aint *array_of_blocklengths, MPI_Aint *array_of_displacements, MPI_Datatype *array_of_types, MPI_Datatype *newtype)
IN      count                    number of blocks (integer)
IN      array_of_blocklengths    number of elements in each block
IN      array_of_displacements   byte displacement of each block
IN      array_of_types           type of elements in each block (array of handles to datatype objects)
OUT     newtype                  new datatype (handle)
int MPI_Type_contiguousL(MPI_Aint count, MPI_Datatype oldtype, MPI_Datatype *newtype)
IN      count       replication count
IN      oldtype     old datatype (handle)
OUT     newtype     new datatype (handle)

int MPI_Type_create_hindexedL(MPI_Aint count, MPI_Aint array_of_blocklengths[], MPI_Aint array_of_displacements[], MPI_Datatype oldtype, MPI_Datatype *newtype)
IN      count                    number of blocks
IN      array_of_blocklengths    number of elements in each block
IN      array_of_displacements   byte displacement of each block
IN      oldtype                  old datatype (handle)
OUT     newtype                  new datatype (handle)
IN      array_of_displacements   byte displacement of each block
IN      oldtype                  old datatype (handle)
OUT     newtype                  new datatype (handle)

int MPI_Type_hvectorL(MPI_Aint count, MPI_Aint blocklength, MPI_Aint stride, MPI_Datatype oldtype, MPI_Datatype *newtype)
IN      count        number of blocks
IN      blocklength  number of elements in each block
IN      stride       number of bytes between start of each block
IN      oldtype      old datatype (handle)
OUT     newtype      new datatype (handle)
IN      win         window object used for communication (handle)

int MPI_AccumulateL(void *origin_addr, MPI_Aint origin_count, MPI_Datatype origin_datatype, int target_rank, MPI_Aint target_disp, MPI_Aint target_count, MPI_Datatype target_datatype, MPI_Op op, MPI_Win win)
IN      origin_addr       initial address of buffer (choice)
IN      origin_count      number of entries in buffer
IN      origin_datatype   datatype of each buffer entry (handle)
IN      target_rank       rank of target
IN      target_disp       displacement from start of window to beginning of target buffer
IN      target_count      number of entries in target buffer
IN      target_datatype   datatype of each entry in target buffer (handle)
IN      op                reduce operation (handle)
IN      win               window object (handle)