Cray XMT™ Programming Environment User's Guide S–2479–20
© 2007–2011 Cray Inc. All Rights Reserved. This document or parts thereof may not be reproduced in any form unless permitted by contract or by written permission of Cray Inc. Copyright (c) 2008, 2010, 2011 Cray Inc. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
RECORD OF REVISION S–2479–20 Published May 2011 Supports release 2.0 GA running on Cray XMT compute nodes and on Cray XT 3.1UP02 service nodes. This release uses the System Management Workstation (SMW) version 5.1UP03. 1.5 Published December 2010 Supports release 1.5 running on Cray XMT compute nodes and Cray Linux Environment (CLE) release 2.241A on Cray XT service nodes. This release requires the System Management Workstation (SMW) version 4.0.UP02, which is based on the SLES10 SP3 base operating system.
Changes to this Document Cray XMT™ Programming Environment User's Guide S–2479–20 This rewrite of Cray XMT Programming Environment User's Guide supports the 2.0 release of the Cray XMT operating system and programming environment. For more information see the release announcement that accompanies this release. Added information • Two new pragmas: #pragma mta max n processors and #pragma mta max concurrency c. See Compilation Directives on page 109. • Additional programming examples.
Contents

Introduction [1]
    1.1 The Cray XMT Programming Environment
Setting Up the User Environment [2]
    2.1 Setting Up a Secure Shell
        2.1.1 RSA Authentication
...
    3.6 Testing Expressions Using Condition Codes
    3.7 File I/O
        3.7.1 Language-level I/O
        3.7.2 System-level I/O
...
    7.9 Setting Debugger Options during Compilation
    7.10 Using Compiler Directives and Assertions
Running an Application [8]
    8.1 Launching the Application
    8.2 User Runtime Environment Variables
    8.3 Improving Performance
...
    H.4 LUC Type Definitions
    H.5 LUC Callback Functions
        H.5.1 LUC_RPC_Function_InOut
...

Figures

Figure 1. Snapshot Library Data Paths
Figure 2. Comparison of Whole-program and Separate-module Modes
Introduction [1] This guide describes the Cray XMT Programming Environment. It includes procedures and examples that show you how to set up your user environment and build and run optimized applications. The intended audience is application programmers and users of the Cray XMT system. For information about debugging your application, see Cray XMT Debugger Reference Guide.
Setting Up the User Environment [2] Configuring your user environment on a Cray XMT system is similar to configuring a typical Linux workstation. 2.1 Setting Up a Secure Shell Cray XMT systems use ssh and ssh-enabled applications such as scp for secure, password-free remote access to the login nodes. Before you can use the ssh commands, you must generate an RSA authentication key. The process for generating the key depends on the authentication method you use.
4. Connect to the remote host by typing the following commands.

   If you are using a C shell, type:

   % eval `ssh-agent`
   % ssh-add

   If you are using a bash shell, type:

   $ eval `ssh-agent -s`
   $ ssh-add

5. Enter your passphrase when prompted, followed by:

   % ssh remote_host_name

Procedure 2. Using RSA authentication without a passphrase

To enable ssh without a passphrase, complete the following steps.

1.
2.2 Using Modules

The Cray XMT system uses modules in the user environment to support multiple versions of software, such as compilers, and to create integrated software packages. As new versions of the supported software and associated man pages become available, they are added automatically to the Programming Environment, while earlier versions are retained to support legacy applications.
2.2.3 Module Commands

The mta-pe modules are loaded by default.
Developing an Application [3] This chapter provides an overview of some Cray XMT functions and describes how to perform some common programming tasks, such as floating-point operations, sorting, dataflow, searching, and I/O. Before you begin developing your program, you must log in to the login node using ssh. You develop, compile, debug, and launch your program from the login node. Before developing your application, review the data types and keywords that are supported by the Cray XMT compilers.
The xmt-tools module contains the tools that you use to run and monitor a program. To run a program, use the mtarun command. For more information, see Launching the Application on page 91 or the mtarun(1) man page. To monitor the program, use the mtatop or dash command. For more information, see Cray XMT System Management or the mtatop(1) man page.
When a set of generic functions access a multiword variable simultaneously, the resulting behavior depends on the generic functions that constitute the set. If all the generic functions in the set require the variable to be in either a full or empty state, the functions access the variable in a serialized manner and the user-visible state is consistent.
The Cray XMT compiler recognizes the following generic write functions.

writeef(&v, value)
    Writes value in variable v when v is in an empty state and sets v to a full state. This allows one or more threads waiting for v to change to a full state to resume execution. If v is in a full state, the write operation is blocked until v changes to an empty state. This generic function behaves like a write access to a sync variable.
The Cray XMT compiler recognizes the following generic read functions.

readfe(&v)
    Returns the value of variable v when v is in a full state and sets v to an empty state. This allows one or more threads waiting for v to change to an empty state to resume execution. If v is in an empty state, the read operation is blocked until v changes to a full state. This generic function behaves like a read access to a sync variable.
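The full/empty rules described for writeef and readfe can be modeled in portable C. The following sketch is not Cray XMT code: fe_word, try_writeef, and try_readfe are hypothetical names, and each function returns false at the point where the real generic function would block.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one synchronized word: a value plus a full/empty bit. */
typedef struct {
    int64_t value;
    bool    full;
} fe_word;

/* Model of writeef: succeeds only when the word is empty, then marks it full.
   Returns false where the real operation would block. */
bool try_writeef(fe_word *w, int64_t value)
{
    if (w->full)
        return false;
    w->value = value;
    w->full = true;
    return true;
}

/* Model of readfe: succeeds only when the word is full, then marks it empty.
   Returns false where the real operation would block. */
bool try_readfe(fe_word *w, int64_t *out)
{
    if (!w->full)
        return false;
    *out = w->value;
    w->full = false;
    return true;
}
```

Starting from an empty word, a read fails, a write succeeds, a second write fails, and a read then returns the written value and leaves the word empty again.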
3.2.2 Intrinsic Functions

Cray provides intrinsic functions for the Cray XMT system that allow direct access to machine operations from high-level languages. You can find a list of the C intrinsic functions and the machine functions in the mta_intrinsics(3) man page. The C intrinsic function names use the name of the machine operation and add a prefix of MTA_.
3.3.1 Synchronizing Data Using int_fetch_add

Use the int_fetch_add generic function to synchronize updates to data that represents shared counters without using locks. This function has the following signature:

    int_fetch_add (&v, i)

The int_fetch_add function provides access to the underlying atomic int_fetch_add machine operation. This function atomically adds i to the value at address v, stores the sum at v, returns the original value of v, and sets the state bit to full.
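The add-and-return-the-original-value behavior can be tried on any C11 compiler with atomic_fetch_add from <stdatomic.h>. This is a stand-in, not the Cray XMT generic function: fetch_add_model and claim_slot are hypothetical names, and the sketch does not model the full/empty state bit.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Portable stand-in for the described behavior: atomically add i to *v
   and return the value *v held before the addition. */
int64_t fetch_add_model(_Atomic int64_t *v, int64_t i)
{
    return atomic_fetch_add(v, i);
}

/* Typical shared-counter use: each caller claims a distinct slot index
   by bumping a shared cursor, with no lock. */
int64_t claim_slot(_Atomic int64_t *next_free)
{
    return fetch_add_model(next_free, 1);
}
```

Because the original value is returned, two callers of claim_slot on the same cursor never receive the same index.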
In these two cases, each reference to x$ results in a separate read of that variable and requires a separate write to x$. The second write to x$ must be performed by a thread other than the one executing the code in the example. In the first case, it might have been the intention of the programmer to add together two successive values of x$.
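The difference between referencing a sync variable twice and reading it once into a temporary can be modeled in portable C. In this sketch, sync_stream and read_sync are hypothetical: they stand for a sync variable whose successive values are supplied by another thread, and read_sync returns 0 where a real read would block.

```c
#include <stddef.h>

/* Hypothetical model: each read of a sync variable consumes the next value
   that some other thread wrote. */
typedef struct {
    const long *values;  /* values written by the producer, in order */
    size_t      count;
    size_t      next;    /* index of the next unread value */
} sync_stream;

/* Consume one value; returns 0 once the producer's values run out
   (the real read would block instead). */
long read_sync(sync_stream *x)
{
    return (x->next < x->count) ? x->values[x->next++] : 0;
}

/* Two references: two reads, so two successive values are added. */
long sum_two_reads(sync_stream *x)
{
    long first = read_sync(x);
    long second = read_sync(x);
    return first + second;
}

/* One reference copied to a temporary: the same value is used twice. */
long double_one_read(sync_stream *x)
{
    long tmp = read_sync(x);
    return tmp + tmp;
}
```

If doubling a single value is what you intend, copying the sync variable into an ordinary temporary first makes that intent explicit.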
A number with a biased exponent of 2047 (0x7FF) is a special floating-point number, known as a SpecialFloat64 on the Cray XMT. If all the fraction bits are zero, the value of the number is plus or minus infinity. Infinity generally occurs in calculations as a result of an overflow or division by zero. For example, 1.0/0.0 is positive infinity, while 1.e300*-1.e300 is negative infinity. Calculations such as 0.0/0.0 create a result that is called not a number (NaN).
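The bit patterns described above can be inspected directly. The following sketch assumes that double is the IEEE 754 64-bit format; the helper names are illustrative and are not part of any Cray library.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Mask for the 52 fraction bits of a 64-bit floating-point number. */
#define FRACTION_MASK ((UINT64_C(1) << 52) - 1)

/* Copy the bits of a double into an integer without aliasing problems. */
uint64_t double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Biased exponent: the 11 bits just below the sign bit. */
unsigned biased_exponent(double d)
{
    return (unsigned)((double_bits(d) >> 52) & 0x7FF);
}

/* Exponent 0x7FF with a zero fraction is plus or minus infinity. */
bool is_infinity(double d)
{
    return biased_exponent(d) == 0x7FF && (double_bits(d) & FRACTION_MASK) == 0;
}

/* Exponent 0x7FF with a nonzero fraction is a NaN. */
bool is_nan(double d)
{
    return biased_exponent(d) == 0x7FF && (double_bits(d) & FRACTION_MASK) != 0;
}
```

Passing the results of 1.0/0.0 and 0.0/0.0 to these helpers classifies the first as infinity and the second as NaN, while an ordinary value such as 1.0 has a biased exponent of 1023.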
Subnormal numbers are less precise than normalized numbers. The smallest subnormal number, min_denorm, has only one significant bit while the largest has 52 significant bits. However, whenever 0.5 <= x/y <= 2.0, the difference x - y is exact, even though it may have less precision than x and y. This is not true for machines that flush underflow to zero. The Cray XMT floating-point hardware handles gradual underflow transparently.
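Gradual underflow can be observed on any platform that implements IEEE 754 subnormal numbers. The sketch below uses DBL_MIN, the smallest normalized double from <float.h>; the helper names are illustrative, and the code assumes that flush-to-zero modes are not enabled.

```c
#include <float.h>
#include <stdbool.h>

/* True when d is nonzero but smaller in magnitude than the smallest
   normalized double, that is, when d is subnormal. */
bool is_subnormal(double d)
{
    double mag = d < 0 ? -d : d;
    return mag > 0.0 && mag < DBL_MIN;
}

/* With gradual underflow, x != y guarantees that x - y is nonzero. */
bool difference_survives(double x, double y)
{
    volatile double diff = x - y;  /* volatile: force a real double store */
    return diff != 0.0;
}
```

For x = 1.5 * DBL_MIN and y = DBL_MIN, the ratio x/y lies between 0.5 and 2.0, so x - y is exact: it is the subnormal value 0.5 * DBL_MIN, and adding y back recovers x. A machine that flushes underflow to zero would return 0 for the same difference.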
Or

    temp1 = a*b;
    temp2 = c*d;
    x = temp1 + temp2;

The only way to override the compiler instructions for a particular multiply-add operation is to put each multiply operation on a separate line, as in the third example. You can use the -no_mul_add compiler flag to disable multiply-add operations. Rather than using a multiply-add operation, the compiler may use a common subexpression, as shown in the following example.
3.4.3 32-bit and 64-bit Implementation of Floating-point Arithmetic

The double data type in C uses the format for double-precision (64-bit) arithmetic provided by IEEE Standard 754 guidelines. Cray XMT hardware does not support IEEE Standard 754 extended precision, and all 32-bit arithmetic is done by promotion to 64-bit formats. Rounding mode on the Cray XMT is controlled on a per-thread basis using mode bits in the stream status word (SSW).
The current rounding mode for the math library is set to round to the nearest place (RND_NEAR). User functions that change the rounding mode must reset it to RND_NEAR before calling the math library functions. Exceptions are handled silently by the math library. No messages are printed, and errno is not set by the library. If functions return NaN or infinity, these arguments are propagated silently by the library. Exception flags are raised as appropriate.

3.
Future statements contain the name of a future variable and parameters, a body, and a return statement. The future variable's value is set by the return statement. The future variable is optional; if no future variable is specified, the return statement of the future body supplies no value.
Continuations are normally allocated and deallocated from the heap. However, if the associated future variable is a scalar variable that is located on the stack, the compiler causes the continuation to be placed on the stack. This reduces the overhead associated with allocation and deallocation operations. The compiler does not do this when there is an array of future variables on the stack because this requires an array of continuations.
3.5.2 Anonymous futures

Often, a concurrent computation does not have a return value. An example of such a concurrent computation is an I/O statement or a modification of global values. You can express such a computation using an anonymous future. An anonymous future has no name or return statement.
Example 2. Retrieving a condition code and result of a previous operation

It is also possible to test the condition code generated by some earlier operation, allowing you to make use of both the condition code and the result of the operation. In the following example, MTA_TEST_CC is used to test whether there was a carry generated by MTA_BIT_LEFT_ZEROS. MTA_BIT_LEFT_ZEROS returns the number of consecutive 0 bits on the left end of the word.
3.7 File I/O

The Cray XMT performs I/O to a RAM-based file system (RAMFS) and a network file system (NFS). Neither the RAMFS nor the NFS is a high-speed file system; therefore, any data over 2 gigabytes in size must be written to a high-speed file system, such as Lustre. You can use the NFS for small amounts of data, such as user files. During the system reboot, all data is lost from the RAMFS because it is not written to disk.
The actual sequence of lines is random because the different iterations are all executed in parallel. However, for a sequence of calls such as the following:

    #pragma mta assert parallel
    for (i = 0; i < n; i++) {
        fprintf(f,"this is ");
        fprintf(f,"iteration %d\n", i);
    }

the output may look like the following, because only the individual calls to fprintf are atomic:

    this is iteration i
    this is this is iteration k
    iteration j

Example 4.
In the previous code, flag$ controls access to file f, ensuring that the combination of fseek and fread are executed atomically. In this case, you use SEEK_SET because the SEEK_CUR (positioning relative to the current position) is not useful in a parallel context.

Example 6.
If many parallel calls refer to the same file, locking forces a serial execution order. For example, in the following code, it makes little sense to run the loop in parallel because the calls to fprintf are serialized by the lock on the FILE object referred to by g. However, the interpretation of the format string is controlled by the lock.
Example 7. Calling UNIX I/O functions from parallel code

In serial code, the low-level UNIX functions behave as specified by the Posix standard. In parallel code, all calls are executed atomically. In this case, you must explicitly manage access to a particular file by a sequence of calls, to prevent races.
Example 8. Using synchronization with UNIX I/O functions

To correct this problem, you can either rewrite the code in the style of the first example or add some sort of explicit synchronization, as shown in the following example.
Internally, the UNIX library enforces locking for each file descriptor so that output to multiple files can occur in parallel, but output to a single file occurs serially. For example, in the following loop, every iteration refers to a different file descriptor, so each call to write runs without interfering with other calls.
3.8 Porting Programs to the Cray XMT

Use the following information when you prepare to port C and C++ programs to the Cray XMT platform.

64-bit issues
    The following list describes important 64-bit issues.

    Alignment
        On the Cray XMT, many data types are aligned on 8-byte boundaries that other machines align on 2- or 4-byte boundaries.
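Code that hard-codes member offsets or structure sizes is the usual casualty of a change in alignment rules. The following portable sketch asks the compiler for the layout instead of assuming it. The struct sample type and both helper functions are illustrative, and the padding they report depends on the platform on which you compile them.

```c
#include <stddef.h>
#include <stdint.h>

/* A record whose layout depends on the platform's alignment rules. */
struct sample {
    char    tag;     /* 1 byte */
    int32_t count;   /* padding may precede this member */
    int64_t total;   /* and this one */
};

/* Bytes of padding the compiler inserted between tag and count. */
size_t padding_before_count(void)
{
    return offsetof(struct sample, count) - sizeof(char);
}

/* Nonzero when every member offset honors the member's own alignment. */
int offsets_are_aligned(void)
{
    return offsetof(struct sample, count) % _Alignof(int32_t) == 0 &&
           offsetof(struct sample, total) % _Alignof(int64_t) == 0;
}
```

Using offsetof and sizeof wherever a layout matters keeps the same source correct on machines with 2-, 4-, or 8-byte alignment.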
errno.h
    errno is thread-specific and not a global variable. Files that use errno in the same way that it is used by library calls such as perror must include errno.h. This is required by ANSI and Posix, but most systems do not comply with this convention. On the Cray XMT, each thread has its own value of errno, so you must include errno.h for correct behavior.

time.h
    One goal of the Cray XMT is to support a Posix-compliant application programming interface.
Cray XMT keywords
    You can disable Cray XMT specific keywords (for example, sync and future) by using the compiler flag -no_mta_ext. When this flag is not used, the C compiler for the Cray XMT reserves all keywords, even standard C++ keywords such as new, try, throw, and catch.
Shared Memory Between Processes [4] You can share memory between multiple programs by creating a shared memory region using the mmap system call. 4.1 Mapping a Memory Region for Data Sharing A shared memory region is identified by a file name. Before your applications can use shared memory, you must create an empty readable, writable file and run mmap to map a memory region to use for shared memory.
In the previous example, a new readable and writable file is created by using the open system call with the O_RDWR and O_CREAT flags. The fd file descriptor is allocated and refers to the file. The fd is specified as an argument to the mmap system call and identifies the memory region. SHARED_SIZE specifies the size of the memory region to allocate and map into the caller's address space.
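The open and mmap sequence described above can be sketched for a stock POSIX system as follows. This is an illustration, not the listing from this guide: map_shared_region is a hypothetical helper, the SHARED_SIZE value is arbitrary, and the ftruncate call reflects ordinary Linux behavior (a file-backed mapping can only be touched up to the file's length), which may not be needed on the Cray XMT.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define SHARED_SIZE 4096  /* illustrative size, not taken from the guide */

/* Map the file at path as a shared, readable and writable region.
   Returns MAP_FAILED on error. */
void *map_shared_region(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return MAP_FAILED;

    /* Assumption for stock Linux: the file must be at least SHARED_SIZE
       bytes long before the mapping is touched. */
    if (ftruncate(fd, SHARED_SIZE) != 0) {
        close(fd);
        return MAP_FAILED;
    }

    void *region = mmap(NULL, SHARED_SIZE, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    return region;
}
```

Two programs that call map_shared_region with the same path see each other's stores, because the file name identifies the region. Release a mapping with munmap when you no longer need it.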
4.2 Persisting Shared Memory

The remember daemon rememd retains information about shared memory so that programs preserve shared memory throughout the life cycle of the process. Shared memory is allocated by calling mmap with the MAP_ANON and MAP_SHARED flags and a valid file descriptor. When the rememd daemon is first started, it reads in all the records from its maps file and calls mmap to map the specified memory into its virtual address space.
Use the following functions to call rememd from a program.

persist_remember
    Causes the rememd daemon to call mmap to map the shared memory into its virtual address space and write a record of it to disk. If rememd has already mapped this segment, its reference count is incremented instead. This function returns 0 on success, and errno on failure.
    }

    mmap_addr = mmap(0, mmap_len, PROT_WRITE, MAP_ANON | MAP_SHARED, fd, 0);
    if (MAP_FAILED == mmap_addr) {
        printf("Unexpected error calling mmap: %s\n", strerror(errno));
        close(fd);
        exit(1);
    }

    int remember_ret = persist_remember(remember_path, mmap_len);
    if (0 != remember_ret) {
        printf("Unexpected error calling persist_remember: %s\n", strerror(remember_ret));
        close(fd);
        munmap(mmap_addr, mmap_len);
        exit(1);
    }
} else {
    // if ret is not -1 or 0, then it's the length of the
Developing LUC Applications [5] This chapter describes how to use the LUC library in your application. The following tasks are discussed: • Constructing a client • Constructing a server • Making remote procedure calls 5.1 Programming Considerations for LUC Applications • On the service (Linux) nodes, int is defined as 4 bytes. On the MTK compute nodes int is defined as 8 bytes. To avoid potential issues, programmers should use types that have explicit sizes, for example int64_t.
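As a sketch of the advice on explicitly sized types, the following record uses int64_t so that its layout does not depend on how wide int is on a given node. The query_request type and request_bytes function are hypothetical and are not part of the LUC library. Byte order is a separate concern that this sketch does not address.

```c
#include <stdint.h>

/* Hypothetical request record exchanged between a service node and a
   compute node. Fixed-width types keep the layout identical on both. */
typedef struct {
    int64_t query_id;
    int64_t row_count;
} query_request;

/* Compile-time check that the record is exactly two 8-byte words. */
_Static_assert(sizeof(query_request) == 16,
               "query_request must be 16 bytes on every node type");

/* Number of bytes a sender must transmit for n requests. */
int64_t request_bytes(int64_t n)
{
    return n * (int64_t)sizeof(query_request);
}
```

Had the members been declared as int, the same source would describe an 8-byte record on a node where int is 4 bytes and a 16-byte record on a node where int is 8 bytes.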
Procedure 3. Creating and using a LUC client object

1. Include the header file . This header file includes all of the definitions required for both the client and server endpoints, including the LucEndpoint class definition, configuration variables, and external function prototype definitions.

2. Declare a pointer to a LucEndpoint object.
9. Stop the service by calling stopService. This releases any nearby memory that was allocated by the endpoint, closes all previously opened Fast I/O data streams, and deactivates the object.

10. Delete the object. This invokes the virtual destructor for the derived object. If an endpoint object is deleted before calling stopService, the destructor automatically stops the service and deactivates the object.

Example 11. LUC client code example

user_application_defs.
    result = clientEndpoint->remoteCallSync(serverID, QUERY_ENGINE, FUNC_QUERY1,
                                            inbuf, INBUF_SIZE, outbuf, &outDatLen);
    if (result == LUC_ERR_OK) {
        // The RPC was successful.
        // outDataLen contains the size of data returned in outbuf.
    } else if (result < LUC_ERR_MAX) {
        // Result contains a LUC error code.
    } else {
        // Result is the return value from remote function
    }
    clientEndpoint->stopService();
    delete clientEndpoint;
}

5.
shut down the service. For instance, if a serious application internal error occurs or an application shutdown request is received, the server must be told to halt by the application.

Example 12. LUC Server code example

#include #include
        luc_service_type_t serviceType,
        int serviceFunctionIndex,
        void *userHandle,
        luc_error_t remoteLucError)
{
    // In the example given, 'userHandle' will equal 0xf00
    return;
}

void LucClientOnlyUsageModel(void)
{
    // First create an endpoint. This is used to make the remote calls.
    LucEndpoint *client = luc_allocate_endpoint(LUC_CLIENT_ONLY);

    // In order to issue the remote calls, we need to know where to send them.
    else if (lucError < LUC_ERR_MAX)
        // LUC library generated error code
    else
        // user remote function return value

    //
    // An asynchronous (non-blocking) call.
    // Return data is not supported for asynchronous callers.
void LucServerOnlyUsageModel(void)
{
    // First create a communication endpoint. This is used to accept calls from
    // remote clients.
    LucEndpoint *server = luc_allocate_endpoint(LUC_SERVER_ONLY);

    // These values correspond to values used by clients of this service.
Then the client can be run multiple times using the following command:

    % exluc -c id

Where id is the server endpoint ID printed to the command line when the server starts.

#include
#include
#include
#include
#include    // htonl/ntohl byte swapping

// The service type is an application-specific major service id.
// It identifies the general type of service requested by the client.
#define NetworkToHost(b,l) ByteSwap((b),(l))
#define HostToNetwork(b,l) ByteSwap((b),(l))
#endif

// The LUC client runs on the XMT login node, and
// the application user interface.
// Return value is 0 for success, 1 for error.
// Reduction service.
// This routine is called by the LUC server library
// when a client request of type
// (svc_type,reduce_func_idx) is received.
    err = svrEndpoint->registerRemoteCall(svc_type, reduce_func_idx, reduce);
    if (err != LUC_ERR_OK) {
        fprintf(stderr,"client: LUC registerRemoteCall error %d\n",err);
        delete svrEndpoint;
        return 1; // error
    }

    // Begin offering services (begin listening for requests).
    // If no valid options were given, print the program usage message.
    fprintf(stderr,"Usage: exluc -c id | -s\n");
    fprintf(stderr,"-c id Run as a client with the given endpoint id.\n");
    fprintf(stderr,"-s Run as a server, printing the endpoint id.\n");
    return 1;
}

5.6 Fast I/O Memory Usage

The MTK Fast I/O Library performs all data transfer operations through nearby memory.
Initialize the memory region variables from the global variables when creating the LUC Endpoint object. Changes to the global variables are propagated to new endpoint objects, not objects that already exist. An endpoint's memory configuration variables may be changed by using the LucEndpoint::setConfigValue() method until the endpoint is started.
Managing Lustre I/O with the Snapshot Library [6] 6.1 About the Snapshot Library The Cray XMT snapshot library provides a high speed bulk data transfer facility that moves data between memory regions within an MTK application and files hosted on the XMT Linux service partition. The primary use of the snapshot library is to load and save large data sets that are being stored on a Lustre file system.
The easiest way to understand this is to imagine data going to a file from the application. In this case, the data is copied by each compute node into the FIO transport and sent to its corresponding fsworker on a login node in the Linux service partition. Each fsworker then uses Linux system calls to write data into the Lustre file, which results in the data moving across the Portals transport from the login node to one or more Lustre OSS nodes.
For large data transfers starting at the beginning of a file, the best functions to use are dslr_snapshot and dslr_restore, because they are able to transfer data in parallel to achieve high throughput. To store data, the application calls dslr_snapshot, specifying the buffer to be copied, the length of the data, and the name of the file receiving the data.
A typical application might use dslr_pread and dslr_pwrite in the following manner:

1. Start up and allocate a small buffer to be initialized from a file.
2. Call dslr_pread specifying the name of the file providing the data, the offset of the data in the file, a pointer to the buffer allocated in 1, and the length of the data.
3. Process and change the data.
4. Call dslr_pwrite to store the data back to the file (or to a new modified data file).
5.
If the underlying file system is naturally serial (NFS, for example) its performance is constrained by the serial performance of the file system and any contention introduced by trying to use the file system in parallel. Again, the throughput of the snapshot library is bounded by the file system performance, so when using a serial file system a single fsworker provides the best throughput for the snapshot library. Note that fsworkers are not resilient.
    memset(testBuffer, 0, DEFAULT_BUFFER_SIZE);

    // Restore a snapshot dataset from disk back into memory.
    err = dslr_restore ((char *)DEFAULT_FILENAME, testBuffer, DEFAULT_BUFFER_SIZE, &snapError);
    if (dslr_ERR_OK != err) {
        fprintf(stderr,"Failed to restore the dataset. Error %d.\n",err);
        free(testBuffer);
        return -1;
    }

    // At this point, the testBuffer should be full of 'a'
    free(testBuffer);
    return 0;
}

Example 15.
These functions are useful for transferring small quantities of data to or from arbitrary locations in files, but because they are unable to benefit from parallelism, they are not useful for bulk data transfer. You should not expect throughput greater than 100 MB/second when using dslr_pwrite or dslr_pread.

#include #include #include #include #include #include
6.5 Managing File I/O on File Systems Other Than Lustre

Using the snapshot library to read and write files on a file system, such as NFS, that does not support high-performance parallel I/O can result in overloading the underlying file system with data requests and transfers.
Compiler Overview [7]

This chapter provides an overview of the Cray XMT compilers. You need to understand these concepts before you compile your program. The Cray XMT platform includes Cray XMT compilers for C and C++ applications. These compilers optimize programs to improve performance. Compiler features include:

Debugging support
    The Cray XMT compilers support multiple levels of debugging.
7.1 The Compilation Process

There are two major phases of building a program executable from a number of source files.

Compilation
    The compiler creates object files by invoking subprograms that translate the source files and optimize functions in the program. The compiler starts by invoking the front end. When the front end finishes, the compiler invokes the translator, which is the subprogram that optimizes and parallelizes code, and generates object files.
The compilation processes for these modes differ in the following ways:

Whole-program compilation
    This is the preferred method for compiling applications. In whole-program compilation, the compilation phase is made up of several sub-phases. The compiler first parses (partially compiles) each source file. During this phase, the compiler gathers information about every module in the program and saves it to the program library. The next phase is the translation phase.
Figure 2. Comparison of Whole-program and Separate-module Modes

[Figure: two panels. Whole-program Compilation: skinny .o files (arnoldi.o, blas.o) and the program library test.pl, which holds parsed source code, the call graph, object code, and debugger information. Separate-module Compilation: fat .o files (arnoldi.o, blas.o), each holding parsed source code, a partial call graph, object code, and debugger information.]
In separate-module mode, the .o files are true object files. The compiler optimizes each object file, or module, separate from the others. The link step produces a program library, although this program library primarily contains information that directs the debugger to various object files. Because of the relative sizes of the .o files in the two compilation modes, the qualifier skinny refers to whole-program mode and its products (such as the .
7.2 Invoking the Compiler

You can only use the Cray XMT compiler when the Cray XMT Programming Environment (mta-pe) module is loaded. The commands to use to invoke the compiler are cc for a C program and c++ for a C++ program. You can control the operation of the compiler by setting various options when running the compiler command. The compiler uses driver options, language options, parallelization options, and debugging options.
The following examples show how to use the compiler options for various compiler tasks using the whole-program and separate-module modes.

Whole-program:

    c++ -c a.cc -pl prog.pl            (parses a.cc)
    c++ -c b.cc -pl prog.pl            (parses b.cc)
    c++ -pl prog.pl -o prog a.o b.o    (translates a.o, b.o; links prog)

Or, as a shortcut:

    c++ a.cc b.cc -o prog              (compiles a.cc, b.cc; links prog, and creates prog.pl)

Separate-module:

    c++ -c a.cc
    c++ -c b.cc
    c++ -o prog a.o b.o
    (parses and translates a.
When you use the -pl and -c options to compile a source file, the compiler performs the following tasks during the compilation phase:

• Checks the source for syntax errors
• Creates an internal representation of each function in the program library
• Produces a skinny .
This produces (barring errors in the source file) a traditional, or fat, object file ddot.o. To produce the two fat object files ddot.o and daxpy.o, each of the two source files can be compiled separately. To do this, use the following command.

    c++ -c ddot.cc daxpy.cc

Using the previous command is the same as using the following sequence of commands.

    c++ -c ddot.cc
    c++ -c daxpy.
7.4 Inlining Functions

Inline expansion, commonly known as inlining, occurs when the compiler replaces a function reference with the body of the function. The advantages to using inlining include a reduction in memory usage due to the removal of function calls and returns, and the possibility of optimizing code near the function call with the function body.
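To make the idea concrete, the following portable sketch writes the same loop twice: once with a call to a small function, and once in the form it takes after the call is replaced by the function body. The function names are illustrative, and the expansion is written by hand here; it is not output from the Cray XMT compiler.

```c
/* A small function that is a natural inlining candidate. */
double square(double v)
{
    return v * v;
}

/* Call form: each iteration performs a call and a return. */
double sum_squares_call(const double *a, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += square(a[i]);
    return sum;
}

/* Expanded form: the loop after the call is replaced by the body
   of square. */
double sum_squares_expanded(const double *a, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += a[i] * a[i];
    return sum;
}
```

Both functions return the same result. In the expanded form the multiplication sits directly in the loop body, where the compiler can optimize it together with the surrounding code.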
7.5 Optimizing Parallelization
You can control how the compiler makes your program parallel in two ways:
• You can add parallelization directives to your program.
• You can specify a compiler option from the command line that controls parallelization.
Parallelization directives and options tell the compiler how to parallelize various sections of a program. The following types of parallelization are allowed.
Parallelism that you specify with future statements in your program is always enabled. Compiler options have no effect on future statements. If you do not specify a compiler option, the default is to run using the par option. There are also parallelization directives and compiler options available that you can use to enable or disable loop restructuring.
7.7 Creating New Libraries
You can create a user-defined library in the same way that you build a program in whole-program mode. To do this, use the -R option to suppress the creation of an executable. For example, to build the library tinyblas.a from functions in the files ddot.cc and dgemv.cc, use the following sequence of commands.
    c++ -pl tinyblas.a -c ddot.cc dgemv.cc
    c++ -pl tinyblas.a -R ddot.o dgemv.
7.8 Compiler Messages
There are three categories for compiler messages: errors, warnings, and remarks. Errors are the most severe and indicate problems that cause the compiler to halt after parsing without generating object code. Warnings are less severe; the compiler runs to completion and generates object code. Remarks tend to highlight conditions that prevent the code from being portable, but the resulting object code almost always behaves as expected.
7.
7.10 Using Compiler Directives and Assertions
Directives are metalanguage constructs that you can add to a program to influence how the compiler performs a translation. In C and C++, you prefix directives with #pragma mta. Macros are allowed after the word mta in a pragma, as shown in this example:
    #define NUMSTREAMS 40
    ...
    #pragma mta use NUMSTREAMS streams
The preceding pragma is equivalent to #pragma mta use 40 streams.
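As a minimal sketch of the macro substitution described above, the following function places the directive immediately before a loop. The function name scale_all and the loop body are illustrative assumptions, not taken from this guide. A compiler other than the Cray XMT compiler typically ignores the unrecognized pragma and runs the loop serially, so the result is the same either way.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stream count; the preprocessor expands it inside the pragma.
#define NUMSTREAMS 40

// Multiplies every element of v by factor (illustrative function).
void scale_all(std::vector<double>& v, double factor) {
    // Equivalent to "#pragma mta use 40 streams".
#pragma mta use NUMSTREAMS streams
    for (std::size_t i = 0; i < v.size(); i++) {
        v[i] *= factor;
    }
}
```

Keeping the stream count in one macro lets you tune it in a single place when several loops share the same request.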
Running an Application [8] This chapter contains procedures for launching your application on the Cray XMT. 8.1 Launching the Application You use the mtarun command to launch and run a program. The mtarun command connects to the mtarund daemon that runs on the compute node on the backend. The daemon creates a copy of your environment and runs it on the compute nodes. Your file directories from the login node appear on the compute nodes with the same paths.
The mtarun command uses a default configuration file, .mtarunrc, located in your home directory. You can modify this file to include any mtarun options, separated by spaces. Options that you specify on the command line override the settings in this file. To monitor process or CPU usage by your program, use mtatop. For more information about using mtarun to run the program or mtatop to monitor the program, see Cray XMT System Management.
8.3 Improving Performance
For information about improving the performance of your program, see Cray XMT Performance Tools User's Guide.
Optional Optimizations [9] 9.1 Scalar Replacement of Aggregates Effective with version 2.0 of the Cray XMT software, the XMT compiler provides an optional optimization pass that performs a code transformation called scalar replacement of aggregates. This transformation replaces C++ class objects and C structures (aggregate data types) with collections of temporary scalar variables. Values are copied from the aggregate to the temporary variables and back again as needed.
After recompiling this code with automatic scalar replacement enabled, the compiler is able to transform the foobar2 routine into something that resembles the following:
    myTwoInts foobar2(myTwoInts t, int n, int * restrict foo) {
        __tmp_t_i = t.i;
        for (int i = 0; i < n; i++) {
            __tmp_t_i += foo[i];
        }
        t.i = __tmp_t_i;
        return t;
    }
Note that the compiler does not bother creating a temporary variable for the unused field j.
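To make the transformation concrete, the sketch below writes both forms by hand in portable C++ so that you can check that they agree. The struct layout (two int fields, i and j) and the names foobar2_original and foobar2_replaced are assumptions for illustration. The restrict qualifier is omitted so that the code compiles as standard C++.

```cpp
// Assumed layout of the aggregate: two integer fields.
struct myTwoInts {
    int i;
    int j;
};

// Form before scalar replacement: the field t.i is updated in every iteration.
myTwoInts foobar2_original(myTwoInts t, int n, const int* foo) {
    for (int k = 0; k < n; k++) {
        t.i += foo[k];
    }
    return t;
}

// Hand-written equivalent of the transformed routine: the field is copied to
// a scalar temporary, updated there, and copied back once after the loop.
myTwoInts foobar2_replaced(myTwoInts t, int n, const int* foo) {
    int tmp_t_i = t.i;
    for (int k = 0; k < n; k++) {
        tmp_t_i += foo[k];
    }
    t.i = tmp_t_i;
    return t;
}
```

Both functions leave the field j untouched, which is why no temporary is needed for it.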
Alternatively, you can enable scalar replacement for individual aggregates by using the mta assert can replace pragma. This pragma, which takes a list of aggregates and/or aggregate pointers, serves two purposes. First, it tells the compiler that it is safe to perform scalar replacement on the aggregates or pointers listed. The compiler follows this assertion even if it was unable to prove that the replacement was safe.
to fields of the aggregate inside the loop will be replaced with the temporaries. This can be useful if scalar replacement is unsafe or undesirable for portions of a routine, but needed to achieve good performance in specific loops.
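The following is a minimal sketch of the per-loop use of the mta assert can replace pragma. The struct Accum, the function accumulate, and the placement of the pragma immediately before the loop are assumptions made for illustration; check the pragma description in the directives appendix for the exact placement rules. On a compiler that does not recognize the pragma, the loop simply runs without the transformation.

```cpp
// Illustrative aggregate with two fields that are updated in a loop.
struct Accum {
    double sum;
    long count;
};

// Adds n values into the aggregate pointed to by acc (illustrative function).
void accumulate(Accum* acc, const double* values, int n) {
    // Asserts that scalar replacement of *acc is safe in this loop, for
    // example because values never aliases the aggregate. Placing the
    // pragma directly before the loop is an assumption of this sketch.
#pragma mta assert can replace acc
    for (int k = 0; k < n; k++) {
        acc->sum += values[k];
        acc->count += 1;
    }
}
```

Because the pragma is an assertion, the caller remains responsible for making sure that nothing else reads or writes the aggregate while the loop runs.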
Error Messages [A] Execution-time errors are directly related to exceptions. An exception is an unexpected condition raised by an event in your program, the operating system, or the hardware. Exceptions can trigger a trap when the stream that issued the exception is ready for execution, unless the trap is disabled. In cases where several exceptions occur simultaneously, the trap handler decides the order in which to process the exceptions.
float_inexact
An error using a floating-point number has occurred. An operation is attempting to use an inexact floating-point number. This type of error indicates an error in the source registers, the operation, or the value written to the destination.
float_invalid
An error using a floating-point number has occurred. An operation is attempting to use an invalid floating-point number.
float_zero_divide
An error using a floating-point number has occurred.
prog_prot
A program-protection error has occurred. This error occurs when the processor attempts to execute an instruction from a PC that is not a valid PC.
unknown_trap
An error has occurred that does not fit into any other category on this list.
User Runtime Functions [B] Functions in the runtime library support implicit and explicit parallelism, event logging, and trap handling. The compiler inserts calls to the runtime library into your code to handle programming constructs, such as the future statement, or command-line options, such as the -trace flag. In addition, some functions in the runtime library can be called directly by the user. This appendix contains a list of the runtime functions that you can call from your program.
mta_get_num_teams
Returns the number of currently executing teams. See the mta_get_num_teams(3) man page.
mta_get_rt_teamid
Returns the runtime identifier of the caller's team. See the mta_get_rt_teamid(3) man page.
mta_get_team_index
Returns a user runtime index for a team. See the mta_get_team_index(3) man page.
mta_get_thread_name
mta_set_thread_name
mta_remove_thread_name
Retrieves, sets, and removes user-defined thread names.
mta_new_trap1_continuation
mta_new_trap1_continuation_block
mta_delete_trap1_continuation
mta_register_trap1_continuation
mta_unregister_trap1_continuation
mta_update_trap1_value
Creates, deletes, binds, or updates a trap 1 continuation. See the mta_new_trap1_continuation(3) man page.
mta_print_backtrace
Prints the thread's call stack. See the mta_print_backtrace(3) man page.
mta_probe_location
Probes a memory location to determine whether it can be read or written.
mta_reserve_task_event_counter
mta_get_task_counter
mta_get_team_counter
Reserves or queries hardware counters. See the mta_reserve_task_event_counter(3) man page.
mta_set_crew_limit
Sets the maximum number of crews that can be simultaneously active. The term crew is applied to the group of processors that are used when parallelizing the iterations of a loop across multiple processors.
mta_start_event_logging
mta_suspend_event_logging
mta_resume_event_logging
mta_is_event_logging_on
mta_set_event_flush
Trace buffer controls for user-defined event logging. See the mta_start_event_logging(3) man page.
mta_yield
Yields an active stream to any other thread that needs the stream. See the mta_yield(3) man page.
Compiler Directives and Assertions [C] This appendix provides a complete list of compiler directives specific to the Cray XMT and accepted by the Cray XMT compiler. C.1 Compilation Directives A compilation directive is a command to compile a program in a particular way. #pragma mta autotouch [on|off|default] This directive automatically applies the touch generic whenever a future variable is referenced.
the safer complex arithmetic performed when complex limited range is off. This is especially true when the difference between two intermediate computations is very small, such as ac-bd, in the case of multiplication, and bc-ad, in the case of division. This directive applies to whatever follows it textually in the current file. The directive stays in effect until the end of the file or until another directive of the same kind is encountered.
#pragma mta debug level [0|1|2|default|none]
Sets the debug level to the integer constant 0, 1, or 2, or to no debugging if you specify none. To set the debug level back to the level provided on the command line, specify default. This directive overrides the -g, -g1, and -g2 compiler flags. However, this directive does not affect any function that contains a call to setjmp or sigsetjmp, which is always compiled as if the -g2 option were specified.
#pragma mta fenv_access [on|off|default]
This directive specifies whether the full floating-point environment is available. When fenv_access is on, strict rules against the optimization of floating-point operations are enforced. If it is off, extra optimizations are performed, but floating-point exceptions may be lost in certain cases. The compiler is allowed to attempt either one or both of two optimization techniques when fenv_access is off.
You can use this pragma in conjunction with the use n streams pragma to ask the compiler to allocate a certain number of streams per processor to the job.
    #pragma mta use 100 streams
    #pragma mta for all streams
    {
        // do something
    }
However, there is no guarantee that the runtime will grant the requested number of streams if, for example, they are not available due to other jobs, the OS, or other simultaneous parallel regions in the current job.
#pragma mta fused muladd [on|off|default]
This directive specifies whether the compiler is allowed to combine floating-point operations into a fused multiply-add operation. The default behavior is to allow fused multiply-add operations to be performed only when float optimization is turned on. When this option is turned on, the compiler is allowed to, but not required to, fuse multiply-add operations into one instruction.
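Fusing matters because it changes rounding: a fused multiply-add rounds once, while a separate multiply and add round twice. The sketch below illustrates this in portable C++ with std::fma from the standard library. It does not use the pragma itself; the function names and the use of volatile to keep the two operations separate are choices made for this sketch.

```cpp
#include <cmath>

// Multiply, round to double, then subtract: two roundings.
// The volatile store keeps a compiler from fusing the two operations.
double unfused_square_minus(double a, double c) {
    volatile double product = a * a;
    return product - c;
}

// Fused multiply-add from the standard library: a single rounding.
double fused_square_minus(double a, double c) {
    return std::fma(a, a, -c);
}
```

With IEEE 754 double precision, choosing a = 1 + 2^-27 and c = 1 + 2^-26 makes the difference visible: the rounded product equals c, so the unfused form returns 0, while the fused form keeps the low-order term and returns 2^-54.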
#pragma mta instantiate [none|all|used|local|default]
When used inside a template declaration, the effect of this directive is limited to the uses of that template. When used outside a template declaration, this directive sets the template instantiation mode for the text following the directive and stays in effect until the end of the file or until another directive of the same kind is encountered.
num_streams is the number of streams the compiler requests for each processor. For loop future parallel loops, the directive limits to c the number of futures created. The directive is ignored for explicitly serial loops and cannot be used on a loop that also uses the use n streams directive.
The output from canal shows that they are both placed into parallel region 1:
          | for (int i = 0; i < size_foobar; i++) {
    3 P   |     bar[i] = size_foobar - i;
          | }
          |
          | for (int i = 0; i < size_foobar; i++) {
    5 P   |     foo[i] += bar[i+c]/2;
          | }
    ...
    Parallel region 1 in main
    ...
    Loop 2 in main in region 1
    ...
    Loop 3 in main at line 4 in loop 2
    ...
    Loop 4 in main in region 1
    ...
However, when you add the may merge option, these two loops remain in the same region:
          | for (int i = 0; i < size_foobar; i++) {
    3 P   |     bar[i] = size_foobar - i;
          | }
          |
          | #pragma mta max 50 streams per processor may merge
          | for (int i = 0; i < size_foobar; i++) {
    5 P   |     foo[i] += bar[i+c]/2;
          | }
    ...
    Parallel region 1 in main
    Using max 50 streams per processor
    ...
    Loop 2 in main in region 1
    ...
    Loop 3 in main at line 4 in loop 2
    ...
    Loop 4 in main in region 1
    ...
#pragma mta no mem init
This directive affects only the declaration statement immediately following the directive and tells the compiler not to specially initialize the full/empty bit (or bits) of any sync- or future-qualified variables defined in that declaration statement. The directive affects only the definition of variables, including class instance variables; it may not be used on field declarations inside classes.
#pragma mta no scalar expansion
This directive instructs the compiler not to expand scalar variables to vector temporaries in the next loop. Such expansion allows you to distribute the loop to enhance available parallelism or make effective use of registers. However, if the loop iterates only a few times, the increase in memory usage for the expansion may outweigh the benefits. In this case, you can use the no scalar expansion pragma to prevent expansion.
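A short sketch of where the directive goes, assuming a loop with one scalar temporary that the compiler might otherwise expand. The function name square_plus_one and the loop body are illustrative assumptions. A compiler that does not recognize the pragma ignores it, and the computed values are the same.

```cpp
// Computes out[i] = in[i] * in[i] + 1 for a short loop (illustrative).
void square_plus_one(const double* in, double* out, int n) {
    double t;  // scalar temporary reused by every iteration
    // Asks the compiler to keep t as a scalar in the next loop instead of
    // expanding it into a vector temporary with one element per iteration.
#pragma mta no scalar expansion
    for (int i = 0; i < n; i++) {
        t = in[i] * in[i];
        out[i] = t + 1.0;
    }
}
```

This trade-off favors the directive only when n is small; for long loops the expansion is usually what makes the loop distributable.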
In this case, the compiler is forced to choose one of two possible implementations. To avoid ambiguity when control of rounding is important, you should use a sequence of simpler assignments to make the meaning clear. The scope of this directive is the entire source file. The use of this directive overrides the -no_mul_add compiler flag and the #pragma mta fused muladd off directive.
variable to be updated must occur as the target on the left side of the statement and must occur exactly once as a subexpression on the right side of the statement. For example,
    void update_example(double A[], int i, int j){
        extern double V;
        extern double X;
        // This is allowed
        #pragma mta update
        V = 1.0 + X + 3.
are performed by 20 threads, the first thread executes iteration 1, iteration 21, iteration 41, and so forth. This scheduling leads to better load balancing for triangular loops. For example:
    void interleave_example(const double X[100][100], const double Y[100], double Z[100], const int N) {
    #pragma mta interleave schedule
        for (int i = 0; i < N; i++) {
            double sum = 0.
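The load-balancing claim for interleaved scheduling can be checked with a small model in portable C++. The sketch below only simulates the assignment of iterations to threads; it does not use the pragma, and the function names and the cost model (iteration number it costs it units of inner-loop work) are assumptions for illustration.

```cpp
#include <vector>

// Returns the 1-based iteration numbers that one thread executes when n
// iterations are dealt out round-robin to num_threads threads.
// thread is 1-based: thread 1 gets iterations 1, 1 + num_threads, and so on.
std::vector<int> interleaved_iterations(int thread, int num_threads, int n) {
    std::vector<int> mine;
    for (int it = thread; it <= n; it += num_threads) {
        mine.push_back(it);
    }
    return mine;
}

// Work done by one thread in a triangular loop nest in which iteration it
// performs it units of inner-loop work.
long triangular_work(const std::vector<int>& iterations) {
    long work = 0;
    for (int it : iterations) {
        work += it;
    }
    return work;
}
```

With 100 iterations and 20 threads, the first thread gets iterations 1, 21, 41, 61, and 81 (205 units) and the last gets 20, 40, 60, 80, and 100 (300 units). Under block scheduling the same two threads would get iterations 1 to 5 and 96 to 100, that is, 15 and 490 units, which is a much larger imbalance.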
C.2 Parallelization Directives
The compiler recognizes the following parallelization directives.
#pragma mta parallel [on|off|default|single processor|multiprocessor|future]
This directive enables or disables automatic generation of parallel code for a section of the program as well as choosing the form of parallelism to use. The single processor, multiprocessor, and future flags indicate the type of parallelism to use.
of the file or until another directive of the same kind is encountered. The directive is ignored if the -nopar flag is used on the command line.
#pragma mta loop loop_mod[, loop_mod, ...]
This directive takes a comma-separated list of parallelization modes, loop_mod, consisting of no more than one selection from each of the following sets of possible loop modes:
restructure, norestructure
Enables/disables loop restructuring.
The compiler recognizes the following semantic assertions:
#pragma mta assert can replace variable-list
This directive asserts that it is safe to use scalar replacement of the aggregates (objects or structs) in variable-list and the aggregates pointed to by pointers in variable-list. This pragma is also a request for scalar replacement of those aggregates even if the code was not compiled with the -scalar_replacement option.
#pragma mta assert parallel
This directive can appear before a loop construct and asserts that the separate iterations of the loop may execute concurrently without synchronization. It does not guarantee that the compiler parallelizes the loop, but it is a strong suggestion to the compiler. This directive affects the next loop only. The directive is ignored if the -nopar flag is used on the command line.
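A typical case for this assertion is a loop that writes through an index array, where the compiler cannot prove that two iterations never touch the same element. The sketch below is a minimal illustration; the function name scatter and the permutation precondition are assumptions, and the assertion is only valid because the caller guarantees that precondition.

```cpp
// Scatters src into dst through an index array (illustrative function).
// Precondition: index holds each value 0..n-1 exactly once, so no two
// iterations write the same element of dst.
void scatter(double* dst, const double* src, const int* index, int n) {
    // Tells the compiler that the iterations may run concurrently.
#pragma mta assert parallel
    for (int i = 0; i < n; i++) {
        dst[index[i]] = src[i];
    }
}
```

If index ever contained a repeated value, the assertion would be false and the parallel result would depend on execution order.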
#pragma mta assert no dependence variable-list
#pragma mta assert nodep variable-list
This directive can appear before a loop construct and asserts that if a word of memory is accessed during execution of the loop through any load or store derived from a variable in variable-list, the word is accessed from exactly one iteration of the loop. You can also use the word nodep in place of no dependence.
However, if we add a #pragma mta may reorder SYNCARRAY$ directive before the loop, each reference to SYNCARRAY$ may occur before or after any of the other references. Explicit serialization is not imposed, and the loop is parallelizable.
Alternatively, you can use the following syntax for dynamically allocated arrays.
    #pragma mta assert par_newdelete
    foo = new aclass[100];
You can also place this directive before the deletion of a dynamically allocated array to indicate that when the elements of the array are destructed, the destructors should be invoked in parallel. To do this, use the following syntax:
    #pragma mta assert par_newdelete
    delete [] foo;
    foo = 0;
C.
The compiler tests for case n first, and all other cases after that. n must be an integer constant, in any radix. It may not be an integer expression, nor may it be a member of an enumeration.
#pragma mta expect (predicate)
This directive can appear before any executable statement and suggests that the compiler should optimize code near that point. This suggestion is based on the assumption that the predicate typically evaluates to true.
Condition Codes [D]
You can test the condition codes generated by an expression by using the MTA_TEST_CC intrinsic. The eight possible condition code values and their default meanings are shown in the following table. The Examples column shows the operations that meet the criteria for the condition code, where 0, p, and n stand for zero, a positive integer, and a negative integer, respectively.
Name      Description
IF_ZE     x = 0 (integer, unsigned, float)
IF_F      x = 0 (logical)
IF_NE     y != z (integer, unsigned, float)
IF_NZ     x != 0 (integer, unsigned, float)
IF_T      x != 0 (logical)
Condition Mask: Integer Comparison
IF_ILT    y < z (integer)
IF_IGE    y >= z (integer)
IF_IGT    y > z (integer)
IF_ILE    y <= z (integer)
IF_IMI    x < 0 (integer)
IF_IPZ    x >= 0 (integer)
IF_IPL    x > 0 (integer)
IF_IMZ    x <= 0 (integer)
Condition Mask: Unsigned Compariso
Name      Description
IF_3      Overflow/NaN, no carry
IF_4      Zero, carry
IF_5      Negative, carry
IF_6      Positive, carry
IF_7      Overflow/NaN, carry
IF_N0     Not Zero, no carry
IF_N1     Not Negative, no carry
IF_N2     Not Positive, no carry
IF_N3     Not Overflow/NaN, no carry
IF_N4     Not Zero, carry
IF_N5     Not Negative, carry
IF_N6     Not Positive, carry
IF_N7     Not Overflow/NaN, carry
Data Types [E] This chapter provides information about the C and C++ language data types that you can use with Cray XMT compilers. The floating-point types are float, double, and long double. Their sizes are 4, 8, and 16 bytes, respectively. The integer types short and unsigned short are each 4 bytes long. The data types int, long, long long, and their unsigned equivalents are each 8 bytes long. The compiler flag -short16 converts all short and unsigned short integers to 2 bytes.
The Cray XMT C and C++ compilers also support the ten nonstandard integer types in the following list. The -short16 and -i4 compiler flags do not affect the size of these types, so it is preferable that you use these in exported include files.
__short16
A 2-byte (16-bit) value.
unsigned __short16
A 2-byte (16-bit) value.
__short32
A 4-byte (32-bit) value.
unsigned __short32
A 4-byte (32-bit) value.
__int16
A 2-byte (16-bit) value.
Keywords [F] The C and C++ languages reserve certain words for use as keywords. You cannot use these words for any other purpose. For example, you cannot use them as identifiers such as variable names. Some of these reserved words are required by the standards for the C and C++ languages; others support programming on the Cray XMT. Table 4.
The -no_bool compiler switch disables the bool, false and true keywords. The -no_wchar compiler switch disables the wchar_t keyword. The -cfront compiler switch disables the bool, explicit, false, true and typename keywords. The -no_alternative_tokens compiler switch disables the alternate operator keywords and, and_eq, bitand, bitor, compl, not, not_eq, or, or_eq, xor, and xor_eq.
The following reserved words have been added by Cray to both the C and C++ languages for use on the Cray XMT platform.
future
__future
Both a type qualifier and a statement. Future variables are initially set to a full state. A future variable is set to an empty state when the future statement executes and set to a full state when the return statement of the future executes.
The following reserved words have been added by Cray to the C language for use on the Cray XMT platform. Keywords beginning with an underscore have also been added by Cray to the C++ language. The keywords new, delete, and protected are required by the C++ standard and did not need to be added to that language.
new
__new
delete
__delete
Unary operator; has the same format as the new operator in the C++ language.
MTA_PARAMS [G] The environment variable MTA_PARAMS is used by the Cray XMT user runtime. The following list contains the values that you can set for MTA_PARAMS. debug_data_prot Waits for the debugger to attach rather than exiting when a data protection or poison error occurs. This parameter is useful while troubleshooting a specific problem.
u
Underflow. Traps underflows that occur. Underflows produce a rounded result smaller in magnitude than 0x0010000000000000, or about 2.225e-308.
x
Inexact. Traps subnormal numbers.
max_readypool_retries n
Sets the maximum number n of retries that an idle thread can take when checking random ready pools for new work.
mmap_buffer_size n
Sets the variable size of the persistent mmap buffers, where n is the size in words.
pc_hash n, m, l
Specifies the hash size n, age threshold m, and dump threshold l of an event. The hash size determines the number of event types that can be hashed at one time. The age threshold determines the age at which an event is considered stale, in which case it will be discarded rather than reported. The age threshold also determines the frequency with which events are captured in event records.
LUC API Reference [H] The XMT-PE contains two user-level libraries for LUC, libluc.a, that use a C++ interface. One version of libluc.a is built for Linux applications and one is built for MTK applications. Both versions present the same interface to LUC applications. For LUC applications, you use the header file. H.1 LucEndpoint Class The LucEndpoint class defines a LucEndpoint object.
The LucEndpoint class provides the interface methods that the application uses to call functions on a remote server.
H.2 luc_allocate_endpoint Function
Use luc_allocate_endpoint to construct LucEndpoint objects. The default value for LucServiceType is LUC_CLIENT_SERVER. See LUC Type Definitions on page 159.
    LucEndpoint *luc_allocate_endpoint(LucServiceType_t etype);
H.3 LUC Methods
The LucEndpoint class uses the following methods:
• startService
• stopService
• getMyEndpointID
• remoteCall
• remoteCallSync
• registerRemoteCall
• setConfigValue
• getConfigValue
H.3.
Parameters
threadCount
Specifies the number of server threads that are assigned to an object.
Note: The MTK LUC library ignores the threadCount parameter.
requestedPid
Specifies a Portals process ID to use when setting up the endpoint. By default, the LUC library chooses a Portals process ID to use.
Note: MTK ignores the requestedPid parameter.
Return Codes
LUC_ERR_OK
The service was stopped.
Gets the ID of the endpoint. This method is valid only after startService has returned.
Return Codes
This method returns the endpoint's identifier on successful completion.
LUC_ENDPOINT_INVALID
The endpoint is invalid because the service has not yet been started. To start the service, use the startService method.
H.3.4 remoteCall Method
Makes an asynchronous remote procedure call.
Parameters
serverEndpoint
Specifies the endpoint identifier for the desired server of this RPC.
serviceType
serviceFunctionIndex
These parameters specify the particular remote function to invoke on a server. The server uses the same values in its registerRemoteCall method.
userData
userDataLen
Specifies an optional pointer to input data and the length of the data.
userHandle
Contains the value passed to the specified userCompletionHandler when it is invoked.
H.3.5 remoteCallSync Method
Makes a synchronous remote procedure call.
Syntax
    luc_error_t remoteCallSync(luc_endpoint_id_t serverEndpoint,
                               luc_service_type_t serviceType,
                               int serviceFunctionIndex,
                               void *inputData, size_t inputDataLen,
                               void *outputData, size_t *outputDataLen);
The synchronous procedure call is used in synchronous programming models or in cases where the caller expects the remote function to return data. This method is valid only on started objects.
Return Codes
LUC_ERR_OK
The remote procedure call was completed. Data may have been returned.
LUC_ERR_NOT_STARTED
The service has not yet been started. To start the service, use the startService method.
LUC_ERR_BAD_ADDRESS
Indicates an attempt to use a NULL input or output buffer while specifying a non-zero size for the corresponding buffer.
Parameters
serviceType
Specifies the service type of the service being provided.
serviceFunctionIndex
Specifies the specific function (by index) being provided by theFunction.
theFunction
Specifies the application-defined function to be called by LUC when RPC requests arrive at the endpoint with a matching serviceType and serviceFunctionIndex.
Return Codes
LUC_ERR_OK
The function was registered successfully.
Values to use for this option:
LUC_DBG_NONE: The library logs assertions that are fatal to the application.
LUC_DBG_LOW: The library logs fatal assertions and errors.
LUC_DBG_MEDIUM: The library logs errors and warnings.
LUC_DBG_HIGH: The library logs errors, warnings, and verbose information about RPCs and the endpoints.
LUC_CONFIG_SERVER_RPC_COUNT
This configuration key sets the number of RPCs that a server endpoint should be able to handle at once.
Values to use for this option: powers of two from 1 MB to 256 MB, inclusive.
LUC_CONFIG_SWAP_CLIENT_INBOUND
LUC_CONFIG_SWAP_CLIENT_OUTBOUND
LUC_CONFIG_SWAP_SERVER_INBOUND
LUC_CONFIG_SWAP_SERVER_OUTBOUND
These configuration keys use boolean flags to enable byte swapping on messages sent to a LUC client, from a LUC client, to a LUC server, and from a LUC server, respectively. These keys are not valid for Linux endpoints.
Values to use for this option: 0 and 1.
This key is not valid for Linux endpoints.
Values to use for this option: powers of two from 1 MB to 256 MB, inclusive.
LUC_CONFIG_SMALL_NEARMEM_SIZE
This configuration key adjusts the amount of nearby memory allocated for the endpoint's small I/O buffer region. This key is not valid for Linux endpoints.
Values to use for this option: powers of two from 1 MB to 256 MB, inclusive. This buffer region may not be disabled.
Parameters
key
Identifies the configuration option to get. For a list of configuration options, see setConfigValue Method on page 155.
value
Returns a pointer to the value for the corresponding configuration key.
Return Codes
LUC_ERR_OK
The operation was successful.
LUC_ERR_INVALID_KEY
The key parameter is not one of the predefined LUC configuration keys (LUC_CONFIG_*).
H.4 LUC Type Definitions
LucServiceType defines the type of the LucEndpoint object.
H.5 LUC Callback Functions
The LucEndpoint class uses the following callback functions:
• LUC_RPC_Function_InOut
• LUC_Mem_Avail_Completion
• LUC_Completion_Handler
H.5.1 LUC_RPC_Function_InOut
The LUC runtime calls the LUC_RPC_Function_InOut callback when a remote client makes a request. The application must call the registerRemoteCall method to register LUC_RPC_Function_InOut callback functions. The application should return LUC_ERR_OK when successful.
Parameters
inData (input parameter)
Specifies a pointer to a buffer containing input data to the remote function. NULL if there is no input data.
inDataLen (input parameter)
Specifies the length of the inData buffer.
outData (output parameter)
Specifies a pointer to the output data returned by the application. NULL if there is no output data.
outDataLen (output parameter)
Specifies the length of the data returned by the application if there is returning data.
H.5.3 LUC_Completion_Handler
The LUC_Completion_Handler callback function is used by a client for asynchronous remote procedure calls.
LUC_ERR_OK
• The function was registered successfully.
• This object is ready to accept remote requests.
• The remote procedure call was launched.
• The remote procedure call was completed.
• The endpoint has been stopped successfully.
• The function was prepared for transmission. The application's completion handler is guaranteed to fire with a real status at some later point.
LUC_ERR_MAX
Special value set to be the highest numerical error code generated by the library.
LUC_ERR_BAD_PARAMETER
• The specified service type or function index is out of range.
• The specified configuration value is out of range.
LUC_ERR_RESOURCE_FAILURE
A transient resource allocation failure has occurred. The caller should retry the operation at a later time.
LUC_ERR_TOO_LARGE
The remote procedure is trying to return more data than the client is able to accept.
LUC_ERR_FIO
The (MTK) LUC Library received an unexpected error from the Fast I/O System Call Library.
LUC_ERR_INVALID_ENDPOINT
The endpoint parameter to the method was invalid.
LUC_ERR_ALREADY_STOPPED
The user attempted to call stopService on a previously stopped, or never started, LucEndpoint object.
LUC_ERR_IO_ERROR
An underlying transport error occurred. The remote procedure call may or may not have fired.
LUC_ERR_NOT_STARTED
The service has not yet been started.
Glossary barrier In code, a barrier is used after a phase. The barrier delays the streams that were executing parallel operations in the phase until all the streams from the phase reach the barrier. Once all the streams reach the barrier, the streams begin work on the next phase. block scheduling A method of loop scheduling used by the compiler where contiguous blocks of loop iterations are divided equally and assigned to available streams.
future
Implements user-specified or explicit parallelism by creating a continuation that points to a sequence of statements that may be executed by another idle thread. Futures also optionally contain a return value. Execution of code that uses the return value is delayed until the future completes. The thread that spawns the future uses parameters to pass data to the thread that executes the future.
recurrence
Occurs when a loop uses values computed in one iteration in subsequent iterations. These subsequent uses of the value imply loop-carried dependences and thus usually prevent parallelization. To increase parallelization, use linear recurrences.
reduction
A simple form of recurrence that reduces a large amount of data to a single value. It is commonly used to find the minimum and maximum elements of a vector.