Fortran 77 Programmer’s Guide Document Number 007-0711-060
CONTRIBUTORS Written by CJ Silverio, David Graves, and Chris Hogue Edited by Janiece Carrico Illustrated by Melissa Heinrich Production by Gloria Ackley Engineering contributions by Calvin Vu, Bron Nelson, and Deb Ryan © Copyright 1992, 1994, Silicon Graphics, Inc.— All Rights Reserved This document contains proprietary and confidential information of Silicon Graphics, Inc.
Contents
Introduction
   Corequisite Publications
   Organization of Information
   Typographical Conventions
4. System Functions and Subroutines
   Library Functions
   Intrinsic Subroutine Extensions
      DATE
      IDATE
      ERRSNS
      EXIT
      TIME
      MVBITS
   Function Extensions
      SECNDS
      RAN
Advanced Features
mp_block and mp_unblock
mp_setup, mp_create, and mp_destroy
mp_blocktime
mp_numthreads, mp_set_numthreads
mp_my_threadnum
Environment Variables: MP_SET_NUMTHREADS, MP_BLOCKTIME, MP_SETUP
Environment Variables: MP_SCHEDTYPE, CHUNK
Environment Variable: MP_PROFILE
mp_setlock, mp_unsetlock, mp_barrier
Local COMMON Blocks
Compatibility With sproc
DOACROSS Implementation
Loop Transformation
Executing Spooled Routines
Figures
Figure 1-1  Compilation Process
Figure 1-2  Compiling Multilanguage Programs
Figure 1-3  Link Editing
Figure 3-1  Array Subscripts
Tables
Table 1-1  Link Libraries
Table 1-2  Source Statement Settings for -col72 Option
Table 1-3  Source Statement Settings for -col120 Option
Table 1-4  Source Statement Settings for -extend_source Option
Table 1-5  Source Statement Settings for -noextend_source Option
Table 1-6  Optimizer Options
Table 1-7  Preconnected Files
Table 2-1  Size, Alignment, and Value
Table 2-2  Table 3-1  Table 3-2  Table 3-3  Table 3-4  Table 3-5  Table 3-6  Table 4-1  Table 4-2  Table 4-3  Table 4-4  Table 4-5  Table A-1
Introduction

This manual provides information on implementing Fortran 77 programs using IRIX™ and the IRIS®-4D™ series workstation. This implementation of Fortran 77 contains the full American National Standards Institute (ANSI) Programming Language Fortran (X3.9–1978). Extensions provide full VMS Fortran compatibility to the extent possible without the VMS operating system or VAX data representation.
Refer to the dbx Reference Manual for a detailed description of the debugger. For information on interfaces to programs written in assembly language, refer to the Assembly Language Programmer's Guide.

Organization of Information

This manual contains the following chapters and appendix:

• Chapter 1, “Compiling, Linking, and Running Programs,” gives an overview of components of the compiler system, and describes how to compile, link edit, and execute a Fortran program.
Typographical Conventions

The following conventions and symbols are used in the text to describe the form of Fortran statements:

Bold
   Indicates literal command line options, filenames, keywords, function/subroutine names, pathnames, and directory names.

Italics
   Represents user-defined values. Replace the item in italics with a legal value. Italics are also used for command names, manual page names, and manual titles.
Here are two examples illustrating the syntax conventions.

DIMENSION a(d) [,a(d)] …

indicates that the Fortran keyword DIMENSION must be written as shown, that the user-defined entity a(d) is required, and that one or more of a(d) can be optionally specified. Note that the pair of parentheses ( ) enclosing d is required.
Chapter 1. Compiling, Linking, and Running Programs

This chapter contains the following major sections:

• “Compiling and Linking” describes the compilation environment and how to compile and link Fortran programs. This section also contains examples that show how to create separate linkable objects written in Fortran, C, Pascal, or other languages supported by the compiler system and how to link them into an executable object program.
Compiling and Linking

Drivers

Programs called drivers invoke the major components of the compiler system: the Fortran compiler, the intermediate code optimizer, the code generator, the assembler, and the link editor. The f77 command runs the driver that causes your programs to be compiled, optimized, assembled, and link edited. The format of the f77 driver command is as follows:

f77 [option] … filename.
[Figure 1-1: Compilation Process. more.f passes through the Fortran front end, the optimizer (optional), the code generator, and the assembler to produce more.o; the link editor then produces a.out.]

Note the following:

• The source file must end with the required suffix .f or .F.
• The source file is passed through the C preprocessor, cpp, by default. cpp does not accept C-style comments in Hollerith strings. The –nocpp option skips the pass through cpp and therefore allows C-style comments in Hollerith strings.
• The default name of the executable object file is a.out. For example, the command line

  % f77 myprog.f

  produces the executable object a.out.
• You can specify a name other than a.out for the executable object by using the driver option –o name, where name is the name of the executable object. For example, the command line

  % f77 myprog.o -o myprog

  link edits the object module myprog.o and produces an executable object named myprog.
[Figure 1-2: Compiling Multilanguage Programs. main.c passes through the C preprocessor, the C front end, the code generator, and the assembler to produce main.o; rest.f passes through the C preprocessor, the Fortran front end, the code generator, and the assembler to produce rest.o.]

Linking Objects

You can use the f77 driver command to link edit separate objects into one executable program when any one of the objects is compiled from a Fortran source. The driver recognizes the .
Figure 1-3 shows the flow of control for this link edit.

[Figure 1-3: Link Editing. Elements shown: main.o, rest.o, Link Editor, C, All, Fortran.]

Both f77 and cc use the C link library by default. However, the cc driver command does not know the names of the link libraries required by the Fortran objects; therefore, you must specify them explicitly to the link editor using the –l option as shown in the example.
See the section called “FILES” in the f77(1) manual page for a complete list of the files used by the Fortran driver. Also refer to the ld(1) manual page for information on specifying the –l option.

Specifying Link Libraries

You must explicitly load any required run-time libraries when compiling multilanguage programs. For example, when you link a program written in Fortran and some procedures written in Pascal, you must explicitly load the Pascal library libp.
Driver Options

This section contains a summary of the Fortran-specific driver options. See the f77(1) manual page for a complete description of the compiler options; see the ld(1) manual page for a description of the link editor options.

–66
   Compiles Fortran 66 source programs.

When used at compile time, the following four options generate various degrees of misaligned data in common blocks. They then generate the code to deal with the misalignment.
To load the system libraries capable of handling misaligned data, use the –L/usr/lib/align switch at load time. The trap handler may be needed to handle misaligned data passed to system libraries that are not included in the /usr/lib/align directory (see fixade(3f) and unalign(3x)).

–backslash
   Allows the backslash character to be used as a normal Fortran character instead of the beginning of an escape sequence.

–C
   Generates code for run-time subscript range checking.
–col72
   Sets the source statement format as described in Table 1-2.

Table 1-2  Source Statement Settings for -col72 Option

Column    Contents
1–5       Statement label
6         Continuation indicator
7–72      Statement body
73–end    Ignored

If the source statement contains fewer than 72 characters, no blank padding occurs; the TAB-format facility is disabled. This option provides the SVS Fortran 72-column option mode.
–d_lines
   Causes any lines with a D in column 1 to be compiled. By default, the compiler treats all lines with a character in column 1 as comment lines.

–expand_include
   Expands all include statements in the Fortran source listing file .L. This option is only applicable with the –listing option.

–extend_source
   Sets the source statement format as described in Table 1-4.
–i2
   All small integer constants become INTEGER*2. All variables and functions implicitly or explicitly declared type INTEGER or LOGICAL (without a size designator, that is, *2, *4, and so on) will be INTEGER*2 or LOGICAL*2, respectively.

–listing
   Produces the source listing file with .L suffix containing line numbers, error messages, symbol table information, and cross references.
–nocpp
   Does not run the C preprocessor on the source files. Specifying this option allows you to specify C-style comments inside Hollerith strings. Use this option when you want your program to strictly conform to the Fortran 77 standard.

–noexpopt
   Excludes floating point constant exponent optimization to achieve the same precision as releases prior to 4D1-4.0.

–noextend_source
   Sets the source statement format as described in Table 1-5.
–old_rl
   Interprets the record length specifier for a direct unformatted file as a number of bytes instead of a number of words. This option provides backward compatibility with 4D1-3.1 releases and earlier.

–onetrip
   Same as –1 option.

–1
   Compiles DO loops so that they execute at least once if reached. By default, DO loops are not executed if the upper limit is smaller than the lower limit. Similar to the –nof77 option.
–trapeuv
   Sets uninitialized local variables to 0xFFFA5A5A. This value is treated as a floating point NaN and causes a floating point trap.

–U
   Causes the compiler to differentiate upper- and lowercase alphabetic characters. For example, the compiler considers a and A as distinct characters. Note that this option causes the compiler to recognize lowercase keywords only. Therefore, lowercase keywords must be used in writing case-sensitive programs (or in writing generic header files).
Debugging

The compiler system provides a source-level, interactive debugger called dbx that you can use to debug programs as they execute. With dbx you can control program execution to set breakpoints, monitor what is happening, modify values, and evaluate results. dbx keeps track of variables, subprograms, subroutines, and data types in terms of the symbols used in the source language.
Optimizing

The default optimizing option, –O1, causes the code generator and assembler phases of compilation to improve the performance of your executable object. You can prevent optimization by specifying –O0. Table 1-6 summarizes the optimizing functions available.

Table 1-6  Optimizer Options

Option    Result
–O3       Performs all optimizations, including global register allocation. With this option, a ucode object file is created for each Fortran source file and left in a .u file.
See the IRIX Series Compiler Guide for details on the optimization techniques used by the compiler and tips on writing optimal code for optimizer processing.
size
   Prints information about the text, rdata, data, sdata, bss, and sbss sections of the specified object or archive files. See Chapter 10 of the Assembly Language Programmer’s Guide for a description of the contents and format of section data.

For more information on these tools, see the odump(1), stdump(1), nm(1), file(1), or size(1) manual pages.

Archiver

An archive library is a file that contains one or more routines in object (.o) file format.
File Formats

Fortran supports five kinds of external files:

• sequential formatted
• sequential unformatted
• direct formatted
• direct unformatted
• key indexed file

The operating system implements other files as ordinary files and makes no assumptions about their internal structure. Fortran I/O is based on records. When a program opens a direct file or key indexed file, the length of the records must be given.
Run-Time Considerations

Preconnected Files

Table 1-7 shows the standard preconnected files at program start.

Table 1-7  Preconnected Files

Unit #    Unit
5         Standard input
6         Standard output
0         Standard error

All other units are also preconnected when execution begins. Unit n is connected to a file named fort.n. These files need not exist, nor will they be created unless their units are used without first executing an open. The default connection is for sequential formatted I/O.
Run-Time Error Handling

When the Fortran run-time system detects an error, the following action takes place:

• A message describing the error is written to the standard error unit (unit 0). See Appendix A, “Run-Time Error Messages,” for a list of the error messages.
• A core file is produced if the f77_dump_flag environment variable is set, as described in Appendix A, “Run-Time Error Messages.”
Chapter 2. Storage Mapping

This chapter contains two sections:

• “Alignment, Size, and Value Ranges” describes how the Fortran compiler implements size and value ranges for various data types as well as how data alignment occurs under normal conditions.
• “Access of Misaligned Data” describes two methods of accessing misaligned data.
Alignment, Size, and Value Ranges

Table 2-1 contains information about various data types.
The following notes provide details on some of the items in Table 2-1.

• Table 2-2 lists the approximate valid ranges for REAL and DOUBLE.

Table 2-2  Valid Ranges for REAL and DOUBLE Data Types

Range                   REAL                  DOUBLE
Maximum                 3.40282356 * 10^38    1.7976931348623158 * 10^308
Minimum normalized      1.17549424 * 10^-38   2.2250738585072012 * 10^-308
Minimum denormalized    1.40129846 * 10^-45   2.
• You must explicitly declare an array in a DIMENSION declaration or in a data type declaration.
Access of Misaligned Data

The Fortran compiler allows misalignment of data if specified by the use of special options. As discussed in the previous section, the architecture of the IRIS-4D series assumes a particular alignment of data. ANSI standard Fortran 77 cannot violate the rules governing this alignment.
To use this method, keep the Fortran front end from padding data to force alignment by compiling your program with one of two options to f77.

• Use the –align8 option if your program expects no restrictions on alignment.
• Use the –align16 option if your program expects to be run on a machine that requires half-word alignment.

You must also use the misalignment trap handler.
Chapter 3. Fortran Program Interfaces

This chapter contains the following major sections:

• “Fortran/C Interface” describes the interface between Fortran routines and routines written in C. It contains rules and gives examples for making calls and passing arguments between the two languages.
• “Fortran/C Wrapper Interface” describes the process of generating wrappers for C routines called by Fortran.
Fortran/C Interface

When writing Fortran programs that call C functions, consider procedure and function declaration conventions for both languages. Also, consider the rules for argument passing, array handling, and accessing common blocks of data.

Procedure and Function Declarations

This section discusses items to consider before calling C functions from Fortran.
Note that only one main routine is allowed per program. The main routine can be written in either C or Fortran. Table 3-1 contains an example of a C and a Fortran main routine.

Table 3-1  Main Routines

C

main ()
{
    printf("hi!\n");
}

Fortran

      write (6,10)
10    format ('hi!')
      end

Invocations

Invoke a Fortran subprogram as if it were an integer-valued function whose value specifies which alternate return to use.
Note the following:

• Avoid calling Fortran functions of type FLOAT, COMPLEX, and CHARACTER from C.
• You cannot write a C function so that it will return a COMPLEX value to Fortran.
• A character-valued Fortran subprogram is equivalent to a C language routine with two extra initial arguments: a data address and a length. However, if the length is one, no extra argument is needed and the single character result is returned as in a normal numeric function.
• When passing the address of a variable, the data representations of the variable in the calling and called routines must correspond, as shown in Table 3-3.
3. Specify the length of each normal character parameter in the order in which it appears in the argument list. The length must be specified as a constant value or INTEGER variable (that is, not an address).

The examples on the following pages illustrate these rules.

Example 1

This example shows how a C routine specifies the destination address of a Fortran function (which is only implied in a Fortran program).
Fortran

C     Fortran call to F, a function written in Fortran
      EXTERNAL F
      CHARACTER*10 F, G
      G = F()

C

/* C call to F, a routine written in Fortran */
/* which returns a string. */
char S[10];
. . .
f_(S, 10);

The function F, written in Fortran

C     function F, written in Fortran
      CHARACTER*10 FUNCTION F()
      F = '0123456789'
      RETURN
      END

Array Handling

Fortran stores arrays in column-major order with the leftmost subscript varying the fastest.
When a C routine uses an array passed by a Fortran subprogram, the dimensions of the array and the use of the subscripts must be interchanged, as shown in Figure 3-1.

Fortran caller

      integer a(2,3)
      call p (a, 1, 3)
      write (6, 10) a(1, 3)
10    format (1x, I6)
      stop
      end

C called routine

void p_(a, i, j)
int *i, *j, a[3][2];
{
    a[*j-1][*i-1] = 99;
}

A. Dimensions and subscripts are reversed.
B. 1 is subtracted from the indices. j and i are pointers to integers.

Figure 3-1  Array Subscripts
• If the same common block is of unequal length, the largest size is used to allocate space.
• Unnamed common blocks are given the name _BLNK_.

The following examples show C and Fortran routines that access common blocks of data.

Fortran

      subroutine sam()
      common /r/ i, r
      i = 786
      r = 3.2
      return
      end

C

struct S {int i; float j;} r_;
main ()
{
    sam_();
    printf("%d %f\n", r_.i, r_.j);
}

The C routine prints out 786 and 3.2.
Fortran/C Wrapper Interface

This section describes the process of generating wrappers for C routines called by Fortran. If you want to call existing C routines (which use value parameters rather than reference parameters) from Fortran, these wrappers convert the parameters during the call. The program mkf2c provides an alternate interface for C routines called by Fortran.
Here is another example:

simplefunc (a)
int a;
{}

In this example, the function simplefunc has one argument, a. The argument is of type int. For this function, mkf2c produces three items: a Fortran entry, simple, and two pieces of code. The first piece of code dereferences the address of a, which was passed by Fortran. The second passes the resulting int to C. It then calls the C routine simplefunc().
This length is necessary to compute the indexes of the array elements. The program mkf2c has special constructs for dealing with the lengths of Fortran character variables.

Reduction of Parameters

The program mkf2c reduces each parameter to one of seven simple objects. The following list explains each object.

64-bit value
   The quantity is loaded indirectly from the passed address, and the result is passed to C.
character array
   When using mkf2c to call C from Fortran, the address of the Fortran character variable is passed. This character array can be modified by C. It is not guaranteed to be null terminated. The length of the Fortran character variable is treated differently (as discussed in the next section).

pointer
   The value found on the stack is treated as a pointer and is passed without alteration.
• S, C, and D would be passed as values of length 16 bits, 64 bits, and 8 bits, respectively. F would be converted to a 64-bit DOUBLE before being passed, unless the –f option had been specified. If the –f option had been specified, F would be passed as a 32-bit value. Because the type of I is not specified, it would be assumed to be INT and would also be passed as a 32-bit value.
varargs macro va_alist appearing at the end of the parameter name list and its counterpart va_dcl appearing at the end of the parameter type list. In the case above, use of these macros would produce the function header

#include "varargs.
To illustrate the use of extcentry, the C file foo.c is shown below. It contains the function foo, which is to be made Fortran callable.

typedef unsigned short grunt [4];
struct {
    long l,ll;
    char *str;
} bar;
main ()
{
    int kappa =7;
    foo (kappa,bar.
Makefile Considerations

make(1) contains default rules to help automate the control of wrapper generation. The following example of a makefile illustrates the use of these rules. In the example, an executable object file is created from the files main.f (a Fortran main program) and callc.c:

test: main.o callc.o
	f77 -o test main.o callc.o
callc.o: callc.fc
clean:
	rm -f *.o test *.fc

In this program, main calls a C routine in callc.c. The extension .
Fortran/Pascal Interface

This section discusses items you should consider when writing a call between Fortran and Pascal.

Procedure and Function Declarations

This section explains procedure and function declaration considerations.

Names

In calling a Fortran program from Pascal, you must place an underscore (_) as a suffix to routine names and data names.
Invocation

If you have alternate return labels, you can invoke a Fortran subprogram as if it were an integer-valued function whose value specifies which alternate return to use. Alternate return arguments (statement labels) are not passed to the function but cause an indexed branch in the calling subprogram. If the subprogram is not a function and has no entry points with alternate return arguments, the returned value is undefined.
The following Fortran statement

      character*15 function g (…)

is equivalent to the Pascal code

type string = array [1..15] of char;
var length: integer;
    a: array[1..15] of char;
procedure g_(var a:string;length:integer;…); external;

and could be invoked by the Pascal line

g_ (a, 15);

Arguments

The following rules apply to argument specifications in both Fortran and Pascal programs:

• All arguments must be passed by reference.
Table 3-6 (continued)  Equivalent Fortran and Pascal Data Types

Pascal                               Fortran
record r:real; i:real; end;          complex
record r:double; i:double; end;      double complex

• Note that Fortran requires that each INTEGER, LOGICAL, and REAL variable occupy 32 bits of memory.
• Functions of type INTEGER, REAL, or DOUBLE PRECISION are interchangeable between Fortran and Pascal and require no special considerations.
Example

The following example shows how a Pascal routine must specify the length of a character string (which is only implied in a Fortran call).

Fortran call to SAM

C     SAM IS A ROUTINE WRITTEN IN FORTRAN
      EXTERNAL F
      CHARACTER*7 S
      INTEGER B(3)
      …
      CALL SAM (F, B(1), S)      <– Length of S is implicit.

Pascal call to SAM

PROCEDURE F_; EXTERNAL;
S: ARRAY[1..7] OF CHAR;
B: ARRAY[1..3] OF INTEGER;
…
SAM_ (F, B[1], S, 7);      <– Length of S is explicit.
Fortran

      integer t (2,3)

      t(1,1), t(2,1), t(1,2), t(2,2), t(1,3), t(2,3)

Pascal

var t: array[1..2,1..3] of integer;

t[1,1], t[1,2], t[1,3], t[2,1], t[2,2], t[2,3]

When a Pascal routine uses an array passed by a Fortran program, the dimensions of the array and the use of the subscripts must be interchanged. The example below shows the Pascal code that interchanges the subscripts.
Pascal

TYPE STRING = ARRAY[1..10] OF CHAR;
PROCEDURE S_( VAR A: STRING; I: INTEGER); EXTERNAL;
/* Note the underbar */
PROGRAM TEST;
VAR R: STRING;
BEGIN
  R:= "0123456789";
  S_(R,10);
END.

Fortran

      SUBROUTINE S(C)
      CHARACTER*10 C
      WRITE (6,10) C
10    FORMAT (A10)
      RETURN
      END

Accessing Common Blocks of Data

The following rules apply to accessing common blocks of data:

• Fortran common blocks must be declared by common statements; Pascal can use any global variable.
Example

The following examples show Fortran and Pascal routines that access common blocks of data.

Pascal

VAR A_: RECORD
      I : INTEGER;
      R : REAL;
    END;
PROCEDURE SAM_; EXTERNAL;
PROGRAM S;
BEGIN
  A_.I := 4;
  A_.R := 5.3;
  SAM_;
END.

Fortran

      SUBROUTINE SAM()
      COMMON /A/I,R
      WRITE (6,10) i,r
10    FORMAT (1x,I5,F5.2)
      RETURN
      END

The Fortran routine prints out 4 and 5.30.
Chapter 4. System Functions and Subroutines

This chapter describes extensions to Fortran 77 that are related to the IRIX compiler and operating system.

• “Library Functions” summarizes the Fortran run-time library functions.
• “Intrinsic Subroutine Extensions” describes the extensions to the Fortran intrinsic subroutines.
• “Function Extensions” describes the extensions to the Fortran functions.
Table 4-1 summarizes the functions in the Fortran run-time library.
Table 4-1 (continued)  Summary of System Interface Library Routines

Function        Purpose
free_barrier    free barrier
fseek           reposition a file on a logical unit
fstat           get file status
ftell           reposition a file on a logical unit
gerror          get system error messages
getarg          return command line arguments
getc            get a character from a logical unit
getcwd          get pathname of current working directory
getdents        read directory entries
getegid         get effective group ID
gethostid       get unique identifi
Table 4-1 (continued)  Summary of System Interface Library Routines

Function        Purpose
ierrno          get system error messages
ioctl           control device
isatty          determine if unit is associated with tty
itime           return date or time in numerical form
kill            send a signal to a process
link            make a link to an existing file
loc             return the address of an object
lseek           move read/write file pointer
lstat           get file status
ltime           return system time
m_fork          create paralle
Table 4-1 (continued)  Summary of System Interface Library Routines

Function        Purpose
new_barrier     initialize a barrier structure
nice            lower priority of a process
open            open a file
oserror         get/set system error
pause           suspend process until signal
perror          get system error messages
pipe            create an interprocess channel
plock           lock process, text, or data in memory
prctl           control processes
profil          execution-time profile
ptrace          process trace
putc            write a character to a Fortran l
Table 4-1 (continued)  Summary of System Interface Library Routines

Function        Purpose
setoserror      set system error
setpgrp         set process group ID
setsockopt      set options on sockets
setuid          set user ID
sginap          put process to sleep
shmat           attach shared memory
shmdt           detach shared memory
sighold         raise priority and hold signal
sigignore       ignore signal
signal          change the action for a signal
sigpause        suspend until receive signal
sigrelse        release sign
Table 4-1 (continued)  Summary of System Interface Library Routines

Function           Purpose
taskctl            control task
taskdestroy        kill task
tasksetblockcnt    set task semaphore count
taskunblock        unblock task
time [a]           return system time
ttynam             find name of terminal port
uadmin             administrative control
ulimit             get and set user limits
umask              get and set file creation mask
umount             dismount a file system
unblockproc        unblock processes
unlink             remove a directory entry
uscalloc           shared memory
Table 4-1 (continued)  Summary of System Interface Library Routines

Function         Purpose
usfreesema       free a semaphore
usgetinfo        exchange information through an arena
usinit           semaphore and lock initialize routine
usinitlock       initialize a lock
usinitsema       initialize a semaphore
usmalloc         allocate shared memory
usmallopt        control allocation algorithm
usnewlock        allocate and initialize a lock
usnewpollsema    allocate and initialize a pollable semaphore
us
a. The library function time can be invoked only if it is declared in an external statement. Otherwise, it will be misinterpreted as the VMS-compatible intrinsic subroutine time.

You can display information on a function with the man command:

% man function

Intrinsic Subroutine Extensions

This section describes the intrinsic subroutines that are extensions to Fortran 77.
Table 4-2 gives an overview of the system subroutines and their function; they are described in detail in the sections following the table.
IDATE

The IDATE routine returns the current date as three integer values representing the month, day, and year; the format is as follows:

CALL IDATE (m, d, y)

where m, d, and y are either INTEGER*4 or INTEGER*2 values representing the current month, day, and year.
EXIT

The EXIT routine causes normal program termination and optionally returns an exit-status code; the format is as follows:

CALL EXIT (status)

where status is an INTEGER*4 or INTEGER*2 argument containing a status code.

TIME

The TIME routine returns the current time in hours, minutes, and seconds; the format is as follows:

CALL TIME (clock)

where clock is a variable, array, array element, or character substring; it must be eight bytes long.
Table 4-4 defines the arguments. Arguments can be declared as INTEGER*2 or INTEGER*4.

Table 4-4  Arguments to MVBITS

Argument    Type                                 Description
source      Integer variable or array element    Source location of bit field to be transferred
sbit        Integer expression                   First bit position in the field to be transferred from source.
length      Integer expression                   Length of the field to be transferred from source.
SECNDS

SECNDS is an intrinsic routine that returns the number of seconds since midnight, minus the value of the passed arguments; the format is as follows:

s = SECNDS(n)

After execution, s contains the number of seconds past midnight less the value specified by n. Both s and n are single-precision, floating point values.
Chapter 5. Fortran Enhancements for Multiprocessors

This chapter contains these sections:

• “Overview” provides an overview of this chapter.
• “Parallel Loops” discusses the concept of parallel DO loops.
• “Writing Parallel Fortran” explains how to use compiler directives to generate code that can be run in parallel.
• “Analyzing Data Dependencies for Multiprocessing” describes how to analyze DO loops to determine whether they can be parallelized.
Overview

The Silicon Graphics Fortran compiler allows you to apply the capabilities of a Silicon Graphics multiprocessor workstation to the execution of a single job. When you code a few simple directives, the compiler splits the job into concurrently executing pieces, thereby decreasing the run time of the job. This chapter discusses techniques for analyzing your program and converting it to multiprocessing operations.
Writing Parallel Fortran

For multiprocessing to work correctly, the iterations of the loop must not depend on each other; each iteration must stand alone and produce the same answer regardless of whether any other iteration of the loop is executed. Not all DO loops have this property, and loops without it cannot be correctly executed in parallel. However, many of the loops encountered in practice fit this model.
The C$DOACROSS directive has the form

C$DOACROSS [clause [ , clause]… ]

where a clause is one of the following:

SHARE (variable list)
LOCAL (variable list)
LASTLOCAL (variable list)
REDUCTION (scalar variable list)
IF (logical expression)
CHUNK=integer expression
MP_SCHEDTYPE=schedule type

The meaning of each clause is discussed below. All of these clauses are optional.
Writing Parallel Fortran REDUCTION The REDUCTION clause lists those variables involved in a reduction operation. The meaning and use of reductions are discussed in Example 4 of “Breaking Data Dependencies” on page 85. An element of the REDUCTION list must be an individual variable (also called a scalar variable) and may not be an array. However, it may be an individual element of an array. In this case, it would appear in the list with the proper subscripts.
Chapter 5: Fortran Enhancements for Multiprocessors Four methods of scheduling the iterations are supported. A single program may use any or all of them as it finds appropriate. The simple method (MP_SCHEDTYPE=SIMPLE) divides the iterations among the processes by dividing them into contiguous pieces and assigning one piece to each process.
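The contiguous division used by the simple method can be pictured with a small serial program. This is only a sketch: the process count NPROC and the round-up rule for the piece size are assumptions chosen for illustration, and the run-time library may divide the iterations differently.

```fortran
      PROGRAM PIECES
C     Sketch of SIMPLE-style scheduling: split iterations 1..N
C     into contiguous pieces, one per process. NPROC and the
C     rounding rule are assumptions for illustration only.
      INTEGER N, NPROC, IPIECE, K, ILO, IHI
      PARAMETER (N = 10, NPROC = 4)
C     Ceiling of N/NPROC, so every iteration is covered.
      IPIECE = (N + NPROC - 1) / NPROC
      DO 10 K = 1, NPROC
         ILO = (K - 1)*IPIECE + 1
         IHI = MIN(N, K*IPIECE)
         IF (ILO .LE. IHI) THEN
            PRINT *, 'PROCESS', K, ' RUNS', ILO, ' TO', IHI
         ELSE
            PRINT *, 'PROCESS', K, ' HAS NO ITERATIONS'
         END IF
   10 CONTINUE
      END
```

With N = 10 and NPROC = 4 the piece size is 3, so the four pieces cover iterations 1 to 3, 4 to 6, 7 to 9, and 10.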
Example 1

The code fragment

      DO 10 I = 1, 100
         A(I) = B(I)
   10 CONTINUE

could be multiprocessed with the directive

C$DOACROSS LOCAL(I), SHARE(A, B)
      DO 10 I = 1, 100
         A(I) = B(I)
   10 CONTINUE

Here, the defaults are sufficient, provided A and B are mentioned in a nonparallel region or in another SHARE list.
See Example 5 in “Analyzing Data Dependencies for Multiprocessing” on page 79 for more information on this example.

Example 3

      DO 10 I = M, K, N
         X = D(I)**2
         Y = X + X
         DO 20 J = I, MAX
            A(I,J) = A(I,J) + B(I,J) * C(I,J) * X + Y
   20    CONTINUE
   10 CONTINUE
      PRINT*, I, X

Here, the final values of I and X are needed after the loop completes.
Writing Parallel Fortran I is a loop index variable for the C$DOACROSS loop, so it is LASTLOCAL by default. However, even though J is a loop index variable, it is not the loop index of the loop being multiprocessed and has no special status. If it is not declared, it is given the normal default of SHARE, which would be wrong. C$& Occasionally, the clauses in the C$DOACROSS directive are longer than one line. The C$& directive is used to continue the directive onto multiple lines.
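A continued directive might look like the following sketch. The arrays, sizes, and clause lists are illustrative assumptions. A compiler without multiprocessing support treats both directive lines as ordinary comments and runs the loop serially, so the program gives the same result either way.

```fortran
      PROGRAM CONTIN
C     Sketch: a C$DOACROSS directive continued with C$&.
C     Array names and sizes are illustrative.
      INTEGER N
      PARAMETER (N = 100)
      REAL A(N), B(N), SCALE, X
      INTEGER I
      SCALE = 2.0
      DO 5 I = 1, N
         B(I) = REAL(I)
    5 CONTINUE
C     The clause list is split across two directive lines.
C$DOACROSS SHARE(A, B, SCALE),
C$&   LOCAL(I, X)
      DO 10 I = 1, N
         X = B(I) * SCALE
         A(I) = X + 1.0
   10 CONTINUE
      PRINT *, A(1), A(N)
      END
```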
Chapter 5: Fortran Enhancements for Multiprocessors C$MP_SCHEDTYPE, C$CHUNK The C$MP_SCHEDTYPE=schedule_type directive acts as an implicit MP_SCHEDTYPE clause. A DOACROSS directive that does not have an explicit MP_SCHEDTYPE clause is given the value specified in the directive, rather than the normal default. If the DOACROSS does have an explicit clause, then the explicit value is used.
Analyzing Data Dependencies for Multiprocessing However, to simplify separate compilation, a different form of nesting is allowed. A routine that uses C$DOACROSS can be called from within a multiprocessed region. This can be useful if a single routine is called from several different places: sometimes from within a multiprocessed region, sometimes not. Nesting does not increase the parallelism. When the first C$DOACROSS loop is encountered, that loop is run in parallel.
Chapter 5: Fortran Enhancements for Multiprocessors loop can write a value into a memory location that is read or written by any other iteration of that loop. It is also all right if the same iteration reads and/or writes a memory location repeatedly as long as no others do; it is all right if many iterations read the same location, as long as none of them write to it. In a Fortran program, memory locations are represented by variable names.
The rest of this section is devoted to analyzing sample loops, some parallel and some not parallel.

Example 1: Simple Independence

      DO 10 I = 1,N
   10 A(I) = X + B(I)*C(I)

In this example, each iteration writes to a different location in A, and none of the variables appearing on the right-hand side is ever written to, only read from. This loop can be correctly run in parallel.
Example 4: Local Variable

      DO I = 1, N
         X = A(I)*A(I) + B(I)
         B(I) = X + B(I)*X
      END DO

In this loop, each iteration of the loop reads and writes the variable X. However, no loop iteration ever needs the value of X from any other iteration. X is used as a temporary variable; its value does not survive from one iteration to the next. This loop can be parallelized by declaring X to be a LOCAL variable within the loop.
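As a sketch of what that declaration could look like, the loop is shown below inside a small test program with X listed in a LOCAL clause. The array size, the initial values, and the explicit SHARE list are assumptions added so that the fragment is complete.

```fortran
      PROGRAM LOCALX
C     Sketch: the loop from this example with the temporary X
C     declared LOCAL. Sizes and initial values are assumed.
      INTEGER N
      PARAMETER (N = 8)
      REAL A(N), B(N), X
      INTEGER I
      DO 5 I = 1, N
         A(I) = REAL(I)
         B(I) = 1.0
    5 CONTINUE
C     Each process gets its own copy of I and X.
C$DOACROSS LOCAL(I, X), SHARE(A, B)
      DO 10 I = 1, N
         X = A(I)*A(I) + B(I)
         B(I) = X + B(I)*X
   10 CONTINUE
      PRINT *, (B(I), I = 1, N)
      END
```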
Analyzing Data Dependencies for Multiprocessing Subroutines,”) cannot safely be included in a parallel loop. In particular, rand is not safe for multiprocessing. For user-written routines, it is the responsibility of the user to ensure that the routines can be correctly multiprocessed. Caution: Routines called within a parallel loop cannot be compiled with the –static flag.
At first glance, this loop looks like it cannot be run in parallel because it uses both W(I) and W(I-K). Closer inspection reveals that because the value of I varies between K+1 and 2*K, I-K goes from 1 to K. This means that the W(I-K) term varies from W(1) up to W(K), while the W(I) term varies from W(K+1) up to W(2*K). So W(I-K) in any iteration of the loop is never the same memory location as W(I) in any other iteration.
Breaking Data Dependencies In this fragment, each iteration of the loop uses the same locations in the D array. However, closer inspection reveals that the entire D array is being used as a temporary. This can be multiprocessed by declaring D to be LOCAL. The Fortran compiler allows arrays (even multidimensional arrays) to be LOCAL variables with one restriction: the size of the array must be known at compile time.
This is the same as Example 6 in “Writing Parallel Fortran” on page 71. Here, INDX has its value carried from iteration to iteration. However, it is possible to compute the appropriate value for INDX without making reference to any previous value:

C$DOACROSS LOCAL (I, INDX)
      DO I = 1, N
         INDX = (I*(I+1))/2
         A(I) = B(I) + C(INDX)
      END DO

In this loop, the value of INDX is computed without using any values computed on any other iteration.
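The closed form can be checked against the carried value with a short serial test. The sketch below assumes that the serial version accumulated INDX = INDX + I starting from zero, which is what the formula I*(I+1)/2 implies; the program and the variable RUNSUM are illustrative.

```fortran
      PROGRAM TRIIDX
C     Check that the closed form I*(I+1)/2 matches the running
C     sum it replaces (assumed to start from zero).
      INTEGER N, I, INDX, RUNSUM
      PARAMETER (N = 20)
      RUNSUM = 0
      DO 10 I = 1, N
C        Value carried from iteration to iteration.
         RUNSUM = RUNSUM + I
C        Value computed from I alone.
         INDX = (I*(I+1))/2
         IF (INDX .NE. RUNSUM) THEN
            PRINT *, 'MISMATCH AT I =', I
            STOP
         END IF
   10 CONTINUE
      PRINT *, 'CLOSED FORM MATCHES FOR I = 1 TO', N
      END
```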
C$DOACROSS LOCAL(IX, IY, I)
      DO I = 1, N
         IX = INDEXX(I)
         IY = INDEXY(I)
         XFORCE(I) = XFORCE(I) + NEWXFORCE(IX)
         YFORCE(I) = YFORCE(I) + NEWYFORCE(IY)
         IXX(I) = IXOFFSET(IX)
         IYY(I) = IYOFFSET(IY)
      END DO
      DO 100 I = 1, N
         TOTAL(IXX(I),IYY(I)) = TOTAL(IXX(I), IYY(I)) + EPSILON
  100 CONTINUE

Here, IXX and IYY have been turned into arrays to hold all the values computed by the first loop. The first loop (containing most of the work) can now be run in parallel.
Example 4: Sum Reduction

      asum = 0.0
      amax = a(1)
      amin = a(1)
c$doacross local(i), REDUCTION(asum, AMAX, AMIN)
      do i = 1,N
         asum = asum + a(i)
         if (a(i) .gt. amax) then
            amax = a(i)
         else if (a(i) .lt. amin) then
            amin = a(i)
         end if
      end do

This operation is known as a reduction. Reductions occur when an array of values is combined and reduced into a single value. This example is a sum reduction because the combining operation is addition.
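One way to parallelize a sum by hand is to give each thread its own partial sum over a contiguous piece of the array and then combine the partial sums serially. The sketch below assumes a fixed NUM_THREADS of 4 and a round-up rule for the piece size; a real program would obtain the thread count from the run-time system instead.

```fortran
      PROGRAM PSUM
C     Sketch of a sum reduction using per-thread partial sums.
C     NUM_THREADS is fixed here for illustration only.
      INTEGER N, NUM_THREADS
      PARAMETER (N = 1000, NUM_THREADS = 4)
      REAL A(N), PARTIAL_SUM(NUM_THREADS), SUM
      INTEGER I, K, IPIECE
      DO 5 I = 1, N
         A(I) = 1.0
    5 CONTINUE
      IPIECE = (N + NUM_THREADS - 1) / NUM_THREADS
C     Each K iteration sums one contiguous piece of A.
C$DOACROSS LOCAL(K, I), SHARE(A, PARTIAL_SUM, IPIECE)
      DO 20 K = 1, NUM_THREADS
         PARTIAL_SUM(K) = 0.0
         DO 10 I = (K-1)*IPIECE + 1, MIN(N, K*IPIECE)
            PARTIAL_SUM(K) = PARTIAL_SUM(K) + A(I)
   10    CONTINUE
   20 CONTINUE
C     Combine the partial sums serially.
      SUM = 0.0
      DO 30 I = 1, NUM_THREADS
         SUM = SUM + PARTIAL_SUM(I)
   30 CONTINUE
      PRINT *, 'SUM = ', SUM
      END
```

Because the additions are grouped differently than in the serial loop, a floating-point result computed this way can differ from the serial result by round-off.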
Breaking Data Dependencies DO I = 1, NUM_THREADS SUM = SUM + PARTIAL_SUM(I) END DO The outer K loop can be run in parallel. In this method, the array pieces for the partial sums are contiguous, resulting in good cache utilization and performance. This is an important and common transformation, and so automatic support is provided by the REDUCTION clause: SUM = 0.
For example,

c$doacross local(i), REDUCTION(big_sum, big_prod, big_min, big_max)
      do i = 1,N
         big_sum = big_sum + a(i)
         big_prod = big_prod * a(i)
         big_min = min(big_min, a(i))
         big_max = max(big_max, a(i))
      end do

One further reduction is noteworthy.

      DO I = 1, N
         TOTAL = 0.0
         DO J = 1, M
            TOTAL = TOTAL + A(J)
         END DO
         B(I) = C(I) * TOTAL
      END DO

Initially, it may look as if the reduction in the inner loop needs to be rewritten in a parallel form. However, look at the outer I loop.
Work Quantum

Example 1: Loop Interchange

      DO K = 1, N
         DO I = 1, N
            DO J = 1, N
               A(I,J) = A(I,J) + B(I,K) * C(K,J)
            END DO
         END DO
      END DO

Here you have two choices: parallelize the J loop or the I loop. You cannot parallelize the K loop because different iterations of the K loop will all try to read and write the same values of A(I,J). Try to parallelize the outermost DO loop possible, because it encloses the most work. In this example, that is the I loop.
Example 2: Conditional Parallelism

      J = (N/4) * 4
      DO I = J+1, N
         A(I) = A(I) + X*B(I)
      END DO
      DO I = 1, J, 4
         A(I) = A(I) + X*B(I)
         A(I+1) = A(I+1) + X*B(I+1)
         A(I+2) = A(I+2) + X*B(I+2)
         A(I+3) = A(I+3) + X*B(I+3)
      END DO

Here you are using loop unrolling of order four to improve speed. For the first loop, the number of iterations is always fewer than four, so this loop does not do enough work to justify running it in parallel.
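The IF clause of C$DOACROSS lets the second, unrolled loop run in parallel only when it has enough iterations to be worthwhile. In the sketch below, the threshold of 1000, the array size, and the initial values are assumptions for illustration; a suitable threshold has to be found by measurement.

```fortran
      PROGRAM CONDPAR
C     Sketch: parallelize the unrolled loop only when the trip
C     count is large. The threshold of 1000 is an assumption.
      INTEGER N
      PARAMETER (N = 4003)
      REAL A(N), B(N), X
      INTEGER I, J
      X = 2.0
      DO 5 I = 1, N
         A(I) = 0.0
         B(I) = 1.0
    5 CONTINUE
      J = (N/4) * 4
C     Cleanup loop: at most three iterations, run serially.
      DO 10 I = J+1, N
         A(I) = A(I) + X*B(I)
   10 CONTINUE
C     Main loop: run in parallel only if J is large enough.
C$DOACROSS IF (J .GE. 1000), LOCAL(I), SHARE(A, B, X)
      DO 20 I = 1, J, 4
         A(I)   = A(I)   + X*B(I)
         A(I+1) = A(I+1) + X*B(I+1)
         A(I+2) = A(I+2) + X*B(I+2)
         A(I+3) = A(I+3) + X*B(I+3)
   20 CONTINUE
      PRINT *, A(1), A(N)
      END
```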
Cache Effects

It is good policy to write loops that take the effect of the cache into account, with or without parallelism. The technique for the best cache performance is also quite simple: make the loop step through the array in the same way that the array is laid out in memory. For Fortran, this means stepping through the array without any gaps and with the leftmost subscript varying the fastest.
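For a two-dimensional array, this rule puts the loop over the leftmost subscript innermost. The following sketch shows the loop order; the array name, its dimensions, and the loop body are illustrative.

```fortran
      PROGRAM COLMAJ
C     Sketch: traverse a 2-D array in memory order. Fortran
C     stores arrays column by column, so the leftmost
C     subscript (I) belongs on the innermost loop.
      INTEGER N, M
      PARAMETER (N = 200, M = 100)
      REAL A(N,M), TOTAL
      INTEGER I, J
      TOTAL = 0.0
      DO 20 J = 1, M
         DO 10 I = 1, N
C           Consecutive I values touch adjacent memory.
            A(I,J) = 1.0
            TOTAL = TOTAL + A(I,J)
   10    CONTINUE
   20 CONTINUE
      PRINT *, 'TOTAL = ', TOTAL
      END
```

Swapping the two DO statements would visit the same elements but with a stride of N elements between consecutive accesses.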
Example 2: Trade-Offs

Sometimes you must choose between the possible optimizations and their costs. Look at the following code segment:

      DO J = 1, N
         DO I = 1, M
            A(I) = A(I) + B(J)*C(I,J)
         END DO
      END DO

This loop can be parallelized on I but not on J. You could interchange the loops to put I on the outside, thus getting a bigger work quantum.
If A is large, however, that may take more memory than you can spare.

      NUM = MP_NUMTHREADS()
      IPIECE = (N + (NUM-1)) / NUM
C$DOACROSS LOCAL(K,J,I)
      DO K = 1, NUM
         DO J = K*IPIECE - IPIECE + 1, MIN(N, K*IPIECE)
            DO I = 1, M
               PARTIAL_A(I,K) = PARTIAL_A(I,K) + B(J)*C(I,J)
            END DO
         END DO
      END DO
C$DOACROSS LOCAL (I,K)
      DO I = 1, M
         DO K = 1, NUM
            A(I) = A(I) + PARTIAL_A(I,K)
         END DO
      END DO

You must trade off the various possible optimizations to find the combination that is right for the particular job.
Chapter 5: Fortran Enhancements for Multiprocessors This can be parallelized on the I loop. Because the inner loop goes from 1 to I, the first block of iterations of the outer loop will end long before the last block of iterations of the outer loop.
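One way to even out such a triangular workload is to interleave the outer iterations among the processes, so that each process receives a mix of short and long iterations. The sketch below uses the MP_SCHEDTYPE clause with the INTERLEAVE schedule type for this; the arrays, their size, and the loop body are illustrative assumptions.

```fortran
      PROGRAM TRILOOP
C     Sketch: a triangular loop nest with interleaved
C     scheduling. The arrays and loop body are illustrative.
      INTEGER N
      PARAMETER (N = 100)
      REAL A(N,N), B(N,N)
      INTEGER I, J
      DO 6 J = 1, N
         DO 5 I = 1, N
            A(I,J) = 0.0
            B(I,J) = 1.0
    5    CONTINUE
    6 CONTINUE
C     Outer iteration I does I units of inner-loop work.
C$DOACROSS LOCAL(I, J), SHARE(A, B),
C$&   MP_SCHEDTYPE=INTERLEAVE
      DO 20 I = 1, N
         DO 10 J = 1, I
            A(J,I) = A(J,I) + B(J,I)
   10    CONTINUE
   20 CONTINUE
      PRINT *, A(1,1), A(N,N), A(N,1)
      END
```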
Advanced Features The way that iterations are assigned to processes is known as scheduling. Interleaving is one possible schedule. Both interleaving and the “simple” scheduling methods are examples of fixed schedules; the iterations are assigned to processes by a single decision made when the loop is entered. For more complex loops, it may be desirable to use DYNAMIC or GSS schedules.
Chapter 5: Fortran Enhancements for Multiprocessors mp_setup, mp_create, and mp_destroy The mp_setup(3f), mp_create(3f), and mp_destroy(3f) subroutine calls create and destroy threads of execution. This can be useful if the job has only one parallel portion or if the parallel parts are widely scattered. When you destroy the extra execution threads, they cannot consume system resources; they must be re-created when needed.
Advanced Features This trade-off between response time and CPU usage can be adjusted with the mp_blocktime(3f) call. mp_blocktime takes a single integer argument that specifies the number of times to spin before blocking. By default, it is set to 10,000,000; this takes roughly 3 seconds. If called with an argument of 0, the slave threads will not block themselves no matter how much time has passed. Explicit calls to mp_block, however, will still block the threads.
Chapter 5: Fortran Enhancements for Multiprocessors Environment Variables: MP_SET_NUMTHREADS, MP_BLOCKTIME, MP_SETUP These environment variables act as an implicit call to the corresponding routine(s) of the same name at program start-up time.
Environment Variables: MP_SCHEDTYPE, CHUNK

These environment variables specify the type of scheduling to use on DOACROSS loops that have their scheduling type set to RUNTIME. For example, the following csh commands cause loops with the RUNTIME scheduling type to be executed as interleaved loops with a chunk size of 4:

% setenv MP_SCHEDTYPE INTERLEAVE
% setenv CHUNK 4

The defaults are the same as on the DOACROSS directive; if neither variable is set, SIMPLE scheduling is assumed.
Chapter 5: Fortran Enhancements for Multiprocessors mp_setlock, mp_unsetlock, mp_barrier These zero-argument functions provide convenient (although limited) access to the locking and barrier functions provided by ussetlock(3p), usunsetlock(3p), and barrier(3p). The convenience is that no user initialization need be done because calls such as usconfig(3p) and usinit(3p) are done automatically. The limitation is that there is only one lock and one barrier. For a great many programs, this is sufficient.
Advanced Features Each item must be a member of a local COMMON block. It can be a variable, an array, an individual element of an array, or the entire COMMON block. For example, C$COPYIN x,y, /foo/, a(i) will propagate the values for x and y, all the values in the COMMON block foo, and the ith element of array a. All these items must be members of local COMMON blocks. Note that this directive is translated into executable code, so in this example i is evaluated at the time this statement is executed.
Chapter 5: Fortran Enhancements for Multiprocessors DOACROSS Implementation This section discusses how multiprocessing is implemented in a DOACROSS routine. This information is useful when you use the debugger and interpret the results of an execution profile. Loop Transformation When the Fortran compiler encounters a C$DOACROSS statement, it spools the corresponding DO loop into a separate subroutine and replaces the loop statement with a call to a special library routine.
As an example, the following routine that appears on line 1000

      SUBROUTINE EXAMPLE(A, B, C, N)
      REAL A(*), B(*), C(*)
C$DOACROSS LOCAL(I,X)
      DO I = 1, N
         X = A(I)*B(I)
         C(I) = X + X**2
      END DO
      C(N) = A(1) + B(2)
      RETURN
      END

produces this spooled routine to represent the loop:

      SUBROUTINE _EXAMPLE_1000_aaaa
     X ( _LOCAL_START, _LOCAL_NTRIP, _INCR, _THREADINFO)
      INTEGER*4 _LOCAL_START
      INTEGER*4 _LOCAL_NTRIP
      INTEGER*4 _INCR
      INTEGER*4 _THREADINFO
      INTEGER*4 I
      REAL X
      INTEGER*4 _DUMMY
      I = _LOCAL_STA
Chapter 5: Fortran Enhancements for Multiprocessors Executing Spooled Routines The set of processes that cooperate to execute the parallel Fortran job are members of a process share group created by the system call sproc. The process share group is created by special Fortran start-up routines that are used only when the executable is linked with the –mp option, which enables multiprocessing. The first process is the master process. It executes all the nonparallel portions of the code.
Chapter 6. Compiling and Debugging Parallel Fortran

This chapter gives instructions on how to compile and debug a parallel Fortran program and contains the following sections:

• “Compiling and Running” explains how to compile and run a parallel Fortran program.
• “Profiling a Parallel Fortran Program” describes how to use the system profiler, prof, to examine execution profiles.
• “Debugging Parallel Fortran” presents some standard techniques for debugging a parallel Fortran program.
Chapter 6: Compiling and Debugging Parallel Fortran Using the –static Flag A few words of caution about the –static flag: The multiprocessing implementation demands some use of the stack to allow multiple threads of execution to execute the same code simultaneously. Therefore, the parallel DO loops themselves are compiled with the –automatic flag, even if the routine enclosing them is compiled with –static.
Profiling a Parallel Fortran Program

After linking, the resulting executable can be run like any standard executable. Creating multiple execution threads, running and synchronizing them, and terminating the task are all handled automatically. When an executable has been linked with -mp, the Fortran initialization routines determine how many parallel threads of execution to create. This determination occurs each time the task starts; the number of threads is not compiled into the code.
Chapter 6: Compiling and Debugging Parallel Fortran In addition to the loops, the profile shows the special routines that actually do the multiprocessing. The mp_simple_sched routine is the synchronizer and controller. Slave threads wait for work in the routine mp_slave_wait_for_work. The less time they wait, the more time they work. This gives a rough estimate of how parallel the program is.
Debugging Parallel Fortran Example: Erroneous C$DOACROSS In this example, the bug is that the two references to a have the indexes in reverse order. If the indexes were in the same order (if both were a(i,j) or both were a(j,i)), the loop could be multiprocessed. As written, there is a data dependency, so the C$DOACROSS is a mistake.
• Check for EQUIVALENCE problems. Two variables of different names may in fact refer to the same storage location if they are associated through an EQUIVALENCE.
• Check for the use of uninitialized variables. Some programs assume uninitialized variables have the value 0. This works with the -static flag, but without it, uninitialized values assume the value left on the stack.
Multiprocess Debugging Session

This section takes you through the process of debugging the following incorrectly multiprocessed code.

      SUBROUTINE TOTAL(N, M, IOLD, INEW)
      IMPLICIT NONE
      INTEGER N, M
      INTEGER IOLD(N,M), INEW(N,M)
      DOUBLE PRECISION AGGREGATE(100, 100)
      COMMON /WORK/ AGGREGATE
      INTEGER I, J, NUM, II, JJ
      DOUBLE PRECISION TMP
C$DOACROSS LOCAL(I,II,J,JJ,NUM)
      DO J = 2, M-1
         DO I = 2, N-1
            NUM = 1
            IF (IOLD(I,J) .EQ.
this code reasoned that because J is different in each iteration, J/10 will also be different. Unfortunately, because J/10 uses integer division, it often gives the same results for different values of J. Although this is a fairly simple error, it is not easy to see. When run on a single processor, the program always gets the right answer. Some of the time it gets the right answer when multiprocessing.
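The effect of integer division is easy to see by printing J/10 for a range of J values. This small program is an illustration only and is not part of the routine being debugged; the range 2 to 12 is an arbitrary choice.

```fortran
      PROGRAM INTDIV
C     Show that J/10 with integer division maps many values
C     of J to the same result.
      INTEGER J
      DO 10 J = 2, 12
C        Integer division truncates toward zero.
         PRINT *, 'J =', J, '  J/10 =', J/10
   10 CONTINUE
      END
```

For J from 2 through 9 the quotient is 0, and for J from 10 through 12 it is 1, so iterations with different J can address the same element.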
dbx version 1.31
Copyright 1987 Silicon Graphics Inc.
Copyright 1987 MIPS Computer Systems Inc.
Type 'help' for help.
Reading symbolic information of `total.ex' . . .
MAIN:14 14 do i = 1, isize

Tell dbx to pause when sproc is called.

(dbx) set $promptonfork=1

Start the job:

(dbx) run
Warning: MP_SET_NUMTHREADS greater than available cpus (MP_SET_NUMTHREADS = 2; cpus = 1)
Process 19324(total.ex) started
Process 19324(total.
Chapter 6: Compiling and Debugging Parallel Fortran Process 19325(total.ex) breakpoint/trace trap[_total_99_aaaa:16,0x4006d0] Look at the complete listing of the multiprocessed loop routine.
Debugging Parallel Fortran 38 39 40 41 42 end do j=j+1 end do end To look at AGGREGATE, stop at that line with (dbx) stop at 36 pgrp [4] stop at "/tmp/Ptotalkea_11561_":36 [5] stop at "/tmp/Ptotalkea_11561":36 Continue the current process (the master process). Note that cont continues only the current process; other members of the process group (pgrp) are unaffected. (dbx) cont [4] Process 19324(total.
Chapter 6: Compiling and Debugging Parallel Fortran (dbx) where > 0 _total_99_aaaa(_local_start = 6, _local_ntrip = 4, _incr = 1, my_threadno = 1) ["/tmp/Ptotalkea_11561":36, 0x400974] 1 mp_slave_sync(0x0,0x0,0x1,0x1,0x0,0x0)["mp_slave.s":119, 0x402964] The slave process has entered the multiprocessed routine from the slave synchronization routine mp_slave_sync. Both processes are now at the AGGREGATE assignment statement. Look at the values of the indexes in both processes.
Parallel Programming Exercise (dbx) where > 0 _total_99_aaaa(_local_start = 2, _local_ntrip = 4, _incr = 1, _my_threadno = 0) ["/tmp/Ptotalkea_11561":36, 0x400974] 1 mp_simple_sched_(0x0, 0x0, 0x0, 0x0, 0x0, 0x40034c) [0x400e38] 2 total.total(n = 100, m = 10, iold = (...), inew = (...)) ["total.f":15, 0x4005f4] 3 MAIN() ["driver.f":25, 0x400348] 4 main.main(0x0, 0x7fffc7a4, 0x7fffc7ac, 0x0, 0x0, 0x0) ["main.c":35, 0x400afc] (dbx) func total [using total.
4. If necessary, rewrite the code to make it parallelizable. Add C$DOACROSS statements as appropriate.
5. Debug the rewritten code on a single processor.
6. Run the parallel version on a multiprocessor. Verify that the answers are correct.
7. If the answers are wrong, debug the parallel code. Always return to step 5 (single-process debugging) whenever any change is made to the code.
8.
Parallel Programming Exercise prof –pixie –quit 1% orig orig.Addrs orig.Counts ------------------------------------------------------* -p[rocedures] using basic-block counts; sorted in * * descending order by the number of cycles executed in* * each procedure; unexecuted procedures are excluded * ------------------------------------------------------10864760 cycles cycles %cycles cum % cycles /call bytes procedure (file) /line 10176621 93.67 (/tmp/ctmpa00845) 282980 2.60 (/tmp/ctmpa00837) 115743 1.
               FORCE(I,1) = FORCE(I,1) + WEIGHT(I)
               FORCE(I,2) = FORCE(I,2) + WEIGHT(I)
               FORCE(I,3) = FORCE(I,3) + WEIGHT(I)
C
C ... AND THE FORCE OF THIS ATOM ACTING ON THE
C NEARBY ATOM
C
               FORCE(J,1) = FORCE(J,1) + WEIGHT(J)
               FORCE(J,2) = FORCE(J,2) + WEIGHT(J)
               FORCE(J,3) = FORCE(J,3) + WEIGHT(J)
            END IF
         END DO
      END DO
      RETURN
      END

Step 3: Analyze

It is better to parallelize the outer loop, if possible, to enclose the most work. To do this, analyze the variable usage.
both FORCE(I,1) and FORCE(J,1). There is no certainty that I and J will never be the same, so you cannot directly parallelize the outer loop. The uses of FORCE look similar to sum reductions but are not quite the same. A likely fix is to use a technique similar to sum reduction. In analyzing this, notice that the inner loop runs from 1 up to I-1. Therefore, J is always less than I, and so the various references to FORCE do not overlap within iterations of the inner loop.
Chapter 6: Compiling and Debugging Parallel Fortran DO I = 1, NUM_ATOMS DO J = 1, I-1 DIST_SQ(1) = (ATOMS(I,1) - ATOMS(J,1)) ** 2 DIST_SQ(2) = (ATOMS(I,2) - ATOMS(J,2)) ** 2 DIST_SQ(3) = (ATOMS(I,3) - ATOMS(J,3)) ** 2 TOTAL_DIST_SQ=DIST_SQ(1)+DIST_SQ(2)+ DIST_SQ(3) C C C C SET A FLAG IF THE DISTANCE IS WITHIN THE THRESHOLD IF (TOTAL_DIST_SQ .LE. THRESHOLD_SQ) THEN FLAGS(I,J) = .TRUE. ELSE FLAGS(I,J) = .FALSE.
Parallel Programming Exercise You have parallelized the distance calculations, leaving the summations to be done serially. Because you did not alter the order of the summations, this should produce exactly the same answer as the original version. Step 5: Debug on a Single Processor The temptation might be strong to rush the rewritten code directly to the multiprocessor at this point. Remember, single-process debugging is easier than multiprocess debugging.
Chapter 6: Compiling and Debugging Parallel Fortran ---------------------------------------------------------* -p[rocedures] using basic-block counts; sorted in * * descending order by the number of cycles executed in * * each procedure; unexecuted procedures are excluded * ---------------------------------------------------------13302554 cycles cycles %cycles cum % cycles /call bytes procedure (file) /line 12479754 93.81 (/tmp/ctmpa00857) 282980 2.13 (/tmp/ctmpa00837) 155721 1.17 93.
Parallel Programming Exercise Multiprocessing has helped very little compared with the single-process run of the modified code: the program is running slower than the original. What happened? The cycle counts tell the story. The routine calc_ is what remains of the original routine after the C$DOACROSS loop _calc_88_aaaa is extracted (refer to “Loop Transformation” on page 104 for details about loop naming conventions). calc_ still takes nearly 70 percent of the time of the original.
Chapter 6: Compiling and Debugging Parallel Fortran Repeat Step 4: Rewrite As before, changes are noted in bold.
      DIST_SQ3 = (ATOMS(I,3) - ATOMS(J,3)) ** 2
      TOTAL_DIST_SQ = DIST_SQ1 + DIST_SQ2 + DIST_SQ3
      IF (TOTAL_DIST_SQ .LE. THRESHOLD_SQ) THEN
C
C ADD THE FORCE OF THE NEARBY ATOM ACTING ON THIS
C ATOM ...
C
      PARTIAL(I,1,THREAD_INDEX) = PARTIAL(I,1,
     +    THREAD_INDEX) + WEIGHT(I)
      PARTIAL(I,2,THREAD_INDEX) = PARTIAL(I,2,
     +    THREAD_INDEX) + WEIGHT(I)
      PARTIAL(I,3,THREAD_INDEX) = PARTIAL(I,3,
     +    THREAD_INDEX) + WEIGHT(I)
C
C ...
Chapter 6: Compiling and Debugging Parallel Fortran Repeat Step 5: Debug on a Single Processor Because you are doing sum reductions in parallel, the answers may not exactly match the original. Be careful to distinguish between real errors and variations introduced by round-off. In this example, the answers agreed with the original for 10 digits.
Parallel Programming Exercise now named _calc_88_aaaa and the main loop is now _calc_88_aaab. The initialization took less than 1 percent of the total time and so does not even appear on the listing. The large number for the routine mp_waitmaster indicates a problem. Look at the pixie run for the slave process % prof -pixie -quit 1% try2.mp try2.mp.Addrs try2.mp.
Chapter 6: Compiling and Debugging Parallel Fortran Repeat Step 4 Again: Rewrite The new version looks like this, with changes in bold: SUBROUTINE CALC(NUM_ATOMS,ATOMS,FORCE,THRESHOLD,WEIGHT) IMPLICIT NONE INTEGER MAX_ATOMS PARAMETER(MAX_ATOMS = 1000) INTEGER NUM_ATOMS DOUBLE PRECISION ATOMS(MAX_ATOMS,3), FORCE(MAX_ATOMS,3) DOUBLE PRECISION THRESHOLD DOUBLE PRECISION WEIGHT(MAX_ATOMS) DOUBLE PRECISION DOUBLE PRECISION DIST_SQ(3), TOTAL_DIST_SQ THRESHOLD_SQ INTEGER I, J INTEGER MP_SET_NUMTHREADS, MP_NUMT
Parallel Programming Exercise TOTAL_DIST_SQ = DIST_SQ1 + DIST_SQ2 + DIST_SQ3 IF (TOTAL_DIST_SQ .LE. THRESHOLD_SQ) THEN C C C C ADD THE FORCE OF THE NEARBY ATOM ACTING ON THIS ATOM ...
Chapter 6: Compiling and Debugging Parallel Fortran With these final fixes in place, repeat the same steps to verify the changes: 1. Debug on a single processor. 2. Run the parallel version. 3. Debug the parallel version. 4. Profile the parallel version. Repeat Step 7 Again: Profile The pixie output for the latest version of the code looks like this: % prof -pixie -quit 1% try3.mp try3.mp.Addrs try3.mp.
Parallel Programming Exercise Epilogue After considerable effort, you reduced execution time by about 30 percent by using two processors. Because the routine you multiprocessed still accounts for the majority of work, even with two processors, you would expect considerable improvement by moving this code to a four-processor machine. Because the code is parallelized, no further conversion is needed for the more powerful machine; you can just transport the executable image and run it.
Appendix A. Run-Time Error Messages

Table A-1 lists possible Fortran run-time I/O errors. Other errors given by the operating system may also occur.
When a run-time error occurs, the program terminates with one of the error messages shown in Table A-1. All of the errors in the table are output in the format user filename : message.

Table A-1 Run-Time Error Messages

Number  Message/Cause
100     error in format
        Illegal characters are encountered in FORMAT statement.
101     out of space for I/O unit table
        Out of virtual space that can be allocated for the I/O unit table.
114     unit not connected
        Attempt to do I/O on unit that has not been opened and cannot be opened.
115     read unexpected character
        Unexpected character encountered in formatted or directed read.
116     blank logical input field
        Invalid character encountered for logical value.
117     bad variable type
        Specified type for the namelist is invalid.
126     'new' file exists
        The file is opened as new but already exists.
127     can’t find 'old' file
        The file is opened as old but does not exist.
130     illegal argument
        Invalid value in the I/O control list.
131     duplicate key value on write
        Cannot write a key that already exists.
132     indexed file not open
        Cannot perform indexed I/O on an unopened file.
141     beginning or end of file reached
        The index for the specified key points beyond the length of the indexed data file. This error is probably because of corrupted ISAM files or a bad indexed I/O run-time library.
142     cannot find request record
        The requested key for indexed READ does not exist.
143     current record not defined
        Cannot execute REWRITE, UNLOCK, or DELETE before doing a READ to define the current record.
155     character key field value length too long
        The length of the character key value exceeds the length specification for that key.
156     fixed record on sequential file not allowed
        RECORDTYPE='fixed' cannot be used with a sequential file.
157     variable records allowed only on unformatted sequential file
        RECORDTYPE='variable' can only be used with an unformatted sequential file.
167     invalid code in format specification
        Unknown code is encountered in format specification.
168     invalid record number in direct access file
        The specified record number is less than 1.
169     cannot have endfile record on non-sequential file
        Cannot have an endfile on a direct- or keyed-access file.
170     cannot position within current file
        Cannot perform fseek() on a file opened for sequential unformatted I/O.
184     function not declared as varargs
        Variable argument routines called in subroutines that have not been declared in a $VARARGS directive.
185     internal error
        Internal run-time library error.
Index A C –align16 compiler option, 8, 28 –align32 compiler option, 8 –align64 compiler option, 8 –align8 compiler option, 8, 28 alignment, 25, 27 archiver, ar, 19 arguments order, 33 passing between C and Fortran, 32 passing between Fortran and Pascal, 48 arrays C, 35 character, 42 declaring, 26 Pascal, 50 –automatic compiler option, 108 C$, 77 –C compiler option, 9, 112 C functions calling from Fortran, 30 C macro preprocessor, 3, 11 C$&, 77 C-style comments accepting in Hollerith strings, 3 cache, 93
Index –col72 compiler option, 10 comments, 3 COMMON blocks, 72, 112 making local to a process, 102 common blocks, 26, 36 compilation, 2 compiler options, 8 –1, 14 –align16, 8, 25, 28 –align32, 8 –align64, 8 –align8, 8, 25, 28 –automatic, 108 –backslash, 9 –bestG, 18 –C, 9, 112 –check_bounds, 9 –chunk, 9, 78 –col72, 10 –cord, 18 –cpp, 10 –d_lines, 11 –E, 11 –expand_include, 11 –extend_source, 11 –F, 11 –feedback, 18 –framepointer, 11 –G, 18 –g, 16, 114 –i2, 12 –jmopt, 18 –l, 6 list of, 8 –listing, 12 –lm, 7
inconsequential, 84 rewritable, 83 data independence, 79 data types alignment, 25, 27 C, 33 Fortran, 33, 49 Pascal, 49 DATE, 64 dbx, 16, 137 debugging, 125 parallel Fortran programs, 110 with dbx, 16 direct files, 20 directives C$, 77 C$&, 77 C$CHUNK, 78 C$DOACROSS, 71 C$MP_SCHEDTYPE, 78 list of, 71 DO loops, 70, 80, 91, 112 DOACROSS, 78 and multiprocessing, 104 driver options, 8 drivers, 2 dynamic scheduling, 74 E –E compiler option, 11 environment variables, 100, 101, 109 f77_dump_flag, 22, 137 equivalen
in parallel loops, 82 intrinsic, 67, 83 SECNDS, 68 library, 55, 83 RAN, 68 side effects, 82 G –G compiler option, 18 –g compiler option, 16, 114 global data area reducing, 18 guided self-scheduling, 74 H handle_sigfpes, 22 Hollerith strings and C-style comments, 3 I –i2 compiler option, 12 IDATE, 65 IF clause, 73 SIGCLD signal intercepting, 103 interleave scheduling, 74 intrinsic subroutines, 63 DATE, 64 ERRSNS, 65 EXIT, 66 IDATE, 65 MVBITS, 66 TIME, 66 J –jmpopt compiler option, 18 L –l com
misaligned data, 27 mkf2c, 38 –mp compiler option, 12, 106, 107, 112 mp_barrier, 102 mp_block, 97 mp_blocktime, 99 mp_create, 98 mp_destroy, 98 mp_my_threadnum, 99 mp_numthreads, 99 MP_PROFILE, 101 MP_SCHEDTYPE, 73, 78 –mp_schedtype compiler option, 12, 78 MP_SET_NUMTHREADS, 100 mp_set_numthreads, 99 mp_setlock, 102 mp_setup, 98 mp_simple_sched, 110 and loop transformations, 104 tasks executed, 106 mp_slave_control, 106 mp_slave_wait_for_work, 110 mp_unblock, 97 mp_unsetlock, 102 multi-language programs, 4
PFA, 80, 122 associated directives, 78 running from f77, 14 –pfa compiler option, 14 pixie, 16 and multiprocessing, 101 power Fortran accelerator, 80, 122 preconnected files, 21 preprocessor cpp, 3 processes master, 70, 106 slave, 70, 106 prof, 16 profiling, 16, 120 and multiprocessing, 101 parallel Fortran program, 109 programs multi-language, 4 R –R compiler option, 14 –r8 compiler option, 14 RAN, 68 rand and multiprocessing, 83 RATFOR and –R option, 14 records, 20 recurrence and data dependency,
IDATE, 65 MVBITS, 66 subscripts checking range, 9 sum reduction, example, 89 SVS Fortran, 10 synchronizer, 110 symbol table information producing, 18 syntax conventions, xiii system interface, 55 system subroutines, 63 T TIME, 66 trap handling, 22 –trapeuv compiler option, 15 –vms_endfile compiler option, 15 –vms_library compiler option, 15 –vms_stdin compiler option, 15 W –w compiler option, 15 –w66 compiler option, 15 where command, 118 work quantum, 90 wrapper generator mkf2c, 38 X –Xlocaldata loader