HP MLIB User's Guide Vol. 2 7th Ed.

ManualsBrandsHP ManualsSoftwareHP-UX Performance Tools

HP MLIB User’s Guide

Seventh Edition

HP Part No. B6061-96027 and B6061-96028

HP MLIB

December 2004

Printed in USA

Summary of content (612 pages)

PAGE 1
HP MLIB User’s Guide Seventh Edition HP Part No.
PAGE 2
Edition: Seventh Document Numbers: B6061-96027 and B6061-96028 Remarks: Released Dcember 2004 with HP MLIB software version9.0. User’s Guide split into two volumes with this release. Edition: Sixth Document Number: B6061-96023 Remarks: Released September 2003 with HP MLIB software version 8.50. Edition: Fifth Document Number: B6061-96020 Remarks: Released September 2002 with HP MLIB software version 8.30.
PAGE 3
Notice  Copyright 1979-2004 Hewlett-Packard Development Company. All Rights Reserved. Reproduction, adaptation, or translation without prior written permission is prohibited, except as allowed under the copyright laws. The information contained in this document is subject to change without notice. Hewlett-Packard makes no warranty of any kind with regard to this material, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose.
PAGE 4
PAGE 5
Table of Contents Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv VECLIB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv LAPACK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi ScaLAPACK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
PAGE 6
Message passing-based nested parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 Default CPS library stack is too small for MLIB . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 Default Pthread library stack is too small for MLIB . . . . . . . . . . . . . . . . . . . . . . . . . 22 Roundoff effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 Data types and precision . . . . . . . . . . . . . . . .
PAGE 7
SASUM/DASUM/IASUM/SCASUM/DZASUM Sum of magnitudes . . . . . . . . . . . . 62 SAXPY/DAXPY/CAXPY/CAXPYC/ZAXPY/ZAXPYC Elementary vector operation 65 SAXPYI/DAXPYI/CAXPYI/ZAXPYI Sparse elementary vector operation . . . . . . . 68 SCLIP/DCLIP/ICLIP Two sided vector clip . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 SCLIPL/DCLIPL/ICLIPL Left sided vector clip . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 SCLIPR/DCLIPR/ICLIPR Right sided vector clip . . . . . . . . . . . . . . . . .
PAGE 8
plane rotation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158 F_SAXPBY/F_DAXPBY/F_CAXPBY/F_ZAXPBY Scaled vector accumulation . . . 161 F_SAXPY_DOT/F_DAXPY_DOT/F_CAXPY_DOT/F_ZAXPY_DOT Combine AXPY and DOT routines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 F_SCOPY/F_DCOPY/F_CCOPY/F_ZCOPY Copy vector. . . . . . . . . . . . . . . . . . . . .
PAGE 9
DGEMMS/ZGEMMS Strassen matrix-matrix multiply . . . . . . . . . . . . . . . . . . . . . 227 SGEMV/DGEMV/CGEMV/ZGEMV Matrix-vector multiply . . . . . . . . . . . . . . . . . 232 SGER/DGER/CGERC/CGERU/ZGERC/ZGERU Rank-1 update . . . . . . . . . . . . . . 237 SGETRA/DGETRA/CGETRA/ZGETRA In-place transpose of a general square matrix 241 SSBMV/DSBMV/CHBMV/ZHBMV Matrix-vector multiply. . . . . . . . . . . . . . . . . . 244 SSPMV/DSPMV/CHPMV/ZHPMV Matrix-vector multiply . . . . . . . . . . . . . . . . . .
PAGE 10
F_SGEMM/F_DGEMM/F_CGEMM/F_ZGEMM General matrix-matrix multiply 362 F_SGEMV/F_DGEMV/F_CGEMV/F_ZGEMV General matrix-vector multiply . . 365 F_SGEMVER/F_DGEMVER/F_CGEMVER/F_ZGEMVER Multiple matrix-vector multiply, rank 2 update. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368 F_SGEMVT/F_DGEMVT/F_CGEMVT/F_ZGEMVT Multiple matrix-vector multiply 372 F_SGER/F_DGER/F_CGER/F_ZGER General rank-1 update . . . . . . . . . . . . . . . .
PAGE 11
Common arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438 SM arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 440 Order of arguments for args(A) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
PAGE 12
SVBRSM/DVBRSM/CVBRSM/ZVBRSM 533 xii Variable block row format triangular solve Table of Contents
PAGE 13
Figures List of Figures xiii
PAGE 14
xiv List of Figures
PAGE 15
Tables Table 1-1 VECLIB and VECLIB8 Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 Table 1-2 Compiler Defaults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 Table 1-3 VECLIB Naming Convention—Data Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 Table 1-4 BLAS Standard Operator Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 Table 2-1 FPINFO return values . . .
PAGE 16
Table 4-19 BSR Format Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435 Table 4-20 6 x 6 Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437 Table 4-21 VBR Format Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
PAGE 17
VECLIB Preface Hewlett-Packard’s high-performance math libraries (HP MLIB) help you speed development of applications and shorten execution time of long-running technical applications. HP MLIB is a collection of subprograms optimized for use on HP servers and workstations, providing mathematical software and computational kernels for engineering and scientific applications. HP MLIB can be used on systems ranging from single-processor workstations to multiprocessor high-end servers.
PAGE 18
LAPACK LAPACK HP Linear Algebra Package (LAPACK) is a collection of subprograms that provide mathematical software for applications involving linear equations, least squares, eigenvalue problems, and the singular value decomposition. LAPACK is designed to supersede the linear equation and eigenvalue packages, LINPACK and EISPACK.
PAGE 19
SuperLU Refer to Part 3 of this manual for information specific to HP ScaLAPACK. To supplement the HP specific information provided in Part 3 of this document, refer to the standard ScaLAPACK Users’ Guide. You can access the latest edition of the ScaLAPACK Users’ Guide at the Netlib repository at the following URL: http://www.netlib.org/scalapack/slug/index..html SuperLU This implementation provides the Distributed SuperLU library designed for distributed memory parallel computers.
PAGE 20
VMATH • Sparse symmetric and structurally-symmetric linear equation solutions. • Sparse symmetric ordinary and generalized eigensystem solutions. • Out-of-core symmetric and structurally-symmetric linear equation and eigensystems solutions. • Full METIS functionality This implementation provides the METIS Version 4.0.1 library. It is based on the public-domain METIS, which was developed at the University of Minnesota, Department of Computer Science, and the Army HPC Research Center.
PAGE 21
Purpose and audience Purpose and audience This guide describes the MLIB software library and shows how to use it. This library provides mathematical software and computational kernels for applications.
PAGE 22
Organization Organization The HP MLIB User’s Guide describes HP MLIB VECLIB in Part 1, HP MLIB LAPACK in Part 2, HP MLIB ScaLAPACK in Part 3, and HP MLIB Distributed SuperLU in Part 4. To learn fundamental information necessary for using the VECLIB library, read Chapter 1 and the introductory sections of the other chapters. These sections of background information will help you efficiently use the library subprograms.
PAGE 23
Organization Part 6 of this document is organized as follows: • Chapter 13 explains sparse symmetric linear equation subprograms • Chapter 14 describes METIS subprograms • Chapter 15 describes sparse symmetric eigenvalue subprograms • Chapter 16 describes BCSLIB-EXT functionality Supplemental material is provided as follows: • Appendix A describes how to call VECLIB and LAPACK subprograms from within C programs • Appendix B describes LINPACK subprograms available in HP MLIB • Appendix C lists parallelized
PAGE 24
Notational conventions Notational conventions The following conventions are used in this manual: Italics Italics within text indicate mathematical entities used or manipulated by the program: for example, solve the n-by-n system of linear equations Ax = b. Italics within command lines indicate generic commands, file names, or subprogram names. Substitute actual commands, file names, or subprograms for the italicized words. For example, the command line f90 prog_name.
PAGE 25
Documentation resources UPPERCASE MONOSPACE UPPERCASE MONOSPACE indicates Fortran programs. Brackets ( [ ] ) Square brackets in command examples designate optional entries. NOTE A NOTE highlights important supplemental information. Documentation resources The HP MLIB User’s Guide, the LAPACK Users’ Guide, and the ScaLAPACK Users’ Guide are available in hardcopy and online formats. For the HP MLIB User’s Guide, refer to: http:// www.hp.
PAGE 26
Documentation resources • HP-UX Floating-Point Guide. Describes how floating-point arithmetic is implemented on HP 9000 systems and discusses how floating-point behavior affects the programmer. • HP Fortran 90 Programmer’s Guide. Provides extensive usage information (including how to compile and link), suggestions and tools for migrating to HP Fortran 90, and how to call C and HP-UX routines for HP Fortran 90. • HP Fortran 90 Programmer’s Reference.
PAGE 27
Overview 5 Fast Fourier Transforms Overview This chapter explains how to use the VECLIB fast Fourier transform (FFT) subprograms.
PAGE 28
Associated documentation Associated documentation The following documents provide supplemental material for this chapter: Brigham, E.O. The Fast Fourier Transform. Englewood Cliffs, NJ: Prentice-Hall, Inc. 1974. Rabiner, L.R., and B. Gold. Theory and Application of Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, Inc. 1975. Van Loan, C. Computational Frameworks for the Fast Fourier Transform. Philadelphia: Society for Industrial and Applied Mathematics, 1992.
PAGE 29
One-dimensional FFT Subprograms for Fast Fourier Transforms Subprograms for Fast Fourier Transforms The following sections describe FFT subprograms included with VECLIB. Name C1DFFT/Z1DFFT One-dimensional FFT Purpose Given an array of complex data, these subprograms compute the one-dimensional forward or inverse DFT. Two companion subprograms, S1DFFT and D1DFFT, perform the same operation but with the complex data presented with real and imaginary parts in separate real arrays.
PAGE 30
C1DFFT/Z1DFFT Usage One-dimensional FFT Because it is common to use one data set length repetitively, these subprograms have a separate initialization call such that the setup can be performed only once for each different transform size. You will, therefore, always have at least two CALL statements to the FFT subprogram using the same working storage array.
PAGE 31
One-dimensional FFT Output C1DFFT/Z1DFFT z ier Example If iopt ≠ −3, transformed data replaces the input if ier = 0 is returned. Not used as output if iopt = −3. Status response: ier = 0 Normal return—transform or initialization successful. ier = −1 l < 0. ier = −2 l = 0. ier = −3 Invalid value of iopt. ier = −4 Insufficient dynamic memory available for workspace. Compute the forward discrete Fourier transform of two COMPLEX*8 data sets of length 1024.
PAGE 32
S1DFFT/D1DFFT One-dimensional FFT Name S1DFFT/D1DFFT One-dimensional FFT Purpose Given a set of complex data with the real and imaginary parts in separate real arrays, these subprograms compute the one-dimensional forward or inverse DFT. Two companion subprograms, C1DFFT and Z1DFFT, perform the same operation but with the complex data presented in a complex array. The one-dimensional forward discrete Fourier transform of z(n), for n = 1, 2, ...
PAGE 33
One-dimensional FFT Usage S1DFFT/D1DFFT Because it is common to use one data set length repetitively, these subprograms have a separate initialization call such that the setup can be performed only once for each different transform size. You will, therefore, always have at least two CALL statements to the FFT subprogram using the same working storage array. Refer to “Example” on page 546.
PAGE 34
S1DFFT/D1DFFT Output One-dimensional FFT x and y ier Example If iopt ≠ −3, the transformed data replaces the input if ier = 0 is returned. Not used as output if iopt = −3. Status response: ier = 0 Normal return—transform or initialization successful. ier = −1 l < 0. ier = −2 l = 0. ier = −3 Invalid value of iopt. ier = −4 Insufficient dynamic memory available for workspace. Compute the forward discrete Fourier transform of two REAL*8 data sets of length 512.
PAGE 35
Two-dimensional FFT C2DFFT/Z2DFFT Name C2DFFT/Z2DFFT Two-dimensional FFT Purpose Given an array of complex data, these subprograms compute the two-dimensional forward or inverse DFT. Two companion subprograms, S2DFFT and D2DFFT, perform the same operation but with the complex data presented with real and imaginary parts in separate real arrays. The two-dimensional forward discrete Fourier transform of z(n1,n2), for n1 = 1, 2, ..., l1 and n2 = 1, 2, ...
PAGE 36
C2DFFT/Z2DFFT Two-dimensional FFT VECLIB8: INTEGER*8 l1, l2, ldz, iopt, ier COMPLEX*8 z(ldz, l2) CALL C2DFFT(z, l1, l2, ldz, iopt, ier) INTEGER*8 l1, l2, ldz, iopt, ier COMPLEX*16 z(ldz, l2) CALL Z2DFFT(z, l1, l2, ldz, iopt, ier) Input z l1 l2 ldz iopt Output z ier 548 HP MLIB User’s Guide Array of data to be transformed. Number of rows of data, l1 > 0. Number of columns of data, l2 > 0. The leading dimension of array z, with ldz ≥ l1. Option flag: iopt ≥ 0 Compute forward transform.
PAGE 37
Two-dimensional FFT S2DFFT/D2DFFT Name S2DFFT/D2DFFT Two-dimensional FFT Purpose Given a set of complex data with the real and imaginary parts in separate real arrays, these subprograms compute the two-dimensional forward or inverse DFT. Two companion subprograms, C2DFFT and Z2DFFT, perform the same operation but with the complex data presented in a complex array. The two-dimensional forward DFT of z(n1,n2), for n1 = 1, 2, ..., l1 and n2 = 1, 2, ...
PAGE 38
S2DFFT/D2DFFT Two-dimensional FFT INTEGER*8 l1, l2, ldxy, iopt, ier REAL*8 x(ldxy, l2), y(ldxy, l2) CALL D2DFFT(x, y, l1, l2, ldxy, iopt, ier) INTEGER*8 l1, l2, ldxy, iopt, ier REAL*4 x(ldxy, l2), y(ldxy, l2) CALL S2DFFT(x, y, l1, l2, ldxy, iopt, ier) Input x y l1 l2 ldxy iopt Output x and y ier 550 HP MLIB User’s Guide Array of real parts of the data to be transformed. Array of imaginary parts of the data to be transformed. Number of rows of data, l1 > 0. Number of columns of data, l2 > 0.
PAGE 39
Three-dimensional FFT C3DFFT/Z3DFFT Name C3DFFT/Z3DFFT Three-dimensional FFT Purpose Given an array of complex data, these subprograms compute the three-dimensional forward or inverse DFT. Two companion subprograms, S3DFFT and D3DFFT, perform the same operation but with the complex data presented with real and imaginary parts in separate real arrays. The three-dimensional forward DFT of z(n1, n2, n3), for n1 = 1, 2, ..., l1, n2 = 1, 2, ..., l2, and n3 = 1, 2, ...
PAGE 40
C3DFFT/Z3DFFT Three-dimensional FFT VECLIB8: INTEGER*8 l1, l2, l3, ldz, mdz, iopt, ier COMPLEX*8 z(ldz, mdz, l3) CALL C3DFFT(z, l1, l2, l3, ldz, mdz, iopt, ier) INTEGER*8 l1, l2, l3, ldz, mdz, iopt, ier COMPLEX*16 z(ldz, mdz, l3) CALL Z3DFFT(z, l1, l2, l3, ldz, mdz, iopt, ier) Input z l1 l2 l3 ldz mdz iopt Output z ier 552 HP MLIB User’s Guide Array of data to be transformed. Number of rows of data, l1 > 0. Number of columns of data, l2 > 0. Number of planes of data, l3 > 0.
PAGE 41
Three-dimensional FFT S3DFFT/D3DFFT Name S3DFFT/D3DFFT Three-dimensional FFT Purpose Given a set of complex data with real and imaginary parts in separate real arrays, these subprograms compute the three-dimensional forward or inverse DFT. Two companion subprograms, C3DFFT and Z3DFFT, perform the same operation but with the complex data presented in a complex array. The three-dimensional forward DFT of z(n1, n2 ,n3), for n1 = 1, 2, ..., l1, n2 = 1, 2, ..., l2, and n3 = 1, 2, ...
PAGE 42
S3DFFT/D3DFFT Usage Three-dimensional FFT VECLIB: INTEGER*4 l1, l2, l3, ldxy, mdxy, iopt, ier REAL*4 x(ldxy, mdxy, l3), y(ldxy, mdxy, l3) CALL S3DFFT(x, y, l1, l2, l3, ldxy, mdxy, iopt, ier) INTEGER*4 l1, l2, l3, ldxy, mdxy, iopt, ier REAL*8 x(ldxy, mdxy, l3), y(ldxy, mdxy, l3) CALL D3DFFT(x, y, l1, l2, l3, ldxy, mdxy, iopt, ier) VECLIB8: INTEGER*8 l1, l2, l3, ldxy, mdxy, iopt, ier REAL*4 x(ldxy, mdxy, l3), y(ldxy, mdxy, l3) CALL S3DFFT(x, y, l1, l2, l3, ldxy, mdxy, iopt, ier) INTEGER*8 l1, l2, l3, ldxy
PAGE 43
Three-dimensional FFT S3DFFT/D3DFFT ier = 0 ier = −1 ier = −2 ier = −3 ier = −4 ier = −5 ier = −6 Normal return—transform successful. l1 ≤ 0. l2 ≤ 0. l3 ≤ 0. ldxy < l1. mdxy < l2. Probable error in ldxy or mdxy.
PAGE 44
CFFTS/ZFFTS Simultaneous one-dimensional FFT Name CFFTS/ZFFTS Simultaneous one-dimensional FFT Purpose Given a number of sets of one-dimensional complex data in a complex array, these subprograms compute all of their one-dimensional forward or inverse DFT. Two companion subprograms, SFFTS and DFFTS, perform the same operation but with the complex data presented with real and imaginary parts in separate real arrays.
PAGE 45
Simultaneous one-dimensional FFT CFFTS/ZFFTS VECLIB8: INTEGER*8 l, incl, n, incn, iopt, ier COMPLEX*8 z(lenz) CALL CFFTS(z, l, incl, n, incn, iopt, ier) INTEGER*8 l, incl, n, incn, iopt, ier COMPLEX*16 z(lenz) CALL ZFFTS(z, l, incl, n, incn, iopt, ier) Input z Array containing n data sets, each consisting of l data points, to be transformed. Typically, z will be a two- or three-dimensional array with each set of data being a one-dimensional array section. Refer to “Notes” for suggested usages.
PAGE 46
CFFTS/ZFFTS Notes Simultaneous one-dimensional FFT ier = −2 incl ≤ 0. ier = −3 n ≤ 0. ier = −4 incn ≤ 0. ier = −5 l, incl, n, and incn are incompatible. Refer to “Notes.” Typically, z will be a two- or three-dimensional array with each set of data being a one-dimensional section of the array, that is, all but one subscript will be constant within a data set.
PAGE 47
Simultaneous one-dimensional FFT Example 1 CFFTS/ZFFTS Compute the forward DFT of 256 COMPLEX*8 data sets of length 1024. The data sets are stored as the columns of array Z whose dimensions are 1025 by 256. INTEGER*4 L,INCL,N,INCN,IOPT,IER COMPLEX*8 Z(1025,256) L = 1024 INCL = 1 N = 256 INCN = 1025 IOPT = 1 CALL CFFTS (Z,L,INCL,N,INCN,IOPT,IER) IF ( IER .NE. 0 ) THEN handle error condition END IF Example 2 Compute the inverse DFT of 1024 COMPLEX*8 data sets of length 256.
PAGE 48
SFFTS/DFFTS Simultaneous one-dimensional FFT Name SFFTS/DFFTS Simultaneous one-dimensional FFT Purpose Given a number of sets of one-dimensional complex data with real and imaginary parts in separate real arrays, these subprograms compute all of their one-dimensional forward or inverse DFT. Two companion subprograms, CFFTS and ZFFTS, perform the same operation but with the complex data presented in a complex array.
PAGE 49
Simultaneous one-dimensional FFT SFFTS/DFFTS VECLIB8: INTEGER*8 l, incl, n, incn, iopt, ier REAL*4 x(lenxy), y(lenxy) CALL SFFTS(x, y, l, incl, n, incn, iopt, ier) INTEGER*8 l, incl, n, incn, iopt, ier REAL*8 x(lenxy), y(lenxy) CALL DFFTS(x, y, l, incl, n, incn, iopt, ier) Input x and y Arrays containing n data sets, each consisting of l data points, to be transformed. Typically, x and y will be two- or three-dimensional arrays with each data set being a one-dimensional array section.
PAGE 50
SFFTS/DFFTS Output Simultaneous one-dimensional FFT x and y ier Notes The transformed data replaces the input if ier = 0 is returned. Unchanged if ier < 0. Status response: ier = 0 Normal return—transform successful. ier = −1 l ≤ 0. ier = −2 incl ≤ 0. n ≤ 0. ier = −3 incn ≤ 0. ier = −4 ier = −5 l, incl, n, and incn are incompatible. Refer to “Notes.
PAGE 51
Simultaneous one-dimensional FFT SFFTS/DFFTS Similarly, if the subscript that varies between data sets is the 1st subscript, use 2nd subscript, use 3rd subscript, use incn = 1. incn = ldxy. incn = ldxy×mdxy. l, incl, n, and incn must be such that no two points of any data sets occupy the same elements of x and y. These subprograms detect this situation and return ier = −5 if incl < n × gcd ( incl, incn ) and incn < l × gcd ( incl, incn ) where gcd(.,.) is the greatest common divisor.
PAGE 52
CRC1FT/ZRC1FT Real-to-complex one-dimensional FFT Name CRC1FT/ZRC1FT Real-to-complex one-dimensional FFT Purpose These subprograms compute either the forward real-to-complex or the inverse complex-to-real, one-dimensional, DFT. Two companion subprograms, SRC1FT and DRC1FT, perform a similar operation but in a space-conserving manner. The one-dimensional forward DFT of a real data set, z(n), for n = 1, 2, ..., l, is defined by l ∑ Z(m) = z ( n )e – 2 πi ( m – 1 ) ( n – 1 ) ⁄ l n=1 for m = 1, 2, ..
PAGE 53
Real-to-complex one-dimensional FFT Usage CRC1FT/ZRC1FT Because it is common to use one data set length repetitively, these subprograms have a separate initialization call such that the setup can be performed only once for each different transform size. Therefore, you will always have at least two CALL statements to the FFT subprogram using the same working storage array.
PAGE 54
CRC1FT/ZRC1FT Real-to-complex one-dimensional FFT Working Storage work If iopt = −3, work is initialized for computing transforms of length l. If iopt ≠ −3, work must have been initialized by a previous call with this value of l in which iopt was −3. Output z If iopt ≠ −3, the transformed data replaces the input if ier = 0 is returned. For a forward real-to-complex transform, all l elements of the transform are returned.
PAGE 55
Real-to-complex one-dimensional FFT SRC1FT/DRC1FT Name SRC1FT/DRC1FT Real-to-complex one-dimensional FFT Purpose Given a set of real data in a real array, these subprograms compute the nonredundant part of the complex one-dimensional forward DFT. Alternatively, given the nonredundant part of a conjugate-symmetric complex one-dimensional data set, these subprograms compute the real inverse discrete Fourier transform.
PAGE 56
SRC1FT/DRC1FT Real-to-complex one-dimensional FFT For best performance, these subprograms require that l be a product of powers of 2, 3, and 5, that is, of the form p q r l = 2 3 5 , where p, q, r ≥ 0 and where either l = 1 or l is even. Usage Because it is common to use one data set length repetitively, these subprograms have a separate initialization call such that the setup can be performed only once for each different transform size.
PAGE 57
Real-to-complex one-dimensional FFT iopt SRC1FT/DRC1FT Option flag: iopt = +1 iopt = −1 iopt = −2 iopt = −3 Compute forward transform. Compute scaled inverse transform. Compute unscaled inverse transform. Initialize work for subsequent transforms of length l. Working Storage work If iopt = −3, work is initialized for computing transforms of length l. If iopt ≠ −3, work must have been initialized by a previous call with this value of l in which iopt was −3.
PAGE 58
CRC2FT/ZRC2FT Real-to-complex two-dimensional FFT Name CRC2FT/ZRC2FT Real-to-complex two-dimensional FFT Purpose These subprograms compute either the forward real-to-complex or the inverse complex-to-real, two-dimensional, discrete Fourier transform. A pair of companion subprograms, SRC2FT and DRC2FT, performs a similar operation, but in a space-conserving manner. The two-dimensional, complex, forward discrete Fourier transform of a real data set, z(n1,n2), for n1 = 1, 2, ..., l1 and n2 = 1, 2, ...
PAGE 59
Real-to-complex two-dimensional FFT Usage CRC2FT/ZRC2FT VECLIB: INTEGER*4 l1, l2, ldz, iopt, ier COMPLEX*8 z(ldz, l2) CALL CRC2FT(z, l1, l2, ldz, iopt, ier) INTEGER*4 l1, l2, ldz, iopt, ier COMPLEX*16 z(ldz, l2) CALL ZRC2FT(z, l1, l2, ldz, iopt, ier) VECLIB8: INTEGER*8 l1, l2, ldz, iopt, ier COMPLEX*8 z(ldz, l2) CALL CRC2FT(z, l1, l2, ldz, iopt, ier) INTEGER*8 l1, l2, ldz, iopt, ier COMPLEX*16 z(ldz, l2) CALL ZRC2FT(z, l1, l2, ldz, iopt, ier) Input z l1 l2 ldz iopt Output z Array of data to be tra
PAGE 60
CRC2FT/ZRC2FT Real-to-complex two-dimensional FFT ier 572 HP MLIB User’s Guide Status response: ier = 0 Normal return—transform successful. ier = −1 l1 ≤ 0. ier = −2 l2 ≤ 0. ldz < l1. ier = −3 Probable error in ldz or dimensions ier = −4 of z.
PAGE 61
Real-to-complex two-dimensional FFT SRC2FT/DRC2FT Name SRC2FT/DRC2FT Real-to-complex two-dimensional FFT Purpose Given a set of two-dimensional real data, these subprograms compute the nonredundant portion of the complex, two-dimensional, forward DFT. Alternatively, given the nonredundant part of a conjugate-symmetric two-dimensional complex data set, these subprograms compute the real inverse discrete Fourier transform.
PAGE 62
SRC2FT/DRC2FT Real-to-complex two-dimensional FFT For best performance, these subprograms require that l1 and l2 be products of powers of 2, 3, and 5, that is, of the form p k q k rk lk = 2 3 5 , where pk, qk, rk ≥ 0, k = 1, 2, and where either l1=1 or l1 is even.
PAGE 63
Real-to-complex two-dimensional FFT Output x SRC2FT/DRC2FT The transformed data replaces the input if ier = 0 is returned. For a forward real-to-complex transform, the real part of Z(m1,m2) is stored in x(2×m1−1,m2)and the imaginary part is stored in x(2×m1,m2), m1 = 1, 2, ..., l1/2+1, m2 = 1, 2, ..., l2. If needed, the remaining (l1/2 − 1) × l2 complex output values may be formed by using the conjugate-symmetry condition.
PAGE 64
CRC3FT/ZRC3FT Real-to-complex three-dimensional FFT Name CRC3FT/ZRC3FT Real-to-complex three-dimensional FFT Purpose These subprograms compute either the forward real-to-complex or the inverse complex-to-real, three-dimensional DFT. A pair of companion subprograms, SRC3FT and DRC3FT, performs a similar operation but in a space-conserving manner. The three-dimensional complex forward DFT of a real data set z(n1,n2,n3), for n1 = 1, 2, ..., l1, n2 = 1, 2, ..., l2, and n3 = 1, 2, ...
PAGE 65
Real-to-complex three-dimensional FFT CRC3FT/ZRC3FT Alternatively, if Z(m1,m2,m3), for m1 = 1, 2, ..., l1, m2 = 1, 2, ..., l2, and m3 = 1, 2, ...
PAGE 66
CRC3FT/ZRC3FT Real-to-complex three-dimensional FFT l1 l2 l3 ldz mdz iopt Output z ier 578 HP MLIB User’s Guide Number of rows of data, l1 > 0. Number of columns of data, l2 > 0. Number of planes of data, l3 > 0. The leading dimension of array z, with ldz ≥ l1. The middle dimension of array z, with mdz ≥ l2. Option flag: iopt ≥ 0 Compute forward transform. iopt < 0 Compute inverse transform. The transformed data replaces the input if ier = 0 is returned.
PAGE 67
Real-to-complex three-dimensional FFT SRC3FT/DRC3FT Name SRC3FT/DRC3FT Real-to-complex three-dimensional FFT Purpose Given a set of three-dimensional real data, these subprograms compute the nonredundant portion of the complex three-dimensional forward DFT. Alternatively, given the nonredundant part of a conjugate-symmetric, three-dimensional, complex data set, these subprograms compute the real inverse DFT.
PAGE 68
SRC3FT/DRC3FT Real-to-complex three-dimensional FFT Alternatively, the three-dimensional, real, inverse discrete Fourier transform of Z(m1,m2,m3), for m1 = 1, 2, ..., l1, m2 = 1, 2, ..., l2, and m3 = 1, 2, ..., l3, is defined by 1 z ( n 1, n 2, n 3 ) = -------------l1 l2 l3 ×e +2πi ( m 1 – 1 ) ( n 1 – 1 ) ⁄ l 1 l1 ∑ l2 ∑ l3 ∑ Z ( m 1, m 2, m 3 ) m1 = 1 m2 = 1 m3 = 1 e +2πi ( m 2 – 1 ) ( n 2 – 1 ) ⁄ l 2 e +2πi ( m 3 – 1 ) ( n 3 – 1 ) ⁄ for n1 = 1, 2, ..., l1, n2 = 1, 2, ...
PAGE 69
Real-to-complex three-dimensional FFT SRC3FT/DRC3FT For an inverse complex-to-real transform, the real part of Z(m1,m2,m3) is stored in x(2×m1−1,m2,m3)and the imaginary part is stored in x(2×m1,m2,m3), m1 = 1, 2, ..., l1/2+1, m2 = 1, 2, ..., l2, m3 = 1, 2, ..., l3. l1 l2 l3 ldx mdx iopt Output x Number of rows of data, l1 > 0. Number of columns of data, l2 > 0. Number of planes of data, l3 > 0. The leading dimension of array x, with ldx ≥ l1+2. The middle dimension of array x, with mdx ≥ l2.
PAGE 70
CRCFTS/ZRCFTS Simultaneous real-to-complex one-dimensional FFT Name CRCFTS/ZRCFTS Simultaneous real-to-complex one-dimensional FFT Purpose Given a number of one-dimensional data sets, these subprograms compute all of their one-dimensional forward real-to-complex or inverse complex-to-real discrete Fourier transforms. A pair of companion subprograms, SRCFTS and DRCFTS, performs the same operation, but in a space-conserving manner.
PAGE 71
Simultaneous real-to-complex one-dimensional FFT Usage CRCFTS/ZRCFTS VECLIB: INTEGER*4 l, incl, n, incn, iopt, ier COMPLEX*8 z(lenz) CALL CRCFTS(z, l, incl, n, incn, iopt, ier) INTEGER*4 l, incl, n, incn, iopt, ier COMPLEX*16 z(lenz) CALL ZRCFTS(z, l, incl, n, incn, iopt, ier) VECLIB8: INTEGER*8 l, incl, n, incn, iopt, ier COMPLEX*8 z(lenz) CALL CRCFTS(z, l, incl, n, incn, iopt, ier) INTEGER*8 l, incl, n, incn, iopt, ier COMPLEX*16 z(lenz) CALL ZRCFTS(z, l, incl, n, incn, iopt, ier) Input z Array con
PAGE 72
CRCFTS/ZRCFTS Simultaneous real-to-complex one-dimensional FFT iopt 584 HP MLIB User’s Guide Option flag: iopt ≥ 0 iopt < 0 Compute forward transform. Compute inverse transform.
PAGE 73
Simultaneous real-to-complex one-dimensional FFT Output z ier Notes CRCFTS/ZRCFTS The transformed data replaces the input if ier = 0 is returned. Unchanged if ier < 0. For a forward real-to-complex transform, each transformed data set satisfies the conjugate-symmetry condition. For an inverse complex-to-real transform, the real result is stored in the real parts of z and the imaginary parts of z are set to zero. Status response: ier = 0 Normal return—transform successful. ier = −1 l ≤ 0.
PAGE 74
CRCFTS/ZRCFTS Simultaneous real-to-complex one-dimensional FFT Similarly, if the subscript that varies between data sets is the 1st 2nd 3rd subscript, use subscript, use subscript, use incn = 1. incn = ldz. incn = ldz×mdz. l, incl, n, and incn must be such that no two points of any data sets occupy the same element of z. These subprograms detect this situation and return ier = −5 if incl < n × gcd ( incl, incn ) and incn < l × gcd ( incl, incn ) , where gcd(.,.) is the greatest common divisor.
PAGE 75
Simultaneous real-to-complex one-dimensional FFT SRCFTS/DRCFTS Name SRCFTS/DRCFTS Simultaneous real-to-complex one-dimensional FFT Purpose Given a number of one-dimensional real data sets, these subprograms compute nonredundant portions of all of their one-dimensional forward real-to-complex discrete Fourier transforms.
PAGE 76
SRCFTS/DRCFTS Simultaneous real-to-complex one-dimensional FFT These subprograms perform forward real-to-complex or inverse complex-to-real transform operations simultaneously on a number of data sets. For best performance, they require that the length l of the data sets be a product of powers of 2, 3, and 5, that is, of the form p q r l = 2 3 5 , where p, q, r ≥ 0, and where either l = 1 or l is even.
PAGE 77
Simultaneous real-to-complex one-dimensional FFT SRCFTS/DRCFTS For an inverse complex-to-real transform, the real part of the i-th data point of the j-th data set, 1 ≤ i ≤ l/2+1, 1 ≤ j ≤ n, is stored in x ( ( 2 × i – 2 ) × incl + ( j – 1 ) × incn + 1 ) and the imaginary part is stored in x ( ( 2 × i – 1 ) × incl + ( j – 1 ) × incn + 1 ) l incl n incn iopt Output x respectively. Number of data points in each complete data set, l > 0.
PAGE 78
SRCFTS/DRCFTS Simultaneous real-to-complex one-dimensional FFT ier Notes Status response: ier = 0 Normal return—transform successful. ier = −1 l ≤ 0. ier = −2 incl ≤ 0. ier = −3 n ≤ 0. ier = −4 incn ≤ 0. ier = −5 l, incl, n, and incn are incompatible. Refer to “Notes.” Typically, x will be a two- or three-dimensional array with each set of data being a one-dimensional section of the array, that is, all but one subscript will be constant within a data set.
PAGE 79
Simultaneous real-to-complex one-dimensional FFT SRCFTS/DRCFTS l, incl, n, and incn must be such that no two points of any data sets occupy the same elements of x. These subprograms detect this situation and return ier = −5 if incl < n × gcd ( incl, incn ) and incn < ( l + 2 ) × gcd ( incl, incn ) , where gcd(.,.) is the greatest common divisor. Example 1 Compute the forward real-to-complex discrete Fourier transform of 256 real data sets of length 1024.
PAGE 80
SRCFTS/DRCFTS 592 HP MLIB User’s Guide Simultaneous real-to-complex one-dimensional FFT
PAGE 81
Overview 6 Correlation and Convolution Subprograms Overview This chapter explains how to use the VECLIB subprograms available for correlations and convolutions. The following document provides supplemental material for this chapter: Rabiner, L.R., and B. Gold. Theory and Application of Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, Inc. 1975. Chapter objectives After reading this chapter you will know how to use the described subprograms to compute correlation and convolution.
PAGE 82
Subprograms for correlations and convolutions Correlation and convolution Subprograms for correlations and convolutions The following sections describe subprograms included with VECLIB for correlations and convolutions. Name SCONV/DCONV Correlation and convolution Purpose These subprograms compute the fully engaged portion of the discrete correlation or discrete convolution of two real data sequences.
PAGE 83
Correlation and convolution Usage SCONV/DCONV VECLIB: INTEGER*4 incx, incw, incy, m, n REAL*4 x(lenx), w(lenw), y(leny) CALL SCONV(x, incx, w, incw, y, incy, m, n) INTEGER*4 incx, incw, incy, m, n REAL*8 x(lenx), w(lenw), y(leny) CALL DCONV(x, incx, w, incw, y, incy, m, n) VECLIB8: INTEGER*8 incx, incw, incy, m, n REAL*4 x(lenx), w(lenw), y(leny) CALL SCONV(x, incx, w, incw, y, incy, m, n) INTEGER*8 incx, incw, incy, m, n REAL*8 x(lenx), w(lenw), y(leny) CALL DCONV(x, incx, w, incw, y, incy, m, n) Inpu
PAGE 84
SCONV/DCONV Correlation and convolution that is, if wj is stored in w(j). In this case, in order to index backward through the w array, the w argument is usually coded as w(n). Refer to “Notes” on page 597 and “Example 2” on page 598.
PAGE 85
Correlation and convolution incy SCONV/DCONV Increment for the array y, incy ≠ 0. ỹ k is stored in y((k−1)×incy+1). incy will normally be positive whether computing the correlation or the convolution. Use incy = 1 if the vector y is stored contiguously in array y, that is, if m n Output Notes y ỹ k is stored in y(k). Refer to “Notes.” The length of the y vector. m > 0. The length of the w vector. n > 0. The convolution or correlation vector y of length m. leny = (m−1)×|incy|+1.
PAGE 86
SCONV/DCONV Example 2 Correlation and convolution Compute the complete discrete convolution of the REAL*4 vectors x = (1, 3, 5) and w = (2, 1) stored in arrays X and W, respectively. To get the complete convolution, append n−1 = 1 zeros to each end of x, getting x = (0, 1, 3, 5, 0) . Then, you have n = 2 and m+n−1 = 5, so m = 4. INTEGER*4 INCX,INCW,INCY,M,N REAL*4 X(5),W(2),Y(4) DATA X / 0.0 , 1.0 , 3.0 , 5.0, 0.0 / DATA W / 2.0 , 1.
PAGE 87
Tapered correlation and convolution STCONV/DTCONV Name STCONV/DTCONV Tapered correlation and convolution Purpose These subprograms compute the discrete tapered correlation or discrete tapered convolution of two real data sequences. If xi and wi, i = 1, 2, ..., n are two data sequences, their discrete correlation yk is defined by yk = ∑ xk + i – 1 wi , i for k = −n+2, −n+3, ..., n, where the sum is taken over all indices i for which both xk+i−1 and wi are defined. Again, if xi and wi, i = 1, 2, ...
PAGE 88
STCONV/DTCONV Tapered correlation and convolution VECLIB8: INTEGER*8 incx, incw, incy, m, n REAL*4 x(lenx), w(lenw), y(leny) CALL STCONV(x, incx, w, incw, y, incy, m, n) INTEGER*8 incx, incw, incy, m, n REAL*8 x(lenx), w(lenw), y(leny) CALL DTCONV(x, incx, w, incw, y, incy, m, n) 600 HP MLIB User’s Guide
PAGE 89
Tapered correlation and convolution Input x incx STCONV/DTCONV Array containing the operand (or trace) vector x of length n. lenx = (n−1)×|incx|+1. Increment for the array x, incx ≠ 0. xi is stored in x((i−1)×incx+1). incx is normally positive whether computing the correlation or the convolution. Use incx = 1 if the vector x is stored contiguously in array x, that is, if xi is stored in x(i). Refer to “Notes.” w incw Array containing the operator (or kernel) vector w of length n.
PAGE 90
STCONV/DTCONV Output Tapered correlation and convolution The tapered convolution or correlation vector y of length m. leny = (m−1)×|incy|+1. y Notes These subprograms do not use the BLAS indexing scheme described in “BLAS Indexing Conventions” in the introduction to Chapter 2. Indexing works the same way as the BLAS for a positive increment, but for a negative increment, these subprograms access data backward from the beginning of the array passed to it rather than from the end.
PAGE 91
Overview 7 Miscellaneous Routines Overview This chapter explains how to use VECLIB subprograms for a variety of operations that are not included in other chapters.
PAGE 92
Miscellaneous subprograms Measure CPU time Miscellaneous subprograms The following sections describe miscellaneous subprograms included with VECLIB. Name CPUTIME Measure CPU time Purpose CPUTIME returns the total amount of CPU time used by a process. Both user time and system time are included. The time unit is seconds and the resolution of the timer is 0.01 seconds. CPUTIME also makes it easy to time a segment of code in a program.
PAGE 93
BLAS Standard error handler Name F_BLASERROR F_BLASERROR BLAS Standard error handler Purpose F_BLASERROR is an error handler for the BLAS Standard routines. It is called by a BLAS Standard routine when an input parameter has an invalid value. F_BLASERROR prints an error message and execution stops. You may supply a version of F_BLASERROR that modifies the action performed.
PAGE 94
LSAME return .TRUE Name LSAME return .TRUE Purpose LSAME returns .TRUE if CA is the same letter as CB regardless of case. Usage Input CHARACTER*1 CA, CB CALL LSAME(CA, CB) CA, CB CA and CB specify the single characters to be compared.
PAGE 95
Determine permitted extent of parallelism MLIB_GETNUMTHREADS Name MLIB_GETNUMTHREADS Determine permitted extent of parallelism Purpose This subprogram is used by all parallelized MLIB subprograms to determine the extent to which they may use shared-memory parallelism.
PAGE 96
MLIB_SETNUMTHREADS Control extent of parallelism Name MLIB_SETNUMTHREADS Control extent of parallelism Purpose This subprogram is used to control the extent to which parallelized MLIB subprograms may use shared-memory parallelism. nthread becomes internal state for MLIB_GETNUMTHREADS. Only one value of nthread is saved within a UNIX process; subsequent calls to MLIB_SETNUMTHREADS overwrite the previous value.
PAGE 97
Control extent of parallelism MLIB_SETNUMTHREADS If a parallel VECLIB subprogram is called from a parallelized loop or region, VECLIB will automatically avoid over-subscription of the CPUs. The number of threads spawned by each call to a parallelized VECLIB subroutine on a nested parallel region is limited by MLIB_NUMBER_OF_THREADS, by the number of threads still available in the system, and will never be larger than four.
PAGE 98
RAN VAX-compatible random numbers Name RAN VAX-compatible random numbers Purpose This subprogram produces uniform [0,1) pseudorandom numbers identical to those generated by the VAX random-number generator, RAN. RAN is identical in usage to its VAX counterpart. RAN returns one random number per call, while the companion subroutine RANV returns an array of random numbers per call. For details about RANV, refer to “RANV” on page 612.
PAGE 99
VAX-compatible random numbers Notes RAN Starting a sequence twice with the same value of iseed will produce the same pseudorandom sequence. You may have as many independent sequences going at a time as you desire by having a different iseed for each one. The subprograms treat iseed as an unsigned 32-bit quantity.
PAGE 100
RANV VAX-compatible random numbers Name RANV VAX-compatible random numbers Purpose This subprogram produces uniform [0,1) pseudorandom numbers identical to those generated by the VAX random-number generator, RAN. RANV returns an array of random numbers per call while the companion subroutine RAN returns one random number per call. For details about RAN, refer to “RAN” on page 610. These generators are based on the linear congruent method introduced by D. H.
PAGE 101
VAX-compatible random numbers Output iseed x RANV The seed that produces the next pseudorandom number in the sequence replaces the input seed. The next n pseudorandom numbers in the sequence are stored in the first n elements of x.
PAGE 102
RANV Notes VAX-compatible random numbers CALL RANV (iseed, n, x) produces the same n pseudorandom numbers as returned by n successive references to RAN(iseed). CALL RANV (iseed, n, x) produces the same value of iseed that would have resulted from the last of n successive references to RAN(iseed). Starting a sequence twice with the same value of iseed will produce the same pseudorandom sequence.
PAGE 103
Scalar long period random number generator SRAN/DRAN Name SRAN/DRAN Scalar long period random number generator Purpose These subprograms produce a sequence of uniformly-distributed [0,1) pseudorandom numbers with a period of 248. Functions SRAN and DRAN return one random number per call, while companion subprograms SRANV and DRANV, also documented in this chapter, return an array of random numbers per call. These generators are based on the linear congruential method introduced by D. H.
PAGE 104
SRAN/DRAN Notes Scalar long period random number generator Starting a sequence twice with the same value of iseed will produce the same pseudorandom sequence. You may have as many independent sequences going at a time as you desire by having a different iseed for each one. The subprograms treat iseed as an unsigned 48-bit quantity.
PAGE 105
Vector long period random number generator SRANV/DRANV Name SRANV/DRANV Vector long period random number generator Purpose These subprograms produce a sequence of uniformly-distributed [0,1) pseudorandom numbers with a period of 248. Subroutines SRANV and DRANV return an array of random numbers per call, while companion functions SRAN and DRAN return one random number per call. These generators are based on the linear congruent method introduced by D. H.
PAGE 106
SRANV/DRANV Vector long period random number generator incx Increment for the array x: incx ≥ 0 x is stored forward in array x; that is, xi is stored in x((i−1)×incx+1). incx < 0 x is stored backward in array x; that is, xi is stored in x((i−n)×incx+1). Use incx = 1 if the vector x is stored contiguously in array x, that is, if xi is stored in x(i). Refer to “BLAS Indexing Conventions” in the introduction to Chapter 2.
PAGE 107
Vector long period random number generator Example SRANV/DRANV Fill an array X that is 10 elements long with a sequence of pseudorandom numbers using the vector subprogram SRANV. INTEGER*8 ISEED INTEGER*4 N,INCX REAL*4 X(10) ISEED = 1234 ! STARTING SEED N = 10 INCX = 1 CALL SRANV (ISEED,N,X,INCX) This fills array X with the same sequence of pseudorandom numbers and also returns the same final value of ISEED, as does the “Example” on page 616 for SRAN.
PAGE 108
SSORT/DSORT/ISORT Sort array Name SSORT/DSORT/ISORT Sort array Purpose These subprograms use a quicksort algorithm to sort elements of a vector into ascending or descending algebraic order. The vector may be stored in a one-dimensional array or in either a row or a column of a two-dimensional array.
PAGE 109
Sort array SSORT/DSORT/ISORT n x incx Number of elements of vector x to be sorted. If n ≤ 1, the subprograms do not reference x. Array of length lenx = (n−1)×|incx|+1 containing the data to be sorted. Increment for the array x. x is stored forward in array x with increment |incx|; that is, xi is stored in x((i−1)×|incx|+1). Use incx = 1 if the vector x is stored contiguously in array x, that is, if xi is stored in x(i). Refer to “BLAS Indexing Conventions” in Chapter 2.
PAGE 110
WALLTIME Measure wall-clock time Name WALLTIME Measure wall-clock time Purpose WALLTIME returns the current wall-clock time. A base time is set during the first execution of WALLTIME, and the return value is the elapsed time from this base. The time unit is seconds, and the resolution of the timer is microseconds. WALLTIME also makes it easy to time a segment of code in a program.
PAGE 111
VECLIB error handler XERVEC Name XERVEC VECLIB error handler Purpose XERVEC is the error handler for many of the subprograms in the VECLIB library, as indicated in the “Notes” section in the subprogram descriptions.
PAGE 112
XERVEC VECLIB error handler iarg messag 624 HP MLIB User’s Guide If iarg > 0, the error message is printed in the first form given above and iarg is the number of the argument that was found to be in error. If iarg = 0, the error message is printed in the second form shown and iarg is not part of the error message. If iarg < 0, the error message is printed in the third form shown and iarg is the error number. The text of the error message to be printed if iarg ≤ 0. Not used as input if iarg > 0.
PAGE 113
Part 2 HP LAPACK
PAGE 114
PAGE 115
Overview 8 Introduction to LAPACK Overview LAPACK, a component of HP MLIB, is a collection of Fortran-callable subprograms that provides mathematical software for application programs involving linear algebra, including linear equations, least squares, eigenvalue problems, and the singular value decomposition. The package supersedes LINPACK and EISPACK.
PAGE 116
Chapter objectives Refer to “Associated documentation” on page 628 for information about the LAPACK Users’ Guide.
PAGE 117
Accessing LAPACK Accessing LAPACK The LAPACK and LAPACK8 libraries are available as an archive and a shared library. It consists of compiled subprograms ready for you to incorporate into your programs with the linker. To use LAPACK, include the appropriate declarations and CALL statements in your Fortran source program, and specify that LAPACK or LAPACK8 be used as an object library at link time.
PAGE 118
Accessing LAPACK Processor Type OS Version Address Width Itanium 2 Red Hat Linux Enterprise 3 Intel IA-32 V8.0 Compiler 32-bit Itanium 2 HP XC6000 Intel V8.0 Compiler 64-bit Itanium 2 HP XC6000 Intel V7.1 Compiler 64-bit Itanium 2 HP XC4000 PGI V5.1 Compiler 64-bit Itanium 2 Opteron PGI V5.1 Compiler 64-bit Installation Directory Libraries liblapack.a /opt/mlib/intel_8.0/hpmpi_2.1/lib/32 liblapack.so liblapack8.a liblapack8.so liblapack.a /opt/mlib/intel_8.0/hpmpi_2.1/lib/64 liblapack.
PAGE 119
Accessing LAPACK Compiling and linking (LAPACK) There are several ways to link your program with the version of LAPACK tuned for the machine on which the program runs.By default, the −llapack option on the f90, cc, or c89 command line that links your program selects the library that corresponds to 32-bit addressing on the machine on which you link. cc is the HP C compiler and c89 is the HP POSIX-conforming C compiler. The remainder of this book refers to the cc and f90 compilers.
PAGE 120
Accessing LAPACK cc [options] file ... /opt/mlib/lib/[pa2.0|pa20_64]/liblapack.a −lcl −lm aCC [options] file ... /opt/mlib/lib/[pa2.0|pa20_64]/liblapack.a −lcl −lm Replace liblapack.a with liblapack.sl on your compiler command line if you want to link the shared library on a PA-based system. 3. Use the −llapack option on the compiler command line that links your program, preceded by: −Wl,–aarchive_shared,−L/opt/mlib/lib/[pa2.
PAGE 121
Accessing LAPACK • +autodbl promotes all integer, logical, and real items to 8 bytes, and all double-precision and complex items to 16 bytes. • +autodbl4 promotes all integer logical, and real items to 8 bytes, and complex items to 16 bytes. The +autodbl4 option does not promote the size of double-precision and double-complex items. f90 +DA2.0W +i8 [options] file ... −Wl,–aarchive_shared −llapack8 cc +DA2.0W [options] file ... −Wl,–aarchive_shared −llapack8 −lcl −lm aCC +DA2.0W [options] file ...
PAGE 122
Accessing LAPACK −Wl,–aarchive_shared,−L/opt/mlib/lib/[hpux32|hpux64] For example, the command lines in Method 2 for HP-UX could be written: f90 [+DD32|+DD64] [options] file...−Wl,–aarchive_shared,−L/opt/mlib/lib/ [hpux32|hpux64] −llapack cc [+DD32|+DD64] [options] file ... −Wl,–aarchive_shared,−L/opt/mlib/lib/ [hpux32|hpux64] −llapack −lcl −lm aCC [+DD32|+DD64] [options] file ... −Wl,–aarchive_shared,−L/opt/mlib/lib/ [hpux32|hpux64] −llapack −lcl −lm 4.
PAGE 123
Accessing LAPACK aCC +DD64 [options] file ... −Wl,–aarchive_shared −llapack8 −lcl −lm For Itanium-Based Red Hat Linux Systems 1. To link a program that uses LAPACK for use on the same machine, use one of the following commands: ifort [options] file ... –Wl,–Bstatic −llapack −openmp icc [options] file ... –Wl,–Bstatic −llapack −openmp 2. Specify the entire path of the library file on the compiler command line that links your program.
PAGE 124
Accessing LAPACK ifort [options] file ... −llapack −openmp icc [options] file ... −llapack −openmp NOTE An LDOPTS specification takes precedence over using -Wl on the compiler command line. That is, if you use the LDOPTS environment variable to specify a library path, you cannot override that specification with a -Wl option on your compiler command line. 5. Use the following commands to link your programs for use with the LaPACK8 library on a Red Hat Linux system. ifort −i8 [options] file ...
PAGE 125
Accessing LAPACK ifort [options] file...−Wl,–Bstatic,−L/opt/mlib/lib/intel_8.0/ hpmpi_2.1/lib/64 −llapack −openmp icc [options] file ... −Wl,–Bstatic,−L/opt/mlib/lib/intel_8.0/ hpmpi_2.1/lib/64 −llapack −openmp 4. Set the LDOPTS environment variable to include: –Bstatic,−L/opt/mlib/lib/intel_8.0/hpmpi_2.1/lib/64 For example: setenv LDOPTS “–Bstatic,–L/opt/mlib/lib/intel_8.0/hpmpi_2.1/lib/64” Then use the −llapack option on the compiler command line that links your program: ifort [options] file ...
PAGE 126
Accessing LAPACK For example, the command lines in Method 2 for XC6000 could be written: efc [options] file...−Wl,–Bstatic,−L/opt/mlib/lib/intel_7.1/hpmpi_2.1/ lib/64 −llapack −openmp icc [options] file ...−Wl,–Bstatic,−L/opt/mlib/lib/intel_7.1/hpmpi_2.1/ lib/64 −llapack −openmp 4. Set the LDOPTS environment variable to include: –Bstatic,−L/opt/mlib/lib/intel_7.1/hpmpi_2.1/lib/64 For example: setenv LDOPTS “–Bstatic,–L/opt/mlib/lib/intel_7.1/hpmpi_2.
PAGE 127
Accessing LAPACK −Wl,–Bstatic,−L/opt/mlib/lib/pgi_5.1/hpmpi_2.1/lib/64 For example, the command lines in Method 2 for XC4000 could be written: pgf90 [options] file...−Wl,–Bstatic,−L/opt/mlib/lib/pgi_5.1/ hpmpi_2.1/lib/64 −llapack −mp pgcc [options] file ...−Wl,–Bstatic,−L/opt/mlib/lib/pgi_5.1/hpmpi_2.1/ lib/64 −llapack −mp −lpgf90 −lpgf90_rpml −lpgf902 −lpgf90rtl −lpgftnrtl 4. Set the LDOPTS environment variable to include: –Bstatic,−L/opt/mlib/lib/pgi_5.1/hpmpi_2.
PAGE 128
Accessing LAPACK PA-RISC processors Suppose you write your own Fortran version of the XERBLA subroutine. You compile on PA-RISC with +DA2.0W and the compiler turns on +ppu (See Table 10-1), so your xerbla.o object file has an entry point xerbla_. However, the PA-RISC MLIB libraries were compiled with +noppu and the entry points with underbars were added as synonyms to the entry points without underbars.
PAGE 129
Accessing LAPACK from the library. This file contains both entry points xerbla and xerbla_, and xerbla conflicts with your XERBLA subroutine. Moreover, the various MLIB C source subprograms that call XERBLA still reference them as xerbla (without the postpended underbar). It means that both xerbla and xerbla_ entry points must be provided in your xerbla.o.
PAGE 130
Optimizing • Another MLIB subprogram used by your program also calls that subprogram PA-RISC and Itanium processors Also note that all the workarounds listed for Itanium are more generic and can also be used for PA-RISC. Therefore, if you are coding your own version of a MLIB routine called "mlib_routine" on PA-RISC or Itanium, a Fortran version might be implemented as: !$HP$ ALIAS mlib_routine='mlib_routine' !$HP$ ALIAS mlib_routine_='mlib_routine_' SUBROUTINE mlib_routine(...) ENTRY mlib_routine_(...) .
PAGE 131
Parallel processing Parallel processing Parallel processing is available on multi-processor HP platforms running the HP-UX version 11i or greater operating system. These systems can divide a single computational process into small streams of execution, called threads. The result is that you can have more than one processor executing on behalf of the same process. Also, refer to Appendix C, “Parallelized Subprograms,” for a list of HP LAPACK parallelized subprograms.
PAGE 132
Parallel processing The following command lines show the C shell syntax and Korn shell syntax to use when setting the variable to eight processors: For C shell: setenv MLIB_NUMBER_OF_THREADS 8 For Korn shell: export MLIB_NUMBER_OF_THREADS=8 MLIB_NUMBER_OF_THREADS is examined upon the first call to a parallelized LAPACK subprogram to establish the default parallel action within LAPACK.
PAGE 133
Parallel processing • Call any parallelized LAPACK subprogram. Let it use parallelism internally if it determines that it is appropriate to do so, based on such factors as problem size, system configuration, and user environment. • Call LAPACK subprograms in a parallelized loop or region. LAPACK supports nested parallelism where the outer parallelism is implemented through OpenMP while the inner parallelism is implemented with LAPACK SMP parallelism.
PAGE 134
Parallel processing else call dgemm(‘n’, ‘n’, m, m, m, alpha, d, ldd, e, lde, beta, f,ldf) endif c$omp end parallel call omp_set_nested(.false.) ... Using MLIB_NUMBER_OF_THREADS set to 1, the code would run two-way parallel: one OpenMP thread for C = αAB + βC and another for F = αDE + βF Setting MLIB_NUMBER_OF_THREADS to 2 would allow nested parallelism and run the code four-way parallel.
PAGE 135
Parallel processing Assume the application started on two MPI processes. Using MLIB_NUMBER_OF_THREADS set to 1, the code would run two-way parallel: one MPI process for C = αAB + βC and another for F = αDE + βF Setting MLIB_NUMBER_OF_THREADS to 2 would allow nested parallelism and run the code four-way parallel. Default CPS library stack is too small for MLIB In libcps, the HP Compiler Parallel Support library, a CPS thread has a default stack size of 8M bytes.
PAGE 136
Rounding off Currently, 8 MB*(the number of threads) should be sufficient for all MLIB subprograms. If your application launchs threads directly from pthread library calls, set the minimum allocated stack size to match HP MLIB needs on each new thread. Setting the stack size on HP-UX as follows would be sufficient for programs that execute on two threads: int stacksize = 8388608; (...
PAGE 137
Error handling The performance of these subprograms can be improved, often dramatically, by providing at least the optimum amount of work space. This amount can be a complicated function of the problem size and job arguments, so you should make lwork large enough or compare it to the value returned in work(1).
PAGE 138
Troubleshooting Troubleshooting Here are some suggestions to help you find the cause of a problem: 1. Verify that the subprogram usage in the program matches the subprogram specifications in the documentation. Pay attention to the number of arguments in the CALL statement and to the declarations of arrays and integer constants or variables that describe them. Write out all the arguments immediately before and after the CALL statement. 2. Make sure there really is a problem.
PAGE 139
Overview 9 LAPACK Auxiliary Subprograms Overview This chapter describes selected LAPACK auxiliary subprograms. Although the auxiliary subprograms are a part of the public domain release of LAPACK, the Hewlett-Packard implementation of LAPACK does not support all of them. They are internal to the package and subject to change. The auxiliary subprograms described in this chapter, however, are supported as part of LAPACK and will exist in subsequent releases of the product.
PAGE 140
Associated documentation Associated documentation The following documents provide supplemental material for this chapter: • Anderson et al., LAPACK Users’ Guide. For the latest edition of the LAPACK Users’ Guide, refer to the Netlib repository at the following URL: http://www.netlib.org/lapack/lug/index.html • Golub, G.H. and C.F. Van Loan. Matrix Computations, 2nd Edition. Baltimore, MD: The Johns Hopkins University Press. 1989.
PAGE 141
What you need to know to use these subprograms Definition: A matrix norm on Rm×n, the vector space of m-by-n real matrices, is a function f:Rm×n → R that has the following properties: f(A) ≥ 0 f(A) = 0 A ε Rm×n if and only if A = 0 f(A+B) ≤ f(A)+f(B) A, B ε Rm×n f(αA) = |α|f(A) α ε R, A ε Rm×n As with vector norms, f is denoted with the double-bar notation: f(A) = ||A||, again using subscripts to designate different matrix norms.
PAGE 142
What you need to know to use these subprograms The norm-computing auxiliary subprograms in LAPACK will evaluate the 1-, ∞-, Frobenius-, or ∆-norms of a matrix stored in a variety of forms, shown in Table 11-1.
PAGE 143
Return machine-dependent parameters Auxiliary subprograms Auxiliary subprograms Following are the auxiliary subprograms included with LAPACK. Name SLAMCH/DLAMCH Return machine-dependent parameters Purpose These subprograms return machine-dependent parameters, thus promoting machine independence in LAPACK.
PAGE 144
SLAMCH/DLAMCH Output Notes Return machine-dependent parameters name = ’E’ or ’e’ eps name = ’S’ or ’s’ sfmin name = ’B’ or ’b’ name = ’P’ or ’p’ name = ’N’ or ’n’ base prec t name = ’R’ or ’r’ rnd name = ’M’ or ’m’ emin name = ’U’ or ’u’ rmin name = ’L’ or ’l’ name = ’O’ or ’o’ emax rmax value The relative machine precision: the smallest number ε such that 1 + ε can be represented as a floating-point number with 1 + ε > 1.
PAGE 145
Compute norm of general band matrix SLANGB/DLANGB/CLANGB/ZLANGB Name SLANGB/DLANGB/CLANGB/ZLANGB Compute norm of general band matrix Purpose These subprograms compute the norm of an m-by-n general band matrix A. A band matrix is a matrix whose nonzero elements all lie near the principal diagonal. Specifically, aij = 0 if i−j > kl or j−i > ku for some integers kl and ku.
PAGE 146
SLANGB/DLANGB/CLANGB/ZLANGB Compute norm of general band matrix Note that this storage format omits the first kl rows reserved for fill-in in the general band storage for _GBSV and _GBTRF.
PAGE 147
Compute norm of general band matrix SLANGB/DLANGB/CLANGB/ZLANGB CHARACTER*1 norm INTEGER*8 kl, ku, ldab, n REAL*4 rwork(n) COMPLEX*8 ab(ldab, n) REAL*4 anorm, CLANGB anorm = CLANGB(norm, n, kl, ku, ab, ldab, rwork) CHARACTER*1 norm INTEGER*8 kl, ku, ldab, n REAL*8 rwork(n) COMPLEX*16 ab(ldab, n) REAL*8 anorm, ZLANGB anorm = ZLANGB(norm, n, kl, ku, ab, ldab, rwork) Input norm Specifies which norm is to be computed, as follows: norm = ’F’, ’f’, ’E’, or ’e’ Compute ||A||F = the Frobenius norm = ’I’ or ’i’
PAGE 148
SLANGB/DLANGB/CLANGB/ZLANGB Output Notes anorm Compute norm of general band matrix The function value is the value of the requested norm of A. Actual character arguments in a subroutine call may be longer than the corresponding dummy arguments. Therefore, readability of the CALL statement may be improved by coding the norm argument as ’Frobenius’ for ’F’, ’Infinity-Norm’ for ’I’, ’One-norm’ or ’1-norm’ for ’O’, or ’Max-Element’ for ’M’.
PAGE 149
Compute norm of general matrix SLANGE/DLANGE/CLANGE/ZLANGE Name SLANGE/DLANGE/CLANGE/ZLANGE Compute norm of general matrix Purpose These subprograms compute a norm of a general m-by-n matrix A.
PAGE 150
SLANGE/DLANGE/CLANGE/ZLANGE Compute norm of general matrix CHARACTER*1 norm INTEGER*8 lda, m, n REAL*4 rwork(n) COMPLEX*8 a(lda, n) REAL*4 anorm, CLANGE anorm = CLANGE(norm, m, n, a, lda, rwork) CHARACTER*1 norm INTEGER*8 lda, m, n REAL*8 rwork(n) COMPLEX*16 a(lda, n) REAL*8 anorm, ZLANGE anorm = ZLANGE(norm, m, n, a, lda, rwork) Input norm Specifies which norm is to be computed, as follows: norm = ’F’, ’f’, ’E’, or ’e’ Compute ||A||F = the Frobenius norm = ’I’ or ’i’ norm.
PAGE 151
Compute norm of general tridiagonal matrix SLANGT/DLANGT/CLANGT/ZLANGT Name SLANGT/DLANGT/CLANGT/ZLANGT Compute norm of general tridiagonal matrix Purpose These subprograms compute a norm of a general tridiagonal matrix A. A matrix A = (aij) is tridiagonal if its nonzero elements lie only on the principal diagonal (i = j), the subdiagonal (i = j+1), and the superdiagonal (i = j−1) of the matrix. Matrix Storage The following example illustrates the storage of general tridiagonal matrices.
PAGE 152
SLANGT/DLANGT/CLANGT/ZLANGT Compute norm of general tridiagonal matrix CHARACTER*1 norm INTEGER*4 n REAL*8 d(n), dl(n−1), du(n−1) REAL*8 anorm, DLANGT anorm = DLANGT(norm, n, dl, d, du) CHARACTER*1 norm INTEGER*4 n COMPLEX*8 d(n), dl(n−1), du(n−1) REAL*4 anorm, CLANGT anorm = CLANGT(norm, n, dl, d, du) CHARACTER*1 norm INTEGER*4 n COMPLEX*16 d(n), dl(n−1), du(n−1) REAL*8 anorm, ZLANGT anorm = ZLANGT(norm, n, dl, d, du) LAPACK8: CHARACTER*1 norm INTEGER*8 n REAL*4 d(n), dl(n−1), du(n−1) REAL*4 anorm, SLAN
PAGE 153
Compute norm of general tridiagonal matrix SLANGT/DLANGT/CLANGT/ZLANGT norm = ’F’, ’f’, ’E’, or ’e’ Compute ||A||F = the Frobenius n dl d du Output Notes anorm norm = ’I’ or ’i’ norm. Compute ||A||∞ = maximum norm = ’1’, ’O’, or ’o’ row sum. Compute ||A||1 = maximum norm = ’M’ or ’m’ column sum. Compute max(|Aij|). The order of the matrix A. n ≥ 0. The n−1 subdiagonal elements of A. The diagonal elements of A. The n−1 superdiagonal elements of A.
PAGE 154
SLANSB/DLANSB/CLANHB/CLANSB/ZLANHB/ZLANSB Compute norm of symmetric or Hermitian band matrix Name SLANSB/DLANSB/CLANHB/CLANSB/ZLANHB/ZLANSB Compute norm of symmetric or Hermitian band matrix Purpose These subprograms compute a norm of a real or complex symmetric or complex Hermitian band matrix A. A real matrix is symmetric if A = AT, its transpose; a complex matrix is Hermitian if A = A*, its conjugate transpose. Tridiagonal matrices are the special case kd = 1.
PAGE 155
Compute norm of symmetric or Hermitian band matrix SLANSB/DLANSB/CLANHB/CLANSB/ZLANHB/ZLANSB The asterisks represent elements in the kd-by-kd triangle at the upper left corner of ab that are not referenced. Thus, if aij is an element within the band of the upper triangle of A, it is stored in ab(kd+1+i−j,j). Therefore, the columns of the upper triangle of A are stored in the columns of ab, and the diagonals of the upper triangle of A are stored in the rows of ab.
PAGE 156
SLANSB/DLANSB/CLANHB/CLANSB/ZLANHB/ZLANSB Compute norm of symmetric or Hermitian band matrix CHARACTER*1 norm, uplo INTEGER*4 kd, ldab, n REAL*4 rwork(n) COMPLEX*8 ab(ldab, n) REAL*4 anorm, CLANSB anorm = CLANSB(norm, uplo, n, kd, ab, ldab, rwork) CHARACTER*1 norm, uplo INTEGER*4 kd, ldab, n REAL*8 rwork(n) COMPLEX*16 ab(ldab, n) REAL*8 anorm, ZLANHB anorm = ZLANHB(norm, uplo, n, kd, ab, ldab, rwork) CHARACTER*1 norm, uplo INTEGER*4 kd, ldab, n REAL*8 rwork(n) COMPLEX*16 ab(ldab, n) REAL*8 anorm, ZLANSB an
PAGE 157
Compute norm of symmetric or Hermitian band matrix SLANSB/DLANSB/CLANHB/CLANSB/ZLANHB/ZLANSB CHARACTER*1 norm, uplo INTEGER*8 kd, ldab, n REAL*4 rwork(n) COMPLEX*8 ab(ldab, n) REAL*4 anorm, CLANSB anorm = CLANSB(norm, uplo, n, kd, ab, ldab, rwork) CHARACTER*1 norm, uplo INTEGER*8 kd, ldab, n REAL*8 rwork(n) COMPLEX*16 ab(ldab, n) REAL*8 anorm, ZLANHB anorm = ZLANHB(norm, uplo, n, kd, ab, ldab, rwork) CHARACTER*1 norm, uplo INTEGER*8 kd, ldab, n REAL*8 rwork(n) COMPLEX*16 ab(ldab, n) REAL*8 anorm, ZLANSB an
PAGE 158
SLANSB/DLANSB/CLANHB/CLANSB/ZLANHB/ZLANSB Compute norm of symmetric or Hermitian band matrix ab ldab Working Storage work, rwork Output anorm Notes The upper or lower triangle of the symmetric or Hermitian band matrix A, stored in the first kd+1 rows of the array. The j-th column of A is stored in the j-th column of array ab as follows: If uplo = ’U’ or ’u’, ab(kd+1+i−j,j) = A(i,j) for max(1,j−kd) ≤ i ≤ j; If uplo = ’L’ or ’l’, ab(1+i−j,j) = A(i,j) for j ≤ i ≤ min(n,j+kd).
PAGE 159
Compute norm of symmetric or Hermitian packed matrix SLANSP/DLANSP/CLANHP/CLANSP/.../ZLANSP Name SLANSP/DLANSP/CLANHP/CLANSP/.../ZLANSP Compute norm of symmetric or Hermitian packed matrix Purpose These subprograms compute a norm of a real or complex symmetric or complex Hermitian matrix A that is stored in an array in packed form. A matrix is symmetric if A = AT, its transpose; a matrix is Hermitian if A = A*, its conjugate transpose.
PAGE 160
SLANSP/DLANSP/CLANHP/CLANSP/.../ZLANSP Compute norm of symmetric or Hermitian packed matrix Lower triangular storage If the lower triangle of A is 11 21 31 41 22 32 42 33 43 44 then A is packed column-by-column into an array ap as follows: k ap(k) 1 11 2 21 3 31 4 41 5 22 6 32 7 42 8 33 9 43 Lower triangular matrix element aij is stored in array element ap(i+((j−1)×(2n−j))/2).
PAGE 161
Compute norm of symmetric or Hermitian packed matrix SLANSP/DLANSP/CLANHP/CLANSP/...
PAGE 162
SLANSP/DLANSP/CLANHP/CLANSP/...
PAGE 163
Compute norm of symmetric or Hermitian packed matrix Working Storage work, rwork Output anorm Notes SLANSP/DLANSP/CLANHP/CLANSP/.../ZLANSP Arrays used for work space. Not referenced unless norm = ’1’ or ’I’ or ’i’ or ’O’ or ’o’. The function value is the value of the requested norm of A. Actual character arguments in a subroutine call may be longer than the corresponding dummy arguments.
PAGE 164
SLANST/DLANST/CLANHT/ZLANHT Compute norm of symmetric or Hermitian tridiagonal matrix Name SLANST/DLANST/CLANHT/ZLANHT Compute norm of symmetric or Hermitian tridiagonal matrix Purpose These subprograms compute a norm of a real symmetric or complex Hermitian tridiagonal matrix A. A real matrix is symmetric if A = AT, its transpose; a complex matrix is Hermitian if A = A*, its conjugate transpose.
PAGE 165
Compute norm of symmetric or Hermitian tridiagonal matrix Usage SLANST/DLANST/CLANHT/ZLANHT LAPACK: CHARACTER*1 norm INTEGER*4 n REAL*4 d(n), e(n−1) REAL*4 anorm, SLANST anorm = SLANST(norm, n, d, e) CHARACTER*1 norm INTEGER*4 n REAL*8 d(n), e(n−1) REAL*8 anorm, DLANST anorm = DLANST(norm, n, d, e) CHARACTER*1 norm INTEGER*4 n REAL*4 d(n) COMPLEX*8 e(n−1) REAL*4 anorm, CLANHT anorm = CLANHT(norm, n, d, e) CHARACTER*1 norm INTEGER*4 n REAL*8 d(n) COMPLEX*16 e(n−1) REAL*8 anorm, ZLANHT anorm = ZLANHT(norm,
PAGE 166
SLANST/DLANST/CLANHT/ZLANHT Compute norm of symmetric or Hermitian tridiagonal matrix CHARACTER*1 norm INTEGER*8 n REAL*4 d(n) COMPLEX*8 e(n−1) REAL*4 anorm, CLANHT anorm = CLANHT(norm, n, d, e) CHARACTER*1 norm INTEGER*8 n REAL*8 d(n) COMPLEX*16 e(n−1) REAL*8 anorm, ZLANHT anorm = ZLANHT(norm, n, d, e) Input norm Specifies which norm is to be computed, as follows: norm = ’F’, ’f’, ’E’, or ’e’ Compute ||A||F = the Frobenius norm = ’I’ or ’i’ norm. Compute ||A||∞ = maximum row sum.
PAGE 167
Compute norm of symmetric or Hermitian tridiagonal matrix Output Notes anorm SLANST/DLANST/CLANHT/ZLANHT The function value is the value of the requested norm of A. Actual character arguments in a subroutine call may be longer than the corresponding dummy arguments. Therefore, readability of the CALL statement may be improved by coding the norm argument as ’Frobenius’ for ’F’, ’Infinity-Norm’ for ’I’, ’One-norm’ or ’1-norm’ for ’O’, or ’Max-Element’ for ’M’.
PAGE 168
SLANSY/DLANSY/CLANHE/CLANSY/ZLANHE//ZLANSY Compute norm of symmetric or Hermitian matrix Name SLANSY/DLANSY/CLANHE/CLANSY/ZLANHE//ZLANSY Compute norm of symmetric or Hermitian matrix Purpose These subprograms compute a norm of a real or complex symmetric or complex Hermitian matrix A. A matrix is symmetric if A = AT, its transpose; a matrix is Hermitian if A = A*, its conjugate transpose.
PAGE 169
Compute norm of symmetric or Hermitian matrix SLANSY/DLANSY/CLANHE/CLANSY/ZLANHE//ZLANSY CHARACTER*1 norm, uplo INTEGER*4 lda, n REAL*4 rwork(n) COMPLEX*8 a(lda, n) REAL*4 anorm, CLANSY anorm = CLANSY(norm, uplo, n, a, lda, rwork) CHARACTER*1 norm, uplo INTEGER*4 lda, n REAL*8 rwork(n) COMPLEX*16 a(lda, n) REAL*8 anorm, ZLANHE anorm = ZLANHE(norm, uplo, n, a, lda, rwork) CHARACTER*1 norm, uplo INTEGER*4 lda, n REAL*8 rwork(n) COMPLEX*16 a(lda, n) REAL*8 anorm, ZLANSY anorm = ZLANSY(norm, uplo, n, a, lda,
PAGE 170
SLANSY/DLANSY/CLANHE/CLANSY/ZLANHE//ZLANSY Compute norm of symmetric or Hermitian matrix CHARACTER*1 norm, uplo INTEGER*8 lda, n REAL*4 rwork(n) COMPLEX*8 a(lda, n) REAL*4 anorm, CLANSY anorm = CLANSY(norm, uplo, n, a, lda, rwork) CHARACTER*1 norm, uplo INTEGER*8 lda, n REAL*8 rwork(n) COMPLEX*16 a(lda, n) REAL*8 anorm, ZLANHE anorm = ZLANHE(norm, uplo, n, a, lda, rwork) CHARACTER*1 norm, uplo INTEGER*8 lda, n REAL*8 rwork(n) COMPLEX*16 a(lda, n) REAL*8 anorm, ZLANSY anorm = ZLANSY(norm, uplo, n, a, lda,
PAGE 171
Compute norm of symmetric or Hermitian matrix lda Working Storage work, rwork Output anorm Notes SLANSY/DLANSY/CLANHE/CLANSY/ZLANHE//ZLANSY matrix A, and the strictly lower triangular part of a is not referenced. If uplo = ’L’ or ’l’, the leading n-by-n lower triangular part of a contains the lower triangular part of the matrix A, and the strictly upper triangular part of a is not referenced. The leading dimension of array a in the calling program unit. lda ≥ max(1,n). Arrays used for work space.
PAGE 172
XERBLA Error handler Name XERBLA Error handler Purpose This subprogram is the error handler for many LAPACK subprograms, as indicated in the “Notes” section in the applicable subprogram descriptions.
PAGE 173
Error handler Input XERBLA name iarg Notes The name of the subprogram in which the error was detected. The number of the argument that was found to be in error. This subprogram conforms to specifications of the Level 2 and 3 BLAS and LAPACK.
PAGE 174
XERBLA 686 HP MLIB LAPACK User’s Guide Error handler
PAGE 175
Part 3 HP ScaLAPACK
PAGE 176
PAGE 177
Overview 10 Introduction to ScaLAPACK Overview This chapter provides a brief overview of HP’s implementation of ScaLAPACK functionality. ScaLAPACK is a library of high-performance linear algebra routines capable of solving systems of linear equations, linear least squares problems, eigenvalue problems, and singular value problems. ScaLAPACK can also handle many associated computations such as matrix factorizations or estimating condition numbers.
PAGE 178
Chapter objectives Chapter objectives After reading this chapter you will: • Know how to access ScaLAPACK library subprograms • Understand ScaLAPACK naming conventions • Be familiar with ScaLAPACK functionality • Understand ScaLAPACK arguments • Know where to find additional documentation Associated documentation The HP MLIB documentation set includes the ScaLAPACK User’s Guide V1.0.
PAGE 179
Accessing ScaLAPACK Accessing ScaLAPACK The ScaLAPACK and ScaLAPACK8 libraries are available as an archive and a shared library. It consists of compiled subprograms ready for you to incorporate into your programs with the linker. To use ScaLAPACK, include the appropriate declarations and CALL statements in your Fortran source program, and specify that ScaLAPACK or ScaLAPACK8 be used as an object library at link time.
PAGE 180
Accessing ScaLAPACK Processor Type OS Version Address Width Itanium 2 Red Hat Linux Enterprise 3 Intel IA-32 V8.0 Compiler 32-bit Itanium 2 HP XC6000 Intel V8.0 Compiler 64-bit Itanium 2 HP XC6000 Intel V7.1 Compiler 64-bit Itanium 2 Opteron PGI V5.1 Compiler Itanium 2 HP XC4000 PGI V5.1 Compiler Installation Directory Libraries libscalapack.a /opt/mlib/intel_8.0/hpmpi_2.1/lib/32 libscalapack.so libscalapack8.a libscalapack8.so libscalapack.a /opt/mlib/intel_8.0/hpmpi_2.
PAGE 181
Accessing ScaLAPACK Compiling and linking (ScaLAPACK) There are several ways to link your program with the version of ScaLAPACK tuned for the machine on which the program runs. By default, the −lscalapack option on the mpif90 or mpicc command line that links your program selects the library that corresponds to 32-bit addressing on the machine on which you link. NOTE Unlike other MLIB libraries, executable files using the ScaLAPACK library must be linked using either mpif90 or mpicc, instead of f90 or cc.
PAGE 182
Accessing ScaLAPACK mpiCC [options] file ... /opt/mlib/lib/[pa2.0|pa20_64]/libscalapack.a −lcl −lm Replace libscalapack.a with libscalapack.sl on your compiler command line if you want to link the shared library on a PA-based system. 3. Use the −lscalapack option on the compiler command line that links your program, preceded by: −Wl,–aarchive_shared,−L/opt/mlib/lib/[pa2.0|pa20_64] For example, the command lines in Method 2 for PA could be written: mpif90 [options] file ...
PAGE 183
Accessing ScaLAPACK • +autodbl4 promotes all integer logical, and real items to 8 bytes, and complex items to 16 bytes. The +autodbl4 option does not promote the size of double-precision and double-complex items. mpif90 +DA2.0W +i8 [options] file ... −Wl,–aarchive_shared −lscalapack8 mpicc +DA2.0W [options] file ... −Wl,–aarchive_shared −lscalapack8 /opt/mpi/lib/pa20_64/hpmpautodbl_isi8.o −lcl −lm mpiCC +DA2.0W [options] file ... −Wl,–aarchive_shared −lscalapack8 /opt/mpi/lib/pa20_64/hpmpautodbl_isi8.
PAGE 184
Accessing ScaLAPACK For example, the command lines in Method 2 for HP-UX could be written: mpif90 [+DD32|+DD64] [options] file...−Wl,–aarchive_shared,−L/opt/mlib/lib/ [hpux32|hpux64] −lscalapack mpicc [+DD32|+DD64] [options] file ... −Wl,–aarchive_shared,−L/opt/mlib/lib/ [hpux32|hpux64] −lscalapack −lcl −lm mpiCC [+DD32|+DD64] [options] file ... −Wl,–aarchive_shared,−L/opt/mlib/ lib/[hpux32|hpux64] −lscalapack −lcl −lm 4.
PAGE 185
Accessing ScaLAPACK mpiCC +DD64 [options] file ... −Wl,–aarchive_shared −lscalapack8 /opt/mpi/ lib/pa20_64/hpmpautodbl_isi8.o −lcl −lm For Itanium-Based Red Hat Linux Systems 1. To link a program that uses ScaLAPACK for use on the same machine, use one of the following commands: mpif90 [options] file ... –Wl,–Bstatic −lscalapack −openmp mpicc [options] file ... –Wl,–Bstatic −lscalapack −openmp Link with –Bdynamic if you want the archive library on a Linux system. 2.
PAGE 186
Accessing ScaLAPACK Then use the −lscalapack option on the compiler command line that links your program: mpif90 [options] file ... −lscalapack −openmp mpicc [options] file ... −lscalapack −openmp NOTE An LDOPTS specification takes precedence over using -Wl on the compiler command line. That is, if you use the LDOPTS environment variable to specify a library path, you cannot override that specification with a -Wl option on your compiler command line. 5.
PAGE 187
Accessing ScaLAPACK For example, the command lines in Method 2 for XC6000 could be written: mpif90 [options] file...−Wl,–Bstatic,−L/opt/mlib/lib/intel_8.0/ hpmpi_2.1/lib/64 −lscalapack −openmp −lmpi mpicc [options] file ... −Wl,–Bstatic,−L/opt/mlib/lib/intel_8.0/ hpmpi_2.1/lib/64 −lscalapack −openmp −lmpi 4. Set the LDOPTS environment variable to include: –Bstatic,−L/opt/mlib/lib/intel_8.0/hpmpi_2.1/lib/64 For example: setenv LDOPTS “–Bstatic,–L/opt/mlib/lib/intel_8.0/hpmpi_2.
PAGE 188
Accessing ScaLAPACK 3. Use the −lscalapack option on the compiler command line that links your program, preceded by: −Wl,–Bstatic,−L/opt/mlib/lib/intel_7.1/hpmpi_2.1/lib/64 For example, the command lines in Method 2 for XC6000 could be written: mpif90 [options] file...−Wl,–Bstatic,−L/opt/mlib/lib/intel_7.1/ hpmpi_2.1/lib/64 −lscalapack −openmp −lmpi mpicc [options] file ...−Wl,–Bstatic,−L/opt/mlib/lib/intel_7.1/ hpmpi_2.1/lib/64 −lscalapack −openmp −lmpi 4.
PAGE 189
Accessing ScaLAPACK Replace libscalapack.a with libscalapack.so on your compiler command line if you want to link the shared library on an XC4000 system. 3. Use the −lscalapack option on the compiler command line that links your program, preceded by: −Wl,–Bstatic,−L/opt/mlib/lib/pgi_5.1/hpmpi_2.1/lib/64 For example, the command lines in Method 2 for XC4000 could be written: mpif90 [options] file...−Wl,–Bstatic,−L/opt/mlib/lib/pgi_5.1/ hpmpi_2.1/lib/64 −lscalapack −mp mpicc [options] file ...
PAGE 190
Accessing ScaLAPACK Table 10-2 Compiler Defaults PA-RISC Itanium 32-bit Addressing +noppu +ppu 64-bit Addressing +ppu +ppu Different compiler defaults can cause duplicate symbol definitions at link time as the following examples illustrate. PA-RISC processors Suppose you write your own Fortran version of the XERBLA subroutine. You compile on PA-RISC with +DA2.0W and the compiler turns on +ppu (See Table 10-2), so your xerbla.o object file has an entry point xerbla_.
PAGE 191
Accessing ScaLAPACK Itanium processors If you compile on Itanium using the compiler option +noppu your xerbla.o object has an entry point xerbla (without the postpended underbar). However the Itanium MLIB libraries were compiled with +ppu option (following the default behavior described in Table 10-2) and the entry points without underbars were added as synonyms to the entry points with underbars. Hence the various MLIB Fortran source subprograms that call XERBLA reference them as xerbla_.
PAGE 192
ScaLAPACK naming conventions This problem is not confined to the XERBLA subroutine. It occurs on Itanium when the following conditions are met: • Any subprogram with the same name as a user-visible MLIB subprogram is compiled in your program with +noppu • Another MLIB subprogram used by your program also calls that subprogram PA-RISC and Itanium processors Also note that all the workarounds listed for Itanium are more generic and can also be used for PA-RISC.
PAGE 193
ScaLAPACK functionality • X indicates the data type: S - Single-precision real C - Single-precision complex D - Double-precision real Z - Double-precision complex • YY indicates the type of matrix: DB - General band (diagonally dominant-like) DT - General tridiagonal (diagonally dominant-like) GB - General band GE - General (i.e., unsymmetric, in some cases rectangular) GG - general matrices, generalized problem (i.e.
PAGE 194
ScaLAPACK functionality ScaLAPACK contains driver routines for solving standard types of problems, computational routines to perform a distinct computational task, and auxiliary routines to perform a certain subtask or common low-level computation. ScaLAPACK provides functionality to solve dense and band matrix problems. The functionality is provided for real and complex matrix problems.
PAGE 195
ScaLAPACK functionality Table 10-3 Linear Equation Driver Routines Type of matrix and storage scheme General (partial pivoting) General Band (partial pivoting) General Band (no pivoting) General Tridiagonal (no pivoting) Symmetric/Hermitian Positive Definite Single Precision Double Precision Real Complex Real Complex Simple Driver PSGESV PCGESV PDGESV PZGESV Expert Driver PSGESVX PCGESVX PDGESVX PZGESVX Operation Simple Driver PSGBSV PCGBSV PDGBSV PZGBSV Simple Driver PSDBSV PCDBSV PDDBSV PZDBSV
PAGE 196
ScaLAPACK functionality Singular value problems (SVD) A single driver routine, PxGESVD, computes the 'economy size' or 'thin' singular value decomposition of a general nonsymmetric matrix. Currently, only PSGESVD and PDGESVD are provided.
PAGE 197
ScaLAPACK arguments ScaLAPACK arguments ScaLAPACK routine arguments appear in the following order: • Arguments specifying options • Problem dimensions • Array or scalar arguments defining the input data (some of them may be overwritten by results) • Other array or scalar arguments returning results • Work arrays (and associated array dimensions) • Diagnostic argument INFO Note that not every category is present in each of the routines.
PAGE 198
ScaLAPACK arguments UPLO DIAG Used by the Hermitian, symmetric, and triangular distributed matrix routines to specify whether the upper or lower triangle is being referenced. U - Upper triangle. L - Lower triangle. Used by the triangular distributed matrix routines to specify whether the distributed matrix is unit triangular. U - Unit triangular. R - Non-unit triangular. When DIAG is supplied as ‘U’ the diagonal elements are not referenced.
PAGE 199
ScaLAPACK arguments Array descriptors are differentiated by the DTYPE_ entry in the descriptor. At the present time, the following values of DESC_(DTYPE_) are valid. Table 10-7 Valid Values of DESC_(DTYPE_) DESC_(DTYPE_) 1 501 502 601 Designation dense matrices narrow band and tridiagonal coefficient matrices narrow band and tridiagonal right-hand-side matrices out-of-core dense matrices Array descriptor for in-core dense matrices The array descriptor DESC_ is an integer array of length 9.
PAGE 200
ScaLAPACK arguments Array descriptor for in-core narrow band and tridiagonal matrices The array descriptor DESC_ is an integer array of length 7. This descriptor type is used in the ScaLAPACK narrow band routines and tridiagonal routines to specify a block-column distribution of a global array over a one-dimensional process grid. In the general and symmetric positive definite banded and tridiagonal routines, a one-dimensional block-column distribution is specified for the coefficient matrix.
PAGE 201
ScaLAPACK arguments Table 10-10 Content of the Array Descriptor for Right-Hand-Side Dense Matrices for Narrow Band and Tridiagonal Solvers DESC_( ) Symbolic Name Scope 1 DTYPE_B (global) 2 CTXT_B (global) 3 4 M_B MB_B (global) (global) 5 RSRC_B (global) 6 LLD_B (local) 7 Definition The descriptor type (DTYPE_B=502) for Pr x 1 process grid for block-row distributed matrices The BLACS context handle, indicating the BLACS process grid over which the global matrix B is distributed.
PAGE 202
ScaLAPACK arguments Table 10-11 Content of the Array Descriptor for Out-of-Core Dense Matrices DESC_( ) Symbolic Name 1 DTYPE_A Scope Definition (global) 2 CTXT_A (global) 3 4 M_A N_A (global) (global) 5 MB_A (global) 6 NB_A (global) 7 RSRC_A (global) 8 CSRC_A (global) 9 LLD_A (global) 10 IODEV_A global 11 SIZE_A local Descriptor type DTYPE_A=601 for an out-of-core matrix.
PAGE 203
ScaLAPACK arguments A IA JA DESCA B IB JB DESCB B(IB:IB+N-1, JB:JB+NRHS-1). NRHS is greater than or equal to zero. (Local input/Local output) REAL - Pointer into the local memory to an array of local dimension (LLD_A, LOC(JA+N-1)). (Global input) INTEGER - The row index in the global array A indicating the first row of A(IA:IA+M-1, JA:JA+N-1). (Global input) INTEGER - The column index in the global array A indicating the first column of A(IA:IA+M-1, JA:JA+N-1).
PAGE 204
ScaLAPACK arguments WORK(1), and the routine PXERBLA is called. The user is thus strongly advised to always check the value of INFO on exit from the called routine. Diagnostic argument INFO All driver and computational routines have a diagnostic argument INFO that indicates the success or failure of the computation. It is recommended that the user always check the value of INFO on exit from calling a ScaLAPACK routine. The value of INFO is defined as follows: INFO=0 Successful exit.
PAGE 205
Part 4 HP SuperLU
PAGE 206
PAGE 207
Overview 11 Introduction to Distributed SuperLU Overview This chapter provides an overview of HP’s implementation of Distributed SuperLU functionality. This implementation provides the Distributed SuperLU library designed for distributed memory parallel computers. It was based on the public-domain SuperLU_DIST, which was developed at the Lawrence Berkeley National Lab and the University of California at Berkeley. HP’s Distributed SuperLU contains two libraries: SuperLU_DIST and SuperLU_DIST8.
PAGE 208
Chapter objectives Chapter objectives After reading this chapter you will: • Know how to access Distributed SuperLU library subprograms • Know what Distributed SuperLU can do • Know the driver routines and the computational routines • Know the basic steps to call a Distributed SuperLU routine • Know the appropriate header files to include • Know the Distributed SuperLU defined data structures • Know the most important input argument for the driver routines • Know an issue relating to calling SuperLU_DIST8
PAGE 209
Accessing Distributed SuperLU (1) Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. (2) Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. (3) Neither the name of Lawrence Berkeley National Laboratory, U.S. Dept.
PAGE 210
Accessing Distributed SuperLU Table 11-1 Processor Type SuperLU_Dist and SuperLU_Dist8 Libraries Address Width Installation Directory 32-bit /opt/mlib/lib/pa2.0 libsuperlu_dist.a libsuperlu_dist.sl 64-bit /opt/mlib/lib/pa20_64 libsuperlu_dist.a libsuperlu_dist.sl libsuperlu_dist8.a libsuperlu_dist8.sl 32-bit /opt/mlib/lib/hpux32 libsuperlu_dist.a libsuperlu_dist.so 64-bit /opt/mlib/lib/hpux64 libsuperlu_dist.a libsuperlu_dist.so libsuperlu_dist8.a libsuperlu_dist8.
PAGE 211
Accessing Distributed SuperLU Processor Type OS Version Address Width Installation Directory Itanium 2 HP XC6000 Intel V7.1 Compiler 64-bit /opt/mlib/intel_7.1/hpmpi_2.1/lib/64 Itanium 2 HP XC4000 PGI V5.1 Compiler Itanium 2 Opteron PGI V5.1 Compiler 64-bit 64-bit Libraries libsuperlu_dist.a libsuperlu_dist.so libsuperlu_dist8.a libsuperlu_dist8.so /opt/mlib/intel_5.1/hpmpi_2.1/lib/64 libsuperlu_dist.a libsuperlu_dist.so libsuperlu_dist8.a libsuperlu_dist8.so /opt/mlib/intel_5.1/hpmpi_2.
PAGE 212
Accessing Distributed SuperLU When you use the –aarchive_shared flag on the compiler command line for HP-UX, it ensures that the compiler links with the archive library. If the archive library is not available, then it links the shared library. If you omit –aarchive_shared and –ashared_archive, the linker defaults to linking the shared library. Link with –Bstatic on Linux systems.
PAGE 213
Accessing Distributed SuperLU mpicc [options] file ... −Wl,–aarchive_shared,−L/opt/mlib/lib/[pa2.0|pa20_64] −lsuperlu_dist −lcl −lm mpiCC [options] file ... −Wl,–aarchive_shared,−L/opt/mlib/lib/[pa2.0| pa20_64] −lsuperlu_dist −lcl −lm 4. Set the LDOPTS environment variable to include: –aarchive_shared,−L/opt/mlib/lib/[pa2.0|pa20_64] For example: setenv LDOPTS “–aarchive_shared,–L/opt/mlib/lib/pa2.
PAGE 214
Accessing Distributed SuperLU Linking with libisamstub.a C language codes that call Fortran77 routines from the BLAS Standard, the sparse linear equation system, or the sparse eigenvalue system, must explicitly link the ISAM (Indexed Sequential Access Method) stubs library into the program. For example, mpicc [options] file ... –Wl,–aarchive_shared,–L/opt/fortran/lib/libisamstub.a −lsuperlu_dist −lcl −lm This only applies if you are linking with the SuperLU_DIST archive library.
PAGE 215
Accessing Distributed SuperLU mpiCC [+DD32|+DD64] [options] file ... −Wl,–aarchive_shared,−L/opt/mlib/ lib/[hpux32|hpux64] −lsuperlu_dist −lcl −lm 4. Set the LDOPTS environment variable to include: –aarchive_shared,−L/opt/mlib/lib/[hpux32|hpux64] For example: setenv LDOPTS “–aarchive_shared,–L/opt/mlib/lib/hpux32” Then use the −lsuperlu_dist option on the compiler command line that links your program: mpif90 [options] file ... −lsuperlu_dist mpicc [options] file ...
PAGE 216
Accessing Distributed SuperLU mpif90 [options] file ... –Wl,–Bstatic −lsuperlu_dist −openmp mpicc [options] file ... –Wl,–Bstatic −lsuperlu_dist −openmp Link with –Bdynamic if you want the archive library on a Linux system. 2. Specify the entire path of the library file on the compiler command line that links your program. For example, to link your program with SuperLU_DIST for use with 32 or 64-bit addressing on a Linux system, use one of the following: mpif90 [options] file ... /opt/mlib/intel_8.
PAGE 217
Accessing Distributed SuperLU mpicc [options] file ... −lsuperlu_dist −openmp NOTE An LDOPTS specification takes precedence over using -Wl on the compiler command line. That is, if you use the LDOPTS environment variable to specify a library path, you cannot override that specification with a -Wl option on your compiler command line. 5. Use the following commands to link your programs for use with the SuperLU_DIST8 library on a Red Hat Linux system. mpif90 −i8 [options] file ... -L/opt/mlib/intel_8.
PAGE 218
Accessing Distributed SuperLU mpicc [options] file ... −Wl,–Bstatic,−L/opt/mlib/lib/intel_8.0/hpmpi_2.1/ lib/64 −lsuperlu_dist −openmp −lmpi 4. Set the LDOPTS environment variable to include: –Bstatic,−L/opt/mlib/lib/intel_8.0/hpmpi_2.1/lib/64 For example: setenv LDOPTS “–Bstatic,–L/opt/mlib/lib/intel_8.0/hpmpi_2.1/lib/64” Then use the −lsuperlu_dist option on the compiler command line that links your program: mpif90 [options] file ... −lsuperlu_dist −openmp −lmpi mpicc [options] file ...
PAGE 219
Accessing Distributed SuperLU mpif90 [options] file...−Wl,–Bstatic,−L/opt/mlib/lib/intel_7.1/hpmpi_2.1/ lib/64 −lsuperlu_dist −openmp −lmpi mpicc [options] file ...−Wl,–Bstatic,−L/opt/mlib/lib/intel_7.1/hpmpi_2.1/ lib/64 −lsuperlu_dist −openmp −lmpi 4. Set the LDOPTS environment variable to include: –Bstatic,−L/opt/mlib/lib/intel_7.1/hpmpi_2.1/lib/64 For example: setenv LDOPTS “–Bstatic,–L/opt/mlib/lib/intel_7.1/hpmpi_2.
PAGE 220
Accessing Distributed SuperLU −Wl,–Bstatic,−L/opt/mlib/lib/pgi_5.1/hpmpi_2.1/lib/64 For example, the command lines in Method 2 for XC4000 could be written: mpif90 [options] file...−Wl,–Bstatic,−L/opt/mlib/lib/pgi_5.1/hpmpi_2.1/lib/64 −lsuperlu_dist −mp mpicc [options] file ...−Wl,–Bstatic,−L/opt/mlib/lib/pgi_5.1/hpmpi_2.1/lib/64 −lsuperlu_dist −mp −lpgf90 −lpgf90_rpml −lpgf902 −lpgf90rtl −lpgftnrtl 4. Set the LDOPTS environment variable to include: –Bstatic,−L/opt/mlib/lib/pgi_5.1/hpmpi_2.
PAGE 221
Accessing Distributed SuperLU PA-RISC processors Suppose you write your own Fortran version of the XERBLA subroutine. You compile on PA-RISC with +DA2.0W and the compiler turns on +ppu (See Table 11-2), so your xerbla.o object file has an entry point xerbla_. However, the PA-RISC MLIB libraries were compiled with +noppu and the entry points with underbars were added as synonyms to the entry points without underbars.
PAGE 222
Accessing Distributed SuperLU from the library. This file contains both entry points xerbla and xerbla_, and xerbla conflicts with your XERBLA subroutine. Moreover, the various MLIB C source subprograms that call XERBLA still reference them as xerbla (without the postpended underbar). It means that both xerbla and xerbla_ entry points must be provided in your xerbla.o.
PAGE 223
What can Distributed SuperLU do? • Another MLIB subprogram used by your program also calls that subprogram PA-RISC and Itanium processors Also note that all the workarounds listed for Itanium are more generic and can also be used for PA-RISC. Therefore, if you are coding your own version of a MLIB routine called "mlib_routine" on PA-RISC or Itanium, a Fortran version might be implemented as: !$HP$ ALIAS mlib_routine='mlib_routine' !$HP$ ALIAS mlib_routine_='mlib_routine_' SUBROUTINE mlib_routine(...
PAGE 224
Distributed SuperLU driver routines column-compressed format (“Harwell-Boeing Format”). B, which is overwritten by the solution X, is stored as a dense matrix in column-major order. A and B are replicated across all processors. It is very common to solve a sequence of related linear systems A(1)X(1)=B(1),A(2)X(2)=B(2),...rather than just one.
PAGE 225
Distributed SuperLU computational routines The difference between pdgssvx (pzgssvx) and pdgssvx_ABglobal (pzgssvx_ABglobal) is that, for pdgssvx_ABglobal (pzgssvx_ABglobal), the input matrices A and B are globally available (replicated) on all processes, whereas for pdgssvx (pzgssvx), the input matrices A and B are distributed among all processes.
PAGE 226
Basic steps to call a Distributed SuperLU routine These routines solve the system by forward and back substitutions using the L and U factors computed by pdgstrf or pzgstrf. Currently, B must be globally available on all processes. • pdgsrfs, pdgsrfs_ABXglobal, pzgsrfs, pzgsrfs_ABXglobal: Refine solution in parallel. Given A, its factors L and U, and an initial solution X, these routines perform iterative refinement.
PAGE 227
Header files Header files To call double precision routines in SuperLU_DIST, you need to include superlu_ddefs.h. To call double precision routines in SuperLU_DIST8, you need to include superlu_ddefs8.h. To call double complex routines in SuperLU_DIST, you need to include superlu_zdefs.h. To call double complex routines in SuperLU_DIST8, you need to include superlu_zdefs8.h.
PAGE 228
Distributed SuperLU defined data structures int_t **bsendx_plist; int_t *brecv; int_t nbrecvx; int_t *ilsum; int_t ldalsum; } LocalLU_t; /***************************************************/ /* Definition of structure: SOLVEstruct_t */ /***************************************************/ typedef struct { int_t *extern_start; int_t *ind_tosend; int_t *ind_torecv; int_t *ptr_ind_tosend; int_t *ptr_ind_torecv; int_t *SendCounts; int_t *RecvCounts; double *val_tosend; double *val_torecv; int_t TotalIndSend; i
PAGE 229
Distributed SuperLU defined data structures doublecomplex *Lval_buf_2[2]; int_t *Usub_buf; doublecomplex *Uval_buf; doublecomplex *ujrow; int_t bufmax[NBUFFERS]; int_t *ToRecv; int_t *ToSendD; int_t **ToSendR; int_t *fmod; int_t **fsendx_plist; int_t *frecv; int_t nfrecvx; int_t *bmod; int_t **bsendx_plist; int_t *brecv; int_t nbrecvx; int_t *ilsum; int_t ldalsum; } LocalLU_t; /***************************************************/ /* Definition of structure: SOLVEstruct_t */ /*******************************
PAGE 230
Distributed SuperLU defined data structures typedef enum {NO, YES} yes_no_t; typedef enum {NOROWPERM, LargeDiag, MY_PERMR} rowperm_t; typedef enum {NATURAL, MMD_ATA, MMD_AT_PLUS_A, COLAMD, MY_PERMC} colperm_t; typedef enum {NOREFINE, DOUBLE = 1, EXTRA} IterRefine_t; typedef struct { fact_t Fact; trans_t Trans; yes_no_t Equil; rowperm_t RowPerm; colperm_t ColPerm; yes_no_t ReplaceTinyPivot; IterRefine_t IterRefine; yes_no_t SolveInitialized; yes_no_t RefineInitialized; } superlu_options_t; /****************
PAGE 231
Distributed SuperLU defined data structures typedef struct { int_t nnz; /* number of nonzeros in the matrix */ void *nzval; /* pointer to array of nonzero values, packed by column */ int_t *rowind; /* pointer to array of row indices of the nonzeros */ int_t *colptr; /* pointer to array of beginning of columns in nzval[] and rowind[] */ /* Note: Zero-based indexing is used; colptr[] has ncol+1 entries, the last one pointing beyond the last column, so that colptr[ncol] = nnz.
PAGE 232
Distributed SuperLU defined data structures /***************************************************/ /* Definition of structure: gridinfo_t */ /***************************************************/ typedef struct { MPI_Comm comm; int_t Np; int_t Iam; } superlu_scope_t; typedef struct { MPI_Comm superlu_scope_t superlu_scope_t int_t int_t int_t } gridinfo_t; comm; rscp; cscp; iam; nprow; npcol; /***************************************************/ /* Definition of structure: Glu_persist_t */ /****************
PAGE 233
The input argument: options typedef enum { COLPERM, ROWPERM, RELAX, ETREE, EQUIL, SYMBFAC, DIST, FACT, COMM, SOL_COMM, RCOND, SOLVE, REFINE, FLOAT, TRSV, GEMV, FERR, NPHASES } PhaseType; typedef float flops_t; typedef struct { int_t *panel_histo; /* histogram of panel size distribution */ double *utime; /* running time at various phases */ flops_t *ops; /* operation count at various phases */ int_t TinyPivots; /* number of tiny pivots */ int_t RefineSteps; /* number of iterative refinement steps */ } Super
PAGE 234
The input argument: options as well as the following options, which are described in more detail below: • options->Equil, to specify how to scale the rows and columns of A to equilibrate it (to try to reduce its condition number and so improve the accuracy of the computed solution). • options->RowPerm, to specify how to permute the rows of A (typically to control numerical stability).
PAGE 235
The input argument: options • options->Fact = SamePattern: A is factored, assuming that it has the same nonzero pattern as a previously factored matrix. In this case the algorithm saves time by reusing the previously computed column permutation vector stored in ScalePermstruct->perm_c and the elimination tree of A stored in LUstruct->etree.
PAGE 236
The input argument: options • A, the unput matrix. • ScalePermstruct->DiagScale, how the previous matrix was row and/or column scaled. • ScalePermstruct->R, the row scalings of the previous matix, if any. • ScalePermstruct->C, the columns scalings of the previous matrix, if any. • ScalePermstruct->perm_r, the row permutation of the previous matrix. • ScalePermstruct->perm_c, the column permutation of the previous matrix.
PAGE 237
MPI issue when calling SuperLU_DIST8 routines MPI issue when calling SuperLU_DIST8 routines In SuperLU_DIST8 routines, integers are represented using 64-bit addressing, which is "long long" in C. However, MPI C interfaces only accept 32-bit addressing integers, which is "int" in C. Thus, when using 64-bit addressing integers and calling SuperLU_DIST8 routines, for the MPI C routines you want to call, you need to take care of the integer type conversions in your applications. Sample programs 1.
PAGE 238
Sample programs * * Assume the generated executable program is dexample.
PAGE 239
Sample programs * initializes the SuperLU_DIST 2D process grid. * * MPI_COMM_WORLD is the base communicator upon which the new * grid is formed. * nprow is the number of process rows. * npcol is the number of process columns. * grid points to the storage of the SuperLU_DIST * 2D process grid. */ superlu_gridinit(MPI_COMM_WORLD, nprow, npcol, &grid); /* Bail out if I do not belong in the grid. */ iam = grid.
PAGE 240
Sample programs rowind[10]=1; rowind[11]=2; rowind[12]=3; rowind[13]=4; colptr[0]=0; colptr[1]=3; colptr[2]=6; colptr[3]=9; colptr[4]=13; colptr[5]=14; b[0]=3.8; b[1]=4.7; b[2]=13.0; b[3]=8.6; b[4]=15.8; /* * Create SuperMatrix A: * Astore points to the storage of a Harwell-Boeing sparse matrix * format for A. * A.Stype MUST be NC, A.Dtype MUST be D for pdgssvx_ABglobal and * MUST be Z for pzgssvx_ABglobal, and A.Mtype MUST be GE. */ A.Stype = SLU_NC; A.Dtype = SLU_D; A.Mtype = SLU_GE; A.nrow = m; A.
PAGE 241
Sample programs exit (-1); } if ( !(ScalePermstruct.perm_c = (int_t *) sizeof(int_t)) ) ) { malloc(n * printf(“malloc fails for perm_c[].”); exit (-1); } /* Initialize LUstruct. */ if ( !(LUstruct.etree =(int_t *) malloc(n * sizeof(int_t))) ) { printf(“malloc fails for etree[].”); exit (-1); } if ( !(LUstruct.Glu_persist = (Glu_persist_t *) malloc(sizeof(Glu_persist_t))) ) { printf(“malloc fails for Glu_persist_t.”); exit (-1); } if ( !(LUstruct.
PAGE 242
Sample programs printf (“The solution is:\n”); for (i = 0; i < n; ++i) { printf (“x(%d)=%f\n”, i+1, b[i]); } } /* Deallocate storage */ free(stat.utime); free(stat.ops); free(((NCformat *)A.Store)->rowind); free(((NCformat *)A.Store)->colptr); free(((NCformat *)A.Store)->nzval); free(A.Store); /* void Destroy_LU(int_t n, gridinfo_t *grid, LUstruct_t *LUstruct) * frees the memory allocated for the L and U * factor matrices. * * n is the number of columns of A. * grid points to storage for the grid.
PAGE 243
Sample programs /* void superlu_gridexit(gridinfo_t *grid) * releases the SuperLU_DIST 2D process grid: * * grid points to the storage of the SuperLU_DIST * 2D process grid. */ superlu_gridexit(&grid); MPI_Finalize(); } 2. Sample program for using the driver routine pdgssvx. #include #include #include "/opt/mlib/include/superlu_ddefs.
PAGE 244
Sample programs * 3. Generally, the distributing of rows in this example assumed that * the number of processes, *, used to solve * dimension n problems is roughly not larger than sqrt(n) (i.e. (*)**2
PAGE 245
Sample programs /* -----------------------------------------------------------Initialize MPI environment and SuperLU_DIST process grid. ------------------------------------------------------------*/ MPI_Init( &argc, &argv ); /* void superlu_gridinit(MPI_Comm Bcomm, int_t nprow, int_t npcol, * gridinfo_t *grid) * initializes the SuperLU_DIST 2D process grid. * * MPI_COMM_WORLD is the base communicator upon which the new * grid is formed. * nprow is the number of process rows.
PAGE 246
Sample programs colptr[0]=0; colptr[1]=3; colptr[2]=6; colptr[3]=9; colptr[4]=13; colptr[5]=14; /* sol_recvcounts and sol_displs store information that will be used in gathering the global solution to rank 0. */ if (!iam) { if (!(sol_recvcounts=(int_t *) malloc(grid.nprow * grid.npcol*sizeof(int_t )))) { printf(“malloc fails for sol_recvcounts[]\n”); exit (-1); } if (!(sol_displs=(int_t *) malloc(grid.nprow * grid.
PAGE 247
Sample programs } /* Create compressed column matrix for GA. */ /* * Create SuperMatrix GA: * GAstore points to the storage of a Harwell-Boeing sparse matrix * format for GA. * GA.Stype MUST be SLU_NC, GA.Dtype MUST be SLU_D for pdgssvx and * MUST be SLU_Z for pzgssvx, and GA.Mtype MUST be SLU_GE. */ GA.Stype = SLU_NC; GA.Dtype = SLU_D; GA.Mtype = SLU_GE; GA.nrow = m; GA.ncol = n; if (!(GA.Store = (void *) malloc( sizeof(NCformat) ))){ printf(“malloc fails for GA.
PAGE 248
Sample programs /* Set up row pointers */ rowptr[0] = 0; fst_row = iam * m_loc_fst; nnz_loc = 0; for (j = 0; j < m_loc; ++j) { row = fst_row + j; rowptr[j+1] = rowptr[j] + marker[row]; marker[j] = rowptr[j]; } nnz_loc = rowptr[m_loc]; if (!(nzval_loc=(double *) malloc(nnz_loc*sizeof(double)))) { printf(“malloc fails for nzval_loc[]\n”); exit (-1); } if (!( colind= (int_t *) malloc(nnz_loc * sizeof(int_t)))) { printf(“malloc fails for colind[]\n”); exit (-1); } /* Transfer the matrix into the compressed row
PAGE 249
Sample programs Astore = (NRformat_loc *) A.
PAGE 250
Sample programs if ( !(ScalePermstruct.perm_c = (int_t *) sizeof(int_t)) ) ) { printf(“malloc fails for perm_c[].”); exit (-1); } malloc(n * /* Initialize ScalePermstruct. */ if ( !(LUstruct.etree =(int_t *) malloc(n * sizeof(int_t))) ) { printf(“malloc fails for etree[].”); exit (-1); } if ( !(LUstruct.Glu_persist = (Glu_persist_t *) malloc(sizeof(Glu_persist_t))) ) { printf(“malloc fails for Glu_persist_t.”); exit (-1); } if ( !(LUstruct.
PAGE 251
Sample programs printf (“x(%d, %d)=%f\n”, i+1, j+1, b_global[j*m+i]); } } } /* -----------------------------------------------------------DEALLOCATE STORAGE. ------------------------------------------------------------*/ free(stat.utime); free(stat.ops); free( free( free( free( Astore->rowptr ); Astore->colind ); Astore->nzval ); Astore ); free(ScalePermstruct.perm_r); free(ScalePermstruct.perm_c); switch ( ScalePermstruct.DiagScale ) { case ROW: free(ScalePermstruct.
PAGE 252
Sample programs if ( it = (SOLVEstruct.gsmv_comm)->ind_tosend ) free(it); if ( it = (SOLVEstruct.gsmv_comm)->ind_torecv ) free(it); free((SOLVEstruct.gsmv_comm)->ptr_ind_tosend); free((SOLVEstruct.gsmv_comm)->SendCounts); if ( dt = (SOLVEstruct.gsmv_comm)->val_tosend ) free(dt); if ( dt = (SOLVEstruct.gsmv_comm)->val_torecv ) free(dt); free(SOLVEstruct.gsmv_comm); options.RefineInitialized = NO; } free(SOLVEstruct.row_to_proc); free(SOLVEstruct.inv_perm_c); free(SOLVEstruct.diag_procs); free(SOLVEstruct.
PAGE 253
Distributed SuperLU routines Distributed SuperLU routines Chapter 11 Introduction to Distributed SuperLU 765
PAGE 254
pdgsrfs/pzgsrfs Name Improve the computed solutions to systems of distributed linear equations pdgsrfs/pzgsrfs Improve the computed solutions to systems of distributed linear equations Purpose pdgsrfs and pzgsrfs improve the computed solutions to systems of distributed linear equations and provide error bounds and backward error estimates for the solutions.
PAGE 255
Improve the computed solutions to systems of distributed linear equations pdgsrfs/pzgsrfs or pzgstrf for the possibly scaled and permuted matrix A. ScalePermstruct (input)(global) The data structure to store the scaling and permutation vectors describing the trasformations performed to the matrix A. grid (input) B (input)(local) ldb (input)(local) X (input/output) (local) The 2D process mesh.
PAGE 256
pdgsrfs/pzgsrfs Improve the computed solutions to systems of distributed linear equations stat (output) info (output) 768 HP MLIB LAPACK User’s Guide Record the statistics about the refinement steps. = 0: successful exit > 0: if info= -i, the i-th argument had an illegal value.
PAGE 257
Improve the computed solutions to systems of linear equations pdgsrfs_ABXglobal/pzgsrfs_ABXglobal Name pdgsrfs_ABXglobal/pzgsrfs_ABXglobal Improve the computed solutions to systems of linear equations Purpose pdgsrfs_ABXglobal and pzgsrfs_ABXglobal improve the computed solutions to systems of linear equations and provide error bounds and backward error estimates for the solutions.
PAGE 258
pdgsrfs_ABXglobal/pzgsrfs_ABXglobal equations LUstruct (input) grid (input) Improve the computed solutions to systems of linear The distributed data structures storing L and U factors. The L and U factors are obtained from pdgstrf or pzgstrf for the possibly scaled and permuted matrix A. The 2D process mesh. It contains the MPI communicator, the number of process rows (NPROW), the number of process columns (NPCOL), and my process rank, It is an input argument to all the parallel routines.
PAGE 259
Improve the computed solutions to systems of linear equations pdgsrfs_ABXglobal/pzgsrfs_ABXglobal any element of A or B that makes X(j) an exact solution). stat (output) info (output) Record the statistics about the refinement steps. = 0: successful exit > 0: if info= -i, the i-th argument had an illegal value.
PAGE 260
pdgssvx/ pzgssvx, pdgssvx_ABglobal/pzgssvx_ABglobal Name Solve systems of linear equations pdgssvx/ pzgssvx, pdgssvx_ABglobal/pzgssvx_ABglobal Solve systems of linear equations Purpose pdgssvx/pzgssvx and pdgssvx_ABglobal/pzgssvx_ABglobal solve systems of linear equations AX=B. They perform the following functions: • Equilibrate the system (scale A’s rows and columns to have unit norm) if A is poorly scaled. • Find a row permutation that makes diagonal of A large relative to the off-diagonal.
PAGE 261
Solve systems of linear equations pdgssvx/ pzgssvx, pdgssvx_ABglobal/pzgssvx_ABglobal • options->RowPerm, to specify how to permute the rows of A (typically to control numerical stability). • options->ColPerm, to specify how to permute the columns of A (typically to control fill-in and enhance parallelsim during factorization). • options->ReplaceTinyPivot, to specify how to deal with tiny pivots encountered during factorization (to control numerical stability).
PAGE 262
pdgssvx/ pzgssvx, pdgssvx_ABglobal/pzgssvx_ABglobal Solve systems of linear equations But not options->ColPerm, whose value is ignored. This is because the previous column permutation from ScalePermstruct->perm_c is used as input. The user must also supply: • A, the input matrix • ScalePermstruct->perm_c, the column permutation. • LUstruct->etree, the elimination tree. The outputs returned include: • A, the input matrix A overwritten by the scaled an permuted matrix as described above.
PAGE 263
Solve systems of linear equations pdgssvx/ pzgssvx, pdgssvx_ABglobal/pzgssvx_ABglobal • ScalePermstruct->perm_c, the column permutation of the previous matrix. • all of LUstruct, the previously computed information about L and U (the actual numerical values of L and U stored in LUstruct->Llu are ignored). The outputs returned include: • A, the input matrix A overwritten by the scaled an permuted matrix as described above.
PAGE 264
pdgssvx/ pzgssvx, pdgssvx_ABglobal/pzgssvx_ABglobal Solve systems of linear equations void pzgssvx (superlu_options_t *options, SuperMatrix *A, ScalePermstruct_t *ScalePermstruct, double B[], int ldb, int nrhs, gridinfo_t *grid, LUstruct_t *LUstruct, SOLVEstruct_t *SOLVEstruct, double *berr, SuperLUStat_t *stat, int *info) void pzgssvx_ABglobal (superlu_options_t, *options, SuperMatrix *A, ScalePermstruct_t *ScalePermstruct, doublecomplex B[ ], int ldb, int nrhs, gridinfo_t *grid, LUstruct_t *LUstruct, do
PAGE 265
Solve systems of linear equations pdgssvx/ pzgssvx, pdgssvx_ABglobal/pzgssvx_ABglobal LUstruct_t, please refer to the general man page superlu_dist. Arguments options (input) The structure defines the input parameters to control how the LU decomposition will be performed.
PAGE 266
pdgssvx/ pzgssvx, pdgssvx_ABglobal/pzgssvx_ABglobal Solve systems of linear equations Inputs: A, options->Equil, options->RowPerm, options->ReplaceTinyPivot, all of ScalePermstruct, all of LUstruct Outputs: modified A (possibly row and/or column scaled and/or permuted), modified LUstruct->Llu = FACTORED: The matrix A is already factored. Inputs: all of ScalePermstruct, all of LUstruct • Equil (yes_no_t) Specifies whether to equilibrate the system. = NO: No equilibration.
PAGE 267
Solve systems of linear equations pdgssvx/ pzgssvx, pdgssvx_ABglobal/pzgssvx_ABglobal = NATURAL: Use the natural ordering. = COLAMD: Use approximate minimum degree column ordering. = MMD_ATA: Use minimum degree ordering on structure of A'*A. = MMD_AT_PLUS_A: Use minimum degree ordering on structure of A'+A. = MY_PERMC: Use the ordering given in ScalePermstruct->perm_c. • ReplaceTinyPivot (yes_no_t) = NO: Do not modify pivots.
PAGE 268
pdgssvx/ pzgssvx, pdgssvx_ABglobal/pzgssvx_ABglobal Solve systems of linear equations The type of A must be: Stype=SLU_NR_loc for pdgssvx and pzgssvx and SLU_NC for pdgssvx_ABglobal and pzgssvx_ABglobal; Dtype=SLU_D for pdgssvx_ABglobal or Z for pzgssvx_ABglobal; Mtype=SLU_GE. That is, A is stored in distributed compressed row format for pdgssvx and pzgssvx, and in compressed column format (also known as Herwell-Boeing format) for pdgssvx_ABglobal and pzgssvx_ABglobal.
PAGE 269
Solve systems of linear equations pdgssvx/ pzgssvx, pdgssvx_ABglobal/pzgssvx_ABglobal = ROW: Row equilibration, i.e., A was premultiplied by diag(R). = COL: Column equilibration, i.e., A was postmultiplied by diag(C). = BOTH: Both row column equilibration, i.e., A was replaced by diag(R)*A*diag(C). If options->Fact=FACTORED or SamePattern_SameRowPattern, DiagScale is an input argument; otherwise it is an output argument.
PAGE 270
pdgssvx/ pzgssvx, pdgssvx_ABglobal/pzgssvx_ABglobal Solve systems of linear equations If options->Fact=FACTORED or SamePattern_SameRowPerm, R is an input argument; otherwise R is an output argument. • C (double*) The vector with dimension A->ncol containing column scale factors for A. If DiagScale=COL or BOTH, A is multiplied on the right by diag(C). If diagScale=NOEQUIL or ROW, C is not defined.
PAGE 271
Solve systems of linear equations pdgssvx/ pzgssvx, pdgssvx_ABglobal/pzgssvx_ABglobal • etree (int*) Elimation tree of A'*A, dimension A->ncol. It is computed in sp_colorder( ) during the first factorizations of the matrices with the same nonzero pattern. On exit of sp_colorder( ), the columns of A are permuted so that the etree is in a certain postorder. This postorder is reflected in ScalePermstruct->perm_c.
PAGE 272
pdgssvx/ pzgssvx, pdgssvx_ABglobal/pzgssvx_ABglobal info (output) Solve systems of linear equations = 0: successful exit > 0: if info=i, and i is <= A->ncol: U(i, i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, so the solution could not be computed. > A->ncol: Number of bytes allocated when memory allocation failure occurred plus A->ncol.
PAGE 273
pdgstrf/pzgstrf Perform the LU factorization in parallel. Name pdgstrf/pzgstrf Perform the LU factorization in parallel.
PAGE 274
pdgstrf/pzgstrf Perform the LU factorization in parallel. • Glu_persist (input) Glu_persist_t* Global data structure (xsup, supno) replicated on all processes, describing the supernode partition in the factored matrices L and U: xsup[s] is the leading column of the s-th supernode. supno[i] is the supernode number to which column i belongs. • Llu (input/output) LocalLU-t* The distributed data structures to store L and U factors. grid (input) stat (output) info (output) The 2D process mesh.
PAGE 275
pdgstrs/pzgstrs Solve systems of distributed linear equations Name pdgstrs/pzgstrs Solve systems of distributed linear equations Purpose pdgstrs and pzgstrs solve systems of distributed linear equations A*X=B with a general N-by-N matrix A using the LU factorization computed by pdgstrf or pzgstrf.
PAGE 276
pdgstrs/pzgstrs Solve systems of distributed linear equations routines. B (input/output) On entry, the distributed right-hand side matrix of the possibly equilibrated system. On exit, the distributed solution matrix Y of the possibly equilibrated system if info=0. m_loc (input)(local) The local row dimension of matrix B. fst_row (input)(global) The row number of B’s first row in the global matrix. ldb (input)(local) The leading dimension of matrix B.
PAGE 277
Solve systems of distributed linear equations. Name pdgstrs_Bglobal/pzgstrs_Bglobal pdgstrs_Bglobal/pzgstrs_Bglobal Solve systems of distributed linear equations. Purpose pdgstrs_Bglobal and pzgstrs_Bglobal solve systems of distributed linear equations A*X=B with a general N-by-N matrix A using the LU factorization computed by pdgstrf or pzgstrf.
PAGE 278
pdgstrs_Bglobal/pzgstrs_Bglobal NOTE Solve systems of distributed linear equations. The N-by-NRHS matrix B must reside on all processes. ldb (input)(global) Leading dimension of matrix B. nrhs (input)(global) Number of right-hand sides. stat (output) info (output) 790 HP MLIB LAPACK User’s Guide Record the statistics about the triangular solves. = 0: successful exit < 0: if info= -i, the i-th argument had an illegal value.
PAGE 279
Part 5 HP VMATH
PAGE 280
PAGE 281
Overview 12 VMATH Overview VMATH, a component of HP MLIB, is a library of vector math routines corresponding to many of the widely used scalar math routines available with C, C++, and Fortran90. For example, the VMATH Fortran call vsexp(n, a, y), where a and y are arrays of at least n real elements, computes the vector of elements y(i)=exp(a(i)), for i=1 through n. VMATH is intended for computationally intensive mathematical applications amenable to a vector programming style.
PAGE 282
Chapter objectives Chapter objectives After reading this chapter you will: • Know how to access VMATH library subprograms • Know how to use the described subprograms • Know how to access the online HP VMATH man pages Associated documentation The following documents provide supplemental material for this chapter: VMATH provides the interfaces described in • Intel Math Kernel Library, Vector Math Library documentation available at http://www.intel.com/software/products/mkl/features/vmf.
PAGE 283
Accessing VMATH Include files For C or C++, include the header to use the functions and auxiliary macros for the VMATH library, or include to use them for the VMATH8 library. For Fortran, the include file vmath.fi provides interfaces for the VMATH subprograms and defines auxiliary parameters. This include file serves for both VMATH and VMATH8 libraries. Compiling and linking If your program uses the VMATH library, specify -lvmath on the command line that effects the link.
PAGE 284
HP VMATH man pages HP VMATH man pages The HP MLIB man pages contain online documentation that includes information from the HP MLIB User’s Guide. The HP MLIB man pages are installed in the directory /opt/mlib/share/man. Set this path in your MANPATH environment variable to access man pages for VMATH. HP VMATH man pages provide a man page for each subprogram.
PAGE 285
The cosine of the elements of a vector with variable increments. VMATH Subprograms VMATH Subprograms Name vcos The cosine of the elements of a vector with variable increments. Purpose vcos calculates the cosine function of vector elements: y(1+(i-1)*incy) = cos(x(1+(i-1)*incx)), for i=1 to n Usage The Fortran include file math.fi declares an interface for this subroutine, suitable for both VMATH and VMATH8. The interface is compatible with Tru64 CXML (Compaq Extended Math Library).
PAGE 286
vcos_sin The cosine and sine of the elements of a vector with variable increments Name vcos_sin The cosine and sine of the elements of a vector with variable increments Purpose vcos_sin calculates both the cosine and sine functions of vector elements: y(1+(i-1)*incy) = cos(x(1+(i-1)*incx)), and z(1+(i-1)*incy) = sin(x(1+(i-1)*incx)), for i=1 to n Usage The Fortran include file math.fi declares an interface for this subroutine, suitable for both VMATH and VMATH8.
PAGE 287
The base-e exponential of the elements of a vector, with variable increments vexp Name vexp The base-e exponential of the elements of a vector, with variable increments Purpose vexp calculates the base-e exponential function of vector elements. y(1+(i-1)*incy) = exp(x(1+(i-1)*incx)), for i=1 to n Usage The Fortran include file math.fi declares an interface for this subroutine, suitable for both VMATH and VMATH8. The interface is compatible with Tru64 CXML (Compaq Extended Math Library).
PAGE 288
vlog The base-e logarithm of the elements of a vector, with variable increments Name vlog The base-e logarithm of the elements of a vector, with variable increments Purpose vlog calculates the base-e logarithm function of vector elements. y(1+(i-1)*incy) = log(x(1+(i-1)*incx)), for i=1 to n Usage The Fortran include file math.fi declares an interface for this subroutine, suitable for both VMATH and VMATH8. The interface is compatible with Tru64 CXML (Compaq Extended Math Library).
PAGE 289
Get, set, and clear error status for VMATH functions vmlGetErrStatus, vmlSetErrStatus, vmlClearErrStatus Name vmlGetErrStatus, vmlSetErrStatus, vmlClearErrStatus Get, set, and clear error status for VMATH functions Purpose vmlGetErrStatus returns the current error status for VMATH functions. vmlSetErrStatus sets the new error status for VMATH functions according to the err parameter and returns the error status in effect at the time of the call.
PAGE 290
vmlGetErrStatus, vmlSetErrStatus, vmlClearErrStatus Get, set, and clear error status for VMATH functions exception flags, which represent exceptional (error) conditions separately (see fenv.h), provide a fuller accounting. Usage VMATH: #include unsigned int vmlGetErrStatus (void) ; unsigned int vmlSetErrStatus (unsigned int err) ; unsigned int vmlClearErrStatus (void) ; VMATH8: #include
PAGE 291
Set and get the mode for VMATH functions vmlSetMode, vmlGetMode Name vmlSetMode, vmlGetMode Set and get the mode for VMATH functions Purpose vmlSetMode sets the new mode for VMATH functions according to the mode parameter and returns the mode in effect at the time of the call. vmlGetMode returns the current VMATH mode (as does vmlSetMode). The mode for the VMATH library comprises an accuracy mode, an error mode, and an FPU mode.
PAGE 292
vmlSetMode, vmlGetMode Set and get the mode for VMATH functions ... vmlSetMode (savemode) ; The value of the mode returned by vmlSetMode and vmlGetMode is a bitwise OR of the macros for the accuracy, error, and FPU modes in effect at the time of the call.
PAGE 293
Get, set, and clear error status for VMATH library functions vmlgeterrstatus, vmlseterrstatus, vmlclearerrstatus Name vmlgeterrstatus, vmlseterrstatus, vmlclearerrstatus Get, set, and clear error status for VMATH library functions Purpose vmlgeterrstatus returns the current error status for VMATH subroutines. vmlseterrstatus sets the new error status for VMATH subroutines according to the err argument and returns the error status in effect at the time of the call.
PAGE 294
vmlgeterrstatus, vmlseterrstatus, vmlclearerrstatus Get, set, and clear error status for VMATH library functions exception flags, which represent exceptional (error) conditions separately (see fenv.h), provide a fuller accounting. Usage INTEGER (KIND=4), FUNCTION vmlgeterrstatus( ) INTEGER (KIND=4), FUNCTION vmlseterrstatus(err) INTEGER (KIND=4) , INTENT (IN) : : err INTEGER (KIND=4), FUNCTION vmlclearerrstatus( ) The Fortran include file vmath.
PAGE 295
Set and get the mode for VMATH library functions vmlsetmode, vmlgetmode Name vmlsetmode, vmlgetmode Set and get the mode for VMATH library functions Purpose vmlsetmode sets the new mode for VMATH subroutines according to the mode argument and returns the mode in effect at the time of the call. vmlgetmode returns the current VMATH mode (as does vmlsetmode). The mode for the VMATH library comprises an accuracy mode, an error mode, and an FPU mode.
PAGE 296
vmlsetmode, vmlgetmode Set and get the mode for VMATH library functions savemode = vmlsetmode(IOR(VML_LA, VML_ERRMODE_IGNORE)) ... CALL vmlsetmode(savemode) The value of the mode returned by vmlsetmode and vmlgetmode is a bitwise OR of the parameters for the accuracy, error, and FPU modes in effect at the time of the call.
PAGE 297
The reciprocal of the elements of a vector, with variable increments vrecip Name vrecip The reciprocal of the elements of a vector, with variable increments Purpose vrecip calculates the reciprocal function of vector elements. y(1+(i-1)*incy) = 1.0/x(1+(i-1)*incx), for i=1 to n Usage The Fortran include file math.fi declares an interface for this subroutine, suitable for both VMATH and VMATH8. The interface is compatible with Tru64 CXML (Compaq Extended Math Library). The C include files vmath.
PAGE 298
vsAcos, vdAcos The arccosine of the elements of a vector Name vsAcos, vdAcos The arccosine of the elements of a vector Purpose vsAcos and vdAcos calculate the arccosine function of vector elements: y[i] = acos(a[i]), for 0 <= i <= n-1 Usage VMATH: #include void vsAcos (int n, const float a[ ], float y [ ] ) ; void vdAcos (int n, const double a [ ], double y [ ] ) ; VMATH8: #include
PAGE 299
The arc hyperbolic cosine of the elements of a vector vsAcosh, vdAcosh Name vsAcosh, vdAcosh The arc hyperbolic cosine of the elements of a vector Purpose vsAcosh and vdAcosh calculate the arc hyperbolic cosine function of vector elements: y[i] = acosh(a[i]), for 0 <= i <= n-1 Usage VMATH: #include void vsAcosh (int n, const float a[ ], float y [ ] ) ; void vdAcosh (int n, const double a [ ], double y [ ] ) ; VMATH8: #include
PAGE 300
vsAsin, vdAsin The arcsine of the elements of a vector Name vsAsin, vdAsin The arcsine of the elements of a vector Purpose vsAsin and vdAsin calculate the arcsine function of vector elements: y[i] = asin(a[i]), for 0 <= i <= n-1 Usage VMATH: #include void vsAsin (int n, const float a[ ], float y [ ] ) ; void vdAsin (int n, const double a [ ], double y [ ] ) ; VMATH8: #include
PAGE 301
The arc hyperbolic sine of the elements of a vector vsAsinh, vdAsinh Name vsAsinh, vdAsinh The arc hyperbolic sine of the elements of a vector Purpose vsAsinh and vdAsinh calculate the arc hyperbolic sine function of vector elements: y[i] = asinh(a[i]), for 0 <= i <= n-1 Usage VMATH: #include void vsAsinh (int n, const float a[ ], float y [ ] ) ; void vdAsinh (int n, const double a [ ], double y [ ] ) ; VMATH8: #include
PAGE 302
vsAtan, vdAtan The arctangent of the elements of a vector Name vsAtan, vdAtan The arctangent of the elements of a vector Purpose vsAtan and vdAtan calculate the arctangent function of vector elements: y[i] = atan(a[i]), for 0 <= i <= n-1 Usage VMATH: #include void vsAtan (int n, const float a[ ], float y [ ] ) ; void vdAtan (int n, const double a [ ], double y [ ] ) ; VMATH8: #include
PAGE 303
The arctangent and quadrant function of the corresponding elements of two vectors vsAtan2, vdAtan2 Name vsAtan2, vdAtan2 The arctangent and quadrant function of the corresponding elements of two vectors Purpose vsAtan2 and vdAtan2 calculate the arctangent function of the quotient of vector element pairs a(i)/b(i), in the range -Pi to Pi, using the signs of both arguments to determine the quandrant of the result value: y[i] = atan2(a[i],b[i]), for 0 <= i <= n-1 Usage VMATH: #include
PAGE 304
vsAtanh, vdAtanh The arc hyperbolic tangent of the elements of a vector Name vsAtanh, vdAtanh The arc hyperbolic tangent of the elements of a vector Purpose vsAtanh and vdAtanh calculate the archyperbolic tangent function of vector elements: y[i] = atanh(a[i]), for 0 <= i <= n-1 Usage VMATH: #include void vsAtanh (int n, const float a[ ], float y [ ] ) ; void vdAtanh (int n, const double a [ ], double y [ ] ) ; VMATH8: #include
PAGE 305
The cube root of the elements of a vector vsCbrt, vdCbrt Name vsCbrt, vdCbrt The cube root of the elements of a vector Purpose vsCbrt and vdCbrt calculate the cube root function of vector elements: y[i] = cbrt(a[i]), for 0 <= i <= n-1 Usage VMATH: #include void vsCbrt (int n, const float a[ ], float y [ ] ) ; void vdCbrt (int n, const double a [ ], double y [ ] ) ; VMATH8: #include
PAGE 306
vsCos, vdCos The cosine of the elements of a vector Name vsCos, vdCos The cosine of the elements of a vector Purpose vsCos and vdCos calculate the cosine function of vector elements: y[i] = cos(a[i]), for 0 <= i <= n-1 Usage VMATH: #include void vsCos (int n, const float a[ ], float y [ ] ) ; void vdCos (int n, const double a [ ], double y [ ] ) ; VMATH8: #include
PAGE 307
The hyperbolic cosine of the elements of a vector vsCosh, vdCosh Name vsCosh, vdCosh The hyperbolic cosine of the elements of a vector Purpose vsCosh and vdCosh calculate the hyperbolic cosine function of vector elements: y[i] = cosh(a[i]), for 0 <= i <= n-1 Usage VMATH: #include void vsCosh (int n, const float a[ ], float y [ ] ) ; void vdCosh (int n, const double a [ ], double y [ ] ) ; VMATH8: #include
PAGE 308
vsDiv, vdDiv The quotient of corresponding elements of two vectors Name vsDiv, vdDiv The quotient of corresponding elements of two vectors Purpose vsDiv and vdDiv calculate the quotient of vector element pairs: y[i] = a[i]/b[i], for 0 <= i <= n-1 Usage VMATH: #include void vsDiv (int n, const float a[ ], const float b [ ], float y [ ] ) ; void vdDiv (int n, const double a [ ], const double b [ ], double y [ ] ) ; VMATH8: #include
PAGE 309
The error function of the elements of a vector vsErf, vdErf Name vsErf, vdErf The error function of the elements of a vector Purpose vsErf and vdErf calculate the error function of vector elements: y[i] = erf(a[i]), for 0 <= i <= n-1 Usage VMATH: #include void vsErf (int n, const float a[ ], float y [ ] ) ; void vdErf (int n, const double a [ ], double y [ ] ) ; VMATH8: #include
PAGE 310
vsErfc, vdErfc The complimentary error function of the elements of a vector Name vsErfc, vdErfc The complimentary error function of the elements of a vector Purpose vsErfc and vdErfc calculate the complimentary error function of vector elements: y[i] = 1.0 - erfc(a[i]), for 0 <= i <= n-1 Usage VMATH: #include void vsErfc (int n, const float a[ ], float y [ ] ) ; void vdErfc (int n, const double a [ ], double y [ ] ) ; VMATH8: #include
PAGE 311
The base-e exponential of the elements of a vector vsExp, vdExp Name vsExp, vdExp The base-e exponential of the elements of a vector Purpose vsExp and vdExp calculate the base-e exponential function of vector elements: y[i] = exp(a[i]), for 0 <= i <= n-1 Usage VMATH: #include void vsExp (int n, const float a[ ], float y [ ] ) ; void vdExp (int n, const double a [ ], double y [ ] ) ; VMATH8: #include
PAGE 312
vsInv, vdInv The reciprocal of the elements of a vector Name vsInv, vdInv The reciprocal of the elements of a vector Purpose vsInv and vdInv calculate the reciprocal function of vector elements: y[i] = 1.0/a[i], for 0 <= i <= n-1 Usage VMATH: #include void vsInv (int n, const float a[ ], float y [ ] ) ; void vdInv (int n, const double a [ ], double y [ ] ) ; VMATH8: #include
PAGE 313
The reciprocal cube root of the elements of a vector vsInvCbrt, vdInvCbrt Name vsInvCbrt, vdInvCbrt The reciprocal cube root of the elements of a vector Purpose vsInvCbrt and vdInvCbrt calculate the reciprocal cube root function of vector elements: y[i] = 1.0/cbrt(a[i]), for 0 <= i <= n-1 Usage VMATH: #include void vsInvCbrt (int n, const float a[ ], float y [ ] ) ; void vdInvCbrt (int n, const double a [ ], double y [ ] ) ; VMATH8: #include
PAGE 314
vsInvSqrt, vdInvSqrt The reciprocal square root of the elements of a vector Name vsInvSqrt, vdInvSqrt The reciprocal square root of the elements of a vector Purpose vsInvSqrt and vdInvSqrt calculate the reciprocal square root function of vector elements: y[i] = 1.0/sqrt(a[i]), for 0 <= i <= n-1 Usage VMATH: #include void vsInvSqrt (int n, const float a[ ], float y [ ] ) ; void vdInvSqrt (int n, const double a [ ], double y [ ] ) ; VMATH8: #include
PAGE 315
The base-e logarithm of the elements of a vector vsLn, vdLn Name vsLn, vdLn The base-e logarithm of the elements of a vector Purpose vsLn and vdLn calculate the base-e logarithm function of vector elements: y[i] = log(a[i]), for 0 <= i <= n-1 Usage VMATH: #include void vsLn (int n, const float a[ ], float y [ ] ) ; void vdLn (int n, const double a [ ], double y [ ] ) ; VMATH8: #include
PAGE 316
vsLog10, vdLog10 The base-10 logarithm of the elements of a vector Name vsLog10, vdLog10 The base-10 logarithm of the elements of a vector Purpose vsLog10 and vdLog10 calculate the base-10 logarithm function of vector elements: y[i] = log10(a[i]), for 0 <= i <= n-1 Usage VMATH: #include void vsLog10 (int n, const float a[ ], float y [ ] ) ; void vdLog10 (int n, const double a [ ], double y [ ] ) ; VMATH8: #include
PAGE 317
Pack (gather) a vector with specified indexing into a contiguous vector vdPackV, vdPackM vsPackI, vsPackV, vsPackM, vdPackI, Name vsPackI, vsPackV, vsPackM, vdPackI, vdPackV, vdPackM Pack (gather) a vector with specified indexing into a contiguous vector Purpose vsPackI and vdPackI pack the vector of n elements stored (forward) with increment inca in array a into a contiguous vector in array y: y[i] = a[i*inca], for i = 0 to n-1 vsPackV and vdPackV pack the vector of elements in array a, with index vec
PAGE 318
vsPackI, vsPackV, vsPackM, vdPackI, vdPackV, vdPackM contiguous vector inca ia ma Output y 830 HP MLIB User’s Guide Pack (gather) a vector with specified indexing into a 1+(n-1)*inca for vsPackI and vdPackI 1 plus the maximum index ia[i] (i=0 to n-1) for vsPackV and vdPackV n for vsPackM and vdPackM. Index increment for vsPackI and vdPackI. Array containing the index vector for vsPackV and vdPackV. Length of ia is at least n. Array containing the mask vector for vsPackM and vdPackM.
PAGE 319
The power function of corresponding elements of two vectors vsPow, vdPow Name vsPow, vdPow The power function of corresponding elements of two vectors Purpose vsPow and vdPow calculate the power function of vector element pairs: y[i] = pow(a[i],b[i]), for 0 <= i <= n-1 Usage VMATH: #include void vsPow (int n, const float a[ ], const float b [ ], float y [ ] ) ; void vdPow (int n, const double a [ ], const double b [ ], double y [ ] ) ; VMATH8: #include
PAGE 320
vsPowx, vdPowx Elements of a vector raised to a scalar power Name vsPowx, vdPowx Elements of a vector raised to a scalar power Purpose vsPowx and vdPowx raise vector elements to a scalar power: y[i] = powx(a[i],b[i]), for 0 <= i <= n-1 Usage VMATH: #include void vsPowx (int n, const float a[ ], float b [ ], float y [ ] ) ; void vdPowx (int n, const double a [ ], double b [ ], double y [ ] ) ; VMATH8: #include
PAGE 321
The sine of the elements of a vector vsSin, vdSin Name vsSin, vdSin The sine of the elements of a vector Purpose vsSin and vdSin calculate the sine function of vector elements: y[i] = sin(a[i]), for 0 <= i <= n-1 Usage VMATH: #include void vsSin (int n, const float a[ ], float y [ ] ) ; void vdSin (int n, const double a [ ], double y [ ] ) ; VMATH8: #include
PAGE 322
vsSinCos, vdSinCos Both sine and cosine of elements of a vector Name vsSinCos, vdSinCos Both sine and cosine of elements of a vector Purpose vsSinCos and vdSinCos calculate both the sine and cosine of vector elements: s[i] = sin(a[i]) and c[i] = cos(a[i]), for 0 <= i <= n-1 Usage VMATH: #include void vsSinCos (int n, const float a[ ], float s [ ], float c [ ] ) ; void vdSinCos (int n, const double a [ ], double s [ ], double c [ ] ) ; VMATH8: #include
PAGE 323
The hyperbolic sine of the elements of a vector vsSinh, vdSinh Name vsSinh, vdSinh The hyperbolic sine of the elements of a vector Purpose vsSinh and vdSinh calculate the hyperbolic sine function of vector elements: y[i] = sinh(a[i]), for 0 <= i <= n-1 Usage VMATH: #include void vsSinh (int n, const float a[ ], float y [ ] ) ; void vdSinh (int n, const double a [ ], double y [ ] ) ; VMATH8: #include
PAGE 324
vsSqrt, vdSqrt The square root of the elements of a vector Name vsSqrt, vdSqrt The square root of the elements of a vector Purpose vsSqrt and vdSqrt calculate the square root function of vector elements: y[i] = sqrt(a[i]), for 0 <= i <= n-1 Usage VMATH: #include void vsSqrt (int n, const float a[ ], float y [ ] ) ; void vdSqrt (int n, const double a [ ], double y [ ] ) ; VMATH8: #include
PAGE 325
The tangent of the elements of a vector vsTan, vdTan Name vsTan, vdTan The tangent of the elements of a vector Purpose vsTan and vdTan calculate the tangent function of vector elements: y[i] = tan(a[i]), for 0 <= i <= n-1 Usage VMATH: #include void vsTan (int n, const float a[ ], float y [ ] ) ; void vdTan (int n, const double a [ ], double y [ ] ) ; VMATH8: #include
PAGE 326
vsTanh, vdTanh The hyperbolic tangent of the elements of a vector Name vsTanh, vdTanh The hyperbolic tangent of the elements of a vector Purpose vsTanh and vdTanh calculate the hyperbolic tangent function of vector elements: y[i] = tanh(a[i]), for 0 <= i <= n-1 Usage VMATH: #include void vsTanh (int n, const float a[ ], float y [ ] ) ; void vdTanh (int n, const double a [ ], double y [ ] ) ; VMATH8: #include
PAGE 327
Unpack (scatter) a contiguous vector to a vector with specified indexing vdUnpackI, vdUnpackV, vdUnpackM vsUnpackI, vsUnpackV, vsUnpackM, Name vsUnpackI, vsUnpackV, vsUnpackM, vdUnpackI, vdUnpackV, vdUnpackM Unpack (scatter) a contiguous vector to a vector with specified indexing Purpose vsUnpackI and vdUnpackI unpack a contiguous vector of n elements in array a into a vector with increment incy in array y: y[i*incy] = a[i], for i = 0 to n-1 vsUnpackV and vdUnpackV unpack a contiguous vector of n eleme
PAGE 328
vsUnpackI, vsUnpackV, vsUnpackM, vdUnpackI, vdUnpackV, vdUnpackM Unpack (scatter) a contiguous vector to a vector with specified indexing incy iy my Output y 840 HP MLIB User’s Guide (at most n) in the mask vector for svUnpackM and vdUnpackM. Index increment for vsUnpackI and vdUnpackI. Array containing the index vector for vsUnpackV and vdUnpackV. Length of iy is at least n. Array containing the mask vector for vsUnpackM and vdUnpackM. Length of my is at least n. Array receiving the output vector.
PAGE 329
The arccosine of the elements of a vector vsacos, vdacos Name vsacos, vdacos The arccosine of the elements of a vector Purpose vsacos and vdacos calculate the arccosine function of vector elements: y(i) = acos(a(i)), for 1 <= i <= n Usage The Fortran include file math.fi declares interfaces for these subroutines, suitable for both VMATH and VMATH8. The interfaces are compatible with Intel’s MKL VML (Vector Math Library).
PAGE 330
vsacosh, vdacosh The arc hyperbolic cosine of the elements of a vector Name vsacosh, vdacosh The arc hyperbolic cosine of the elements of a vector Purpose vsacosh and vdacosh calculate the arc hyperbolic cosine function of vector elements: y(i) = acosh(a(i)), for 1 <= i <= n Usage The Fortran include file math.fi declares interfaces for these subroutines, suitable for both VMATH and VMATH8. The interfaces are compatible with Intel’s MKL VML (Vector Math Library).
PAGE 331
The arcsine of the elements of a vector vsasin, vdasin Name vsasin, vdasin The arcsine of the elements of a vector Purpose vsasin and vdasin calculate the arcsine function of vector elements: y(i) = asin(a(i)), for 1 <= i <= n Usage The Fortran include file math.fi declares interfaces for these subroutines, suitable for both VMATH and VMATH8. The interfaces are compatible with Intel’s MKL VML (Vector Math Library).
PAGE 332
vsasinh, vdasinh The arc hyperbolic sine of the elements of a vector Name vsasinh, vdasinh The arc hyperbolic sine of the elements of a vector Purpose vsasinh and vdasinh calculate the arc hyperbolic sine function of vector elements: y(i) = asinh(a(i)), for 1 <= i <= n Usage The Fortran include file math.fi declares interfaces for these subroutines, suitable for both VMATH and VMATH8. The interfaces are compatible with Intel’s MKL VML (Vector Math Library).
PAGE 333
The arctangent of the elements of a vector vsatan, vdatan Name vsatan, vdatan The arctangent of the elements of a vector Purpose vsatan and vdatan calculate the arctangent function of vector elements: y(i) = atan(a(i)), for 1 <= i <= n Usage The Fortran include file math.fi declares interfaces for these subroutines, suitable for both VMATH and VMATH8. The interfaces are compatible with Intel’s MKL VML (Vector Math Library).
PAGE 334
vsatan2, vdatan2 The arctangent and quandrant function of the corresponding elements of two vectors Name vsatan2, vdatan2 The arctangent and quandrant function of the corresponding elements of two vectors Purpose vsatan2 and vdatan2 calculate the arctangent function of the quotient of vector element pairs a(i)/b(i), in the range -Pi to Pi, using the signs of both arguments to determine the quadrant of the result value: y(i) = atan2(a(i), b(i)), for 1 <= i <= n Usage The Fortran include file math.
PAGE 335
The arc hyperbolic tangent of the elements of a vector vsatanh, vdatanh Name vsatanh, vdatanh The arc hyperbolic tangent of the elements of a vector Purpose vsatanh and vdatanh calculate the arc hyperbolic tangent function of vector elements: y(i) = atanh(a(i)), for 1 <= i <= n Usage The Fortran include file math.fi declares interfaces for these subroutines, suitable for both VMATH and VMATH8. The interfaces are compatible with Intel’s MKL VML (Vector Math Library).
PAGE 336
vscbrt, vdcbrt The cube root of the elements of a vector Name vscbrt, vdcbrt The cube root of the elements of a vector Purpose vscbrt and vdcbrt calculate the cube root of vector elements: y(i) = cbrt(a(i)), for 1 <= i <= n Usage The Fortran include file math.fi declares interfaces for these subroutines, suitable for both VMATH and VMATH8. The interfaces are compatible with Intel’s MKL VML (Vector Math Library).
PAGE 337
The cosine of the elements of a vector vscos, vdcos Name vscos, vdcos The cosine of the elements of a vector Purpose vscos and vdcos calculate the cosine function of vector elements (specified in radians): y(i) = cos(a(i)), for 1 <= i <= n Usage The Fortran include file math.fi declares interfaces for these subroutines, suitable for both VMATH and VMATH8. The interfaces are compatible with Intel’s MKL VML (Vector Math Library).
PAGE 338
vscosh, vdcosh The hyperbolic cosine of the elements of a vector Name vscosh, vdcosh The hyperbolic cosine of the elements of a vector Purpose vscosh and vdcosh calculate the hyperbolic cosine function of vector elements: y(i) = cosh(a(i)), for 1 <= i <= n Usage The Fortran include file math.fi declares interfaces for these subroutines, suitable for both VMATH and VMATH8. The interfaces are compatible with Intel’s MKL VML (Vector Math Library).
PAGE 339
The quotient of corresponding elements of two vectors vsdiv, vddiv Name vsdiv, vddiv The quotient of corresponding elements of two vectors Purpose vsdiv and vddiv calculate the quotient of vector element pairs: y(i) = a(i)/b(i), for 1 <= i <= n Usage The Fortran include file math.fi declares interfaces for these subroutines, suitable for both VMATH and VMATH8. The interfaces are compatible with Intel’s MKL VML (Vector Math Library).
PAGE 340
vserf, vderf The error function of the elements of a vector Name vserf, vderf The error function of the elements of a vector Purpose vserf and vderf calculate the error function of vector elements: y(i) = erf(a(i)), for 1 <= i <= n Usage The Fortran include file math.fi declares interfaces for these subroutines, suitable for both VMATH and VMATH8. The interfaces are compatible with Intel’s MKL VML (Vector Math Library).
PAGE 341
The complimentary error function of the elements of a vector vserfc, vderfc Name vserfc, vderfc The complimentary error function of the elements of a vector Purpose vserfc and vderfc calculate the complimantary error function of vector elements: y(i) = 1.0 - erfc(a(i)), for 1 <= i <= n Usage The Fortran include file math.fi declares interfaces for these subroutines, suitable for both VMATH and VMATH8. The interfaces are compatible with Intel’s MKL VML (Vector Math Library).
PAGE 342
vsexp, vdexp The base-e exponential of the elements of a vector Name vsexp, vdexp The base-e exponential of the elements of a vector Purpose vsexp and vdexp calculate the base-e exponential function of vector elements: y(i) = exp(a(i)), for 1 <= i <= n Usage The Fortran include file math.fi declares interfaces for these subroutines, suitable for both VMATH and VMATH8. The interfaces are compatible with Intel’s MKL VML (Vector Math Library).
PAGE 343
The sine of the elements of a vector, with variable increments vsin Name vsin The sine of the elements of a vector, with variable increments Purpose vsin calculates the sine function of vector elements: y(1+(i-1)*incy) = sin(x(1+(i-1)*incx)), for i = 1 to n Usage The Fortran include file math.fi declares an interface for this subroutine, suitable for both VMATH and VMATH8. The interface is compatible with Tru64 CXML (Compaq Extended Math Library). The C include files vmath.h (for VMATH) and vmath8.
PAGE 344
vsinv, vdinv The reciprocal of the elements of a vector Name vsinv, vdinv The reciprocal of the elements of a vector Purpose vsinv and vdinv calculate the reciprocal of vector elements: y(i) = 1.0/a(i), for 1 <= i <= n Usage The Fortran include file math.fi declares interfaces for these subroutines, suitable for both VMATH and VMATH8. The interfaces are compatible with Intel’s MKL VML (Vector Math Library).
PAGE 345
The reciprocal cube root of the elements of a vector vsinvcbrt, vdinvcbrt Name vsinvcbrt, vdinvcbrt The reciprocal cube root of the elements of a vector Purpose vsinvcbrt and vdinvcbrt calculate the reciprocal cube root of vector elements: y(i) = 1.0/cbrt(a(i)), for 1 <= i <= n Usage The Fortran include file math.fi declares interfaces for these subroutines, suitable for both VMATH and VMATH8. The interfaces are compatible with Intel’s MKL VML (Vector Math Library).
PAGE 346
vsinvsqrt, vdinvsqrt The reciprocal square root of the elements of a vector Name vsinvsqrt, vdinvsqrt The reciprocal square root of the elements of a vector Purpose vsinvsqrt and vdinvsqrt calculate the reciprocal square root of vector elements: y(i) = 1.0/sqrt(a(i)), for 1 <= i <= n Usage The Fortran include file math.fi declares interfaces for these subroutines, suitable for both VMATH and VMATH8. The interfaces are compatible with Intel’s MKL VML (Vector Math Library).
PAGE 347
The base-e logarithm of the elements of a vector vsln, vdln Name vsln, vdln The base-e logarithm of the elements of a vector Purpose vsln and vdln calculate the base-e logarithm function of vector elements: y(i) = log(a(i)), for 1 <= i <= n Usage The Fortran include file math.fi declares interfaces for these subroutines, suitable for both VMATH and VMATH8. The interfaces are compatible with Intel’s MKL VML (Vector Math Library).
PAGE 348
vslog10, vdlog10 The base-10 logarithm of the elements of a vector Name vslog10, vdlog10 The base-10 logarithm of the elements of a vector Purpose vslog10 and vdlog10 calculate the base-10 logarithm function of vector elements: y(i) = log10(a(i)), for 1 <= i <= n Usage The Fortran include file math.fi declares interfaces for these subroutines, suitable for both VMATH and VMATH8. The interfaces are compatible with Intel’s MKL VML (Vector Math Library).
PAGE 349
Pack (gather) a vector with specified indexing into a contiguous vector vdpackv, vdpackm vspacki, vspackv, vspackm, vdpacki, Name vspacki, vspackv, vspackm, vdpacki, vdpackv, vdpackm Pack (gather) a vector with specified indexing into a contiguous vector Purpose vspacki and vdpacki pack the vector of n elements stored (forward) with increment inca in array a into a contiguous vector in array y: y(i) = a(1+(i-1)*inca), for 1 = 1 to n vspackv and vdpackv pack the vector of elements in array a, with index
PAGE 350
vspacki, vspackv, vspackm, vdpacki, vdpackv, vdpackm contiguous vector Pack (gather) a vector with specified indexing into a SUBROUTINE vdpacki(n, a, inca, y) INTEGER, INTENT(IN) :: n, inca REAL (KIND=8), DIMENSION(n), INTENT(IN) :: a REAL (KIND=8), DIMENSION(n), INTENT(OUT):: y SUBROUTINE vdpackv(n, a, ia, y) INTEGER, INTENT(IN) :: n REAL (KIND=8), DIMENSION(n), INTENT(IN) :: a INTEGER, DIMENSION(n), INTENT(IN) :: ia REAL (KIND=8), DIMENSION(n), INTENT(OUT):: y SUBROUTINE vdpackm(n, a, ma, y) INTEGER,
PAGE 351
The exponentiation operation on corresponding elements of two vectors vspow, vdpow Name vspow, vdpow The exponentiation operation on corresponding elements of two vectors Purpose vspow and vdpow calculate the exponentiation operation on vector element pairs: y(i) = a(i)**b(i), for 1 <= i <= n Usage The Fortran include file math.fi declares interfaces for these subroutines, suitable for both VMATH and VMATH8. The interfaces are compatible with Intel’s MKL VML (Vector Math Library).
PAGE 352
vspowx, vdpowx Elements of a vector raised to a scalar power Name vspowx, vdpowx Elements of a vector raised to a scalar power Purpose vspowx and vdpowx calculate vector elements raised to a scalar power: y(i) = a(i)**b, for 1 <= i <= n Usage The Fortran include file math.fi declares interfaces for these subroutines, suitable for both VMATH and VMATH8. The interfaces are compatible with Intel’s MKL VML (Vector Math Library).
PAGE 353
The square root of the elements of a vector, with variable increments vsqrt Name vsqrt The square root of the elements of a vector, with variable increments Purpose vsqrt calculates the square root function of vector elements: y(1+(i-1)*incy) = sqrt(x(1+(i-1)*incx)), for i = 1 to n Usage The Fortran include file math.fi declares an interface for this subroutine, suitable for both VMATH and VMATH8. The interface is compatible with Tru64 CXML (Compaq Extended Math Library). The C include files vmath.
PAGE 354
vssincos, vdsincos Both sine and cosine of the elements of a vector Name vssincos, vdsincos Both sine and cosine of the elements of a vector Purpose vssincos and vdsincos calculate both the sine and cosine of vector elements: s(i) = sin(a(i)) and c(i) = cos(a(i)), for 1 <= i <= n Usage The Fortran include file math.fi declares interfaces for these subroutines, suitable for both VMATH and VMATH8. The interfaces are compatible with Intel’s MKL VML (Vector Math Library).
PAGE 355
The hyperbolic sine of the elements of a vector vssinh, vdsinh Name vssinh, vdsinh The hyperbolic sine of the elements of a vector Purpose vssinh and vdsinh calculate the hyperbolic sine of the elements of a vector: y(i) = sinh(a(i)), for 1 <= i <= n Usage The Fortran include file math.fi declares interfaces for these subroutines, suitable for both VMATH and VMATH8. The interfaces are compatible with Intel’s MKL VML (Vector Math Library).
PAGE 356
vssqrt, vdsqrt The square root of the elements of a vector Name vssqrt, vdsqrt The square root of the elements of a vector Purpose vssqrt and vdsqrt calculate the square root of vector elements: y(i) = sqrt(a(i)), for 1 <= i <= n Usage The Fortran include file math.fi declares interfaces for these subroutines, suitable for both VMATH and VMATH8. The interfaces are compatible with Intel’s MKL VML (Vector Math Library).
PAGE 357
The tangent of the elements of a vector vstan, vdtan Name vstan, vdtan The tangent of the elements of a vector Purpose vstan and vdtan calculate the tangent function of vector elements (specified in radians): y(i) = tan(a(i)), for 1 <= i <= n Usage The Fortran include file math.fi declares interfaces for these subroutines, suitable for both VMATH and VMATH8. The interfaces are compatible with Intel’s MKL VML (Vector Math Library).
PAGE 358
vstanh, vdtanh The hyperbolic tangent of the elements of a vector Name vstanh, vdtanh The hyperbolic tangent of the elements of a vector Purpose vstanh and vdtanh calculate the hyperbolic tangent function of vector elements: y(i) = tanh(a(i)), for 1 <= i <= n Usage The Fortran include file math.fi declares interfaces for these subroutines, suitable for both VMATH and VMATH8. The interfaces are compatible with Intel’s MKL VML (Vector Math Library).
PAGE 359
Unpack (scatter) a contiguous vector to a vector with specified indexing vdunpacki, vdunpackv, vdunpackm vsunpacki, vsunpackv, vsunpackm, Name vsunpacki, vsunpackv, vsunpackm, vdunpacki, vdunpackv, vdunpackm Unpack (scatter) a contiguous vector to a vector with specified indexing Purpose vsunpacki and vdunpacki unpack a contiguous vector of n elements in array a into a vector with increment incy in array y: y(1+(i-1)*incy) = a(i), for i = 1 to n vsunpackv and vdunpackv unpack a contiguous vector of n e
PAGE 360
vsunpacki, vsunpackv, vsunpackm, vdunpacki, vdunpackv, vdunpackmUnpack (scatter) a contiguous vector to a vector with specified indexing SUBROUTINE vdunpacki(n, a, y, incy) INTEGER, INTENT(IN) :: n, incy REAL (KIND=8), DIMENSION(n), INTENT(IN) :: a REAL (KIND=8), DIMENSION(n), INTENT(OUT):: y SUBROUTINE vdunpackv(n, a, y, iy) INTEGER, INTENT(IN) :: n REAL (KIND=8), DIMENSION(n), INTENT(IN) :: a REAL (KIND=8), DIMENSION(n), INTENT(OUT):: y INTEGER, DIMENSION(n), INTENT(IN) :: iy SUBROUTINE vdunpackm(n, a,
PAGE 361
Part 6 HP SOLVERS
PAGE 362
PAGE 363
Overview 13 Sparse Linear Equations Overview This chapter describes state-of-the-art software for the direct solution of sparse systems of linear equations with symmetric coefficient matrices or unsymmetric coefficient matrices that are structurally symmetric. Throughout this chapter, a system of linear equations is called “symmetric” if its coefficient matrix is both symmetric in structure and in value. “Symmetric in value” or simply “symmetric” means that a(i,j) = a(j,i) for every i and j.
PAGE 364
Chapter objectives Chapter objectives After you read this chapter you will: • Understand what sparse systems are • Understand how to use these subprograms to solve linear systems and to estimate condition numbers • Understand issues in choosing an optimal method for a specific problem This sparse matrix linear equation software makes it possible to call a single subprogram to solve a single system of sparse linear equations. However, this requires a particular format for the sparse matrix.
PAGE 365
What you need to know to use these subprograms George, J.A. and J.W. Liu. Computer Solution of Large Sparse Positive Definite Systems. Englewood Cliffs, NJ: Prentice-Hall, Inc. 1981. George, A.; Liu, J. W. H. The Evolution of the Minimum Degree Ordering Algorithm. SIAM Review, Vol. 31, pp1-19, 1989. Heath, M.T., E. Ng, and B.W. Peyton. “Parallel Algorithms for Sparse Linear Systems.” SIAM Review. September, 1991. Vol. 33, No. 3. pp. 420-460. Karypis, G.; Kumar, V.
PAGE 366
What you need to know to use these subprograms Sparsity and storage formats Sparse matrices are matrices in which most of the entries are zero. The goal of sparse matrix software is to take maximum advantage of these zero entries to reduce storage and arithmetic. Storage is reduced by not storing zero entries; arithmetic is reduced by not performing operations on entries that are known to be zero. It is easiest to see how to economize on storage. Suppose that an n-by-n matrix A has only nz nonzero entries.
PAGE 367
What you need to know to use these subprograms If the entries are required to be ordered by row and column, less storage is needed. In addition, symmetry in the matrix can be used by storing only the entries, for example, on or below the main diagonal. Other contexts, such as finite element analysis, can make even more concise representations of the locations of the nonzeros.
PAGE 368
What you need to know to use these subprograms Figure 13-2 Column Pointer, Row Index Sparse Matrix Representation for the Full Matrix colstr = 1 rowind = 1 mxvalu = 11 31 41 22 32 52 13 23 33 53 14 44 54 25 35 45 55 66 Figure 13-3 4 3 4 7 2 3 11 5 1 14 2 3 5 18 1 4 19 5 2 3 4 5 6 - Column Pointer, Row Index Sparse Matrix Representation for the Lower Triangle of the Matrix colstr = 1 rowind = 1 3 4 2 3 5 3 5 4 5 5 6 mxvalu = 11 31 41 22 32 52 33 53 44
PAGE 369
What you need to know to use these subprograms Direct versus iterative solution This package is a direct linear equation solver. That is, it computes an explicit factorization of the matrix in a sparse analogy of the dense matrix subprograms DSYTRF/DSYTRS and DPOTRF/DPOTRS in LAPACK. There are applications where special, factorization-free algorithms can be used effectively. Algorithms appropriate for these cases iterate toward a solution, improving an approximate solution at each iteration.
PAGE 370
What you need to know to use these subprograms process is recovered many times over by the reduction obtained in the cost of the factorization. In some applications, it is common to have several unknowns associated with each grid point of the solution domain. Sparse matrices derived from these applications tend to have some special properties in their structure. A technique known as compression can often improve the effectiveness of the minimum degree reordering.
PAGE 371
What you need to know to use these subprograms In the dense case, pivoting causes no difficulties. In the sparse case, pivoting can interfere with the permutation chosen to reduce fill. This conflict is dealt with in two ways in this sparse package. First, the effect of pivoting is localized in its modification of the sparsity reordering. Second, a pivoting tolerance is incorporated that allows the user to fine-tune the balance between sparsity and guarantees of stability.
PAGE 372
What you need to know to use these subprograms Output controls This package differs from most library subroutines in providing optional printed or printable output. The amount of output is controlled by an integer variable, msglvl, specifying the message level. Setting msglvl ≤ 0 suppresses all printed messages, including error messages, and should generally be avoided. When msglvl = 1, some runtime statistics and any error messages are printed.
PAGE 373
What you need to know to use these subprograms can be used to save multiple structures or factorizations, or to interrupt the package at any point.
PAGE 374
Sample program Figure 13-5 Paths of Control (continued) Utilities: DSLEDA DSLEFF DSLEOC DSLEOP DSLEPS DSLERD Sample program As illustrated in Figure 13-4 and Figure 13-5, there are many possible paths of control through this sparse matrix package. The following sample programs are provided to show one possible path through the package. It is intended to demonstrate that the package is not as difficult to use as Figure 13-4 and Figure 13-5 may imply.
PAGE 375
Sample program Sample program for a symmetric matrix: INTEGER*4 I,J,K,NEQNS,IROW(NNZERO),JCOL(NNZERO),INRTIA(3),IER REAL*8 RCOND,GLOBAL(150),MXVALU(NNZERO),PVTTOL,VALUE C C C ... INITIALIZE THE SPARSE MATRIX PACKAGE CALL DSLEIN (NEQNS,1,6,GLOBAL,IER) IF ( IER .NE. 0 ) GO TO 8000 C C C ... INPUT THE MATRIX STRUCTURE (LOWER TRIANGULAR) DO 100 K = 1, NNZERO I = IROW(K) J = JCOL(K) CALL DSLEI1 (I,J,GLOBAL,IER) IF ( IER .NE. 0 ) GO TO 8000 100 CONTINUE CALL DSLEIF (GLOBAL,IER) IF ( IER .NE.
PAGE 376
Sample program C C C ... USE THE SOLUTION STORED IN ARRAY RHS . . . C C C ... ERROR TRAP 8000 .......... Sample program for a nonsymmetric matrix: INTEGER*4 I,J,K,NEQNS,IROW(NNZERO),JCOL(NNZERO),INRTIA(3),IER REAL*8 GLOBAL(150),MXVALU(NNZERO),PVTTOL,VALUE C C C ... INITIALIZE THE SPARSE MATRIX PACKAGE CALL DSLEIN (NEQNS,1,6,GLOBAL,IER) IF ( IER .NE. 0 ) GO TO 8000 CALL DSLEMA ("SU", GLOBAL, IER) IF (IER .NE. 0) GO TO 8000 C C C ...
PAGE 377
Sample program C C C ... FACTOR THE MATRIX PVTTOL = 0.0D0 CALL DSLEFA (PVTTOL,INRTIA,GLOBAL,IER) IF ( IER .NE. 0 ) GO TO 8000 C C C ... SOLVE FOR A GIVEN RIGHT HAND SIDE CALL DSLESL (1,RHS,NEQNS,GLOBAL,IER) IF ( IER .NE. 0 ) GO TO 8000 C C C ... USE THE SOLUTION STORED IN ARRAY RHS . . . C C C ... ERROR TRAP 8000 ..........
PAGE 378
Subprograms for sparse linear equations Subprograms for sparse linear equations The following sections describe subprograms included in VECLIB for the direct solution of sparse symmetric or nonsymmetric but structurally symmetric linear equations.
PAGE 379
Numeric factorization and condition number estimation Name DSLECO DSLECO Numeric factorization and condition number estimation Purpose This subprogram computes the numeric factorization and condition number estimate of the sparse symmetric matrix input to the package. The condition number estimate is not implemented for nonsymmetric matrices. NOTE Matrix structure input by matrix supports five data types. Matrix structure input by elements, columns, and finite elements supports only REAL*8 precision.
PAGE 380
DSLECO Numeric factorization and condition number estimation INTEGER*8 inrtia(3), ier REAL*8 pvttol, rcond REAL*8 global(150) CALL DSLECO(pvttol, rcond, inrtia, global, ier) INTEGER*8 inrtia(3), ier REAL*16 pvttol, rcond REAL*8 global(150) CALL QSLECO(pvttol, rcond, inrtia, global, ier) INTEGER*8 inrtia(3), ier COMPLEX*8 pvttol, rcond REAL*8 global(150) CALL CSLECO(pvttol, rcond, inrtia, global, ier) INTEGER*8 inrtia(3), ier COMPLEX*16 pvttol, rcond REAL*8 global(150) CALL ZSLECO(pvttol, rcond, inrtia, gl
PAGE 381
Numeric factorization and condition number estimation ier = −501 ier = −502 ier = −503 Example DSLECO Error in dynamic storage allocation during factorization. Error in dynamic storage allocation during factorization. Exactly zero pivot encountered during factorization; matrix is singular. Factor the matrix input to the package and estimate its condition number. Use a pivot tolerance of 0.1. INTEGER*4 INRTIA(3),IER REAL*8 RCOND,GLOBAL(150) CALL DSLECO (0.1D0,RCOND,INRTIA,GLOBAL,IER) IF ( IER .NE.
PAGE 382
DSLEDA Name Deallocate working storage DSLEDA Deallocate working storage Purpose Deallocate working storage at the end of processing. If the program using the package is to continue execution after use of the package is completed, it is recommended that the user deallocate the dynamically-allocated working storage with subroutine DSLEDA. NOTE Matrix structure input by matrix supports five data types. Matrix structure input by elements, columns, and finite elements supports only REAL*8 precision.
PAGE 383
Deallocate working storage DSLEDA REAL*8 global(150) CALL QSLEDA(global, ier) INTEGER*8 ier REAL*8 global(150) CALL CSLEDA(global, ier) INTEGER*8 ier REAL*8 global(150) CALL ZSLEDA(global, ier) Updated global Global communications array for this problem. This array must be passed, untouched by the user, to successive subroutines in this package. Output ier Status response: ier = 0 Normal return. ier = −200 No system exists for this global communications array.
PAGE 384
DSLEFA Name Numeric factorization DSLEFA Numeric factorization Purpose This subprogram computes the numeric factorization of the sparse symmetric matrix input to the package. No condition number estimation is performed. NOTE Matrix structure input by matrix supports five data types. Matrix structure input by elements, columns, and finite elements supports only REAL*8 precision.
PAGE 385
Numeric factorization DSLEFA INTEGER*8 inrtia(3), ier REAL*8 pvttol REAL*8 global(150) CALL DSLEFA(pvttol, inrtia, global, ier) INTEGER*8 inrtia(3), ier REAL*16 pvttol REAL*8 global(150) CALL QSLEFA(pvttol, inrtia, global, ier) INTEGER*8 inrtia(3), ier COMPLEX*8 pvttol REAL*8 global(150) CALL CSLEFA(pvttol, inrtia, global, ier) INTEGER*8 inrtia(3), ier COMPLEX*16 pvttol REAL*8 global(150) CALL ZSLEFA(pvttol, inrtia, global, ier) Input pvttol Pivoting tolerance; 0 ≤ pvttol ≤ 1.
PAGE 386
DSLEFA Numeric factorization ier = −502 ier = −503 898 HP MLIB User’s Guide Error in dynamic storage allocation during factorization. Exactly zero pivot encountered during factorization; matrix is singular.
PAGE 387
Numeric factorization Example DSLEFA Factor the matrix input to the package. Use a pivot tolerance of 0.1. INTEGER*4 INRTIA(3),IER REAL*8 GLOBAL(150) CALL DSLEFA (0.1D0,INRTIA,GLOBAL,IER) IF ( IER .NE.
PAGE 388
DSLEFF Name Specify singularity treatment DSLEFF Specify singularity treatment Purpose This subprogram provides a mechanism to deal with singular (or nearly singular) matrices by specifying a course of action to follow when zero or small diagonal elements are encountered in the factorization during subsequent calls to DSLEFA or DSLECO that specify pvttol = 0. NOTE Matrix structure input by matrix supports five data types.
PAGE 389
Specify singularity treatment DSLEFF INTEGER*8 iabort, ier REAL*8 fudge, tol REAL*8 global(150) CALL DSLEFF(iabort, fudge, tol, global, ier) INTEGER*8 iabort, ier REAL*16 fudge, tol REAL*8 global(150) CALL QSLEFF(iabort, fudge, tol, global, ier) INTEGER*8 iabort, ier COMPLEX*8 fudge, tol REAL*8 global(150) CALL CSLEFF(iabort, fudge, tol, global, ier) INTEGER*8 iabort, ier COMPLEX*16 fudge, tol REAL*8 global(150) CALL ZSLEFF(iabort, fudge, tol, global, ier) Input iabort fudge Flag to specify when to te
PAGE 390
DSLEFF Specify singularity treatment tol The location of all of the fudged pivots can be determined if fudge ≤ tol by calling DSLERD to get the diagonal, and then searching it for values matching fudge. User-specified value of the tolerance. Those diagonal elements whose absolute values are less than tol during the numerical factorization are replaced by fudge. Default value is 0 if DSLEFF has not been called Updated global Global communications array for this problem.
PAGE 391
One-call usage DSLEFS Name DSLEFS One-call usage Purpose This subprogram provides one-call interface to the sparse linear equation solution package for a symmetric matrix based on the LDLT factorization. If this subroutine does not fit your needs, a more flexible interface is available by using the other subroutines in the package. NOTE Matrix structure input by matrix supports five data types. Matrix structure input by elements, columns, and finite elements supports only REAL*8 precision.
PAGE 392
DSLEFS One-call usage INTEGER*4 neqns, maxzer, msglvl, output, colstr(neqns+1), rowind(nnzero), nrhs, ldrhs, inrtia(3), ier COMPLEX*16 value(nnzero), rhs(ldrhs, nrhs) REAL*8 global(150) CALL ZSLEFS(neqns, maxzer, pvttol, msglvl, output, colstr, rowind, value, nrhs, rhs, ldrhs, rcond, inrtia, global, ier) VECLIB8 INTEGER*8 neqns, maxzer, msglvl, output, colstr(neqns+1), rowind(nnzero), nrhs, ldrhs, inrtia(3), ier REAL*4 pvttol, value(nnzero), rhs(ldrhs, nrhs), rcond REAL*8 global(150) CALL SSLEFS(neqns,
PAGE 393
One-call usage DSLEFS pvttol msglvl output colstr rowind value nrhs purpose of forming columns with identical structure to increase the efficiency of the factorization. If maxzer = 0, no additional fill is allowed. If many solution vectors are to be computed for each factorization, it is suggested that maxzer = 0 be used. If only a few solution vectors are to be computed maxzer = 10 could be tried but maxzer = 0 is perfectly acceptable.
PAGE 394
DSLEFS One-call usage ldrhs The leading dimension of array rhs as declared in the calling program unit, with ldrhs ≥ neqns. Updated rhs On input, rhs contains the nrhs right-hand sides. On output, rhs has been overwritten with the computed solutions. Output rcond Estimate of the reciprocal of the 1-norm condition number. Array of length 3 containing, respectively, the number of positive eigenvalues, the number of negative eigenvalues, and an indicator if there are zero eigenvalues.
PAGE 395
One-call usage DSLEFS Notes Calling DSLEFS is equivalent to calling DSLEIN, DSLEIM, DSLEOR, DSLEVM, DSLECO, and DSLESL in sequence. You can follow the call to DSLEFS with other calls to subprograms in the package to solve additional right-hand sides, to input new matrix values, to input a new matrix structure, or to perform utility functions. Example Solve the small (order 6) sparse system of linear equations where the matrix A and right-hand side b are A= 4.0 0.0 1.0 1.0 0.0 0.0 0.0 4.0 1.0 0.0 1.
PAGE 396
DSLEFS One-call usage INTEGER*4 COLSTR(7),ROWIND(13),INRTIA(3),IER REAL*8 VALUES(13),RHS(6),RCOND,GLOBAL(150) COLSTR(1) COLSTR(2) COLSTR(3) COLSTR(4) COLSTR(5) COLSTR(6) COLSTR(7) = = = = = = = 1 4 7 9 12 13 14 ROWIND(1) = 1 ROWIND(2) = 3 ROWIND(3) = 4 ROWIND(4) = 2 ROWIND(5) = 3 ROWIND(6) = 5 ROWIND(7) = 3 ROWIND(8) = 5 ROWIND(9) = 4 ROWIND(10) = 5 ROWIND(11) = 6 ROWIND(12) = 5 ROWIND(13) = 6 VALUES(1) = 4.0D0 VALUES(2) = 1.0D0 VALUES(3) = 1.0D0 VALUES(4) = 4.0D0 VALUES(5) = 1.0D0 VALUES(6) = 1.
PAGE 397
Matrix structure input by single entry DSLEI1 Name DSLEI1 Matrix structure input by single entry Purpose This subprogram adds a single entry in the (irow, jcol) position to the set of known nonzeros for the sparse matrix. NOTE Matrix structure input by matrix supports five data types. Matrix structure input by elements, columns, and finite elements supports only REAL*8 precision.
PAGE 398
DSLEI1 Matrix structure input by single entry Notes Calls to DSLEI1 and DSLEIE can be intermixed. DSLEIC and DSLEIM cannot be used if DSLEI1 or DSLEIE are used. Example Add the entry in row 3035, column 1024 to the list of nonzeros in the matrix. INTEGER*4 I,J,IER REAL*8 GLOBAL(150) I = 3035 J = 1024 CALL DSLEI1 (I,J,GLOBAL,IER) IF ( IER .NE.
PAGE 399
Matrix structure input by column DSLEIC Name DSLEIC Matrix structure input by column Purpose This subprogram adds a list of indices in a single column to the set of known nonzeros for the sparse matrix. NOTE Matrix structure input by matrix supports five data types. Matrix structure input by elements, columns, and finite elements supports only REAL*8 precision.
PAGE 400
DSLEIC Matrix structure input by column ier = 0 ier = −100 ier = −101 ier = −103 ier = −104 ier = −106 Notes Normal return. Incorrect processing path; DSLEIN not called or matrix input already finished. Error in dynamic storage allocation. Illegal value for nzcol. Illegal value for jcol, or out of order. Illegal value for at least one entry in jrowin or entries out of order. The entire matrix structure must be input with DSLEIC if it is used. Its use is not compatible with DSELE1, DSLEIE, or DSLEIM.
PAGE 401
Matrix structure input by column DSLEIC INTEGER*4 J,NZCOL,JROWIN(100),IER REAL*8 GLOBAL(150) J = 4519 NZCOL = 7 JROWIN(1) JROWIN(2) JROWIN(3) JROWIN(4) JROWIN(5) JROWIN(6) JROWIN(7) JROWIN(8) = = = = = = = = 1 2735 4519 4520 4521 6000 6002 6004 CALL DSLEIC (J,NZCOL,JROWIN,GLOBAL,IER) IF ( IER .NE.
PAGE 402
DSLEIE Matrix structure input by finite element Name DSLEIE Matrix structure input by finite element Purpose This subprogram adds indices to the set of known nonzeros in the lower triangle of the sparse matrix corresponding to a finite element or clique. NOTE Matrix structure input by matrix supports five data types. Matrix structure input by elements, columns, and finite elements supports only REAL*8 precision.
PAGE 403
Matrix structure input by finite element DSLEIE Notes Calls to DSLEIE can be intermixed with calls to DSLEI1. DSLEIC and DSLEIM cannot be used if DSLEI1 or DSLEIE is used. Example Rows and columns 345, 346, 347 and 989 form a small dense submatrix of A. Add the positions consisting of all pairs of numbers from this set to the list of nonzeros in the matrix.
PAGE 404
DSLEIF End of matrix structure input Name DSLEIF End of matrix structure input Purpose This subprogram specifies the end of structure input for the matrix. DSELIF is used only if routines DSLEI1 and/or DSLEIE were the mechanism by which the structure was input. NOTE Matrix structure input by matrix supports five data types. Matrix structure input by elements, columns, and finite elements supports only REAL*8 precision.
PAGE 405
Matrix structure input by matrix DSLEIM Name DSLEIM Matrix structure input by matrix Purpose This subprogram specifies the locations of nonzeros of the sparse matrix all at once. For a symmetric matrix (LDL’ factorization), only the locations of the nonzeros in the lower triangle are specified. For a nonsymmetric matrix (LU factorization), the locations of all nonzeros in the sparse matrix are specified. NOTE Matrix structure input by matrix supports five data types.
PAGE 406
DSLEIM Matrix structure input by matrix INTEGER*8 neqns, nnzero, colstr(neqns+1), rowind(nnzero), ier REAL*8 global(150) CALL CSLEIM(colstr, rowind, global, ier) INTEGER*8 neqns, nnzero, colstr(neqns+1), rowind(nnzero), ier REAL*8 global(150) CALL ZSLEIM(colstr, rowind, global, ier) Input colstr rowind For a symmetric matrix, colstr(j) gives the index in rowind of the first nonzero in the lower triangular part of column j of the matrix.
PAGE 407
Matrix structure input by matrix DSLEIM of order within a column or out of the lower triangle). Notes This is the most efficient mechanism for specifying the nonzero structure, but the entire matrix structure must be input with DSLEIM if it is used. Its use is not compatible with DSLEI1, DSLEIE, or DSLEIC.
PAGE 408
DSLEIM Matrix structure input by matrix INTEGER*4 COLSTR(7),ROWIND(12),IER REAL*8 GLOBAL(150) COLSTR(1) COLSTR(2) COLSTR(3) COLSTR(4) COLSTR(5) COLSTR(6) COLSTR(7) = = = = = = = 1 3 5 8 10 13 13 ROWIND(1) = 3 ROWIND(2) = 4 ROWIND(3) = 3 ROWIND(4) = 5 ROWIND(5) = 1 ROWIND(6) = 2 ROWIND(7) = 5 ROWIND(8) = 1 ROWIND(9) = 5 ROWIND(10) = 2 ROWIND(11) = 3 ROWIND(12) = 4 CALL DSLEIM (COLSTR,ROWIND,GLOBAL,IER) IF ( IER .NE.
PAGE 409
Initialize sparse linear equations DSLEIN Name DSLEIN Initialize sparse linear equations Purpose This subprogram provides the necessary information to begin processing with the sparse linear equation solution package. For a nonsymmetric matrix (LU factorization), this call must be followed by a call to DSLEMA. NOTE Matrix structure input by matrix supports five data types. Matrix structure input by elements, columns, and finite elements supports only REAL*8 precision.
PAGE 410
DSLEIN Initialize sparse linear equations INTEGER*8 neqns, msglvl, output, ier REAL*8 global(150) CALL CSLEIN(neqns, msglvl, output, global, ier) INTEGER*8 neqns, msglvl, output, ier REAL*8 global(150) CALL ZSLEIN(neqns, msglvl, output, global, ier) Input neqns msglvl output Output global ier Example Order of matrix (number of degrees of freedom); neqns > 0. Message level for printable output: msglvl ≤ 0 Suppress all output. msglvl = 1 Error messages and summary statistics.
PAGE 411
One-call usage DSLELU Name DSLELU One-call usage Purpose This subprogram provides a one-call interface to the sparse linear equation solution package for nonsymmetric matrices based on the LU factorization. If this subroutine does not fit your needs, a more flexible interface is available by using the other subroutines in the sparse linear equation package. NOTE Matrix structure input by matrix supports five data types.
PAGE 412
DSLELU One-call usage INTEGER*4 neqns, maxzer, msglvl, output, colstr(neqns+1), rowind(nnzero), nrhs, ldrhs, inrtia(3), ier COMPLEX*16 value(nnzero), rhs(ldrhs, nrhs) REAL*8 global(150) CALL ZSLELU(neqns, maxzer, pvttol, msglvl, output, colstr, rowind, value, nrhs, rhs, ldrhs, rcond, inrtia, global, ier) VECLIB8: INTEGER*8 neqns, maxzer, msglvl, output, colstr(neqns+1), rowind(nnzero), nrhs, ldrhs, inrtia(3), ier REAL*4 pvttol, value(nnzero), rhs(ldrhs, nrhs), rcond REAL*8 global(150) CALL SSLELU(neqns
PAGE 413
One-call usage DSLELU pvttol msglvl output nrhs ldrhs colstr rowind value purpose of forming columns with identical structure to increase the efficiency of the factorization. If maxzer = 0, no additional fill is allowed. If many solution vectors are to be computed for each factorization, it is suggested that maxzer = 0 be used. If only a few solution vectors are to be computed maxzer = 10 could be tried but maxzer = 0 is perfectly acceptable.
PAGE 414
DSLELU One-call usage ier = 0 ier = −101 ier = −102 ier = −106 ier = −201 ier = −301 ier = −302 ier = −401 ier = −501 ier = −502 ier = −503 ier = −602 ier = −603 926 HP MLIB User’s Guide Normal return. Error in dynamic storage allocation during matrix structure input. neqns ≤ 0. Unacceptable matrix structure. Error in dynamic storage allocation during ordering. Error in dynamic storage allocation during symbolic factorization. Internal error. Error in dynamic storage allocation during value input.
PAGE 415
One-call usage DSLELU Unused as output. Reserved for future use. Array of length 3 containing, respectively, the number of positive eigenvalues, the number of negative eigenvalues, and an indicator if there are zero eigenvalues. Global communications array for this problem. This array must be passed, untouched by the user, to successive subroutines in this package.
PAGE 416
DSLELU One-call usage ROWIND(8) = 2 ROWIND(9) = 3 ROWIND(10) = 5 ROWIND(11) = 1 ROWIND(12) = 4 ROWIND(13) = 5 ROWIND(14) = 6 ROWIND(15) = 2 ROWIND(16) = 3 ROWIND(17) = 4 ROWIND(18) = 5 ROWIND(19) = 4 ROWIND(20) = 6 VALUES(1) = 4.0D0 VALUES(2) = 1.0D0 VALUES(3) = 1.0D0 VALUES(4) = 4.0D0 VALUES(5) = 1.0D0 VALUES(6) = 1.0D0 VALUES(7) = −1.0D0 VALUES(8) = −1.0D0 VALUES(9) = 4.0D0 VALUES(10) = 1.0D0 VALUES(11) = −1.0D0 VALUES(12) = 4.0D0 VALUES(13) = 1.0D0 VALUES(14) = 1.0D0 VALUES(15) = −1.
PAGE 417
Specify coefficient matrix type of sparse linear equations DSLEMA Name DSLEMA Specify coefficient matrix type of sparse linear equations Purpose This subprogram specifies whether the coefficient matrix is symmetric, or unsymmetric but with symmetric structure. A call to DSLEMA is optional for symmetric case. For the unsymmetric case, DSLEMA must be called immediately after DSLEIN. NOTE Matrix structure input by matrix supports five data types.
PAGE 418
DSLEMA Specify coefficient matrix type of sparse linear equations CHARACTER*2 type INTEGER*8 ier REAL*8 global(150) CALL DSLEMA(type, global, ier) CHARACTER*2 type INTEGER*8 ier REAL*8 global(150) CALL QSLEMA(type, global, ier) CHARACTER*2 type INTEGER*8 ier REAL*8 global(150) CALL CSLEMA(type, global, ier) CHARACTER*2 type INTEGER*8 ier REAL*8 global(150) CALL ZSLEMA(type, global, ier) Input Output type global ier 930 HP MLIB User’s Guide Type of matrix: type = ’SS’ or ’ss’ type = ’SU’ or ’su’ Sy
PAGE 419
Output control Name DSLEOC DSLEOC Output control Purpose This subprogram may be called at any point after subprogram DSLEIN to alter either the output message level or the Fortran output unit number for message output. NOTE Matrix structure input by matrix supports five data types. Matrix structure input by elements, columns, and finite elements supports only REAL*8 precision.
PAGE 420
DSLEOC Output control INTEGER*8 msglvl, output REAL*8 global(150) CALL CSLEOC(msglvl, output, global) INTEGER*8 msglvl, output REAL*8 global(150) CALL ZSLEOC(msglvl, output, global) Input msglvl Message level for printable output: msglvl ≤ 0 Suppress all output. msglvl = 1 output Updated Example global Error messages and summary statistics. msglvl = 2 More complete statistics. msglvl = 3 First stage of debugging output. msglvl = 4 Complete debugging output.
PAGE 421
Select reordering scheme Name DSLEOP DSLEOP Select reordering scheme Purpose This subprogram selects the reordering scheme used by the solver. The default is to use the Multiple Minimum Degree (MMD) reordering. NOTE Matrix structure input by matrix supports five data types. Matrix structure input by elements, columns, and finite elements supports only REAL*8 precision.
PAGE 422
DSLEOP Select reordering scheme INTEGER*8 ier REAL*8 global(150) CALL DSLEOP(order, global, ier) CHARACTER*3 order INTEGER*8 ier REAL*8 global(150) CALL QSLEOP(order, global, ier) CHARACTER*3 order INTEGER*8 ier REAL*8 global(150) CALL CSLEOP(order, global, ier) CHARACTER*3 order INTEGER*8 ier REAL*8 global(150) CALL ZSLEOP(order, global, ier) Input order The reordering scheme used by the solver. The reordering scheme depends on the matrix structure input as follows.
PAGE 423
Select reordering scheme DSLEOP Use METIS 4.0.1 ordering. order = ’CMD or ’cmd’ Use Constrained Minimum Degree ordering.
PAGE 424
DSLEOP Select reordering scheme For matrix structure input by elements, columns, and finite elements, use one of the following four options: order = ’MMD’ or ’mmd’ Use the Multiple Minimum Degree ordering. order = ’NOT’ or ’not’ No internal reordering is performed; the original ordering is used in all phases of the solver. order = ’MUL’ or ’mul’ Use the Multilevel Nested Dissection ordering.
PAGE 425
Reordering and symbolic factorization DSLEOR Name DSLEOR Reordering and symbolic factorization Purpose This subprogram reorders the matrix whose pattern of nonzeros has been input to the package to obtain an efficient sparse factorization. It then constructs data structures that represent the factorization. Optionally, different ordering heuristics can be selected by a call to DSLEOP before the call to DSLEOR. NOTE Matrix structure input by matrix supports five data types.
PAGE 426
DSLEOR Reordering and symbolic factorization INTEGER*8 maxzer, ier REAL*8 global(150) CALL CSLEOR(maxzer, global, ier) INTEGER*8 maxzer, ier REAL*8 global(150) CALL ZSLEOR(maxzer, global, ier) Input maxzer Supernodal relaxation parameter; maxzer ≥ 0. If maxzer > 0, the package allows additional fill for the purpose of forming columns with identical structure to increase the efficiency of the factorization. If maxzer = 0 no additional fill is allowed.
PAGE 427
Reordering and symbolic factorization Example DSLEOR The sparse matrix has been communicated to the page using DSLEIN and DSLEI1, DSLEIE, and DSLEIF, and DSLEIC or DSLEIM. The next step is obtaining good reordering and performing symbolic factorization for the numeric factorization phase. The value of MAXZER = 0 is used so that no extra fill occurs in the factorization. INTEGER*4 IER REAL*8 GLOBAL(150) CALL DSLEOR (0,GLOBAL,IER) IF ( IER .NE.
PAGE 428
DSLEPS Name Print statistics DSLEPS Print statistics Purpose This subprogram prints available statistics about the original matrix, amount of work space in use, amount required for the next phase, and maximum amount used thus far. Also, this subprogram prints storage and arithmetic requirements, CPU time used, and computational rate for Cholesky factorization. The amount of information printed depends on the stage of execution.
PAGE 429
Return diagonal elements of matrix in its current form Name DSLERD DSLERD Return diagonal elements of matrix in its current form Purpose This subprogram is called after the matrix value input is complete. This operation is not implemented for nonsymmetric matrices. It returns the diagonal elements of the sparse matrix in its current form: • If the matrix has not been factored, diag (A) is returned. • If the matrix has been factored, diag (D) is returned.
PAGE 430
DSLERD Return diagonal elements of matrix in its current form INTEGER*4 ier COMPLEX*16 diag(neqns) REAL*8 global(150) CALL ZSLERD(job, diag, global, ier) VECLIB8: CHARACTER*1 job INTEGER*8 ier REAL*4 diag(neqns) REAL*8 global(150) CALL SSLERD(job, diag, global, ier) CHARACTER*1 job INTEGER*8 ier REAL*8 diag(neqns) REAL*8 global(150) CALL DSLERD(job, diag, global, ier) CHARACTER*1 job INTEGER*8 ier REAL*16 diag(neqns) REAL*8 global(150) CALL QSLERD(job, diag, global, ier) CHARACTER*1 job INTEGER*8 ier COM
PAGE 431
Return diagonal elements of matrix in its current form global Output diag ier Notes DSLERD permuted order determined by the reordering algorithm. Global communications array for this problem. This array must be passed, untouched by the user, to successive subroutines in this package. Array of diagonal elements of the current form of the sparse matrix. Status response: ier = 0 Normal return. ier = −800 Incorrect processing path; matrix value input not completed. Diagonal values are not returned.
PAGE 432
DSLESL Name Solve DSLESL Solve Purpose This subprogram computes the solution for a set of right hand side vectors given a numeric factorization performed by DSLECO or DSLEFA. NOTE Matrix structure input by matrix supports five data types. Matrix structure input by elements, columns, and finite elements supports only REAL*8 precision.
PAGE 433
Solve DSLESL INTEGER*8 nrhs, ldrhs, ier REAL*16 rhs(ldrhs, nrhs) REAL*8 global(150) CALL QSLESL(nrhs, rhs, ldrhs, global, ier) INTEGER*8 nrhs, ldrhs, ier COMPLEX*8 rhs(ldrhs, nrhs) REAL*8 global(150) CALL CSLESL(nrhs, rhs, ldrhs, global, ier) INTEGER*8 nrhs, ldrhs, ier COMPLEX*16 rhs(ldrhs, nrhs) REAL*8 global(150) CALL ZSLESL(nrhs, rhs, ldrhs, global, ier) Input nrhs ldrhs Updated global rhs Output ier Number of right-hand sides; nrhs > 0.
PAGE 434
DSLESL Example Solve Given the following right-hand side vector, compute the solution given the numerical factorization computed by DSLECO or DSLEFA. rhs = 1.0 0.0 1.0 1.0 0.0 0.0 INTEGER*4 IER REAL*8 RHS(6),GLOBAL(150) RHS(1) RHS(2) RHS(3) RHS(4) RHS(5) RHS(6) = = = = = = 1.0 0.0 1.0 1.0 0.0 0.0 CALL DSLESL (1,RHS,6,GLOBAL,IER) IF ( IER .NE.
PAGE 435
Matrix value input by single entry DSLEV1 Name DSLEV1 Matrix value input by single entry Purpose This subprogram adds to the value of the entry in the (irow, jcol) position of the sparse matrix. NOTE Matrix structure input by matrix supports five data types. Matrix structure input by elements, columns, and finite elements supports only REAL*8 precision. Calls to DSLEV1, DSLEVC, DSLEVE, and DSLEVM can be intermixed.
PAGE 436
DSLEV1 Matrix value input by single entry INTEGER*8 irow, jcol, ier REAL*8 value, global(150) REAL*8 global(150) CALL DSLEV1(irow, jcol, value, global, ier) INTEGER*8 irow, jcol, ier REAL*16 value REAL*8 global(150) CALL QSLEV1(irow, jcol, value, global, ier) INTEGER*8 irow, jcol, ier COMPLEX*8 value REAL*8 global(150) CALL CSLEV1(irow, jcol, value, global, ier) INTEGER*8 irow, jcol, ier COMPLEX*16 value REAL*8 global(150) CALL ZSLEV1(irow, jcol, value, global, ier) Input irow jcol value Row index of t
PAGE 437
Matrix value input by single entry DSLEV1 INTEGER*4 I,J,IER REAL*8 VALUE,GLOBAL(150) I = 3035 J = 1024 VALUE = 4.523D-5 CALL DSLEV1 (I,J,VALUE,GLOBAL,IER) IF ( IER .NE.
PAGE 438
DSLEVC Matrix value input by column Name DSLEVC Matrix value input by column Purpose This subprogram updates the values of a list of nonzero entries in a single column of the sparse matrix. NOTE Matrix structure input by matrix supports five data types. Matrix structure input by elements, columns, and finite elements supports only REAL*8 precision. Calls to DSLEV1, DSLEVC, DSLEVE, and DSLEVM can be intermixed. It is not necessary that all entries in the column jcol be included in jrowin.
PAGE 439
Matrix value input by column DSLEVC INTEGER*8 jcol, nzcol, jrowin(nzcol), ier REAL*4 values(nzcol) REAL*8 global(150) CALL SSLEVC(jcol, nzcol, jrowin, values, global, ier) INTEGER*8 jcol, nzcol, jrowin(nzcol), ier REAL*8 values(nzcol) REAL*8 global(150) CALL DSLEVC(jcol, nzcol, jrowin, values, global, ier) INTEGER*8 jcol, nzcol, jrowin(nzcol), ier REAL*16 values(nzcol) REAL*8 global(150) CALL QSLEVC(jcol, nzcol, jrowin, values, global, ier) INTEGER*8 jcol, nzcol, jrowin(nzcol), ier COMPLEX*8 values(nzcol)
PAGE 440
DSLEVC Matrix value input by column Updated global Global communications array for this problem. This array must be passed, untouched by the user, to successive subroutines in this package. Output ier Status response: ier = 0 Normal return. ier = −400 Incorrect processing path; DSLEOR not called. ier = −401 Error in dynamic storage allocation. ier = −402 At least one subscript pair (jrowin(k),jcol) was not specified in structure input. No room for value.
PAGE 441
Matrix value input by column Example 2 DSLEVC Column 4519 of a nonsymmetric matrix has entries in rows 1, 2735, 4519, 4520, 4521, 6000, 6002, and 6004. Input the value 1.0 to each of these positions in the matrix.
PAGE 442
DSLEVE Name Matrix value input by finite element DSLEVE Matrix value input by finite element Purpose This subprogram adds to the values of a set of nonzero entries in the lower triangle of the sparse matrix corresponding to a finite element or clique. NOTE Matrix structure input by matrix supports five data types. Matrix structure input by elements, columns, and finite elements supports only REAL*8 precision. Calls to DSLEV1, DSLEVC, DSLEVE, and DSLEVM can be intermixed.
PAGE 443
Matrix value input by finite element DSLEVE INTEGER*8 nnode, ldelmx, nodlst(nnode), ier REAL*4 elmmtx(ldelmx, nnode) REAL*8 global(150) CALL SSLEVE(nnode, nodlst, elmmtx, ldelmx, global, ier) INTEGER*8 nnode, ldelmx, nodlst(nnode), ier REAL*8 elmmtx(ldelmx, nnode), global(150) REAL*8 global(150) CALL DSLEVE(nnode, nodlst, elmmtx, ldelmx, global, ier) INTEGER*8 nnode, ldelmx, nodlst(nnode), ier REAL*16 elmmtx(ldelmx, nnode) REAL*8 global(150) CALL QSLEVE(nnode, nodlst, elmmtx, ldelmx, global, ier) INTEGER*
PAGE 444
DSLEVE Matrix value input by finite element ier = −400 ier = −401 ier = −402 Example Incorrect processing path; DSLEOR not called. Error in dynamic storage allocation. At least one subscript pair (nodlst(k),nodlst(l)) was not specified in structure input. No room for value. Rows and columns 345, 346, 347, and 989 form a small dense submatrix of A. Add the value 1.0 to the values in the positions consisting of all pairs of numbers from this set to the list of nonzeros in the matrix.
PAGE 445
Matrix value Input by matrix Name DSLEVM DSLEVM Matrix value Input by matrix Purpose This subprogram updates the values of many or all of the nonzero entries in the sparse matrix. NOTE Matrix structure input by matrix supports five data types. Matrix structure input by elements, columns, and finite elements supports only REAL*8 precision. This is the most efficient mechanism for specifying the nonzero values. Normally, DSLEVM is used in conjunction with DSLEIM.
PAGE 446
DSLEVM Matrix value Input by matrix INTEGER*8 neqns, nnzero, colstr(neqns+1), rowind(nnzero), ier REAL*8 values(nnzero), global(150) REAL*8 global(150) CALL DSLEVM(colstr, rowind, values, global, ier) INTEGER*8 neqns, nnzero, colstr(neqns+1), rowind(nnzero), ier REAL*16 values(nnzero) REAL*8 global(150) CALL QSLEVM(colstr, rowind, values, global, ier) INTEGER*8 neqns, nnzero, colstr(neqns+1), rowind(nnzero), ier COMPLEX*8 values(nnzero) REAL*8 global(150) CALL CSLEVM(colstr, rowind, values, global, ier) I
PAGE 447
Matrix value Input by matrix Updated global DSLEVM Global communications array for this problem. This array must be passed, untouched by the user, to successive subroutines in this package.
PAGE 448
DSLEVM Output Example 1 Matrix value Input by matrix Status response: ier = 0 Normal return. ier = −400 Incorrect processing path; DSLEOR not called. ier = −401 Error in dynamic storage allocation. ier = −402 At least one subscript pair was not specified in structure input. No room for value. ier Input the values for the following small (order 6) symmetric sparse matrix problem. 4.0 0.0 1.0 1.0 0.0 0.0 0.0 4.0 1.0 0.0 1.0 0.0 1.0 1.0 4.0 0.0 1.0 0.0 1.0 0.0 0.0 4.0 1.0 0.0 0.0 1.0 1.0 1.0 4.0 0.
PAGE 449
Matrix value Input by matrix DSLEVM handle error condition ENDIF DIAGVL = DO 100 I CALL IF ( 4.0 = 1, NEQNS DSLEV1 (I,I,DIAGVL,GLOBAL,IER) IER .NE. 0 ) THEN handle error condition ENDIF 100 CONTINUE This matrix could also have been input strictly with DSLEVM by expanding COLSTR, ROWIND, and VALUES to include the diagonal. This action would have avoided the calls to DSLEV1. Including the diagonal entries in COLSTR and ROWIND is also acceptable to DSLEIM.
PAGE 450
DSLEVM Matrix value Input by matrix VALUES(1) = 1.0D0 VALUES(2) = 1.0D0 VALUES(3) = 1.0D0 VALUES(4) = 1.0D0 VALUES(5) = −1.0D0 VALUES(6) = −1.0D0 VALUES(7) = 1.0D0 VALUES(8) = −1.0D0 VALUES(9) = 1.0D0 VALUES(10) = −1.0D0 VALUES(11) = −1.0D0 VALUES(12) = −1.0D0 CALL DSLEVM (COLSTR,ROWIND,VALUES,GLOBAL,IER) IF ( IER .NE. 0 ) THEN handle error condition ENDIF DIAGVL = DO 100 I CALL IF ( 4.0 = 1, NEQNS DSLEV1 (I,I,DIAGVL,GLOBAL,IER) IER .NE.
PAGE 451
Overview 14 METIS Routines Overview This chapter explains how to use the HP MLIB METIS implementation that supports graph partitioning, sparse matrix ordering, and mesh partitioning. This implementation provides the full METIS V4.0.1 functionality. However, the routine names are different. HP MLIB METIS routine names have been prepended with mlib_ to avoid name conflict on applications and libraries that already contain their own local version of METIS.
PAGE 452
What you need to know to use these routines http://www-users.cs.umn.edu/~karypis/metis/metis/index.html What you need to know to use these routines METIS provides stand-alone library interfaces that can be directly called from C or Fortran programs. The next sections describe HP MLIB METIS subroutine interfaces and the data structures used by these subroutines.
PAGE 453
What you need to know to use these routines Table 14-1 Adjacency-matrix Graph with 5 Vertices 0 1 0 0 1 1 0 1 1 1 0 1 0 1 0 0 1 1 0 1 1 1 0 1 0 Assuming that vertex numbering starts from 0 (C style), the above (symmetric) adjacency-matrix indicates that vertex 0 is connected to vertices 1 and 4; vertex 1 is connected to vertices 0, 2, 3, and 4; vertex 2 is connected to vertices 1 and 3; vertex 3 is connected to vertices 1, 2, and 4; and vertex 4 is connected to vertices 0, 1 and 3.
PAGE 454
What you need to know to use these routines The ordering of the nodes is not important for triangle and tetrahedra elements. However, in the case of hexahedra, the nodes for each element must be ordered on a specific order (refer to METIS User’s Guide). Partitioning Objectives The partitioning algorithms in METIS can be used to compute a balanced K-way partitioning that minimizes either the number of edges that straddle partitions (edgecut) or the total communication volume (totalv).
PAGE 455
Partitions a graph Graph Partitioning Routines Graph Partitioning Routines The following sections describe the graph partitioning routines included with METIS. Name mlib_METIS_PartGraphRecursive Partitions a graph Purpose This function is used to partition a graph into k equal-size parts using multilevel recursive bisection. It provides the functionality of the pmetis program. The objective of the partitioning is to minimize the edgecut.
PAGE 456
mlib_METIS_PartGraphRecursive numflag nparts options edgecut part Notes Partitions a graph Used to indicate which numbering scheme is used for the adjacency structure of the graph. numflag can take the following two values: 0—C-Style numbering is assumed that starts from 0 1—Fortran-style numbering is assumed that starts from 1 The number of parts to partition the graph. This is an array of 5 integers that is used to pass parameters for the various phases of the algorithm.
PAGE 457
Partitions a graph mlib_METIS_PartGraphKway Name mlib_METIS_PartGraphKway Partitions a graph Purpose This function is used to partition a graph into k equal-size parts using multilevel k-way partitioning algorithm. It provides the functionality of the kmetis program. The objective of the partitioning is to minimize the edgecut.
PAGE 458
mlib_METIS_PartGraphKway Partitions a graph options[0]=1 then the remaining 4 elements of options edgecut part Notes are interpreted as follows: options[1] Determines matching type. Possible values are: 1—Random Matching (RM) 2—Heavy-Edge Matching (HEM) 3—Sorted Heavy-Edge Matching (SHEM)(Default) Experiments have shown that both HEM and SHEM perform quite well. options[2] Determines the algorithm used during initial partioning.
PAGE 459
Partitions a graph mlib_METIS_PartGraphVKway Name mlib_METIS_PartGraphVKway Partitions a graph Purpose This function is used to partition a graph into k equal-size parts using multilevel k-way partitioning algorithm. The objective of the partitioning is to minimize the total communication volume.
PAGE 460
mlib_METIS_PartGraphVKway Partitions a graph options[0]=1 then the remaining 4 elements of options volume part 972 HP MLIB User’s Guide are interpreted as follows: options[1] Determines matching type. Possible values are: 1—Random Matching (RM) 2—Heavy-Edge Matching (HEM) 3—Sorted Heavy-Edge Matching (SHEM)(Default) Experiments have shown that both HEM and SHEM perform quite well. options[2] Determines the algorithm used during initial partioning.
PAGE 461
Partitions a graph mlib_METIS_mCPartGraphRecursive Name mlib_METIS_mCPartGraphRecursive Partitions a graph Purpose This function is used to partition a graph into k parts such that multiple balancing constraints are satisfied. It uses the multi-constraint multilevel recursive bisection algorithm. It provides the functionality of the pmetis program when it is used to compute a multi-constraint partitioning. The object of the partitioning is to minimize the edgecut.
PAGE 462
mlib_METIS_mCPartGraphRecursive options edgecut part 974 HP MLIB User’s Guide Partitions a graph This is an array of 5 integers that is used to pass parameters for the various phases of the algorithm. If options[0]=0 then default values are used. If options[0]=1 then the remaining 4 elements of options are interpreted as follows: options[1] Determines matching type.
PAGE 463
Partitions a graph Notes mlib_METIS_mCPartGraphRecursive This function should be used to partition a graph into a small number of partitions. If a large number of partitions is desired, the mlib_METIS_mCPartGraphKway routine should be used instead, as it produces somewhat better partitions (both in terms of quality and balance).
PAGE 464
mlib_METIS_mCPartGraphKway Partitions a graph Name mlib_METIS_mCPartGraphKway Partitions a graph Purpose This function is used to partition a graph into k parts such that multiple balancing constraints are satisfied. It uses the multi-constraint multilevel k-way partitioning algorithm. It provides the functionality of the kmetis program when it is used to compute a multi-constraint partitioning. The object of the partitioning is to minimize the edgecut.
PAGE 465
Partitions a graph mlib_METIS_mCPartGraphKway options edgecut part This is an array of 5 integers that is used to pass parameters for the various phases of the algorithm. If options[0]=0 then default values are used. If options[0]=1 then the remaining 4 elements of options are interpreted as follows: options[1] Determines matching type.
PAGE 466
mlib_METIS_mCPartGraphKway Notes Partitions a graph This function should be used to partition a graph into a large number of partitions (greater than 8). If a small number of partitions is desired, the mlib_METIS_mCPartGraphRecursive routine should be used instead, as it produces somewhat better partitions.
PAGE 467
Partitions a graph mlib_METIS_WPartGraphRecursive Name mlib_METIS_WPartGraphRecursive Partitions a graph Purpose This function is used to partition a graph into k parts using multilevel recursive bisection. The underlying algorithm is similar to the one used by mlib_METIS_PartGraphRecursive, but it can be used to compute a partitioning with prescribed partition weights.
PAGE 468
mlib_METIS_WPartGraphRecursive tpwgts options edgecut part Notes Partitions a graph This is an array containing nparts floating point numbers. For partition i, tpwgts[i] stores the fraction of the total weight that should be assigned to it. For example, for a 4-way partition the vector tpwgts[]=[0.2 0.2 0.4 0.2] will result in partitions 0, 1, and 3 having 20% of the weight and partition 2 having 40% of the weight. Note that the numbers in tpwgts should add up to 1.0.
PAGE 469
Partitions a graph mlib_METIS_WPartGraphKway Name mlib_METIS_WPartGraphKway Partitions a graph Purpose This function is used to partition a graph into k parts using multilevel recursive bisection. The underlying algorithm is similar to the one used by mlib_METIS_PartGraphKway, but it can be used to compute a partitioning with prescribed partition weights.
PAGE 470
mlib_METIS_WPartGraphKway tpwgts options edgecut part 982 HP MLIB User’s Guide Partitions a graph This is an array containing nparts floating point numbers. For partition i, tpwgts[i] stores the fraction of the total weight that should be assigned to it. For example, for a 4-way partition the vector tpwgts[]=[0.2 0.2 0.4 0.2] will result in partitions 0, 1, and 3 having 20% of the weight and partition 2 having 40% of the weight. Note that the numbers in tpwgts should add up to 1.0.
PAGE 471
Partitions a graph Notes mlib_METIS_WPartGraphKway This function should be used to partition a graph into a large number of partitions (greater than 8). If a small number of partitions is desired, the mlib_METIS_WPartGraphRecursive routine should be used instead, as it produces somewhat better partitions.
PAGE 472
mlib_METIS_WPartGraphVKway Partitions a graph Name mlib_METIS_WPartGraphVKway Partitions a graph Purpose This function is used to partition a graph into k parts using multilevel recursive bisection. The underlying algorithm is similar to the one used by mlib_METIS_PartGraphKway, but it can be used to compute a partitioning with prescribed partition weights.
PAGE 473
Partitions a graph mlib_METIS_WPartGraphVKway tpwgts options volume part This is an array containing nparts floating point numbers. For partition i, tpwgts[i] stores the fraction of the total weight that should be assigned to it. For example, for a 4-way partition the vector tpwgts[]=[0.2 0.2 0.4 0.2] will result in partitions 0, 1, and 3 having 20% of the weight and partition 2 having 40% of the weight. Note that the numbers in tpwgts should add up to 1.0.
PAGE 474
Mesh Partitioning Routines Partitions a mesh Mesh Partitioning Routines The following sections describe the mesh partitioning routines included with METIS. Name mlib_METIS_PartMeshNodal Partitions a mesh Purpose This function is used to partition a mesh into k parts. It provides the functionality of the partnmesh program.
PAGE 475
Partitions a mesh mlib_METIS_PartMeshNodal edgecut epart npart Notes Upon successful completion, this variable stores the number of edges that are cut by the partition in the nodal graph. This is the vector of size ne that upon successful completion stores the partition vector for the elements of the mesh. The numbering of this vector starts from either 0 or 1, depending on the value of numflag.
PAGE 476
mlib_METIS_PartMeshDual Partitions a mesh Name mlib_METIS_PartMeshDual Partitions a mesh Purpose This function is used to partition a mesh into k equal-size parts. It provides the functionality of the partdmesh program.
PAGE 477
Partitions a mesh mlib_METIS_PartMeshDual npart Notes This is a vector of size nn that upon successful completion sotres the partition vector for the nodes of the mesh. The numbering of this vector starts from either 0 or 1, depending on the value of numflag. This function converts the mesh into a dual graph and then uses mlib_METIS_PartGraphKway to compute a partitioning of the elements. This partitioning of elements is then used to compute a partitioning for the nodes.
PAGE 478
Sparse Matrix Reordering Routines Computes fill reducing orderings of sparse matrices Sparse Matrix Reordering Routines The following sections describe the sparse matrix reording routines included with METIS. Name mlib_METIS_EdgeND Computes fill reducing orderings of sparse matrices Purpose This function computes fill reducing orderings of sparse matrices using the multilevel nested dissection algorithm. It provides the functionality of the oemetis program.
PAGE 479
Computes fill reducing orderings of sparse matrices mlib_METIS_EdgeND 1—Random Matching (RM) 2—Heavy-Edge Matching (HEM) 3—Sorted Heavy-Edge Matching (SHEM)(Default) Experiments have shown that both HEM and SHEM perform quite well. options[2] Determines the algorithm used during initial partioning. Possible values are: 1—Region Growing (Default) options[3] Determines the algorithm used for refinement.
PAGE 480
mlib_METIS_NodeND Computes fill reducing orderings of sparse matrices Name mlib_METIS_NodeND Computes fill reducing orderings of sparse matrices Purpose This function computes fill reducing orderings of sparse matrices using the multilevel nested dissection algorithm. It provides the functionality of the onmetis program.
PAGE 481
Computes fill reducing orderings of sparse matrices mlib_METIS_NodeND options[2] Determines the algorithm used during initial partioning. Possible values are: 1—Edge-based region growing (Default) 2—Node-based region growing options[3] Determines the algorithm used for refinement. Possible values are: 1—Two-sided node FM refinement 2—One-sided node FM refinement (Default) One-sided FM refinement is faster than two-sided, but in some cases two-sided refinement may produce better orderings.
PAGE 482
mlib_METIS_NodeND Computes fill reducing orderings of sparse matrices perm, iperm Notes These are vectors, each of size n. Upon successful completion, they store the fill-reducing permutation and inverse-permutation. Let A be the original matrix and A' be the permuted matrix. The arrays perm and iperm are defined as follows: Row (column) i of A' is the perm[i] row (column) of A, and row (column) i of A is the iperm[i] row (column) of A'.
PAGE 483
Computes fill reducing orderings of sparse matrices mlib_METIS_NodeWND Name mlib_METIS_NodeWND Computes fill reducing orderings of sparse matrices Purpose This function computes fill reducing orderings of sparse matrices using the multilevel nested dissection algorithm. It is similar to mlib_METIS_NodeND, but it assumes that the compression has been performed prior to calling this routine. It is particularly suited for ordering very large matrices in which the compressed matrix is previously known.
PAGE 484
mlib_METIS_NodeWND Computes fill reducing orderings of sparse matrices and RM is slower, but feel free to experiment with the other matching schemes. options[2] Determines the algorithm used during initial partioning. Possible values are: 1—Edge-based region growing (Default) 2—Node-based region growing options[3] Determines the algorithm used for refinement.
PAGE 485
Converts a mesh into a nodal graph Auxiliary Routines Auxiliary Routines The following sections describe the auxiliary routines included with METIS. Name mlib_METIS_MeshToNodal Converts a mesh into a nodal graph Purpose This function is used to convert a mesh into a nodal graph. It provides the function of the mesh2nodal program.
PAGE 486
mlib_METIS_MeshToNodal nxadj, nadjncy Notes Converts a mesh into a nodal graph These arrays store the adjacency structure of the nodal graph. The user must provide arrays that are sufficiently large to store the graph. The size of array nxadj is nn+1 where the size of nadjncy depends on the type of the mesh.
PAGE 487
Converts a mesh into a dual graph mlib_METIS_MeshToDual Name mlib_METIS_MeshToDual Converts a mesh into a dual graph Purpose This function is used to convert a mesh into a dual graph. It provides the function of the mesh2nodal program.
PAGE 488
mlib_METIS_MeshToDual Notes Converts a mesh into a dual graph The dual graph is defined as the graph in which each vertex of the graph corresponds to an element in the mesh, and two vertices are connected by an edge if the corresponding elements share a face.
PAGE 489
Estimates the amount of memory that will be used mlib_METIS_EstimateMemory Name mlib_METIS_EstimateMemory Estimates the amount of memory that will be used Purpose This function is used to estimate the amount of memory that will be used. Even though METIS dynamically allocated the amount of memory that it need, this function can be useful in determining if the amount of memory in the system is sufficient for METIS.
PAGE 490
mlib_METIS_EstimateMemory 1002 HP MLIB User’s Guide Estimates the amount of memory that will be used
PAGE 491
Overview 15 Sparse Eigenvalues and Eigenvectors Overview This chapter describes state-of-the-art software for solving sparse symmetric and generalized symmetric eigenvalue problems. This package of subprograms provides efficient use of the Hewlett-Packard architecture in conjunction with powerful techniques for using the sparsity of the problem to reduce the cost of solution. Accuracy is assured through appropriate numerical techniques.
PAGE 492
Associated documentation Associated documentation The following documents provide supplemental material for this chapter: Grimes, R.G., J.G. Lewis and H.D. Simon. “Eigenvalue Problems and Algorithms in Structural Engineering.” Large Scale Eigenvalue Problems. North-Holland. 1986. pp. 81-95. Grimes, R.G., J.G. Lewis and H.D. Simon. “The Implementation of a Block Shifted and Inverted Lanczos Algorithm for Eigenvalue Problems in Structural Engineering.” Boeing Computer Services Technical Report, ETA-TR-39.
PAGE 493
What you need to know to use these subprograms Unfortunately, there is no general and sparse algorithm for determining the necessary scalars α and β for a general pair of matrices A and B. The transformations used in this package require explicit knowledge of C, so this package is restricted to cases where the user can explicitly transform the problem to be in terms of C, or to the two standard cases where C is obvious; that is, when B or A alone is a positive definite matrix.
PAGE 494
What you need to know to use these subprograms Consider, for example, the following matrix: 11 0 31 41 0 0 0 22 32 0 52 0 13 23 33 0 53 0 14 0 0 44 54 0 0 25 35 45 55 0 0 0 0 0 0 66 This matrix could be represented in the format described above by three arrays, irow, jcol, and mxvalu, as shown in Figure 15-1.
PAGE 495
What you need to know to use these subprograms This sparse matrix format, known as the column pointer, row index representation, is illustrated in Figure 15-2. Figure 15-2 Column Pointer, Row Index Sparse Matrix Representation colstr = 1 rowind = 1 3 4 2 3 5 3 5 4 5 5 6 mxvalu = 11 31 41 22 32 52 33 53 44 54 55 66 4 7 9 11 12 13 _ There are three ways to communicate the coefficient matrix or matrices to the package.
PAGE 496
What you need to know to use these subprograms For notational purposes, assume that the eigenvalues λi are ordered λ1 ≤ λ2 ≤ ... ≤ λn. The results from subroutines in this section normally form a contiguous set of eigenvalues, say, λi, λi+1,..., λj. You must designate which subset you want through the following kinds of constraints, which usually relate to the application-specific meaning of the eigenvalues: 1.
PAGE 497
What you need to know to use these subprograms an interval specification [a,b] that lies to one side of zero (for example, [1.0,100.0], [−9.99,−1.0] or [0.0, 25.0], but not [−100.0,100.0]). This restriction on the interval does not apply to the other descriptors (all, lowest, or nearest). When you want the algebraically least or greatest eigenvalues, the problem must be transformed into finding the eigenvalues nearest one or the other endpoint of the interval.
PAGE 498
What you need to know to use these subprograms The eigenvalue location counts derived from the inertias may differ from what has been computed. In such cases, a warning is issued. The algorithm attempts to avoid such situations when creating trust regions but often must use user-specified interval bounds for its final trust region. Specifying eigenvalues in an interval where the endpoints are very close to eigenvalues is likely to cause warnings.
PAGE 499
What you need to know to use these subprograms The package allows the retrieval of these otherwise discarded eigenvectors if the user wants them.
PAGE 500
What you need to know to use these subprograms Output controls This package differs from most library subroutines in providing optional printed or printable output. The amount of output is controlled by an integer variable, msglvl, specifying the message level. Setting msglvl ≤ 0 suppresses all printed messages, including error messages, and thus should generally be avoided.
PAGE 501
What you need to know to use these subprograms Figure 15-3 Paths of Control—Sparse Eigenvalues and Eigenvectors Return Results One-call usage routine Initialization Matrix Structure Input Reordering and Symbolic Factorization Matrix Value Input Eigenextraction One-call usage routine: DSEVE1 Initialization: DSEVIN Matrix Structure Input: DSEVIM DSEVIC DSEVIF DSEVIE DSEVI1 Reordering and Symbolic Factorization: DSEVOR Matrix Value Input: DSEVVM DSEVVC DSEVVD DSEVVE DSEVV1 Eigenextraction: DSEV
PAGE 502
What you need to know to use these subprograms Figure 15-4 Paths of Control(continued) Utilities: DSEVCK DSEVDA DSEVOC DSEVPS DSEVRS DSEVSV To use the general interface, the user must call a sequence of subprograms that perform the following operations: • Initialization • Input of matrix structure • Reordering • Input of values of matrix entries • Eigenextraction • Retrieval of eigenvalues and vectors It is possible to use this package to solve several different eigenvalue problems associated with the s
PAGE 503
Sample program Sample program As illustrated in Figure 15-3, there are many possible paths of control through this sparse eigenvalue package. The following sample program shows one possible path through the package, where the problem to be solved is the ordinary eigenvalue problem Ax = λx. It is intended to demonstrate that the package is not as difficult to use as Figure 15-3 implies.
PAGE 504
Sample program C C C ... INPUT MATRIX VALUES DO 200 K = 1, NNZERO I = IROW(K) J = JCOL(K) VALUE = MXVALU(K) CALL DSEVV1 (’A’,I,J,VALUE,GLOBAL,IER) IF ( IER .NE. 0 ) GO TO 8000 200 CONTINUE C C C ... COMPUTE THE EIGENVALUES AND EIGENVECTORS CALL DSEVES (10,’L’,’O’,.FALSE.,DUMMY,.FALSE.,DUMMY, DUMMY,NFOUND,NDISCD,GLOBAL,IER,WARNNG) IF ( IER .NE. 0 ) GO TO 8000 1 C C C ... CHECK ACCURACY CALL DSEVCK (.TRUE.,ORTWRN,DISCRP,GLOBAL,IER) IF ( IER .NE.
PAGE 505
Subprograms for sparse eigenvalues and eigenvectors Subprograms for sparse eigenvalues and eigenvectors The following sections describe subprograms included with VECLIB for solving sparse symmetric and generalized symmetric eigenvalue problems.
PAGE 506
DSEVCK Check accuracy of results Name DSEVCK Check accuracy of results Purpose This subprogram performs an a posteriori check on the accuracy of the computed eigenvalues and eigenvectors.
PAGE 507
Check accuracy of results DSEVCK ier = 0 ier = −800 ier = −801 ier = −802 ier = −804 Normal return. Incorrect processing path. Error in storage allocation. Error in retrieving eigenvectors. System error. Notes Additional diagnostic output from this subprogram is written onto the Fortran logical unit specified in the last call to DSEVIN or DSEVOC. The amount of output is independent of the message level value set in the last call to DSEVIN or DSEVOC.
PAGE 508
DSEVDA Deallocate working storage Name DSEVDA Deallocate working storage Purpose This subprogram deallocates working storage at the end of processing. If the program using the package is going to continue execution when use of the package is completed, then it is recommended that the user deallocate the dynamically allocated working storage to reduce system impact.
PAGE 509
One-call usage DSEVE1 Name DSEVE1 One-call usage Purpose This subprogram provides a one-call usage for the sparse eigenvalue package. If this subroutine does not fit your needs, a more flexible interface is available by using the other subroutines in the package described in this chapter.
PAGE 510
DSEVE1 One-call usage output 1022 HP MLIB User’s Guide Fortran logical unit number for file to which printable output will be written.
PAGE 511
One-call usage DSEVE1 acolst arowin avalue bmxtyp acolst(j) gives the address in arowin of the first nonzero in the lower triangular part of column j of the matrix A. All of the nonzeros for column j are found in ascending order in arowin(acolst(j)), arowin(acolst(j)+1), ..., arowin(acolst(j+1)−1). acolst(norder+1) must be set to one greater than the total number of nonzeros, annzer, in the lower triangular part of the matrix.
PAGE 512
DSEVE1 One-call usage bcolst browin bvalue neigvl which 1024 HP MLIB User’s Guide If B is specified as a general sparse matrix with structure different than A, then bcolst(j) gives the ordinal in browin of the first nonzero in the lower triangular part of column j of the matrix B. All of the nonzeros for column j are found in ascending order in browin(bcolst(j)), browin(bcolst(j)+1), ..., browin(bcolst(j+1)−1).
PAGE 513
One-call usage DSEVE1 pbtype Character string indicating type of problem. Only the first character is significant, but longer input can be used for clarity (for example, ’Vibration’ or ’Ordinary’). ’V’ or ’v’ Generalized symmetric (vibration) problem Kx = λMx, with M positive semi-definite. ’B’ or ’b’ Generalized symmetric (buckling) problem Kx = λKδx, with K positive semi definite and Kδ possibly ’O’ or ’o’ lfinit lftend rfinit rhtend center Output nfound ndiscd indefinite.
PAGE 514
DSEVE1 One-call usage global ier 1026 HP MLIB User’s Guide Global communications array for this problem. This array must be passed, untouched by the user, to successive subroutines in this package. Status response: ier = 0 Normal return. ier < 0 Fatal error, no useful results returned. ier > 0 Fatal error, but some eigenvalues and vectors computed. ier = −101 Error in dynamic storage allocation during matrix structure input. ier = −102 norder ≤ 0.
PAGE 515
One-call usage DSEVE1 ier = ±724 warnng QL iteration did not converge. ier = ±725 Number of eigenvalues computed exceeded storage limitations. ier = ±726 Internal error involving access of Lanczos recurrence vectors. ier = ±727 Singular Value Decomposition did not converge. ier = ±728 Gram-Schmidt reorthogonalization did not converge. ier = ±729 Three numeric factorizations in a row failed. Probable cause is that this is not a symmetric generalized eigenproblem.
PAGE 516
DSEVE1 One-call usage warnng = 1xx Interval had to be expanded because one or both endpoints were very close to eigenvalues. Notes Actual character arguments in a subroutine call may be longer than corresponding dummy arguments. Therefore, readability of the CALL statement may be improved by coding the pbtype argument as ’vibration’ or ’buckling’. Example In a small (order 6) sparse generalized eigenvalue problem, the matrices A and B are as follows: A= 4.0 0.0 1.0 1.0 0.0 0.0 0.0 4.0 1.0 0.0 1.
PAGE 517
One-call usage DSEVE1 AROWIN(7) = 3 AROWIN(8) = 5 AROWIN(9) = 4 AROWIN(10) = 5 AROWIN(11) = 6 AROWIN(12) = 5 AROWIN(13) = 6 AVALUE(1) = 4.0 AVALUE(2) = 1.0 AVALUE(3) = 1.0 AVALUE(4) = 4.0 AVALUE(5) = 1.0 AVALUE(6) = 1.0 AVALUE(7) = 4.0 AVALUE(8) = 1.0 AVALUE(9) = 4.0 AVALUE(10) = 1.0 AVALUE(11) = 1.0 AVALUE(12) = 4.0 AVALUE(13) = 1.0 BVALUE(1) = 1.0 BVALUE(2) = 1.0 BVALUE(3) = 1.0 BVALUE(4) = 1.0 BVALUE(5) = 1.0 BVALUE(6) = 1.0 BVALUE(7) = 1.0 BVALUE(8) = 1.0 BVALUE(9) = 1.0 BVALUE(10) = 1.
PAGE 518
DSEVES Eigenextraction Name DSEVES Eigenextraction Purpose This subprogram computes selected eigenvalues and eigenvectors of a sparse symmetric matrix A or of a sparse symmetric matrix pencil (A,B). The matrix or matrices have already been processed through the structure input, reordering, and value input phases. This is the standard interface; subroutine DSEVEX provides additional control parameters.
PAGE 519
Eigenextraction DSEVES ’B’ or ’b’ Generalized symmetric (buckling) problem Kx = λKδx, with K positive semi definite and Kδ possibly ’O’ or ’o’ Chapter 15 indefinite. Ordinary symmetric eigenproblem Kx = λx.
PAGE 520
DSEVES Eigenextraction lfinit lftend rfinit rhtend center If .TRUE., the value of the argument lftend is to be used as a restriction on the location of computed eigenvalues. If .FALSE., no lower bound is placed on the value of computed eigenvalues; equivalently, the left endpoint is negative infinity. Left endpoint of an interval in which the desired eigenvalues lie; not used if lfinit is .FALSE. If .TRUE.
PAGE 521
Eigenextraction DSEVES ier = ±703 ier = ±704 ier = ±705 ier = ±706 ier = ±707 ier = ±710 ier = ±720 ier = ±721 ier = ±722 ier = ±723 ier = ±724 ier = ±725 ier = ±726 ier = ±727 ier = ±728 Illegal specification for pbtype. Illegal specification for which. which = highest for an interval that spans zero. Insufficient dynamic memory for factorization. Internal error during factorization. Factorization save error in finite interval analysis.
PAGE 522
DSEVES Eigenextraction ier = ±741 warnng Notes Error in deallocation of storage for second set of Lanczos recurrence vectors. Warning status. warnng includes several different warnings encoded in a three-digit decimal integer: low-order digit: warnng = xx0 Normal return. warnng = xx1 Fewer modes computed than requested (interval does not include requested number). warnng = xx2 Fewer modes computed than requested because eigenvalues are different in size.
PAGE 523
Eigenextraction Example 1 DSEVES Compute the lowest 10 eigenvalues of a structural engineering vibration analysis, where matrix A is the stiffness matrix K, matrix B is the mass matrix M, and matrices K and M are positive definite or semi definite. INTEGER*4 NFOUND,NDISCD,IER,WARNNG CALL DSEVES (10,’LOWEST’,’VIBRATION’,.FALSE.,-1.0D30, .FALSE,.1.0D30,1.0D30,NFOUND,NDISCD,GLOBAL,IER, WARNNG) IF ( IER .NE. 0 ) THEN handle error condition ENDIF IF ( IER .GE.
PAGE 524
DSEVEX Eigenextraction Name DSEVEX Eigenextraction Purpose This subprogram computes selected eigenvalues and eigenvectors of a sparse symmetric matrix A or of a sparse symmetric matrix pencil (A,B). The matrix or matrices have already been processed through the structure input, reordering, and value input phases. This is the sophisticated interface; subroutine DSEVES is simpler to use.
PAGE 525
Eigenextraction DSEVEX ’H’ or ’h’ The highest (greatest magnitude) eigenvalues. ’C’ or ’c’ or ’N’ or ’n’ ’A’ or ’a’ pbtype The eigenvalues nearest center. All eigenvalues in the specified interval. Character string indicating type of problem: ’V’ or ’v’ Generalized symmetric (vibration) problem Kx = λMx, with M positive semi definite.
PAGE 526
DSEVEX Eigenextraction • A block size of 1 is usually less efficient than a block size of 2. • Block sizes larger than 10 are usually less efficient than smaller block sizes. tolact shfscl nusrvc usrsvc Updated global 1038 HP MLIB User’s Guide • If mxbksz is zero or negative, a default block size optimized to the computer system will be chosen. In all cases, the Lanczos algorithm may find it necessary to reduce the block size to fit the problem into the storage available.
PAGE 527
Eigenextraction Output DSEVEX nfound ndiscd ier The number of eigenvalues computed and confirmed to meet the problem description constraints by inertia (Sturm sequence) computations. The number of other eigenvalues computed during the course of the computation but discarded because they were outside of the specifications for eigenvalues or because they were not confirmed by inertia computations. Status response: ier = 0 Normal return. ier < 0 Fatal error, no useful results returned.
PAGE 528
DSEVEX Eigenextraction ier = ±725 warnng 1040 HP MLIB User’s Guide Number of eigenvalues computed exceeded storage limitations. ier = ±726 Internal error involving access of Lanczos recurrence vectors. ier = ±727 Singular Value Decomposition did not converge. ier = ±728 Gram-Schmidt reorthogonalization did not converge. ier = ±729 Three numeric factorizations in a row failed. Probable cause is that this is not a symmetric generalized eigenproblem.
PAGE 529
Eigenextraction DSEVEX hundred’s digit: warnng = 0xx Normal return. warnng = 1xx Interval had to be expanded because one or both endpoints were very close to eigenvalues.
PAGE 530
DSEVEX Eigenextraction Actual character arguments in a subroutine call may be longer than corresponding dummy arguments. Therefore, readability of the CALL statement may be improved by coding the pbtype argument as ’vibration’ or ’buckling’. Example 1 Compute the lowest 10 eigenvalues of a structural engineering vibration analysis, where the matrix A is the stiffness matrix K, the matrix Bis the mass matrix M, and both matrices K and M are positive definite or semi-definite.
PAGE 531
Matrix structure input by single entry DSEVI1 Name DSEVI1 Matrix structure input by single entry Purpose This subprogram adds a single entry in the (irow, jcol) position in the lower triangle to the set of known nonzeros for one of the sparse matrices.
PAGE 532
DSEVI1 Output Notes Matrix structure input by single entry ier Status response: ier = 0 Normal return. ier = −100 Incorrect processing path; DSEVIN not called or matrix input already finished. ier = −101 Error in dynamic storage allocation. ier = −104 Illegal value for jcol. ier = −105 Illegal value for irow. ier = −109 Illegal value for matrix designator.
PAGE 533
Matrix structure input by column DSEVIC Name DSEVIC Matrix structure input by column Purpose This subprogram adds a list of indices in a single column in the lower triangle to the set of known nonzeros for one of the sparse matrices.
PAGE 534
DSEVIC Output Notes Matrix structure input by column ier Status response: ier = 0 Normal return. ier = −100 Incorrect processing path; DSEVIN not called or matrix input already finished. ier = −101 Error in dynamic storage allocation. ier = −103 Illegal value for nzcol. ier = −104 Illegal value for jcol; either out of range or out of order. ier = −106 Illegal value for at least one entry in jrowin or entries out of order. ier = −109 Illegal value for matrix designator.
PAGE 535
Matrix structure input by finite element DSEVIE Name DSEVIE Matrix structure input by finite element Purpose This subprogram adds indices to the set of known nonzeros in the lower triangle of one of the sparse matrices corresponding to a finite element or clique.
PAGE 536
DSEVIE Output Notes Matrix structure input by finite element ier Status response: ier = 0 Normal return. ier = −100 Incorrect processing path; DSEVIN not called or matrix input already finished. ier = −101 Error in dynamic storage allocation. ier = −107 Illegal value for nnode. ier = −108 Illegal value for at least one entry in nodlst. ier = −109 Illegal value for matrix designator.
PAGE 537
End of matrix structure input DSEVIF Name DSEVIF End of matrix structure input Purpose This subprogram indicates the end of structure input for matrices in the sparse eigenvalue problem. DSEVIF is used only if subprograms DSEVI1 and/or DSEVIE were the mechanism by which the structure was input. Usage VECLIB: INTEGER*4 ier REAL*8 global(150) CALL DSEVIF(global, ier) VECLIB8: INTEGER*8 ier REAL*8 global(150) CALL DSEVIF(global, ier) Updated global Global communications array for this problem.
PAGE 538
DSEVIM Matrix structure input by matrix Name DSEVIM Matrix structure input by matrix Purpose This subprogram specifies the locations of all of the nonzeros in the lower triangle of one of the matrices.
PAGE 539
Matrix structure input by matrix Updated global DSEVIM Global communications array for this problem. This array must be passed, untouched by the user, to successive subroutines in this package.
PAGE 540
DSEVIM Output Notes Matrix structure input by matrix ier Status response: ier = 0 Normal return. ier = −100 Incorrect processing path; DSEVIN not called or matrix input already finished. ier = −101 Error in dynamic storage allocation. ier = −106 Illegal value for at least one entry in colstr (not increasing or negative) or invalid entries in rowind (entries out of order within a column, or out of the lower triangle). ier = −109 Illegal value for matrix designator.
PAGE 541
Matrix structure input by matrix Example DSEVIM In a small (order 6) sparse generalized eigenvalue problem, B is a mass matrix with the following structure: x 0 x x 0 0 0 x x 0 x 0 x x x 0 x 0 x 0 0 x x 0 0 x x x x 0 INTEGER*4 REAL*8 COLSTR(1) COLSTR(2) COLSTR(3) COLSTR(4) COLSTR(5) COLSTR(6) COLSTR(7) COLSTR(7),ROWIND(6),IER GLOBAL(150) = 1 = 3 = 5 = 6 = 7 = 7 = 7 ROWIND(1) ROWIND(2) ROWIND(3) ROWIND(4) ROWIND(5) ROWIND(6) = = = = = = 0 0 0 0 0 0 3 4 3 5 5 5 CALL DSEVIM (’B’,COLSTR,ROWIND,GLO
PAGE 542
DSEVIN Initialize sparse eigenvalues/eigenvectors Name DSEVIN Initialize sparse eigenvalues/eigenvectors Purpose This subprogram provides information to start the sparse eigenvalue package.
PAGE 543
Initialize sparse eigenvalues/eigenvectors DSEVIN msglvl = 3 output Output global ier First stage of debugging output. msglvl = 4 Complete debugging output. Fortran logical unit number to which all printable output will be written. Global communications array for this problem. This array must be passed, untouched by the user, to successive subroutines in this package. Status response: ier = 0 Normal return. ier = −101 Error in dynamic storage allocation. ier = −102 norder not positive.
PAGE 544
DSEVOC Output control Name DSEVOC Output control Purpose This subprogram can be called at any point after subprogram DSEVIN to alter the output message level and the Fortran output unit number for message output.
PAGE 545
Reordering and symbolic factorization DSEVOR Name DSEVOR Reordering and symbolic factorization Purpose This subprogram reorders the matrix whose pattern of nonzeros is given by the locations where either A or B is nonzero, such that an efficient sparse factorization of such a matrix can be obtained. DSEVOR then constructs data structures required for the sparse factorization.
PAGE 546
DSEVPS Print statistics Name DSEVPS Print statistics Purpose This subprogram prints available statistics about the original matrices, amount of work space in use, amount required for the next phase, and maximum amount used thus far. Also, this subprogram prints storage and arithmetic requirements, CPU time used, and computational rate for Cholesky factorization. The amount of information printed depends on the stage of execution.
PAGE 547
Return eigenvalue/eigenvector results DSEVRC Name DSEVRC Return eigenvalue/eigenvector results Purpose This subprogram retrieves all or selected eigenvalues and eigenvectors after eigenextraction.
PAGE 548
DSEVRC Return eigenvalue/eigenvector results norder is the matrix order as specified in the call to DSEVIN. Updated global Global communications array for this problem. This array must be passed, untouched by the user, to successive subroutines in this package. Output evalue List of selected eigenvalues. Corresponding array of eigenvectors.
PAGE 549
Return eigenvalue results DSEVRL Name DSEVRL Return eigenvalue results Purpose This subprogram retrieves all or selected eigenvalues after eigenextraction.
PAGE 550
DSEVRL Updated Return eigenvalue results global 1062 HP MLIB User’s Guide Global communications array for this problem. This array must be passed, untouched by the user, to successive subroutines in this package.
PAGE 551
Return eigenvalue results Output evalue ier Example 1 DSEVRL List of eigenvalues. Status response: ier = 0 Normal return. ier = −800 Incorrect processing path. ier = −801 levalu not large enough. ier = −803 Input error, indices out of range. Retrieve all of the final trust region eigenvalues. INTEGER*4 NEVALS,NFOUND,IER REAL*8 EVALUE(NEVALS),GLOBAL(150) NEVALS = 100 LDEVCT = 1000 CALL DSEVRL (NEVALS,EVALUE,1,NFOUND,GLOBAL,IER) IF ( IER .NE.
PAGE 552
DSEVRS Restore problem state from a savefile Name DSEVRS Restore problem state from a savefile Purpose This subprogram restores the working problem from the state stored on the user-specified I/O file by subprogram DSEVSV. Usage VECLIB: INTEGER*4 svfile, ier REAL*8 global(150) CALL DSEVRS(svfile, global, ier) VECLIB8: INTEGER*8 svfile, ier REAL*8 global(150) CALL DSEVRS(svfile, global, ier) Input svfile Fortran logical unit number of a file containing the saved problem state as created by DSEVSV.
PAGE 553
Save problem state to a savefile DSEVSV Name DSEVSV Save problem state to a savefile Purpose This subprogram saves the current problem for later restart at its current state. The global communication array and all working storage are written onto the user-specified I/O file using standard Fortran unformatted sequential write statements. The file is rewound before and after use.
PAGE 554
DSEVV1 Matrix value input by single entry Name DSEVV1 Matrix value input by single entry Purpose This subprogram adds to the value of the entry in the (irow, jcol) position in the lower triangle of one of the sparse matrices.
PAGE 555
Matrix value input by single entry DSEVV1 ier = −401 ier = −402 ier = −403 ier = −404 ier = −405 ier = −406 Notes Error in dynamic storage allocation. Subscript pair (irow, jcol) is not in lower triangle of matrix. Illegal value for matrix designator. Subscript pair (irow, jcol) was not specified in structure input. No room for value. The B matrix was specified as diagonal. Cannot add off-diagonal nonzero value. The B matrix was specified as an identity matrix. Cannot add nonzero value.
PAGE 556
DSEVVC Matrix value input by column Name DSEVVC Matrix value input by column Purpose This subprogram adds to the values of a list of nonzero entries in the lower triangle of a single column of one of the sparse matrices.
PAGE 557
Matrix value input by column Output Notes ier DSEVVC Status response: ier = 0 Normal return. ier = −400 Incorrect processing path; DSEVOR not called or eigenextraction already begun. ier = −401 Error in dynamic storage allocation. ier = −402 Illegal value for either nzcol or jcol. ier = −403 Illegal value for matrix designator. ier = −404 At least one subscript pair (jrowin(j),jcol) was not specified in structure input. No room for value. ier = −405 The B matrix was specified as diagonal.
PAGE 558
DSEVVC Matrix value input by column INTEGER*4 J,NZCOL,JROWIN(100),IER REAL*8 VALUES(100),GLOBAL(150) J = 4519 NZCOL = 6 JROWIN(1) JROWIN(2) JROWIN(3) JROWIN(4) JROWIN(5) JROWIN(6) = = = = = = 4519 4520 4521 6000 6002 6004 VALUES(1) VALUES(2) VALUES(3) VALUES(4) VALUES(5) VALUES(6) = = = = = = 1.0D0 1.0D0 1.0D0 1.0D0 1.0D0 1.0D0 CALL DSEVVC (’A’,J,NZCOL,JROWIN,VALUES,GLOBAL,IER) IF ( IER .NE.
PAGE 559
Matrix value input to main diagonal DSEVVD Name DSEVVD Matrix value input to main diagonal Purpose This subprogram adds to the values of corresponding entries on the main diagonal of one of the sparse matrices.
PAGE 560
DSEVVD Matrix value input to main diagonal ier = −404 At least one subscript pair (jrowin(j),jcol) was not specified in ier = −406 Notes structure input. No room for value. The B matrix was specified as an identity matrix. Cannot add nonzero value. Calls to DSEVV1, DSEVVC, DSEVVD, DSEVVE and DSEVVM can be intermixed. If the matrix entries of the main diagonal are available as a vector, using DSEVVD is more efficient than using DSEVV1.
PAGE 561
Matrix value input by finite element DSEVVE Name DSEVVE Matrix value input by finite element Purpose This subprogram adds to the values of a set of nonzero entries in the lower triangle of one of the sparse matrices corresponding to a finite element or clique.
PAGE 562
DSEVVE Matrix value input by finite element Updated global Global communications array for this problem. This array must be passed, untouched by the user, to successive subroutines in this package. Output ier Status response: ier = 0 Normal return. ier = −400 Incorrect processing path; DSEVOR not called or eigenextraction already begun. ier = −401 Error in dynamic storage allocation. ier = −402 Illegal value for either nnode or in nodlst. ier = −403 Illegal value for matrix designator.
PAGE 563
Matrix value input by finite element Example DSEVVE Rows and columns 345, 346, 347, and 989 form a small dense submatrix of A. Add the value 1.0 to the values in the positions consisting of all pairs of numbers from this set to the list of nonzeros in matrix A. INTEGER*4 NNODE,NODLST(10),IER REAL*8 ELMMTX(10,10),GLOBAL(150) NNODE = 4 NODLST(1) NODLST(2) NODLST(3) NODLST(4) = = = = 345 346 347 989 DO 200 K = 1, NNODE DO 100 L = K, NNODE ELMMTX(K,L) = 1.
PAGE 564
DSEVVM Matrix value input by matrix Name DSEVVM Matrix value input by matrix Purpose This subprogram adds values to all of the entries in the lower triangle of one of the sparse matrices.
PAGE 565
Matrix value input by matrix Updated global DSEVVM Global communications array for this problem. This array must be passed, untouched by the user, to successive subroutines in this package.
PAGE 566
DSEVVM Output Notes Matrix value input by matrix ier Status response: ier = 0 Normal return. ier = −400 Incorrect processing path; DSEVOR not called or eigenextraction already begun. ier = −401 Error in dynamic storage allocation. ier = −402 Illegal value for either nnode or in nodlst. ier = −403 Illegal value for matrix designator. ier = −404 At least one subscript pair was not specified in structure input. No room for value. ier = −405 The B matrix was specified as diagonal.
PAGE 567
Matrix value input by matrix DSEVVM INTEGER*4 COLSTR(7),ROWIND(6),IER REAL*8 VALUES(6),DIAGNL(6),GLOBAL(150) COLSTR(1) COLSTR(2) COLSTR(3) COLSTR(4) COLSTR(5) COLSTR(6) COLSTR(7) = = = = = = = 1 3 5 6 7 7 7 ROWIND(1) ROWIND(2) ROWIND(3) ROWIND(4) ROWIND(5) ROWIND(6) = = = = = = 3 4 3 5 6 6 VALUES(1) VALUES(2) VALUES(3) VALUES(4) VALUES(5) VALUES(6) = = = = = = 1.0 1.0 1.0 1.0 1.0 1.0 CALL DSEVVM (’B’,COLSTR,ROWIND,VALUES,GLOBAL,IER) IF ( IER .NE.
PAGE 568
DSEVVM 1080 HP MLIB User’s Guide Matrix value input by matrix
PAGE 569
Overview 16 Introduction to BCSLIB-EXT Overview This chapter provides a brief overview of HP’s implementation of BCSLIB-EXT functionality. BCSLIB-EXT is a set of subroutines for solving large sparse linear algebra problems and it has been developed by Boeing Phantom Works Mathematics and Computing Technology (http://www.boeing.com/phantom).
PAGE 570
Chapter objectives Chapter objectives After reading this chapter you will: • Know how to access BCSLIB-EXT subprograms • Know what BCSLIB-EXT can do • Know where to find additional documentation Associated documentation This chapter is based on The Boeing Extreme Mathematical Library BCSLIB-EXT, Fourth Edition, 20462-0520-R4.2 March 2004, The Boeing Company. This document is distributed in electronic format under the HP MLIB source tree installed in /opt/mlib/doc.
PAGE 571
BCSLIB-EXT functionality • Standard eigenvalue problems • Generalized eigenvalue problems BCSLIB-EXT solves these problems in cases where the coefficient matrices are sparse and large, perhaps extremely large. Sparse matrices are those for which almost all of the entries are zero; only a few entries in any given row or column are not zero.
PAGE 572
BCSLIB-EXT functionality 1084 HP MLIB LAPACK User’s Guide
PAGE 573
Overview A Calling MLIB routines from C Overview Subprogram calls in Fortran use the same calling convention as used by other Hewlett-Packard language processors. This HP standard permits MLIB subprograms, that are written in Fortran or assembly language compatible with Fortran, to be called from programs written in C. This appendix describes the calling conventions necessary to call MLIB subprograms—VECLIB and LAPACK—from C programs.
PAGE 574
General interlanguage programming rules • By default, the first element of a Fortran array is X(1), but the first element of a C array is X[0]. As a consequence, you must adjust any input or output arguments or function values that are indices into arrays by decrementing the quantity by 1.
PAGE 575
General interlanguage programming rules • Because C does not support complex arithmetic in a predefined way, storage layouts for Fortran types COMPLEX*8 and COMPLEX*16 usually are defined by typedef statements. Use the following typedef statements: typedef /* typedef /* struct {float re, im;} complex8_t; COMPLEX*8 type */ struct {double re, im;} complex16_t; COMPLEX*16 type */ • Using the above typedef statements, Fortran and C declarations are related as shown in the table below.
PAGE 576
General interlanguage programming rules However, if the struct is allocated using malloc(), it will be 8-byte aligned. So if a program allocates the complex8_t struct with malloc(), and then passes it using pointers, then the data will be 8-byte aligned. For example: complex8_t*zarray=malloc(n*sizeof(complex8_t)) will be 8-byte aligned. One way to ensure that static storage is aligned is to use the C compiler’s align #pragma to enforce 8-byte alignment.
PAGE 577
Header files Header files C header files for VECLIB are installed in /opt/mlib/include as file veclib.h. Header file lapack.h is also provided for use with its associated libraries. The following discussion applies to all these header files. Include the −I/opt/mlib/include option on your C compiler command line to access the C header files. Use the #include directive to incorporate one of these header files into your program, for example, #include
PAGE 578
Examples • The ANSI C function prototypes also enforce the additional int arguments at the end of the argument list corresponding to the character arguments. Note the two char* arguments at the beginning of the argument list in the function prototype for DGEMM and the two int arguments at the end, which must tell the lengths of the character strings transa and transb.
PAGE 579
Examples Figure A-1 Calling VECLIB subroutines from Fortran and C Fortran: PROGRAM MAIN INTEGER*4 LDA, N, IER, JOB PARAMETER ( LDA = 10 ) INTEGER*4 IPVT(LDA) REAL*8 A(LDA,LDA), B(LDA) N = 6 JOB = 0 ! TO SOLVE COLUMN-MAJOR-STORED SYSTEM CALL DGEFA (A, LDA, N, IPVT, IER) IF ( IER .EQ. 0 ) THEN CALL DGESL (A, LDA, N, IPVT, B, JOB) ELSE WRITE (6,*) "singular matrix" END IF END C: #include #include
PAGE 580
Examples Calling functions The method of calling a VECLIB function subprogram depends on its data type. The following examples show how to call some Fortran function subprograms from C. The function prototypes in veclib.h cast the function return values to the proper data types. REAL functions The program in Figure A-2 computes the dot product of two double (Fortran REAL*8) vectors of length 2 using the VECLIB subprogram DDOT. Figure A-2 Calling a real-valued function from C #include
PAGE 581
Linking VECLIB subprograms into a C program Linking VECLIB subprograms into a C program You can call VECLIB or LAPACK subprograms from a program coded in C by including −lveclib -lcl or −llapack -lcl on the cc command line that links your program: cc −o test test.c −lveclib -lcl cc −o test test.c −llapack -lcl Error handling MLIB contains two standard error handlers, subroutines XERBLA and XERVEC. Refer to “XERBLA” on page 337 and “XERVEC” on page 623.
PAGE 582
Error handling Figure A-4 Default VECLIB error handling in a C program % cat vsig1.c #include main () { int n = 1; float alpha = 0.0; float x[10]; int incx = 0; sflr1c (&n, &alpha, x, &incx); }% cc vsig1.c -lveclib % a.
PAGE 583
Error handling Figure A-5 illustrates the use of initMLIB, whose function prototype is obtained from veclib.h. Figure A-5 Changing the VECLIB error handling signal % cat vsig2.c #include #include main () { int n = 1; float alpha = 0.0; float x[10]; int incx = 0; initMLIB (SIGUSR1); sflr1c (&n, &alpha, x, &incx); printf("Hello world\n"); } % cc vsig2.c -lveclib % a.
PAGE 584
Error handling Supplying your own signal handler After calling initMLIB to specify which signal VECLIB’s standard error handlers are to use, you can set up your own signal handler for that signal via the C function signal(3c). This is demonstrated by the program in Figure A-6. Figure A-6 Changing the error handling signal handler % cat vsig3.c #include #include static void veclib_sig (sig, code, scp) int sig, code; struct sigcontext *scp; { printf("SIGUSR2 has occurred...
PAGE 585
Overview B LINPACK Subprograms Overview LINPACK is a collection of Fortran subroutines that analyze and solve systems of simultaneous linear algebraic equations. The LINPACK package was developed with the support of the National Science Foundation in the latter part of the 1970s as one of the first completely systemized collections of linear algebra software. Since that time, much has changed in computer architectures and mathematical algorithms, and LAPACK supersedes LINPACK.
PAGE 586
LINPACK subprograms included in VECLIB LU factorization LINPACK subprograms included in VECLIB Name DGEFA LU factorization Purpose This subprogram computes the triangular factorization of a general dense n-by-n matrix A. Specifically, given A, DGEFA determines an n-by-n permutation matrix P, an n-by-n unit lower-triangular matrix L, and an n-by-n upper-triangular matrix U, such that PA = LU . Computational singularity of A results in one or more zero diagonal elements of U.
PAGE 587
LU factorization DGEFA ier Notes Status response: ier = 0 Normal return. ier = k ≠ 0 If u = 0. (u is the k-th element on kk kk the diagonal of upper triangular matrix U). Technically, this is not an error condition for these subprograms, but it does indicate that A is computationally singular and that a division by zero occurs if the factorization is used to solve a system of linear equations. DGEFA is usage compatible with the standard LINPACK subprogram.
PAGE 588
DGESL Solve linear equations Name DGESL Solve linear equations Purpose Given the triangular factorization of a general dense n-by-n coefficient matrix A, and a right-hand-side n-vector b, this subprogram solves the system of linear equations Ax = b.
PAGE 589
Solve linear equations Input DGESL a lda n ipvt b job Output b Array containing the triangular factors L and U of the n-by-n coefficient matrix A as computed by the companion factorization or condition number estimation subprogram. a must have been preserved between the factorization call and the solve call. The leading dimension of array a as declared in the calling program unit, with lda ≥ max(n,1). The order of matrix A, n ≥ 0.
PAGE 590
DGESL 1102 HP MLIB User’s Guide Solve linear equations
PAGE 591
Overview C Parallelized Subprograms Overview Many subprograms in HP MLIB are parallelized to improve performance on Hewlett-Packard computers. This appendix lists parallelized routines in both HP VECLIB and HP LAPACK. Refer to Table C-1 on page 1104 for VECLIB subprograms, and Table C-2 on page 1105 for LAPACK subprograms. Many additional subprograms in the libraries call the BLAS and so inherit BLAS parallelism.
PAGE 592
Parallelized subprograms in VECLIB Parallelized subprograms in VECLIB Table C-1 lists the HP VECLIB subprograms that are parallelized to improve performance.
PAGE 593
Parallelized subprograms in LAPACK Parallelized subprograms in LAPACK Table C-2 lists the HP LAPACK subprograms that are parallelized to improve performance.
PAGE 594
Parallelized subprograms in LAPACK 1106 HP MLIB User’s Guide
PAGE 595
Index Symbols +noppu 14, 639, 701, 732 Itanium processor 15, 640, 703, 733 Itanium processors 15, 640, 703, 733 PA-RISC 14, 640, 702, 733 +ppu 14, 639, 701, 732 Itanium processor 15, 640, 703, 733 Itanium processors 15, 640, 703, 733 PA-RISC 14, 640, 702, 733 1-way Dissection reordering 934 A accessing VECLIB 3 accessing VECLIB 3 accuracy of computed eigenvaules and eigenvectors 1018 apply Givens rotation general 114 modified 123 sparse 120 arguments BLAS Standard 36 array Fortran argument association 34 F
PAGE 596
CGERU 237 CGETRA 241 CGTHR 94 CGTHRZ 96 CHBMV 244 check accuracy of eigenvalue and eigenvector results 1018 CHEMM 265 CHEMV 270 CHER 275 CHER2 279 CHER2K 284 CHERK 289 CHPMV 249 CHPR 254 CHPR2 259 CLANGB 657 CLANGE 661 CLANGT 663 CLANHB 666 CLANHE 680 CLANHP 671 CLANHT 676 CLANSB 666 CLANSP 671 CLANSY 680 clear vector 150 clip vector left-sided 74 right-sided 77 two-sided 71 CLSTEQ 99 CLSTNE 99 combine AXPY and DOT 164 compiling linking LAPACK 629 linking libisamstub.
PAGE 597
CZERO 150 D D1DFFT 544 D2DFFT 549 D3DFFT 553 DAMAX 56 DAMIN 59 DASUM 62 data types 23 DAXPY 65 DAXPYI 68 DBCOMM 441 DBDIMM 445 DBDISM 449 DBELMM 453 DBELSM 457 DBSCMM 461 DBSCSM 465 DBSRMM 469 DBSRSM 473 DCLIP 71 DCLIPL 74 DCLIPR 77 DCONV 594, 599 DCOOMM 477 DCOPY 80 DCSCMM 481 DCSCSM 485 DCSRMM 489, 497 DCSRSM 493 DDIASM 501 DDOT 84 DDOTI 88 deallocate working storage 894, 1020 DELLMM 505, 513 DELLSM 509, 517 determine extent of parallelism 607, 608 DFFTS 560 DFRAC 92 DFT 540 DGBMV 212 DGECPY 219 DGEFA 10
PAGE 598
DSEVVE 1073 DSEVVM 1076 DSKYMM 521 DSKYSM 525 DSLECO 891 DSLEDA 894 DSLEFA 896 DSLEFF 900 DSLEFS 903 DSLEI1 909 DSLEIC 911 DSLEIE 914 DSLEIF 916 DSLEIM 917 DSLEIN 921 DSLELU 923 DSLEMA 929 DSLEOC 931 DSLEOP 933 DSLEOR 937 DSLEPS 940 DSLERD 941 DSLESL 944 DSLEV1 947 DSLEVC 950 DSLEVE 954 DSLEVM 957 DSORT 620 DSPMV 249 DSPR 254 DSPR2 259 DSUM 139 DSWAP 141 DSYMM 265 DSYMV 270 DSYR 275 DSYR2 279 DSYR2K 284 DSYRK 289 DTBMV 294 DTBSV 301 DTPMV 308 DTPSV 313 DTRMM 318 DTRMV 323 DTRSM 327 DTRSV 332 DVBRMM 529 DVBR
PAGE 599
compute 107 squared 110 extended BLAS 205 extract fractional parts 92 F F_BLASERROR 605 F_CAMAX_VAL 153 F_CAMIN_VAL 155 F_CAPPLY_GROT 158 F_CAXPBY 161 F_CAXPY_DOT 164 F_CCOPY 167 F_CDOT 169 F_CGBMV 355 F_CGE_COPY 358 F_CGE_TRANS 360 F_CGEMM 362 F_CGEMV 365 F_CGEMVER 368 F_CGEMVT 372 F_CGEN_GROT 174 F_CGEN_HOUSE 175 F_CGEN_JROT 178 F_CGER 375 F_CHBMV 340 F_CHEMV 342 F_CHER 344 F_CHER2 346 F_CHPMV 348 F_CHPR 350 F_CHPR2 352 F_CPERMUTE 187 F_CRSCALE 190 F_CSBMV 378 F_CSPMV 381 F_CSPR 384 F_CSPR2 386 F_CSUM 19
PAGE 600
F_SGEN_GROT 174 F_SGEN_HOUSE 175 F_SGEN_JROT 178 F_SGER 375 F_SMAX_VAL 181 F_SMIN_VAL 183 F_SNORM 185 F_SPERMUTE 187 F_SRSCALE 190 F_SSBMV 378 F_SSORT 192 F_SSORTV 193 F_SSPMV 381 F_SSPR 384 F_SSPR2 386 F_SSUM 195 F_SSUMSQ 197 F_SSWAP 200 F_SSYMV 389 F_SSYR 392 F_SSYR2 394 F_STBMV 397 F_STBSV 400 F_STPMV 403 F_STPSV 406 F_STRMV 408 F_STRMVT 411 F_STRSM 414 F_STRSV 417 F_SWAXPBY 202 F_ZAMAX_VAL 153 F_ZAMIN_VAL 155 F_ZAPPLY_GROT 158 F_ZAXPBY 161 F_ZAXPY_DOT 164 F_ZCOPY 167 F_ZDOT 169 F_ZGBMV 355 F_ZGE_COPY 35
PAGE 601
one-dimensional 556, 560, 582, 587 real-to-complex 582, 587 separate real arrays 560 three-dimensional general 551 real-to-complex 576, 579 separate real arrays 553 two-dimensional general 547 real-to-complex 570, 573 separate real arrays 549 find first vector element 53 selected vector element 99 floating point machine specific characteristics 172, 354 parameters for F_FPINFO 173 Fortran array argument association 34 storage of arrays 34 forward storage 35 fractional parts, extract 92 full Hermitian matrix
PAGE 602
ill-conditioned problems See roundoff effects ILSTEQ 99 ILSTGE 99 ILSTGT 99 ILSTLE 99 ILSTLT 99 ILSTNE 99 IMAX 103 IMIN 105 index element of vector maximum 49 minimum 51 magnitudes maximum 40 minimum 43 maximum element of vector 49 magnitudes 40 minimum element of vector 51 magnitudes 43 Indexed Sequential Access Method See ISAM initialize sparse eigenvalues/eigenvectors 1054 sparse linear equations 921 vector to zero 150 initMLIB 1094 inner rotation 26 input matrix structure by column 911, 1045 by finite e
PAGE 603
DGESL 1097 subprograms in VECLIB 1098 list selected vector elements 99 long period random number generator scalar 615 vector 617 LSAME 606 LU factorization 1098 M machine independence 655 machine specific charactristics See F_SFPINFO machine-dependent parameters, return 655 magnitudes compute maximum 56 minimum 59 index magnitudes 40 maximum 40 minimum 43 maximum compute 56 index 40 minimum compute 59 index 43 sum 62 man pages 30, 650 matrix band compute norm 657 band symmetric compute norm 666 compute nor
PAGE 604
by matrix 917, 1050 by single entry 909, 1043 end 916, 1049 symmetric sample program 887 symmetric band compute orm 666 symmetric full compute norm 680 symmetric packed compute norm 671 symmetric tridiagonal, compute norm 676 transposition 360 tridiagonal compute norm 676 tridiagonal general compute norm 663 value input by column 950, 1068 by finite element 954, 1073 by matrix 957, 1076 by single entry 947, 1066 to main diagonal 1071 maximum absolute vector 153 compute magnitudes 56 element of vector, index
PAGE 605
symmetric matrix 389 symmetric packed matrix 381 triangular band matrix 397 triangular matrix 408 triangular matrix and n-vector 323 triangular packed matrix 403 N naming conventions Extended BLAS 207–209 Sparse BLAS 423–424 VECLIB subroutines 24–25 natural reordering 934 non parallel processing controlling at link time 643 nonsymmetric matrix sample program 888 norm 36, 185, 209 defined 25 general band matrix 657 general full matrix 661 general tridiagonal matrix 663 Hermitian band matrix 666 Hermitian fu
PAGE 606
rotation apply plane 158 Givens 174 Jacobi 178 roundoff effects 23, 648 runtime, controlling parallel processing 18, 643 S S1DFFT 544 S2DFFT 549 S3DFFT 553 SAMAX 56 SAMIN 59 SASUM 62 save problem state to savefile 1065 SAXPY 65 SAXPYI 68 SBCOMM 441 SBDIMM 445 SBDISM 449 SBELMM 453 SBELSM 457 SBSCMM 461 SBSCSM 465 SBSRMM 469 SBSRSM 473 ScaLAPACK xviii, 689–716 arguments 709 array 714 INFO 716 option 709 scalar 714 associated documentation 690 functionality 705 linear equations 706 linear least squares 707 n
PAGE 607
sort vector 192, 193 sorted rotation 26 sparse 120 BLAS 31, 421 dot product 88 eigenextraction 1030, 1036 eigenvalues initialize 1054 one-call usage 1021 subprograms 1003–1079 eigenvectors initialize 1054 subprograms 1003–1079 Givens rotation, apply 120 linear equations initialize 921 solve 944 specify 929 subprograms 875–962 vector elementary operation 68 gather 94 gather and zero 96 scatter 136 zero and gather 96 specify coefficient matrix sparse linear equations 929 specify singularity treatment 900 SRAM
PAGE 608
symmetric tridiagonal matrix compute norm 676 SZERO 150 T thread definition 643 time CPU time 604 wall-clock time 622 trans 36, 209 defined 25 transform Householder 175 transposition 360 triangular band system solve 301, 400 multiply matrix-matrix 318 matrix-vector 308 matrix-vector and n-vector 323 packed solve 406 system solve 313, 327, 332, 414, 417 triangular factorization 1098, 1100 triangular solve block diagonal format triangular solve 449, 457 block sparse column format triangular solve 465 block s
PAGE 609
vdSin 833 vdsin 812 vdSinCos 834 vdsincos 866 vdSinh 835 vdsinh 813, 867 vdSqrt 836 vdsqrt 868 vdTan 837 vdtan 869 vdTanh 838 vdtanh 870 vdUnpackI 839 vdunpacki 871 vdUnpackM 839 vdunpackm 871 vdUnpackV 839 vdunpackv 871 VECLIB 1 error handling 684 VECLIB, accessing 3 VECLIB8 1 libraries 3 vector clear 150 clip left-sided 74 right-sided 77 two-sided 71 copy 80, 167 elementary operation 65 elements, count selected 46 index of maximum element 49, 51 list selected elements 99 long period random-number generato
PAGE 610
vsln 859 vsLog10 828 vslog10 860 vsPackI 829 vspacki 861 vsPackM 829 vspackm, vdpacki 861 vsPackV 829 vspackv 861 vsPow 831 vspow 863 vsPowx 832 vspowx 864 vsqrt 865 vsSin 833 vsSinCos 834 vssincos 866 vsSinh 835 vssinh 867 vsSqrt 836 vssqrt 868 vsTan 837 vstan 869 vsTanh 838 vstanh 870 vsUnpackI 839 vsunpacki 871 vsUnpackM 839 vsunpackm 871 vsUnpackV 839 vsunpackv 871 W wall-clock time 622 WALLTIME 622 weighted dot product, compute 145 working storage 648 working storage, deallocate 894, 1020 X XERBLA 33
PAGE 611
ZLANHE 680 ZLANHP 671 ZLANHT 676 ZLANSB 666 ZLANSP 671 ZLANSY 680 ZLSTEQ 99 ZLSTNE 99 ZRC1FT 564 ZRC2FT 570 ZRC3FT 576 ZRCFTS 582 ZROT 114 ZROTG 118 ZRSCL 130 ZSCAL 133 ZSCALC 133 ZSCTR 136 ZSKYMM 521 ZSKYSM 525 ZSUM 139 ZSWAP 141 ZSYMM 265 ZSYR2K 284 ZSYRK 289 ZTBMV 294 ZTBSV 301 ZTPMV 308 ZTPSV 313 ZTRMM 318 ZTRMV 323 ZTRSM 327 ZTRSV 332 ZVBRMM 529 ZVBRSM 533 ZWDOTC 145 ZWDOTU 145 ZZERO 150 1123
PAGE 612
1124