HP-UX Floating-Point Guide HP 9000 Computers Edition 5 B3906-90006 November 1997 Printed in: United States © Copyright 1997 Hewlett-Packard Company
Legal Notices The information in this document is subject to change without notice. Hewlett-Packard makes no warranty of any kind with regard to this manual, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. Hewlett-Packard shall not be held liable for errors contained herein or direct, indirect, special, incidental or consequential damages in connection with the furnishing, performance, or use of this material. Warranty.
©copyright 1980, 1984, 1986 Novell, Inc. ©copyright 1986-1992 Sun Microsystems, Inc. ©copyright 1985-86, 1988 Massachusetts Institute of Technology. ©copyright 1989-93 The Open Software Foundation, Inc. ©copyright 1986 Digital Equipment Corporation. ©copyright 1990 Motorola, Inc.
Contents

1. Introduction
Overview of Floating-Point Principles . . . 22
Overview of HP-UX Math Libraries . . . 24
Math Libraries and System Architecture . . . 25
Selecting Different Versions of the Math Libraries . . . 27
Locations of the Math Libraries at Release 11.0 . . . 29
2.

Floating-Point Operations . . . 61
Comparison . . . 62
Conversion Between Operand Formats . . . 64
The Remainder Operation . . . 66
Recommended Functions . . . 67
3.

4. HP-UX Math Libraries on HP 9000 Systems
HP-UX Library Basics . . . 99
Math Library Basics . . . 101
Anatomy of a Math Library Function Call . . . 102
Math Library Error Handling for C . . . 104
Handling Invalid Operation Exceptions (C and Fortran)

Handling Traps . . . 157
Using the ON Statement (Fortran only) . . . 157
Using the sigaction(2) Function (C only) . . . 161
Detecting Exceptions without Enabling Traps . . . 164
Handling Integer Exceptions . . . 165
Handling Integer Division by Zero

A. The C Math Library
C Math Library Tables . . . 195
B. The Fortran Math Library
C. Floating-Point Problem Checklist
Results Different from Those Produced Previously . . . 219
Incorrect or Imprecise Results . . . 221
Compiling and Linking Errors

Figures

Figure 1-1. Math Library Directory Hierarchy at Release 11.0 . . . 31
Figure 2-1. IEEE Single-Precision Format . . . 37
Figure 2-2. IEEE Double-Precision Format . . . 37
Figure 2-3. IEEE Quad-Precision Format . . . 38
Figure 2-4. IEEE Single-Precision Format: Example . . . 39
Figure 3-1. Taking the Difference of Similar Values

Tables

Table 1-1. HP-UX Math Libraries . . . 25
Table 1-2. Code-Generation Compiler Options at HP-UX Release 11.0 . . . 28
Table 1-3. Math Library Path Names . . . 29
Table 2-1. IEEE Representations of Floating-Point Values . . . 40
Table 2-2. Minimum and Maximum Positive Denormalized Values . . . 43
Table 2-3.

Table A-1. The C Math Library (By Category) . . . 196
Table A-2. The C Math Library (Alphabetical Listing)
Printing History The manual printing date and part number indicate its current edition. The printing date will change when a new edition is printed. Minor changes may be made at reprint without changing the printing date. The manual part number will change when extensive changes are made. Manual updates may be issued between editions to correct errors or document product changes. To ensure that you receive the updated or new editions, you should subscribe to the appropriate product support service.
Preface The HP-UX Floating-Point Guide describes how floating-point arithmetic is implemented on HP 9000 systems and discusses how floating-point behavior affects the programmer. This book provides information useful to any programmer writing or porting floating-point-intensive programs. We recommend that you start with Chapter 1, which not only provides an overview of floating-point principles but also provides important information about HP-UX math libraries.
Summary of Technical Changes

This edition of the HP-UX Floating-Point Guide describes the following changes to the HP-UX math libraries at Release 11.00. The math libraries provided in the /usr/lib directory support PA-RISC 1.1 (PA1.1) and PA-RISC 2.0 (PA2.0) systems. The libraries in /usr/lib support PA2.0 systems in 32-bit mode; the libraries in /usr/lib/pa20_64 support PA2.0 systems in 64-bit mode. A version of the BLAS library tuned for optimal 32-bit performance on PA2.
The document ANSI/IEEE Std 754-1985, entitled “IEEE Standard for Binary Floating-Point Arithmetic,” was also published in ACM SIGPLAN Notices 22(2), pp. 9 - 25, Feb. 1987. The international version of the standard is Binary floating-point arithmetic for microprocessor systems, second edition (IEC 559:1989). Information on HP-UX programming languages and tools is available in both online and hardcopy format.
• The HP/DDE Debugger User’s Guide (B3476-90015) describes the HP DDE debugger. • The PA-RISC 1.1 Architecture and Instruction Set Reference Manual (09740-90039) describes the PA-RISC 1.1 architecture. PA-RISC 2.0 Architecture, by Gerry Kane (Prentice-Hall, ISBN 0-13-182734-0), describes the PA-RISC 2.0 architecture. • The Assembly Language Reference Manual (92432-90001) describes assembly language programming on HP-UX systems. The ADB Tutorial (92432-90005) introduces the assembly language debugger.
Conventions

Unless otherwise noted in the text, this manual uses the following symbolic conventions.

literals
    This font indicates commands, keywords, options, literals, source code, system output, and path names. In syntax formats, this font indicates commands, keywords, and punctuation that you must enter exactly as shown.
user input
    In examples, this font represents user input.
variables, titles
    In syntax formats, words or characters in this font represent values that you must supply.
In This Book

This manual is organized as follows:

Chapter 1
    Provides an overview of the basic principles involved in writing floating-point programs and of the HP-UX math libraries used by most floating-point applications.
Chapter 2
    Provides an overview of the IEEE Standard for Binary Floating-Point Arithmetic, the standard on which HP floating-point behavior is based.
1 Introduction This chapter introduces some of the basic principles involved in writing floating-point programs and introduces the HP-UX math libraries used by most floating-point applications.
Introduction Overview of Floating-Point Principles Overview of Floating-Point Principles In the context of computer programming, the term floating-point refers to the ways in which modern computer systems represent real numbers and perform real arithmetic. Computers use special representations for floating-point numbers. They also have special rules for performing floating-point arithmetic that differ from the rules for performing integer arithmetic.
arithmetic. The purpose of this book is to help you avoid or fix these types of problems on HP 9000 computer systems and to help you increase the performance of your floating-point-intensive applications.
Introduction Overview of HP-UX Math Libraries Overview of HP-UX Math Libraries Basic operations such as addition and multiplication are specified by the IEEE standard. More complex mathematical operations such as logarithmic and trigonometric functions are provided by math library routines.
Table 1-1 HP-UX Math Libraries

Library Name | Description | Linker Option
libm | C math library; ANSI C, POSIX, XPG4.2, and SVID specifications | -lm
libcl | Fortran and Pascal library | Linked in automatically by f90, f77, and pc commands; use -lcl with other compiler commands
libblas | Basic Linear Algebra Subroutine (BLAS) library (provided with the HP Fortran 90 and HP FORTRAN/9000 products only) | -lblas

Math Libraries and System Architecture

At Release 11.
When you compile a program on a PA2.0 system at HP-UX Release 11.0, the compiler by default generates PA2.0N code. To generate PA2.0W code, you need to specify the +DA2.0W option (see “Selecting Different Versions of the Math Libraries” on page 27). All HP 9000 systems except the oldest Series 800 systems are PA1.1-based or PA2.0-based. If you do not know your system’s architecture type, see “Determining Your System’s Architecture Type” on page 26.
The uts.release is the release of HP-UX on the system where you run the program. The _SYSTEM_ID is the kind of code the compiler generated. The _CPU_REVISION is the architecture type. If you compile this program on a PA1.1 system, then run it on a PA2.0 system running HP-UX Release 11.0, you get results like the following:

    Release = B.11.00
    _SYSTEM_ID = 210
    _CPU_REVISION = 214

The release, 11.00, is easy to decipher.
Table 1-2 Code-Generation Compiler Options at HP-UX Release 11.0

Option | Synonyms | Code Generated | Runs on
+DA1.1 | +DD32 (C only), +Oportable | PA1.1 | All PA systems running HP-UX 11.0 or later
+DA2.0 | +DA2.0N | PA2.0 narrow mode | All PA2.0 systems running HP-UX 11.0 or later
+DA2.0W | +DD64 (C only) | PA2.0 wide mode | PA2.0 (HP-UX 11.0 wide-mode kernels only)

If your application must run on both PA1.1 and PA2.0 systems, compile with +DA1.1.
Locations of the Math Libraries at Release 11.0

At Release 11.0, the main HP-UX math libraries are in the directory /usr/lib. The BLAS library is in both /opt/fortran90/lib and /opt/fortran/lib. The obsolete vector library exists only in /opt/fortran/old/lib. Table 1-3 shows the math library path names.

Table 1-3 Math Library Path Names

Library Path Name | Description
/usr/lib/libm.a | C math library, archive version, PA1.1
/usr/lib/libm.
Library Path Name | Description
/opt/fortran90/lib/libblas.a | Basic Linear Algebra Subroutine (BLAS) library, PA1.1 archive version (provided with the HP Fortran 90 product)
/opt/fortran90/lib/pa2.0/libblas.a | Basic Linear Algebra Subroutine (BLAS) library, PA2.0 narrow-mode archive version (provided with the HP Fortran 90 product)
/opt/fortran/lib/libblas.a | Basic Linear Algebra Subroutine (BLAS) library, PA1.
Figure 1-1 illustrates the directory hierarchy for the math libraries.

Figure 1-1 Math Library Directory Hierarchy at Release 11.
2 Floating-Point Principles and the IEEE Standard for Binary Floating-Point Arithmetic
This chapter introduces the IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985). Throughout this chapter and the remainder of the book, we refer to the IEEE Standard for Binary Floating-Point Arithmetic as “the IEEE standard” or simply “the standard.” Programmers who intend to write floating-point-intensive code should become familiar with the IEEE standard.
What Is the IEEE Standard?

The IEEE standard was approved in 1985. Its main purpose is to define specifications for representing and manipulating floating-point values so that programs written on one IEEE-conforming machine can be moved to another conforming machine with predictable results.
Floating-Point Formats

The IEEE standard specifies four formats for representing floating-point values:

• Single-precision
• Double-precision (optional, though a double type wider than IEEE single-precision is required by standard C)
• Single-extended precision (optional)
• Double-extended precision (optional)

The IEEE standard does not require an implementation to support single-extended precision a
Figure 2-1 IEEE Single-Precision Format

The double-precision format is 64 bits long: 1 bit for the sign, 11 bits for the exponent, and 52 bits for the fraction. The double-precision format is sometimes divided conceptually into two 32-bit words. The word containing the sign bit, the exponent field, and the first portion of the fraction field is referred to as the most significant word.
Figure 2-3 IEEE Quad-Precision Format

NOTE On HP 9000 systems, the most significant word is stored at a lower memory address than the least significant word. If, for example, a double-precision value is stored at address 0x1000, the least significant word is stored at address 0x1004. If a quad-precision value is stored at address 0x1000, the least significant word is at address 0x100C.
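The two words of a double-precision value can also be extracted arithmetically, which avoids any dependence on the memory ordering described in the note. The following is a minimal sketch and not part of the original guide: the function names are hypothetical, and the code assumes a C99 compiler on which double is the 64-bit IEEE format.

```c
#include <stdint.h>
#include <string.h>

/* Copy the 64 bits of a double into an integer of the same size. */
static uint64_t double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Word holding the sign bit, the exponent field, and the first
   20 bits of the fraction field. */
uint32_t most_significant_word(double d)
{
    return (uint32_t)(double_bits(d) >> 32);
}

/* Word holding the remaining 32 bits of the fraction field. */
uint32_t least_significant_word(double d)
{
    return (uint32_t)(double_bits(d) & 0xFFFFFFFFu);
}
```

For the value 6.0, these helpers return 0x40180000 and 0x00000000, the two halves of the double-precision pattern 4018 0000 0000 0000.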
For example, if the 23 bits in the fraction field of a single-precision number are

011 0100 0000 0000 0000 0000

and the exponent field is not all 1’s or all 0’s, the fraction value is

1.0 + 2^{-2} + 2^{-3} + 2^{-5} = 1.0 + 0.25 + 0.125 + 0.03125 = 1.40625

The 1.
In our example, this would be

(-1)^0 * 1.5 * 2^2 = 1.5 * 4.0 = 6.0

Table 2-1 shows some additional examples.

Table 2-1 IEEE Representations of Floating-Point Values

Hexadecimal Representation | Sign | Exponent | Fraction | Value
SP: 40C0 0000 | + | 129 – 127 = 2 | 1.0 + 0.5 = 1.5 | +1.5 * 2^2 = 6.
DP: 4018 0000 0000 0000 | | 1025 – 1023 = 2 | |
QP: 4001 8000 0000 0000 0000 0000 0000 0000 | | 16385 – 16383 = 2 | |
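The field-by-field decoding used in these examples can be sketched in C. This is an illustration under stated assumptions rather than code from the guide: the names sp_fields, sp_decode, and sp_normalized_value are hypothetical, and the value formula applies only to normalized numbers (exponent field neither all 0's nor all 1's).

```c
#include <math.h>
#include <stdint.h>

/* The three fields of an IEEE single-precision value. */
struct sp_fields {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 8 bits, biased by 127 */
    uint32_t fraction;  /* 23 bits */
};

/* Split a 32-bit pattern into its sign, exponent, and fraction fields. */
struct sp_fields sp_decode(uint32_t bits)
{
    struct sp_fields f;
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}

/* Value of a normalized number:
   (-1)^sign * (1.0 + fraction * 2^-23) * 2^(exponent - 127). */
double sp_normalized_value(uint32_t bits)
{
    struct sp_fields f = sp_decode(bits);
    double significand = 1.0 + ldexp((double)f.fraction, -23);
    double magnitude = ldexp(significand, (int)f.exponent - 127);
    return f.sign ? -magnitude : magnitude;
}
```

Applied to the single-precision pattern 40C0 0000, sp_decode yields a sign of 0, an exponent field of 129, and a fraction field of 0x400000 (that is, 0.5), and sp_normalized_value returns 6.0.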
Because of this granularity in floating-point representation, most real numbers cannot be represented exactly. The result of an arithmetic operation (including the operation of converting from a decimal string into IEEE format) usually must be rounded to a nearby representable number. (For information on rounding, see “Inexact Result (Rounding)” on page 53.
A denormalized value is represented by a zero exponent field and a nonzero fraction (if the fraction were also zero, the floating-point value would be zero).
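The encoding rule for denormalized values can be turned into a small test on the bit pattern. The sketch below is an assumption-laden illustration, not code from the guide: the function names are hypothetical, and the value formula relies on the IEEE single-precision convention that a denormalized number has no implicit leading 1 and a fixed scale of 2^-126, so its magnitude is the 23-bit fraction field times 2^-149.

```c
#include <math.h>
#include <stdint.h>

/* Nonzero if the bits encode a denormalized single-precision value:
   exponent field all 0's and fraction field nonzero. */
int sp_is_denormalized(uint32_t bits)
{
    return ((bits >> 23) & 0xFFu) == 0 && (bits & 0x7FFFFFu) != 0;
}

/* Value of a denormalized number: (-1)^sign * fraction * 2^-149.
   The result is returned as a double, where it is exactly representable. */
double sp_denormalized_value(uint32_t bits)
{
    double magnitude = ldexp((double)(bits & 0x7FFFFFu), -149);
    return (bits >> 31) ? -magnitude : magnitude;
}
```

The pattern 0000 0001 therefore decodes to 2^-149, the smallest positive denormalized single-precision value, while 0080 0000 is rejected because its exponent field is nonzero.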
Table 2-2 Minimum and Maximum Positive Denormalized Values

Precision | Values | Hexadecimal Representation | Value
Single | Minimum denormalized | 0000 0001 | 2^{-149}
Single | Maximum denormalized | 007F FFFF | 2^{-149} * (2^{23} − 1)
Single | Minimum normalized | 0080 0000 | 2^{-126}
Double | Minimum denormalized | 0000 0000 0000 0001 | 2^{-1074}
Double | Maximum denormalized | 000F FFFF FFFF FFFF | 2^{-1074} * (2^{52} − 1)
Double | Minimum normalized | 0010 0000 0000 0000 | 2−
Table 2-3 Arithmetic Properties of Infinity

Operand | Operator | Operand | Result
+Infinity | + | Finite Value | +Infinity
−Infinity | + | Finite Value | −Infinity
+Infinity | + | +Infinity | +Infinity
−Infinity | + | −Infinity | −Infinity
+Infinity | + | −Infinity | NaN (invalid operation)
+Infinity | − | Finite Value |
−Infinity | − | Finite Value |
Finite Value | − | +In
Finite Value | − |
+Infinity | − |
−Infinity | − |
+Infinity | − |
−Infinity | − |
Not-a-Number (NaN)

A NaN (Not-a-Number) is a special IEEE representation for values that are

• The result of an invalid operation
• The result returned by a library function when it would be incorrect to return a numeric value
• An undetermined value

NaNs are represented by setting all of the bits in the exponent to 1 and setting at least one of the bits in the fraction field to 1.
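The NaN encoding rule translates directly into a test on the bit pattern. The following sketch is illustrative only: the function names are hypothetical, and the infinity test assumes the standard IEEE encoding in which an all-ones exponent with a zero fraction denotes an infinity.

```c
#include <stdint.h>

/* Nonzero if the bits encode a single-precision NaN:
   exponent field all 1's and fraction field nonzero. */
int sp_is_nan(uint32_t bits)
{
    return ((bits >> 23) & 0xFFu) == 0xFFu && (bits & 0x7FFFFFu) != 0;
}

/* Nonzero if the bits encode +Infinity or -Infinity:
   exponent field all 1's and fraction field zero. */
int sp_is_infinity(uint32_t bits)
{
    return ((bits >> 23) & 0xFFu) == 0xFFu && (bits & 0x7FFFFFu) == 0;
}
```

Because only one fraction bit needs to be set, many distinct bit patterns are NaNs; a program should not assume that a NaN has one particular hexadecimal representation.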
Table 2-4 Properties of NaNs

Operand | Operator | Operand | Result
SNaN | + | Finite Value | QNaN (invalid operation)
QNaN | + | Finite Value | QNaN
SNaN1 | + | SNaN2 | QNaN (invalid operation)
QNaN1 | + | QNaN2 | QNaN1 or QNaN2 (implementation-dependent)
SNaN | float_to_int() | | Largest-magnitude integer (invalid operation)
QNaN | float_to_int() | | Largest-magnitude integer (invalid operation)
SNaN | sqrt() | | QNaN (inv
QNaN | sqrt() | |
Table 2-5 Operations With Zero

Operand | Operator | Operand | Result
+Zero | .EQ.
each component is a double-precision IEEE operand type. HP Fortran 90 and HP FORTRAN/9000 support both complex data types and a full range of complex arithmetic operations.

NOTE HP Fortran 90 and HP FORTRAN/9000 also support the nonstandard data type names DOUBLE COMPLEX and COMPLEX*16 (equivalent to COMPLEX(KIND=8)) and COMPLEX*8 (equivalent to COMPLEX).
Table 2-6 IEEE Single-Precision Value Summary (Hexadecimal Values)

Value | Exponent | Fraction | Positive (hexadecimal) | Negative (hexadecimal)
Zero | All zeros | All zeros | 0000 0000 | 8000 0000
Denormalized | All zeros | Nonzero | 0000 0001 to 007F FFFF | 8000 0001 to 807F FFFF
Normalized | Neither all zeros nor all ones | Anything | 0080 0000 to 7F7F FFFF | 8080 0000 to FF7F FFFF
Infinity | All on
Table 2-7 IEEE Single-Precision Value Summary (Decimal Values)

Value | Positive | Negative
Zero | 0.0 | −0.0
Denormalized | 1.4012985E-45 to 1.1754942E-38 | −1.4012985E-45 to −1.1754942E-38
Normalized | 1.1754944E-38 to 3.4028235E+38 | −1.1754944E-38 to −3.
Table 2-8 IEEE Double-Precision Value Summary (Hexadecimal Values)

Value | Exponent | Fraction | Positive (hexadecimal) | Negative (hexadecimal)
Zero | All zeros | All zeros | 0000 0000 0000 0000 | 8000 0000 0000 0000
Denormalized | All zeros | Nonzero | 0000 0000 0000 0001 to 000F FFFF FFFF FFFF | 8000 0000 0000 0001 to 800F FFFF FFFF FFFF
 | | | 0010 0000 0000 0000 to 7FEF | 8010 0000 0000 0000 to
Table 2-9 IEEE Double-Precision Value Summary (Decimal Values)

Value | Positive | Negative
Zero | 0.0 | −0.0
Denormalized | 4.94065E−324 to 2.22507E−308 | −4.94065E−324 to −2.22507E−308
Normalized | 2.22507E−308 to 1.79769E+308 | −2.22507E−308 to −1.
Exception Conditions

The IEEE standard defines five exception conditions, also called exceptions:

• Inexact result
• Overflow
• Underflow
• Invalid operation
• Division by zero

The following sections describe the exceptions. On HP-UX systems, traps for all of these exceptions are initially disabled by default.
produces an inexact result condition. Because most floating-point operations produce rounded (that is, inexact) results most of the time, the inexact result exception is not usually considered to be an error.
NOTE Most applications do not need the alternate rounding modes. The default rounding mode is round to nearest.

The four rounding modes are:

Round To Nearest
    Round to the representable value closest to the true value. If two representable values are equally close to the true value, choose the one whose least significant bit is 0.
These routines are designed to run in the default rounding mode, round to nearest. Changing the rounding mode may cause library routines to yield results with more rounding errors in unpredictable directions. Be careful if you change the rounding mode in the middle of your program when you are optimizing your code.
If overflow traps are disabled, the result of a floating-point operation that overflows is assigned either an infinity code or the closest representable number (this will be either the largest positive value or the largest-magnitude negative value). The choice of whether to use infinity or the nearest representable value depends on the rounding mode, as shown in Table 2-11.
conditions. On HP 9000 systems, the definition of loss of accuracy in underflow conditions includes all inaccuracies, whether they originate from denormalization or are inherent in the operation.

NOTE In many properly functioning applications, underflows may occur in the normal course of execution, for example, in convergence algorithms.
If an invalid operation condition occurs when invalid operation traps are disabled, the system by default returns a quiet NaN as the result of the operation. If traps are enabled, the system signals a floating-point exception and, if a trap handler is provided, takes whatever action the trap handler dictates.
Exception Processing

Exception processing refers to the sequence of events that takes place when any of the IEEE exception conditions occur. The standard states that a programmer should be able to enable or disable the trapping of any of the exception conditions. The standard also defines default results for all disabled exceptions.
Floating-Point Operations

The IEEE standard requires a complying system to support the following floating-point operations:

Addition
    Algebraic addition.
Subtraction
    Algebraic subtraction.
Multiplication
    Algebraic multiplication.
Division
    Algebraic division.
Comparison
    There are four possible relations between any two floating-point values: less than, equal, greater than, and unordered.
Round to Nearest Integral Value
    Rounds an argument to the nearest integral value (in floating-point format) based on the current rounding mode. Rounding modes are described in “Inexact Result (Rounding)” on page 53.
Remainder
    The remainder operation takes two arguments, x and y, and is defined as x − y * n, where n is the integer nearest the exact value x/y.
NOTE The assertion operators should not be confused with actual programming language operators. Languages, for example, do not support the ? operator.

At Release 11.0, the C math library provides six new macros that implement comparison operations without raising exceptions: isgreater, isgreaterequal, isless, islessequal, islessgreater, and unordered.
Infinity
    To the comparison operators, infinity is just another signed numeric value whose magnitude is greater than the largest normalized magnitude. Infinities with the same sign compare as equal to each other.
NaN
    A NaN compares as unequal to all other operands, including other NaNs and itself.

The rules above are used to evaluate assertions involving NaNs as TRUE or FALSE.
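The comparison rules for NaNs and infinities can be exercised with the quiet comparison macros. The sketch below uses the C99 names isless and isunordered from <math.h>; this is an assumption, since the macro set shipped with a given HP-UX release may use different spellings. The wrapper function names are hypothetical.

```c
#include <math.h>

/* 1 if a < b, 0 otherwise, including when either operand is a NaN.
   Unlike the < operator, isless() does not raise the invalid
   operation exception for quiet NaN operands. */
int quiet_less(double a, double b)
{
    return isless(a, b) ? 1 : 0;
}

/* 1 if the operands are unordered, that is, at least one is a NaN. */
int is_unordered(double a, double b)
{
    return isunordered(a, b) ? 1 : 0;
}

/* 1 if x compares unequal to itself, which is true only for a NaN. */
int differs_from_itself(double x)
{
    volatile double v = x;  /* volatile keeps the comparison from being folded away */
    return v != v;
}
```

An infinity is ordered with respect to every numeric value, so is_unordered(1.0, INFINITY) is 0, whereas any comparison that involves a NaN is unordered.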
Decimal to Single-Precision, Double-Precision, or Quad-Precision
Single-Precision, Double-Precision, or Quad-Precision to Decimal
Single-Precision, Double-Precision, or Quad-Precision to Integer

These conversions can overflow or underflow and are usually inexact. See “Conversions Between Binary and Decimal” on page 76 for more information about these conversions.
Integer to Double-Precision or Single-Precision

These conversions are exact except for conversions of 32-bit integer values greater than 2^{24} − 1 to single-precision, or of 64-bit integer values greater than 2^{53} − 1 to double-precision, which may generate an inexact result exception.

The Remainder Operation

The remainder operation is an exact modulo function.
Recommended Functions

In an appendix, the IEEE standard lists several useful floating-point functions that an implementor may support but is not required to support. Table 2-12 describes how HP-UX systems support these functions. The supported functions and macros are provided in the C library only. Appendix A describes these functions briefly; see the online man pages for more information.
3 Factors that Affect the Results of Floating-Point Computations
When a floating-point application executes, the results it yields may be different from those of previous executions, or the results may be inaccurate. A great many factors can contribute to such differences or inaccuracies. This chapter describes these factors.
How Basic Operations Affect Application Results

To understand why floating-point calculations can yield different results, you need to know how a system performs the most basic operations: add, subtract, multiply, divide.
This example also demonstrates a guiding principle of floating-point programming: Be very careful about testing two floating-point values or expressions for exact equality or inequality. This principle derives from the fact that two given floating-point values are almost never equal, even when the programmer might expect them to be equal from a purely mathematical standpoint.
How Mathematical Library Functions Affect Application Results

Mathematical library functions do not always yield identical results from one system to another, from one language to another, or even from one software release to another.
How Exceptions and Library Errors Affect Application Results

When an application performs a floating-point operation that causes an exception condition or a library error, and the application is not coded to detect and deal with the exception or error, the default exception-handling response of the system may introduce a dramatic amount of error into the continuing computation.
Table 3-1 Effects of Floating-Point Exceptions and Library Errors

Type of Error | Default System Behavior | Effect on Application
Overflow (ERANGE) | Substitute an infinity as the result | May be catastrophic unless specifically handled
Underflow | Substitute either a denormalized or zero value as the result: if the result after rounding would be smaller in magnitude than MINDOUBLE (the
Other System-Related Factors that Affect Application Results

In the previous sections we discussed the three most fundamental causes of inaccuracy in floating-point computations:

• Rounding in basic operations
• Math library functions
• Exceptions and library errors

This section lists the factors that can contribute to inaccuracy or to changes in the results of an application.
The conversion between a decimal value and a binary floating-point value may cause a loss of accuracy for any of three reasons:

• The algorithm may not be accurate, probably because speed/accuracy tradeoffs have been made in favor of speed.
• Not all decimal values are exactly representable in binary (for example, 0.1).
The Fortran program displays results similar to the following:

    Y = .3304651080717299
    Y = 3FD526571FE8C7A5

The following C program shows how you can use a union to display floating-point values in hexadecimal:

Sample Program: flophex.c

    #include
    #include

    union {
        double y;
        struct {
            unsigned int ym, yl;
        } i;
    } di;

    int main(void)
    {
        double x;

        x = 1.234e0;
        di.
of this, the parsing of a floating-point constant might change if the exact value of the constant lies extremely close to the halfway point between two representable values.

The compiler usually performs compile-time expression evaluation, which is commonly referred to as constant folding. A statement like

    X = 1.1/10.0E1 + 5.0E-1

will probably be compiled as

    X = 0.
For example, by default the compiler orders the expression

    a + b * c + d

as

    (a + (b * c)) + d

But if you use +Onofltacc, it may change the ordering to

    a + d + (b * c)

As we showed in “How Basic Operations Affect Application Results” on page 71, this kind of reordering has an effect on rounding errors and consequently on the final result.
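The effect of regrouping can be demonstrated with operands of very different magnitudes. The sketch below is an illustration only: the function names are hypothetical, and the volatile temporaries are there to force each partial sum to be rounded to double precision regardless of the optimization options in use.

```c
/* Evaluate a + b + c in source order: (a + b) + c. */
double sum_left_to_right(double a, double b, double c)
{
    volatile double partial = a + b;  /* rounded to double before the next add */
    return partial + c;
}

/* Evaluate the same sum with the last two operands grouped: a + (b + c). */
double sum_regrouped(double a, double b, double c)
{
    volatile double partial = b + c;  /* rounded to double before the next add */
    return a + partial;
}
```

With a = 1e30, b = -1e30, and c = 1.0, sum_left_to_right returns 1.0, because the two large values cancel exactly before c is added. sum_regrouped returns 0.0, because c is far too small to change -1e30 and is lost when b + c is rounded.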
FMA instructions are generated by default at optimization levels of 2 and higher on PA2.0 systems. If you want your optimized code to preserve exactly the expression semantics of your source code, specify the +Ofltacc option to suppress the generation of FMA instructions.
Values of Certain Modifiable Hardware Status Register Fields

All HP 9000 systems have a modifiable floating-point status register. Figure 5-1 on page 128 illustrates this register.
Floating-Point Coding Practices that Affect Application Results

The most common types of floating-point “bugs” reported to Hewlett-Packard are not bugs at all, but rather a class of programming mistakes.
The following sections describe common floating-point programming mistakes that can lead to incorrect application results:

• Testing floating-point values for equality
• Taking the difference of similar values
• Adding values with very different magnitudes
• Unintentional underflow
• Truncation to an integer value
• Ill-conditioned computations

The first mistake produces results tha
Sample Program: fpeq.c

    #include <stdio.h>

    int main(void)
    {
        union {
            double x;
            int a[2];
        } u1, u2;

        u1.x = 1.2 - 0.1;
        u2.x = 1.1;
        if (u1.x == u2.x)
            printf("1.2 - 0.1 equals 1.1\n");
        else {
            printf("1.2 - 0.1 is NOT equal to 1.1.\n");
            printf("1.2 - 0.1 = %x%x\n1.1 = %x%x\n",
                   u1.a[0], u1.a[1], u2.a[0], u2.a[1]);
        }
    }

From an algebraic viewpoint, this routine should print that 1.2 − 0.
Consider again the example, rounderr.f, in “How Basic Operations Affect Application Results” on page 71. A better way to code that example is to test whether the two values are sufficiently close to each other, rather than exactly the same:

Sample Program: roundeps.
Taking the Difference of Similar Values

Calculations can lose precision when a program attempts to take the difference between two values that are similar in magnitude and also have some degree of inaccuracy to begin with.
Figure 3-1 Taking the Difference of Similar Values

The modulo operation (mod(x, y) in Fortran) is an instance of this type of problem when x is much greater than y; remember that the modulo formula is

    mod(x,y) = x - int(x/y) * y

(See “The Remainder Operation” on page 66 for details.
Figure 3-2 Adding Values with Very Different Magnitudes

In fact, if the difference between the exponents of two single-precision values is 25 or greater, the smaller value is right-shifted out of existence, and adding the two values results in no change at all in the larger value.
Sample Program: diffmag1.f

          PROGRAM DIFFMAG1
          REAL X
          INTEGER I

          X = 0.01
          DO 10 I = 1, 1000
             X = X + 0.01
     10   CONTINUE
          PRINT *, 'X is', X
          DO 20 I = 1, 1000
             X = X - 0.01
     20   CONTINUE
          PRINT *, 'X is', X
          END

The result is not exact. Instead of 10.01 and 0.01, you are likely to get results similar to the following:

    X is 10.01013
    X is 9.99994E-03

The following example is even simpler.
One way to minimize this kind of precision loss, if you can tolerate the added execution time, is to sort the elements of an array in ascending order before you add them together.

Unintentional Underflow

An underflow may occur when a calculation produces a result that is smaller in magnitude than the smallest normalized value.
Both of these values have only 16 bits of significance. The final result, 9.9178697E−21, is a reasonable-looking normalized number. However, because it is produced by a calculation that once lost all but 16 bits of significance, it can have at most 16 bits of significance itself. In fact, it actually has considerably less.
Sample Program: trunc.c

    #include <stdio.h>

    int main(void)
    {
        double x;
        int i, n;

        x = 1.5;
        for (i = 0; i < 10; i++) {
            n = x;
            printf("x is %g, n is %d\n", x, n);
            x += 0.1;
        }
    }

No matter how close the value of x gets to 2.0, C conversion rules require the fractional part to be truncated.
Ill-Conditioned Computations

If relatively small changes to the input of a program or to the intermediate results generated by a program cause relatively large changes in the final output, the program is said to be ill-conditioned or numerically unstable. The following example illustrates an ill-conditioned program:

Sample Program: sloppy_tangent.
Another technique, which does not require you to use additional functions or even to modify your code, is to make very small changes in the input data and to observe the amount by which the result changes. Wild swings in output magnitude may indicate an ill-conditioned application.
4 HP-UX Math Libraries on HP 9000 Systems
This chapter describes what libraries are and how to use them. It also provides detailed information about the math libraries on HP 9000 systems. It covers the following topics:

• HP-UX library basics
• Math library basics, including math library error handling
• HP-UX math library contents
• Calling C library functions from Fortran

For basic information on HP 9000 math libraries, see “Overview of HP-UX Math Libraries” on page 24.
HP-UX Library Basics

A library is a collection of commonly used functions, precompiled in object format and ready to be linked to an application. Because different programming languages have different calling conventions, there are separate libraries for various languages. On HP-UX systems, the C and C++ languages use one set of libraries, while the Fortran and Pascal languages use another.
For detailed information about archive libraries and shared libraries, see the HP-UX Linker and Libraries Online User Guide.
Math Library Basics

Math libraries in most computer systems, including HP-UX, are collections of frequently used mathematical functions. The functions take one or more arguments and return one or more results. When an application source file contains a use of a math function, the compiler automatically generates a call to the appropriate routine name in the appropriate math library.
The following section, which describes what happens when a program calls a math library function, provides some answers to questions 1 through 4.

Anatomy of a Math Library Function Call

Figure 4-1 shows a generalized flowchart of a math library function call, applicable to all languages and standards.
Step A  The compiler-generated call to a math library function includes code that

• Converts the argument to the required format, if necessary
• Places the argument in the appropriate place (that is, where the function expects it)
• Calls the function

Step B  The function determines whether the argument is valid. All functions check for NaN arguments and make additional checks specific to the function.
A math library call that encounters an illegal argument does some or all of the following, depending on which programming language you are using, whether traps are enabled, and which standards are being enforced:

1. Supplies a system-defined default result: NaN for invalid operations, a huge value for overflows, and so on
2. Sets the globally accessible error code variable errno
3. Sets some state in the hardware floating-point status register
4.
Figure 4-2 C Math Library Error Handling for the acos Function

If a C library function such as acos encounters an invalid argument, it ordinarily returns a default result that indicates a failure, that is, a result that the function could not ordinarily return. The default result is usually a NaN or HUGE_VAL, depending on the function and the argument value.
NOTE  The C9X draft standard, the proposed new C standard, does not require math functions to set errno. You will need to test for errno only if your program must conform strictly to C89 or XPG4.2.

Handling Invalid Operation Exceptions (C and Fortran)

The exception condition that indicates an invalid argument to a math library function is the invalid operation condition. (See “Exception Conditions” on page 53 for more information.
The most generally useful method of detecting errors is for the application to check for an anomalous result and then to take appropriate action. What happens after the error depends on the following factors:

• Whether an exception trap for the error is enabled
• Whether the program contains an ON EXTERNAL ERROR statement (HP FORTRAN/9000 only)

We discuss the possible sequences of events in the following sections.
Figure 4-4 Fortran 77 Math Library Error Handling

The ON EXTERNAL ERROR Statement (HP FORTRAN/9000 only)

NOTE  HP Fortran 90 does not support the ON EXTERNAL ERROR statement, although it does support the ON statement for other kinds of error handling.

If you use +T to enable a trap for invalid operations, what happens next depends on whether your program contains an ON EXTERNAL ERROR statement.
If your program contains an ON statement, you must compile with the +T option in order to enable trap handling. If your program contains an ON statement and you do not specify +T, you get a compile-time warning.

If your program does contain an ON statement, what happens depends on the action you specify in the statement. You can specify any of the following:

• ABORT. If you specify ABORT, the program exits.
• IGNORE.
Sample Program: liberr77.
If you compile this program with the ON statements commented out and without the +T option, no traps are enabled. Therefore, the math library error sets the appropriate exception flag. The output of the program is as follows:

    $ f77 liberr77.f -lm
    liberr77.f:
       MAIN liberr:
       mysub:
    $ a.out
    NaN 88000000 invalid operation occurred 00000000

The exception flag value indicates that both an invalid operation and an inexact result condition were generated.
Contents of the HP-UX Math Libraries

This section describes in some detail the contents of the math libraries.
acos asin atan atan2 cos exp log log10 pow sin tan (single-precision also) (single-precision also) (single-precision also) (single-precision also)

Millicode versions exist for the following Pascal functions:

    arctan cos exp ln sin

To get the millicode versions of any of these functions, compile your program with

• Any optimization level (0 through 4)
• The +Olibcalls or the +Oaggressive optimization option

With the f90 and f77 co
The C Math Library (libm)

The C math library, libm, provides all the math functions specified by the ANSI C standard.
• Compile with the basic ANSI option -Aa and use the -D option to define the macro _HPUX_SOURCE on the command line. If this macro is defined, all the function declarations are visible. (If you compile with -Ae or -Ac, this macro is defined automatically.)

    cc -Aa -D_HPUX_SOURCE program_name.c -lm

ANSI mode is the default for the HP C++ compilers (CC and aCC). For a complete list of the contents of libm, see Appendix A.
promote the argument to double, and the function call either generates a linker error or produces incorrect results. To compile in ANSI mode, use a command line like the following:

    cc program_name.c -lm
    aCC program_name.C -lm

Degree-Valued Trigonometric Functions

The Fortran math library defines a set of trigonometric functions whose arguments or results are specified in degrees rather than radians.
The list of classes is exhaustive: all IEEE floating-point values fall into one of these classes. The fpclassify macro never causes an exception, regardless of the operand. It is useful, therefore, for classifying an operand without risking an exception trap. Use the signbit macro to determine whether a value is negative or positive.

To use the fpclassify macro, compile your program in any mode except strict ANSI mode (-Aa).
Other macros that test the class of a floating-point value are

• isinf, which tests whether a value is an infinity
• isnan, which tests whether a value is a NaN
• isfinite, which tests whether a value is neither infinity nor NaN
• isnormal, which tests whether a value is normalized

See the online man pages for information about these macros.
rint(x)      Rounds x to an integer-valued double-precision number, in the direction of the current rounding mode
scalb(x, y)  Returns x*(2**y)

To use these functions, compile your program in any mode except strict ANSI mode (-Aa). Use either extended ANSI mode (-Ae, the default), non-ANSI mode (-Ac), or -Aa -D_HPUX_SOURCE.
islessgreater(x, y)  (Macro) Returns the value of (x) < (y) || (x) > (y)
isunordered(x, y)    (Macro) Returns 1 if its arguments are unordered
llrint(x)            Returns the nearest long long value, rounded according to the current rounding direction
llround(x)           Returns the nearest long long value, rounding halfway cases away from zero, regardless of the current rounding direction
lrint(x)             Returns the nearest long value, rounded accordi
scalbn(x, n)  Returns x*(2**n), where n is an integer
trunc(x)      Returns the integral value, in floating-point format, nearest to but no larger in magnitude than the argument

The BLAS Library (libblas)

The Basic Linear Algebra Subroutine (BLAS) library routines perform low-level vector and matrix operations. They have been tuned for maximum performance.
NOTE  This library is obsolete. It was formerly used by the FORTRAN Optimizing Preprocessor (FTNOPP). The performance benefits provided by FTNOPP are now supplied by the compiler when you use the +Ovectorize option with either C or Fortran programs at optimization level 3 or above. (See “Optimizing Your Program” on page 171 for details.) The vector library is provided for compatibility reasons only, in /opt/fortran/old/lib/libvec.
Calling C Library Functions from Fortran

To call a C math library function from a Fortran program, you must do the following:

1. Use an !$HP$ ALIAS directive (Fortran 90) or an $ALIAS (FORTRAN 77) directive to tell the compiler that the function’s arguments are passed by value.
2. Declare the function with the correct return value. See the online reference pages, Appendix A, or /usr/include/math.h to find the return value.
3.
You can compile and run the program as follows:

    $ f90 bessel.f -lm
    bessel.f
       program BESSEL
    16 Lines Compiled
    $ ./a.out
    Bessel of 1.0 is .
5 Manipulating the Floating-Point Status Register
The floating-point status register (also known as the floating-point control register) stores information about several aspects of the floating-point environment:

• The rounding mode
• What traps are enabled (that is, what exceptions your program can catch)
• If traps are not enabled, what exceptions have occurred
• Whether flush-to-zero underflow mode is set (for systems that have this capability)
• The model and revision of the system’s floating-point unit (F
Run-Time Mode Control: The fenv(5) Suite

This section describes the fenv(5) suite of functions, a collection of services provided in the C math library that allow an application to manipulate several modifiable control mode and status flags in the floating-point status register. These functions and their associated parameter types are declared in the header file /usr/include/fenv.h.
NOTE  Be careful if you use these functions at higher optimization levels (2 and above). Optimization may change the order of operations in a program, so that a call to one of these functions may be placed after an operation you want the function to affect, or before an operation whose result you want the function to check. These functions will then produce unexpected results.
Table 5-1 IEEE Exception Bits

    Bit Name   Description
    V          Invalid operation
    Z          Division by zero
    O          Overflow
    U          Underflow
    I          Inexact result

Enables  Exception trap enable bits. An enable bit is associated with each IEEE exception. When an enable bit equals 1, the corresponding trap is enabled.

Table 5-2
The fields not manipulated by the fenv(5) functions are as follows:

C                The Compare bit.
Res              Reserved for future use.
Model, Revision  The Model and Revision fields contain values that correspond to various implementations of HP 9000 floating-point coprocessors.
T                The Delayed Trap bit.
Sample Program: fe_round.c

    /*************************************************************/
    #include <stdio.h>
    #include <fenv.h>

    int main(void)
    {
        int save_rnd, rnd;
        double x, y, z;

        save_rnd = fegetround();
        if (save_rnd == FE_TONEAREST)
            printf("rounding direction is FE_TONEAREST\n");
        else
            printf("unexpected rounding direction\n");

        x = 1.79e308;
        y = 2.
Sample Program: fe_round.c (cont.)

    /*************************************************************/
        x = -1.79e308;
        y = 2.
The C library supplies a group of functions, all included in the C9X draft standard, to manipulate the exception flags. The library also supplies two functions, not included in the C9X draft standard and specific to HP, to manipulate the exception trap enable bits. The following subsections discuss these groups of functions.

The functions use the following exception macros, defined in fenv.
The fesetexceptflag function sets the status for the exception flags indicated by the argument excepts according to the representation in the object pointed to by flagp. Use it to reset the exception flags to a previously saved state.

The fetestexcept function determines which of a specified subset of the exception flags are currently set.
Sample Program: fe_flags.c (cont.
If you run this program, it generates the following output:

    $ cc fe_flags.c -lm
    $ ./a.
Sample Program: fe_traps.c

    /*************************************************************/
    #include
    #include
    #include
Sample Program: fe_traps.c (cont.
Manipulating the Floating-Point Environment: fegetenv, fesetenv, feupdateenv, feholdexcept

The fenv(5) suite includes a group of functions that allow you to manage the floating-point environment as a whole. The declarations for these functions are as follows.
The following program shows the use of all these functions except feupdateenv.

Sample Program: fe_env.c

    /*************************************************************/
    #include
    #include
Sample Program: fe_env.c (cont.
If you compile and run this program, the call to fesetenv restores the environment without raising the inexact exception, so the inexact trap is not taken even though it is set.

    $ fe_env
    at start, env is 000b0800
    Enter x and y: 1.0e308 1.
Sample Program: fe_update.c

    /*************************************************************/
    #include
    #include
    #include
    #define ARRLEN 11
    static double argarr[ARRLEN] = { -1.73205, -0.57735, 0.0174551,
        0.57735, 1.0, 1.73205, 2.0, -1.22465e-244, -2.44929e-207, -1.
Sample Program: fe_update.c (cont.
If you compile and run this program, it produces results like the following:

    $ cc fe_update.c -lm
    $ ./a.out
    at start, env is 00000000
    Enter x and y: 3 0
    x and y are 3 and 0
    division by zero occurred
    result is inf
    again? (y or n) n
    setting traps for overflow and underflow
    after calculations, env is 44200006
    saving environment
    inexact result occurred
    underflow occurred
    sum of squares is 12.
NOTE  Flush-to-zero mode is supported on all HP 9000 systems except those with chip levels of PA7000 and PA7100LC. You can look up the chip level of your system in /opt/langtools/lib/sched.models. See “Determining Your System’s Architecture Type” on page 26 for more information.

Use caution in changing the underflow mode when you call math library routines.
Sample Program: fe_flush.c

    /*************************************************************/
    #include <stdio.h>
    #include <fenv.h>

    typedef union {
        double y;
        struct {
            unsigned int ym, yl;
        } i;
    } DBL_INT;

    int main(void)
    {
        DBL_INT dix, diy, diz;
        int fm, fm_saved;

        fm_saved = fegetflushtozero();
        printf("underflow mode is %d\n", fm_saved);
        dix.y = -4.94066e-324;
        printf("denormalized value is %g [%08x%08x]\n", dix.y, dix.i.
Command-Line Mode Control: The +FP Compiler Option

The compiler and linker option +FP allows you to specify what traps to enable for your program and can also enable or disable flush-to-zero mode. This option is available with the HP Fortran, C, and Pascal compilers.
Table 5-3 +FP Option Arguments

    Value   Behavior
    V       Enable traps on invalid floating-point operations.
    v       Disable traps on invalid floating-point operations.
    Z       Enable traps on divide by zero.
    z       Disable traps on divide by zero.
    O       Enable traps on floating-point overflow.
    o       Disable traps on floating-point overflow.
    U       Enable traps on floating-point underflow.
6 Floating-Point Trap Handling

By default, trapping on floating-point exceptions is disabled on HP 9000 systems, in accordance with the IEEE standard. If you want your program to continue through floating-point exceptions without trapping, it will do so automatically.
Floating-point trap handling requires two main steps:

1. Setting the exception trap enable bits in the floating-point status register
2. Defining the action to be taken when the trap occurs

Both steps vary somewhat from language to language on HP-UX systems. In this section we suggest some methods of enabling and handling traps in Fortran and C. In C, you can also use the exception flags in the floating-point status register to detect exceptions without taking a trap.
Enabling Traps

When you enable a trap without providing a trap handler (a mechanism for handling it), the trap causes a SIGFPE signal. The signal, in turn, causes your program to abort with an error message that is more or less informative, depending on the method you use to enable the trap. If you want your program to abort when it encounters a trap, then enabling the trap may be all you want to do.
If you compile this program with traps disabled (the default), it produces the following output:

    1.7900+308 divided by 2.2000-308 = +INF

If you compile it with the +FP option, however, you get a core dump. (The O flag of the +FP option enables traps for overflow exceptions.)

    $ f90 +FPO overflow.f
    overflow.f
       program OVERFLOW
    11 Lines Compiled
    $ ./a.
Sample Program: overflow_trap.f

          PROGRAM OVERFLOW_TRAP
    C F90:
    !$HP$ ALIAS FESETTRAPENABLE = 'fesettrapenable' (%val)
    C F77:
    C $ALIAS FESETTRAPENABLE = 'fesettrapenable' (%val)
          PARAMETER (FE_OVERFLOW = Z'20000000')
          EXTERNAL FESETTRAPENABLE
          DOUBLE PRECISION X, Y, Z
          CALL FESETTRAPENABLE(FE_OVERFLOW)
          X = 1.79D308
          Y = 2.
For example, if you use +fp_exception or +T to compile the program in “Using the +FP Compiler Option” on page 153, it produces a result like the following:

    $ f77 +T overflow.f
    overflow.f:
       MAIN overflow:
    $ ./a.out
    PROGRAM ABORTED : IEEE overflow
    PROCEDURE TRACEBACK:
    ( 0) 0x00003ad4 _start + 0x6c [./a.out]

The effect is similar with +fp_exception:

    $ f90 +fp_exception overflow.f
    overflow.f
       program OVERFLOW
    11 Lines Compiled
    $ a.
Handling Traps

Once you have enabled traps, either by a compiler option or by a call to a routine, you need a mechanism for handling them when they occur. It may be convenient simply to have your program abort (particularly if you enable traps with a method that does not cause a core dump). You may, however, prefer to establish an error-handling routine that generates a helpful error message and exits the program gracefully.
The ON statement allows you to specify a particular action to be taken when a particular exception arises. The action may be any of the following:

• ABORT (the default action). Specifying ABORT allows you to get the address where the error occurred, which may be useful in debugging.
• IGNORE (usually not a good idea).
• CALL sub (call a subroutine).
Sample Program: overflow_on.f

          PROGRAM OVERFLOW_ON
          DOUBLE PRECISION X, Y, Z
          ON DOUBLE PRECISION OVERFLOW CALL HANDLE_OFL
          X = 1.79D308
          Y = 2.
a value of 0 for the result of an operation that underflows may be exactly what you want to do. In fact, the system may perform this substitution for you; the IEEE standard specifies that 0 may be the result of an operation that underflows, and on HP 9000 systems it often is (when the result of the operation is less than the smallest denormalized value). If you want to guarantee a result of 0, you can call a handler as follows:

Sample Program: underflow_on.
The ON statement is documented fully in the HP Fortran 90 Programmer’s Reference and the HP FORTRAN/9000 Programmer’s Guide.

Using the sigaction(2) Function (C only)

For C programs, the standard method of handling errors is to use the sigaction(2) function. The function establishes the address of a signal-handling function that is called whenever the specified HP-UX signal is raised.
Sample Program: overflow_sig.c

    /*************************************************************/
    #include
    #include
    #include
    #include
Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.
Floating-Point Trap Handling Handling Traps Sample Program: overflow_sig.c (Cont.