HP-UX Floating-Point Guide

Chapter 3 83
Factors that Affect the Results of Floating-Point Computations
Floating-Point Coding Practices that Affect Application Results
The most common types of floating-point “bugs” reported to
Hewlett-Packard are not bugs at all, but rather a class of programming
mistakes. These mistakes usually stem from one of the following
invalid assumptions:
It is invalid to assume that an arithmetic expression in a computer
language produces an exactly representable result, or that all of the
digits in a floating-point value are always meaningful. The fact that
an application prints 25 decimal digits of a result does not mean
that the value printed actually has 25 significant digits; many of the
rightmost digits printed may be meaningless. Values can lose
precision in the course of a computation, and you must be alert to the
kinds of operations that cause precision loss.

The significance limitations of the system are immutable. Entering a
datum of 3.14159265358979323846 for pi is no better than entering
3.1415926535897932 (in double precision). In fact, the former may be
worse, because it might beguile a programmer into thinking that the
system accepted all 21 digits, when in reality it accepted only 17 (9 in
single precision).

It is invalid to assume that an arithmetic expression in a computer
language will abide by all algebraic rules, including the associative
and distributive laws. You cannot make this assumption unless you
have made a thorough analysis of the code to determine that
rounding errors and other sources of inaccuracy will not invalidate
the rules.

Sometimes the source of an erroneous result is related to a particular
optimization performed by the compiler. This kind of problem can be
particularly hard to solve, because it may disappear when you recompile
the program with a debugging option. See “Compiler
Behavior and Compiler Version” on page 78 for a discussion of the effects
of compiler optimization on floating-point results.