HP-UX Floating-Point Guide
92 Chapter 3
Factors that Affect the Results of Floating-Point Computations
Floating-Point Coding Practices that Affect Application Results
Both of these values have only 16 bits of significance. The final result,
9.9178697E−21, is a reasonable-looking normalized number. However,
because it is produced by a calculation that at one point lost all but 16
bits of significance, it can have at most 16 bits of significance itself. In
fact, it has considerably fewer.
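The loss of significance can be reproduced on a small scale. The following is a minimal sketch, not taken from this guide: the helper name through_denormal_range is hypothetical, and the code assumes IEEE single precision with gradual underflow (no flush-to-zero mode). It scales a value down to the smallest denormal magnitude, where only one bit of significance remains, and then scales it back up.

```c
/* Hypothetical helper: push x through the single-precision denormal
 * range and bring it back. Assumes IEEE gradual underflow. */
float through_denormal_range(float x)
{
    /* 2^-149 is the smallest positive single-precision denormal, so the
     * product keeps at most the leading bit of x. volatile forces the
     * intermediate to be stored as a float. */
    volatile float tiny = x * 0x1p-149f;

    /* Scale back up in two steps, because 2^149 itself would overflow. */
    volatile float back = tiny * 0x1p100f;
    return back * 0x1p49f;
}
```

Under these assumptions, through_denormal_range(1.75f) returns 2.0f and through_denormal_range(1.25f) returns 1.0f. The results are ordinary normalized numbers, but the bits that were rounded away during the underflow are not recovered.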
You can find out whether your application has underflowed by using the
fetestexcept function. Alternatively, you can use the
fesettrapenable function or the +FP compiler option to run the
application with the underflow exception trap enabled. See “Exception
Bits” on page 132 and “Command-Line Mode Control: The +FP Compiler
Option” on page 148 for more information.
If you have problems due to underflow in a single-precision application,
you can solve them by changing to double precision. If you have
problems due to underflow in a double-precision application, you could
migrate it to quad precision, but at a considerable loss of efficiency; you
may want to restructure your application instead to avoid underflows,
if possible.
Truncation to an Integer Value
The floor of a value is the greatest whole number less than or equal to
the value. The ceiling of a value is the smallest whole number greater
than or equal to the value.
Rounding, precision mode problems, and compiler optimizations can all
contribute to inaccurate results, but under most circumstances the
inaccuracy is very small and not noticeable. However, some operations
can magnify the inaccuracy of a calculation to the point where the result
is meaningless. This can occur in algorithms that truncate a
floating-point value to make it an integer. Truncating a positive
floating-point value, for instance, reduces its magnitude to the floor
integer, regardless of how close to the ceiling value it may be. An
expression may yield 1.999999 on one system and 2.00001 on another.
Both results may be acceptable in terms of expected rounding errors.
However, if the result is truncated to an integer, these two systems will
produce 1 and 2, respectively, which can be an unacceptable difference.
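The effect of the two values mentioned above can be seen in isolation with a C cast, which truncates toward zero. This is a minimal sketch with a hypothetical helper name, truncate_to_int; it does not reproduce any particular system's rounding behavior, only what happens once the two slightly different results exist.

```c
/* Hypothetical helper: convert a double to int the way a C cast does,
 * by discarding the fractional part (truncation toward zero). */
int truncate_to_int(double value)
{
    return (int)value;
}
```

Here truncate_to_int(1.999999) returns 1 and truncate_to_int(2.00001) returns 2, so two inputs that differ by about 0.00001 produce integer results that differ by 1.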
The following program provides a simple example of this situation.