HP-UX Floating-Point Guide

92 Chapter 3

Factors that Affect the Results of Floating-Point Computations

Floating-Point Coding Practices that Affect Application Results

Both of these values have only 16 bits of significance. The final result,
9.9178697E−21, is a reasonable-looking normalized number. However,
because it is produced by a calculation that once lost all but 16 bits of
significance, it can have at most 16 bits of significance itself. In fact, it
has considerably fewer.
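
The effect can be reproduced in miniature. The following is a minimal sketch, not the guide's own example: the helper names round_trip_through_denormal and sample_input are hypothetical, and the scale factor of 2^10 is chosen only for illustration. It scales a small single-precision value down into the denormalized range and back up again. The result is a normalized number, but the bits dropped while the value was denormalized do not come back.

```c
#include <float.h>

/* Hypothetical helper, not from the guide: scales x down by 2^10 and back
 * up by 2^10, storing each step in a single-precision variable. */
float round_trip_through_denormal(float x)
{
    /* volatile forces each intermediate to be rounded to single precision */
    volatile float down = x * (1.0f / 1024.0f); /* may become denormalized */
    volatile float up = down * 1024.0f;         /* normalized again */
    return up;
}

/* Hypothetical helper: returns 2^-126 + 2^-149, the smallest normalized
 * float (FLT_MIN) with its lowest significand bit set. */
float sample_input(void)
{
    volatile float one_plus_eps = 1.0f + FLT_EPSILON;
    return FLT_MIN * one_plus_eps;
}
```

Assuming IEEE single precision with gradual underflow enabled (no flush-to-zero mode), round_trip_through_denormal(sample_input()) returns exactly FLT_MIN, not the original input: the low-order bit was rounded away while the value was denormalized. A value such as 1.0f, which stays normalized throughout, survives the same round trip unchanged.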

You can find out whether your application has underflowed by using the
fetestexcept function. Alternatively, you can use the
fesettrapenable function or the +FP compiler option to run the
application with the underflow exception trap enabled. See “Exception
Bits” on page 132 and “Command-Line Mode Control: The +FP Compiler
Option” on page 148 for more information.

If underflow causes problems in a single-precision application, you can
usually solve them by changing to double precision. If underflow causes
problems in a double-precision application, you could migrate it to quad
precision, but at a considerable loss of efficiency; you may want to
restructure the application instead to avoid underflows, if possible.
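
The single-to-double case can be sketched as follows. The function names and the sample values are hypothetical, chosen only so that the intermediate product falls below the single-precision range but well inside the double-precision range.

```c
/* Hypothetical helpers, not from the guide. Both compute a * b * c. */

/* Single precision: the intermediate a * b is rounded to float and may
 * underflow before it is multiplied by c. */
float triple_product_single(float a, float b, float c)
{
    volatile float ab = a * b;   /* volatile forces rounding to float */
    return ab * c;
}

/* Double precision: the same intermediate has far more exponent range. */
double triple_product_double(float a, float b, float c)
{
    volatile double ab = (double)a * (double)b;
    return ab * (double)c;
}
```

For a = b = 1e-30f and c = 1e30f, the single-precision version returns 0 because a * b (about 1E−60) underflows to zero, while the double-precision version returns a value close to 1E−30.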

Truncation to an Integer Value

The floor of a value is the greatest whole number less than or equal to
the value. The ceiling of a value is the smallest whole number greater
than or equal to the value.

Rounding, precision mode problems, and compiler optimizations can all
contribute to inaccurate results, but under most circumstances the
inaccuracy is very small and not noticeable. However, some operations
can magnify the inaccuracy of a calculation to the point where the result
is meaningless. This can occur in algorithms that truncate a
floating-point value to make it an integer. Truncating a positive
floating-point value, for instance, reduces its magnitude to the floor
integer, regardless of how close to the ceiling value it may be. An
expression may yield 1.999999 on one system and 2.00001 on another.
Both results may be acceptable in terms of expected rounding errors.
However, if the result is truncated to an integer, these two systems will
produce 1 and 2, respectively, which can be an unacceptable difference.
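
The two values mentioned above can be checked directly. This is a minimal sketch with a hypothetical helper name, truncate_to_int, that relies only on the standard behavior of a C cast from double to int.

```c
/* Hypothetical helper, not from the guide: converts a double to int the
 * way a C cast does, discarding the fractional part (truncation toward
 * zero). */
int truncate_to_int(double value)
{
    return (int)value;
}
```

Here truncate_to_int(1.999999) yields 1 and truncate_to_int(2.00001) yields 2. The inputs differ by about 0.00001, but the truncated results differ by a full unit.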

The following program provides a simple example of this situation.