HP-UX Floating-Point Guide
Chapter 2
Floating-Point Principles and the IEEE Standard for Binary Floating-Point Arithmetic
Floating-Point Formats
The IEEE standard specifies four formats for representing floating-point
values:
• Single-precision
• Double-precision (optional, though a double type wider than IEEE
single-precision is required by standard C)
• Single-extended precision (optional)
• Double-extended precision (optional)
The IEEE standard does not require an implementation to support
single-extended precision and double-extended precision in order to be
standard-conforming.
HP 9000 systems fully support the single-precision and
double-precision formats. They also support a quadruple-precision
(quad-precision) format, which is similar to the double-extended
precision format.
Single-Precision, Double-Precision, and Quad-Precision Formats
Single-precision, double-precision, and quad-precision values consist of
three fields: sign bit, exponent, and fraction. The sign bit reflects the
algebraic sign of the value: a 1 indicates a negative value; a 0 indicates a
positive value. The exponent field encodes an integer, the power to
which 2 is raised. The fraction field holds the bits of the significand that
follow the binary point; for normalized values an implicit leading 1 bit
precedes them, so the significand is at least 1.0 and less than 2.0. The
power of 2 is multiplied by the significand, and the sign is applied, to
yield the actual numerical value.
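The relationship among sign, exponent, and the factor between 1.0 and 2.0
can be sketched in C with the standard library functions frexp() and
ldexp(). This is only an illustration under stated assumptions: the names
fp_parts, fp_decompose, and fp_recompose are hypothetical and not part of
any HP-UX library, a C99 <math.h> is assumed, and the sketch handles only
finite, nonzero values.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper type: the three logical parts of a finite,
 * nonzero floating-point value. */
struct fp_parts {
    int    sign;         /* 1 for a negative value, 0 for a positive value */
    int    exponent;     /* the power to which 2 is raised */
    double significand;  /* at least 1.0 and less than 2.0 */
};

/* Split x into sign, exponent, and a significand in [1.0, 2.0). */
struct fp_parts fp_decompose(double x)
{
    struct fp_parts p;
    int e;
    double m;

    assert(x != 0.0 && isfinite(x));  /* zero, infinity, NaN not handled */

    /* frexp() gives fabs(x) == m * 2^e with 0.5 <= m < 1.0 */
    m = frexp(fabs(x), &e);

    p.sign = signbit(x) ? 1 : 0;
    p.significand = m * 2.0;  /* rescale from [0.5, 1.0) to [1.0, 2.0) */
    p.exponent = e - 1;       /* compensate for the rescaling */
    return p;
}

/* Rebuild the value: significand * 2^exponent, with the sign applied. */
double fp_recompose(struct fp_parts p)
{
    double magnitude = ldexp(p.significand, p.exponent);
    return p.sign ? -magnitude : magnitude;
}
```

Calling fp_decompose(-6.5) yields sign 1, exponent 2, and significand
1.625, because 6.5 equals 1.625 multiplied by 2 raised to the power 2;
fp_recompose() on that result returns -6.5 again.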
The only difference among the single-precision, double-precision, and
quad-precision formats is the number of bits allocated for the exponent
and fraction. Figure 2-1, Figure 2-2, and Figure 2-3 show the number of
bits allocated in each format.
The single-precision format is 32 bits long: 1 bit for the sign, 8 bits for the
exponent, and 23 bits for the fraction.
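The 1/8/23 bit split can be made concrete with a short C sketch that copies
a float into a 32-bit integer and masks out each field. It assumes that
float is the 32-bit IEEE single-precision format (the C language itself
does not guarantee this) and that <stdint.h> is available; the names
single_fields and single_split are hypothetical, chosen only for this
illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the raw fields of a single-precision value. */
struct single_fields {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, exactly as stored in the format */
    uint32_t fraction;  /* 23 bits */
};

/* Extract the raw sign, exponent, and fraction fields of f.
 * Assumes float is a 32-bit IEEE single-precision value. */
struct single_fields single_split(float f)
{
    struct single_fields s;
    uint32_t bits;

    assert(sizeof f == sizeof bits);

    /* memcpy copies the bit pattern without violating aliasing rules */
    memcpy(&bits, &f, sizeof bits);

    s.sign     = bits >> 31;            /* top bit */
    s.exponent = (bits >> 23) & 0xFFu;  /* next 8 bits */
    s.fraction = bits & 0x7FFFFFu;      /* low 23 bits */
    return s;
}
```

Under those assumptions, single_split(1.0f) gives sign 0, exponent field
127, and fraction 0. The exponent field is stored with an offset (a bias
of 127 in single precision), so a stored 127 corresponds to 2 raised to
the power 0.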