HP-UX Floating-Point Guide

36 Chapter 2

Floating-Point Principles and the IEEE Standard for Binary Floating-Point Arithmetic

Floating-Point Formats

The IEEE standard specifies four formats for representing floating-point

values:

• Single-precision

• Double-precision (optional, though a double type wider than IEEE

single-precision is required by standard C)

• Single-extended precision (optional)

• Double-extended precision (optional)

The IEEE standard does not require an implementation to support

single-extended precision and double-extended precision in order to be

standard-conforming.

HP 9000 systems fully support the single-precision and

double-precision formats. They also support the quadruple-precision

(quad-precision) format, which is similar to the double-extended

precision format.

Single-Precision, Double-Precision, and Quad-Precision Formats

Single-precision, double-precision, and quad-precision values consist of

three fields: sign bit, exponent, and fraction. The sign bit reflects the

algebraic sign of the value. A 1 indicates a negative value; a 0 indicates a

positive value. The exponent represents an integer value that is a

power to which 2 is raised. The fraction, also called the significand,

represents a value of at least 1.0 and less than 2.0 (for normalized values). The

result of the exponent expression is multiplied by the fraction to yield the

actual numerical value.

The only difference among the single-precision, double-precision, and

quad-precision formats is the number of bits allocated for the exponent

and fraction. Figure 2-1, Figure 2-2, and Figure 2-3 show the number of

bits allocated in each format.

The single-precision format is 32 bits long: 1 bit for the sign, 8 bits for the

exponent, and 23 bits for the fraction.