HP-UX Floating-Point Guide

Chapter 2

Floating-Point Principles and the IEEE Standard for Binary Floating-Point Arithmetic

Floating-Point Formats

For example, if the 23 bits in the fraction field of a single-precision number are

011 0100 0000 0000 0000 0000

and the exponent field is not all 1's or all 0's, the fraction value is

1.0 + 2^-2 + 2^-3 + 2^-5 = 1.0 + 0.25 + 0.125 + 0.03125 = 1.40625

The 1.0 represents the implicit bit of the fraction, and the exponents of -2, -3, and -5 indicate that the second, third, and fifth bits of the fraction field (counting from the left) are set.
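The calculation above can be sketched in a few lines of Python. This is only an illustration: the function name fraction_value is hypothetical, and the sketch assumes a normalized number, that is, an exponent field that is not all 1's or all 0's.

```python
def fraction_value(fraction_bits: str) -> float:
    """Return the fraction value, including the implicit bit.

    fraction_bits is a string of '0' and '1' characters; spaces are ignored.
    Bit k (counting from 1 at the left) contributes 2**-k when it is set.
    """
    bits = fraction_bits.replace(" ", "")
    value = 1.0  # implicit bit (assumes a normalized number)
    for position, bit in enumerate(bits, start=1):
        if bit == "1":
            value += 2.0 ** -position
    return value


print(fraction_value("011 0100 0000 0000 0000 0000"))  # prints 1.40625
```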

The Exponent Field

The exponent field uses a biased representation. This means that the value represented by the exponent field is the value in the exponent field interpreted as an unsigned integer minus a constant value (the bias). The purpose of the bias is to allow all exponent calculations to be performed using unsigned arithmetic. For single-precision formats, the bias is 127; for double-precision formats, it is 1023; for quad-precision formats, it is 16383.
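As a rough illustration of the biased representation, the following Python sketch subtracts the bias from a stored exponent. The names BIAS, EXPONENT_BITS, and true_exponent are hypothetical. The exponent field widths of 11 and 15 bits for double and quad precision come from the IEEE standard rather than from this section; they are included only to show that each bias equals 2^(w-1) - 1 for a w-bit exponent field.

```python
# Bias values as given in the text.
BIAS = {"single": 127, "double": 1023, "quad": 16383}

# Exponent field widths in bits (assumption: taken from the IEEE formats).
EXPONENT_BITS = {"single": 8, "double": 11, "quad": 15}


def true_exponent(stored: int, fmt: str) -> int:
    """Convert a stored (unsigned) exponent field to the exponent it represents."""
    return stored - BIAS[fmt]


for fmt, width in EXPONENT_BITS.items():
    # Each bias is one less than half the range of the exponent field.
    assert BIAS[fmt] == 2 ** (width - 1) - 1

print(true_exponent(129, "single"))  # prints 2
```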

Floating-Point Format: Examples

The value 6.0 would be represented in single-precision format as shown in Figure 2-4.

Figure 2-4 IEEE Single-Precision Format: Example

The first bit is the sign bit. Because the sign bit is 0, the floating-point value is positive. The next eight bits make up the exponent field. 1000 0001 equals 129, but the true value of the exponent is derived by subtracting the bias constant 127 from this value, so the true exponent value is 2. The fraction bits are 100 0000 0000 0000 0000 0000, which, when added to the implicit bit, equal 1 + 0.5, or 1.5. The value is therefore 1.5 * 2^2 = 6.0.
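The fields described above can be checked with a short Python sketch that uses the standard struct module to obtain the 32-bit pattern of a single-precision value. The function name decode_single is hypothetical, and the sketch is meant only as a way to inspect the bits.

```python
import struct


def decode_single(x: float):
    """Split the single-precision encoding of x into (sign, exponent, fraction)."""
    # Pack as a big-endian 32-bit float, then reread the same bytes as an integer.
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    sign = raw >> 31                # 1 sign bit
    exponent = (raw >> 23) & 0xFF   # 8 exponent bits (still biased)
    fraction = raw & 0x7FFFFF       # 23 fraction bits (implicit bit not stored)
    return sign, exponent, fraction


sign, exponent, fraction = decode_single(6.0)
print(sign, format(exponent, "08b"), format(fraction, "023b"))
# prints: 0 10000001 10000000000000000000000
```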

In algebraic terms, a floating-point value is

(-1.0)^S * M * 2^(E-B)

where S is the value of the sign bit, M is the fraction (with implicit bit), E is the exponent, and B is the bias.
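The formula can be evaluated directly, as in the following Python sketch. The function name ieee_value is hypothetical, and the sketch covers only normalized values, where M already includes the implicit bit.

```python
def ieee_value(s: int, m: float, e: int, b: int) -> float:
    """Evaluate (-1.0)^S * M * 2^(E-B) for a normalized value."""
    return (-1.0) ** s * m * 2.0 ** (e - b)


# The single-precision example for 6.0: S = 0, M = 1.5, E = 129, B = 127.
print(ieee_value(0, 1.5, 129, 127))  # prints 6.0
```

Setting S to 1 with the same M, E, and B gives -6.0, because the sign bit changes only the sign of the result.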