HP-UX Floating-Point Guide

Chapter 2
Floating-Point Principles and the IEEE Standard for Binary Floating-Point Arithmetic
Floating-Point Formats
For example, if the 23 bits in the fraction field of a single-precision
number are
011 0100 0000 0000 0000 0000
and the exponent field is not all 1’s or all 0’s, the fraction value is
$$1.0 + 2^{-2} + 2^{-3} + 2^{-5} = 1.0 + 0.25 + 0.125 + 0.03125 = 1.40625$$
The 1.0 represents the fraction implicit bit, and the exponents −2, −3,
and −5 indicate that the second, third, and fifth bits of the fraction
field (counting from the left) are set.
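The same sum can be checked mechanically. The following is a minimal Python sketch, not part of the original guide; the function name fraction_value is hypothetical, and it assumes a normalized number (exponent field neither all 1's nor all 0's), so the implicit bit is 1.

```python
def fraction_value(fraction_bits: str) -> float:
    """Return the fraction value (implicit bit included) of a normalized number.

    fraction_bits is a string of '0' and '1' characters; spaces are ignored.
    The leftmost bit has weight 2**-1, the next 2**-2, and so on.
    """
    bits = fraction_bits.replace(" ", "")
    value = 1.0  # the implicit bit of a normalized number
    for position, bit in enumerate(bits, start=1):
        if bit == "1":
            value += 2.0 ** -position
    return value


# The 23-bit single-precision fraction field from the example above.
print(fraction_value("011 0100 0000 0000 0000 0000"))  # 1.40625
```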
The Exponent Field
The exponent field uses a biased representation. This means that the
value represented by the exponent field is the value in the exponent field
interpreted as an unsigned integer minus a constant value (the bias).
The purpose of the bias is to allow all exponent calculations to be
performed using unsigned arithmetic. For single-precision formats, the
bias is 127; for double-precision formats, it is 1023; for quad-precision
formats, it is 16383.
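As an illustration of the biased representation, the sketch below subtracts the bias from an exponent field value. It is an illustrative example only: the names BIAS, EXPONENT_BITS, and true_exponent are hypothetical, and the exponent field widths (8, 11, and 15 bits) are taken from the IEEE formats rather than from the text above. The all-0's and all-1's field values are rejected because they do not encode an ordinary exponent.

```python
# Bias values as stated in the text.
BIAS = {"single": 127, "double": 1023, "quad": 16383}

# Exponent field widths in bits (an assumption drawn from the IEEE formats).
EXPONENT_BITS = {"single": 8, "double": 11, "quad": 15}


def true_exponent(field: int, fmt: str) -> int:
    """Return the unbiased exponent for an exponent field read as an unsigned integer."""
    max_field = (1 << EXPONENT_BITS[fmt]) - 1  # all 1's
    if not 0 < field < max_field:
        raise ValueError("all-0's and all-1's exponent fields are reserved")
    return field - BIAS[fmt]


print(true_exponent(129, "single"))   # 2
print(true_exponent(1023, "double"))  # 0
print(true_exponent(1, "quad"))       # -16382
```

Note that in each format the bias equals 2^(k-1) - 1, where k is the width of the exponent field.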
Floating-Point Format: Examples
The value 6.0 would be represented in single-precision format as shown
in Figure 2-4.
Figure 2-4 IEEE Single-Precision Format: Example
The first bit is the sign bit. Because the sign bit is 0, the floating-point
value is positive. The next eight bits make up the exponent. 1000 0001
equals 129, but the true value of the exponent is derived by subtracting
the bias constant 127 from this value. So the true exponent value is 2.
The fraction bits are 100 0000 0000 0000 0000 0000, which, when added
to the implicit bit, equal 1 + 0.5, or 1.5. The value is therefore
$1.5 \times 2^{2} = 6.0$.
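The three fields described above can be inspected directly. The following is a minimal sketch using Python's standard struct module, not code from the guide; the function name single_fields is hypothetical. It packs the value as a big-endian IEEE single-precision number and then splits the resulting 32-bit word into its sign, exponent, and fraction fields.

```python
import struct


def single_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, exponent field, fraction field) of x in single precision."""
    # Pack as a 32-bit float, then reinterpret the same bytes as an unsigned int.
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31                # 1 bit
    exponent = (word >> 23) & 0xFF   # 8 bits, still biased
    fraction = word & 0x7FFFFF       # 23 bits, implicit bit not stored
    return sign, exponent, fraction


sign, exponent, fraction = single_fields(6.0)
print(sign, format(exponent, "08b"), format(fraction, "023b"))
# 0 10000001 10000000000000000000000
```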
In algebraic terms, a floating-point value is
$$(-1.0)^{S} \times M \times 2^{E-B}$$
where S is the value of the sign bit, M is the fraction (with implicit
bit), E is the value of the exponent field interpreted as an unsigned
integer, and B is the bias.
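The algebraic form translates almost directly into code. The sketch below is an illustration under stated assumptions, not the guide's own code: the function name single_value is hypothetical, it handles single-precision normalized numbers only (bias 127, 23-bit fraction field), and it rejects the all-0's and all-1's exponent fields, for which the formula does not apply.

```python
def single_value(sign: int, exponent_field: int, fraction_field: int) -> float:
    """Evaluate (-1.0)**S * M * 2**(E - B) for a normalized single-precision number."""
    bias = 127
    if exponent_field in (0, 255):
        raise ValueError("formula applies only to normalized numbers")
    # M is the implicit bit plus the 23-bit fraction field scaled by 2**-23.
    m = 1.0 + fraction_field / 2 ** 23
    return (-1.0) ** sign * m * 2.0 ** (exponent_field - bias)


# Sign 0, exponent field 1000 0001 (129), fraction field 100 0000 ... 0000.
print(single_value(0, 129, 0x400000))  # 6.0
```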