User Guide

128-Bit Media and Scientific Programming 127
24592—Rev. 3.15—November 2009 AMD64 Technology
Single-Precision Format. This format includes a 1-bit sign, an 8-bit biased exponent with a bias
of 127, and a 23-bit significand. The integer bit is implied, making a total of 24 bits in the
significand.
Double-Precision Format. This format includes a 1-bit sign, an 11-bit biased exponent with a
bias of 1023, and a 52-bit significand. The integer bit is implied, making a total of 53 bits in the
significand.
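The field layout of both formats can be inspected from software. The following is a minimal sketch in Python, not part of the instruction set itself; the helper names decode_single and decode_double are hypothetical and chosen only for illustration. It reinterprets a value's bit pattern as an integer and extracts the sign, biased exponent, and stored significand bits.

```python
import struct

def decode_single(x):
    """Return (sign, biased exponent, stored significand) of x rounded to single precision."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    sign = bits >> 31                  # 1-bit sign
    biased_exp = (bits >> 23) & 0xFF   # 8-bit biased exponent (bias 127)
    fraction = bits & 0x7FFFFF         # 23 stored significand bits (integer bit implied)
    return sign, biased_exp, fraction

def decode_double(x):
    """Return (sign, biased exponent, stored significand) of a double-precision value."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    sign = bits >> 63                  # 1-bit sign
    biased_exp = (bits >> 52) & 0x7FF  # 11-bit biased exponent (bias 1023)
    fraction = bits & ((1 << 52) - 1)  # 52 stored significand bits (integer bit implied)
    return sign, biased_exp, fraction

# -6.0 = -1.5 * 2^2, so the true exponent 2 is stored as 2 + bias
print(decode_single(-6.0))  # (1, 129, 4194304)
print(decode_double(-6.0))  # (1, 1025, 2251799813685248)
```

In both results only the topmost stored significand bit is set, because the significand 1.5 is 1.1 in binary and the leading 1 is the implied integer bit.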
Table 4-3 on page 127 shows the range of finite values representable by the two floating-point data
types.
For example, in the single-precision format, the largest normal number representable has an exponent
of FEh and a significand of 7FFFFFh, with a numerical value of $2^{127} \cdot (2 - 2^{-23})$. Results that overflow
above the maximum representable value return either the maximum representable normalized number
(see “Normalized Numbers” on page 128) or infinity, with the sign of the true result, depending on the
rounding mode specified in the rounding control (RC) field of the MXCSR register. Results that
underflow below the minimum representable value return either the minimum representable
normalized number or a denormalized number (see “Denormalized (Tiny) Numbers” on page 128),
with the sign of the true result, or a result determined by the SIMD floating-point exception handler,
depending on the rounding mode and the underflow-exception mask (UM) in the MXCSR register (see
“Unmasked Responses” on page 187).
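The largest-normal-number example above can be checked numerically. This is a small sketch in Python, assuming only the single-precision field layout described earlier; it assembles the bit pattern from the exponent FEh and the significand 7FFFFFh and compares the decoded value with the closed-form expression. It does not model the overflow or underflow responses, which depend on the MXCSR settings.

```python
import struct

EXPONENT = 0xFE          # largest biased exponent of a normal number
SIGNIFICAND = 0x7FFFFF   # all 23 stored significand bits set

# Sign bit 0, exponent in bits 30-23, significand in bits 22-0
bits = (EXPONENT << 23) | SIGNIFICAND
(largest,) = struct.unpack("<f", struct.pack("<I", bits))

# Closed form 2^127 * (2 - 2^-23); both factors are exact in a Python float
formula = 2.0**127 * (2 - 2.0**-23)

print(hex(bits))           # 0x7f7fffff
print(largest == formula)  # True
```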
Compatibility with x87 Floating-Point Data Types. The results produced by 128-bit media
floating-point instructions comply fully with the IEEE Standard for Binary Floating-Point Arithmetic
(ANSI/IEEE Std 754), because these instructions represent data in the single-precision or double-
precision data types throughout their operations. The x87 floating-point instructions, however, by
default perform operations in the double-extended-precision format. Because of this, x87 instructions
operating on the same source operands as 128-bit media floating-point instructions may return results
that are slightly different in their least-significant bits.
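The effect of a wider intermediate format can be illustrated by analogy. The sketch below does not execute x87 or 128-bit media instructions; it assumes that Python floats are double precision and uses single precision as the narrow format and double precision as the wide one, standing in for double precision and double-extended precision respectively. The helper to_single is hypothetical.

```python
import struct

def to_single(x):
    """Round a Python float (double precision) to single precision, round-to-nearest."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

a = 1.0 + 2.0**-12   # exactly representable in single precision
b = 1.0 + 2.0**-11   # exactly representable in single precision

# Narrow path: every intermediate result is rounded to single precision.
# a*a = 1 + 2^-11 + 2^-24 is a tie in single precision and rounds to b.
narrow = to_single(to_single(a * a) - b)

# Wide path: the intermediate product stays in double precision and
# only the final result is rounded to single precision.
wide = to_single(a * a - b)

print(narrow)  # 0.0
print(wide == 2.0**-24)  # True
```

The two paths start from identical source operands and differ only in where rounding occurs, which is the mechanism behind the small least-significant-bit differences described above.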
4.4.7 Floating-Point Number Representation
A 128-bit media floating-point value can be one of five types, as follows:
Normal
Denormal (Tiny)
Zero
Table 4-3. Range of Values in Normalized Floating-Point Data Types

| Data Type | Range of Normalized Values (see Note 1): Base 2 (exact) | Range of Normalized Values (see Note 1): Base 10 (approximate) |
|---|---|---|
| Single Precision | $2^{-126}$ to $2^{127} \cdot (2 - 2^{-23})$ | $1.17 \cdot 10^{-38}$ to $+3.40 \cdot 10^{38}$ |
| Double Precision | $2^{-1022}$ to $2^{1023} \cdot (2 - 2^{-52})$ | $2.23 \cdot 10^{-308}$ to $+1.79 \cdot 10^{308}$ |

Note:
1. See “Floating-Point Number Representation” on page 127 for a definition of “normalized”.
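The base-2 limits in Table 4-3 can be evaluated directly to compare them with the approximate base-10 column. The following Python sketch assumes that the host's float type is IEEE double precision, which is what sys.float_info reports on common platforms; the variable names are illustrative only.

```python
import sys

# Exact base-2 limits from Table 4-3, evaluated in double precision
single_min = 2.0**-126
single_max = 2.0**127 * (2 - 2.0**-23)
double_min = 2.0**-1022
double_max = 2.0**1023 * (2 - 2.0**-52)

# The double-precision limits should match what the host reports
print(double_min == sys.float_info.min)
print(double_max == sys.float_info.max)

# Print more digits than the table so the approximations can be compared
for name, lo, hi in [("Single Precision", single_min, single_max),
                     ("Double Precision", double_min, double_max)]:
    print(f"{name}: {lo:.6e} to {hi:.6e}")
```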