User Guide
24592—Rev. 3.15—November 2009 AMD64 Technology
The 128-bit and 64-bit media instructions are designed to accelerate these applications. The
instructions use a form of vector (or packed) parallel processing known as single-instruction,
multiple-data (SIMD) processing. This vector technology has the following characteristics:
• A single register can hold multiple independent pieces of data. For example, a single 128-bit XMM
register can hold sixteen 8-bit integer data elements or four 32-bit single-precision floating-point
data elements.
• The vector instructions can operate on all data elements in a register, independently and
simultaneously. For example, a PADDB instruction operating on byte elements of two vector
operands in 128-bit XMM registers performs 16 simultaneous additions and returns 16
independent results in a single operation.
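The lane-wise behavior described above can be sketched in portable C. This is only a model of the result, not of how hardware executes it: the loop stands in for 16 additions that a PADDB instruction performs at once. The names vec128_u8, paddb_model, and paddb_model_example are hypothetical and chosen for illustration. On an SSE2-capable compiler, the corresponding intrinsic is _mm_add_epi8 from <emmintrin.h>.

```c
#include <assert.h>
#include <stdint.h>

enum { XMM_BYTES = 16 };  /* a 128-bit register holds 16 byte elements */

/* Hypothetical model of a 128-bit register viewed as 16 byte lanes. */
typedef struct { uint8_t lane[XMM_BYTES]; } vec128_u8;

/* Lane-wise byte addition with wraparound (modulo 256).
   Each lane is computed independently of its neighbors. */
vec128_u8 paddb_model(vec128_u8 a, vec128_u8 b)
{
    vec128_u8 r;
    for (int i = 0; i < XMM_BYTES; ++i)
        r.lane[i] = (uint8_t)(a.lane[i] + b.lane[i]);
    return r;
}

/* Usage: an overflow in one lane does not carry into the next lane. */
void paddb_model_example(void)
{
    vec128_u8 a = {{0}}, b = {{0}};
    a.lane[0] = 200; b.lane[0] = 100;  /* 300 wraps around to 44 */
    a.lane[1] = 1;   b.lane[1] = 2;
    vec128_u8 r = paddb_model(a, b);
    assert(r.lane[0] == 44);
    assert(r.lane[1] == 3);            /* unaffected by lane 0 */
}
```

The struct is passed by value to keep the sketch close to the register-to-register form of the instruction; a real implementation would use the compiler's vector type instead.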
128-bit and 64-bit media instructions take SIMD vector technology a step further by including special
instructions that perform operations commonly found in media applications. For example, a graphics
application that adds the brightness values of two pixels must prevent the add operation from wrapping
around to a small value if the result overflows the destination register, because an overflow result can
produce unexpected effects such as a dark pixel where a bright one is expected. The 128-bit and 64-bit
media instructions include saturating-arithmetic instructions to simplify this type of operation. A
result that otherwise would wrap around due to overflow or underflow is instead forced to saturate at
the largest or smallest value that can be represented in the destination register.
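The difference between wrapping and saturating arithmetic can be illustrated with a minimal sketch for unsigned 8-bit brightness values. The function names below are hypothetical, and the code models one element at a time; the packed instructions with this unsigned-byte behavior (PADDUSB and PSUBUSB) apply it to every element of the register. Signed saturation follows the same idea with different limits and is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Wrapping add: the result is reduced modulo 256. */
uint8_t add_wrap_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Saturating add: a sum above 255 is clamped to 255. */
uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)(sum > UINT8_MAX ? UINT8_MAX : sum);
}

/* Saturating subtract: a difference below 0 is clamped to 0. */
uint8_t sub_sat_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a > b ? a - b : 0);
}

/* Brighten every pixel by delta without wrapping to a dark value. */
void brighten_u8(uint8_t *pixels, int count, uint8_t delta)
{
    for (int i = 0; i < count; ++i)
        pixels[i] = add_sat_u8(pixels[i], delta);
}

/* Usage: adding brightness 100 to a pixel of brightness 200. */
void brightness_example(void)
{
    assert(add_wrap_u8(200, 100) == 44);   /* wraps: unexpectedly dark */
    assert(add_sat_u8(200, 100) == 255);   /* saturates: stays bright */
}
```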
1.1.5 Floating-Point Instructions
The AMD64 architecture provides three floating-point instruction subsets, using three distinct register
sets:
• 128-Bit Media Instructions support 32-bit single-precision and 64-bit double-precision
floating-point operations, in addition to integer operations. Operations on both vector data and
scalar data are supported, with a dedicated floating-point exception-reporting mechanism. These
floating-point operations comply with the IEEE-754 standard.
• 64-Bit Media Instructions (the subset of 3DNow! technology instructions) support
single-precision floating-point operations. Operations on both vector data and scalar data are
supported, but these instructions do not support floating-point exception reporting.
• x87 Floating-Point Instructions support single-precision, double-precision, and 80-bit
extended-precision floating-point operations. Only scalar data are supported, with a dedicated
floating-point exception-reporting mechanism. The x87 floating-point instructions contain special
instructions for performing trigonometric and logarithmic transcendental operations. The
single-precision and double-precision floating-point operations comply with the IEEE-754 standard.
Maximum floating-point performance can be achieved using the 128-bit media instructions. A single
vector instruction can perform up to four single-precision (or two double-precision) operations
in parallel. In 64-bit mode, the AMD64 architecture doubles the number of legacy XMM registers
from 8 to 16.
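The lane counts follow from the register width: 128 bits hold four 32-bit elements or two 64-bit elements. The sketch below models this with a union, assuming the usual 4-byte float and 8-byte double (checked at compile time). The type and function names are hypothetical; on real hardware the packed additions are single instructions, exposed by SSE and SSE2 compilers as _mm_add_ps and _mm_add_pd.

```c
#include <assert.h>

/* Hypothetical view of one 128-bit register: the same 16 bytes can be
   interpreted as four floats or as two doubles. */
typedef union {
    float  f32[4];
    double f64[2];
} xmm_model;

/* Guard the layout assumption (4-byte float, 8-byte double). */
_Static_assert(sizeof(xmm_model) == 16, "model must be 128 bits wide");

/* Four single-precision additions, one per 32-bit lane. */
xmm_model addps_model(xmm_model a, xmm_model b)
{
    xmm_model r;
    for (int i = 0; i < 4; ++i)
        r.f32[i] = a.f32[i] + b.f32[i];
    return r;
}

/* Two double-precision additions, one per 64-bit lane. */
xmm_model addpd_model(xmm_model a, xmm_model b)
{
    xmm_model r;
    for (int i = 0; i < 2; ++i)
        r.f64[i] = a.f64[i] + b.f64[i];
    return r;
}

/* Usage: one packed operation produces four independent sums. */
void addps_model_example(void)
{
    xmm_model a = { .f32 = {1.0f, 2.0f, 3.0f, 4.0f} };
    xmm_model b = { .f32 = {0.5f, 0.5f, 0.5f, 0.5f} };
    xmm_model r = addps_model(a, b);
    assert(r.f32[0] == 1.5f);
    assert(r.f32[3] == 4.5f);
}
```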
Applications gain additional benefits using the 64-bit media and x87 instructions. The separate register
sets supported by these instructions relieve pressure on the XMM registers available to the 128-bit