User Guide
194 64-Bit Media Programming
AMD64 Technology 24592—Rev. 3.15—November 2009
The MMX and 3DNow! instructions introduce no additional registers, status bits, or other processor
state to the legacy x86 architecture. Instead, they use the x87 floating-point registers that have long
been a part of most x86 architectures. Because of this, 64-bit media procedures require no special
operating-system support or exception handlers. When state-saves are required between procedures,
the same instructions that system software uses to save and restore x87 floating-point state also save
and restore the 64-bit media-programming state.
AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their
more efficient 128-bit media counterparts. Relevant recommendations are provided below and in the
AMD64 Programmer’s Manual Volume 5: 64-Bit Media and x87 Floating-Point Instructions.
5.3 Capabilities
The 64-bit media instructions are designed to support multimedia and communication applications
that operate on vectors of small-sized data elements. For example, 8-bit and 16-bit integer data
elements are commonly used for pixel information in graphics applications, and 16-bit integer data
elements are used for audio sampling. The 64-bit media instructions allow multiple data elements like
these to be packed into single 64-bit vector operands located in an MMX register or in memory. The
instructions operate in parallel on each of the elements in these vectors. For example, 8-bit integer data
can be packed in vectors of eight elements in a single 64-bit register, so that a single instruction can
operate on all eight byte elements simultaneously.
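The packing described above can be modeled in portable C. The following is a minimal sketch, not processor code: it uses a plain uint64_t to stand in for a 64-bit vector operand, and the helper names pack_bytes and get_byte are hypothetical, introduced only for illustration. Element 0 is placed in the least-significant byte.

```c
#include <assert.h>
#include <stdint.h>

/* Pack eight 8-bit elements into one 64-bit value.
   Element 0 occupies bits 7:0, element 7 occupies bits 63:56.
   (pack_bytes is an illustrative name, not an AMD-defined function.) */
uint64_t pack_bytes(const uint8_t elems[8])
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)elems[i] << (8 * i);
    return v;
}

/* Extract element i (0..7) from a packed 64-bit value.
   (get_byte is likewise an illustrative name.) */
uint8_t get_byte(uint64_t v, unsigned i)
{
    assert(i < 8);
    return (uint8_t)(v >> (8 * i));
}
```

Under these assumptions, packing the bytes 0x01 through 0x08 in order yields the 64-bit value 0x0807060504030201, and get_byte returns each original element by index.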
Typical applications of the 64-bit media integer instructions include music synthesis, speech synthesis,
speech recognition, audio and video compression (encoding) and decompression (decoding), 2D and
3D graphics (including 3D texture mapping), and streaming video. Typical applications of the 64-bit
media floating-point instructions include digital signal processing (DSP) kernels and front-end 3D
graphics algorithms, such as geometry, clipping, and lighting.
These types of applications are referred to as media applications. Such applications commonly use
small data elements in repetitive loops, in which the typical operations are inherently parallel. In 256-
color video applications, for example, 8-bit operands in 64-bit MMX registers can be used to compute
transformations on eight pixels per instruction.
5.3.1 Parallel Operations
Most of the 64-bit media instructions perform parallel operations on vectors of operands. Vector
operations are also called packed or SIMD (single-instruction, multiple-data) operations. They take
operands consisting of multiple elements and operate on all elements in parallel. Figure 5-1 on
page 195 shows an example of an integer operation on two vectors, each containing 16-bit (word)
elements. There are also 64-bit media instructions that operate on vectors of byte or doubleword
elements.
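To make the lane-by-lane behavior concrete, the sketch below models a packed addition of four 16-bit (word) elements in portable C. It is a scalar software model under stated assumptions, not the hardware implementation: the function names packed_add_words and packed_add_words_selfcheck are hypothetical, a uint64_t stands in for an MMX register, and each lane wraps around on overflow. The point it illustrates is that every element is processed independently, so a carry out of one element never propagates into its neighbor.

```c
#include <assert.h>
#include <stdint.h>

/* Add four 16-bit lanes of a and b independently.
   Each lane wraps modulo 2^16; no carry crosses a lane boundary.
   (packed_add_words is an illustrative name for this model.) */
uint64_t packed_add_words(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint16_t x = (uint16_t)(a >> (16 * lane));
        uint16_t y = (uint16_t)(b >> (16 * lane));
        uint16_t sum = (uint16_t)(x + y);      /* truncate to 16 bits */
        result |= (uint64_t)sum << (16 * lane);
    }
    return result;
}

/* Small self-check showing the expected lane-wise results. */
void packed_add_words_selfcheck(void)
{
    /* Lanes (4,3,2,1) + (0x40,0x30,0x20,0x10) = (0x44,0x33,0x22,0x11). */
    assert(packed_add_words(0x0004000300020001ULL, 0x0040003000200010ULL)
           == 0x0044003300220011ULL);
    /* Lane 0 overflows and wraps to 0; lane 1 is unaffected. */
    assert(packed_add_words(0x000000000000FFFFULL, 0x0000000000000001ULL)
           == 0x0000000000000000ULL);
}
```

A vector instruction performs the equivalent of all four loop iterations at once, which is the source of the speedup in loops over small data elements.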