User Guide
108 128-Bit Media and Scientific Programming
AMD64 Technology 24592—Rev. 3.15—November 2009
instructions are often required to operate completely on the data. For example, software can change the
viewing perspective of a 3D scene through transformation matrices by using floating-point
instructions in the same procedure that contains integer operations on other aspects of the graphics
data.
It is typically much easier to write 128-bit media programs using floating-point instructions. Such
programs perform better than x87 floating-point programs, because the XMM register file is flat rather
than stack-oriented, there are twice as many registers (in 64-bit mode), and 128-bit media instructions
can operate on two or four times the number of floating-point operands as can x87 instructions. This
ability to operate in parallel on multiple pairs of floating-point elements often makes it possible to
remove local temporary variables that would otherwise be needed in x87 floating-point code.
4.2.4 Data Conversion and Reordering
There are instructions that support data conversion of vector elements, including conversions between
integer and floating-point data types (located in XMM registers, MMX™ registers, general-purpose registers, or
memory) and conversions of element-ordering or precision. For example, the unpack instructions
take two vector operands and interleave their low or high elements. Figure 4-3 shows an unpack and
interleave operation on word-sized elements (PUNPCKLWD). If the left-hand source operand has
elements whose value is zero, the operation converts each element in the low half of the right-hand
operand to a data type of twice its original precision—useful, for example, in multiply operations in
which results may overflow or underflow.
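The interleave and the zero-extension it enables can be modeled in portable C. The following is only a scalar sketch of the PUNPCKLWD data movement, not how the instruction would be invoked in practice. The function names (unpack_low_words, widen_low_words) and the parameter names low_src and high_src are hypothetical and chosen for illustration. Each 128-bit operand is treated as an array of eight 16-bit words, with index 0 as the least-significant word.

```c
#include <stdint.h>

/* Scalar model of a low-word unpack-and-interleave (PUNPCKLWD-style).
 * low_src supplies the even-numbered result words and high_src the
 * odd-numbered ones; only words 0..3 of each source are used. */
void unpack_low_words(const uint16_t low_src[8],
                      const uint16_t high_src[8],
                      uint16_t result[8])
{
    for (int i = 0; i < 4; i++) {
        result[2 * i]     = low_src[i];   /* lower half of each pair */
        result[2 * i + 1] = high_src[i];  /* upper half of each pair */
    }
}

/* Zero-extend the four low words of `in` to 32 bits by interleaving
 * them with an all-zero operand, then reading each pair of words as
 * one doubleword (little-endian order within the vector). */
void widen_low_words(const uint16_t in[8], uint32_t out[4])
{
    const uint16_t zero[8] = {0};
    uint16_t interleaved[8];

    unpack_low_words(in, zero, interleaved);
    for (int i = 0; i < 4; i++) {
        out[i] = (uint32_t)interleaved[2 * i]
               | ((uint32_t)interleaved[2 * i + 1] << 16);
    }
}
```

Because the operand that supplies the upper half of every pair is zero, each 16-bit value arrives in a 32-bit slot unchanged, which leaves headroom for a following multiply or accumulate.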
Figure 4-3. Unpack and Interleave Operation
There are also pack instructions, such as PACKSSDW shown in Figure 4-4 on page 109, that convert
each element in a pair of vectors to lower precision, saturating values that do not fit, and place the
narrowed elements of both vectors in a single result vector. Vector-shift instructions are also supported.
They can scale each element in a vector to higher or lower values by powers of two.
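A pack with signed saturation can be sketched the same way. The code below is a scalar model of the PACKSSDW data movement under the assumption that each 128-bit operand is viewed as four signed 32-bit doublewords, index 0 least significant. The names pack_signed_dwords, saturate_to_int16, first, and second are hypothetical. Values outside the signed 16-bit range are clamped rather than truncated.

```c
#include <stdint.h>

/* Clamp a signed 32-bit value to the signed 16-bit range. */
static int16_t saturate_to_int16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Scalar model of a doubleword-to-word pack with signed saturation
 * (PACKSSDW-style). The four doublewords of `first` become result
 * words 0..3 and the four doublewords of `second` become words 4..7. */
void pack_signed_dwords(const int32_t first[4],
                        const int32_t second[4],
                        int16_t result[8])
{
    for (int i = 0; i < 4; i++) {
        result[i]     = saturate_to_int16(first[i]);
        result[i + 4] = saturate_to_int16(second[i]);
    }
}
```

For example, an input doubleword of 70000 packs to 32767 and -70000 packs to -32768, while values already within range pass through unchanged.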
[Figure 4-3 diagram: operand 1, operand 2, and result, each 128 bits wide (bits 127-0).]