User Guide

108 128-Bit Media and Scientific Programming

AMD64 Technology 24592—Rev. 3.15—November 2009

instructions are often required to operate completely on the data. For example, software can change the viewing perspective of a 3D scene through transformation matrices by using floating-point instructions in the same procedure that contains integer operations on other aspects of the graphics data.

It is typically much easier to write 128-bit media programs using floating-point instructions. Such programs perform better than x87 floating-point programs for three reasons. First, the XMM register file is flat rather than stack-oriented. Second, there are twice as many registers in 64-bit mode (16 XMM registers versus 8 x87 registers). Third, a 128-bit media instruction can operate on two or four times as many floating-point operands as an x87 instruction, because each XMM register holds two double-precision or four single-precision values. This ability to operate in parallel on multiple pairs of floating-point elements often makes it possible to remove local temporary variables that would otherwise be needed in x87 floating-point code.
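To illustrate the width advantage, here is a minimal C sketch. It assumes a compiler that provides the standard SSE2 intrinsics header `<emmintrin.h>` (GCC, Clang, and MSVC do on x86-64); the helper names `add4_ps` and `add2_pd` are hypothetical. `_mm_add_ps` typically compiles to a single ADDPS, and `_mm_add_pd` to a single ADDPD.

```c
#include <emmintrin.h> /* SSE2 intrinsics; also pulls in the SSE header */

/* Adds four single-precision pairs with one 128-bit operation:
   out[i] = a[i] + b[i] for i = 0..3. */
void add4_ps(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);            /* unaligned load of a[0..3] */
    __m128 vb = _mm_loadu_ps(b);            /* unaligned load of b[0..3] */
    _mm_storeu_ps(out, _mm_add_ps(va, vb)); /* four additions at once */
}

/* Adds two double-precision pairs with one 128-bit operation:
   out[i] = a[i] + b[i] for i = 0..1. */
void add2_pd(const double *a, const double *b, double *out)
{
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(out, _mm_add_pd(va, vb)); /* two additions at once */
}
```

The equivalent x87 code would issue one addition per element, moving each intermediate through the register stack.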

4.2.4 Data Conversion and Reordering

There are instructions that support data conversion of vector elements, including conversions between integer and floating-point data types and conversions of element ordering or precision. The data being converted can be located in XMM registers, MMX™ registers, general-purpose registers (GPRs), or memory. For example, the unpack instructions take two vector operands and interleave their low or high elements. Figure 4-3 shows an unpack and interleave operation on word-sized elements (PUNPCKLWD). If the elements of operand 2 are zero, the operation zero-extends each word in the low half of operand 1 to a doubleword, twice its original width. This is useful, for example, in multiply operations whose results may exceed the range of the original element size.
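The widening case can be sketched in C with the SSE2 intrinsic `_mm_unpacklo_epi16`, which corresponds to PUNPCKLWD; its first argument plays the role of operand 1. The helper names are hypothetical. The second function is included only for contrast: with the zeros in operand 1 instead, each word lands in the upper half of its doubleword (its value times 65,536), which is not a widening.

```c
#include <emmintrin.h> /* SSE2 integer intrinsics */
#include <stdint.h>

/* Zero-extends the four low words of in[0..7] to doublewords.
   _mm_unpacklo_epi16(a, b) yields a0, b0, a1, b1, ... with a0 in the
   lowest bits, so b == 0 leaves each a-word in the low half of a
   doubleword whose high half is zero. */
void widen_low_words(const uint16_t in[8], uint32_t out[4])
{
    __m128i v    = _mm_loadu_si128((const __m128i *)in);
    __m128i zero = _mm_setzero_si128();
    _mm_storeu_si128((__m128i *)out, _mm_unpacklo_epi16(v, zero));
}

/* Same instruction with the operands swapped: the zeros now fill the
   low halves, so each word ends up in bits 31:16 of its doubleword. */
void interleave_zero_first(const uint16_t in[8], uint32_t out[4])
{
    __m128i v    = _mm_loadu_si128((const __m128i *)in);
    __m128i zero = _mm_setzero_si128();
    _mm_storeu_si128((__m128i *)out, _mm_unpacklo_epi16(zero, v));
}
```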

Figure 4-3. Unpack and Interleave Operation

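There are also pack instructions, such as PACKSSDW shown in Figure 4-4 on page 109, that convert each element in a pair of vectors to half its original width and pack the results from both vectors into a single vector. PACKSSDW converts signed doublewords to signed words using saturation, so a value outside the signed 16-bit range becomes 32,767 or -32,768. Vector-shift instructions are also supported. They shift every element of a vector by the same count, scaling each integer element up or down by a power of two.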
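A minimal C sketch of both operations, assuming the SSE2 intrinsics `_mm_packs_epi32` (PACKSSDW), `_mm_slli_epi16` (PSLLW), and `_mm_srai_epi16` (PSRAW) from `<emmintrin.h>`. The helper names and the shift count of 2 are hypothetical choices for illustration.

```c
#include <emmintrin.h> /* SSE2 integer intrinsics */
#include <stdint.h>

/* PACKSSDW: narrows eight signed doublewords (four per source) to
   signed words with saturation. Results from `lo` fill words 0..3 of
   the output and results from `hi` fill words 4..7. */
void pack_dwords_to_words(const int32_t lo[4], const int32_t hi[4],
                          int16_t out[8])
{
    __m128i a = _mm_loadu_si128((const __m128i *)lo);
    __m128i b = _mm_loadu_si128((const __m128i *)hi);
    _mm_storeu_si128((__m128i *)out, _mm_packs_epi32(a, b));
}

/* Vector shifts: every word is shifted by the same count (2 here).
   PSLLW multiplies by 4, wrapping on overflow; PSRAW is an arithmetic
   right shift, i.e. division by 4 rounded toward negative infinity. */
void scale_words(const int16_t in[8], int16_t up[8], int16_t down[8])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)up,   _mm_slli_epi16(v, 2));
    _mm_storeu_si128((__m128i *)down, _mm_srai_epi16(v, 2));
}
```

For example, packing 70000 yields 32767 rather than the truncated low 16 bits, which is what makes saturating packs safe for narrowing intermediate results.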
