User Guide

128-Bit Media and Scientific Programming 141

24592—Rev. 3.15—November 2009 AMD64 Technology

Figure 4-20 shows an example of a PACKSSDW instruction. The operation merges vector elements of 2x size into vector elements of 1x size, thus reducing the precision of the vector-element data types. Any results that would otherwise overflow or underflow are saturated (clamped) at the maximum or minimum representable value, respectively, as described in “Saturation” on page 125.

Figure 4-20. PACKSSDW Pack Operation
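The pack-with-saturation behavior described above can be sketched as a scalar C model. This is only an illustration of the semantics, not how the hardware performs the operation; the names saturate_i32_to_i16 and packssdw_model are hypothetical, and array index 0 is assumed to be the least-significant element of each 128-bit operand.

```c
#include <stdint.h>

/* Clamp a signed 32-bit value to the signed 16-bit range (signed saturation). */
static int16_t saturate_i32_to_i16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;  /* would overflow: clamp to +32767 */
    if (v < INT16_MIN) return INT16_MIN;  /* would underflow: clamp to -32768 */
    return (int16_t)v;
}

/*
 * Scalar model of PACKSSDW (hypothetical helper, for illustration only).
 * op1 models the first (destination) operand and supplies the low four
 * words of the result; op2 models the second operand and supplies the
 * high four words. result must not overlap op1 or op2.
 */
static void packssdw_model(const int32_t op1[4], const int32_t op2[4],
                           int16_t result[8])
{
    for (int i = 0; i < 4; i++) {
        result[i]     = saturate_i32_to_i16(op1[i]);
        result[i + 4] = saturate_i32_to_i16(op2[i]);
    }
}
```

In compiled code, the SSE2 intrinsic _mm_packs_epi32 (declared in emmintrin.h) is the usual way to request this instruction; the model above exists only to make the element ordering and clamping explicit.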

Conversion from higher to lower precision is often needed, for example, after multiplication operations in which the higher-precision format is used for the source operands to prevent possible overflow, and the lower-precision format is the desired format for the next operation.
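As a rough sketch of that multiply-then-narrow pattern, the scalar C code below widens 16-bit inputs to 32 bits, multiplies at the higher precision, and then narrows each product back to 16 bits with signed saturation. The helper names clamp_to_i16 and mul_then_narrow are hypothetical, and the choice of four elements per call is an assumption made for brevity.

```c
#include <stdint.h>

/* Signed saturation from 32 bits to 16 bits. */
static int16_t clamp_to_i16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/*
 * Multiply word elements at doubleword precision, then narrow the
 * products back to words (hypothetical helper, for illustration only).
 */
static void mul_then_narrow(const int16_t a[4], const int16_t b[4],
                            int16_t out[4])
{
    for (int i = 0; i < 4; i++) {
        /* The product of two 16-bit values always fits in 32 bits. */
        int32_t wide = (int32_t)a[i] * (int32_t)b[i];
        /* Narrow for the next operation, clamping out-of-range products. */
        out[i] = clamp_to_i16(wide);
    }
}
```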

Unpack and Interleave. These instructions interleave vector elements from the high or low halves of two integer source operands. They can be used to double the precision of operands.

• PUNPCKHBW—Unpack and Interleave High Bytes

• PUNPCKHWD—Unpack and Interleave High Words

• PUNPCKHDQ—Unpack and Interleave High Doublewords

• PUNPCKHQDQ—Unpack and Interleave High Quadwords

• PUNPCKLBW—Unpack and Interleave Low Bytes

• PUNPCKLWD—Unpack and Interleave Low Words

• PUNPCKLDQ—Unpack and Interleave Low Doublewords

• PUNPCKLQDQ—Unpack and Interleave Low Quadwords
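The byte forms of these instructions can be sketched as scalar C models. This is an illustration of the interleave pattern only; the function names are hypothetical, and array index 0 is assumed to be the least-significant byte of each 128-bit operand.

```c
#include <stdint.h>

/*
 * Scalar model of PUNPCKLBW (hypothetical helper): interleave the eight
 * low-order bytes of op1 and op2. Bytes from op1 land in the even result
 * positions, bytes from op2 in the odd positions.
 * result must not overlap op1 or op2.
 */
static void punpcklbw_model(const uint8_t op1[16], const uint8_t op2[16],
                            uint8_t result[16])
{
    for (int i = 0; i < 8; i++) {
        result[2 * i]     = op1[i];
        result[2 * i + 1] = op2[i];
    }
}

/*
 * Scalar model of PUNPCKHBW (hypothetical helper): same interleave, but
 * taking the eight high-order bytes; the low-order halves are ignored.
 */
static void punpckhbw_model(const uint8_t op1[16], const uint8_t op2[16],
                            uint8_t result[16])
{
    for (int i = 0; i < 8; i++) {
        result[2 * i]     = op1[i + 8];
        result[2 * i + 1] = op2[i + 8];
    }
}
```

Passing an all-zero array as op2 places a zero byte above each byte of op1, so each pair of result bytes reads as an unsigned 16-bit word holding the original byte value. That is one way the interleave can double the precision of unsigned operands.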

The PUNPCKHBW instruction copies the eight high-order bytes from each of its two source operands (an XMM register, and another XMM register or 128-bit memory location) and interleaves them into the 128-bit destination operand (an XMM register). The bytes in the low-order half of the source operands are ignored. The PUNPCKHWD, PUNPCKHDQ, and PUNPCKHQDQ instructions perform analogous operations for words, doublewords, and quadwords in the source operands, packing them
