User Guide

128-Bit Media and Scientific Programming 141
24592—Rev. 3.15—November 2009 AMD64 Technology
Figure 4-20 shows an example of a PACKSSDW instruction. The operation merges vector elements of
2x size into vector elements of 1x size, thus reducing the precision of the vector-element data types.
Any results that would otherwise overflow or underflow are saturated (clamped) at the maximum or
minimum representable value, respectively, as described in “Saturation” on page 125.
Figure 4-20. PACKSSDW Pack Operation
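The pack-with-saturation behavior described above can be sketched in scalar form. The following is a minimal model, not the hardware implementation: it represents each 128-bit operand as a Python list of four signed doublewords with element 0 as the least-significant element, and the helper names `saturate_int16` and `packssdw` are chosen here for illustration only.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def saturate_int16(value):
    """Clamp a signed integer to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, value))


def packssdw(operand1, operand2):
    """Model PACKSSDW on two lists of four signed doublewords.

    Element 0 is the least-significant element. The packed words from
    operand1 fill the low half of the result, and the packed words from
    operand2 fill the high half.
    """
    if len(operand1) != 4 or len(operand2) != 4:
        raise ValueError("each operand must hold four doublewords")
    return [saturate_int16(v) for v in operand1 + operand2]


result = packssdw([1, -2, 70000, -70000], [32767, 32768, -32768, -32769])
print(result)  # [1, -2, 32767, -32768, 32767, 32767, -32768, -32768]
```

Values already within the signed 16-bit range pass through unchanged; only out-of-range doublewords are clamped.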
Conversion from higher to lower precision is often needed, for example, after multiplication operations
in which the higher-precision format is used for the intermediate values in order to prevent overflow,
and the lower-precision format is the format required by the next operation.
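As an illustration of this use case, the sketch below multiplies signed 16-bit elements at doubleword precision and then narrows the products back to words with signed saturation. It is a scalar model under stated assumptions: the function name `multiply_then_pack` and the sample values are hypothetical, and plain Python integers stand in for the 32-bit intermediate format.

```python
def saturate_int16(value):
    """Clamp a signed integer to the signed 16-bit range."""
    return max(-32768, min(32767, value))


def multiply_then_pack(a_words, b_words):
    """Multiply signed 16-bit elements, then narrow with saturation.

    The product of two signed 16-bit values always fits in a signed
    32-bit doubleword, so the intermediate products cannot overflow.
    Returns the full-precision products and the saturated 16-bit results.
    """
    products = [a * b for a, b in zip(a_words, b_words)]
    packed = [saturate_int16(p) for p in products]
    return products, packed


products, packed = multiply_then_pack([100, -200, 300, 181], [50, 100, 200, 181])
print(products)  # [5000, -20000, 60000, 32761]
print(packed)    # [5000, -20000, 32767, 32761]
```

Only the third product (60000) exceeds the signed 16-bit range, so it alone is clamped to 32767 when narrowed.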
Unpack and Interleave. These instructions interleave vector elements from the high or low halves of
two integer source operands. They can be used to double the precision of operands.
PUNPCKHBW—Unpack and Interleave High Bytes
PUNPCKHWD—Unpack and Interleave High Words
PUNPCKHDQ—Unpack and Interleave High Doublewords
PUNPCKHQDQ—Unpack and Interleave High Quadwords
PUNPCKLBW—Unpack and Interleave Low Bytes
PUNPCKLWD—Unpack and Interleave Low Words
PUNPCKLDQ—Unpack and Interleave Low Doublewords
PUNPCKLQDQ—Unpack and Interleave Low Quadwords
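The interleave performed by these instructions, and its use for doubling precision, can be sketched as follows. This is a scalar model rather than the hardware definition: operands are Python lists with element 0 as the least-significant element, the names `unpack_interleave` and `bytes_to_words` are illustrative, and the zero-extension idiom shown (interleaving with an all-zero second operand) is one common way to apply the instructions.

```python
def unpack_interleave(operand1, operand2, high):
    """Interleave the high or low halves of two equal-length vectors.

    Element 0 is the least-significant element. Elements taken from
    operand1 land in even result positions and elements taken from
    operand2 land in odd result positions.
    """
    if len(operand1) != len(operand2) or len(operand1) % 2 != 0:
        raise ValueError("operands must have the same even length")
    half = len(operand1) // 2
    start = half if high else 0
    result = []
    for a, b in zip(operand1[start:start + half], operand2[start:start + half]):
        result.extend([a, b])
    return result


def bytes_to_words(byte_vector):
    """Combine adjacent byte pairs (low byte first) into 16-bit words."""
    return [lo | (hi << 8) for lo, hi in zip(byte_vector[0::2], byte_vector[1::2])]


data = list(range(0x10, 0x20))  # sixteen unsigned bytes, 0x10 through 0x1F
zero = [0] * 16

# Models PUNPCKLBW and PUNPCKHBW with an all-zero second operand.
low_words = bytes_to_words(unpack_interleave(data, zero, high=False))
high_words = bytes_to_words(unpack_interleave(data, zero, high=True))
print([hex(w) for w in low_words])   # 0x10 through 0x17, now 16 bits wide
print([hex(w) for w in high_words])  # 0x18 through 0x1f, now 16 bits wide
```

The same function models the word, doubleword, and quadword forms when the lists hold eight, four, or two elements respectively.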
The PUNPCKHBW instruction copies the eight high-order bytes from each of its two source operands (an
XMM register, and another XMM register or 128-bit memory location) and interleaves them into the
128-bit destination operand (an XMM register). The bytes in the low-order half of the source operands
are ignored. The PUNPCKHWD, PUNPCKHDQ, and PUNPCKHQDQ instructions perform
analogous operations for words, doublewords, and quadwords in the source operands, packing them