User Guide

128-Bit Media and Scientific Programming 143
24592—Rev. 3.15—November 2009 AMD64 Technology
PUNPCKLQDQ may be 64 bits, but the width of the memory access of the memory-operand forms of
PUNPCKHBW, PUNPCKHWD, PUNPCKHDQ, and PUNPCKHQDQ may be 128 bits. Thus, the
alignment constraints for PUNPCKLx instructions may be less restrictive than the alignment
constraints for PUNPCKHx instructions. For details, see the documentation for particular hardware
implementations of the architecture.
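The difference in source-data requirements can be seen in a byte-level software model. The following C sketch is illustrative only: the function names and the array representation (element 0 holds bits 7:0) are assumptions made for this example. It models the architectural result of PUNPCKLBW and PUNPCKHBW, not the memory-access width or alignment behavior of any particular implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of PUNPCKLBW: interleave the low 8 bytes of dst and src.
 * Only source bytes 0..7 are read, so a quadword of source data is enough. */
static void punpcklbw_model(uint8_t dst[16], const uint8_t src_low[8])
{
    uint8_t out[16];
    assert(dst != NULL && src_low != NULL);
    for (int i = 0; i < 8; i++) {
        out[2 * i]     = dst[i];      /* even result bytes come from dst */
        out[2 * i + 1] = src_low[i];  /* odd result bytes come from src  */
    }
    memcpy(dst, out, sizeof out);
}

/* Model of PUNPCKHBW: interleave the high 8 bytes of dst and src.
 * Source bytes 8..15 are read, so the upper quadword of the source is needed. */
static void punpckhbw_model(uint8_t dst[16], const uint8_t src[16])
{
    uint8_t out[16];
    assert(dst != NULL && src != NULL);
    for (int i = 0; i < 8; i++) {
        out[2 * i]     = dst[8 + i];
        out[2 * i + 1] = src[8 + i];
    }
    memcpy(dst, out, sizeof out);
}
```

In this model, the low-unpack function never touches the upper half of its source, which is consistent with the narrower memory access described above.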
Another advantage of using PUNPCKLx rather than PUNPCKHx—also depending on the hardware
implementation—is that it may help avoid potential size mismatches if a particular hardware
implementation uses load-to-store forwarding. In such cases, store data from either a quadword store
or the lower quadword of a double-quadword store could be forwarded to PUNPCKLx instructions,
but only store data from a double-quadword store could be forwarded to PUNPCKHx instructions.
The PUNPCKx instructions—along with the MOVx instructions—are often among the most
frequently used instructions in 128-bit media integer and floating-point procedures.
Extract and Insert. These instructions copy a bit field or a word element out of or into an XMM register. For PEXTRW and PINSRW, an immediate operand selects the word.
EXTRQ—Extract Field from Register
INSERTQ—Insert Field
PEXTRW—Packed Extract Word
PINSRW—Packed Insert Word
The EXTRQ instruction extracts a specified bit field from the lower 64 bits of the destination XMM
register. The extracted bits are written to the least-significant bit positions of the lower 64 bits of the
destination, and the remaining bits in the lower 64 bits are cleared to 0. The upper 64 bits of the
destination register are undefined.
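As a rough illustration of the EXTRQ behavior described above, the following C sketch models only the lower quadword of the result. The function name and the parameters index and length are assumptions made for this example, and the sketch covers only fields that lie entirely within bits 63:0. The operand encodings and any out-of-range cases are defined in the instruction reference, not here.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the lower-quadword result of EXTRQ for an in-range field.
 * index  = bit position of the field's least-significant bit (0..63)
 * length = field width in bits (1..64), with index + length <= 64
 * Both parameter names are assumptions made for this sketch.
 * The upper 64 bits of the real destination are undefined and not modeled. */
static uint64_t extrq_model(uint64_t low_qword, unsigned index, unsigned length)
{
    uint64_t mask;
    assert(length >= 1 && length <= 64 && index + length <= 64);
    /* Shifting a 64-bit value by 64 is undefined in C, so handle it apart. */
    mask = (length == 64) ? ~UINT64_C(0) : ((UINT64_C(1) << length) - 1);
    /* Move the field to bit 0 and clear everything above it. */
    return (low_qword >> index) & mask;
}
```

For example, extracting a 16-bit field that starts at bit 8 of 0x123456789ABCDEF0 yields 0xBCDE in this model.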
The INSERTQ instruction inserts a specified number of bits from the lower 64 bits of the source
operand into a specified bit position of the lower 64 bits of the destination operand. No other bits in the
lower 64 bits of the destination are modified. The upper 64 bits of the destination are undefined.
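The INSERTQ behavior can be sketched in the same way. This C model is an illustration under stated assumptions: the function and parameter names are hypothetical, the inserted bits are assumed to come from the least-significant end of the source, and only fields that fit within bits 63:0 are handled. The upper 64 bits of the destination, which are undefined, are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the lower-quadword result of INSERTQ for an in-range field.
 * index  = destination bit position of the field's least-significant bit
 * length = field width in bits (1..64), with index + length <= 64
 * Assumption: the field is taken from the low `length` bits of the source. */
static uint64_t insertq_model(uint64_t dst_low, uint64_t src_low,
                              unsigned index, unsigned length)
{
    uint64_t mask;
    assert(length >= 1 && length <= 64 && index + length <= 64);
    /* Shifting a 64-bit value by 64 is undefined in C, so handle it apart. */
    mask = (length == 64) ? ~UINT64_C(0) : ((UINT64_C(1) << length) - 1);
    /* Clear the target field in dst, then OR in the source field.
     * Bits of dst outside the field are left unchanged. */
    return (dst_low & ~(mask << index)) | ((src_low & mask) << index);
}
```

Clearing the target field before merging is what keeps the other bits in the lower 64 bits of the destination unmodified.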
The PEXTRW instruction extracts a 16-bit value from an XMM register, as selected by the immediate-
byte operand, and writes it to the low-order word of a 32-bit or 64-bit general-purpose register, with
zero-extension to 32 or 64 bits. PEXTRW is useful for loading computed values, such as table-lookup
indices, into general-purpose registers where the values can be used for addressing tables in memory.
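A minimal software model of PEXTRW, together with the table-lookup use mentioned above, might look like the following C sketch. The function names and the array representation of the XMM register (element 0 holds bits 15:0) are assumptions for illustration, as is the choice to use only the low three bits of the immediate to select one of the eight words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of PEXTRW from an XMM register into a 32-bit general-purpose register.
 * xmm_words[0] is the least-significant word of the register.
 * Assumption: the low three bits of imm8 select one of the eight words. */
static uint32_t pextrw_model(const uint16_t xmm_words[8], unsigned imm8)
{
    assert(xmm_words != NULL);
    /* The cast zero-extends the 16-bit word to 32 bits. */
    return (uint32_t)xmm_words[imm8 & 7u];
}

/* Hypothetical usage: treat an extracted word as an index into a table.
 * The caller must ensure the table has an entry for every possible index. */
static uint8_t lookup_with_word(const uint8_t *table,
                                const uint16_t xmm_words[8], unsigned imm8)
{
    assert(table != NULL);
    return table[pextrw_model(xmm_words, imm8)];
}
```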
The PINSRW instruction inserts a 16-bit value from the low-order word of a general-purpose register
or from a 16-bit memory location into an XMM register. The location in the destination register is
selected by the immediate-byte operand. The other words in the destination register operand are not
modified. Figure 4-22 on page 144 shows the operation.
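The register-source form of PINSRW can be modeled in a few lines of C. As with any such sketch, the function name and the array representation (element 0 holds bits 15:0) are assumptions for illustration, and the model assumes the low three bits of the immediate select the destination word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of PINSRW with a general-purpose register source.
 * xmm_words[0] is the least-significant word of the XMM register.
 * Assumption: the low three bits of imm8 select the destination word. */
static void pinsrw_model(uint16_t xmm_words[8], uint32_t gpr, unsigned imm8)
{
    assert(xmm_words != NULL);
    /* Only the low-order word of the source is written.
     * The other seven words of the destination are left unchanged. */
    xmm_words[imm8 & 7u] = (uint16_t)(gpr & 0xFFFFu);
}
```

Calling this model with a source value of 0xDEADBEEF and an immediate of 2 replaces word 2 with 0xBEEF and leaves the neighboring words as they were.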