24592—Rev. 3.15—November 2009 AMD64 Technology
For floating-point multiplication operations, see the PFMUL instruction on page 225. For floating-
point accumulation operations, see the PFACC, PFNACC, and PFPNACC instructions on page 226.
Average
• PAVGB—Packed Average Unsigned Bytes
• PAVGW—Packed Average Unsigned Words
• PAVGUSB—Packed Average Unsigned Packed Bytes
The PAVGx instructions compute the rounded average of each unsigned 8-bit (PAVGB) or 16-bit
(PAVGW) integer value in the first operand and the corresponding, same-sized unsigned integer in the
second operand. The instructions then write each average in the corresponding, same-sized element of
the destination. The rounded average is computed by adding each pair of operands, adding 1 to the
temporary sum, and then right-shifting the temporary sum by one bit.
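The rounding rule described above can be modeled in portable scalar C. The following is a minimal sketch, not a description of the hardware implementation. The function names pavgb_model and pavgw_model are hypothetical, and each 64-bit operand is represented as an array of eight bytes or four words.

```c
#include <stdint.h>

/* Hypothetical scalar model of PAVGB: eight unsigned bytes per operand. */
void pavgb_model(uint8_t out[8], const uint8_t a[8], const uint8_t b[8])
{
    for (int i = 0; i < 8; i++) {
        /* The temporary sum can need 9 bits, so it is held in a wider type. */
        uint32_t tmp = (uint32_t)a[i] + (uint32_t)b[i] + 1u;
        out[i] = (uint8_t)(tmp >> 1);
    }
}

/* Hypothetical scalar model of PAVGW: four unsigned words per operand. */
void pavgw_model(uint16_t out[4], const uint16_t a[4], const uint16_t b[4])
{
    for (int i = 0; i < 4; i++) {
        /* The temporary sum can need 17 bits. */
        uint32_t tmp = (uint32_t)a[i] + (uint32_t)b[i] + 1u;
        out[i] = (uint16_t)(tmp >> 1);
    }
}
```

Because 1 is added before the shift, an odd sum rounds up: averaging 1 and 2 gives 2. Holding the temporary sum in a wider type means that averaging 255 and 255 gives 255 rather than wrapping.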
The PAVGB instruction is useful for MPEG decoding, in which motion compensation performs many
byte-averaging operations between and within macroblocks. In addition to speeding up these
operations, PAVGB can free up registers and make it possible to unroll the averaging loops.
The PAVGUSB instruction (a 3DNow! instruction) performs a function identical to the PAVGB
instruction described above, although the two instructions have different opcodes.
Sum of Absolute Differences
• PSADBW—Packed Sum of Absolute Differences of Bytes into a Word
The PSADBW instruction computes the absolute values of the differences of corresponding 8-bit
unsigned integer values in the first and second operands. The instruction then sums the absolute
differences and writes an unsigned 16-bit integer result in the low-order word of the destination. The remaining bytes
in the destination are cleared to all 0s.
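The operation can be sketched in scalar C as follows. This is an illustrative model only. The function name psadbw_model is hypothetical, each source operand is represented as an array of eight bytes, and the bytes are treated as unsigned, which is how PSADBW interprets them.

```c
#include <stdint.h>

/* Hypothetical scalar model of the 64-bit PSADBW.
 * Returns the value written to the 64-bit destination. */
uint64_t psadbw_model(const uint8_t a[8], const uint8_t b[8])
{
    uint32_t sum = 0;   /* at most 8 * 255 = 2040, so the result fits in 16 bits */

    for (int i = 0; i < 8; i++) {
        int diff = (int)a[i] - (int)b[i];
        sum += (uint32_t)(diff < 0 ? -diff : diff);
    }

    /* The sum occupies the low-order word; the upper 48 bits are zero. */
    return (uint64_t)(sum & 0xFFFFu);
}
```

The largest possible result is 2040 (eight differences of 255 each), so the 16-bit result cannot overflow.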
Sums of absolute differences are used to compute the L1 norm in motion-estimation algorithms for
video compression.
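As an illustration of that use, the following scalar C sketch compares a current 8x8 block of pixels against several candidate reference blocks and selects the candidate with the smallest sum of absolute differences. The block size, the function names (sad_row8, sad_block8x8, best_candidate), and the data layout are assumptions made for this example. Each call to sad_row8 stands in for the work a single PSADBW instruction would perform on one 8-byte row.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of absolute differences over one row of eight unsigned bytes. */
static uint32_t sad_row8(const uint8_t *a, const uint8_t *b)
{
    uint32_t sum = 0;
    for (int i = 0; i < 8; i++)
        sum += (uint32_t)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
    return sum;
}

/* L1 distance between two 8x8 blocks; stride is the row pitch in bytes. */
uint32_t sad_block8x8(const uint8_t *cur, const uint8_t *ref, size_t stride)
{
    uint32_t total = 0;
    for (size_t row = 0; row < 8; row++)
        total += sad_row8(cur + row * stride, ref + row * stride);
    return total;
}

/* Returns the index of the candidate block with the smallest SAD.
 * On a tie the earliest candidate wins. Requires n >= 1. */
size_t best_candidate(const uint8_t *cur, const uint8_t *const *cands,
                      size_t n, size_t stride)
{
    size_t best = 0;
    uint32_t best_sad = sad_block8x8(cur, cands[0], stride);

    for (size_t i = 1; i < n; i++) {
        uint32_t sad = sad_block8x8(cur, cands[i], stride);
        if (sad < best_sad) {
            best_sad = sad;
            best = i;
        }
    }
    return best;
}
```

A real motion-estimation search would typically also bound the search window and may stop early once a sufficiently small sum is found. Those refinements are omitted here.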
5.6.7 Shift
The vector-shift instructions are useful for scaling vector elements to higher or lower precision,
packing and unpacking vector elements, and multiplying and dividing vector elements by powers of 2.
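The power-of-2 scaling mentioned above can be illustrated with a scalar C sketch of logical shifts applied to four 16-bit elements. The function names shl_words_model and shr_words_model are hypothetical. Shift counts greater than 15 are modeled as producing zero in every element, which is how the packed logical word shifts behave.

```c
#include <stdint.h>

/* Hypothetical model of a packed logical left shift of four 16-bit words. */
void shl_words_model(uint16_t out[4], const uint16_t in[4], unsigned count)
{
    for (int i = 0; i < 4; i++) {
        /* Bits shifted out of the 16-bit element are discarded. */
        out[i] = (count > 15) ? 0 : (uint16_t)((uint32_t)in[i] << count);
    }
}

/* Hypothetical model of a packed logical right shift of four 16-bit words. */
void shr_words_model(uint16_t out[4], const uint16_t in[4], unsigned count)
{
    for (int i = 0; i < 4; i++) {
        /* Zeros are shifted in from the high-order end. */
        out[i] = (count > 15) ? 0 : (uint16_t)(in[i] >> count);
    }
}
```

Shifting left by 3 multiplies each element by 8, modulo 2^16. Shifting right by 2 divides each unsigned element by 4 and discards the remainder.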
Left Logical Shift
• PSLLW—Packed Shift Left Logical Words
• PSLLD—Packed Shift Left Logical Doublewords
• PSLLQ—Packed Shift Left Logical Quadwords
The PSLLx instructions left-shift each of the 16-bit (PSLLW), 32-bit (PSLLD), or 64-bit (PSLLQ)
values in the first operand by the number of bits specified in the second operand. The instructions then
write each shifted value into the corresponding, same-sized element of the destination. The first and