User Guide

150 128-Bit Media and Scientific Programming
AMD64 Technology 24592—Rev. 3.15—November 2009
Figure 4-28. PMADDWD Multiply-Add Operation
PMADDWD can be used with one source operand (for example, a coefficient) taken from memory and
the other source operand (for example, the data to be multiplied by that coefficient) taken from an
XMM register. The instruction can also be used together with the PADDD instruction (page 146) to
compute dot products. Scaling can be done, before or after the multiply, using a vector-shift instruction
(page 152).
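The dot-product use described above can be sketched in portable C. This is a scalar model for illustration, not the instruction sequence itself: the helper names pmaddwd_model, paddd_model, and dot_product_i16 are hypothetical, the 32-bit lanes are held as raw bit patterns in uint32_t, and the element count n is assumed to be a multiple of 8.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of PMADDWD (hypothetical helper): multiply eight signed
   16-bit pairs, then add adjacent products to give four 32-bit lanes.
   Lanes are stored as raw 32-bit patterns so wraparound is well defined. */
static void pmaddwd_model(const int16_t a[8], const int16_t b[8], uint32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        int64_t sum = (int64_t)a[2 * i] * b[2 * i]
                    + (int64_t)a[2 * i + 1] * b[2 * i + 1];
        out[i] = (uint32_t)sum;            /* keep the low 32 bits */
    }
}

/* Scalar model of PADDD (hypothetical helper): four wrapping 32-bit adds. */
static void paddd_model(uint32_t acc[4], const uint32_t x[4])
{
    for (int i = 0; i < 4; i++)
        acc[i] += x[i];
}

/* Dot product of n signed 16-bit elements. Assumes n is a multiple of 8;
   any remainder is ignored. The final conversion to int32_t assumes a
   two's-complement target. */
int32_t dot_product_i16(const int16_t *data, const int16_t *coeff, size_t n)
{
    uint32_t acc[4] = {0, 0, 0, 0};
    for (size_t i = 0; i + 8 <= n; i += 8) {
        uint32_t prod[4];
        pmaddwd_model(data + i, coeff + i, prod);   /* multiply-add step */
        paddd_model(acc, prod);                     /* accumulate step */
    }
    /* Horizontal sum of the four accumulator lanes. */
    uint32_t total = acc[0] + acc[1] + acc[2] + acc[3];
    return (int32_t)total;
}
```

For example, with data = {1, -2, 3, -4, 5, -6, 7, -8} and every coefficient equal to 2, dot_product_i16 returns -8.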
If all four of the 16-bit source operands used to produce a 32-bit multiply-add result have the value
8000h, the result is 8000_0000h, because the most negative 16-bit value, 8000h, multiplied by itself
equals 4000_0000h, and 4000_0000h added to 4000_0000h equals 8000_0000h. The sum of two such
products should be a positive number, but 8000_0000h is the most negative 32-bit signed value rather
than a positive number. This is the only operand combination for which the signed result overflows.
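The arithmetic of this special case can be checked with a one-lane scalar model. The function name pmaddwd_lane is hypothetical, and the lane is returned as a raw 32-bit pattern so that the wraparound is visible without relying on signed overflow in C.

```c
#include <stdint.h>

/* One 32-bit lane of a PMADDWD-style multiply-add (hypothetical helper).
   The exact sum is formed in 64 bits and then truncated to 32 bits. */
uint32_t pmaddwd_lane(int16_t a0, int16_t b0, int16_t a1, int16_t b1)
{
    int64_t sum = (int64_t)a0 * b0 + (int64_t)a1 * b1;  /* exact sum */
    return (uint32_t)sum;                                /* low 32 bits */
}
```

Calling pmaddwd_lane with all four arguments equal to INT16_MIN (8000h) yields 0x80000000, which is the most negative value when read as a signed 32-bit integer. With only one pair set to INT16_MIN and the other pair zero, the result is 0x40000000, which is still positive.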
Average.
PAVGB—Packed Average Unsigned Bytes
PAVGW—Packed Average Unsigned Words
The PAVGx instructions compute the rounded average of each unsigned 8-bit (PAVGB) or 16-bit
(PAVGW) integer value in the first operand and the corresponding, same-sized unsigned integer in the
second operand and write the result in the corresponding, same-sized element of the destination. The
rounded average is computed by adding each pair of operands, adding 1 to the temporary sum, and
then right-shifting the temporary sum by one bit-position. For vectors of n elements, the
operation is:
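As a scalar sketch of the rounded average described here, the following C functions model the per-element behavior for bytes and words. The names pavgb_model and pavgw_model are hypothetical, and the loops stand in for the parallel element-wise operation. The temporary sum is one bit wider than the elements, so it is held in a wider type.

```c
#include <stddef.h>
#include <stdint.h>

/* Rounded average of unsigned bytes, element by element (hypothetical
   helper modeling the PAVGB description). */
void pavgb_model(uint8_t *dst, const uint8_t *src1, const uint8_t *src2, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* The temporary sum can reach 511, so it needs 9 bits. */
        uint32_t sum = (uint32_t)src1[i] + src2[i] + 1u;
        dst[i] = (uint8_t)(sum >> 1);
    }
}

/* Rounded average of unsigned words, element by element (hypothetical
   helper modeling the PAVGW description). */
void pavgw_model(uint16_t *dst, const uint16_t *src1, const uint16_t *src2, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* The temporary sum can reach 131071, so it needs 17 bits. */
        uint32_t sum = (uint32_t)src1[i] + src2[i] + 1u;
        dst[i] = (uint16_t)(sum >> 1);
    }
}
```

Because 1 is added before the shift, an average with a fractional part of one half rounds up: averaging 10 and 21 gives 16, and averaging 255 and 255 gives 255 without overflow.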
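Figure 4-28 diagram (513-154.eps): operand 1 (bits 127-0) and operand 2 (bits 127-0) are multiplied
element by element into an intermediate result (bits 255-0); adjacent products are then added
pairwise to form the result (bits 127-0).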