User Guide

150 128-Bit Media and Scientific Programming

AMD64 Technology 24592—Rev. 3.15—November 2009

Figure 4-28. PMADDWD Multiply-Add Operation

PMADDWD can be used with one source operand (for example, a coefficient) taken from memory and the other source operand (for example, the data to be multiplied by that coefficient) taken from an XMM register. The instruction can also be used together with the PADDD instruction (page 146) to compute dot products. Scaling can be done, before or after the multiply, using a vector-shift instruction (page 152).
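The following Python sketch models the multiply-add and the PADDD accumulation described above on plain lists of integers, so the arithmetic can be checked without writing assembly. It is a scalar model under stated assumptions, not the hardware implementation. The helper names (`wrap_s32`, `pmaddwd`, `paddd`, `dot_product`) are hypothetical. Element 0 of each list is assumed to be the lowest-order word. The final horizontal sum of the four doubleword lanes is done in plain Python rather than with any particular instruction sequence.

```python
# Scalar model of PMADDWD and of a PADDD-based dot product.
# All function names are hypothetical helpers, not AMD-defined APIs.

def wrap_s32(x: int) -> int:
    """Wrap an integer to a signed 32-bit two's-complement value."""
    x &= 0xFFFF_FFFF
    return x - 0x1_0000_0000 if x & 0x8000_0000 else x

def pmaddwd(a: list[int], b: list[int]) -> list[int]:
    """Model PMADDWD on 8 signed 16-bit words per 128-bit operand.

    Adjacent word products are added pairwise into one signed 32-bit
    doubleword, giving 4 doublewords (element 0 = lowest-order lane).
    """
    assert len(a) == len(b) == 8
    assert all(-0x8000 <= v <= 0x7FFF for v in a + b)
    return [wrap_s32(a[2 * i] * b[2 * i] + a[2 * i + 1] * b[2 * i + 1])
            for i in range(4)]

def paddd(x: list[int], y: list[int]) -> list[int]:
    """Model PADDD: lane-wise 32-bit addition with wraparound."""
    return [wrap_s32(p + q) for p, q in zip(x, y)]

def dot_product(data: list[int], coeffs: list[int]) -> int:
    """Dot product of signed 16-bit vectors, 8 words at a time."""
    assert len(data) == len(coeffs) and len(data) % 8 == 0
    acc = [0, 0, 0, 0]  # four 32-bit partial sums
    for k in range(0, len(data), 8):
        acc = paddd(acc, pmaddwd(data[k:k + 8], coeffs[k:k + 8]))
    # Horizontal sum of the four lanes (plain Python, 32-bit wrapping).
    total = 0
    for lane in acc:
        total = wrap_s32(total + lane)
    return total
```

For example, `pmaddwd([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8)` returns `[3, 7, 11, 15]`, and `dot_product` applied to the same two lists returns 36.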

If all four of the 16-bit source operands used to produce a 32-bit multiply-add result have the value 8000h, the result is represented as 8000_0000h. The most negative 16-bit value, 8000h (-32,768), multiplied by itself equals 4000_0000h (2^30), and 4000_0000h added to 4000_0000h equals 8000_0000h (2^31), which is one more than the largest positive signed 32-bit value, 7FFF_FFFFh. The result of multiplying two negative numbers should be a positive number, but 8000_0000h is the most negative signed 32-bit value (-2^31) rather than a positive number. This is the only input combination for which the multiply-add overflows.
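A short Python sketch of a single multiply-add lane makes the overflow case concrete. `s16`, `s32`, and `madd_lane` are hypothetical helpers. The lane is modeled by keeping the low 32 bits of the exact sum, which reproduces the 8000_0000h result described above.

```python
# One multiply-add lane modeled on raw bit patterns.
# s16, s32 and madd_lane are hypothetical helpers for illustration.

def s16(pattern: int) -> int:
    """Interpret a 16-bit pattern (e.g. 0x8000) as a signed integer."""
    return pattern - 0x1_0000 if pattern & 0x8000 else pattern

def s32(pattern: int) -> int:
    """Interpret a 32-bit pattern as a signed integer."""
    return pattern - 0x1_0000_0000 if pattern & 0x8000_0000 else pattern

def madd_lane(a0: int, b0: int, a1: int, b1: int) -> int:
    """Compute a0*b0 + a1*b1 on signed 16-bit patterns and return the
    low 32 bits of the exact sum as a 32-bit pattern."""
    exact = s16(a0) * s16(b0) + s16(a1) * s16(b1)
    return exact & 0xFFFF_FFFF

# All four inputs are 8000h: the exact sum is +2**31, which does not fit.
r = madd_lane(0x8000, 0x8000, 0x8000, 0x8000)
print(hex(r), s32(r))   # 0x80000000 -2147483648

# Change one input to 8001h and the sum fits in the signed 32-bit range.
r = madd_lane(0x8000, 0x8000, 0x8000, 0x8001)
print(hex(r), s32(r))   # 0x7fff8000 2147450880
```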

Average.

• PAVGB—Packed Average Unsigned Bytes

• PAVGW—Packed Average Unsigned Words

The PAVGx instructions compute the rounded average of each unsigned 8-bit (PAVGB) or 16-bit (PAVGW) integer value in the first operand and the corresponding, same-sized unsigned integer in the second operand, and write the result to the corresponding, same-sized element of the destination. The rounded average is computed by adding each pair of elements, adding 1 to the temporary sum, and then right-shifting the temporary sum by one bit position. The temporary sum is one bit wider than the elements (9 bits for PAVGB, 17 bits for PAVGW), so the carry out of the addition is not lost. For vectors of n elements, the operation is:

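The PAVGx rounding rule described above can be sketched in Python as follows. This is an illustrative model, not AMD's definition: `pavg`, `pavgb`, `pavgw`, and `avg_truncated_sum` are hypothetical names, and each operand is assumed to be a list of unsigned element values. Because Python integers do not overflow, the one-bit-wider temporary sum is kept automatically. The last function shows, for comparison only, what an average computed from a sum truncated to the element width would give.

```python
# Model of the PAVGx rounded average. Names are hypothetical helpers.

def pavg(a: list[int], b: list[int], bits: int) -> list[int]:
    """Rounded unsigned average, element by element: (x + y + 1) >> 1.

    Python ints do not overflow, so the (bits + 1)-bit temporary sum
    keeps its carry, as the rounding rule requires.
    """
    limit = 1 << bits
    assert len(a) == len(b)
    assert all(0 <= v < limit for v in a + b)
    return [(x + y + 1) >> 1 for x, y in zip(a, b)]

def pavgb(a: list[int], b: list[int]) -> list[int]:
    """PAVGB model: unsigned 8-bit elements (16 per 128-bit operand)."""
    return pavg(a, b, 8)

def pavgw(a: list[int], b: list[int]) -> list[int]:
    """PAVGW model: unsigned 16-bit elements (8 per 128-bit operand)."""
    return pavg(a, b, 16)

def avg_truncated_sum(x: int, y: int, bits: int) -> int:
    """For comparison only: drops the carry out of the element-width sum."""
    mask = (1 << bits) - 1
    return (((x + y) & mask) + 1) >> 1
```

For example, `pavgb([0, 1, 254, 255], [1, 2, 255, 255])` returns `[1, 2, 255, 255]`: halves round up, and 255 + 255 does not wrap. By contrast, `avg_truncated_sum(255, 255, 8)` returns 127, which is why the wider temporary sum matters.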