User Guide

148 128-Bit Media and Scientific Programming
AMD64 Technology 24592—Rev. 3.15—November 2009
The PMULHW instruction multiplies each 16-bit signed integer value in the first operand by the
corresponding 16-bit signed integer in the second operand, producing a 32-bit intermediate result. The
instruction then writes the high-order 16 bits of the 32-bit intermediate result of each multiplication to
the corresponding word of the destination. The PMULLW instruction performs the same
multiplication as PMULHW but writes the low-order 16 bits of the 32-bit intermediate result to the
corresponding word of the destination.
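
The arithmetic described above can be sketched as a scalar model. This is an illustrative Python sketch, not the hardware implementation: the function names (to_signed16, pmulhw, pmullw) and the representation of a vector as a list of 16-bit word patterns, lowest-order word first, are assumptions made for this example.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a two's-complement signed integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def pmulhw(a, b):
    """Model of PMULHW: high-order 16 bits of each signed 16 x 16 product."""
    # Python's >> on a negative int is an arithmetic shift, so masking with
    # 0xFFFF afterwards yields bits 31-16 of the 32-bit two's-complement product.
    return [((to_signed16(x) * to_signed16(y)) >> 16) & 0xFFFF
            for x, y in zip(a, b)]


def pmullw(a, b):
    """Model of PMULLW: low-order 16 bits of each signed 16 x 16 product."""
    return [(to_signed16(x) * to_signed16(y)) & 0xFFFF
            for x, y in zip(a, b)]
```

For example, multiplying the words 7FFFh and 7FFFh gives the 32-bit intermediate result 3FFF0001h, so this model returns 3FFFh for PMULHW and 0001h for PMULLW. Concatenating the two results element by element recovers the full 32-bit products.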
Figure 4-26 shows the PMULHW and PMULLW operations. The two differ only in whether the
high or low half of each 32-bit intermediate result is copied to the corresponding word of the destination.
Figure 4-26. PMULxW Multiply Operation
The PMULHUW instruction performs the same multiplication as PMULHW but on unsigned
operands. Without this instruction, it is difficult to perform unsigned integer multiplies using 128-bit
media instructions. The instruction is useful in 3D rasterization, which operates on unsigned pixel
values.
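
The unsigned variant can be modeled in the same way. The sketch below is a scalar illustration only: the function name pmulhuw and the list-of-words representation (lowest-order word first) are assumptions made for this example.

```python
def pmulhuw(a, b):
    """Model of PMULHUW: high-order 16 bits of each unsigned 16 x 16 product."""
    # Both inputs are masked to 16 bits and treated as unsigned, so the
    # product fits in 32 bits and a plain right shift extracts bits 31-16.
    return [((x & 0xFFFF) * (y & 0xFFFF)) >> 16 for x, y in zip(a, b)]
```

The difference from the signed form shows up whenever bit 15 of an input is set. For the input words FFFFh and FFFFh, the unsigned product is FFFE0001h and the model returns FFFEh, whereas a signed interpretation (-1 times -1) would give a high-order word of 0000h. Treating one operand as a 16-bit fraction, the word 8000h scaled by FFFFh yields 7FFFh, which is the kind of pixel scaling the text refers to.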
The PMULUDQ instruction, unlike the other PMULx instructions, preserves the full precision of
results by multiplying only half of the source-vector elements. It multiplies the 32-bit unsigned integer
values in the first (low-order) and third doublewords of the source operands, writes the full 64-bit
result of the low-order multiply to the low-order quadword of the destination, and writes the
corresponding 64-bit result of the high-order multiply to the high-order quadword of the destination.
Figure 4-27 on page 149 shows a PMULUDQ operation.
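
As a rough scalar sketch of the 128-bit PMULUDQ operation: each source is modeled as a list of four 32-bit doublewords, lowest-order first, and the destination as a list of two 64-bit quadwords. The function name pmuludq and this list representation are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF


def pmuludq(a, b):
    """Model of 128-bit PMULUDQ on two 4-doubleword sources.

    Only doublewords 0 and 2 of each source take part; doublewords 1 and 3
    are ignored. Returns [low_quadword, high_quadword].
    """
    low = (a[0] & MASK32) * (b[0] & MASK32)    # first (low-order) doublewords
    high = (a[2] & MASK32) * (b[2] & MASK32)   # third doublewords
    return [low, high]
```

Because each 32 x 32 unsigned product is at most 64 bits wide, nothing is truncated: multiplying FFFFFFFFh by FFFFFFFFh yields FFFFFFFE00000001h in the corresponding quadword. The values in the second and fourth doublewords of the sources have no effect on the result.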
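[Figure 4-26 diagram: the elements of operand 1 and operand 2 (bits 127-0) are multiplied pairwise into an intermediate result (bits 255-0), and the high or low half of each intermediate element is copied to the result (bits 127-0).]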