User Guide

64-Bit Media Programming 197

24592—Rev. 3.15—November 2009 AMD64 Technology

instructions can also perform multiply-accumulate operations. Efficient matrix multiplication is further supported with instructions that can first transpose the elements of matrix rows and columns. These transpositions can make subsequent accesses to memory or cache more efficient when performing arithmetic matrix operations.

Figure 5-4 shows a vector multiply-add instruction (PMADDWD) that multiplies corresponding signed 16-bit integer elements of two vectors to yield four intermediate 32-bit products. Adjacent pairs of these products are then summed to yield two 32-bit result elements.

Figure 5-4. Multiply-Add Operation

The operation shown in Figure 5-4 can be used together with transpose and vector-add operations (see “Addition” on page 216) to accumulate dot product results (also called inner or scalar products), which are used in many media algorithms.

5.3.4 Saturation

Several of the 64-bit media integer instructions and most of the 64-bit media floating-point instructions produce vector results in which each element saturates independently of the other elements in the result vector. When the true result exceeds the maximum, or falls below the minimum, value representable by the destination data type, it is clamped (limited) to that maximum or minimum. Saturation avoids the need for code that tests for potential overflow or underflow. Saturating arithmetic is useful for representing physical-world data, such as sound and color. It is used, for example, when combining values for pixel coloring.
