User Guide

64-Bit Media Programming 197
24592—Rev. 3.15—November 2009 AMD64 Technology
instructions can also perform multiply-accumulate operations. Efficient matrix multiplication is
further supported with instructions that can first transpose the elements of matrix rows and columns.
These transpositions can make subsequent accesses to memory or cache more efficient when
performing arithmetic matrix operations.
Figure 5-4 shows a vector multiply-add instruction (PMADDWD) that multiplies two vectors of four
16-bit integer elements to yield four intermediate 32-bit products, which are then summed pair-wise to
yield two 32-bit elements.
Figure 5-4. Multiply-Add Operation
The operation shown in Figure 5-4 can be used together with transpose and vector-add operations (see
“Addition” on page 216) to accumulate dot product results (also called inner or scalar products),
which are used in many media algorithms.
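As an illustration of the multiply-add step and its use in a dot product, the following C fragment is a scalar sketch, not the instruction itself. The names pmaddwd_model and dot_i16 are hypothetical, the elements are assumed to be signed 16-bit integers, the vector length is assumed to be a multiple of four, and the accumulated sums are assumed to fit in 32 bits. The transpose step mentioned above is omitted because the inputs here are already laid out contiguously.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a 64-bit PMADDWD: four signed 16-bit elements per
   operand go in, two signed 32-bit elements come out. */
static void pmaddwd_model(const int16_t a[4], const int16_t b[4],
                          int32_t out[2])
{
    for (int i = 0; i < 2; i++) {
        /* Each 16-bit x 16-bit product fits in 32 bits. */
        int32_t p0 = (int32_t)a[2 * i] * b[2 * i];
        int32_t p1 = (int32_t)a[2 * i + 1] * b[2 * i + 1];
        /* Add the adjacent pair. Unsigned arithmetic is used so that the
           single overflowing case (all four inputs equal to -32768)
           wraps instead of invoking undefined behavior. */
        out[i] = (int32_t)((uint32_t)p0 + (uint32_t)p1);
    }
}

/* Hypothetical helper: dot product of two vectors of n signed 16-bit
   elements. Assumes n is a multiple of 4 and that the running sums
   fit in 32 bits. */
static int32_t dot_i16(const int16_t *x, const int16_t *y, size_t n)
{
    int32_t acc[2] = {0, 0};

    assert(n % 4 == 0);
    for (size_t k = 0; k < n; k += 4) {
        int32_t part[2];
        pmaddwd_model(x + k, y + k, part);
        /* Element-wise vector add of the two 32-bit partial sums. */
        acc[0] += part[0];
        acc[1] += part[1];
    }
    /* Final horizontal add of the two accumulator elements. */
    return acc[0] + acc[1];
}
```

Keeping two running partial sums and combining them only at the end mirrors how a vector accumulator would be used: the per-iteration work is one multiply-add and one vector add, and the horizontal add happens once.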
5.3.4 Saturation
Several of the 64-bit media integer instructions and most of the 64-bit media floating-point instructions
produce vector results in which each element saturates independently of the other elements in the
result vector. Such results are clamped (limited) to the maximum or minimum value representable by
the destination data type when the true result lies outside that representable range.
Saturation avoids the need for code that tests for potential overflow or underflow. Saturating data is
useful for representing physical-world data, such as sound and color. It is used, for example, when
combining values for pixel coloring.
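The clamping behavior can be sketched in scalar C as follows. This is a minimal model under stated assumptions, not the hardware definition: the function names are hypothetical, and the two element types shown (unsigned 8-bit and signed 16-bit) are chosen only as examples of destination data types.

```c
#include <stdint.h>

/* Unsigned 8-bit saturating add: results above 255 clamp to 255. */
static uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return sum > UINT8_MAX ? (uint8_t)UINT8_MAX : (uint8_t)sum;
}

/* Signed 16-bit saturating add: results clamp to [-32768, 32767]. */
static int16_t add_sat_i16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Eight packed bytes in a 64-bit vector: each element saturates
   independently of the others. */
static void add_sat_u8x8(const uint8_t a[8], const uint8_t b[8],
                         uint8_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = add_sat_u8(a[i], b[i]);
}
```

For pixel data, adding a brightness of 100 to a channel value of 200 gives 255 with saturation, whereas a wrapping 8-bit add would give 44 and turn a bright pixel dark.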
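Figure 5-4 (513-119.eps, diagram not reproduced): operand 1 and operand 2 (bits 63-0 each) are multiplied element by element into a 128-bit intermediate (bits 127-0), and adjacent products are added to form the result (bits 63-0).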