User Guide

114 128-Bit Media and Scientific Programming
AMD64 Technology 24592—Rev. 3.15—November 2009
Figure 4-9. Sum-of-Absolute-Differences Operation
There is an instruction for computing the average of unsigned bytes or words. The instruction is useful
for MPEG decoding, in which motion compensation involves many byte-averaging operations
between and within macroblocks. In addition to speeding up these operations, the instruction also frees
up registers and makes it possible to unroll the averaging loops.
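The byte-averaging operation described above can be sketched in portable C as a scalar model of one vector instruction. This is a minimal illustration, not the hardware definition: the helper names avg_u8 and avg_u8x16 are hypothetical, and the round-up behavior (adding 1 before the shift) is assumed to match the packed-average instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* Rounded average of two unsigned bytes: (a + b + 1) >> 1.
 * The sum is formed in a wider type so it cannot overflow.
 * Assumption: the packed-average instruction rounds up in this way. */
static uint8_t avg_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* Scalar model of a 16-element byte average (hypothetical helper).
 * Each element is averaged independently of the others, which is
 * what a single 128-bit vector instruction does in one operation. */
static void avg_u8x16(const uint8_t a[16], const uint8_t b[16],
                      uint8_t out[16])
{
    for (size_t i = 0; i < 16; i++)
        out[i] = avg_u8(a[i], b[i]);
}
```

In a motion-compensation loop, one such vector operation would replace sixteen separate add, round, and shift sequences, which is where the savings in registers and loop overhead come from.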
Some of the arithmetic and pack instructions produce vector results in which each element saturates
independently of the other elements in the result vector. Such results are clamped (limited) to the
maximum or minimum value representable by the destination data type when the true result exceeds
that maximum or falls below that minimum. Saturation is useful for representing physical-world
data, such as sound and color. It is used, for example, when combining values for pixel coloring,
where a wrapped-around result would appear as an abrupt change in color rather than a clamped one.
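The clamping behavior described above can be sketched per element in portable C. This is a minimal illustration under stated assumptions: the helper names add_sat_u8 and add_sat_s16 are hypothetical, and they model one element of an unsigned-byte and a signed-word saturating add respectively.

```c
#include <stdint.h>

/* Unsigned saturating add of two bytes (hypothetical helper).
 * A true sum above 255 is clamped to 255 instead of wrapping. */
static uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)(sum > UINT8_MAX ? UINT8_MAX : sum);
}

/* Signed saturating add of two 16-bit words (hypothetical helper).
 * The true sum is computed in 32 bits, then clamped to the range
 * representable by the 16-bit destination type. */
static int16_t add_sat_s16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX)
        return INT16_MAX;
    if (sum < INT16_MIN)
        return INT16_MIN;
    return (int16_t)sum;
}
```

For example, adding pixel intensities 200 and 100 with wraparound arithmetic would give 44, a dark value; the saturating form gives 255, the brightest representable value.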
4.2.7 Branch Removal
Branching is a time-consuming operation that, unlike most 128-bit media vector operations, does not
exhibit parallel behavior (there is only one branch target, not multiple targets, per branch instruction).
In many media applications, a branch involves selecting between only a few (often only two) cases.
Such branches can be replaced with 128-bit media vector compare and vector logical instructions that
simulate predicated execution or conditional moves.
Figure 4-10 on page 115 shows an example of a non-branching sequence that implements a two-way
multiplexer—one that is equivalent to the ternary operator “?:” in C and C++. The comparable code
sequence is explained in “Compare and Write Mask” on page 153.
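The idea behind the two-way multiplexer can be sketched in portable C as a scalar model of the compare-and-mask technique. This is an illustration only, not the code sequence from the referenced section: the helper names are hypothetical, and a greater-than compare on signed 16-bit elements is assumed for the condition.

```c
#include <stddef.h>
#include <stdint.h>

/* Branch-free form of r = (a > b) ? x : y for one 16-bit element
 * (hypothetical helper). The compare yields a mask of all ones or
 * all zeros, as a vector compare instruction does; the logical
 * operations then pass x where the mask is set and y elsewhere. */
static int16_t select_gt_s16(int16_t a, int16_t b, int16_t x, int16_t y)
{
    uint16_t mask = (uint16_t)-(a > b);            /* 0xFFFF or 0x0000 */
    uint16_t bits = (uint16_t)(((uint16_t)x & mask) |
                               ((uint16_t)y & (uint16_t)~mask));
    return (int16_t)bits;
}

/* Scalar model of the same selection across eight 16-bit elements,
 * the number that fit in one 128-bit operand. Every element is
 * selected independently, with no branch on the data. */
static void select_gt_s16x8(const int16_t a[8], const int16_t b[8],
                            const int16_t x[8], const int16_t y[8],
                            int16_t r[8])
{
    for (size_t i = 0; i < 8; i++)
        r[i] = select_gt_s16(a[i], b[i], x[i], y[i]);
}
```

In vector form the compare, AND, AND-NOT, and OR each operate on all elements at once, so the whole selection costs a handful of instructions regardless of how the individual comparisons turn out.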
[Figure 4-9 diagram: operand 1 and operand 2 (bits 127 to 0); the absolute differences (ABS Δ) of corresponding elements are summed (Σ) into a high-order and a low-order intermediate result, which are written to the result (bits 127 to 0) with the remaining bits zeroed.]