Intel 64 and IA-32 Architectures Software Developers Manual Volume 2B, Instruction Set Reference, N-Z

4-88 Vol. 2B
INSTRUCTION SET REFERENCE, N-Z
PMADDUBSW — Multiply and Add Packed Signed and Unsigned Bytes
Description
PMADDUBSW multiplies vertically each unsigned byte of the destination operand
(first operand) with the corresponding signed byte of the source operand (second
operand), producing intermediate signed 16-bit integers. Each adjacent pair of
signed words is added and the saturated result is packed to the destination operand.
For example, the lowest-order bytes (bits 7-0) in the source and destination oper-
ands are multiplied and the intermediate signed word result is added with the corre-
sponding intermediate result from the 2nd lowest-order bytes (bits 15-8) of the
operands; the sign-saturated result is stored in the lowest word of the destination
register (15-0). The same operation is performed on the other pairs of adjacent
bytes. Both operands can be MMX register or XMM registers. When the source
operand is a 128-bit memory operand, the operand must be aligned on a 16-byte
boundary or a general-protection exception (#GP) will be generated.
In 64-bit mode, use the REX prefix to access additional registers.
Operation
PMADDUBSW with 64 bit operands:
DEST[15-0] = SaturateToSignedWord(SRC[15-8]*DEST[15-8]+SRC[7-0]*DEST[7-0]);
DEST[31-16] = SaturateToSignedWord(SRC[31-24]*DEST[31-24]+SRC[23-16]*DEST[23-16]);
DEST[47-32] = SaturateToSignedWord(SRC[47-40]*DEST[47-40]+SRC[39-32]*DEST[39-32]);
DEST[63-48] = SaturateToSignedWord(SRC[63-56]*DEST[63-56]+SRC[55-48]*DEST[55-48]);
PMADDUBSW with 128 bit operands:
DEST[15-0] = SaturateToSignedWord(SRC[15-8]* DEST[15-8]+SRC[7-0]*DEST[7-0]);
// Repeat operation for 2nd through 7th word
SRC1/DEST[127-112] = SaturateToSignedWord(SRC[127-120]*DEST[127-120]+ SRC[119-
112]* DEST[119-112]);
Opcode Instruction
64-Bit
Mode
Compat/
Leg Mode Description
0F 38 04 /r PMADDUBSW
mm1,
mm2/m64
Valid Valid Multiply signed and
unsigned bytes, add
horizontal pair of signed
words, pack saturated
signed-words to MM1.
66 0F 38 04 /r PMADDUBSW
xmm1,
xmm2/m128
Valid Valid Multiply signed and
unsigned bytes, add
horizontal pair of signed
words, pack saturated
signed-words to XMM1.