128-Bit Media and Scientific Programming 161
24592—Rev. 3.15—November 2009 AMD64 Technology
Move Non-Temporal. The move non-temporal instructions are streaming-store instructions. They
minimize pollution of the cache.
• MOVNTPD—Move Non-Temporal Packed Double-Precision Floating-Point
• MOVNTPS—Move Non-Temporal Packed Single-Precision Floating-Point
• MOVNTSD—Move Non-Temporal Scalar Double-Precision Floating-Point
• MOVNTSS—Move Non-Temporal Scalar Single-Precision Floating-Point
The MOVNTPx instructions copy four packed single-precision floating-point (MOVNTPS) or two
packed double-precision floating-point (MOVNTPD) values from an XMM register into a 128-bit
memory location.
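A minimal sketch of the packed forms. It assumes an x86-64 target compiled with GCC or Clang. On that target SSE and SSE2 are part of the baseline, so no extra flags are needed, and the intrinsics _mm_stream_ps and _mm_stream_pd map to MOVNTPS and MOVNTPD. The helper names are hypothetical. Both instructions require a 16-byte-aligned memory operand, and the asserts check for that.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE ones (xmmintrin.h) */

/* Hypothetical helper: store four floats with one MOVNTPS.
   dst must be 16-byte aligned, as the 128-bit memory operand requires. */
static void stream_store4_ps(float *dst, float a, float b, float c, float d)
{
    assert(((uintptr_t)dst & 15u) == 0);
    __m128 v = _mm_setr_ps(a, b, c, d);   /* element 0 = a ... element 3 = d */
    _mm_stream_ps(dst, v);                /* MOVNTPS */
}

/* Hypothetical helper: store two doubles with one MOVNTPD.
   dst must be 16-byte aligned. */
static void stream_store2_pd(double *dst, double a, double b)
{
    assert(((uintptr_t)dst & 15u) == 0);
    __m128d v = _mm_setr_pd(a, b);        /* element 0 = a, element 1 = b */
    _mm_stream_pd(dst, v);                /* MOVNTPD */
}
```

A caller that needs other processors to observe a batch of such stores in order would typically follow the batch with _mm_sfence().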
The MOVNTSx instructions store the low-order double-precision floating-point value of an XMM register
into a 64-bit memory location (MOVNTSD), or the low-order single-precision floating-point value of an
XMM register into a 32-bit memory location (MOVNTSS).
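MOVNTSD and MOVNTSS belong to AMD's SSE4a extension rather than the SSE/SSE2 baseline, so a portable program has to guard them. The sketch below assumes GCC or Clang on x86-64. Only the helper that issues the SSE4a instructions is compiled for that extension, using a target attribute. The helper is selected at run time with __builtin_cpu_supports("sse4a"), and ordinary MOVSD/MOVSS stores serve as the fallback. All function names are hypothetical.

```c
#include <assert.h>
#include <x86intrin.h>  /* GCC/Clang umbrella header; declares the SSE4a intrinsics */

/* Hypothetical helper compiled for SSE4a only; call it after a CPU check. */
__attribute__((target("sse4a")))
static void stream_scalars_sse4a(double *pd, float *pf, __m128d vd, __m128 vf)
{
    _mm_stream_sd(pd, vd);   /* MOVNTSD: low 64-bit element of vd -> *pd */
    _mm_stream_ss(pf, vf);   /* MOVNTSS: low 32-bit element of vf -> *pf */
}

/* Hypothetical dispatcher: store the low element of each vector, using
   non-temporal scalar stores when SSE4a is available. */
static void store_low_scalars(double *pd, float *pf, __m128d vd, __m128 vf)
{
    if (__builtin_cpu_supports("sse4a")) {
        stream_scalars_sse4a(pd, pf, vd, vf);
        _mm_sfence();          /* order the weakly-ordered stores before later stores */
    } else {
        _mm_store_sd(pd, vd);  /* MOVSD fallback */
        _mm_store_ss(pf, vf);  /* MOVSS fallback */
    }
}
```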
These instructions indicate to the processor that their data is non-temporal. Non-temporal data is
expected to be used only once, so it should not incur cache-related overhead. Temporal data, by
contrast, is expected to be accessed again soon and should therefore be cached. The non-temporal
instructions use weakly-ordered, write-combining buffering of write data, and they minimize cache
pollution. The exact method by which cache pollution is minimized depends on the hardware
implementation of the instruction. For further information, see “Memory Optimization” on page 92.
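Weak ordering has a practical consequence. Stores that drain through write-combining buffers can become visible to other processors out of order with respect to other stores. For that reason, code that publishes a non-temporally written buffer usually follows the stores with SFENCE. The sketch below assumes x86-64 with GCC or Clang, and its helper name is hypothetical. It fills a float buffer using MOVNTPS (via _mm_stream_ps) for the 16-byte-aligned middle and ordinary stores for the unaligned edges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>

/* Hypothetical helper: set n floats at dst to value, using non-temporal
   stores where the 16-byte alignment requirement of MOVNTPS is met. */
static void fill_nontemporal(float *dst, size_t n, float value)
{
    assert(dst != NULL || n == 0);
    size_t i = 0;

    /* Head: ordinary stores until dst + i is 16-byte aligned. */
    while (i < n && ((uintptr_t)(dst + i) & 15u) != 0) {
        dst[i++] = value;
    }

    /* Body: one MOVNTPS per four floats; how much cache pollution is
       avoided depends on the hardware implementation. */
    __m128 v = _mm_set1_ps(value);
    for (; i + 4 <= n; i += 4) {
        _mm_stream_ps(dst + i, v);
    }

    /* Tail: ordinary stores for the remaining 0 to 3 elements. */
    for (; i < n; i++) {
        dst[i] = value;
    }

    /* SFENCE: every store above becomes globally visible before any store
       issued after this point (for example, a "buffer ready" flag). */
    _mm_sfence();
}
```

Streaming stores pay off mainly for buffers larger than the cache that will not be read back soon. For small or soon-reused data, ordinary stores are usually the better choice.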
Move Mask
• MOVMSKPS—Extract Packed Single-Precision Floating-Point Sign Mask
• MOVMSKPD—Extract Packed Double-Precision Floating-Point Sign Mask
The MOVMSKPS instruction copies the sign bits of four single-precision floating-point values in an
XMM register to the four low-order bits of a 32-bit or 64-bit general-purpose register, with
zero-extension. The MOVMSKPD instruction copies the sign bits of two double-precision floating-point
values in an XMM register to the two low-order bits of a general-purpose register, with zero-extension.
The result of either instruction is a sign-bit mask that can be used for data-dependent branching. Figure
4-32 shows the MOVMSKPS operation.
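The following is a minimal sketch of the sign-mask instructions and the data-dependent branching they enable. It assumes x86-64 with GCC or Clang, where the intrinsics _mm_movemask_ps and _mm_movemask_pd map to MOVMSKPS and MOVMSKPD. The helper functions are hypothetical.

```c
#include <assert.h>
#include <emmintrin.h>

/* Hypothetical helper: bit k of the result is the sign bit of element k
   (MOVMSKPS); the remaining high-order bits are zero. */
static int sign_mask_ps(float e0, float e1, float e2, float e3)
{
    return _mm_movemask_ps(_mm_setr_ps(e0, e1, e2, e3));
}

/* Hypothetical helper: 2-bit sign mask of two doubles (MOVMSKPD). */
static int sign_mask_pd(double e0, double e1)
{
    return _mm_movemask_pd(_mm_setr_pd(e0, e1));
}

/* Hypothetical example of a data-dependent branch on the mask:
   count elements whose sign bit is set, with a fast path when none are. */
static int count_sign_set_ps(float e0, float e1, float e2, float e3)
{
    int mask = sign_mask_ps(e0, e1, e2, e3);
    if (mask == 0) {
        return 0;                 /* no sign bits set: skip per-element work */
    }
    int count = 0;
    for (; mask != 0; mask &= mask - 1) {
        count++;                  /* each iteration clears the lowest set bit */
    }
    return count;
}
```

Because the mask reflects raw sign bits, -0.0 sets its bit even though it compares equal to 0.0.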