User Guide
128-Bit Media and Scientific Programming 171
24592—Rev. 3.15—November 2009 AMD64 Technology
low-order doubleword of the destination. The three high-order doublewords of the destination XMM
register are not modified.
The SQRTSD instruction computes the square root of the low-order double-precision floating-point
value in the second operand (an XMM register or 64-bit memory location) and writes the result in the
low-order quadword of the destination. The high-order quadword of the destination XMM register is
not modified.
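The scalar behavior described above can be observed from C through compiler intrinsics. The following is a minimal sketch, assuming a compiler that provides `<emmintrin.h>` and that maps `_mm_sqrt_sd` to SQRTSD; the helper name `sqrt_low_double` is hypothetical and used only for illustration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics, including _mm_sqrt_sd */

/* Hypothetical helper (not part of any library).
 * out[0] = sqrt(src[0]); out[1] = dst[1], mirroring the way SQRTSD
 * leaves the high-order quadword of the destination unchanged. */
static void sqrt_low_double(const double dst[2], const double src[2],
                            double out[2])
{
    __m128d d = _mm_loadu_pd(dst);   /* plays the role of the destination */
    __m128d s = _mm_loadu_pd(src);   /* plays the role of the second operand */
    __m128d r = _mm_sqrt_sd(d, s);   /* low: sqrt(s[0]); high: d[1] */
    _mm_storeu_pd(out, r);
}
```

Calling the helper with `dst = {1.0, 7.0}` and `src = {9.0, 100.0}` yields `{3.0, 7.0}`: only the low-order quadword is recomputed.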
Reciprocal Square Root
• RSQRTPS—Reciprocal Square Root Packed Single-Precision Floating-Point
• RSQRTSS—Reciprocal Square Root Scalar Single-Precision Floating-Point
The RSQRTPS instruction computes the approximate reciprocal of the square root of each of four
single-precision floating-point values in the second operand (an XMM register or 128-bit memory
location) and writes the result in the corresponding doubleword of the destination.
The RSQRTSS instruction computes the approximate reciprocal of the square root of the low-order
single-precision floating-point value in the second operand (an XMM register or 32-bit memory
location) and writes the result in the low-order doubleword of the destination. The three high-order
doublewords in the destination XMM register are not modified.
For both RSQRTPS and RSQRTSS, the maximum relative error is less than or equal to $1.5 \times 2^{-12}$.
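Because the result is only an estimate, software that needs more precision often refines it. The sketch below assumes a compiler that provides `<xmmintrin.h>` and that maps `_mm_rsqrt_ps` to RSQRTPS. The refinement is one Newton-Raphson step, a common technique that this section does not itself describe, and the function names are hypothetical.

```c
#include <xmmintrin.h>  /* SSE intrinsics, including _mm_rsqrt_ps */

/* Hypothetical helper: raw estimate of 1/sqrt(in[i]) for four floats. */
static void rsqrt4_estimate(const float in[4], float out[4])
{
    _mm_storeu_ps(out, _mm_rsqrt_ps(_mm_loadu_ps(in)));
}

/* Hypothetical helper: estimate refined by one Newton-Raphson step,
 *   x1 = x0 * (1.5 - 0.5 * a * x0 * x0)
 * This step is an assumption of the sketch, not part of the instruction. */
static void rsqrt4_refined(const float in[4], float out[4])
{
    __m128 a     = _mm_loadu_ps(in);
    __m128 x0    = _mm_rsqrt_ps(a);                    /* approximate 1/sqrt(a) */
    __m128 ax0x0 = _mm_mul_ps(_mm_mul_ps(a, x0), x0);  /* a * x0 * x0 */
    __m128 corr  = _mm_sub_ps(_mm_set1_ps(1.5f),
                              _mm_mul_ps(_mm_set1_ps(0.5f), ax0x0));
    _mm_storeu_ps(out, _mm_mul_ps(x0, corr));          /* x1 */
}
```

The raw estimate is the cheaper choice when the stated error bound is acceptable; the refined version costs a few extra multiplies and one subtract per vector.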
Reciprocal Estimation
• RCPPS—Reciprocal Packed Single-Precision Floating-Point
• RCPSS—Reciprocal Scalar Single-Precision Floating-Point
The RCPPS instruction computes the approximate reciprocal of each of the four single-precision
floating-point values in the second operand (an XMM register or 128-bit memory location) and writes
the result in the corresponding doubleword of the destination.
The RCPSS instruction computes the approximate reciprocal of the low-order single-precision
floating-point value in the second operand (an XMM register or 32-bit memory location) and writes
the result in the low-order doubleword of the destination. The three high-order doublewords in the
destination are not modified.
For both RCPPS and RCPSS, the maximum relative error is less than or equal to $1.5 \times 2^{-12}$.
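The scalar reciprocal estimate and its effect on the destination can be sketched in C as follows. This assumes a compiler that provides `<xmmintrin.h>` and that maps `_mm_rcp_ss` to RCPSS. The intrinsic takes a single operand, so the sketch uses `_mm_move_ss` to merge the estimate into a separate destination value; the helper name `rcp_low_float` is hypothetical.

```c
#include <xmmintrin.h>  /* SSE intrinsics, including _mm_rcp_ss */

/* Hypothetical helper (not part of any library).
 * out[0] ~ 1/src[0]; out[1..3] = dst[1..3], mirroring the way RCPSS
 * leaves the three high-order doublewords of the destination unchanged. */
static void rcp_low_float(const float dst[4], const float src[4],
                          float out[4])
{
    __m128 d   = _mm_loadu_ps(dst);
    __m128 s   = _mm_loadu_ps(src);
    __m128 est = _mm_rcp_ss(s);        /* low lane: approximate 1/s[0] */
    __m128 r   = _mm_move_ss(d, est);  /* low lane from est, rest from d */
    _mm_storeu_ps(out, r);
}
```

With `src[0] = 4.0f`, the low-order element of the result is close to 0.25, within the relative error bound given above, and the other three elements are copied from `dst` unchanged.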
4.6.6 Compare
The floating-point vector-compare instructions compare two operands and then write a mask, write the
maximum or minimum value, or set flags. Compare instructions can be used to avoid branches.
Figure 4-10 on page 115 shows an example of using compare instructions.
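As an illustration of branch avoidance, a compare mask can select between two candidate values per element. The sketch below is not a reproduction of Figure 4-10. It assumes a compiler that provides `<xmmintrin.h>` and that maps `_mm_cmplt_ps` to CMPPS with the less-than predicate; the helper name `select_lt` is hypothetical.

```c
#include <xmmintrin.h>  /* SSE intrinsics, including _mm_cmplt_ps */

/* Hypothetical helper (not part of any library).
 * For each element i: out[i] = (a[i] < b[i]) ? x[i] : y[i],
 * computed with a compare mask and logical operations, no branches. */
static void select_lt(const float a[4], const float b[4],
                      const float x[4], const float y[4], float out[4])
{
    /* Each mask element is all ones where a < b, all zeros otherwise. */
    __m128 mask = _mm_cmplt_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    __m128 vx   = _mm_and_ps(mask, _mm_loadu_ps(x));     /* keep x where true  */
    __m128 vy   = _mm_andnot_ps(mask, _mm_loadu_ps(y));  /* keep y where false */
    _mm_storeu_ps(out, _mm_or_ps(vx, vy));
}
```

All four elements are processed by the same instruction sequence regardless of the comparison outcome, so there is no data-dependent branch to mispredict.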
Compare and Write Mask
• CMPPS—Compare Packed Single-Precision Floating-Point