User Guide

128-Bit Media and Scientific Programming 171

24592—Rev. 3.15—November 2009 AMD64 Technology

low-order doubleword of the destination. The three high-order doublewords of the destination XMM register are not modified.

The SQRTSD instruction computes the square root of the low-order double-precision floating-point value in the second operand (an XMM register or 64-bit memory location) and writes the result in the low-order quadword of the destination. The high-order quadword of the destination XMM register is not modified.

Reciprocal Square Root

• RSQRTPS—Reciprocal Square Root Packed Single-Precision Floating-Point

• RSQRTSS—Reciprocal Square Root Scalar Single-Precision Floating-Point

The RSQRTPS instruction computes the approximate reciprocal of the square root of each of four single-precision floating-point values in the second operand (an XMM register or 128-bit memory location) and writes the result in the corresponding doubleword of the destination.

The RSQRTSS instruction computes the approximate reciprocal of the square root of the low-order single-precision floating-point value in the second operand (an XMM register or 32-bit memory location) and writes the result in the low-order doubleword of the destination. The three high-order doublewords in the destination XMM register are not modified.

For both RSQRTPS and RSQRTSS, the maximum relative error is less than or equal to $1.5 \times 2^{-12}$.

Reciprocal Estimation

• RCPPS—Reciprocal Packed Single-Precision Floating-Point

• RCPSS—Reciprocal Scalar Single-Precision Floating-Point

The RCPPS instruction computes the approximate reciprocal of each of the four single-precision floating-point values in the second operand (an XMM register or 128-bit memory location) and writes the result in the corresponding doubleword of the destination.

The RCPSS instruction computes the approximate reciprocal of the low-order single-precision floating-point value in the second operand (an XMM register or 32-bit memory location) and writes the result in the low-order doubleword of the destination. The three high-order doublewords in the destination XMM register are not modified.

For both RCPPS and RCPSS, the maximum relative error is less than or equal to $1.5 \times 2^{-12}$.

4.6.6 Compare

The floating-point vector-compare instructions compare two operands and then either write a mask, write the maximum or minimum value, or set flags. A mask can be combined with logical instructions to select between values without a conditional jump, so compare instructions can be used to avoid branches. Figure 4-10 on page 115 shows an example of using compare instructions.

Compare and Write Mask

• CMPPS—Compare Packed Single-Precision Floating-Point