Intel 64 and IA-32 Architectures Software Developers Manual Volume 2B, Instruction Set Reference, N-Z

RSQRTPS—Compute Reciprocals of Square Roots of Packed Single-Precision Floating-Point Values

Description

Performs a SIMD computation of the approximate reciprocals of the square roots of the four packed single-precision floating-point values in the source operand (second operand) and stores the packed single-precision floating-point results in the destination operand. The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register. See Figure 10-5 in the Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for an illustration of a SIMD single-precision floating-point operation.

The relative error for this approximation is:

|Relative Error| ≤ 1.5 ∗ 2^−12
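As a rough illustration of what this bound means numerically, the sketch below evaluates 1.5 ∗ 2^−12 (about 3.66e-4, or roughly 11 bits of precision) and checks a candidate approximation against it. The names RSQRT_MAX_REL_ERROR, relative_error and within_rsqrt_bound are hypothetical helpers introduced only for this example; they are not part of the instruction set or of any library.

```c
#include <assert.h>
#include <stdbool.h>

/* Bound quoted in the text: |relative error| <= 1.5 * 2^-12 (2^12 = 4096). */
#define RSQRT_MAX_REL_ERROR (1.5 / 4096.0)

/* Hypothetical helper: relative error of approx with respect to a
 * nonzero exact value, computed without any libm dependency. */
double relative_error(double approx, double exact)
{
    assert(exact != 0.0);
    double diff = approx - exact;
    if (diff < 0.0) {
        diff = -diff;
    }
    double magnitude = (exact < 0.0) ? -exact : exact;
    return diff / magnitude;
}

/* Hypothetical helper: true when approx satisfies the documented bound. */
bool within_rsqrt_bound(double approx, double exact)
{
    return relative_error(approx, exact) <= RSQRT_MAX_REL_ERROR;
}
```

For example, with an exact value of 0.5 (the reciprocal square root of 4.0), a result that is off by a factor of (1 + 2^−12) satisfies the bound, while one that is off by (1 + 2^−11) does not.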

The RSQRTPS instruction is not affected by the rounding control bits in the MXCSR register. When a source value is a 0.0, an ∞ of the sign of the source value is returned. A denormal source value is treated as a 0.0 (of the same sign). When a source value is a negative value (other than −0.0), a floating-point indefinite is returned. When a source value is an SNaN or QNaN, the SNaN is converted to a QNaN or the source QNaN is returned.
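The special-case rules above can be sketched as a small software model that works on the raw 32-bit encoding of one source element. This is only an illustration under stated assumptions, not the hardware's implementation: the function names are hypothetical, a zero source is taken to yield an infinity of the same sign (which is how the denormal rule is read here), the floating-point indefinite is assumed to be the QNaN encoding 0xFFC00000, and ordinary positive inputs (including +∞) are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define F32_SIGN_BIT   0x80000000u
#define F32_EXP_MASK   0x7F800000u
#define F32_FRAC_MASK  0x007FFFFFu
#define F32_QUIET_BIT  0x00400000u
#define F32_INDEFINITE 0xFFC00000u  /* assumed encoding of the indefinite QNaN */

/* Hypothetical helper: true when the source element is a NaN, a zero,
 * a denormal, or a negative value, i.e. one of the cases listed above. */
bool rsqrt_is_special(uint32_t src)
{
    bool is_nan = (src & F32_EXP_MASK) == F32_EXP_MASK &&
                  (src & F32_FRAC_MASK) != 0;
    bool is_zero_or_denormal = (src & F32_EXP_MASK) == 0;
    bool is_negative = (src & F32_SIGN_BIT) != 0;
    return is_nan || is_zero_or_denormal || is_negative;
}

/* Hypothetical helper: result bits for a special-case source element. */
uint32_t rsqrt_special_result(uint32_t src)
{
    assert(rsqrt_is_special(src));
    if ((src & F32_EXP_MASK) == F32_EXP_MASK && (src & F32_FRAC_MASK) != 0) {
        /* SNaN becomes a QNaN; a QNaN already has the quiet bit set. */
        return src | F32_QUIET_BIT;
    }
    if ((src & F32_EXP_MASK) == 0) {
        /* Zero, or a denormal treated as zero: infinity of the same sign. */
        return (src & F32_SIGN_BIT) | F32_EXP_MASK;
    }
    /* Remaining special case: a negative value other than -0.0. */
    return F32_INDEFINITE;
}
```

The NaN check comes first so that a negative NaN is propagated as a NaN instead of being replaced by the indefinite value, and the zero/denormal check precedes the sign check so that −0.0 and negative denormals map to −∞.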

In 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).

Operation

DEST[31:0] ← APPROXIMATE(1.0/SQRT(SRC[31:0]));

DEST[63:32] ← APPROXIMATE(1.0/SQRT(SRC[63:32]));

DEST[95:64] ← APPROXIMATE(1.0/SQRT(SRC[95:64]));

DEST[127:96] ← APPROXIMATE(1.0/SQRT(SRC[127:96]));

Intel C/C++ Compiler Intrinsic Equivalent

RSQRTPS __m128 _mm_rsqrt_ps(__m128 a)
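A minimal usage sketch of the intrinsic follows, assuming an x86 compiler that provides <xmmintrin.h> with SSE enabled. The wrapper rsqrt4 and the self-check rsqrt4_demo_ok are hypothetical names introduced for this example; only _mm_loadu_ps, _mm_rsqrt_ps and _mm_storeu_ps are actual SSE intrinsics. The check uses inputs whose exact reciprocal square roots are known, and compares each lane against the documented relative error bound.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* __m128, _mm_loadu_ps, _mm_rsqrt_ps, _mm_storeu_ps */

/* Hypothetical wrapper: out[i] receives an approximation of
 * 1.0 / sqrt(in[i]) for the four lanes i = 0..3. */
void rsqrt4(const float in[4], float out[4])
{
    assert(in != NULL && out != NULL);
    __m128 src = _mm_loadu_ps(in);    /* unaligned load of four floats */
    __m128 dst = _mm_rsqrt_ps(src);   /* expected to compile to RSQRTPS */
    _mm_storeu_ps(out, dst);          /* unaligned store of four floats */
}

/* Hypothetical self-check: returns 1 when every lane of a sample input
 * lies within the relative error bound 1.5 * 2^-12, and 0 otherwise. */
int rsqrt4_demo_ok(void)
{
    const float in[4]       = {1.0f, 4.0f, 16.0f, 64.0f};
    const float expected[4] = {1.0f, 0.5f, 0.25f, 0.125f};
    float out[4];

    rsqrt4(in, out);
    for (int i = 0; i < 4; ++i) {
        float err = (out[i] - expected[i]) / expected[i];
        if (err < 0.0f) {
            err = -err;
        }
        if (err > 1.5f / 4096.0f) {
            return 0;
        }
    }
    return 1;
}
```

The unaligned load and store are used so that the caller's arrays need no 16-byte alignment. Because the result is an approximation, the exact bit patterns returned may differ between processors, so the check compares against the error bound and never against specific values.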

Opcode     Instruction                64-Bit Mode   Compat/Leg Mode   Description
0F 52 /r   RSQRTPS xmm1, xmm2/m128    Valid         Valid             Computes the approximate reciprocals of the square roots of the packed single-precision floating-point values in xmm2/m128 and stores the results in xmm1.