User Guide
292 x87 Floating-Point Programming
AMD64 Technology 24592—Rev. 3.15—November 2009
based branches that depend on the condition codes for branch direction, because FNSTSW AX is often
a serializing instruction.
6.10.3 Use FSINCOS Instead of FSIN and FCOS
Code that computes the sine of an argument frequently needs the cosine of the same argument as well.
In such cases, use the FSINCOS instruction to compute both trigonometric functions with a single
instruction, which is faster than executing separate FSIN and FCOS instructions to accomplish the
same task.
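As an illustration, the following C sketch wraps FSINCOS in a small helper. It assumes a GCC-compatible compiler on an x86 target and uses the x87 register constraints of GCC extended inline assembly; the helper name sincos_x87 is hypothetical and not part of any library. On other targets the sketch falls back to two separate library calls, which is exactly the pattern this guideline advises against on x87 hardware.

```c
#include <math.h>

/* Hypothetical helper: returns sin(x) and cos(x) through s and c.
 * Assumes |x| < 2^63, the operand range that FSINCOS accepts. */
void sincos_x87(double x, double *s, double *c)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    double sin_val, cos_val;
    /* FSINCOS replaces ST(0) with the sine and then pushes the cosine,
     * so afterwards ST(0) = cos(x) and ST(1) = sin(x).
     * "=t" names the top of the x87 stack, "=u" the entry below it. */
    __asm__ ("fsincos" : "=t" (cos_val), "=u" (sin_val) : "0" (x));
    *s = sin_val;
    *c = cos_val;
#else
    /* Fallback for non-x86 targets: two separate library calls. */
    *s = sin(x);
    *c = cos(x);
#endif
}
```

A caller that needs both values, for example to build a rotation matrix, would call sincos_x87(angle, &s, &c) once instead of evaluating the sine and cosine separately.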
6.10.4 Break Up Dependency Chains
Parallelism can be increased by breaking up dependency chains or by evaluating multiple dependency
chains simultaneously (explicitly switching execution between them). Depending on the hardware
implementation of the architecture, the FXCH instruction may prove faster than FST/FLD pairs for
switching execution between dependency chains.
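The idea can be sketched in C with a summation loop. The function names below are hypothetical, and whether a compiler emits x87 code (and uses FXCH to alternate between the accumulators) depends on the compiler and target options, so this only illustrates the principle. In the first function every addition depends on the result of the previous one; in the second, two accumulators form independent chains whose additions can overlap.

```c
#include <stddef.h>

/* One dependency chain: each addition must wait for the previous result. */
double sum_single_chain(const double *v, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

/* Two independent chains: acc0 and acc1 do not depend on each other,
 * so the hardware can overlap their additions. */
double sum_two_chains(const double *v, size_t n)
{
    double acc0 = 0.0, acc1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        acc0 += v[i];       /* chain 0: even-indexed elements */
        acc1 += v[i + 1];   /* chain 1: odd-indexed elements */
    }
    if (i < n)              /* leftover element when n is odd */
        acc0 += v[i];
    return acc0 + acc1;     /* join the two chains once at the end */
}
```

Splitting the sum changes the order in which the additions are performed, so the two functions can differ in the last bits of the result for inputs that are not exactly representable.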