User Guide

94 General-Purpose Programming
AMD64 Technology 24592—Rev. 3.15—November 2009
Some system devices might be sensitive to reads. Normally, applications do not have direct access to
system devices, but instead call an operating-system service routine to perform the access on the
application’s behalf. In this case, it is system software’s responsibility to enforce strong read-ordering.
Write Ordering. Writes affect program order because they affect the state of software-visible
resources. The rules governing write ordering are restrictive:
• Generally, out-of-order writes are not allowed. Write instructions executed out-of-order cannot
  commit (write) their result to memory until all previous instructions have completed in program
  order. The processor can, however, hold the result of an out-of-order write instruction in a private
  buffer (not visible to software) until that result can be committed to memory.
• System software can create non-cacheable write-combining regions in memory when the order of
  writes is known to not affect system devices. When writes are performed to write-combining
  memory, they can appear to complete out of order relative to other writes. See “Memory System”
  in Volume 2 for additional information.
• Speculative writes are not allowed. As with out-of-order writes, speculative write instructions
  cannot commit their result to memory until all previous instructions have completed in program
  order. Processors can hold the result in a private buffer (not visible to software) until the result can
  be committed.
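As an illustration of why in-order write commit matters to software, the following C11 sketch publishes a value through a flag. It assumes ordinary cacheable memory (not a write-combining region), and the names mailbox, publish, and try_consume are hypothetical, not part of any AMD interface. The release and acquire annotations mainly stop the compiler from reordering the accesses; on AMD64, compilers typically emit plain MOV instructions for them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical one-slot mailbox shared between a writer and a reader. */
struct mailbox {
    uint64_t payload;   /* data written first */
    atomic_bool ready;  /* flag written second */
};

/* Writer: store the payload, then set the flag.
 * The release store keeps the payload write ahead of the flag write
 * in program order as seen by the compiler. */
static void publish(struct mailbox *m, uint64_t value)
{
    assert(m != NULL);
    m->payload = value;
    atomic_store_explicit(&m->ready, true, memory_order_release);
}

/* Reader: return false until the flag is set; once it is set,
 * the payload written before the flag is copied to *out. */
static bool try_consume(struct mailbox *m, uint64_t *out)
{
    assert(m != NULL && out != NULL);
    if (!atomic_load_explicit(&m->ready, memory_order_acquire))
        return false;
    *out = m->payload;
    return true;
}
```

A reader that calls try_consume before publish has run gets false; after publish(&m, 42) it gets true and the value 42. If the mailbox lived in write-combining memory, the two writes could appear to complete out of order, and this pattern would need an explicit store fence between them.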
Atomicity of accesses. Single load or store operations (from instructions that do just a single load or
store) are naturally atomic on any AMD64 processor as long as they do not cross an aligned 8-byte
boundary. Accesses up to eight bytes in size that do cross such a boundary can be performed
atomically using certain instructions with a lock prefix, such as XCHG, CMPXCHG, or
CMPXCHG8B, as long as all such accesses are done using the same technique. (Note that misaligned
locked accesses may be subject to heavy performance penalties.) CMPXCHG16B can be used to
perform 16-byte atomic accesses in 64-bit mode (with certain alignment restrictions).
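The boundary rule above can be checked with simple address arithmetic. The following C sketch is illustrative only: within_aligned_8_bytes and try_update are hypothetical helper names, and the remark about the generated instruction reflects common compiler behavior rather than anything stated in this manual.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when an access of `size` bytes starting at `addr`
 * stays inside a single aligned 8-byte block, which is the condition
 * the text gives for a single load or store to be naturally atomic. */
static bool within_aligned_8_bytes(uintptr_t addr, size_t size)
{
    assert(size > 0);
    if (size > 8)
        return false;
    /* Offset within the 8-byte block plus the size must not run past it. */
    return (addr & 7u) + size <= 8;
}

/* Atomic compare-and-exchange on a naturally aligned 64-bit value.
 * On x86-64, compilers typically lower this to LOCK CMPXCHG. */
static bool try_update(_Atomic uint64_t *p, uint64_t expected, uint64_t desired)
{
    assert(p != NULL);
    return atomic_compare_exchange_strong(p, &expected, desired);
}
```

For example, a 4-byte access at address 0x1004 stays within one aligned 8-byte block, while a 4-byte access at 0x1006 crosses into the next block and would need a locked instruction to be atomic.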
3.9.2 Forcing Memory Order
Special instructions are provided for application software to force memory ordering in situations
where such ordering is important. These instructions are:
• Load Fence—The LFENCE instruction forces ordering of memory loads (reads). All memory
  loads preceding the LFENCE (in program order) are completed prior to completing memory loads
  following the LFENCE. Memory loads cannot be reordered around an LFENCE instruction, but
  other non-serializing instructions (such as memory writes) can be reordered around the LFENCE.
• Store Fence—The SFENCE instruction forces ordering of memory stores (writes). All memory
  stores preceding the SFENCE (in program order) are completed prior to completing memory
  stores following the SFENCE. Memory stores cannot be reordered around an SFENCE instruction,
  but other non-serializing instructions (such as memory loads) can be reordered around the
  SFENCE.
• Memory Fence—The MFENCE instruction forces ordering of all memory accesses (reads and
  writes). All memory accesses preceding the MFENCE (in program order) are completed prior to
  completing any memory access following the MFENCE. Memory accesses cannot be reordered