94 General-Purpose Programming

AMD64 Technology 24592—Rev. 3.15—November 2009

Some system devices might be sensitive to reads; that is, simply reading a device register can change the device's state. Normally, applications do not access system devices directly. Instead, they call an operating-system service routine to perform the access on their behalf. In that case, it is system software’s responsibility to enforce strong read ordering.

Write Ordering. Writes affect program order because they change the state of software-visible resources. The rules governing write ordering are therefore restrictive:

• Generally, out-of-order writes are not allowed. Write instructions executed out of order cannot commit (write) their results to memory until all previous instructions have completed in program order. The processor can, however, hold the result of an out-of-order write instruction in a private buffer (not visible to software) until that result can be committed to memory.

  System software can create non-cacheable write-combining regions in memory when the order of writes is known not to affect system devices. When writes are performed to write-combining memory, they can appear to complete out of order relative to other writes. See “Memory System” in Volume 2 for additional information.

• Speculative writes are not allowed. As with out-of-order writes, speculative write instructions cannot commit their results to memory until all previous instructions have completed in program order. The processor can hold the result in a private buffer (not visible to software) until the result can be committed.
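Application code usually meets weakly ordered, write-combining behavior through non-temporal stores (the MOVNTI instruction, exposed in C as the `_mm_stream_si32` intrinsic), which the processor typically handles with write-combining semantics. The following is a minimal sketch, assuming an x86-64 compiler with SSE2 intrinsics: it fills a buffer with non-temporal stores, issues SFENCE (`_mm_sfence`, the store fence described in section 3.9.2) so those stores complete before any later store, and only then publishes a flag. The names `payload`, `ready`, `publish_payload`, and `PAYLOAD_LEN` are hypothetical; on non-x86-64 targets the sketch falls back to ordinary stores and a C11 release fence.

```c
#include <stdatomic.h>

#if defined(__x86_64__)
#include <emmintrin.h> /* _mm_stream_si32 (SSE2) and _mm_sfence (SSE) */
#define HAVE_X86_NT_STORES 1
#else
#define HAVE_X86_NT_STORES 0
#endif

#define PAYLOAD_LEN 16

/* Hypothetical shared data: a buffer and a flag that a consumer would poll. */
static int payload[PAYLOAD_LEN];
static atomic_int ready;

/* Hypothetical producer: fill the buffer with weakly ordered stores,
 * fence, then publish the flag. */
static void publish_payload(int base)
{
    for (int i = 0; i < PAYLOAD_LEN; i++) {
#if HAVE_X86_NT_STORES
        /* MOVNTI: a non-temporal store that may become visible out of
         * order with respect to other writes. */
        _mm_stream_si32(&payload[i], base + i);
#else
        payload[i] = base + i; /* ordinary store on other targets */
#endif
    }
#if HAVE_X86_NT_STORES
    /* SFENCE: every store above completes before any store below. */
    _mm_sfence();
#else
    atomic_thread_fence(memory_order_release);
#endif
    /* Release store: also keeps the compiler from moving it above the fence. */
    atomic_store_explicit(&ready, 1, memory_order_release);
}
```

A consumer thread would poll `ready` with an acquire load and read `payload` only after it observes 1. Without the SFENCE, the flag store could become visible before some of the non-temporal stores to `payload`.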

Atomicity of accesses. Single load or store operations (from instructions that perform just a single load or store) are naturally atomic on any AMD64 processor as long as they do not cross an aligned 8-byte boundary. Accesses of up to eight bytes that do cross such a boundary may be performed atomically using certain instructions with a LOCK prefix, such as XCHG, CMPXCHG, or CMPXCHG8B, as long as all such accesses are done using the same technique. (Note that misaligned locked accesses may be subject to heavy performance penalties.) CMPXCHG16B can be used to perform 16-byte atomic accesses in 64-bit mode (with certain alignment restrictions).
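The boundary rule above reduces to simple arithmetic: an access of `size` bytes (1 to 8) at address `addr` stays inside one aligned 8-byte block exactly when `(addr % 8) + size <= 8`. The sketch below encodes that check and, separately, shows a compare-and-swap retry loop of the kind CMPXCHG supports. It uses C11 `<stdatomic.h>` rather than hand-written LOCK CMPXCHG; on x86-64, GCC and Clang typically compile the compare-exchange to LOCK CMPXCHG. Both helper names, `crosses_8byte_boundary` and `cas_add`, are hypothetical. Note that the C language itself makes no atomicity promise for plain loads and stores; the first helper only restates the hardware rule.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if an access of `size` bytes (1..8) starting at `addr` crosses an
 * aligned 8-byte boundary, i.e. it does not fit inside one aligned quadword. */
static bool crosses_8byte_boundary(uintptr_t addr, size_t size)
{
    return (addr % 8) + size > 8;
}

/* Atomically add `delta` to *counter using a compare-and-swap retry loop
 * and return the new value. */
static uint64_t cas_add(_Atomic uint64_t *counter, uint64_t delta)
{
    uint64_t expected = atomic_load_explicit(counter, memory_order_relaxed);
    /* On failure, `expected` is refreshed with the current value; retry. */
    while (!atomic_compare_exchange_weak(counter, &expected, expected + delta))
        ;
    return expected + delta;
}
```

For example, a 4-byte access at offset 6 within an aligned quadword gives 6 + 4 = 10 > 8, so it crosses the boundary and is not naturally atomic, while a 4-byte access at offset 4 gives exactly 8 and stays within one block.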

3.9.2 Forcing Memory Order

Special instructions are provided for application software to force memory ordering in situations where such ordering is important. These instructions are:

• Load Fence—The LFENCE instruction forces ordering of memory loads (reads). All memory loads preceding the LFENCE (in program order) are completed prior to completing memory loads following the LFENCE. Memory loads cannot be reordered around an LFENCE instruction, but other non-serializing instructions (such as memory writes) can be reordered around the LFENCE.

• Store Fence—The SFENCE instruction forces ordering of memory stores (writes). All memory stores preceding the SFENCE (in program order) are completed prior to completing memory stores following the SFENCE. Memory stores cannot be reordered around an SFENCE instruction, but other non-serializing instructions (such as memory loads) can be reordered around the SFENCE.

• Memory Fence—The MFENCE instruction forces ordering of all memory accesses (reads and writes). All memory accesses preceding the MFENCE (in program order) are completed prior to completing any memory access following the MFENCE. Memory accesses cannot be reordered