Specifications
Intel® 64 and IA-32 Architectures Software Developer’s Manual Documentation Changes
• The page attribute table (PAT) can be used to strengthen memory ordering for a
specific page or group of pages (see Section 11.12, “Page Attribute Table (PAT)”).
The PAT is available only in the Pentium 4, Intel Xeon, and Pentium III processors.
These mechanisms can be used as follows:
Memory mapped devices and other I/O devices on the bus are often sensitive to the
order of writes to their I/O buffers. I/O instructions (the IN and OUT instructions)
can be used to impose strong write ordering on such accesses as follows. Prior to
executing an I/O instruction, the processor waits for all previous instructions in the
program to complete and for all buffered writes to drain to memory. Only instruction
fetch and page table walks can pass I/O instructions. Execution of subsequent
instructions does not begin until the processor determines that the I/O instruction has
been completed.
Synchronization mechanisms in multiple-processor systems may depend upon a strong
memory-ordering model. Here, a program can use a locking instruction such as the
XCHG instruction or the LOCK prefix to ensure that a read-modify-write operation on
memory is carried out atomically. Locking operations typically operate like I/O
operations in that they wait for all previous instructions to complete and for all buffered writes
to drain to memory (see Section 8.1.2, “Bus Locking”).
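The locked read-modify-write described above can be sketched in C as a simple test-and-set lock. This is a minimal illustration, not code from the manual: the names lock_word, spin_try_lock, spin_lock, and spin_unlock are hypothetical, and the mapping of the C11 atomic exchange onto an XCHG instruction is an assumption about typical x86 compilers rather than a guarantee.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word for illustration: 0 = free, 1 = held.
   Static storage, so it starts out zero (free). */
static atomic_int lock_word;

/* Atomically write 1 and read back the previous value in one
   read-modify-write. On x86, compilers typically implement this with
   XCHG on a memory operand (an assumption, not a guarantee).
   Returns true if the lock was free and is now held by the caller. */
bool spin_try_lock(void)
{
    return atomic_exchange(&lock_word, 1) == 0;
}

/* Retry the exchange until the previous value was 0. */
void spin_lock(void)
{
    while (!spin_try_lock()) {
        /* busy-wait; a real lock would usually back off here */
    }
}

/* Release the lock with a release-ordered store, so that writes made
   inside the critical section become visible before the lock reads as free. */
void spin_unlock(void)
{
    atomic_store_explicit(&lock_word, 0, memory_order_release);
}
```

A caller would bracket each access to shared data with spin_lock() and spin_unlock(). A second spin_try_lock() made while the lock is held returns false, because the exchange reads back 1.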
Program synchronization can also be carried out with serializing instructions (see
Section 8.3). These instructions are typically used at critical procedure or task
boundaries to force completion of all previous instructions before a jump to a new
section of code or a context switch occurs. As with the I/O and locking instructions,
the processor waits until all previous instructions have been completed and all
buffered writes have been drained to memory before executing the serializing instruction.
The SFENCE, LFENCE, and MFENCE instructions provide a performance-efficient way of
ensuring load and store memory ordering between routines that produce weakly-
ordered results and routines that consume that data. The functions of these instructions
are as follows:
• SFENCE — Serializes all store (write) operations that occurred prior to the SFENCE
instruction in the program instruction stream, but does not affect load operations.
• LFENCE — Serializes all load (read) operations that occurred prior to the LFENCE
instruction in the program instruction stream, but does not affect store operations.¹
• MFENCE — Serializes all store and load operations that occurred prior to the
MFENCE instruction in the program instruction stream.
Note that the SFENCE, LFENCE, and MFENCE instructions provide a more efficient
method of controlling memory ordering than the CPUID instruction.
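As a hedged sketch of the producer/consumer use described above, the following C fragment writes a buffer with weakly-ordered (non-temporal) stores, issues SFENCE, and only then publishes a flag. The consumer checks the flag and issues LFENCE before reading the data. The names payload, ready, produce, and consume are hypothetical. The compiler intrinsics _mm_stream_si32, _mm_sfence, and _mm_lfence are used where SSE2 is available, and a portable C11 fallback is assumed elsewhere. MFENCE (intrinsic _mm_mfence) would be used in the same way where both loads and stores must be ordered.

```c
#include <stdatomic.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_stream_si32, _mm_sfence, _mm_lfence */
static void stream_store(int *dst, int v) { _mm_stream_si32(dst, v); }
static void store_fence(void) { _mm_sfence(); }
static void load_fence(void)  { _mm_lfence(); }
#else
/* Assumed fallback for targets without SSE2: ordinary stores and C11 fences. */
static void stream_store(int *dst, int v) { *dst = v; }
static void store_fence(void) { atomic_thread_fence(memory_order_release); }
static void load_fence(void)  { atomic_thread_fence(memory_order_acquire); }
#endif

#define PAYLOAD_WORDS 64

static int payload[PAYLOAD_WORDS];  /* data written with weakly-ordered stores */
static atomic_int ready;            /* 0 = not published, 1 = published */

/* Producer: fill the payload, fence, then publish the flag. */
void produce(int seed)
{
    for (int i = 0; i < PAYLOAD_WORDS; i++)
        stream_store(&payload[i], seed + i);
    store_fence();  /* order the payload stores ahead of the flag store */
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Consumer: returns the sum of the payload, or -1 if nothing is published. */
long consume(void)
{
    if (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        return -1;
    load_fence();   /* shown for symmetry; whether it is required depends on
                       the memory types and load instructions involved */
    long sum = 0;
    for (int i = 0; i < PAYLOAD_WORDS; i++)
        sum += payload[i];
    return sum;
}
```

The store fence matters here because non-temporal stores are weakly ordered: without it, the flag store could become visible to another processor before the payload does.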
1. Specifically, LFENCE does not execute until all prior instructions have completed locally, and no later
instruction begins execution until LFENCE completes. As a result, an instruction that loads from memory
and that precedes an LFENCE receives data from memory prior to completion of the LFENCE. An
LFENCE that follows an instruction that stores to memory might complete before the data being
stored have become globally visible. Instructions following an LFENCE may be fetched from memory
before the LFENCE, but they will not execute until the LFENCE completes.
The MTRRs were introduced in the P6 family processors to define the cache characteristics
for specified areas of physical memory. The following are two examples of how
memory types set up with MTRRs can be used to strengthen or weaken memory ordering
for the Pentium 4, Intel Xeon, and P6 family processors:
• The strong uncached (UC) memory type forces a strong-ordering model on memory
accesses. Here, all reads and writes to the UC memory region appear on the bus and
out-of-order or speculative accesses are not performed. This memory type can be