Specifications

Intel® 64 and IA-32 Architectures Software Developer’s Manual Documentation Changes
Unaligned 16-, 32-, and 64-bit accesses to cached memory that fit within a cache line
Accesses to cacheable memory that are split across bus widths, cache lines, and page
boundaries are not guaranteed to be atomic by the Intel Core 2 Duo, Intel Atom, Intel
Core Duo, Pentium M, Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486
processors. The Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium M, Pentium 4, Intel
Xeon, and P6 family processors provide bus control signals that permit external memory
subsystems to make split accesses atomic; however, nonaligned data accesses will
seriously impact the performance of the processor and should be avoided.
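As a rough illustration of the alignment conditions described above, the following C sketch checks whether an access stays within one cache line and whether it is naturally aligned. The 64-byte line size and the function names are assumptions made for this example; the text above does not state a line size, and the real value should be queried on the target processor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size in bytes. This value is NOT taken from the
   text above; query the target processor for the real size. */
#define ASSUMED_CACHE_LINE_BYTES 64u

/* Returns true when the access [addr, addr + size) lies entirely within
   one cache line of the assumed size. */
bool fits_in_cache_line(uintptr_t addr, size_t size)
{
    if (size == 0 || size > ASSUMED_CACHE_LINE_BYTES)
        return false;
    uintptr_t first_line = addr / ASSUMED_CACHE_LINE_BYTES;
    uintptr_t last_line  = (addr + size - 1) / ASSUMED_CACHE_LINE_BYTES;
    return first_line == last_line;
}

/* Returns true when addr is a multiple of size.
   Precondition: size is a nonzero power of two (2, 4, 8, ...). */
bool is_naturally_aligned(uintptr_t addr, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) == 0;
}
```

Under these assumptions, a 4-byte access at offset 60 stays inside one line while the same access at offset 62 is split across two lines. Keeping data naturally aligned avoids both the split case and the performance penalty noted above.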
An x87 instruction or an SSE instruction that accesses data larger than a quadword may
be implemented using multiple memory accesses. If such an instruction stores to
memory, some of the accesses may complete (writing to memory) while another causes
the operation to fault for architectural reasons (e.g., due to a page-table entry that is
marked “not present”). In this case, the effects of the completed accesses may be visible
to software even though the overall instruction caused a fault. If TLB invalidation has
been delayed (see Section 4.10.3.4), such page faults may occur even if all accesses are
to the same page.
...
8.1.2.1 Automatic Locking
The operations on which the processor automatically follows the LOCK semantics are as
follows:
• When executing an XCHG instruction that references memory.
• When setting the B (busy) flag of a TSS descriptor. The processor tests and
sets the busy flag in the type field of the TSS descriptor when switching to a task. To
ensure that two processors do not switch to the same task simultaneously, the
processor follows the LOCK semantics while testing and setting this flag.
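The automatic locking of XCHG is what makes a simple exchange-based spin lock work without an explicit LOCK prefix. The following C11 sketch shows the idea. The `xchg_lock` type and its functions are hypothetical names introduced for illustration. Compilers targeting Intel 64 and IA-32 commonly lower `atomic_exchange_explicit` to an XCHG with a memory operand, but that is compiler behavior and is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration: 0 = free, 1 = held. */
typedef struct {
    atomic_int state;
} xchg_lock;

void xchg_lock_init(xchg_lock *l)
{
    atomic_init(&l->state, 0);
}

/* One acquisition attempt: atomically swap 1 into the lock word and
   report success if the previous value was 0 (the lock was free). */
bool xchg_lock_try_acquire(xchg_lock *l)
{
    return atomic_exchange_explicit(&l->state, 1, memory_order_acquire) == 0;
}

/* Spin until the exchange observes a free lock. */
void xchg_lock_acquire(xchg_lock *l)
{
    while (!xchg_lock_try_acquire(l)) {
        /* busy-wait until the current holder releases the lock */
    }
}

/* Release by storing 0; the caller must currently hold the lock. */
void xchg_lock_release(xchg_lock *l)
{
    assert(atomic_load_explicit(&l->state, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->state, 0, memory_order_release);
}
```

Because the swap and the test of the old value happen as one atomic operation, two processors cannot both observe the lock as free, which mirrors the reason given above for locking the TSS busy flag.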
...
8.1.2.2 Software Controlled Bus Locking
To explicitly force the LOCK semantics, software can use the LOCK prefix with the
following instructions when they are used to modify a memory location. An
invalid-opcode exception (#UD) is generated when the LOCK prefix is used with any other
instruction or when no write operation is made to memory (that is, when the destination
operand is in a register).
• The bit test and modify instructions (BTS, BTR, and BTC).
• The exchange instructions (XADD, CMPXCHG, and CMPXCHG8B).
• The LOCK prefix is automatically assumed for the XCHG instruction.
• The following single-operand arithmetic and logical instructions: INC, DEC, NOT, and
NEG.
• The following two-operand arithmetic and logical instructions: ADD, ADC, SUB, SBB,
AND, OR, and XOR.
A locked instruction is guaranteed to lock only the area of memory defined by the
destination operand, but may be interpreted by the system as a lock for a larger memory
area.
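In high-level code, the LOCK-prefixed instructions listed above are usually reached through atomic operations instead of being written by hand. The following C11 sketch uses hypothetical helper names. The comments on which instruction a compiler might emit describe common code generation for Intel 64 and IA-32 targets and are assumptions, not guarantees.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Atomically adds 1 and returns the value held before the addition.
   Often compiled to LOCK XADD (assumption about the compiler). */
uint32_t counter_increment(_Atomic uint32_t *counter)
{
    return atomic_fetch_add(counter, 1u);
}

/* Stores `desired` only if the target currently equals `expected`.
   Returns true if the store happened. Often compiled to LOCK CMPXCHG. */
bool replace_if_equal(_Atomic uint32_t *target, uint32_t expected,
                      uint32_t desired)
{
    return atomic_compare_exchange_strong(target, &expected, desired);
}

/* Atomically sets bit `bit` and returns whether it was already set.
   A compiler may emit LOCK BTS or LOCK OR, or a LOCK CMPXCHG loop. */
bool set_bit(_Atomic uint32_t *word, unsigned bit)
{
    assert(bit < 32);
    uint32_t mask = UINT32_C(1) << bit;
    return (atomic_fetch_or(word, mask) & mask) != 0;
}
```

In each helper the destination is a memory location, matching the requirement above that the LOCK prefix applies only when the instruction writes to memory.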