Specifications

Intel® 64 and IA-32 Architectures Software Developer’s Manual Documentation Changes
Software should access semaphores (shared memory used for signalling between
multiple processors) using identical addresses and operand lengths. For example, if one
processor accesses a semaphore using a word access, other processors should not
access the semaphore using a byte access.
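
The rule above can be illustrated with a minimal C11 sketch in which every access to the semaphore goes through one 32-bit atomic object, so all processors use the same address and the same operand length. The type `sem32_t` and the functions `sem32_init`, `sem32_try_acquire`, and `sem32_release` are hypothetical names introduced for this example; they are not part of any Intel or operating-system interface.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical semaphore: the counter is only ever touched as a
   32-bit atomic object, never through a byte or word alias. */
typedef struct {
    _Atomic uint32_t count;
} sem32_t;

void sem32_init(sem32_t *s, uint32_t initial) {
    atomic_init(&s->count, initial);
}

/* Try to take one unit. Returns 1 on success, 0 if the count is zero. */
int sem32_try_acquire(sem32_t *s) {
    uint32_t cur = atomic_load(&s->count);
    while (cur != 0) {
        /* On failure, cur is reloaded with the current value and the
           loop retries; the access width stays 32 bits throughout. */
        if (atomic_compare_exchange_weak(&s->count, &cur, cur - 1))
            return 1;
    }
    return 0;
}

/* Return one unit to the semaphore with a 32-bit atomic add. */
void sem32_release(sem32_t *s) {
    atomic_fetch_add(&s->count, 1);
}
```

Wrapping the counter in a dedicated type is one way to discourage code from reaching into it with a narrower or wider access; other designs are possible.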
NOTE
Do not implement semaphores using the WC memory type. Do not
perform non-temporal stores to a cache line containing a location used to
implement a semaphore.
The integrity of a bus lock is not affected by the alignment of the memory field. The
LOCK semantics are followed for as many bus cycles as necessary to update the entire
operand. However, it is recommended that locked accesses be aligned on their natural
boundaries for better system performance:
Any boundary for an 8-bit access (locked or otherwise).
16-bit boundary for locked word accesses.
32-bit boundary for locked doubleword accesses.
64-bit boundary for locked quadword accesses.
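
As a sketch of how software might follow the alignment recommendations above, the C11 fragment below pins each field of a shared structure to its natural boundary and provides a helper to check an address at run time. The structure `shared_counters` and the helper `is_naturally_aligned` are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if p lies on a natural boundary for an operand of `size`
   bytes. `size` is assumed to be a power of two (1, 2, 4, or 8). */
int is_naturally_aligned(const void *p, size_t size) {
    return ((uintptr_t)p & (size - 1)) == 0;
}

/* Hypothetical shared structure. alignas requests the natural boundary
   for each field that may be the target of a locked access. */
struct shared_counters {
    uint8_t flag;                    /* any boundary */
    alignas(2) uint16_t word_ctr;    /* 16-bit boundary */
    alignas(4) uint32_t dword_ctr;   /* 32-bit boundary */
    alignas(8) uint64_t qword_ctr;   /* 64-bit boundary */
};
```

The explicit `alignas(8)` matters mainly on 32-bit targets, where some ABIs would otherwise place a 64-bit member on a 4-byte boundary.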
Locked operations are atomic with respect to all other memory operations and all
externally visible events. Only instruction fetch and page table accesses can pass locked
instructions. Locked instructions can be used to synchronize data written by one
processor and read by another processor.
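
The following C11 sketch shows one way a locked operation might be used to hand data from a writer to a reader. It assumes a compiler that implements `atomic_exchange` on x86 with a locked exchange instruction, which is typical but not guaranteed by the C standard. The `mailbox_t` type and its functions are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical mailbox shared between two processors. */
typedef struct {
    int payload;        /* ordinary data, written before publishing */
    atomic_int ready;   /* 0 = empty, 1 = payload is valid */
} mailbox_t;

void mailbox_init(mailbox_t *m) {
    m->payload = 0;
    atomic_init(&m->ready, 0);
}

/* Writer: store the data, then publish it with an atomic exchange.
   The exchange is sequentially consistent, so the payload store
   cannot be reordered after it. */
void mailbox_publish(mailbox_t *m, int value) {
    m->payload = value;
    atomic_exchange(&m->ready, 1);
}

/* Reader: atomically test and clear the flag. Returns 1 and copies
   the payload to *out if a value was available, otherwise returns 0. */
int mailbox_try_consume(mailbox_t *m, int *out) {
    if (atomic_exchange(&m->ready, 0) == 1) {
        *out = m->payload;
        return 1;
    }
    return 0;
}
```

This sketch handles a single writer and a single reader; it does not stop the writer from overwriting a payload that has not yet been consumed.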
For the P6 family processors, locked operations serialize all outstanding load and store
operations (that is, wait for them to complete). This rule is also true for the Pentium 4
and Intel Xeon processors, with one exception: load operations that reference weakly
ordered memory types (such as the WC memory type) may not be serialized.
Locked instructions should not be used to ensure that data written can be fetched as
instructions.
...
8.1.3 Handling Self- and Cross-Modifying Code
The act of a processor writing data into a currently executing code segment with the
intent of executing that data as code is called self-modifying code. IA-32 processors
exhibit model-specific behavior when executing self-modified code, depending upon
how far ahead of the current execution pointer the code has been modified.
As processor microarchitectures become more complex and start to speculatively
execute code ahead of the retirement point (as in P6 and more recent processor
families), the rules regarding which code should execute, pre- or post-modification, become
blurred. To write self-modifying code and ensure that it is compliant with current and
future versions of the IA-32 architectures, use one of the following coding options:
(* OPTION 1 *)
Store modified code (as data) into code segment;
Jump to new code or an intermediate location;
Execute new code;
(* OPTION 2 *)
Store modified code (as data) into code segment;