User's Manual
MULTIPLE-PROCESSOR MANAGEMENT
8.2.4  Out-of-Order Stores For String Operations 
The Intel Core 2 Duo, Intel Core, Pentium 4, and P6 family processors modify the 
processor's operation during string store operations (initiated with the MOVS and 
STOS instructions) to maximize performance. Once the initial conditions for “fast 
string” operation are met (as described below), the processor, as seen from an 
external perspective, operates on the string one cache line at a time. For the length 
of the string, the processor loops, issuing a cache-line read for the source address 
and an invalidation on the external bus for the destination address, knowing that all 
bytes in the destination cache line will be modified. In this mode, the processor 
accepts interrupts only on cache-line boundaries. It is also possible in this mode 
that the destination line invalidations, and therefore the stores, are issued on the 
external bus out of order. 
Code that depends on sequential store ordering should not use string operations to 
store the entire data structure. Data and semaphores should be kept separate. 
Order-dependent code should write a discrete semaphore with its own store, issued 
after any string operations, so that all processors see correctly ordered data.
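The separation of data and semaphore described above can be sketched in C as follows. This is a minimal illustration, not a definitive implementation: the names mailbox, mailbox_publish, and mailbox_try_consume are hypothetical, and the sketch assumes that the compiler may implement memcpy with a string instruction and that the C11 release store is emitted as an ordinary store issued after the copy.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define PAYLOAD_BYTES 256

/* Hypothetical shared record. The payload and the "ready" flag are separate
 * objects, so the flag is never written as part of the bulk copy. */
struct mailbox {
    unsigned char payload[PAYLOAD_BYTES];
    atomic_int    ready;              /* the discrete semaphore */
};

/* Producer: the bulk copy may be carried out by a string operation, whose
 * stores may become visible out of order among themselves. The flag is
 * written afterwards by one separate store. */
void mailbox_publish(struct mailbox *mb, const void *src, size_t len)
{
    if (len > PAYLOAD_BYTES)
        len = PAYLOAD_BYTES;
    memcpy(mb->payload, src, len);
    atomic_store_explicit(&mb->ready, 1, memory_order_release);
}

/* Consumer: reads the payload only after observing the flag.
 * Returns 1 if data was copied to dst, 0 if nothing was published yet. */
int mailbox_try_consume(struct mailbox *mb, void *dst, size_t len)
{
    if (!atomic_load_explicit(&mb->ready, memory_order_acquire))
        return 0;
    if (len > PAYLOAD_BYTES)
        len = PAYLOAD_BYTES;
    memcpy(dst, mb->payload, len);
    return 1;
}
```

The point of the design is that a reader never infers completion from the payload itself, for example from its last byte, because that byte is written by the string operation and may become visible before earlier bytes.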
“Fast string” operation can be disabled by clearing the fast-string-enable bit (bit 0) of 
the IA32_MISC_ENABLE MSR.
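As a small sketch of the bit manipulation involved, the helpers below test and clear bit 0 in a 64-bit value that has already been read from the MSR. The function names are hypothetical, and the MSR address 1A0H is stated here as an assumption not given in the text above. Reading and writing the register itself requires the RDMSR and WRMSR instructions at privilege level 0, which this sketch deliberately leaves out.

```c
#include <stdint.h>

/* Assumption: MSR address, used only for documentation in this sketch. */
#define MSR_IA32_MISC_ENABLE    0x1A0u

/* Fast-string-enable is bit 0 of the MSR value. */
#define FAST_STRING_ENABLE_BIT  (UINT64_C(1) << 0)

/* Returns 1 if the fast-string-enable bit is set in the given MSR value. */
int fast_string_enabled(uint64_t misc_enable)
{
    return (misc_enable & FAST_STRING_ENABLE_BIT) != 0;
}

/* Returns the MSR value with bit 0 cleared and all other bits preserved.
 * The caller would write this value back with WRMSR (not shown). */
uint64_t fast_string_disable(uint64_t misc_enable)
{
    return misc_enable & ~FAST_STRING_ENABLE_BIT;
}
```

A read-modify-write sequence is used so that the other bits of the register keep their current values.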
Initial conditions for “fast string” operations are implementation specific. Example 
conditions include:
• EDI and ESI must be 8-byte aligned for the Pentium III processor. EDI must be 8-
byte aligned for the Pentium 4 processor.
• String operation must be performed in ascending address order.
• The initial operation counter (ECX) must be equal to or greater than 64.
• Source and destination must not overlap by less than a cache line (64 bytes for 
Intel Core 2 Duo, Intel Core, Pentium M, and Pentium 4 processors; 32 bytes for 
P6 family and Pentium processors).
• The memory type for both source and destination addresses must be either WB 
or WC.
NOTE
Initial conditions for “fast string” operation in future Intel 64 or IA-32 processor 
families may differ from the above.
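The example conditions listed above can be collected into a single predicate, sketched below in C. This is illustrative only, since the real conditions are implementation specific. The struct, enum, and function names are hypothetical. The sketch applies the stricter Pentium III alignment rule (both pointers 8-byte aligned), and it assumes one possible reading of the overlap condition: the source and destination start addresses must be at least one cache line apart.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Memory types; only WB and WC qualify in the example conditions. */
enum mem_type { MEM_UC, MEM_WC, MEM_WT, MEM_WP, MEM_WB };

/* Hypothetical description of one string operation. */
struct string_op {
    uintptr_t     src;         /* initial ESI */
    uintptr_t     dst;         /* initial EDI */
    size_t        count;       /* initial ECX */
    bool          descending;  /* true if addresses decrease (DF = 1) */
    enum mem_type src_type;
    enum mem_type dst_type;
};

static bool is_wb_or_wc(enum mem_type t)
{
    return t == MEM_WB || t == MEM_WC;
}

/* Checks the example conditions; cache_line is 64 or 32 bytes depending
 * on the processor family. */
bool example_fast_string_conditions_met(const struct string_op *op,
                                        size_t cache_line)
{
    /* Pentium III rule: both addresses 8-byte aligned.
     * (The Pentium 4 rule only constrains the destination.) */
    if (op->src % 8 != 0 || op->dst % 8 != 0)
        return false;

    /* Ascending address order only. */
    if (op->descending)
        return false;

    /* Initial count of at least 64. */
    if (op->count < 64)
        return false;

    /* Assumed reading of the overlap rule: start addresses must be at
     * least one cache line apart. */
    uintptr_t distance = op->src > op->dst ? op->src - op->dst
                                           : op->dst - op->src;
    if (distance < cache_line)
        return false;

    /* Both source and destination must be WB or WC memory. */
    return is_wb_or_wc(op->src_type) && is_wb_or_wc(op->dst_type);
}
```

Software should not rely on such a check to predict processor behavior; it only restates the listed examples in executable form.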
8.2.4.1   Memory-Ordering Model for String Operations on Write-back (WB) 
Memory
This section deals with the memory-ordering model for string operations on write-
back (WB) memory for the Intel 64 architecture. 
The memory-ordering model respects the following principles:
1. Stores within a single string operation may be executed out of order.










