User Guide

General-Purpose Programming 99
24592—Rev. 3.15—November 2009 AMD64 Technology
Cache-control instructions (“Cache-Control Instructions” on page 99) are available to applications
to minimize cache pollution caused by non-temporal data.
Spatial locality refers to data that resides at addresses adjacent to or very close to the data being
referenced. Typically, when data is accessed, data at nearby addresses is likely to be accessed
within a short period of time. Caches perform cache-line fills to take advantage of spatial
locality. During a cache-line fill, the referenced data and its nearest neighbors are loaded into
the cache. If the data used by an application does not exhibit spatial locality, then the
cache becomes polluted with a large amount of unreferenced data.
Applications can avoid problems with this type of cache pollution by using data structures with
good spatial-locality characteristics.
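
As a rough illustration of a data structure with good spatial-locality characteristics, the sketch below compares two layouts for a scan that reads only one field. The record type, the field names, and the 64-byte line size are assumptions made for this example and are not taken from this manual.

```c
#include <stddef.h>

/* Hypothetical record: the hot loop reads only `key`. With an assumed
 * 64-byte cache line, each line fill brings in 4 useful bytes and
 * 60 bytes of payload that the loop never references. */
struct record_aos {
    int  key;
    char payload[60];
};

/* Array-of-structures scan: poor spatial locality for this access pattern. */
long sum_keys_aos(const struct record_aos *recs, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += recs[i].key;
    return sum;
}

/* Structure-of-arrays scan: the keys are stored contiguously, so every
 * byte of each filled cache line is data the loop actually uses. */
long sum_keys_soa(const int *keys, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += keys[i];
    return sum;
}
```

Both functions return the same result; the difference is only in how much unreferenced data each scan pulls into the cache. Whether the second layout is faster in practice depends on the workload and should be measured.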
Another form of cache pollution is stale data. Data that adheres to the principle of locality can become
stale when the program no longer uses it, or will not use it again for a long time. Applications can
use the CLFLUSH instruction to remove stale data from the cache.
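
A minimal sketch of flushing a buffer that is no longer needed, one cache line at a time, is shown here. It assumes a compiler that provides the `_mm_clflush` and `_mm_mfence` intrinsics in `<emmintrin.h>` and a 64-byte line size (software should normally query the actual CLFLUSH line size through CPUID). The function name `flush_stale_buffer` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h>   /* _mm_clflush, _mm_mfence */
#endif

#define CACHE_LINE_BYTES 64   /* assumption: query CPUID for the real value */

/* Flushes every cache line that overlaps [buf, buf + len) and returns the
 * number of lines visited. On non-x86-64 targets it only counts lines. */
size_t flush_stale_buffer(const void *buf, size_t len)
{
    size_t lines = 0;
    if (len == 0)
        return 0;

    /* Round the start address down to a cache-line boundary. */
    uintptr_t p   = (uintptr_t)buf & ~(uintptr_t)(CACHE_LINE_BYTES - 1);
    uintptr_t end = (uintptr_t)buf + len;

    for (; p < end; p += CACHE_LINE_BYTES) {
#if defined(__x86_64__) || defined(_M_X64)
        _mm_clflush((const void *)p);   /* emits CLFLUSH for this line */
#endif
        lines++;
    }
#if defined(__x86_64__) || defined(_M_X64)
    _mm_mfence();   /* order the flushes against later memory accesses */
#endif
    return lines;
}
```

CLFLUSH writes modified data back to memory before invalidating the line, so the buffer contents are preserved. Flushing data that is used again soon costs a later cache miss, so this is only worthwhile for data that is genuinely stale.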
3.9.6 Cache-Control Instructions
General control and management of the caches is performed by system software and not application
software. System software uses special registers to assign memory types to physical-address ranges,
and page-attribute tables are used to assign memory types to virtual address ranges. Memory types
define the cacheability characteristics of memory regions and how coherency is maintained with main
memory. See “Memory System” in Volume 2 for additional information on memory typing.
Instructions are available that allow application software to control the cacheability of data it uses on a
more limited basis. These instructions can be used to boost an application’s performance by
prefetching data into the cache, and by avoiding cache pollution. Run-time analysis tools and
compilers may be able to suggest the use of cache-control instructions for critical sections of
application code.
Cache Prefetching. Applications can prefetch entire cache lines into the caching hierarchy using one
of the prefetch instructions. The prefetch should be performed in advance, so that the data is available
in the cache when needed. Although load instructions can mimic the prefetch function, they do not
offer the same performance advantage, because a load instruction may cause a subsequent instruction
to stall until the load completes, but a prefetch instruction will never cause such a stall. Load
instructions also unnecessarily require the use of a register, but prefetch instructions do not.
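
The idea of prefetching in advance can be sketched in C as follows. This example assumes a GCC- or Clang-compatible compiler, whose `__builtin_prefetch(addr, rw, locality)` builtin is expected to compile to one of the prefetch instructions on AMD64 targets. The function name and the prefetch distance are hypothetical; a suitable distance depends on memory latency and on the work done per element, and is best found by measurement.

```c
#include <stddef.h>

/* Hypothetical tuning value: how many elements ahead of the current
 * element the prefetch is issued. */
#define PREFETCH_DISTANCE 16

/* Sums an array while requesting data a fixed distance ahead. The
 * prefetch is only a hint: it needs no destination register and does
 * not make later instructions wait for the data to arrive. */
long sum_with_prefetch(const long *data, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Stay within the array so the address computation is valid C. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        sum += data[i];
    }
    return sum;
}
```

The second argument (0) marks the access as a read and the third (3) requests the highest degree of temporal locality. For a simple sequential scan like this one, hardware prefetchers may already hide most of the latency, so the explicit prefetch tends to matter more for irregular access patterns.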
The instructions available in the AMD64 architecture for cache-line prefetching include one SSE
instruction and two 3DNow! instructions:
PREFETCHlevel—(an SSE instruction) Prefetches read/write data into a specific level of the
cache hierarchy. If the requested data is already in the desired cache level or closer to the processor
(lower cache-hierarchy level), the data is not prefetched. If the operand specifies an invalid
memory address, no exception occurs, and the instruction has no effect. Attempts to prefetch data
from non-cacheable memory, such as video frame buffers, or data from write-combining memory,
are also ignored. The exact actions performed by the PREFETCHlevel instructions depend on the
processor implementation. Current AMD processor families map all PREFETCHlevel instructions