User Guide

98 General-Purpose Programming
AMD64 Technology 24592—Rev. 3.15—November 2009
access a memory address that is cached, the processor maintains coherency by providing the correct
data back to the device and main memory.
When a memory-read occurs as a result of an instruction fetch or operand access, the processor first
checks the cache to see if the requested information is available. A read hit occurs if the information is
available in the cache, and a read miss occurs if the information is not available. Likewise, a write hit
occurs if a memory write can be stored in the cache, and a write miss occurs if it cannot be stored in the
cache.
A read miss or write miss can result in the allocation of a cache line, followed by a cache-line fill. Even
if only a single byte is needed, all bytes in a cache line are loaded from memory by a cache-line fill.
Typically, a cache-line fill must write over an existing cache line in a process called a cache-line
replacement. In this case, if the existing cache line is modified, the processor performs a cache-line
writeback to main memory prior to performing the cache-line fill.
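The hit, miss, fill, replacement, and writeback sequence described above can be sketched as a small software model. This is an illustration only, not a description of any AMD64 implementation: the direct-mapped organization, the 64-byte line size, the allocate-on-every-miss policy, and all names (toy_cache_t, toy_lookup, toy_read, toy_write) are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 64u    /* assumed line size, for illustration only */
#define NUM_LINES  4u     /* tiny direct-mapped cache (hypothetical)  */
#define MEM_BYTES  1024u  /* size of the modeled main memory          */

typedef struct {
    int      valid;
    int      modified;            /* line holds data newer than memory */
    uint32_t tag;                 /* line number = address / LINE_BYTES */
    uint8_t  data[LINE_BYTES];
} cache_line_t;

typedef struct {
    cache_line_t lines[NUM_LINES];
    uint8_t      memory[MEM_BYTES];
    unsigned     hits, misses, fills, writebacks;
} toy_cache_t;

void toy_cache_init(toy_cache_t *c)
{
    memset(c, 0, sizeof *c);
}

/* Return the cache line that holds addr. On a miss, allocate the line,
 * write back the line being replaced if it is modified, then fill all
 * LINE_BYTES bytes from memory. */
cache_line_t *toy_lookup(toy_cache_t *c, uint32_t addr)
{
    assert(addr < MEM_BYTES);
    uint32_t line_no = addr / LINE_BYTES;
    cache_line_t *l = &c->lines[line_no % NUM_LINES];

    if (l->valid && l->tag == line_no) {
        c->hits++;
        return l;
    }
    c->misses++;
    if (l->valid && l->modified) {
        /* Cache-line replacement of a modified line: writeback first. */
        memcpy(&c->memory[l->tag * LINE_BYTES], l->data, LINE_BYTES);
        c->writebacks++;
    }
    /* Cache-line fill: the whole line is loaded, even for one byte. */
    memcpy(l->data, &c->memory[line_no * LINE_BYTES], LINE_BYTES);
    l->valid = 1;
    l->modified = 0;
    l->tag = line_no;
    c->fills++;
    return l;
}

uint8_t toy_read(toy_cache_t *c, uint32_t addr)
{
    return toy_lookup(c, addr)->data[addr % LINE_BYTES];
}

void toy_write(toy_cache_t *c, uint32_t addr, uint8_t value)
{
    cache_line_t *l = toy_lookup(c, addr);
    l->data[addr % LINE_BYTES] = value;
    l->modified = 1;  /* memory is now stale until a writeback occurs */
}
```

In this model, addresses 0 and 256 map to the same cache line, so writing address 0 and then reading address 256 forces a replacement, and the modified line is written back to the modeled memory before the fill.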
Cache-line writebacks help maintain coherency between the caches and main memory. The
processor can also maintain cache coherency by internally probing (checking) the other caches and
write buffers for a more recent version of the requested data. External devices can also check a
processor’s caches and write buffers for more recent versions of data by externally probing the
processor. All coherency operations are performed in hardware and are completely transparent to
applications.
Cache Coherency and MOESI. Implementations of the AMD64 architecture maintain coherency
between memory and caches using a five-state protocol known as MOESI. The five MOESI states are
modified, owned, exclusive, shared, and invalid. See “Memory System” in Volume 2 for additional
information on MOESI and cache coherency.
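As a minimal sketch, the five states can be written as an enumeration with two helper predicates. The names (moesi_state_t, moesi_holds_data, moesi_needs_writeback) are hypothetical, and the writeback predicate assumes the conventional description of MOESI, in which a modified or owned line holds data more recent than main memory. Volume 2 remains the authoritative definition of each state.

```c
#include <assert.h>
#include <stdbool.h>

/* The five MOESI states named in the text. */
typedef enum {
    MOESI_MODIFIED,
    MOESI_OWNED,
    MOESI_EXCLUSIVE,
    MOESI_SHARED,
    MOESI_INVALID
} moesi_state_t;

/* Printable name of a state. */
const char *moesi_name(moesi_state_t s)
{
    static const char *const names[] = {
        "modified", "owned", "exclusive", "shared", "invalid"
    };
    assert((unsigned)s <= MOESI_INVALID);
    return names[s];
}

/* True if a line in this state holds usable data. */
bool moesi_holds_data(moesi_state_t s)
{
    return s != MOESI_INVALID;
}

/* True if evicting a line in this state requires a writeback.
 * Assumption: modified and owned lines are newer than main memory. */
bool moesi_needs_writeback(moesi_state_t s)
{
    return s == MOESI_MODIFIED || s == MOESI_OWNED;
}
```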
Self-Modifying Code. Software that writes into a code segment is classified as self-modifying code.
To avoid cache-coherency problems due to self-modifying code, implementations of the AMD64
architecture invalidate an instruction cache line during a memory write if the instruction cache line
corresponds to a code-segment memory location. Invalidating the instruction cache line forces the
processor to write the modified instruction into main memory, so a subsequent fetch of the
modified instruction goes to main memory to get the coherent version of the instruction.
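The invalidate-on-write behavior can be modeled in software as follows. This is a sketch under stated assumptions, not the hardware mechanism: the single-line instruction cache, the 16-byte line size, and the names toy_icache_t, icache_fetch, and store_byte are all hypothetical, and the byte values used with it are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ILINE_BYTES 16u  /* assumed instruction cache line size */
#define CODE_BYTES  64u  /* size of the modeled code region      */

typedef struct {
    uint8_t  memory[CODE_BYTES];  /* main memory holding the code     */
    bool     valid;               /* state of the one modeled line    */
    uint32_t tag;                 /* line number currently cached     */
    uint8_t  line[ILINE_BYTES];   /* cached copy of that line         */
    unsigned memory_fetches;      /* fetches that had to go to memory */
} toy_icache_t;

void icache_init(toy_icache_t *ic, const uint8_t *code, size_t n)
{
    assert(n <= CODE_BYTES);
    memset(ic, 0, sizeof *ic);
    memcpy(ic->memory, code, n);
}

/* Fetch one instruction byte, filling the line from memory on a miss. */
uint8_t icache_fetch(toy_icache_t *ic, uint32_t addr)
{
    assert(addr < CODE_BYTES);
    uint32_t tag = addr / ILINE_BYTES;
    if (!ic->valid || ic->tag != tag) {
        memcpy(ic->line, &ic->memory[tag * ILINE_BYTES], ILINE_BYTES);
        ic->valid = true;
        ic->tag = tag;
        ic->memory_fetches++;
    }
    return ic->line[addr % ILINE_BYTES];
}

/* Store one byte. The write reaches main memory, and a matching
 * instruction cache line is invalidated so the next fetch must
 * reload the coherent copy from memory. */
void store_byte(toy_icache_t *ic, uint32_t addr, uint8_t value)
{
    assert(addr < CODE_BYTES);
    ic->memory[addr] = value;
    if (ic->valid && ic->tag == addr / ILINE_BYTES)
        ic->valid = false;
}
```

After store_byte modifies a byte in the cached line, the next icache_fetch of that address misses and returns the new value from memory rather than the stale cached copy.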
3.9.5 Cache Pollution
Because cache sizes are limited, caches should be filled only with data that is frequently used by an
application. Data that is used infrequently, or not at all, is said to pollute the cache because it occupies
otherwise useful cache lines. Ideally, the cache holds only data that adheres to the principle of
locality. This principle has two components: temporal locality and spatial locality.
Temporal locality means that data is likely to be used more than once in a short period of time. It
is useful to cache temporal data because subsequent accesses can retrieve the data quickly.
Non-temporal data is assumed to be used once, and then not used again for a long period of time, if ever.
Caching of non-temporal data pollutes the cache and should be avoided.
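The effect of pollution can be illustrated with a small simulation. The sketch below assumes a hypothetical four-line, fully associative cache with least-recently-used replacement, and the names lru_cache_t, lru_access, and run_workload are invented for the example. The allocate flag is a stand-in for any mechanism that lets a non-temporal access bypass the cache. It does not model a specific instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WAYS 4u  /* hypothetical cache capacity, in lines */

typedef struct {
    bool     valid[WAYS];
    uint32_t tag[WAYS];
    unsigned last_use[WAYS];  /* timestamp of the most recent access */
    unsigned clock;
    unsigned hits, misses;
} lru_cache_t;

void lru_init(lru_cache_t *c)
{
    *c = (lru_cache_t){0};
}

/* Access one line and return true on a hit. When allocate is false,
 * a miss does not displace any cached line. */
bool lru_access(lru_cache_t *c, uint32_t line, bool allocate)
{
    unsigned victim = 0;

    c->clock++;
    for (unsigned i = 0; i < WAYS; i++) {
        if (c->valid[i] && c->tag[i] == line) {
            c->last_use[i] = c->clock;
            c->hits++;
            return true;
        }
    }
    c->misses++;
    if (!allocate)
        return false;

    /* Prefer an unused way; otherwise replace the least recently used. */
    for (unsigned i = 0; i < WAYS; i++) {
        if (!c->valid[i]) {
            victim = i;
            break;
        }
        if (c->last_use[i] < c->last_use[victim])
            victim = i;
    }
    assert(victim < WAYS);
    c->valid[victim] = true;
    c->tag[victim] = line;
    c->last_use[victim] = c->clock;
    return false;
}

/* Ten rounds: touch four temporal ("hot") lines, then stream through
 * four lines that are never used again. Returns hits on hot lines. */
unsigned run_workload(bool cache_streaming_data)
{
    lru_cache_t c;
    unsigned hot_hits = 0;
    uint32_t stream_line = 100;

    lru_init(&c);
    for (unsigned round = 0; round < 10; round++) {
        for (uint32_t hot = 0; hot < 4; hot++)
            if (lru_access(&c, hot, true))
                hot_hits++;
        for (unsigned k = 0; k < 4; k++)
            lru_access(&c, stream_line++, cache_streaming_data);
    }
    return hot_hits;
}
```

In this model, run_workload(true) returns 0 because each pass of streaming data evicts every hot line before it is reused, while run_workload(false) returns 36 because the hot lines miss only in the first round.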