User Guide

96 General-Purpose Programming
AMD64 Technology 24592—Rev. 3.15—November 2009
3.9.3 Caches
Depending on the instruction, operands can be encoded in the instruction opcode or located in
registers, I/O ports, or memory locations. An operand located in memory can be physically present in
one or more locations within a system’s memory hierarchy.
Memory Hierarchy. A system’s memory hierarchy may have some or all of the following levels:
Main Memory—Main memory is external to the processor chip and is the memory-hierarchy level
farthest from the processor’s execution units. All physical-memory addresses are present in main
memory, which is implemented using relatively slow, but high-density memory devices.
External Caches—External caches are external to the processor chip, but are implemented using
lower-capacity, higher-performance memory devices than main memory. The system uses
external caches to hold copies of frequently used instructions and data found in main memory. A
subset of the physical-memory addresses can be present in the external caches at any time. A
system can contain any number of external caches, or none at all.
Internal Caches—Internal caches are present on the processor chip itself, and are the closest
memory-hierarchy level to the processor’s execution units. Because of their presence on the
processor chip, access to internal caches is very fast. Internal caches contain copies of the most
frequently used instructions and data found in main memory and external caches, and their
capacities are relatively small compared with those of external caches. A processor implementation can
contain any number of internal caches, or none at all. Implementations often contain a first-level
instruction cache and first-level data (operand) cache, and they may also contain a higher-capacity
(and slower) second-level internal cache for storing both instructions and data.
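The idea that a cache holds copies of only a subset of main-memory addresses at any time can be sketched with a toy model. The following C fragment is an illustration only, not a description of any AMD64 implementation: the names toy_cache, toy_read, and TOY_LINES are hypothetical, and the direct-mapped placement policy (address modulo the number of lines) is an assumption chosen for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_LINES 4 /* hypothetical: far fewer lines than memory addresses */

/* A toy direct-mapped cache in front of a word-addressed "main memory". */
struct toy_cache {
    bool     valid[TOY_LINES];
    size_t   addr[TOY_LINES];  /* which memory address each line copies */
    uint32_t data[TOY_LINES];  /* the cached copy of that word */
    unsigned hits, misses;
};

/* Read one word. A hit is served from the cache; a miss fetches the word
 * from main memory and replaces whatever the line held before. */
uint32_t toy_read(struct toy_cache *c, const uint32_t *main_memory,
                  size_t address)
{
    size_t line = address % TOY_LINES; /* assumed placement policy */

    if (c->valid[line] && c->addr[line] == address) {
        c->hits++;
        return c->data[line];
    }

    c->misses++;
    c->valid[line] = true;
    c->addr[line] = address;
    c->data[line] = main_memory[address];
    return c->data[line];
}
```

In this sketch, a zero-initialized struct toy_cache starts empty. Reading the same address twice gives one miss followed by one hit, and reading a second address that maps to the same line evicts the first copy. The value returned is always the one stored in main memory, whether or not the read hits.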
Figure 3-19 on page 97 shows an example of a four-level memory hierarchy that combines main
memory, an external third-level (L3) cache, an internal second-level (L2) cache, and two internal
first-level (L1) caches.
As the figure shows, the first-level and second-level caches are implemented on the processor chip, and
the third-level cache is external to the processor. The first-level cache is a split cache, with separate
caches used for instructions and data. The second-level and third-level caches are unified (they contain
both instructions and data). Memory at the highest levels of the hierarchy (those farthest from the
execution units) has greater capacity (larger size), but slower access, than memory at the lowest levels.
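The trade-off between capacity and access speed can be made concrete with a simple expected-cost model. The C sketch below is an assumption-laden illustration rather than anything defined by the architecture: the type mem_level, the function average_access_cycles, and any cycle counts or hit rates passed to it are hypothetical. Levels are listed from the one closest to the execution units to main memory, whose hit rate is taken as 1.0 because all physical-memory addresses are present there.

```c
#include <assert.h>
#include <stddef.h>

/* One level of a memory hierarchy, ordered from closest to the
 * execution units (for example, L1) to farthest (main memory). */
struct mem_level {
    double access_cycles; /* hypothetical cost of probing this level */
    double hit_rate;      /* fraction of accesses reaching this level that hit */
};

/* Expected cycles per access: every access that reaches a level pays
 * that level's cost, and the fraction that misses continues onward. */
double average_access_cycles(const struct mem_level *levels, size_t count)
{
    assert(levels != NULL && count > 0);

    double expected = 0.0;
    double reach = 1.0; /* probability that an access reaches this level */

    for (size_t i = 0; i < count; i++) {
        expected += reach * levels[i].access_cycles;
        reach *= 1.0 - levels[i].hit_rate;
    }
    return expected;
}
```

With made-up inputs such as {4, 0.9}, {12, 0.8}, {40, 0.5}, and {200, 1.0} for L1, L2, L3, and main memory, the model yields roughly 8 cycles per access, compared with 200 when main memory is the only level. The figures are placeholders chosen to show the shape of the calculation, not measurements of any processor.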
Using caches to store frequently used instructions and data can result in significantly improved
software performance by avoiding accesses to the slower main memory. Applications function
identically on systems without caches and on systems with caches, although cacheless systems
typically execute applications more slowly. Application software can, however, be optimized to make
efficient use of caches when they are present, as described later in this section.
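As a small illustration of both points, the two C functions below compute the same result on any system, with or without caches, but they access memory in different orders. This is a minimal sketch, not a technique taken from this manual: the function names are hypothetical, and the matrix is assumed to be stored in row-major order, as C arrays are.

```c
#include <stddef.h>

/* Sum a rows x cols matrix stored in row-major order, visiting
 * elements in the order they are laid out in memory. Consecutive
 * accesses touch adjacent addresses, so a cache (when present) can
 * often serve them from data it has already fetched. */
long sum_row_order(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Same sum, but walking down each column. Consecutive accesses are
 * cols * sizeof(int) bytes apart, which tends to reuse cached data
 * less effectively when the matrix is large. */
long sum_column_order(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}
```

Both functions return identical sums for the same input. Any difference in running time depends on the matrix size and on the caches of the particular system, so it should be measured, not assumed.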