Specifications

PENTIUM® PRO PROCESSOR AT 150, 166, 180, and 200 MHz
Example 1. A Typical Code Fragment
r1 <= mem [r0] /* Instruction 1 */
r2 <= r1 + r2 /* Instruction 2 */
r5 <= r5 + r1 /* Instruction 3 */
r6 <= r6 - r3 /* Instruction 4 */
The cache miss on instruction 1 will take many
internal clocks, so the Pentium Pro processor core
continues to look ahead for other instructions that
could be speculatively executed, and is typically
looking 20 to 30 instructions in front of the instruction
pointer. Within this 20 to 30 instruction window there
will be, on average, five branches that the
fetch/decode unit must correctly predict if the
dispatch/execute unit is to do useful work. The
sparse register set of an Intel Architecture (IA)
processor creates many false dependencies on
registers, so the dispatch/execute unit renames the
IA registers into a larger register set to enable
additional forward progress. The retire unit owns the
programmer’s IA register set, and results are
committed to permanent machine state in these
registers only when it removes completed instructions
from the pool in original program order.
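The renaming idea described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Pentium Pro processor's actual mechanism: the function name `rename`, the physical register names `p0`, `p1`, ... and the (destination, sources) tuple encoding are all hypothetical and chosen only for illustration.

```python
from itertools import count

def rename(instructions, arch_regs):
    """Rename architectural registers to fresh physical registers.

    instructions: list of (dest, sources) tuples using architectural names.
    Returns the same list rewritten with hypothetical physical names 'pN'.
    """
    fresh = (f"p{n}" for n in count())
    # Each architectural register starts out mapped to its own physical register.
    mapping = {reg: next(fresh) for reg in arch_regs}
    renamed = []
    for dest, sources in instructions:
        # Sources read whichever physical register currently holds the value.
        phys_sources = tuple(mapping[s] for s in sources)
        # Every write gets a new physical register, so a later write to the
        # same architectural register does not have to wait for earlier
        # readers or writers of that register (the false dependencies).
        mapping[dest] = next(fresh)
        renamed.append((mapping[dest], phys_sources))
    return renamed

# Two independent computations that both reuse r1 as a scratch register.
program = [
    ("r1", ("r0",)),   # r1 <= f(r0)
    ("r2", ("r1",)),   # r2 <= g(r1)
    ("r1", ("r3",)),   # r1 <= f(r3)  (reuses r1: a false dependency)
    ("r4", ("r1",)),   # r4 <= g(r1)
]
renamed_program = rename(program, ["r0", "r1", "r2", "r3", "r4"])
```

After renaming, the two writes to r1 land in different physical registers, so in this model the second pair of instructions no longer has to wait for the first pair.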
Dynamic Execution technology can be summarized
as optimally adjusting instruction execution by
predicting program flow, speculatively executing
instructions in any order, and analyzing the
program’s dataflow graph to choose the best order
in which to execute them.
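As an illustration of the dataflow analysis, the following Python sketch encodes the four instructions of Example 1 and works out which of them could proceed while instruction 1 waits on its cache miss. The dictionary encoding and the function names `true_dependencies` and `can_run_while_stalled` are assumptions made for this sketch; the processor performs this analysis in hardware on µops, not on a table like this one.

```python
# Example 1 encoded as number -> (destination, sources).
# The load's address register is treated as its only source.
FRAGMENT = {
    1: ("r1", ("r0",)),        # r1 <= mem[r0]
    2: ("r2", ("r1", "r2")),   # r2 <= r1 + r2
    3: ("r5", ("r5", "r1")),   # r5 <= r5 + r1
    4: ("r6", ("r6", "r3")),   # r6 <= r6 - r3
}

def true_dependencies(fragment):
    """Map each instruction to the earlier instructions whose results it reads."""
    last_writer = {}   # register -> most recent instruction that writes it
    deps = {}
    for num in sorted(fragment):
        dest, sources = fragment[num]
        deps[num] = {last_writer[s] for s in sources if s in last_writer}
        last_writer[dest] = num
    return deps

def can_run_while_stalled(fragment, stalled):
    """Instructions that do not depend, directly or indirectly, on `stalled`."""
    deps = true_dependencies(fragment)
    blocked = {stalled}
    for num in sorted(fragment):
        if deps[num] & blocked:
            blocked.add(num)
    return [num for num in sorted(fragment) if num not in blocked]

DEPS = true_dependencies(FRAGMENT)
READY = can_run_while_stalled(FRAGMENT, stalled=1)
```

In this model, instructions 2 and 3 both read r1 and must wait for the load, while instruction 4 touches only r6 and r3 and is free to execute ahead of them.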
2.2. The Pentium® Pro Processor Pipeline
To give a closer look at how the Pentium Pro
processor implements Dynamic Execution, Figure 2
shows a block diagram that includes the cache and
memory interfaces. The “Units” shown in Figure 2
represent groups of stages of the Pentium Pro
processor pipeline.
• The FETCH/DECODE unit: An in-order unit that
  takes as input the user program instruction
  stream from the instruction cache and decodes
  it into a series of micro-operations (µops)
  that represent the dataflow of that instruction
  stream. The pre-fetch is speculative.
• The DISPATCH/EXECUTE unit: An out-of-order
  unit that accepts the dataflow stream,
  schedules execution of the µops subject to data
  dependencies and resource availability, and
  temporarily stores the results of these
  speculative executions.
• The RETIRE unit: An in-order unit that knows
  how and when to commit (“retire”) the
  temporary, speculative results to permanent
  architectural state.
• The BUS INTERFACE unit: A partially ordered
  unit responsible for connecting the three internal
  units to the real world. The bus interface unit
  communicates directly with the L2 (second
  level) cache, supporting up to four concurrent
  cache accesses. The bus interface unit also
  controls a transaction bus to system memory,
  with Modified, Exclusive, Shared, Invalid
  (MESI) snooping protocol.
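The division of labor between the first three units can be pictured with a small Python model in which results are produced in any order but committed strictly in program order. This is a simplified sketch under stated assumptions: the class name `InstructionPool`, its methods, and the use of `None` to mark an unfinished µop are hypothetical, and the model ignores branches, exceptions, and resource limits.

```python
from collections import OrderedDict

class InstructionPool:
    """Toy instruction pool: out-of-order completion, in-order retirement."""

    def __init__(self):
        self._entries = OrderedDict()   # program order -> [dest, result or None]
        self.architectural = {}         # committed (permanent) register state

    def issue(self, seq, dest):
        """Fetch/decode side: add a µop to the pool in program order."""
        self._entries[seq] = [dest, None]

    def complete(self, seq, value):
        """Dispatch/execute side: record a speculative result, in any order."""
        self._entries[seq][1] = value

    def retire(self):
        """Retire side: commit finished µops from the head of the pool only."""
        retired = []
        while self._entries:
            seq, (dest, value) = next(iter(self._entries.items()))
            if value is None:
                break   # the oldest µop is unfinished, so younger ones wait
            self.architectural[dest] = value
            del self._entries[seq]
            retired.append(seq)
        return retired

# Usage with the destinations from Example 1 (the result values are made up).
pool = InstructionPool()
for seq, dest in [(1, "r1"), (2, "r2"), (3, "r5"), (4, "r6")]:
    pool.issue(seq, dest)
pool.complete(4, 40)      # instruction 4 finishes first
first = pool.retire()     # nothing retires: instruction 1 is still pending
pool.complete(1, 10)
second = pool.retire()    # instruction 1 retires; 2 and 3 are unfinished
```

Even though instruction 4 has a result early, the model keeps it out of the architectural state until every older instruction has retired, which mirrors the in-order commitment described for the retire unit.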