Specifications
PENTIUM® PRO PROCESSOR AT 150, 166, 180, and 200 MHz
“execute” phases, and opens up a wide instruction
window using an instruction pool. This approach
allows the “execute” phase of the Pentium Pro
processor to have much more visibility into the
program’s instruction stream so that better
scheduling may take place. It requires the
instruction “fetch/decode” phase of the Pentium Pro
processor to be much more intelligent in terms of
predicting program flow. Optimized scheduling
requires the fundamental “execute” phase to be
replaced by decoupled “dispatch/execute” and
“retire” phases. This allows instructions to be
started in any order but always be completed in the
original program order. The Pentium Pro processor
is implemented as three independent engines
coupled with an instruction pool as shown in
Figure 1.
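
The rule that instructions may start in any order but always complete in the original program order can be illustrated with a small sketch. This is a simplified model, not the processor's actual retirement logic: the function name retire_in_order and the use of 1-based instruction numbers are assumptions made for illustration only.

```python
def retire_in_order(completion_order):
    """Return the instructions retired after each completion event.

    completion_order lists instruction numbers (1-based program order)
    in the order in which their execution finishes. An instruction may
    retire only after every earlier instruction has retired.
    """
    done = set()              # executed, but possibly not yet retired
    next_to_retire = 1        # oldest instruction not yet retired
    retired_per_step = []
    for instr in completion_order:
        done.add(instr)
        batch = []
        # Retire the oldest contiguous run of completed instructions.
        while next_to_retire in done:
            batch.append(next_to_retire)
            next_to_retire += 1
        retired_per_step.append(batch)
    return retired_per_step


if __name__ == "__main__":
    # Instructions 3 and 4 finish first, then 1, then 2.
    print(retire_in_order([3, 4, 1, 2]))
```

In this sketch, results that finish early simply wait in the pool until all older instructions have retired, so the retired sequence always matches program order.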
2.1. Full Core Utilization
The three-engine approach was taken
to utilize the CPU core more fully. Consider the
code fragment in Example 1:
The first instruction in this example is a load of r1
that, at run time, causes a cache miss. A traditional
CPU core must wait for its bus interface unit to read
this data from main memory and return it before
moving on to instruction 2. This CPU stalls while
waiting for this data and is thus being under-utilized.
To avoid this memory latency problem, the Pentium
Pro processor “looks ahead” into its instruction pool
at subsequent instructions and does useful work
rather than stalling. In Example 1,
instruction 2 is not executable since it depends
upon the result of instruction 1; however, both
instructions 3 and 4 are executable. The Pentium
Pro processor executes instructions 3 and 4 out of
order. The results of this out-of-order execution
cannot be committed to permanent machine state (i.e.,
the programmer-visible registers) immediately since
the original program order must be maintained. The
results are instead stored back in the instruction
pool awaiting in-order retirement. The core executes
instructions depending upon their readiness to
execute, and not on their original program order,
and is therefore a true dataflow engine. This
approach has the side effect that instructions are
typically executed out of order.
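
The dataflow behavior described above can be sketched as a small cycle-by-cycle simulation. Everything specific in it is an assumption for illustration: the four-instruction fragment is a hypothetical stand-in that only matches the dependencies described in the text (instruction 1 loads r1 and misses the cache, instruction 2 needs r1, instructions 3 and 4 are independent), the 10-cycle miss latency is invented, dispatch width is unlimited, and hazards other than true data dependencies are ignored.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    num: int        # position in program order (1-based)
    dest: str       # register written
    srcs: tuple     # registers read
    latency: int    # cycles from dispatch to result (assumed values)


# Hypothetical fragment; register names and latencies are assumptions.
PROGRAM = [
    Instr(1, "r1", ("r0",), 10),        # load that misses the cache
    Instr(2, "r2", ("r1", "r2"), 1),    # depends on instruction 1
    Instr(3, "r5", ("r5",), 1),         # independent
    Instr(4, "r6", ("r6", "r3"), 1),    # independent
]


def simulate(program):
    """Dispatch by operand readiness; retire in program order."""
    finish = {}             # instr.num -> cycle at which its result is ready
    dispatch_order = []
    retire_order = []
    waiting = list(program)
    cycle = 0
    never = float("inf")
    while len(retire_order) < len(program):
        still_waiting = []
        for ins in waiting:
            # Earlier instructions that write one of this one's sources.
            producers = [p for p in program
                         if p.num < ins.num and p.dest in ins.srcs]
            if all(finish.get(p.num, never) <= cycle for p in producers):
                finish[ins.num] = cycle + ins.latency
                dispatch_order.append(ins.num)
            else:
                still_waiting.append(ins)
        waiting = still_waiting
        # Retire the oldest instructions whose results are ready.
        while (len(retire_order) < len(program)
               and finish.get(program[len(retire_order)].num, never) <= cycle):
            retire_order.append(program[len(retire_order)].num)
        cycle += 1
    return dispatch_order, retire_order


if __name__ == "__main__":
    dispatched, retired = simulate(PROGRAM)
    print("dispatch order:", dispatched)
    print("retire order:  ", retired)
```

Under these assumptions, instructions 3 and 4 are dispatched while instruction 1 is still waiting on memory, yet the retire order remains 1, 2, 3, 4.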
[Figure 1: block diagram showing the Fetch/Decode Unit, the Dispatch/Execute Unit, and the Retire Unit, each connected to the Instruction Pool]
Figure 1. Three Engines Communicating Using an Instruction Pool