PENTIUM® PRO PROCESSOR AT 150, 166, 180, and 200 MHz E

Example 1. A Typical Code Fragment

r1 <= mem [r0] /* Instruction 1 */
r2 <= r1 + r2 /* Instruction 2 */
r5 <= r5 + r1 /* Instruction 3 */
r6 <= r6 - r3 /* Instruction 4 */

The cache miss on instruction 1 will take many internal clocks, so the Pentium Pro processor core continues to look ahead for other instructions that could be speculatively executed, and is typically looking 20 to 30 instructions in front of the instruction pointer. Within this 20 to 30 instruction window there will be, on average, five branches that the fetch/decode unit must correctly predict if the dispatch/execute unit is to do useful work. The sparse register set of an Intel Architecture (IA) processor will create many false dependencies on registers, so the dispatch/execute unit will rename the IA registers into a larger register set to enable additional forward progress. The retire unit owns the programmer’s IA register set, and results are committed to permanent machine state in these registers only when it removes completed instructions from the pool in original program order.
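
The effect described above can be illustrated with a small sketch of Example 1. It is only a toy model, not the processor's actual scheduler: the names `Instr`, `MISS_LATENCY`, `dataflow_finish_times` and `in_order_finish_times` are hypothetical, the 20-clock miss cost and 1-clock ALU latency are assumed values, and the model assumes unlimited execution units and tracks only true (read-after-write) dependencies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str      # register written
    srcs: tuple    # registers read
    latency: int   # clocks to produce the result (assumed values)


MISS_LATENCY = 20  # hypothetical cost of the cache miss on instruction 1

# Example 1 from the text, expressed as (name, destination, sources, latency).
PROGRAM = [
    Instr("I1", "r1", ("r0",), MISS_LATENCY),  # r1 <= mem [r0]
    Instr("I2", "r2", ("r1", "r2"), 1),        # r2 <= r1 + r2
    Instr("I3", "r5", ("r5", "r1"), 1),        # r5 <= r5 + r1
    Instr("I4", "r6", ("r6", "r3"), 1),        # r6 <= r6 - r3
]


def dataflow_finish_times(program):
    """Finish clock of each instruction if it starts once its inputs are ready."""
    ready_at = {}  # register -> clock at which its newest value is available
    finish = {}
    for ins in program:
        start = max((ready_at.get(reg, 0) for reg in ins.srcs), default=0)
        finish[ins.name] = start + ins.latency
        ready_at[ins.dest] = finish[ins.name]
    return finish


def in_order_finish_times(program):
    """Finish clock of each instruction if each waits for the previous one."""
    finish, clock = {}, 0
    for ins in program:
        clock += ins.latency
        finish[ins.name] = clock
    return finish


if __name__ == "__main__":
    print("dataflow:", dataflow_finish_times(PROGRAM))
    print("in-order:", in_order_finish_times(PROGRAM))
```

Under these assumed latencies, instruction 4 finishes at clock 1 in the dataflow model instead of clock 23 in the strictly sequential one, because it does not depend on the load. Instructions 2 and 3 both need r1, so they still wait for the miss to resolve.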

Dynamic Execution technology can be summarized as optimally adjusting instruction execution by predicting program flow, having the ability to speculatively execute instructions in any order, and then analyzing the program’s dataflow graph to choose the best order to execute the instructions.
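
The false dependencies that arise from the sparse IA register set, and the way renaming removes them, can be sketched as follows. This is a simplified illustration, not the Pentium Pro processor's renaming hardware: the function `rename`, the physical register names `p0`, `p1`, ... and the three-instruction `EXAMPLE` program are all hypothetical.

```python
from itertools import count

# Each entry is (destination register, source registers).
# The third instruction reuses r1 even though it does not need the first result,
# which creates a false (write-after-write / write-after-read) dependency.
EXAMPLE = [
    ("r1", ("r2", "r3")),  # r1 <= r2 + r3
    ("r4", ("r1", "r1")),  # r4 <= r1 + r1
    ("r1", ("r5", "r6")),  # r1 <= r5 + r6
]

ARCH_REGS = ["r1", "r2", "r3", "r4", "r5", "r6"]


def rename(program, arch_regs):
    """Give every register write a fresh physical register."""
    fresh = count()
    # Initial mapping: one physical register per architectural register.
    mapping = {reg: f"p{next(fresh)}" for reg in arch_regs}
    renamed = []
    for dest, srcs in program:
        new_srcs = tuple(mapping[s] for s in srcs)  # read the current mappings
        mapping[dest] = f"p{next(fresh)}"           # allocate a new destination
        renamed.append((mapping[dest], new_srcs))
    return renamed


if __name__ == "__main__":
    for before, after in zip(EXAMPLE, rename(EXAMPLE, ARCH_REGS)):
        print(before, "->", after)
```

After renaming, the two writes to r1 target different physical registers (`p6` and `p8` in this sketch), so the third instruction can execute without waiting for the first two. Only the true dependency of the second instruction on the first remains.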

2.2. The Pentium Pro Processor Pipeline

To show more closely how the Pentium Pro processor implements Dynamic Execution, Figure 2 gives a block diagram that includes the cache and memory interfaces. The “Units” shown in Figure 2 represent groups of stages of the Pentium Pro processor pipeline.

• The FETCH/DECODE unit: An in-order unit that takes as input the user program instruction stream from the instruction cache and decodes it into a series of micro-operations (µops) that represent the dataflow of that instruction stream. The pre-fetch is speculative.

• The DISPATCH/EXECUTE unit: An out-of-order unit that accepts the dataflow stream, schedules execution of the µops subject to data dependencies and resource availability, and temporarily stores the results of these speculative executions.

• The RETIRE unit: An in-order unit that knows how and when to commit (“retire”) the temporary, speculative results to permanent architectural state.

• The BUS INTERFACE unit: A partially ordered unit responsible for connecting the three internal units to the real world. The bus interface unit communicates directly with the L2 (second level) cache, supporting up to four concurrent cache accesses. The bus interface unit also controls a transaction bus, with the Modified, Exclusive, Shared, Invalid (MESI) snooping protocol, to system memory.
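
The division of labor between the out-of-order DISPATCH/EXECUTE unit and the in-order RETIRE unit can be sketched as below. This is a minimal model under stated assumptions, not a description of the actual hardware: the function `retire_order` and the instruction names are hypothetical, and the completion order is simply supplied as input rather than computed.

```python
from collections import deque


def retire_order(program_order, completion_order):
    """Log which instructions retire after each out-of-order completion.

    An instruction retires only when it has completed and every older
    instruction (in original program order) has already retired.
    """
    pool = deque(program_order)  # oldest instruction at the left
    done = set()                 # completed but possibly not yet retired
    log = []
    for name in completion_order:
        done.add(name)
        retired_now = []
        # Commit from the head of the pool for as long as the head is complete.
        while pool and pool[0] in done:
            retired_now.append(pool.popleft())
        log.append((name, retired_now))
    return log


if __name__ == "__main__":
    # The fourth instruction completes first, but must wait to retire.
    for completed, retired in retire_order(
        ["I1", "I2", "I3", "I4"], ["I4", "I1", "I2", "I3"]
    ):
        print(f"{completed} completes -> retires {retired}")
```

In this run the fourth instruction completes first but retires last, together with the third, so architectural state is always updated in original program order.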