Paper

ports connect to six QPI interfaces to reach other, external, processor sockets
and devices. Each processor is capable of 128 GB/s of bandwidth between sockets
and 45 GB/s of bandwidth to the memory modules.
Thread Parallelism
Like its predecessors, the Intel Itanium processor 9500 series is multi-threaded
and supports Intel® Hyper-Threading Technology¹. The Intel Itanium processor
9500 series adds significant new multi-threading optimizations, including the
dual-domain multithreading capability. In dual-domain multithreading, the
front-end and main pipelines are independently threaded. Each pipeline uses its
own, separate algorithm to switch between threads. Many of the core structures
have been split into separate per-thread resources, including the instruction
buffer, the data TLBs, and the hardware page walker. The result is that
different instruction threads can now run on different parts of the pipeline,
improving performance even on legacy software without requiring costly
recompilation and application re-qualification.
WHAT IS EPIC?
EPIC, or Explicitly Parallel Instruction Computing, represents a paradigm shift
in the development of instruction set architectures. Instead of placing the main
burden of extracting parallelism and performance on the underlying computing
hardware, EPIC develops a synergy between the software ecosystem and the
hardware implementation. This allows compilers, which have full access to the
program source code, and processors, which have full access to run-time
information as a program executes, to each be optimized for what each does
best. To do this, the instruction set provides a rich set of features for
software to optimally control the low-level hardware resources. Most notably,
this includes the ability for compilers to specify, schedule, and exploit the
many forms of parallelism inherent in user programs.
Figure 2. Dual-domain multi-threading. Intel Itanium processor 9500 series
enables independent front-end and back-end pipeline operations.
Front-end: handles instruction fetch, branch prediction, register renaming, and
sorting 2 bundles per cycle into the instruction buffer.
Back-end: reads from the instruction buffer at 4 bundles per cycle, executes
instructions, and accesses the data cache.
Pipeline Parallelism
Intel Itanium processor 9500 series incorporates several major pipelines that
operate independently from one another and are separated by decoupling buffers,
representing another major form of parallelism. The front-end pipeline fetches
instructions, performs branch prediction, partially decodes instructions, and
renames registers. After flowing down the front-end pipeline, 6 instructions
per cycle are placed into a 192-entry instruction buffer. The instruction
buffer is divided into 6 logical queues corresponding to execution unit type.
The main pipeline reads instructions out of this buffer and executes them in
the 12 execution units described above. This decoupling allows the core to run
at higher frequencies and also enables hardware recovery from errors.
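The effect of a decoupling buffer can be sketched with a simple cycle-by-cycle model. The 192-entry capacity and the 6-instructions-per-cycle fill rate come from the text above; the drain rate `BACK_END_WIDTH`, the stall sets, and the function name `simulate` are assumptions introduced only for this illustration, and the 6 logical queues are collapsed into a single queue for brevity.

```python
from collections import deque

BUFFER_ENTRIES = 192   # instruction buffer capacity quoted in the text
FRONT_END_WIDTH = 6    # instructions delivered per cycle, per the text
BACK_END_WIDTH = 6     # assumed drain rate for this sketch, not from the text


def simulate(cycles, front_stalls=(), back_stalls=()):
    """Toy model of two pipelines separated by a bounded buffer.

    front_stalls / back_stalls are collections of cycle numbers in which the
    respective pipeline makes no progress (e.g. because of a cache miss).
    Returns (instructions drained by the back end, final buffer occupancy).
    """
    buffer = deque()
    fetched = retired = 0
    for cycle in range(cycles):
        # Back end drains first, so an instruction is buffered for at least a cycle.
        if cycle not in back_stalls:
            for _ in range(min(BACK_END_WIDTH, len(buffer))):
                buffer.popleft()
                retired += 1
        # Front end fills whatever space is left in the buffer.
        if cycle not in front_stalls:
            room = BUFFER_ENTRIES - len(buffer)
            for _ in range(min(FRONT_END_WIDTH, room)):
                buffer.append(fetched)
                fetched += 1
    return retired, len(buffer)


if __name__ == "__main__":
    print(simulate(10))                                   # no stalls
    print(simulate(10, back_stalls={3, 4, 5}))            # buffer absorbs the stall
    print(simulate(10, front_stalls={6, 7, 8}, back_stalls={3, 4, 5}))
```

In the third run the back end keeps draining buffered instructions during the cycles in which the front end is stalled, so neither pipeline has to wait in lockstep with the other.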
Intel® Itanium® 9500 Processor Series: Advanced EPIC Architecture