for data from main memory. The newer processor supports switching
for medium latency events and for spin-lock loops. These and other
enhancements to thread-switching logic help to boost core utilization,
which can improve both application response times and overall
system throughput.
The large and fast cache structures of the Intel Itanium processor 9300
series also help to sustain high levels of core utilization, by delivering data
to the cores at or near clock speed. The processor is available with up to
30 MB of on-die cache versus 27.5 MB on its predecessor. Each core has
its own dedicated L1, L2 and L3 cache. Although the dedicated cache
architecture may increase cache miss rates somewhat compared with
a shared cache solution, it provides lower latency, higher bandwidth and
better quality of service (QoS). This can be especially important when
running multiple mission-critical applications per system, because
each core has dedicated cache resources that provide more predictable
performance and throughput.
Higher Performance through Enhanced Instruction-Level Parallelism (ILP)
ILP refers to a processor’s ability to process multiple instructions
from a single software thread simultaneously. A high degree of ILP
increases throughput and decreases response times for operations and
transactions that must be performed sequentially, which are common in
transactional applications and business intelligence queries. It can
also help to reduce latencies for individual software threads in
multi-threaded workloads.
The Intel Itanium architecture is based on the Explicitly Parallel
Instruction Computing (EPIC) model, which was specifically designed
to enable very high ILP. The processor employs an exceptionally wide
and short pipeline (six instructions wide and only eight stages deep)
and disperses instructions among 11 functional units. It also supports
zero-cycle load-use penalties and zero-cycle branch re-steers, and
implements extensive bypasses so a thread is less likely to stall
the pipeline.
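The benefit of a wide pipeline can be illustrated with a toy scheduling model. The sketch below is not the processor's actual dispersal logic: it assumes unit latency for every instruction, ignores functional-unit types and instruction bundling, and uses a hypothetical helper, cycles_needed, to count how many cycles a set of instructions takes when up to six can issue per cycle (the width quoted above).

```python
def cycles_needed(deps, width):
    """Count cycles to issue all instructions in a toy model.

    deps maps each instruction to the list of instructions it depends on.
    Assumptions (not from the white paper): every instruction has unit
    latency, and any ready instruction can use any issue slot.
    """
    done = set()
    remaining = list(deps)  # dict order stands in for program order
    cycles = 0
    while remaining:
        # An instruction is ready only if all its inputs finished
        # in an earlier cycle.
        ready = [i for i in remaining if all(d in done for d in deps[i])]
        if not ready:
            raise ValueError("dependency cycle in input")
        issued = ready[:width]  # issue at most `width` per cycle
        done.update(issued)
        remaining = [i for i in remaining if i not in done]
        cycles += 1
    return cycles


# Twelve mutually independent instructions.
independent = {i: [] for i in range(12)}
# Twelve instructions where each depends on the previous one.
chain = {i: ([i - 1] if i else []) for i in range(12)}

print(cycles_needed(independent, 6))
print(cycles_needed(independent, 1))
print(cycles_needed(chain, 6))
```

In this model the independent instructions finish six times faster on the six-wide machine than on a one-wide machine, while the fully dependent chain gains nothing from the extra width. This is why exposing independent instructions matters as much as pipeline width.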
Since the basic core structure for the Intel Itanium processor family has
been tuned and optimized over many generations, no major changes
were made to the cores in the Intel Itanium processor 9300 series.
Instead, the focus for improving ILP was to dramatically increase the
processor’s ability to feed its multiple, high-performance cores with
data and instructions (see next section).
However, one significant enhancement to the core architecture was
implemented. The first-level Data Translation Lookaside Buffer (DTLB),
which translates virtual addresses to physical addresses, now supports
larger pages (8 KB and 16 KB). The larger page sizes help to reduce the
number of DTLB misses (and resulting stall cycles), which can improve
performance for many applications.
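One way to see why larger pages reduce misses is to compute the TLB "reach", the amount of memory that can be mapped without a miss. The sketch below is illustrative only: the entry count (32) is a hypothetical value, not a figure from this paper, and the 4 KB page size is included only as an assumed baseline for comparison.

```python
KIB = 1024

# Hypothetical entry count, chosen only for illustration.
ASSUMED_DTLB_ENTRIES = 32


def tlb_reach_bytes(entries, page_size_bytes):
    """Memory covered by the TLB when every entry maps one page."""
    return entries * page_size_bytes


def pages_touched(working_set_bytes, page_size_bytes):
    """Number of pages (and thus translations) a working set needs."""
    return -(-working_set_bytes // page_size_bytes)  # ceiling division


for page_kib in (4, 8, 16):  # 4 KiB is an assumed baseline
    reach = tlb_reach_bytes(ASSUMED_DTLB_ENTRIES, page_kib * KIB)
    pages = pages_touched(1024 * KIB, page_kib * KIB)
    print(f"{page_kib} KiB pages: reach {reach // KIB} KiB, "
          f"{pages} translations for a 1 MiB working set")
```

With a fixed number of entries, doubling the page size doubles the reach and halves the number of translations a given working set requires, so fewer accesses fall outside what the DTLB currently maps.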
Improvements in Scalability and Headroom
Communication channels within the processor have received a
major overhaul to help keep the cores operating at high levels of
utilization and to provide even better support for memory- and
I/O-intensive applications.
New Intel® QuickPath Interconnect Technology
The Intel Itanium processor 9300 series marks the first implementation
of Intel QuickPath Interconnect Technology in the Intel Itanium
processor family. Intel QuickPath Interconnect Technology replaces the
Front Side Bus of previous processors with a point-to-point architecture
that is more scalable and resilient. With four full-width Intel QuickPath
Interconnect links and two half-width links, the Intel Itanium processor
9300 series provides peak processor-to-processor and processor-to-I/O
communication bandwidth of up to 96 GB/s, or up to nine times that of
its predecessor. It also provides a solution that scales automatically
in larger system designs: as more processors are added, system
bandwidth increases accordingly.
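The 96 GB/s figure can be reproduced with simple arithmetic under some assumptions that the text above does not state: a transfer rate of 4.8 GT/s, 2 bytes of payload per transfer per direction on a full-width link, half that on a half-width link, and both directions counted toward the peak. The constant and function names below are illustrative.

```python
# Assumptions (not stated in this paper): 4.8 GT/s per link, 2 payload
# bytes per transfer per direction on a full-width link, and peak
# bandwidth counted across both directions of each link.
TRANSFER_RATE_GT_S = 4.8
FULL_WIDTH_BYTES_PER_TRANSFER = 2
DIRECTIONS = 2


def link_bandwidth_gb_s(width_fraction):
    """Peak bandwidth of one link; width_fraction is 1.0 or 0.5."""
    return (TRANSFER_RATE_GT_S * FULL_WIDTH_BYTES_PER_TRANSFER
            * width_fraction * DIRECTIONS)


def peak_bandwidth_gb_s(full_links, half_links):
    """Sum of peak bandwidth over all links on one processor."""
    return (full_links * link_bandwidth_gb_s(1.0)
            + half_links * link_bandwidth_gb_s(0.5))


# Four full-width and two half-width links, as described above.
per_processor = peak_bandwidth_gb_s(full_links=4, half_links=2)
print(f"per processor: {per_processor:.1f} GB/s")

# Aggregate link bandwidth grows with the number of processors.
for sockets in (2, 4, 8):
    print(f"{sockets} processors: {sockets * per_processor:.1f} GB/s")
```

Under these assumptions each full-width link contributes 19.2 GB/s and each half-width link 9.6 GB/s, which sums to the quoted peak. The loop shows only the aggregate of per-processor link bandwidth; usable system bandwidth depends on topology and traffic patterns.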
Intel QuickPath Interconnect Technology also provides a strong
foundation for scaling system bandwidth in future processor generations.
More links can be added as needed to support additional cores. Since
the links are point-to-point and unidirectional, higher bandwidth per pin
is technically feasible, which furnishes another avenue for future scaling.
Intel QuickPath Interconnect Technology is also being integrated into
Intel® Xeon® processors, so server manufacturers will be able to employ
a common chipset for the two processor families. This will provide
significant economies of scale and help fuel faster innovation
across both architectures.
White Paper: The Intel® Itanium® Processor 9300 Series