Intel Microarchitecture and 10 Gigabit Ethernet Transforming the Data Center
New Server Architecture Optimizes I/O Performance
10GbE as a universal network fabric provides significant advantages
for creating a flexible, reliable, and agile data center. However,
achieving optimum performance in the dynamic data center also
requires a platform that is optimized for greater I/O scalability.
Servers based on the Intel Xeon processor 5500 series provide the
architectural elements that allow for new levels of 10GbE scalability.
Figure 3 illustrates some of the key differences between this new
architecture and its predecessor.
Faster Memory and Caches
The most noticeable difference between the two architectures of
Figure 3 is that the new architecture has an integrated memory
controller and significantly higher memory bandwidth. In particular,
Intel Xeon processor 5500 series-based servers support faster DDR3
memory, as opposed to the DDR2-based FB-DIMMs of previous
generations. This enhancement provides a peak memory bandwidth of
about 32 gigabytes per second (GB/s) per socket, which is much higher
than what previous platforms offered for the entire system. With
32 GB/s available per CPU socket, a dual-processor system has close
to 64 GB/s of peak memory bandwidth, which is almost 3x more read
bandwidth and 6x more write bandwidth than the previous generation.
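The per-socket figure can be reproduced with simple arithmetic. The sketch below is illustrative only: it assumes three DDR3-1333 channels per socket, each 64 bits wide, which this paper does not state explicitly, and the function name is hypothetical.

```python
# Back-of-the-envelope peak memory bandwidth.
# ASSUMPTION (not stated in the text): three DDR3-1333 channels per socket,
# each transferring 8 bytes (64 bits) at a time.

def peak_memory_bandwidth_gbs(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s, with 1 GB = 10**9 bytes."""
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9

per_socket = peak_memory_bandwidth_gbs(channels=3, mega_transfers_per_s=1333)
dual_socket = 2 * per_socket  # each socket has its own memory controller

print(f"per socket:  {per_socket:.1f} GB/s")
print(f"dual socket: {dual_socket:.1f} GB/s")
```

Under these assumptions the result lands at roughly 32 GB/s per socket and roughly 64 GB/s for a dual-processor system. These are theoretical peaks; sustained bandwidth in practice is lower.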
Other memory improvements include a fifty percent increase of the
level-two cache (L2C) and the addition of a large last-level cache (LLC).
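The inclusive LLC enables dynamic and efficient allocation of shared
cache to all four cores in each processor. Because every line held in a
core's cache is also present in the LLC, a miss in the LLC means that no
core holds the line. This increases performance while reducing traffic
to the processor cores by eliminating unnecessary core snoops.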
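One way to picture how an inclusive cache can filter snoops is the toy model below. It is a simplified sketch, not a description of the actual hardware: the class and method names are hypothetical, and real processors track this per cache line in hardware rather than in a dictionary.

```python
# Toy model of an inclusive last-level cache (LLC) acting as a snoop filter.
# All names here are hypothetical and chosen for illustration only.

class InclusiveLLC:
    def __init__(self, num_cores=4):
        self.num_cores = num_cores
        # Maps a line address to the set of cores that may hold a copy.
        self.lines = {}

    def core_fill(self, core, addr):
        """A core loads a line; inclusion means the LLC must hold it too."""
        if not 0 <= core < self.num_cores:
            raise ValueError("unknown core")
        self.lines.setdefault(addr, set()).add(core)

    def cores_to_snoop(self, addr):
        """Return the cores that must be snooped for a request to addr."""
        # On an LLC miss, inclusion guarantees no core has the line,
        # so the result is empty and no core is disturbed.
        return sorted(self.lines.get(addr, set()))


llc = InclusiveLLC()
llc.core_fill(core=2, addr=0x1000)
hit_snoops = llc.cores_to_snoop(0x1000)   # only core 2 needs a snoop
miss_snoops = llc.cores_to_snoop(0x2000)  # LLC miss: nothing to snoop
```

Without inclusion, a request for an address missing from the shared cache would still have to be checked against every core's private cache.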
Faster Interconnect Architecture
Another key difference between the two generations shown in
Figure 3 is that the Front Side Bus (FSB) has been replaced by the
Intel® QuickPath Interconnect (Intel® QPI). Intel QPI provides higher
bandwidth, and the new architecture also includes an Intel QPI
interface from processor to processor, providing additional coherent
bus bandwidth.
On previous generation platforms, all memory reads and writes
from the CPUs had to traverse the FSB. With the Intel® QuickPath
Architecture, local memory reads and writes do not require an Intel QPI
traversal, except for snoops. This reduces the amount of bandwidth
consumed on Intel QPI. The presence of an additional Intel QPI
interface provides additional bandwidth for I/O-to-memory and CPU-
to-CPU data transfers.
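The difference in memory access paths can be sketched as a small model. This is an illustration under simplifying assumptions: socket numbering and function names are hypothetical, and snoop traffic is deliberately ignored.

```python
# Toy model: which shared link does a CPU memory access cross?
# Hypothetical names; coherence (snoop) traffic is not modeled.

def path_previous_generation(cpu_socket):
    # All memory sits behind one shared memory controller,
    # so every access crosses the Front Side Bus.
    return ["FSB"]

def path_new_architecture(cpu_socket, memory_socket):
    # Each socket has its own integrated memory controller.
    if cpu_socket == memory_socket:
        return []       # local access: no Intel QPI traversal
    return ["QPI"]      # remote access: one hop over the CPU-to-CPU link

local = path_new_architecture(cpu_socket=0, memory_socket=0)
remote = path_new_architecture(cpu_socket=0, memory_socket=1)
legacy = path_previous_generation(cpu_socket=0)
```

In this model only remote accesses consume Intel QPI bandwidth, which is why keeping data in memory local to the socket that uses it leaves more of the interconnect free for I/O and CPU-to-CPU transfers.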
Faster PCI Express Bus
To support higher-bandwidth I/O and the ability to scale across
multi-port 10GbE, the new architecture in Figure 3(b) uses a PCI
Express 2.0 (PCIe2) bus from the I/O Hub to the network interface card
(NIC). PCIe2 raises the transfer rate to 5.0 gigatransfers per second
(5.0GT/s), twice the 2.5GT/s rate of first-generation PCIe. These new
platforms also generally make more PCIe2 lanes available, which can
lead to more slots or wider slots. For example, a x8 PCIe2 slot is
capable of a theoretical peak of 4 GB/s per direction, and with more
lanes, better I/O scaling is possible, providing the ability to add
more ports and scale across multiple 10GbE Server Adapter ports.
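The 4 GB/s figure follows from the lane count, the transfer rate, and the line encoding. The sketch below assumes the 8b/10b encoding used by PCIe 1.x and 2.0 (8 payload bits for every 10 bits on the wire) and ignores packet and protocol overhead; the function and constant names are hypothetical.

```python
# Theoretical PCIe peak bandwidth per direction, ignoring protocol overhead.
# ASSUMPTION: 8b/10b line encoding, as used by PCIe 1.x and 2.0.

def pcie_peak_gbs(lanes, gt_per_s, encoding_efficiency=0.8):
    """Peak payload bandwidth per direction in GB/s."""
    bits_per_s = lanes * gt_per_s * encoding_efficiency  # in Gbit/s
    return bits_per_s / 8                                # 8 bits per byte

TEN_GBE_GBS = 10 / 8  # one 10GbE port at line rate: 1.25 GB/s per direction

gen1_x8 = pcie_peak_gbs(lanes=8, gt_per_s=2.5)
gen2_x8 = pcie_peak_gbs(lanes=8, gt_per_s=5.0)
ports_at_line_rate = gen2_x8 / TEN_GBE_GBS

print(f"x8 PCIe (2.5GT/s):  {gen1_x8:.1f} GB/s per direction")
print(f"x8 PCIe2 (5.0GT/s): {gen2_x8:.1f} GB/s per direction")
print(f"10GbE ports a x8 PCIe2 slot could feed in theory: {ports_at_line_rate:.1f}")
```

By this estimate a x8 PCIe2 slot has theoretical headroom for a dual-port 10GbE adapter at line rate in each direction, whereas a first-generation x8 slot offers about half of that. Real throughput is lower once transaction-layer overhead is counted.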
[Figure 3 diagram. Panel (a), Previous Generation: two CPUs, each with an L2C, a Front Side Bus, a Memory Controller with Memory, and an I/O Hub connected over PCIe to the NICs. Panel (b), New Architecture: two CPUs, each with an LLC and its own Memory Controller and Memory, Intel® QuickPath Interconnect (Intel® QPI), and an I/O Hub connected over PCIe2 (5.0GT/s) to the NICs.]
Figure 3. Comparison of previous generation architecture (a) to the new Intel® Microarchitecture, codenamed Nehalem, (b) used in the Intel® Xeon®
processor 5500 series.