Accelerating High-Speed Networking with Intel® I/O Acceleration Technology
It is also important to note that technologies already exist for
mitigating some of this CPU overhead. For example, all Intel®
PRO Server Adapters and Intel® PRO Network Connections (LAN
on motherboard, or LOM) include advanced features designed to
reduce CPU usage.
These features include:
• Interrupt moderation
• TCP checksum offload
• TCP segmentation
• Large send offload
Their implementation does not impair compatibility with other
network components, nor does it require special management or
modification of operating systems or applications. (The sketch
below illustrates how such offload settings can be inspected on
a Linux host.)
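As a purely illustrative example that is not part of the original paper, the following C sketch shows one way to query two of these offload settings on a Linux host through the legacy ethtool ioctl interface. The interface name eth0 is an assumption, and modern systems would more commonly use the ethtool utility or its netlink interface instead.

```c
/*
 * Hypothetical sketch: query NIC offload settings on Linux via the
 * legacy ethtool ioctl. The interface name "eth0" is an assumption.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

/* Issue one ETHTOOL_G* command and print whether the feature is on. */
static int query_offload(int fd, struct ifreq *ifr, __u32 cmd, const char *name)
{
    struct ethtool_value eval = { .cmd = cmd };

    ifr->ifr_data = (void *)&eval;
    if (ioctl(fd, SIOCETHTOOL, ifr) < 0) {
        perror(name);
        return -1;
    }
    printf("%s: %s\n", name, eval.data ? "on" : "off");
    return 0;
}

int main(void)
{
    struct ifreq ifr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);  /* any socket works for SIOCETHTOOL */

    if (fd < 0) {
        perror("socket");
        return 1;
    }

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);  /* assumed interface name */

    query_offload(fd, &ifr, ETHTOOL_GTXCSUM, "TX checksum offload");
    query_offload(fd, &ifr, ETHTOOL_GTSO,    "TCP segmentation offload");

    close(fd);
    return 0;
}
```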
Other approaches exist that claim to address system overhead by
offloading even more processing to the NIC. These approaches
include the TCP offload engine (TOE) and remote direct memory
access (RDMA).
TOE uses a specialized and dedicated processor on the NIC to handle
some of the packet protocol processing. It does not fully address the
other performance bottlenecks shown in Figure 2. In fact, for small
packet sizes and short-duration connections, TOE may be of very
limited benefit in addressing the overall I/O performance problem.
Additionally, because TOE offloads (in effect, copies) the network
stack to a fixed-speed microcontroller (the offload engine), not only
is performance limited to the speed of that microcontroller, but there
is also a risk that key network capabilities, such as adapter teaming
and failover, may not function in a TOE environment.
As for RDMA-enabled NICs (RNICs), the RDMA protocol supports
direct placement of packet payload data into an application’s memory
space. This addresses the memory access bottleneck category by
reducing data movement overhead for the CPU. However, the RDMA
protocol requires significant changes to the network stack and can
require changes to application software that utilizes the network.
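To make that application-level impact concrete, the hedged sketch below (not taken from the paper) uses the libibverbs API to show the explicit memory-registration step an RDMA-aware application must perform before an RNIC can place payload data directly into its buffers. The choice of the first available device and the 1 MB buffer size are illustrative assumptions.

```c
/*
 * Hypothetical sketch: explicit buffer registration with libibverbs,
 * a step that ordinary sockets applications never perform.
 * Build with: gcc rdma_reg.c -libverbs (assumes an RDMA-capable NIC).
 */
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void)
{
    struct ibv_device **devs = ibv_get_device_list(NULL);
    if (!devs || !devs[0]) {
        fprintf(stderr, "no RDMA device found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ctx ? ibv_alloc_pd(ctx) : NULL;
    if (!pd) {
        fprintf(stderr, "failed to open device or allocate protection domain\n");
        return 1;
    }

    /* The buffer the remote side will write into must be registered
     * with the NIC ahead of time so it can be pinned and addressed
     * directly by the adapter. */
    size_t len = 1 << 20;                     /* assumed 1 MB buffer */
    void *buf = malloc(len);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) {
        perror("ibv_reg_mr");
        return 1;
    }
    printf("registered %zu bytes, rkey=0x%x\n", len, mr->rkey);

    ibv_dereg_mr(mr);
    free(buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```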
The limitations of existing I/O acceleration solutions became even
clearer as Intel research and development teams began to quantify
each overhead category under different conditions. Figure 3
summarizes these results, showing how each task category
contributes to CPU utilization across a range of application I/O sizes.
Notice in Figure 3 that CPU usage by TCP/IP processing is nearly
constant across I/O sizes ranging from 2KB to 64KB. Although
TCP/IP processing is a significant bottleneck, it is not the most
significant bottleneck. Memory accesses account for more CPU
usage than TCP/IP processing in all cases, and system overhead
is the worst I/O bottleneck for application I/O sizes below 8KB.
As stated earlier and indicated by the data in Figure 3, TOE and
RDMA do not address the entire I/O bottleneck issue. What is needed
is a system-wide solution that can fit anywhere in the enterprise
computing hierarchy, requires no modification of application
software, and provides acceleration benefits for all three
network bottlenecks. Intel I/OAT, now available on the new
Dual-Core and Quad-Core Intel® Xeon® processor-based platforms,
is exactly that kind of solution.
[Figure 3 chart, "Defining the Worst I/O Bottleneck": CPU utilization by task category for application I/O sizes of 2KB, 4KB, 8KB, 16KB, 32KB, and 64KB.]
Figure 3. CPU utilization varies according to I/O size. TCP/IP
processing is fairly constant and tends to be a smaller part of CPU
utilization compared to system overhead.