Accelerating High-Speed Networking with Intel® I/O Acceleration Technology
It is also important to note that technologies already exist for
mitigating some of this CPU overhead. For example, all Intel®
PRO Server Adapters and Intel® PRO Network Connections (LAN
on motherboard, or LOM) include advanced features designed to
reduce CPU usage.
These features include:
• Interrupt moderation
• TCP checksum offload
• TCP segmentation
• Large send offload
Their implementation does not impair compatibility with other
network components, nor does it require special management or
modification of operating systems or applications. (The sketch
below illustrates how such offload settings can be inspected on
a Linux host.)
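As a purely illustrative example that is not part of the original paper, the following C sketch shows one way to query two of these offload settings on a Linux host through the legacy ethtool ioctl interface. The interface name eth0 is an assumption, and modern systems would more commonly use the ethtool utility or its netlink interface instead.

```c
/*
 * Hypothetical sketch: query NIC offload settings on Linux via the
 * legacy ethtool ioctl. The interface name "eth0" is an assumption.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

/* Issue one ETHTOOL_G* command and print whether the feature is on. */
static int query_offload(int fd, struct ifreq *ifr, __u32 cmd, const char *name)
{
    struct ethtool_value eval = { .cmd = cmd };

    ifr->ifr_data = (void *)&eval;
    if (ioctl(fd, SIOCETHTOOL, ifr) < 0) {
        perror(name);
        return -1;
    }
    printf("%s: %s\n", name, eval.data ? "on" : "off");
    return 0;
}

int main(void)
{
    struct ifreq ifr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);  /* any socket works for SIOCETHTOOL */

    if (fd < 0) {
        perror("socket");
        return 1;
    }

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);  /* assumed interface name */

    query_offload(fd, &ifr, ETHTOOL_GTXCSUM, "TX checksum offload");
    query_offload(fd, &ifr, ETHTOOL_GTSO,    "TCP segmentation offload");

    close(fd);
    return 0;
}
```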
Other approaches exist that claim to address system overhead by
offloading even more processing to the NIC. These approaches
include the TCP offload engine (TOE) and remote direct memory
access (RDMA).
TOE uses a specialized and dedicated processor on the NIC to handle
some of the packet protocol processing. It does not fully address the
other performance bottlenecks shown in Figure 2. In fact, for small
packet sizes and short-duration connections, TOE may be of very
limited benefit in addressing the overall I/O performance problem.
Additionally, because TOE offloads (in effect, copies) the network
stack to a fixed-speed microcontroller (the offload engine), not only
is performance limited to the speed of that microcontroller, but there
is also a risk that key network capabilities, such as adapter teaming
and failover, may not function in a TOE environment.
As for RDMA-enabled NICs (RNICs), the RDMA protocol supports
direct placement of packet payload data into an application’s memory
space. This addresses the memory access bottleneck category by
reducing data movement overhead for the CPU. However, the RDMA
protocol requires significant changes to the network stack and can
require changes to application software that utilizes the network.
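To make that application-level impact concrete, the hedged sketch below (not taken from the paper) uses the libibverbs API to show the explicit memory-registration step an RDMA-aware application must perform before an RNIC can place payload data directly into its buffers. The choice of the first available device and the 1 MB buffer size are illustrative assumptions.

```c
/*
 * Hypothetical sketch: explicit buffer registration with libibverbs,
 * a step that ordinary sockets applications never perform.
 * Build with: gcc rdma_reg.c -libverbs (assumes an RDMA-capable NIC).
 */
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void)
{
    struct ibv_device **devs = ibv_get_device_list(NULL);
    if (!devs || !devs[0]) {
        fprintf(stderr, "no RDMA device found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ctx ? ibv_alloc_pd(ctx) : NULL;
    if (!pd) {
        fprintf(stderr, "failed to open device or allocate protection domain\n");
        return 1;
    }

    /* The buffer the remote side will write into must be registered
     * with the NIC ahead of time so it can be pinned and addressed
     * directly by the adapter. */
    size_t len = 1 << 20;                     /* assumed 1 MB buffer */
    void *buf = malloc(len);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) {
        perror("ibv_reg_mr");
        return 1;
    }
    printf("registered %zu bytes, rkey=0x%x\n", len, mr->rkey);

    ibv_dereg_mr(mr);
    free(buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```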
The limitations of existing I/O acceleration solutions became even
clearer as Intel research and development teams began to quantify
each overhead category under different conditions. Figure 3
summarizes these results, showing how each task category
contributes to CPU utilization across a range of application I/O sizes.
Notice in Figure 3 that CPU usage by TCP/IP processing is nearly
constant across I/O sizes ranging from 2KB to 64KB. Although
TCP/IP processing is a significant bottleneck, it is not the most
significant bottleneck. Memory accesses account for more CPU
usage than TCP/IP processing in all cases, and system overhead
is the worst I/O bottleneck for application I/O sizes below 8KB.
As stated earlier and indicated by the data in Figure 3, TOE and
RDMA do not address the entire I/O bottleneck issue. What is needed
is a system-wide solution that can fit anywhere in the enterprise
computing hierarchy, requires no modification of application
software, and provides acceleration benefits for all three
network bottlenecks. Intel I/OAT, now available on the new
Dual-Core and Quad-Core Intel® Xeon® processor-based platforms,
is exactly that kind of solution.
[Figure 3 chart, "Defining the Worst I/O Bottleneck": CPU utilization by task category for application I/O sizes of 2KB, 4KB, 8KB, 16KB, 32KB, and 64KB.]
Figure 3. CPU utilization varies according to I/O size. TCP/IP
processing is fairly constant and tends to be a smaller part of CPU
utilization compared to system overhead.