Intel DMA Coalescing White Paper

Note: One impact of delaying interrupts
and DMA operations is an increase in
latency. Most (not all) applications are
quite tolerant of latency.
DMA coalescing is accomplished by using
the existing transmit and receive buffers
on the LAN device to store packets, rather
than immediately transferring packet data
to or from host memory (as current LAN
solutions do). Once either a set amount
of network data (called a watermark) has
been buffered, or a configurable timer
expires, the LAN device exits coalescing
mode and bursts data accesses and
interrupts to the platform.
DMA coalescing also enhances existing
interrupt moderation behavior by
throttling the observed device interrupt
rate in conjunction with the configurable
DMA coalescing timer rate. The interrupt
rate is governed by the Interrupt
Moderation Rate (ITR).
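
The exit conditions described above (watermark reached or timer expired) can be sketched as a small simulation. This is a toy model under stated assumptions, not the I350's actual logic: the class name, the byte-based watermark, the microsecond timer, and the choice to start the timer at the first buffered packet are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CoalescingBuffer:
    """Toy model of a LAN device buffer in coalescing mode (hypothetical)."""
    watermark_bytes: int   # assumed: flush once this many bytes are buffered
    timer_us: int          # assumed: flush once this long after the first packet
    _packets: List[int] = field(default_factory=list)
    _timer_start_us: int = 0

    def tick(self, now_us: int) -> List[int]:
        """Flush and return the buffered packets if the timer has expired."""
        if self._packets and now_us - self._timer_start_us >= self.timer_us:
            return self._flush()
        return []

    def receive(self, now_us: int, size: int) -> List[int]:
        """Buffer one packet; return any burst released to the host."""
        burst = self.tick(now_us)          # the timer may already have expired
        if not self._packets:
            self._timer_start_us = now_us  # assumed: timer starts at first packet
        self._packets.append(size)
        if sum(self._packets) >= self.watermark_bytes:
            burst += self._flush()         # watermark reached: burst to host
        return burst

    def _flush(self) -> List[int]:
        burst, self._packets = self._packets, []
        return burst
```

With a 3000-byte watermark and a 100 µs timer, two 1500-byte packets arriving 10 µs apart are released together by the watermark, while a lone 500-byte packet is held until the timer expires.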
Enable Functionality Only When Needed
With Intel PMT support for the ECMA-393
ProxZzzy specification, servers can
move to low-power standby states (such
as S3), maintain network presence, and
be remotely activated via a variety of
wakeup packet types.
Intel also supports Low-Power-Link-Up
(LPLU). This feature reduces link power
usage in S3 by negotiating the lowest
link speed (where bandwidth capacity isn’t
required).
DMA Coalescing Experiments & Testing
Experiments were performed to evaluate
the power-saving benefits of Intel PMTs
and their impact on network performance.
Intel PMTs scale to reduce power
consumption over a wide range of
network usage levels. (See Figure 3.)
At network usage below 5%, EEE
(802.3az) was most effective, since there
is more time to keep the link in a
low-power state. DMA coalescing showed
no significant benefit at such low usage
rates, since not much data is transferred
at those rates.
DMA coalescing is most effective in the
5% to 35% range, with maximum benefit
at 25% usage. Above 35%, power-saving
benefits decrease. Industry studies report
that most servers experience usage rates
of 20-35%, with only 10-15% of a 1 Gbps
link’s bandwidth used.
At higher usage, interrupt moderation
directly reduces platform power by
reducing overall CPU usage. This,
combined with the Intel I350’s low active
power, provides the active system power
benefit.
Experiments
Experiments using an Intel® Urbanna DP
platform were run as follows:
1. Vary the network load
2. Vary Interrupt Moderation Rate
3. Measure the platform power
4. Enable DMA Coalescing and vary the
DMA coalescing watchdog time
5. Fix the Interrupt Moderation Rate (ITR)
value
6. Measure the platform power
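
The six steps above can be read as two sweeps: a baseline over network load and ITR, then a DMA coalescing sweep over the watchdog time with ITR held fixed. The sketch below shows one way such a harness might be organized. It is not the tooling used for these experiments: `run_sweep` and the `measure_power` callback are hypothetical, and varying the network load in the second sweep as well is an assumption.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Optional, Tuple

# Key: (network load %, ITR setting, DMA coalescing timer or None if disabled)
Key = Tuple[int, int, Optional[int]]


def run_sweep(
    loads: Iterable[int],
    itrs: Iterable[int],
    measure_power: Callable[[int, int, Optional[int]], float],  # hypothetical
    dmac_timers: Iterable[int],
    fixed_itr: int,
) -> Dict[Key, float]:
    """Collect platform power for the baseline and DMA coalescing sweeps."""
    loads = list(loads)
    results: Dict[Key, float] = {}

    # Steps 1-3: vary load and ITR with DMA coalescing disabled.
    for load, itr in product(loads, itrs):
        results[(load, itr, None)] = measure_power(load, itr, None)

    # Steps 4-6: enable DMA coalescing, vary its timer, keep ITR fixed.
    for load, timer in product(loads, dmac_timers):
        results[(load, fixed_itr, timer)] = measure_power(load, fixed_itr, timer)

    return results
```

Keying each result by load, ITR, and timer keeps the baseline and coalescing measurements in one table, so the power delta at a given load can be read off by comparing the `None` entry with the timed entries.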
• Platform – Test setup
2 x 2.93 GHz quad-core Xeon® CPUs
(X5570)
12 GB (6 x 2048 MB) DDR3 1333 MHz
memory
BIOS defaults: enhanced C-states,
C6/Turbo/HT enabled
I350 development test adapter
Linux* 2.6.32 with the following
features enabled: tickless,
high_res_timers, hpet_timer, ondemand
CPU governor, PowerTOP timer_stats, and
PCI-ASPM.
• Manually force ASPM L1 on the network
adapter port.
• Network connection at 1 Gbps.
Figure 3
Intel® I350 Ethernet Controller & DMA Coalescing