Maximizing File Transfer Performance Using 10Gb Ethernet and Virtualization

Using the HPN-SSH layer to replace the
OpenSSH layer drives the throughput up to

threads for the cryptographic processing
and more than doubles application layer


If improving the bulk cryptographic
processing helps that much, what could


Without bulk cryptography, scp achieves

The performance gains in these cases,
however, rely on bypassing encryption


over the default OpenSSH layer that is
provided in most Linux distributions; this

According to a presentation created by
SLAC titled “P2P Data Copy Program
bbcp” (http://www.slac.stanford.edu/
grp/scs/paper/chep01-7-018.pdf), bbcp
encrypts sensitive passwords and control
information but does not encrypt the


without encrypting the bulk data, this can
still be an effective trade-off for many
environments where data privacy is not



the best results so far, surpassing even
the HPN-SSH cases with no cryptographic

better than the initial test results with



evaluating the use of HPN-SSH and bbcp

of the techniques tried so far, however,
has even come close to achieving 10
Gbps of throughputnot even reaching
0
10
20
30
40
50
60
70
80
90
100
NETPERF SCP
(SSH)
RSYNC
(SSH)
SCP
(HPN-SSH)
RSYNC
(HPN-SSH)
SCP
(HPN-SSH +
No Crypto)
RSYNC
(HPN-SSH +
No Crypto)
BBCP
0
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
Avg. CPU (%Util)
Receive Throughput (Mbps)
Native Linux*: Various File Copy Tools (8 streams vs. 1 stream)
Receive Throughput – 1 Stream Receive Throughput – 8 Streams Avg. CPU (%Util) – 1 Stream Avg. CPU (%Util) – 8 Streams

and FedEx engineering team focused on
identifying any bottlenecks preventing


of the tools to enhance the performance
through multi-threading, the team also
wanted to know if any other available

To gain a better understanding of the
performance issues, the engineering

streams to attempt to drive up aggregate

obtain better utilization of the platform



show that using eight parallel streams
overcomes the threading limits of these
tools and drives aggregate bulk encrypted





These results demonstrate that using
more parallelism dramatically scales up
Figure 6.

but the testing did not demonstrate eight
times the throughput when using eight

problem does not lie in simply using the
brute-force approach and running more

These results also show that bulk
encryption is expensive, in terms of

HPN-SSH, with its multi-threaded
cryptography, still provides a significant
benefit, but not as dramatic a benefit as

The results associated with the remaining


bulk cryptography, and the third case is
the eight-thread bbcp result in which bulk

results demonstrate that it is possible to
achieve nearly the same 10 Gbps line rate
throughput number as the netperf micro-


As this testing indicates, using multiple
parallel streams and disabling bulk
cryptographic processing is effective
