Maximizing File Transfer Performance Using 10Gb Ethernet and Virtualization
Figure 8 indicates that the throughput
in a VM is lower across all cases when
virtualized case, the throughput for the
scp and rsync running over standard ssh,
the throughput ranges from 300 Mbps to
Using the HPN-SSH layer to replace the
Also, disabling cryptography increases
the throughput, but not at as high a level
The same limitations that occurred in the
native case (such as standard tools not
being well threaded and cryptography
adding to the overhead) also apply in this
tools cannot take full advantage of
Most of the tools and utilities—including
ssh and scp—are single threaded; the
HPN-SSH layer to replace the OpenSSH
HPN-SSH, the cryptography operations are
multi-threaded (four threads), which boosts
threaded MAC layer, however, still creates
cryptography disabled, the performance
case with bbcp, which is multi-threaded
(using four threads by default), but the bulk
The next test uses eight parallel streams,
attempting to work around the threading
shows the receive network throughput
transfer tools when running eight parallel
Figure 8. ESX* 4.0 GA (1 VM with 8 vCPU): Various File Copy Tools (1 stream)
[Chart. Axes: Avg. CPU (%Util), 0 to 100; Receive Throughput (Mbps), 0 to 10,000. Series: Receive Throughput (Native and Virtualized); Avg. CPU %Util (Native and Virtualized). Tools compared: NETPERF, SCP (SSH), RSYNC (SSH), SCP (HPN-SSH), RSYNC (HPN-SSH), SCP (HPN-SSH + No Crypto), RSYNC (HPN-SSH + No Crypto), BBCP.]
Figure 9. ESX* 4.0 GA (1 VM with 8 vCPU): Various File Copy Tools (8 streams)
[Chart. Axes: Avg. CPU (%Util), 0 to 100; Receive Throughput (Mbps), 0 to 10,000. Series: Receive Throughput (Native and Virtualized); Avg. CPU %Util (Native and Virtualized). Tools compared: NETPERF, SCP (SSH), RSYNC (SSH), SCP (HPN-SSH), RSYNC (HPN-SSH), SCP (HPN-SSH + No Crypto), RSYNC (HPN-SSH + No Crypto), BBCP.]
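The eight-stream test runs several transfers at once so that no single-threaded tool becomes the only path for the data. The sketch below illustrates that pattern only and is not the test team's harness: `copy_one` and `parallel_copy` are hypothetical names, and `shutil.copyfile` is a local stand-in for a real transfer tool such as scp or bbcp, which would normally be launched as separate processes.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_one(src: Path, dst_dir: Path) -> int:
    """Copy one file into dst_dir and return the number of bytes written.

    shutil.copyfile is only a local placeholder for a network transfer tool.
    """
    dst = dst_dir / src.name
    shutil.copyfile(src, dst)
    return dst.stat().st_size


def parallel_copy(sources, dst_dir, streams: int = 8) -> int:
    """Run up to `streams` copies concurrently and return total bytes copied."""
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=streams) as pool:
        # Each source file is handed to its own worker, one stream per file.
        sizes = list(pool.map(lambda s: copy_one(Path(s), dst_dir), sources))
    return sum(sizes)
```

With eight source files and `streams=8`, each file gets its own worker, so one slow copy does not serialize the rest. Whether that raises aggregate throughput in practice depends on the tool, the cipher, and the available cores.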
that cryptography operations are a limiter
the last three cases, in which the copies
Even though these results are better
compared to relying on one-stream data,
rate (approximately 10 Gbps) achieved in
one VM with eight vCPUs, the test team
determined that this might be a good case
for using direct assignment of the 10G NIC