wire speed. Clearly, this software-only virtualization configuration is insufficient to support the high-performance computing demands of the NASA Center for Climate Simulation. On the other hand, the figures in the third column show that virtualizing I/O with the help of hardware acceleration drives throughput up considerably, although the highest throughput figures achieved in this test case are less than 65 percent of wire speed. The rightmost column of Table 2 shows dramatic throughput improvement in the virtualized environment when SR-IOV is utilized. In fact, the figures in this column approach those of the bare-metal case, indicating that a properly configured virtualized network can deliver throughput that is roughly equivalent to that of a non-virtualized one.
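For context on how such figures are typically derived, the short sketch below expresses a measured throughput as a percentage of wire speed. It assumes a 10-gigabit Ethernet link and uses placeholder numbers rather than the actual Table 2 measurements; only the arithmetic mirrors the comparison described above.

```python
# Illustrative sketch: express measured throughput as a percentage of wire
# speed. Assumes a 10-gigabit Ethernet link; the sample values below are
# placeholders, not the actual measurements from Table 2.

WIRE_SPEED_GBPS = 10.0  # assumed link rate


def percent_of_wire_speed(throughput_gbps: float) -> float:
    """Return measured throughput as a percentage of the link's wire speed."""
    return 100.0 * throughput_gbps / WIRE_SPEED_GBPS


# Hypothetical throughput figures for the three test configurations (Gbps).
samples = {
    "VM to VM (virtualized I/O, no SR-IOV)": 1.2,
    "VM to VM (with SR-IOV)": 9.0,
    "Bare metal to bare metal": 9.4,
}

for config, gbps in samples.items():
    print(f"{config}: {percent_of_wire_speed(gbps):.0f}% of wire speed")
```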
To expand on the Nuttcp results, the test team performed trials on the other two benchmarks with different message sizes. Figure 1 shows throughput (left chart) and latency (right chart) results for the Ohio State MPI benchmark. Surprisingly, the test configuration that uses SR-IOV actually outperforms the bare-metal one. The test team postulates that this performance differential is due to inefficiencies in the Linux* kernel that are overcome by direct assignment under SR-IOV. In any event, this test result does support the finding above that, in some cases, virtualized performance with SR-IOV can be comparable to equivalent non-virtualized performance.
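To make the measurement pattern concrete, here is a minimal ping-pong sketch written with mpi4py. It is not the Ohio State benchmark itself (the published suite provides its own C implementations, such as osu_bw and osu_latency); it only illustrates how throughput and one-way latency can be sampled across a range of message sizes between two MPI ranks.

```python
# Minimal ping-pong sketch in the spirit of the OSU point-to-point tests.
# Run with two ranks, for example: mpirun -np 2 python pingpong.py
import time

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
ITERATIONS = 100

for size in (1, 1024, 65536, 1 << 20, 4 << 20):  # message sizes in bytes
    buf = np.zeros(size, dtype=np.uint8)
    comm.Barrier()
    start = time.perf_counter()
    for _ in range(ITERATIONS):
        if rank == 0:
            comm.Send(buf, dest=1, tag=0)
            comm.Recv(buf, source=1, tag=1)
        elif rank == 1:
            comm.Recv(buf, source=0, tag=0)
            comm.Send(buf, dest=0, tag=1)
    elapsed = time.perf_counter() - start
    if rank == 0:
        mbytes_moved = 2 * ITERATIONS * size / 1e6  # both directions
        one_way_us = elapsed / (2 * ITERATIONS) * 1e6
        print(f"{size:>8} bytes: {mbytes_moved / elapsed:10.1f} MB/s, "
              f"{one_way_us:8.1f} us one-way (round-trip / 2)")
```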
SINGLE-ROOT I/O VIRTUALIZATION (SR-IOV) DEFINED
Supported by Intel® Ethernet Server Adapters, SR-IOV is a standard mechanism for devices to advertise their ability to be simultaneously shared among multiple virtual machines (VMs). SR-IOV allows for the partitioning of a PCI function into many virtual functions (VFs) for the purpose of sharing resources in virtual or non-virtual environments. Each VF can support a unique and separate data path for I/O-related functions, so, for example, the bandwidth of a single physical port can be partitioned into smaller slices that may be allocated to specific VMs or guests.
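As a rough illustration of how this partitioning is exposed on a Linux host, the sketch below reads and writes the standard sysfs attributes sriov_totalvfs and sriov_numvfs for a network interface. The interface name is a placeholder, root privileges are required, and the exact provisioning and VF-assignment workflow depends on the adapter driver and hypervisor in use; it is not a description of the test team's configuration.

```python
# Sketch: enable SR-IOV virtual functions (VFs) on a Linux host via sysfs.
# Assumes an SR-IOV-capable adapter whose physical function appears as IFACE
# and that the script runs as root. Assigning the resulting VFs to guests is
# done separately by the hypervisor (for example, via PCI passthrough).
from pathlib import Path

IFACE = "eth2"  # placeholder name for the SR-IOV-capable physical port
dev = Path(f"/sys/class/net/{IFACE}/device")

total_vfs = int((dev / "sriov_totalvfs").read_text())
print(f"{IFACE} supports up to {total_vfs} virtual functions")

# Request a number of VFs. If VFs are already enabled, the kernel requires
# writing 0 first before a different non-zero value is accepted.
requested = min(4, total_vfs)
(dev / "sriov_numvfs").write_text(str(requested))

# Each VF now appears as its own PCI device, linked as virtfn0, virtfn1, ...
for vf_link in sorted(dev.glob("virtfn*")):
    print(vf_link.name, "->", vf_link.resolve().name)
```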
Finally, the test team considered the results of throughput testing with the Intel MKL implementation of LINPACK, as shown in Figure 2. Here, while the SR-IOV implementation increases performance relative to the non-SR-IOV case, its performance is somewhat lower than
[Figure 1 charts omitted: "Ohio State University MPI Benchmarks." The left panel plots throughput in MBytes/sec against message size in bytes (higher is better); the right panel plots latency against message size (lower is better). Each panel compares VM to VM (with SR-IOV), bare metal to bare metal, and VM to VM (with virtualized I/O but without SR-IOV).]

Figure 1. Virtualized and non-virtualized performance results for the Ohio State University MPI benchmark.