wire speed. Clearly, this software-only virtualization configuration is insufficient to support the high-performance computing demands of the NASA Center for Climate Simulation. On the other hand, the figures in the third column show that virtualizing I/O with the help of hardware acceleration drives throughput up considerably, although the highest throughput figures achieved in this test case are less than 65 percent of wire speed. The rightmost column of Table 2 shows dramatic throughput improvement in the virtualized environment when SR-IOV is utilized. In fact, the figures in this column approach those of the bare-metal case, indicating that a properly configured virtualized network can deliver throughput that is roughly equivalent to that of a non-virtualized one.
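For context on how such figures are typically derived, the short sketch below expresses a measured throughput as a percentage of wire speed. It assumes a 10-gigabit Ethernet link and uses placeholder numbers rather than the actual Table 2 measurements; only the arithmetic mirrors the comparison described above.

```python
# Illustrative sketch: express measured throughput as a percentage of wire
# speed. Assumes a 10-gigabit Ethernet link; the sample values below are
# placeholders, not the actual measurements from Table 2.

WIRE_SPEED_GBPS = 10.0  # assumed link rate


def percent_of_wire_speed(throughput_gbps: float) -> float:
    """Return measured throughput as a percentage of the link's wire speed."""
    return 100.0 * throughput_gbps / WIRE_SPEED_GBPS


# Hypothetical throughput figures for the three test configurations (Gbps).
samples = {
    "VM to VM (virtualized I/O, no SR-IOV)": 1.2,
    "VM to VM (with SR-IOV)": 9.0,
    "Bare metal to bare metal": 9.4,
}

for config, gbps in samples.items():
    print(f"{config}: {percent_of_wire_speed(gbps):.0f}% of wire speed")
```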
To expand on the Nuttcp results, the test team performed trials on the other two benchmarks with different message sizes. Figure 1 shows throughput (left chart) and latency (right chart) results for the Ohio State MPI benchmark. Surprisingly, the test configuration that uses SR-IOV actually outperforms the bare-metal one. The test team postulates that this performance differential is due to inefficiencies in the Linux* kernel that are overcome by direct assignment under SR-IOV. In any event, this test result does support the finding above that, in some cases, virtualized performance with SR-IOV can be comparable to equivalent non-virtualized performance.
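To make the measurement pattern concrete, here is a minimal ping-pong sketch written with mpi4py. It is not the Ohio State benchmark itself (the published suite provides its own C implementations, such as osu_bw and osu_latency); it only illustrates how throughput and one-way latency can be sampled across a range of message sizes between two MPI ranks.

```python
# Minimal ping-pong sketch in the spirit of the OSU point-to-point tests.
# Run with two ranks, for example: mpirun -np 2 python pingpong.py
import time

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
ITERATIONS = 100

for size in (1, 1024, 65536, 1 << 20, 4 << 20):  # message sizes in bytes
    buf = np.zeros(size, dtype=np.uint8)
    comm.Barrier()
    start = time.perf_counter()
    for _ in range(ITERATIONS):
        if rank == 0:
            comm.Send(buf, dest=1, tag=0)
            comm.Recv(buf, source=1, tag=1)
        elif rank == 1:
            comm.Recv(buf, source=0, tag=0)
            comm.Send(buf, dest=0, tag=1)
    elapsed = time.perf_counter() - start
    if rank == 0:
        mbytes_moved = 2 * ITERATIONS * size / 1e6  # both directions
        one_way_us = elapsed / (2 * ITERATIONS) * 1e6
        print(f"{size:>8} bytes: {mbytes_moved / elapsed:10.1f} MB/s, "
              f"{one_way_us:8.1f} us one-way (round-trip / 2)")
```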
SINGLE-ROOT I/O VIRTUALIZATION (SR-IOV) DEFINED
Supported by Intel® Ethernet Server Adapters, SR-IOV is a standard mechanism for devices to advertise their ability to be simultaneously shared among multiple virtual machines (VMs). SR-IOV allows for the partitioning of a PCI function into many virtual functions (VFs) for the purpose of sharing resources in virtual or non-virtual environments. Each VF can support a unique and separate data path for I/O-related functions, so, for example, the bandwidth of a single physical port can be partitioned into smaller slices that may be allocated to specific VMs or guests.
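As a rough illustration of how this partitioning is exposed on a Linux host, the sketch below reads and writes the standard sysfs attributes sriov_totalvfs and sriov_numvfs for a network interface. The interface name is a placeholder, root privileges are required, and the exact provisioning and VF-assignment workflow depends on the adapter driver and hypervisor in use; it is not a description of the test team's configuration.

```python
# Sketch: enable SR-IOV virtual functions (VFs) on a Linux host via sysfs.
# Assumes an SR-IOV-capable adapter whose physical function appears as IFACE
# and that the script runs as root. Assigning the resulting VFs to guests is
# done separately by the hypervisor (for example, via PCI passthrough).
from pathlib import Path

IFACE = "eth2"  # placeholder name for the SR-IOV-capable physical port
dev = Path(f"/sys/class/net/{IFACE}/device")

total_vfs = int((dev / "sriov_totalvfs").read_text())
print(f"{IFACE} supports up to {total_vfs} virtual functions")

# Request a number of VFs. If VFs are already enabled, the kernel requires
# writing 0 first before a different non-zero value is accepted.
requested = min(4, total_vfs)
(dev / "sriov_numvfs").write_text(str(requested))

# Each VF now appears as its own PCI device, linked as virtfn0, virtfn1, ...
for vf_link in sorted(dev.glob("virtfn*")):
    print(vf_link.name, "->", vf_link.resolve().name)
```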
Finally, the test team considered the results of throughput testing with the Intel MKL implementation of LINPACK, as shown in Figure 2. Here, while the SR-IOV implementation increases performance relative to the non-SR-IOV case, its performance is somewhat lower than
[Figure 1 charts omitted: "Ohio State University MPI Benchmarks." The left panel plots throughput in MBytes/sec against message size in bytes (higher is better); the right panel plots latency against message size (lower is better). Each panel compares VM to VM (with SR-IOV), bare metal to bare metal, and VM to VM (with virtualized I/O but without SR-IOV).]

Figure 1. Virtualized and non-virtualized performance results for the Ohio State University MPI benchmark.