High-Performance Cluster for Biomedical Research Using 10 Gigabit Ethernet iWARP Fabric

The iWARP protocol was developed to perform within an Ethernet infrastructure, and thus does not require any modifications to existing Ethernet networks or equipment. At the same time, iWARP’s Ethernet compatibility enables IT organizations to take advantage of enhancements to Ethernet, such as Data Center Bridging, low-latency switches, and IP security.
Standard Ethernet switches and routers carry iWARP traffic over existing TCP/IP protocols. Because iWARP is layered over TCP, network equipment does not need to process the iWARP layer, nor does it require any special-purpose functionality. This enables the use of industry-accepted management consoles that use existing IP management protocols. The OpenFabrics Alliance (www.openfabrics.org) provides an open source RDMA software stack for iWARP that is both hardware-agnostic and application-agnostic. These characteristics allow iWARP to be readily integrated into existing environments while meeting stringent cost and performance requirements.
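
As a minimal sketch of how an application sits on that stack (this is not code from the project described here), the following C fragment uses the librdmacm connection-manager interface distributed by the OpenFabrics Alliance; the peer name compute-node-01, the port 7471, and the queue depths are placeholder assumptions.

    /*
     * Minimal RDMA client sketch built on the OpenFabrics librdmacm API.
     * The peer name, port, and queue depths below are placeholders.
     * Build (typical): gcc rdma_client.c -o rdma_client -lrdmacm -libverbs
     */
    #include <stdio.h>
    #include <string.h>
    #include <rdma/rdma_cma.h>

    int main(void)
    {
        struct rdma_addrinfo hints, *res = NULL;
        struct ibv_qp_init_attr qp_attr;
        struct rdma_cm_id *id = NULL;

        memset(&hints, 0, sizeof(hints));
        hints.ai_port_space = RDMA_PS_TCP;   /* iWARP uses the TCP port space */

        /* Resolve the peer; "compute-node-01" and "7471" are assumptions. */
        if (rdma_getaddrinfo("compute-node-01", "7471", &hints, &res)) {
            perror("rdma_getaddrinfo");
            return 1;
        }

        memset(&qp_attr, 0, sizeof(qp_attr));
        qp_attr.cap.max_send_wr = qp_attr.cap.max_recv_wr = 4;
        qp_attr.cap.max_send_sge = qp_attr.cap.max_recv_sge = 1;
        qp_attr.qp_type = IBV_QPT_RC;        /* reliably connected queue pair */

        /* Create an endpoint on whichever RDMA-capable device routes to the
         * destination; the same call works for iWARP or InfiniBand adapters. */
        if (rdma_create_ep(&id, res, NULL, &qp_attr)) {
            perror("rdma_create_ep");
            rdma_freeaddrinfo(res);
            return 1;
        }

        /* Establish the connection. Over iWARP this rides on ordinary TCP/IP,
         * so standard Ethernet switches and routers forward it unmodified. */
        if (rdma_connect(id, NULL)) {
            perror("rdma_connect");
            rdma_destroy_ep(id);
            rdma_freeaddrinfo(res);
            return 1;
        }

        puts("RDMA connection established");
        rdma_disconnect(id);
        rdma_destroy_ep(id);
        rdma_freeaddrinfo(res);
        return 0;
    }

Because the hardware-specific details stay below this interface, the same application code runs over any vendor's iWARP adapter, which is the hardware-agnostic property noted above.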
Performance and Scalability Results
Using this cluster in the lab with the HPL benchmark running on 4,000 cores, project engineers attained performance of 35.81 TeraFLOPS at 84.14 percent efficiency, as shown in Figure 3. The HPL problem size used was 1,200,000, and the problem size necessary to achieve half the performance (N/2 problem size) was 300,000. Importantly, the performance data scales in a nearly linear fashion as the number of cores applied to the benchmark workload increases.
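
For context, simple arithmetic recovers two figures the text does not state explicitly: the theoretical peak implied by the measured efficiency, and the double-precision memory footprint implied by the HPL problem size (the per-core values assume an even split across the 4,000 cores).

\[
R_{\text{peak}} = \frac{R_{\max}}{\text{efficiency}} = \frac{35.81\ \text{TFLOPS}}{0.8414} \approx 42.6\ \text{TFLOPS} \approx 10.6\ \text{GFLOPS per core}
\]
\[
\text{HPL matrix footprint} = 8N^{2}\ \text{bytes} = 8 \times \left(1.2\times 10^{6}\right)^{2} \approx 11.5\ \text{TB} \approx 2.9\ \text{GB per core}
\]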
From an engineering perspective, the linearity of scaling in the results helps ensure the viability of the topology for large-scale computational problems. This cluster demonstrates the best efficiency of any Ethernet-based solution among the systems on the June 2010 Top 500¹ list, as well as a placement within the range of the top 100 supercomputers on that list for efficiency overall. Moreover, because the data does not show an obvious drop-off in efficiency at this cluster size, it suggests that the solution is scalable beyond the size shown here, although that hypothesis would need to be tested to verify its validity. From a budgetary perspective, the results demonstrate that each compute node added to the cluster, up to at least 500 nodes, provides value commensurate with the overall cost of the cluster.

Figure 3. As measured using the HPL (High-Performance LINPACK) benchmark, the cluster achieves performance of 35.81 TeraFLOPS at 84.14 percent efficiency using iWARP and 10 Gigabit Ethernet. (The chart plots performance in GF/s and efficiency in percent against the number of cores.)
These performance and efficiency results must be considered in the context that this cluster configuration oversubscribes the connections to the Arista 7xxx switches by a factor of 2.475 to 1. Making additional connections from the racks to the network fabric using free ports to reduce the oversubscription could potentially result in higher performance. This is a possible area for future inquiry.
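
For readers unfamiliar with the term, the oversubscription ratio compares aggregate node-facing bandwidth with aggregate uplink bandwidth at the edge of the fabric; the port counts in the example below are hypothetical, chosen only to reproduce the stated 2.475:1 figure.

\[
\text{oversubscription} = \frac{\text{node-facing bandwidth}}{\text{uplink bandwidth}}, \qquad \text{for example}\ \frac{99 \times 10\ \text{GbE}}{40 \times 10\ \text{GbE}} = 2.475 : 1
\]

Cabling free ports as additional uplinks increases the denominator, moving the ratio toward 1:1.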
June 2010 Top 500 Entry:
Performance (Rmax): 35.81 TeraFLOPS
Rank: #208
Efficiency (Rmax ÷ Rpeak): 84.14%
Rank: #84