
1 Introduction
The High Performance Computing (HPC) domain primarily deals with problems that surpass the capabilities of a standalone machine. With the advent of parallel programming, applications can scale past a single server. High performance interconnects provide the low latency and high bandwidth needed for an application to divide a computational problem among multiple nodes, distribute data, and then merge the partial results from each node into a final result. As computational power increases with more nodes/cores added to the cluster configuration, efficient and fast communication becomes essential to continue improving system performance. Applications may be sensitive to the throughput and/or latency capabilities of the interconnect, depending on their communication characteristics.
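As a minimal sketch of this divide-distribute-merge pattern expressed with MPI (an illustration only; the array size, the simple sum, the node names and the rank count are arbitrary assumptions, not taken from this study), rank 0 scatters a data set across the nodes, each node computes a partial result, and the interconnect carries the data movement in MPI_Scatter and MPI_Reduce:

/* partial_sum.c - sketch of the divide/distribute/merge pattern described above.
 * Build and run (node names and rank count are placeholders):
 *   mpicc partial_sum.c -o partial_sum
 *   mpirun -np 4 -host node1,node2,node3,node4 ./partial_sum
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int chunk = 1000;                /* elements per rank (arbitrary choice)  */
    double *data = NULL;
    if (rank == 0) {                       /* rank 0 holds the full data set        */
        data = malloc((size_t)chunk * size * sizeof(double));
        for (int i = 0; i < chunk * size; i++)
            data[i] = 1.0;                 /* dummy input values                    */
    }

    /* distribute one chunk to every node over the interconnect */
    double *local = malloc(chunk * sizeof(double));
    MPI_Scatter(data, chunk, MPI_DOUBLE, local, chunk, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    double partial = 0.0;                  /* each node computes its partial result */
    for (int i = 0; i < chunk; i++)
        partial += local[i];

    /* merge the partial results from all nodes into the final result on rank 0 */
    double total = 0.0;
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("total = %.1f (expected %d)\n", total, chunk * size);

    free(local);
    free(data);
    MPI_Finalize();
    return 0;
}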
Intel® Omni-Path Architecture (OPA) is an evolution of the Intel® True Scale Fabric, the Cray Aries interconnect [1] and internal Intel® IP. In contrast to Intel® True Scale Fabric edge switches, which support 36 ports of InfiniBand QDR (40 Gbps) performance, the new Intel® Omni-Path fabric edge switches support 48 ports of 100 Gbps performance. The switching latency of True Scale edge switches is 165-175 ns, while the switching latency of the 48-port Omni-Path edge switch has been reduced to around 100-110 ns. The Omni-Path Host Fabric Interface (HFI) is expected to provide an MPI messaging rate of around 160 million messages per second (Mmps) and a link bandwidth of 100 Gbps.
The OPA technology includes a rich feature set. A few of those are described here [1]:
Dynamic Lane Scaling: When one or more physical lanes fail, the fabric continues to function with the remaining available lanes, and the recovery process is transparent to the user and application. This allows jobs to continue running and provides the flexibility to troubleshoot errors at a later time.
Adaptive Routing: Adaptive routing monitors the routing paths of all the fabrics connected to the switch and selects the least congested path to balance the workload. This implementation is based on cooperation between the application-specific integrated circuits (ASICs) and the Fabric Manager. The Fabric Manager performs the role of initializing the fabrics and setting up the routing tables; once this is done, the ASICs actively monitor and manage the routing by identifying fabric congestion. This feature helps the fabric to scale.
Dispersive Routing: Initializing and configuring the routes between the neighboring nodes of the fabric is always critical. Dispersive routing distributes traffic across multiple paths, as opposed to sending it to the destination via a single path. It helps to achieve maximum communication performance for the workload and promotes optimal fabric efficiency.
Traffic Flow Optimization: Helps to prioritize packets in mixed traffic environments, such as storage and MPI. This helps to ensure that high-priority packets are not delayed and that there is little or no latency variation for the MPI job. Traffic can also be shaped at run time by using congestion control and adaptive routing.
Software Ecosystem: It leverages the OpenFabrics Alliance (OFA) [2] and uses a next-generation Performance Scaled Messaging (PSM) layer called PSM2, which supports extreme scale while remaining compatible with previous-generation PSM applications. OPA also includes a software suite with extensive capabilities for monitoring and managing the fabric.
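For reference, applications do not usually call PSM2 directly; the MPI library selects it as its messaging layer at launch time. The sketch below assumes an Open MPI build that includes the psm2 MTL component; other MPI implementations expose PSM2 through different options, and the host names are placeholders.

/* opa_hello.c - a minimal MPI program with no PSM2-specific code.
 * The MPI library chooses PSM2 as its messaging layer at launch time,
 * for example (assuming an Open MPI build with the psm2 MTL component;
 * host names are placeholders):
 *   mpicc opa_hello.c -o opa_hello
 *   mpirun --mca pml cm --mca mtl psm2 -np 2 -host node1,node2 ./opa_hello
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("rank %d of %d initialized over the fabric\n", rank, size);
    MPI_Finalize();
    return 0;
}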
On-load and Offload Model: Intel Omni-Path can support both on-load and offload models depending on
the data characteristics. There are two methods of sending data from one host to another.