
1 Introduction
The High Performance Computing (HPC) domain primarily deals with problems that surpass the capabilities of a standalone machine. With the advent of parallel programming, applications can scale past a single server. High performance interconnects provide the low latency and high bandwidth needed for an application to divide a computational problem among multiple nodes, distribute data, and then merge the partial results from each node into a final result. As computational power increases with more nodes/cores added to the cluster configuration, efficient and fast communication becomes essential to continue improving system performance. Applications may be sensitive to the throughput and/or latency capabilities of the interconnect, depending on their communication characteristics.
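As a minimal sketch of this divide-distribute-merge pattern expressed with MPI (an illustration only; the array size, the simple sum, the node names and the rank count are arbitrary assumptions, not taken from this study), rank 0 scatters a data set across the nodes, each node computes a partial result, and the interconnect carries the data movement in MPI_Scatter and MPI_Reduce:

/* partial_sum.c - sketch of the divide/distribute/merge pattern described above.
 * Build and run (node names and rank count are placeholders):
 *   mpicc partial_sum.c -o partial_sum
 *   mpirun -np 4 -host node1,node2,node3,node4 ./partial_sum
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int chunk = 1000;                /* elements per rank (arbitrary choice)  */
    double *data = NULL;
    if (rank == 0) {                       /* rank 0 holds the full data set        */
        data = malloc((size_t)chunk * size * sizeof(double));
        for (int i = 0; i < chunk * size; i++)
            data[i] = 1.0;                 /* dummy input values                    */
    }

    /* distribute one chunk to every node over the interconnect */
    double *local = malloc(chunk * sizeof(double));
    MPI_Scatter(data, chunk, MPI_DOUBLE, local, chunk, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    double partial = 0.0;                  /* each node computes its partial result */
    for (int i = 0; i < chunk; i++)
        partial += local[i];

    /* merge the partial results from all nodes into the final result on rank 0 */
    double total = 0.0;
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("total = %.1f (expected %d)\n", total, chunk * size);

    free(local);
    free(data);
    MPI_Finalize();
    return 0;
}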
Intel® Omni-Path Architecture (OPA) is an evolution of the Intel® True Scale Fabric, the Cray Aries interconnect [1] and internal Intel® IP. In contrast to Intel® True Scale Fabric edge switches, which support 36 ports of InfiniBand QDR (40 Gbps) performance, the new Intel® Omni-Path fabric edge switches support 48 ports of 100 Gbps performance. The switching latency of True Scale edge switches is 165-175 ns, while the switching latency of the 48-port Omni-Path edge switch has been reduced to around 100-110 ns. The Omni-Path Host Fabric Interface (HFI) is expected to provide an MPI messaging rate of around 160 million messages per second (Mmps) and a link bandwidth of 100 Gbps.
The OPA technology includes a rich feature set. A few of those are described here [1]:
Dynamic Lane Scaling: When one or more physical lanes fail, the fabric continues to function with the remaining available lanes, and the recovery process is transparent to the user and application. This allows jobs to continue running and provides the flexibility to troubleshoot errors at a later time.
Adaptive Routing: Adaptive routing monitors the routing paths of all the fabrics connected to the switch and selects the least congested path to balance the workload. This implementation is based on cooperation between the application-specific integrated circuits (ASICs) and the Fabric Manager. The Fabric Manager performs the role of initializing the fabrics and setting up the routing tables; once this is done, the ASICs actively monitor and manage the routing by identifying fabric congestion. This feature helps the fabric to scale.
Dispersive Routing: Initializing and configuring the routes between the neighboring nodes of the fabric is always critical. Dispersive routing distributes traffic across multiple paths, as opposed to sending it to the destination via a single path. It helps to achieve maximum communication performance for the workload and promotes optimal fabric efficiency.
Traffic Flow Optimization: Helps to prioritize packets in mixed traffic environments, such as storage and MPI. This helps to ensure that high-priority packets are not delayed and that there is little or no latency variation for the MPI job. Traffic can also be shaped at run time by using congestion control and adaptive routing.
Software Ecosystem: It leverages the OpenFabrics Alliance (OFA) [2] and uses a next-generation Performance Scaled Messaging (PSM) layer called PSM2, which supports extreme scale while remaining compatible with previous-generation PSM applications. OPA also includes a software suite with extensive capabilities for monitoring and managing the fabric.
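For reference, applications do not usually call PSM2 directly; the MPI library selects it as its messaging layer at launch time. The sketch below assumes an Open MPI build that includes the psm2 MTL component; other MPI implementations expose PSM2 through different options, and the host names are placeholders.

/* opa_hello.c - a minimal MPI program with no PSM2-specific code.
 * The MPI library chooses PSM2 as its messaging layer at launch time,
 * for example (assuming an Open MPI build with the psm2 MTL component;
 * host names are placeholders):
 *   mpicc opa_hello.c -o opa_hello
 *   mpirun --mca pml cm --mca mtl psm2 -np 2 -host node1,node2 ./opa_hello
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("rank %d of %d initialized over the fabric\n", rank, size);
    MPI_Finalize();
    return 0;
}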
On-load and Offload Model: Intel Omni-Path can support both on-load and offload models depending on
the data characteristics. There are two methods of sending data from one host to another.