By default, the transports (TLS) used are: MXM_TLS=self,shm,ud
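For example, to set the transport list explicitly, export the variable to all ranks at launch time. The command below is illustrative only and assumes an Open MPI-style launcher (the -x option exports an environment variable); the set of supported transports varies by MXM release:

    % mpirun -x MXM_TLS=self,shm,ud ./my_app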
5.4.7 Configuring Service Level Support
Service Level Support is currently at alpha level.
Please be aware that the content below is subject to change.
MXM v3.0 added support for Service Level to enable Quality of Service (QoS). When enabled, every InfiniBand endpoint in MXM generates a random Service Level (SL) within the configured range and uses it for outbound communication.
The range is set via the following environment parameter:
MXM_IB_NUM_SLS
Valid values are 1-16; the default is 1.
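For example, the following hypothetical command line allows MXM to randomize across the first 8 service levels (again assuming an Open MPI-style launcher that exports environment variables with -x):

    % mpirun -x MXM_IB_NUM_SLS=8 ./my_app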
5.5 Fabric Collective Accelerator (FCA)
To meet the needs of scientific research and engineering simulations, supercomputers are growing at an unrelenting rate. As supercomputers increase in size from mere thousands to hundreds of thousands of processor cores, new performance and scalability challenges have emerged. In the past, performance tuning of parallel applications could be accomplished fairly easily by separately optimizing their algorithms, communication, and computational aspects. However, as systems continue to scale to larger machines, these issues become co-mingled and must be addressed comprehensively.
Collective communications execute global communication operations to couple all processes/nodes in the system and therefore must be executed as quickly and as efficiently as possible. Indeed, the scalability of most scientific and engineering applications is bound by the scalability and performance of the collective routines employed. Most current implementations of collective operations suffer from the effects of system noise at extreme scale (system noise increases the latency of collective operations by amplifying the effect of small, randomly occurring OS interrupts during collective progression). Furthermore, collective operations consume a significant fraction of CPU cycles, cycles that could be better spent doing meaningful computation.
Mellanox Technologies has addressed these two issues, lost CPU cycles and performance lost to the effects of system noise, by offloading the communications to the host channel adapters (HCAs) and switches. The technology, named CORE-Direct® (Collectives Offload Resource Engine), provides the most advanced solution available for handling collective operations, thereby ensuring maximal scalability and minimal CPU overhead, and providing the capability to overlap communication operations with computation, allowing applications to maximize asynchronous communication.
Users may benefit immediately from CORE-Direct® out-of-the-box by simply specifying the necessary BCOL/SBGP combinations, as shown in the examples below. To take maximum advantage of CORE-Direct®, users may modify their applications to use MPI 3.0 non-blocking routines while using CORE-Direct® to offload the collective "under the covers", thereby allowing maximum opportunity to overlap communication with computation.
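For illustration only, a hypothetical Open MPI command line that selects BCOL/SBGP combinations via MCA parameters; the component names shown are examples, and the supported combinations should be taken from the FCA release notes for your installation:

    % mpirun -np 128 -mca sbgp basesmuma,ibnet -mca bcol basesmuma,iboffload ./my_app

The sketch below shows the application-side pattern referred to above: a standard MPI 3.0 non-blocking collective is started, independent computation proceeds while the operation progresses (offloaded to the HCA when CORE-Direct® is active), and the result is consumed only after completion. This is plain MPI-3 code, not an FCA-specific API; do_independent_work() is a placeholder for application computation:

    #include <mpi.h>
    #include <stdio.h>

    /* Placeholder for computation that does not depend on the
     * result of the collective operation. */
    static void do_independent_work(void) { /* ... */ }

    int main(int argc, char **argv)
    {
        double local = 1.0, global = 0.0;
        MPI_Request req;

        MPI_Init(&argc, &argv);

        /* Start a non-blocking reduction; with CORE-Direct the
         * collective progression can be offloaded to the HCA. */
        MPI_Iallreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                       MPI_COMM_WORLD, &req);

        /* Overlap computation with the in-flight collective. */
        do_independent_work();

        /* The result is valid only after completion. */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        printf("global sum = %f\n", global);

        MPI_Finalize();
        return 0;
    }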
Additionally, FCA 3.0 contains support for building runtime-configurable hierarchical collectives. Socket- and UMA-level discovery is currently supported, with network topology awareness slated for future versions.
As with FCA 2.X we also provide the ability to accelerate collectives with hard-