“get” operations - data transfer from a different PE, and remote pointers, allowing direct references to data objects owned by another PE.

Additional supported operations are collective broadcast and reduction, barrier synchronization, and atomic memory operations. An atomic memory operation is an atomic read-and-update operation, such as a fetch-and-increment, on a remote or local data object.
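The following is a minimal sketch of an atomic fetch-and-increment, written against the standard OpenSHMEM C API (shmem_init and shmem_long_finc; older releases may instead expose the start_pes(0) entry point). Every PE atomically increments a counter owned by PE 0, and the call returns the value the counter held before the update.

#include <stdio.h>
#include <shmem.h>

static long counter = 0;              /* symmetric: one instance per PE */

int main(void)
{
    shmem_init();
    /* Atomically fetch-and-increment the counter that lives on PE 0. */
    long before = shmem_long_finc(&counter, 0);
    shmem_barrier_all();              /* wait until every PE has updated */
    if (shmem_my_pe() == 0)
        printf("final counter = %ld (this PE saw %ld before its increment)\n",
               counter, before);
    shmem_finalize();
    return 0;
}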
SHMEM libraries implement active messaging. The sending of data involves only one CPU: the source processor puts the data directly into the memory of the destination processor. Likewise, a processor can read data from another processor's memory without interrupting the remote CPU. The remote processor is unaware that its memory has been read or written unless the programmer implements a mechanism to accomplish this.
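A minimal sketch of this one-sided behavior, again assuming the standard OpenSHMEM C API: PE 0 writes into PE 1's memory with shmem_long_put, and PE 1 issues no matching receive. Only the barrier tells PE 1 that the data has arrived.

#include <stdio.h>
#include <shmem.h>

static long dest[4];                   /* symmetric data object on every PE */

int main(void)
{
    long src[4] = {1, 2, 3, 4};        /* local source; need not be symmetric */
    shmem_init();
    int me = shmem_my_pe();
    if (me == 0 && shmem_n_pes() > 1)
        shmem_long_put(dest, src, 4, 1);   /* write into PE 1's memory */
    shmem_barrier_all();               /* completes the put; PE 1 may now read */
    if (me == 1)
        printf("PE 1 received: %ld %ld %ld %ld\n",
               dest[0], dest[1], dest[2], dest[3]);
    shmem_finalize();
    return 0;
}

Without the barrier (or some other synchronization supplied by the programmer), PE 1 would have no way of knowing that its memory was written.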
5.2.1 Mellanox ScalableSHMEM
The ScalableSHMEM programming library is a one-sided communications library that supports a unique set of parallel programming features, including point-to-point and collective routines, synchronizations, atomic operations, and a shared memory paradigm used between the processes of a parallel programming application.
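As an illustration of the collective routines, the following sketch sums one value from every PE with shmem_long_sum_to_all, assuming the standard OpenSHMEM C API (the constant names follow OpenSHMEM 1.1+ headers; older headers prefix them with an underscore). The pWrk and pSync arrays are symmetric scratch space required by the reduction.

#include <stdio.h>
#include <shmem.h>

static long src, result;                               /* symmetric */
static long pWrk[SHMEM_REDUCE_MIN_WRKDATA_SIZE];       /* reduction scratch */
static long pSync[SHMEM_REDUCE_SYNC_SIZE];             /* collective sync */

int main(void)
{
    shmem_init();
    for (int i = 0; i < SHMEM_REDUCE_SYNC_SIZE; i++)
        pSync[i] = SHMEM_SYNC_VALUE;
    shmem_barrier_all();               /* pSync initialized on every PE */

    src = shmem_my_pe() + 1;           /* PE i contributes i + 1 */
    shmem_long_sum_to_all(&result, &src, 1, 0, 0, shmem_n_pes(),
                          pWrk, pSync);
    if (shmem_my_pe() == 0)
        printf("sum across %d PEs = %ld\n", shmem_n_pes(), result);
    shmem_finalize();
    return 0;
}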
Mellanox ScalableSHMEM is based on the API defined by the OpenSHMEM.org consortium. The library works with the OpenFabrics RDMA for Linux stack (OFED), and can also utilize the Mellanox Messaging library (MXM) as well as the Mellanox Fabric Collective Accelerator (FCA), providing an unprecedented level of scalability for SHMEM programs running over InfiniBand.
The latest ScalableSHMEM software can be downloaded from the Mellanox website.
5.2.2 Running SHMEM with FCA
The Mellanox Fabric Collective Accelerator (FCA) is a unique solution for offloading collective operations from the Message Passing Interface (MPI) or ScalableSHMEM process onto Mellanox InfiniBand managed switch CPUs. As a system-wide solution, FCA utilizes intelligence on Mellanox InfiniBand switches, Unified Fabric Manager, and MPI nodes without requiring additional hardware. The FCA manager creates a topology-based collective tree and orchestrates an efficient collective operation using the switch-based CPUs on the MPI/ScalableSHMEM nodes. FCA accelerates MPI/ScalableSHMEM collective operation performance by up to 100 times, reducing overall job runtime. Implementation is simple and transparent during job runtime.
FCA is disabled by default and must be configured before it can be used with ScalableSHMEM.

To enable FCA by default in ScalableSHMEM:
1. Edit the /opt/mellanox/openshmem/2.2/etc/openmpi-mca-params.conf file.
2. Set the scoll_fca_enable parameter to 1:
   scoll_fca_enable=1
3. Set the scoll_fca_np parameter to 0:
   scoll_fca_np=0
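After both edits the parameters file contains the following lines. Since ScalableSHMEM is built on the Open MPI MCA framework, the same parameters can typically also be passed per job on the launcher command line (the oshrun invocation below is an assumed example; by analogy with Open MPI's coll_fca_np, scoll_fca_np is assumed to be the minimum process count at which FCA is used, so 0 removes the threshold).

# /opt/mellanox/openshmem/2.2/etc/openmpi-mca-params.conf
scoll_fca_enable=1    # enable the FCA-based SHMEM collectives component
scoll_fca_np=0        # assumed: no minimum process count for using FCA

# Assumed per-job equivalent on the command line:
#   oshrun --mca scoll_fca_enable 1 --mca scoll_fca_np 0 ./shmem_app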