“get” operations - data transfer from a different PE, and remote pointers, allowing direct references to data objects owned by another PE.

Additional supported operations are collective broadcast and reduction, barrier synchronization, and atomic memory operations. An atomic memory operation is an atomic read-and-update operation, such as a fetch-and-increment, on a remote or local data object.
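The following is a minimal sketch of an atomic fetch-and-increment, written against the standard OpenSHMEM C API (shmem_init and shmem_long_finc; older releases may instead expose the start_pes(0) entry point). Every PE atomically increments a counter owned by PE 0, and the call returns the value the counter held before the update.

#include <stdio.h>
#include <shmem.h>

static long counter = 0;              /* symmetric: one instance per PE */

int main(void)
{
    shmem_init();
    /* Atomically fetch-and-increment the counter that lives on PE 0. */
    long before = shmem_long_finc(&counter, 0);
    shmem_barrier_all();              /* wait until every PE has updated */
    if (shmem_my_pe() == 0)
        printf("final counter = %ld (this PE saw %ld before its increment)\n",
               counter, before);
    shmem_finalize();
    return 0;
}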
SHMEM libraries implement active messaging. The sending of data involves only one CPU: the source processor puts the data directly into the memory of the destination processor. Likewise, a processor can read data from another processor's memory without interrupting the remote CPU. The remote processor is unaware that its memory has been read or written unless the programmer implements a mechanism to accomplish this.
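A minimal sketch of this one-sided behavior, again assuming the standard OpenSHMEM C API: PE 0 writes into PE 1's memory with shmem_long_put, and PE 1 issues no matching receive. Only the barrier tells PE 1 that the data has arrived.

#include <stdio.h>
#include <shmem.h>

static long dest[4];                   /* symmetric data object on every PE */

int main(void)
{
    long src[4] = {1, 2, 3, 4};        /* local source; need not be symmetric */
    shmem_init();
    int me = shmem_my_pe();
    if (me == 0 && shmem_n_pes() > 1)
        shmem_long_put(dest, src, 4, 1);   /* write into PE 1's memory */
    shmem_barrier_all();               /* completes the put; PE 1 may now read */
    if (me == 1)
        printf("PE 1 received: %ld %ld %ld %ld\n",
               dest[0], dest[1], dest[2], dest[3]);
    shmem_finalize();
    return 0;
}

Without the barrier (or some other synchronization supplied by the programmer), PE 1 would have no way of knowing that its memory was written.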
5.2.1 Mellanox ScalableSHMEM
The ScalableSHMEM programming library is a one-sided communications library that supports a unique set of parallel programming features, including point-to-point and collective routines, synchronizations, atomic operations, and a shared memory paradigm used between the processes of a parallel programming application.
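As an illustration of the collective routines, the following sketch sums one value from every PE with shmem_long_sum_to_all, assuming the standard OpenSHMEM C API (the constant names follow OpenSHMEM 1.1+ headers; older headers prefix them with an underscore). The pWrk and pSync arrays are symmetric scratch space required by the reduction.

#include <stdio.h>
#include <shmem.h>

static long src, result;                               /* symmetric */
static long pWrk[SHMEM_REDUCE_MIN_WRKDATA_SIZE];       /* reduction scratch */
static long pSync[SHMEM_REDUCE_SYNC_SIZE];             /* collective sync */

int main(void)
{
    shmem_init();
    for (int i = 0; i < SHMEM_REDUCE_SYNC_SIZE; i++)
        pSync[i] = SHMEM_SYNC_VALUE;
    shmem_barrier_all();               /* pSync initialized on every PE */

    src = shmem_my_pe() + 1;           /* PE i contributes i + 1 */
    shmem_long_sum_to_all(&result, &src, 1, 0, 0, shmem_n_pes(),
                          pWrk, pSync);
    if (shmem_my_pe() == 0)
        printf("sum across %d PEs = %ld\n", shmem_n_pes(), result);
    shmem_finalize();
    return 0;
}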
Mellanox ScalableSHMEM is based on the API defined by the OpenSHMEM.org consortium. The library works with the OpenFabrics RDMA for Linux stack (OFED), and can also utilize the Mellanox Messaging library (MXM) as well as the Mellanox Fabric Collective Accelerator (FCA), providing an unprecedented level of scalability for SHMEM programs running over InfiniBand.
The latest ScalableSHMEM software can be downloaded from the Mellanox website.
5.2.2 Running SHMEM with FCA
The Mellanox Fabric Collective Accelerator (FCA) is a unique solution for offloading collective operations from the Message Passing Interface (MPI) or ScalableSHMEM process onto Mellanox InfiniBand managed switch CPUs. As a system-wide solution, FCA utilizes intelligence on Mellanox InfiniBand switches, Unified Fabric Manager, and MPI nodes without requiring additional hardware. The FCA manager creates a topology-based collective tree and orchestrates an efficient collective operation using the switch-based CPUs on the MPI/ScalableSHMEM nodes. FCA accelerates MPI/ScalableSHMEM collective operation performance by up to 100 times, reducing overall job runtime. Implementation is simple and transparent during job runtime.
FCA is disabled by default and must be configured before it can be used with ScalableSHMEM.

To enable FCA by default in ScalableSHMEM:
1. Edit the /opt/mellanox/openshmem/2.2/etc/openmpi-mca-params.conf file.
2. Set the scoll_fca_enable parameter to 1:
   scoll_fca_enable=1
3. Set the scoll_fca_np parameter to 0:
   scoll_fca_np=0
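After both edits the parameters file contains the following lines. Since ScalableSHMEM is built on the Open MPI MCA framework, the same parameters can typically also be passed per job on the launcher command line (the oshrun invocation below is an assumed example; by analogy with Open MPI's coll_fca_np, scoll_fca_np is assumed to be the minimum process count at which FCA is used, so 0 removes the threshold).

# /opt/mellanox/openshmem/2.2/etc/openmpi-mca-params.conf
scoll_fca_enable=1    # enable the FCA-based SHMEM collectives component
scoll_fca_np=0        # assumed: no minimum process count for using FCA

# Assumed per-job equivalent on the command line:
#   oshrun --mca scoll_fca_enable 1 --mca scoll_fca_np 0 ./shmem_app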