User Manual


HPC Features Rev 2.1-1.0.6

Mellanox Technologies


5 HPC Features

5.1 Shared Memory Access

The Shared Memory Access (SHMEM) routines provide low-latency, high-bandwidth communication for use in highly parallel scalable programs. The routines in the SHMEM Application Programming Interface (API) provide a programming model for exchanging data between cooperating parallel processes. The SHMEM API can be used either alone or in combination with MPI routines in the same parallel program.

The SHMEM parallel programming library offers an easy-to-use programming model that uses highly efficient one-sided communication APIs to provide an intuitive global-view interface to shared or distributed memory systems. SHMEM's capabilities provide an excellent low-level interface for Partitioned Global Address Space (PGAS) applications.

A SHMEM program follows the single program, multiple data (SPMD) style. All the SHMEM processes, referred to as processing elements (PEs), start simultaneously and run the same program. Commonly, the PEs perform computation on their own sub-domains of the larger problem, and periodically communicate with other PEs to exchange information on which the next computation phase depends.
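The SPMD style can be illustrated with a minimal sketch. It is a toy model in plain Python, not SHMEM code: the function name `pe_main`, the sample problem, and the choice of four PEs are assumptions made for illustration, and the PEs run one after another here rather than simultaneously.

```python
def pe_main(my_pe, n_pes, problem):
    """Code run by every PE; only the value of my_pe differs between them."""
    chunk = (len(problem) + n_pes - 1) // n_pes  # ceiling division
    start = my_pe * chunk
    sub_domain = problem[start:start + chunk]    # this PE's share of the problem
    return sum(x * x for x in sub_domain)        # local computation phase


problem = list(range(10))
n_pes = 4

# A real job starts all PEs at once; this sketch simply loops over the ranks.
partials = [pe_main(rank, n_pes, problem) for rank in range(n_pes)]

# Stand-in for the communication phase: the partial results are combined.
total = sum(partials)
print(partials, total)  # prints: [5, 50, 149, 81] 285
```

Every PE executes the same `pe_main`; the rank alone decides which sub-domain it works on.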

The SHMEM routines minimize the overhead associated with data transfer requests, maximize bandwidth, and minimize data latency (the period of time that starts when a PE initiates a transfer of data and ends when a PE can use the data).

SHMEM routines support remote data transfer through:

• “put” operations - data transfer to a different PE

• “get” operations - data transfer from a different PE

• remote pointers - direct references to data objects owned by another PE
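The three transfer mechanisms listed above can be sketched with a toy, single-process model. This is not the SHMEM API: the classes `ToyPE` and `ToyJob` and the methods `put`, `get`, and `ptr` are hypothetical names introduced only to show who initiates each transfer and whose memory changes.

```python
class ToyPE:
    """One processing element (PE) holding its copy of a symmetric array."""

    def __init__(self, size):
        self.data = [0] * size  # same layout on every PE


class ToyJob:
    """Hypothetical stand-in for all PEs of one job (not a real API)."""

    def __init__(self, n_pes, size):
        self.pes = [ToyPE(size) for _ in range(n_pes)]

    def _check(self, pe, offset, count):
        if offset < 0 or offset + count > len(self.pes[pe].data):
            raise IndexError("access falls outside the symmetric array")

    def put(self, values, dest_pe, offset):
        """'put': copy local values into the array owned by dest_pe."""
        self._check(dest_pe, offset, len(values))
        self.pes[dest_pe].data[offset:offset + len(values)] = values

    def get(self, src_pe, offset, count):
        """'get': return a copy of values from the array owned by src_pe."""
        self._check(src_pe, offset, count)
        return self.pes[src_pe].data[offset:offset + count]

    def ptr(self, pe):
        """Remote pointer: a direct reference to the array owned by pe."""
        return self.pes[pe].data


job = ToyJob(n_pes=4, size=8)

# PE 0 writes three values into PE 2's array; PE 2 runs no code for this.
job.put([10, 20, 30], dest_pe=2, offset=1)

# PE 3 reads two of those values back out of PE 2's array.
fetched = job.get(src_pe=2, offset=2, count=2)

# A remote pointer allows an ordinary assignment to reach PE 2's data.
remote = job.ptr(2)
remote[0] = 99

print(fetched, job.pes[2].data)
# prints: [20, 30] [99, 10, 20, 30, 0, 0, 0, 0]
```

In each case only the initiating side calls a routine; the PE that owns the data takes no part in the transfer.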

Additional supported operations are collective broadcast and reduction, barrier synchronization, and atomic memory operations. An atomic memory operation is an atomic read-and-update operation, such as a fetch-and-increment, on a remote or local data object.
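The fetch-and-increment behavior can be sketched as follows. This is a toy model in which Python threads stand in for PEs and a lock stands in for the atomicity that a SHMEM library provides; `ToyCounter`, `fetch_inc`, and `run_pes` are hypothetical names, not library routines.

```python
import threading


class ToyCounter:
    """A counter in one PE's memory that other PEs update atomically."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()  # stands in for atomicity of the update

    def fetch_inc(self):
        """Return the old value and add one as a single indivisible step."""
        with self._lock:
            old = self._value
            self._value = old + 1
            return old

    def read(self):
        with self._lock:
            return self._value


def run_pes(n_pes, increments_per_pe):
    """Let n_pes threads (toy PEs) increment one shared counter."""
    counter = ToyCounter()
    seen = [[] for _ in range(n_pes)]  # values fetched by each PE

    def pe_main(rank):
        for _ in range(increments_per_pe):
            seen[rank].append(counter.fetch_inc())

    threads = [threading.Thread(target=pe_main, args=(r,)) for r in range(n_pes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.read(), seen


total, seen = run_pes(n_pes=4, increments_per_pe=1000)
tickets = sorted(v for per_pe in seen for v in per_pe)
print(total, tickets == list(range(4000)))  # prints: 4000 True
```

Because the read and the update happen as one step, no two PEs ever fetch the same value and no increment is lost.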

SHMEM libraries implement active messaging. Sending data involves only one CPU: the source processor puts the data directly into the memory of the destination processor. Likewise, a processor can read data from another processor's memory without interrupting the remote CPU. The remote processor is unaware that its memory has been read or written unless the programmer implements a mechanism to notify it.
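One common form of such a mechanism is a flag that the writer sets after the data is in place and that the owner of the memory polls. The sketch below is a toy model using Python threads as PEs; `ToyPE`, `sender`, and `receiver` are hypothetical names. In a real SHMEM program the two writes would also need an ordering guarantee, which this model gets for free by running inside one Python process.

```python
import threading
import time


class ToyPE:
    """Memory owned by one PE: a data buffer plus a notification flag."""

    def __init__(self):
        self.buffer = [0] * 4  # data area written by the remote PE
        self.flag = 0          # set by the remote PE once the data is in place


def sender(target):
    """Remote PE: write the payload first, then raise the flag."""
    target.buffer[:] = [1, 2, 3, 4]
    target.flag = 1


def receiver(me, received):
    """Owning PE: it learns about the write only by polling its own flag."""
    while me.flag == 0:
        time.sleep(0.001)
    received.extend(me.buffer)


pe1 = ToyPE()
received = []
threads = [
    threading.Thread(target=receiver, args=(pe1, received)),
    threading.Thread(target=sender, args=(pe1,)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(received)  # prints: [1, 2, 3, 4]
```

Without the flag, the receiving PE would have no way of knowing when, or whether, its buffer had been written.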

5.1.1 Mellanox ScalableSHMEM

The ScalableSHMEM programming library is a one-sided communications library that supports a unique set of parallel programming features including point-to-point and collective routines, synchronizations, atomic operations, and a shared memory paradigm used between the processes of a parallel programming application.

Mellanox ScalableSHMEM is based on the API defined by the OpenSHMEM.org consortium. The library works with the OpenFabrics RDMA for Linux stack (OFED), and also has the ability to utilize MellanoX Messaging libraries (MXM) as well as Mellanox Fabric Collective Accelerations (FCA), providing an unprecedented level of scalability for SHMEM programs running over InfiniBand.

The latest ScalableSHMEM software can be downloaded from the Mellanox website.