User guide

6–SHMEM Description and Configuration

Progress Model

6-12 IB0054606-02 A

Active Progress

In active progress mode, SHMEM progress is achieved when the application calls
into the SHMEM library. This approach is well matched to applications that call
into SHMEM frequently, for example, those with a fine-grained mix of SHMEM
operations and computation. This mix is typical of many SHMEM applications.
Applications that spend large amounts of contiguous time in computation without
calling SHMEM routines will delay SHMEM progress for that period. Additionally,
applications must not poll on memory locations waiting for puts to arrive
without calling SHMEM, because progress will not occur and the program will
hang. Instead, SHMEM applications should use one of the wait synchronization
primitives provided by SHMEM. In active progress mode, QLogic SHMEM achieves
full performance.
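
The difference between raw polling and a wait primitive can be illustrated with
a small, self-contained toy model. This is only a sketch and not the QLogic
SHMEM implementation: toy_put_t, toy_progress, toy_raw_poll and toy_wait are
hypothetical names invented for this illustration, and the spin limit exists
only so that the example terminates. In a real application the recommended
pattern corresponds to calling one of the SHMEM wait routines (for example,
shmem_long_wait_until in the OpenSHMEM API, assuming this release provides it).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, NOT the QLogic SHMEM implementation. A put that has arrived
 * is held as "pending" until the progress engine runs and copies it to
 * the user-visible location. */
typedef struct {
    long *target;   /* user-visible location the put will update */
    long  value;    /* value carried by the put */
    bool  pending;  /* true while the put is awaiting progress */
} toy_put_t;

/* Hypothetical progress engine: delivers the pending put, if any. In
 * active progress mode this runs only when the application calls into
 * the library. */
static void toy_progress(toy_put_t *p)
{
    assert(p != NULL && p->target != NULL);
    if (p->pending) {
        *p->target = p->value;
        p->pending = false;
    }
}

/* Anti-pattern: spin on the location without calling the library.
 * Returns true if the value changed within max_spins iterations.
 * Nothing in this loop drives progress, so a pending put never lands. */
static bool toy_raw_poll(const volatile long *loc, long old, size_t max_spins)
{
    for (size_t i = 0; i < max_spins; i++) {
        if (*loc != old)
            return true;
    }
    return false;
}

/* Recommended pattern: a wait primitive that drives progress on every
 * iteration before it re-checks the location. */
static bool toy_wait(toy_put_t *p, const volatile long *loc, long old,
                     size_t max_spins)
{
    for (size_t i = 0; i < max_spins; i++) {
        toy_progress(p);
        if (*loc != old)
            return true;
    }
    return false;
}
```

In this model, toy_raw_poll gives up without ever seeing the new value, which
stands in for the hang described above, while toy_wait observes the put on its
first iteration because it calls into the progress engine.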

Passive Progress

In passive progress mode, SHMEM progress continues to occur when the
application calls into SHMEM, but can additionally occur in the background when
the application is not calling into SHMEM. This is achieved using an additional
progress thread per PE. The progress thread is provided by PSM and is scheduled
at a relatively low frequency, typically 10 to 100 times a second. This thread
drives independent SHMEM progress where required, on both the initiator side
and the target side of SHMEM operations. In this mode, applications can poll on
locations waiting for puts to arrive without calling SHMEM. Progress is achieved
in this case by the progress thread, but it incurs the scheduling latency of the
progress thread, which may have a significant impact on overall performance if
this idiom is used frequently. The scheduling frequency of the PSM progress
thread can be tuned as described in the Environment Variables section.
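
To give a feel for the scheduling latency mentioned above, the following sketch
models the progress thread as waking at fixed intervals and computes how long a
polling PE would wait before a put becomes visible. It is a simplified
discrete-time model under stated assumptions (perfectly periodic wake-ups, no
other source of progress), not a measurement of PSM behavior. All function
names are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Toy discrete-time model; time is in microseconds. The progress thread
 * is assumed to wake at t = period, 2 * period, 3 * period, ... */

/* Interval between wake-ups for a given wake-up frequency. */
static long toy_period_us(long wakeups_per_second)
{
    assert(wakeups_per_second > 0);
    return 1000000L / wakeups_per_second;
}

/* Time of the first wake-up at or after arrival_us. */
static long toy_visible_at_us(long arrival_us, long period_us)
{
    assert(arrival_us >= 0 && period_us > 0);
    long ticks = (arrival_us + period_us - 1) / period_us;  /* ceiling */
    if (ticks == 0)
        ticks = 1;  /* the first wake-up happens at t = period */
    return ticks * period_us;
}

/* Extra delay seen by a PE that polls on the location without calling
 * into the library, so only the progress thread can deliver the put. */
static long toy_added_latency_us(long arrival_us, long wakeups_per_second)
{
    long period = toy_period_us(wakeups_per_second);
    return toy_visible_at_us(arrival_us, period) - arrival_us;
}
```

Under these assumptions, a put that arrives just after a wake-up waits almost a
full period: close to 100 ms at 10 wake-ups per second and close to 10 ms at
100 wake-ups per second. This is why polling without calling SHMEM is best kept
off the performance-critical path.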

Other performance effects of using passive progress include the following:

• The progress thread consumes some CPU cycles, though this is low because the
  progress thread runs infrequently.

• The SHMEM library uses additional locks in its implementation to protect its
  data structures against concurrent updates from the PE thread and the
  progress thread. There is a slight additional cost in the
  performance-critical path because of this locking. This cost is minimal
  because contention on the lock is very low (the progress thread runs
  infrequently) and because each progress thread runs on the same CPU core as
  the corresponding PE thread (giving good cache locality for the lock).
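
The locking described above can be pictured with a minimal sketch of a data
structure shared between a PE thread and a progress thread. The guide does not
say which lock type or data structures the library uses, so the POSIX mutex and
the toy_queue_t type below are assumptions chosen purely for illustration. The
sketch does not create threads; it only shows where the lock is taken.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TOY_QUEUE_CAP 16

/* Hypothetical queue of outstanding operations, shared by the PE thread
 * and the progress thread. The mutex is an assumed lock type. */
typedef struct {
    pthread_mutex_t lock;
    int             ops[TOY_QUEUE_CAP];
    size_t          count;
} toy_queue_t;

static void toy_queue_init(toy_queue_t *q)
{
    assert(q != NULL);
    pthread_mutex_init(&q->lock, NULL);
    q->count = 0;
}

static void toy_queue_destroy(toy_queue_t *q)
{
    pthread_mutex_destroy(&q->lock);
}

/* PE thread side: record a new operation.
 * Returns 0 on success, -1 if the queue is full. */
static int toy_enqueue(toy_queue_t *q, int op)
{
    int rc = -1;
    pthread_mutex_lock(&q->lock);
    if (q->count < TOY_QUEUE_CAP) {
        q->ops[q->count++] = op;
        rc = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Either thread: complete everything queued so far.
 * Returns the number of operations completed. */
static size_t toy_drain(toy_queue_t *q)
{
    pthread_mutex_lock(&q->lock);
    size_t done = q->count;
    q->count = 0;
    pthread_mutex_unlock(&q->lock);
    return done;
}
```

Every call pays for one lock and one unlock even when only one thread is
active, which corresponds to the slight additional cost in the
performance-critical path. When the lock is uncontended and its cache line
stays on the same core, that cost is small.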