User guide

6 SHMEM Description and Configuration
Progress Model
6-12 IB0054606-02 A
Active Progress
In active progress mode, SHMEM makes progress only when the application
calls into the SHMEM library. This approach is well matched to applications that
call into SHMEM frequently, for example, those with a fine-grained mix of SHMEM
operations and computation. This mix is typical of many SHMEM applications.
Applications that spend long contiguous periods in computation without
calling SHMEM routines delay SHMEM progress for that period
of time. Additionally, applications must not poll on locations waiting for puts to
arrive without calling SHMEM, since progress will not occur and the program will
hang. Instead, SHMEM applications should use one of the wait synchronization
primitives provided by SHMEM. In active progress mode, QLogic SHMEM
achieves full performance.
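The difference between bare polling and a wait primitive can be illustrated
with a small single-threaded toy model. This is a sketch under stated
assumptions, not the QLogic SHMEM implementation: the toy_ names, the pending
queue, and the poll limits are all hypothetical and exist only to show why a
put is never observed unless the library is entered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model (assumption): incoming puts sit in a pending queue until the
 * "library" is entered and runs its progress loop. */
#define MAX_PENDING 16

struct pending_put {
    int *target;   /* local address the put will write */
    int  value;    /* value carried by the put */
};

struct toy_shmem {
    struct pending_put queue[MAX_PENDING];
    size_t count;  /* number of puts still waiting for progress */
};

/* Hypothetical: models a put arriving from the network for this PE. */
static void toy_put_arrives(struct toy_shmem *s, int *target, int value)
{
    assert(s->count < MAX_PENDING);
    s->queue[s->count].target = target;
    s->queue[s->count].value = value;
    s->count++;
}

/* In active progress mode this runs only from inside library calls. */
static void toy_progress(struct toy_shmem *s)
{
    for (size_t i = 0; i < s->count; i++)
        *s->queue[i].target = s->queue[i].value;
    s->count = 0;
}

/* Bare polling: reads the flag without ever entering the library.
 * Returns true if the flag reached `expected` within max_polls reads. */
static bool toy_bare_poll(const volatile int *flag, int expected, int max_polls)
{
    for (int i = 0; i < max_polls; i++)
        if (*flag == expected)
            return true;
    return false;
}

/* Wait primitive: every iteration enters the library, so progress occurs. */
static bool toy_wait_until_eq(struct toy_shmem *s, const volatile int *flag,
                              int expected, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        toy_progress(s);
        if (*flag == expected)
            return true;
    }
    return false;
}
```

In this model a put queued with toy_put_arrives is never seen by
toy_bare_poll, however many times it reads the flag, while toy_wait_until_eq
sees it on its first iteration. A real application would call one of the SHMEM
wait primitives rather than anything resembling these toy functions.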
Passive Progress
In passive progress mode, SHMEM progress continues to occur when the
application calls into SHMEM, but can also occur in the background when
the application is not calling into SHMEM. This is achieved using an additional
progress thread per PE. The progress thread is provided by PSM and is
scheduled at a relatively low frequency, typically 10 to 100 times a second. This
thread drives independent SHMEM progress where required, on both the
initiator side and the target side of SHMEM operations. In this mode, applications
can poll on locations waiting for puts to arrive without calling SHMEM. In that
case progress is made by the progress thread, but it incurs the thread's
scheduling latency, which may have a significant impact on
overall performance if this idiom is used frequently. The scheduling frequency of
the PSM progress thread can be tuned as described in the Environment Variables
section.
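To give a feel for the scheduling latency involved, the sketch below converts
a wakeup frequency into a period and estimates when a polled put becomes
visible. It assumes, purely for illustration, that the progress thread wakes
at exact multiples of its period; real scheduling is not this regular, and the
function names are hypothetical.

```c
#include <assert.h>

/* Period of the progress thread in microseconds, for a scheduling
 * frequency given in wakeups per second. */
static long progress_period_us(long wakeups_per_sec)
{
    assert(wakeups_per_sec > 0);
    return 1000000L / wakeups_per_sec;
}

/* Toy model (assumption): the thread wakes at t = 0, period, 2*period, ...
 * Returns the time at which a put that arrives at arrival_us becomes
 * visible to a PE that polls without calling SHMEM. */
static long passive_delivery_us(long arrival_us, long period_us)
{
    assert(arrival_us >= 0 && period_us > 0);
    long wakeups = (arrival_us + period_us - 1) / period_us;  /* ceiling */
    return wakeups * period_us;
}
```

At 100 wakeups a second the period is 10,000 microseconds, and at 10 wakeups a
second it is 100,000 microseconds. Under this model a put that arrives just
after a wakeup waits almost a full period, so each polled put can add up to
roughly 10 to 100 milliseconds at the typical frequencies quoted above.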
Other performance effects of using passive progress include the following:
- The progress thread consumes some CPU cycles, though this is low
  because the progress thread runs infrequently.
- The SHMEM library uses additional locks in its implementation to protect its
  data structures against concurrent updates from the PE thread and the
  progress thread. There is a slight additional cost in the performance critical
  path because of this locking. This cost is minimal because contention on the
  lock is very low (the progress thread runs infrequently) and because each
  progress thread runs on the same CPU core as the corresponding PE
  thread (giving good cache locality for the lock).
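The locking cost described above can be pictured with a minimal sketch in
which both entry paths take the same lock around shared library state. This is
an illustration only: the source does not say which lock type the library
uses, the C11 atomic_flag spinlock is chosen here just to keep the example
free of external dependencies, and all toy_ names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy model (assumption), not the QLogic SHMEM implementation. */
struct toy_lib_state {
    atomic_flag lock;        /* protects the counters below */
    long pe_calls;           /* entries from the PE thread */
    long background_passes;  /* entries from the progress thread */
};

static void toy_state_init(struct toy_lib_state *s)
{
    atomic_flag_clear(&s->lock);
    s->pe_calls = 0;
    s->background_passes = 0;
}

/* Uncontended case: a single atomic test-and-set succeeds immediately.
 * The loop spins only while the other thread holds the lock. */
static void toy_lock(struct toy_lib_state *s)
{
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
        ;
}

static void toy_unlock(struct toy_lib_state *s)
{
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Path taken when the application calls into the library. */
static void toy_pe_call(struct toy_lib_state *s)
{
    toy_lock(s);
    s->pe_calls++;
    toy_unlock(s);
}

/* Path taken by the low-frequency progress thread. */
static void toy_background_pass(struct toy_lib_state *s)
{
    toy_lock(s);
    s->background_passes++;
    toy_unlock(s);
}
```

Because the background path runs only a few tens of times a second, almost
every toy_pe_call in this model finds the lock free and pays only for one
atomic operation on a cache line that is already local to the core.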