HP-MPI User's Guide (11th Edition)

Tuning
Processor locality
The mpirun option -cpu_bind binds a rank to a locality domain (ldom) to
prevent a process from moving to a different ldom after startup. The
binding occurs before the MPI application is executed.
Similar results can be accomplished with mpsched, but -cpu_bind has the
advantage of a more load-based distribution, and it works well within
psets and across multiple machines.
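As a hedged illustration of the option in context (the rank count and the application name ./my_app are assumptions, not taken from this guide), an invocation might look like:

```shell
# Launch 8 ranks, binding each rank to the ldom on which it starts.
# ./my_app stands in for any MPI application binary.
mpirun -np 8 -cpu_bind ./my_app
```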
Binding ranks to ldoms (-cpu_bind)
On SMP systems, processes sometimes move to a different ldom shortly
after startup or during execution. This increases memory latency and
can cause slower performance as the application is now accessing
memory across cells.
Applications that are highly sensitive to memory latency can show large
performance degradation when most memory accesses are off-cell.
To solve this problem, ranks need to reside in the same ldom in which
they were originally created. To accomplish this, HP-MPI provides the
-cpu_bind flag, which locks a rank to a specific ldom and prevents it
from moving during execution. The -cpu_bind flag preloads a shared
library at startup for each process, which does the following:
1. Spins for a short time in a tight loop to let the operating system
distribute processes to CPUs evenly.
2. Determines the current CPU and ldom of the process and if no
oversubscription occurs on the current CPU, it will lock the process to
the ldom of that CPU.
This evenly distributes the ranks across CPUs and prevents them from
moving to a different ldom after the MPI application starts, avoiding
cross-memory (off-cell) access.
See -cpu_bind under “mpirun options” on page 119 for more
information.