D–Troubleshooting
QLogic MPI Troubleshooting
D-24 IB6054601-00 H
The following message displays after installation:
$ mpirun -m ~/tmp/sm -np 2 -mpi_latency 1000 1000000
node-00:1.ipath_update_tid_err: failed: Cannot allocate memory
mpi_latency: /fs2/scratch/infinipath-build-1.3/mpi-1.3/mpich/psm/src/mq_ips.c:691: mq_ipath_sendcts: Assertion ‘rc == 0’ failed.
MPIRUN: Node program unexpectedly quit. Exiting.
You can check the lockable-memory limit (ulimit -l) on all nodes by running
ipath_checkout. A warning similar to the following displays if ulimit -l is
less than 4096 (KB):
!!!ERROR!!! Lockable memory less than 4096KB on x nodes
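To check a single node by hand, the comparison behind that warning can be sketched in shell. This is a minimal sketch, not the logic ipath_checkout itself uses: memlock_ok is a hypothetical helper name, and the 4096 KB default is taken from the warning text above.

```shell
# memlock_ok [LIMIT] [MIN_KB]
# Hypothetical helper: succeeds when LIMIT (default: this shell's "ulimit -l")
# is "unlimited" or at least MIN_KB (default 4096, the value in the warning).
memlock_ok() {
    limit="${1:-$(ulimit -l)}"
    min_kb="${2:-4096}"
    case "$limit" in
        unlimited)
            return 0 ;;
        ''|*[!0-9]*)
            # Neither "unlimited" nor a plain number: report and fail distinctly.
            echo "unrecognized limit: $limit" >&2
            return 2 ;;
    esac
    [ "$limit" -ge "$min_kb" ]
}

# Report the limit that applies to the current shell.
if memlock_ok; then
    echo "lockable memory OK: $(ulimit -l)"
else
    echo "lockable memory too low: $(ulimit -l) (need at least 4096 KB)"
fi
```

Run it in the same environment that launches the MPI ranks (for example, inside a batch job), because the limit is inherited per process and can differ from the limit in an interactive login shell.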
To fix the lockable-memory error, install the infinipath RPM on the node and
reboot it to ensure that /etc/initscript is run.
Alternatively, you can create your own /etc/initscript and set the ulimit
there.
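A hand-written /etc/initscript might look like the sketch below, which writes a candidate file to a scratch location for review and installs nothing. It assumes a SysV-style init that honors /etc/initscript (see initscript(5)); systemd-based systems do not read this file. The SAMPLE path and the 4096 KB value are assumptions for illustration, and the file shipped in the infinipath RPM may set a different limit.

```shell
# Write a candidate initscript to a scratch file; nothing is installed.
# SAMPLE is a hypothetical path chosen for this example.
SAMPLE="${SAMPLE:-${TMPDIR:-/tmp}/initscript.sample}"

cat > "$SAMPLE" <<'EOF'
# /etc/initscript: executed by init(8) for every process it spawns, as
#   /bin/sh /etc/initscript <id> <runlevels> <action> <process>

# Raise the lockable-memory limit (in KB). 4096 is an assumed value taken
# from the ipath_checkout warning threshold; adjust it for your site.
ulimit -l 4096

# Hand control to the process that init asked for.
eval exec "$4"
EOF

echo "Wrote $SAMPLE; review it, then copy it to /etc/initscript as root."
```

Because init runs this script for every process it starts, keep it short, and reboot (or otherwise restart init's children) so that daemons such as the batch system pick up the new limit.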
Error Creating Shared Memory Object
QLogic MPI (and PSM) use Linux shared memory-mapped files to share
memory within a node. When an MPI job starts, a shared memory file is
created on each node for all MPI ranks sharing memory on that node. During
job execution, the shared memory file remains in /dev/shm. At program exit, the
operating system removes the file automatically, provided that the QLogic MPI
(InfiniPath) library exits properly. As an additional backup, the sequence of
commands that mpirun invokes during every MPI job launch explicitly removes the
file at program termination.
However, under circumstances such as hard and explicit program termination (for
example, kill -9 on the mpirun process), QLogic MPI cannot guarantee that the
/dev/shm file is properly removed. When many stale files accumulate on a node,
an error message like the following can appear at startup:
node023:6.Error creating shared memory object in shm_open(/dev/shm may have stale shm files that need to be removed):
If this occurs, administrators should remove all stale files by running the
following command as root:
# rm -rf /dev/shm/psm_shm.*
You can also identify stale files selectively by using the fuser and ps
commands (all files start with the psm_shm prefix). Once identified,
you can issue rm commands on the stale files that you own.
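The selective approach can be sketched as a small shell function that lists psm_shm files you own that no process currently has open or mapped. This is a sketch under stated assumptions: list_stale_psm_shm is a hypothetical helper name, it relies on fuser from the psmisc package, and a non-root fuser may not see processes that belong to other users, so confirm with ps before deleting anything.

```shell
# list_stale_psm_shm [DIR]
# Hypothetical helper: print psm_shm.* files in DIR (default /dev/shm) that
# are owned by the current user and that fuser reports as unused.
list_stale_psm_shm() {
    dir="${1:-/dev/shm}"
    if ! command -v fuser >/dev/null 2>&1; then
        echo "fuser not found (usually provided by psmisc)" >&2
        return 2
    fi
    for f in "$dir"/psm_shm.*; do
        [ -f "$f" ] || continue   # skips the literal pattern when nothing matches
        [ -O "$f" ] || continue   # only files owned by the current user
        # fuser -s is silent and exits non-zero when no process uses the file.
        if ! fuser -s "$f" 2>/dev/null; then
            printf '%s\n' "$f"
        fi
    done
}
```

Calling list_stale_psm_shm with no argument inspects /dev/shm and only prints candidates; it removes nothing. After reviewing the list and checking with ps that none of your MPI jobs are still running on the node, pass the listed files to rm.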