
C Integration with a Batch Queuing System
Most cluster systems use some kind of batch queuing system as an orderly way to
provide users with access to the resources they need to meet their job’s
performance requirements. One task of the cluster administrator is to allow users
to submit MPI jobs through these batch queuing systems.
For Open MPI, there are resources at openmpi.org that document how to use
Open MPI with three batch queuing systems. The links to the Frequently Asked
Questions (FAQs) for each of the three batch queuing systems are as follows:
Torque / PBS Pro: http://www.open-mpi.org/faq/?category=tm
SLURM: http://www.open-mpi.org/faq/?category=slurm
Bproc: http://www.open-mpi.org/faq/?category=bproc
This appendix contains two sections that deal with process and file clean-up
after batch MPI/PSM jobs have completed: Clean Termination of MPI Processes
and Clean-up PSM Shared Memory Files.
Clean Termination of MPI Processes
The InfiniPath software normally ensures clean termination of all MPI programs
when a job ends, but in rare circumstances an MPI process may remain alive and
potentially interfere with future MPI jobs. To avoid this problem, run a script
before and after each batch job that kills all unwanted processes. QLogic does
not provide such a script, but it is useful to know how to find out which
processes on a node are using the QLogic interconnect. The easiest way to do
this is with the fuser command, which is normally installed in /sbin.
Run these commands as a root user to ensure that all processes are reported.
# /sbin/fuser -v /dev/ipath
/dev/ipath: 22648m 22651m
In this example, processes 22648 and 22651 are using the QLogic interconnect. It
is also possible to use this command (as a root user):
# lsof /dev/ipath
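Because QLogic does not supply a clean-up script, the following is a minimal
sketch of one that a site could run as a batch prologue or epilogue. It assumes
fuser is installed in /sbin and that the interconnect device node is /dev/ipath,
as in the examples above; fuser -k sends SIGKILL to every process still holding
the device open, so review and adapt the script to your site before using it.
#!/bin/sh
# Sketch of a prologue/epilogue clean-up script (not supplied by QLogic).
# Assumes fuser is in /sbin and the QLogic device node is /dev/ipath.
# Must run as root so that all processes are reported.
DEVICE=/dev/ipath
# fuser prints the PIDs of processes using the device on stdout;
# the device name itself goes to stderr, which is discarded here.
PIDS=$(/sbin/fuser "$DEVICE" 2>/dev/null)
if [ -n "$PIDS" ]; then
    echo "Stale processes using $DEVICE: $PIDS"
    # -k sends SIGKILL to each process still using the device.
    /sbin/fuser -k "$DEVICE"
fi
exit 0
Running such a script both before and after each batch job, as described earlier
in this section, helps ensure that a leftover process from one job cannot
interfere with the next job scheduled on the node.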