User guide
4–Running MPI on QLogic Adapters
Debugging MPI Programs
4-22 IB0054606-02 A
Debugging parallel programs is substantially more difficult than debugging serial
programs. Thoroughly debugging the serial parts of your code before parallelizing
is good programming practice.
MPI Errors
Almost all MPI routines (except MPI_Wtime and MPI_Wtick) return an error
code, either as the function return value in C or as the last argument in a
Fortran subroutine call. Before the value is returned, the current MPI error
handler is called. By default, this error handler aborts the MPI job. Therefore,
to get information about MPI exceptions in your code, set the error handler to
MPI_ERRORS_RETURN (or provide your own handler). See the
MPI_Errhandler_set man page for details.
See the standard MPI documentation referenced in Appendix H for details on the
MPI error codes.
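As a sketch of the approach above, the following program switches MPI_COMM_WORLD to the predefined MPI_ERRORS_RETURN handler and then checks return codes explicitly. The deliberately invalid send is illustrative only; this assumes a working MPI installation (compile with mpicc, launch with mpirun):

```c
/* Sketch: use MPI_ERRORS_RETURN so MPI calls report errors via
 * return codes instead of aborting the job. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* The default handler, MPI_ERRORS_ARE_FATAL, aborts the job on
     * any error.  MPI_ERRORS_RETURN makes routines return an error
     * code so the caller can inspect it. */
    MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    /* Deliberately invalid: -1 is not a valid destination rank,
     * so MPI_Send returns an error code rather than aborting. */
    int payload = 42;
    int rc = MPI_Send(&payload, 1, MPI_INT, -1, 0, MPI_COMM_WORLD);
    if (rc != MPI_SUCCESS) {
        char msg[MPI_MAX_ERROR_STRING];
        int len;
        MPI_Error_string(rc, msg, &len);
        fprintf(stderr, "MPI_Send failed: %s\n", msg);
    }

    MPI_Finalize();
    return 0;
}
```

Note that MPI_Errhandler_set is the interface named in this guide; newer MPI standards deprecate it in favor of MPI_Comm_set_errhandler, which takes the same arguments.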
Using Debuggers
See http://www.open-mpi.org/faq/?category=debugging for details on
debugging with Open MPI.
NOTE
With Open MPI and other PSM-enabled MPIs, you will typically want to turn
off PSM's CPU affinity controls so that the OpenMP threads spawned by an
MPI process are not constrained to the CPU core of that process, which
would over-subscribe that core. Accomplish this using the
IPATH_NO_CPUAFFINITY=1 setting, as follows:
OMP_NUM_THREADS=8 (typically set in the ~/.bashrc file)
mpirun -np 2 -H host1,host2 -x IPATH_NO_CPUAFFINITY=1 ./hybrid_app
In this case, there would typically be 8 or more CPU cores on the host1 and
host2 nodes, and the job would run a total of 16 threads, 8 on each node.
You can use 'top' (then press '1') to verify that the load is distributed
across 8 different CPU cores on each node.
[Both OMP_NUM_THREADS and IPATH_NO_CPUAFFINITY can be set in
~/.bashrc, or both can be passed on the mpirun command line with -x
options.]
When there are more threads than CPU cores, both MPI and OpenMP
performance can be significantly degraded due to over-subscription of the
CPUs.
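The hybrid_app launched in the example above could be, for instance, a minimal MPI+OpenMP program like the following sketch (the program name and output format are illustrative; compile with mpicc and an OpenMP flag such as -fopenmp):

```c
/* Minimal hybrid MPI + OpenMP sketch: each MPI process spawns
 * OMP_NUM_THREADS OpenMP threads.  With IPATH_NO_CPUAFFINITY=1
 * the threads are free to spread across the node's CPU cores. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;

    /* Request MPI_THREAD_FUNNELED: only the main thread of each
     * process makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        printf("rank %d, thread %d of %d\n",
               rank, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}
```

Launched as in the note above (mpirun -np 2 with OMP_NUM_THREADS=8), this would print 16 lines in total, 8 per rank.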