D–Troubleshooting
QLogic MPI Troubleshooting
IB6054601-00 H D-31
This message occurs when a cable is disconnected, a switch is rebooted, or when
there are other problems with the link. The job continues retrying until the
quiescence interval expires. See the mpirun -q option for information on
quiescence.
If a hardware problem occurs, an error similar to this displays:
infinipath: [error strings ] Hardware error
In this case, the MPI program terminates. The error string may provide additional
information about the problem. To further determine the source of the problem,
examine syslog on the node reporting the problem.
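As a rough illustration of the syslog check described above, the following Python sketch filters log lines for driver-related keywords. The keywords ("infinipath", "ipath") and the helper name find_driver_messages are assumptions made for this example, not documented message formats; adjust them to whatever the node's syslog actually contains.

```python
def find_driver_messages(lines, keywords=("infinipath", "ipath")):
    """Return (line_number, text) pairs for lines containing any keyword.

    The default keywords are an assumption for illustration; the guide
    does not specify how driver messages are tagged in syslog.
    """
    lowered = [k.lower() for k in keywords]
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        # Case-insensitive substring match against each keyword
        if any(k in text.lower() for k in lowered):
            hits.append((number, text))
    return hits
```

The function accepts any iterable of lines, so it can be given an open file object for the node's syslog file (often /var/log/messages, although the location varies by distribution).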
MPI Stats
The -print-stats option to mpirun prints a listing of various MPI statistics
to stderr. Here is example output from an eight-rank run of the HPCC benchmark,
using the following command, in which -M is used as the short form of
-print-stats:
$ mpirun -np 8 -ppn 1 -m machinefile -M ./hpcc
STATS: MPI Statistics Summary (max,min @ rank)
STATS: Eager count sent         (max=171.94K @ 0, min=170.10K @ 3, med=170.20K @ 5)
STATS: Eager bytes sent         (max=492.56M @ 5, min=491.35M @ 0, med=491.87M @ 1)
STATS: Rendezvous count sent    (max= 5735 @ 0, min= 5729 @ 3, med= 5731 @ 7)
STATS: Rendezvous bytes sent    (max= 1.21G @ 4, min= 1.20G @ 2, med= 1.21G @ 0)
STATS: Expected count received  (max=173.18K @ 4, min=169.46K @ 1, med=172.71K @ 7)
STATS: Expected bytes received  (max= 1.70G @ 1, min= 1.69G @ 2, med= 1.70G @ 7)
STATS: Unexpect count received  (max= 6758 @ 0, min= 2996 @ 4, med= 3407 @ 2)
STATS: Unexpect bytes received  (max= 1.48M @ 0, min=226.79K @ 5, med=899.08K @ 2)
By default, -M assumes -M=mpi and that the user wants only mpi level statistics.
The man page shows various other low-level categories of statistics that are
provided. Here is another example:
$ mpirun -np 8 -ppn 1 -m machinefile -M=mpi,ipath hpcc
STATS: MPI Statistics Summary (max,min @ rank)
STATS: Eager count sent         (max=171.94K @ 0, min=170.10K @ 3, med=170.22K @ 1)
STATS: Eager bytes sent         (max=492.56M @ 5, min=491.35M @ 0, med=491.87M @ 1)
STATS: Rendezvous count sent    (max= 5735 @ 0, min= 5729 @ 3, med= 5731 @ 7)
STATS: Rendezvous bytes sent    (max= 1.21G @ 4, min= 1.20G @ 2, med= 1.21G @ 0)
STATS: Expected count received  (max=173.18K @ 4, min=169.46K @ 1, med=172.71K @ 7)
STATS: Expected bytes received  (max= 1.70G @ 1, min= 1.69G @ 2, med= 1.70G @ 7)
STATS: Unexpect count received  (max= 6758 @ 0, min= 2996 @ 4, med= 3407 @ 2)
STATS: Unexpect bytes received  (max= 1.48M @ 0, min=226.79K @ 5, med=899.08K @ 2)
STATS: InfiniPath low-level protocol stats
STATS: pio busy count           (max=190.01K @ 0, min=155.60K @ 1, med=160.76K @ 5)
STATS: scb unavail exp count    (max= 9217 @ 0, min= 7437 @ 7, med= 7727 @ 4)
STATS: tid update count         (max=292.82K @ 6, min=290.59K @ 2, med=292.55K @ 4)
STATS: interrupt thread count   (max= 941 @ 0, min= 335 @ 7, med= 439 @ 2)
STATS: interrupt thread success (max= 0.00 @ 3, min= 0.00 @ 1, med= 0.00 @ 0)
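Each statistic in these listings is reported as a (max, min, med) group, with the rank that produced each value after the @ sign. For comparing runs, it can help to turn those groups into numbers. The Python sketch below is one possible way to do that. The function names parse_stat_values and spread are hypothetical, and treating K, M, and G as decimal multipliers (1e3, 1e6, 1e9) is an assumption, since the listing does not say whether the suffixes are decimal or binary.

```python
import re

# Assumption: K/M/G are decimal multipliers; the listing does not specify this.
_SCALE = {"": 1.0, "K": 1e3, "M": 1e6, "G": 1e9}

# Matches one field such as "max=171.94K @ 0" or "min= 2996 @ 4"
_FIELD = re.compile(r"(max|min|med)=\s*([0-9.]+)([KMG]?)\s*@\s*(\d+)")


def parse_stat_values(text):
    """Return {'max': (value, rank), 'min': ..., 'med': ...} for one value group."""
    result = {}
    for name, number, suffix, rank in _FIELD.findall(text):
        result[name] = (float(number) * _SCALE[suffix], int(rank))
    return result


def spread(values):
    """Ratio of the largest to the smallest per-rank value, or None if min is 0."""
    smallest = values["min"][0]
    if smallest == 0:
        return None
    return values["max"][0] / smallest


if __name__ == "__main__":
    group = "(max= 6758 @ 0, min= 2996 @ 4, med= 3407 @ 2)"
    parsed = parse_stat_values(group)
    print(parsed["max"], parsed["min"], spread(parsed))
```

A large spread for a counter, such as the unexpected-message count in the sample group, only indicates that ranks differ on that statistic. Whether the difference matters depends on the application.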