User guide

D Troubleshooting
Kernel and Initialization Issues
IB0054606-02 A D-5
InfiniPath ib_qib Initialization Failure
In some cases, the ib_qib driver is not properly initialized. Symptoms may
appear as error messages from an MPI job or another program. The following is a
sample command and error message:
$ mpirun -np 2 -m ~/tmp/mbu13 osu_latency
<nodename>:ipath_userinit: assign_port command failed:
Network is down
<nodename>:can’t open /dev/ipath, network down
After 60 seconds, messages of the following type appear:
MPIRUN<node_where_started>: 1 rank has not yet exited 60
seconds after rank 0 (node <nodename>) exited without reaching
MPI_Finalize().
MPIRUN<node_where_started>:Waiting at most another 60 seconds
for the remaining ranks to do a clean shutdown before
terminating 1 node processes.
If this error appears, check to see if the InfiniPath driver is loaded by typing:
$ lsmod | grep ib_qib
If no output is displayed, the driver did not load. In that case, run
the following commands as root:
# modprobe -v ib_qib
# lsmod | grep ib_qib
# dmesg | grep -i ib_qib | tail -25
The output will indicate whether the driver has loaded. Printing out messages
using dmesg may help to locate any problems with ib_qib.
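The check above can also be scripted. The following is a minimal sketch, not part of the InfiniPath software: the function name module_loaded and its optional second argument are illustrative assumptions. It reads /proc/modules, the same kernel list that lsmod reports, and prints the modprobe command to try if ib_qib is missing.

```shell
#!/bin/sh
# Sketch only: module_loaded is a hypothetical helper, not an InfiniPath tool.
# Returns 0 if the named module appears in the module list
# (default: /proc/modules), non-zero otherwise.
module_loaded() {
    mod=$1
    list=${2:-/proc/modules}
    # The first field of each line in /proc/modules is the module name.
    awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$list"
}

if module_loaded ib_qib; then
    echo "ib_qib is loaded"
else
    echo "ib_qib is not loaded; try (as root): modprobe -v ib_qib"
fi
```

Matching the whole first field avoids false positives from other modules whose names merely contain the string ib_qib.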
If the driver loaded, but MPI or other programs are not working, check to see if
problems were detected during the driver and QLogic hardware initialization with
the command:
$ dmesg | grep -i ib_qib
This command may generate more than one screen of output.
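When the output is long, it can be paged or trimmed to the most recent lines. The following is a small sketch under stated assumptions: qib_messages is a hypothetical helper name, and the default of 25 lines simply mirrors the tail -25 used earlier in this section.

```shell
#!/bin/sh
# Sketch only: qib_messages is a hypothetical helper name.
# Reads kernel log text on standard input and prints only the lines that
# mention ib_qib (case-insensitive), keeping the last N (default 25).
qib_messages() {
    grep -i 'ib_qib' | tail -n "${1:-25}"
}

# To page through all matching messages instead:
#   dmesg | grep -i ib_qib | less
dmesg 2>/dev/null | qib_messages 25
```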
Also, check the link status with the command:
$ cat /sys/class/infiniband/ipath*/device/status_str
The dmesg and status_str commands above are normally executed by the
ipathbug-helper script, but running them separately may help locate the problem.
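On a node with more than one device, the cat command above prints each status string without saying which device it belongs to. The following sketch labels each one with its file path. The function name show_link_status and its optional directory argument are illustrative assumptions; the path pattern is the one given in this section.

```shell
#!/bin/sh
# Sketch only: show_link_status and its directory argument are hypothetical.
# Prints "<path>: <contents>" for every status_str file found.
show_link_status() {
    base=${1:-/sys/class/infiniband}
    for f in "$base"/ipath*/device/status_str; do
        # If the glob matched nothing, the pattern is left unexpanded; skip it.
        [ -r "$f" ] || continue
        printf '%s: %s\n' "$f" "$(cat "$f")"
    done
}

show_link_status
```

If the function prints nothing, no matching status_str file was found, which may itself indicate that the driver did not initialize the device.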
See also “status_str” on page G-35 and “ipath_checkout” on page G-25.