D–Troubleshooting
Kernel and Initialization Issues
IB6054601-00 H D-5
OpenFabrics Load Errors if ib_ipath Driver Load Fails
If the ib_ipath driver fails to load, the other OpenFabrics drivers and modules
still load and appear in the lsmod output, but commands such as ibstatus,
ibv_devinfo, and ipath_control -i fail as follows:
# ibstatus
Fatal error: device '*': sys files not found
(/sys/class/infiniband/*/ports)
# ibv_devinfo
libibverbs: Fatal: couldn't read uverbs ABI version.
No IB devices found
# ipath_control -i
InfiniPath driver not loaded ?
No InfiniPath info available
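
As a rough illustration, the failure described above can be recognized without the OpenFabrics tools by checking two things: whether ib_ipath is in the kernel module list, and whether any device under /sys/class/infiniband has a ports directory (the path named in the ibstatus error). The sketch below assumes a POSIX shell on Linux. The function names and the PROC_MODULES and IB_SYSFS override variables are hypothetical, introduced here only for illustration and testing; they are not part of the InfiniPath software.

```shell
#!/bin/sh
# Sketch: report whether ib_ipath is loaded and whether any InfiniBand
# device has registered sysfs port entries.
# PROC_MODULES and IB_SYSFS are hypothetical overrides (not InfiniPath
# settings) so the checks can be pointed at test data.
PROC_MODULES="${PROC_MODULES:-/proc/modules}"
IB_SYSFS="${IB_SYSFS:-/sys/class/infiniband}"

# Succeeds if the named module appears in the module list.
# /proc/modules lists one module per line, with the name in field 1.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' \
        "$PROC_MODULES" 2>/dev/null
}

# Succeeds if at least one device directory contains a ports entry.
ib_ports_present() {
    for d in "$IB_SYSFS"/*/ports; do
        [ -d "$d" ] && return 0
    done
    return 1
}

# Print a two-line summary of both checks.
check_ib_ipath() {
    if module_loaded ib_ipath; then
        echo "ib_ipath: loaded"
    else
        echo "ib_ipath: NOT loaded"
    fi
    if ib_ports_present; then
        echo "sysfs ports: present"
    else
        echo "sysfs ports: missing"
    fi
}
```

Calling check_ib_ipath on a node that shows the errors above would be expected to report the driver as not loaded and the sysfs ports as missing, which points at the driver rather than at the individual commands.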
InfiniPath ib_ipath Initialization Failure
In some cases, ib_ipath is not properly initialized. Symptoms may appear as
error messages from an MPI job or another program. Here is a sample command
and the resulting error message:
$ mpirun -np 2 -m ~/tmp/mbu13 osu_latency
<nodename>:ipath_userinit: assign_port command failed: Network is down
<nodename>:can't open /dev/ipath, network down
This will be followed by messages of this type after 60 seconds:
MPIRUN<node_where_started>: 1 rank has not yet exited 60 seconds after rank 0 (node <nodename>) exited without reaching MPI_Finalize().
MPIRUN<node_where_started>:Waiting at most another 60 seconds for the remaining ranks to do a clean shutdown before terminating 1 node processes.
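
When job output is captured to a file, the initialization failure shown above can be spotted by searching for its two characteristic messages. The following is a minimal sketch, assuming a POSIX shell; the function name has_ipath_init_failure is hypothetical, and the patterns are taken only from the sample messages in this section, so other releases may word them differently.

```shell
#!/bin/sh
# Sketch: succeeds if the mpirun output on stdin contains either of the
# ib_ipath initialization failure messages shown in this section.
# has_ipath_init_failure is a hypothetical helper, not an InfiniPath tool.
has_ipath_init_failure() {
    grep -E -q 'ipath_userinit: assign_port command failed|open /dev/ipath, network down'
}
```

For example, after saving the output of a run to a file named run.log (a hypothetical name), `has_ipath_init_failure < run.log && echo "check the ib_ipath driver"` prints the reminder only when one of the messages is present.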
If this error appears, check whether the InfiniPath driver is loaded by typing:
$ lsmod | grep ib_ipath
If no output is displayed, the driver did not load. In this case, try
the following commands (as root):
# modprobe -v ib_ipath
# lsmod | grep ib_ipath
# dmesg | grep -i ipath | tail -25
The lsmod output indicates whether the driver has loaded. The messages
printed by dmesg may help locate any problems with ib_ipath.
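
The commands above can be combined into one small script. This is a sketch under stated assumptions, not a supported tool: it assumes a POSIX shell and root privileges, and the function names are hypothetical. It also matches ib_ipath only at the start of an lsmod line, which is slightly stricter than the plain grep in the text (that grep also matches modules that merely list ib_ipath as a dependent).

```shell
#!/bin/sh
# Sketch of the recovery steps above. Run as root on the affected node.
# All function names here are hypothetical helpers for illustration.

# Keep only the most recent driver-related kernel messages from stdin.
# The 25-line limit mirrors the "tail -25" used in the text.
ipath_messages() {
    grep -i ipath | tail -n 25
}

# Succeeds if ib_ipath itself appears in the lsmod-style listing on stdin.
ipath_in_listing() {
    grep -q '^ib_ipath[[:space:]]'
}

# Try to load the driver if it is missing, then report the result.
reload_and_report() {
    if lsmod | ipath_in_listing; then
        echo "ib_ipath already loaded"
        return 0
    fi
    echo "ib_ipath not loaded; trying modprobe"
    modprobe -v ib_ipath
    if lsmod | ipath_in_listing; then
        echo "ib_ipath loaded"
        return 0
    fi
    echo "ib_ipath still not loaded; recent kernel messages:"
    dmesg | ipath_messages
    return 1
}
```

Running reload_and_report as root performs the same checks in the same order as the manual steps; a nonzero exit status means the driver is still not loaded and the dmesg excerpt should be reviewed.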