D–Troubleshooting
QLogic MPI Troubleshooting
IB6054601-00 H D-25
gdb Gets SIG32 Signal Under mpirun -debug with the PSM Receive Progress Thread Enabled
When you run mpirun -debug and the PSM receive progress thread is enabled,
gdb (the GNU debugger) reports the following error:
(gdb) run
Starting program: /usr/bin/osu_bcast < /dev/null
[Thread debugging using libthread_db enabled]
[New Thread 46912501386816 (LWP 13100)]
[New Thread 1084229984 (LWP 13103)]
[New Thread 1094719840 (LWP 13104)]
Program received signal SIG32, Real-time event 32.
[Switching to Thread 1084229984 (LWP 22106)]
0x00000033807c0930 in poll () from /lib64/libc.so.6
This signal is generated when the main thread cancels the progress thread. To fix
this problem, disable the receive progress thread when debugging an MPI
program. Add the following line to $HOME/.mpirunrc:
export PSM_RCVTHREAD=0
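The edit to $HOME/.mpirunrc can be scripted so that the line is added only once and is easy to take out again after the debugging session. The following is a minimal sketch, not part of QLogic MPI: the function names psm_rcvthread_disable and psm_rcvthread_restore are hypothetical, and the optional file argument exists only so the helpers can be tried on a scratch file first.

```shell
# Hypothetical helpers (names are illustrative, not provided by QLogic MPI).
# Each takes the rc file path as an optional argument; default is $HOME/.mpirunrc.

# Add the setting once, so repeated calls do not duplicate the line.
psm_rcvthread_disable() {
    rc="${1:-$HOME/.mpirunrc}"
    touch "$rc"
    grep -qx 'export PSM_RCVTHREAD=0' "$rc" || echo 'export PSM_RCVTHREAD=0' >> "$rc"
}

# Remove the setting again once debugging is finished.
psm_rcvthread_restore() {
    rc="${1:-$HOME/.mpirunrc}"
    [ -f "$rc" ] || return 0
    tmp="$(mktemp)" || return 1
    # grep -v exits non-zero when nothing is left to print; that is not an error here.
    grep -vx 'export PSM_RCVTHREAD=0' "$rc" > "$tmp" || true
    cat "$tmp" > "$rc"    # rewrite in place, keeping the file's permissions
    rm -f "$tmp"
}
```

Call psm_rcvthread_disable before starting mpirun -debug and psm_rcvthread_restore afterward. Only lines that match the export statement exactly are removed; other content in the file is left untouched.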
NOTE:
It is important that /dev/shm be writable by all users; otherwise, error
messages like the ones in the "Error Creating Shared Memory Object" section
can be expected. Also, non-QLogic MPIs that use PSM may be more prone to
stale shared memory files when processes are abnormally terminated.
NOTE:
Remove the export PSM_RCVTHREAD=0 line from $HOME/.mpirunrc after you
finish debugging the MPI program. If the line is left in place, the PSM
receive progress thread remains disabled until the line is removed. To check
whether the receive progress thread is enabled, look for output similar to the
following when using the mpirun -verbose flag:
idev-17:0.env PSM_RCVTHREAD Recv thread flags (0 disables thread) => 0x1
The value 0x1 indicates that the receive thread is currently enabled. A value
of 0x0 indicates that the receive thread is disabled.
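The check can be automated by filtering the verbose output for the PSM_RCVTHREAD line. The following is a minimal sketch that assumes the line has the form shown in this section, with the value after "=>". The function name psm_rcvthread_state is hypothetical, and treating any nonzero value as enabled is an assumption based on the "0 disables thread" wording.

```shell
# Hypothetical helper (not part of QLogic MPI): reads mpirun -verbose output
# on standard input and prints "enabled", "disabled", or "unknown".
psm_rcvthread_state() {
    # Extract the hexadecimal value that follows "=>" on the first PSM_RCVTHREAD line.
    value="$(sed -n 's/.*PSM_RCVTHREAD.*=> *\(0x[0-9a-fA-F][0-9a-fA-F]*\).*/\1/p' | head -n 1)"
    if [ -z "$value" ]; then
        echo unknown
    elif [ "$((value))" -eq 0 ]; then
        echo disabled
    else
        # Assumption: any nonzero flag value means the thread is enabled.
        echo enabled
    fi
}
```

Pipe the output of your usual mpirun -verbose command into psm_rcvthread_state. If the line is written to standard error rather than standard output on your system, redirect it with 2>&1 first.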