Product specifications

D–Troubleshooting
QLogic MPI Troubleshooting
D-28 IB6054601-00 H
The following message usually indicates a node failure or malfunctioning link in
the fabric:
Couldn’t connect to <IP> (LID=<lid>:<port>:<subport>). Time
elapsed 00:00:30. Still trying...
<IP> is the MPI rank’s IP address, and <lid>, <port>, and <subport> are the
rank’s LID, port, and subport.
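When searching mpirun output for this message, it can help to pull the fields out programmatically. The following is a minimal sketch in Python, assuming the message follows the format shown above and may be wrapped after "Time". The function name parse_connect_failure, the regular expression, and the sample values are illustrative assumptions and are not part of QLogic MPI. Fields are kept as strings because the numeric base of the LID is not stated here.

```python
import re

# Assumed pattern for the message format shown above (hypothetical helper,
# not part of QLogic MPI). Accepts a straight or a typographic apostrophe.
CONNECT_RE = re.compile(
    r"Couldn[’']t connect to (?P<ip>\S+) "
    r"\(LID=(?P<lid>[^:)\s]+):(?P<port>[^:)\s]+):(?P<subport>[^:)\s]+)\)\.\s+"
    r"Time\s+elapsed (?P<elapsed>\d+:\d{2}:\d{2})\."
)

def parse_connect_failure(text):
    """Return the message fields as a dict of strings, or None if no match."""
    m = CONNECT_RE.search(text)
    return m.groupdict() if m else None

# Sample values below are made up for illustration.
sample = ("Couldn't connect to 10.0.0.5 (LID=12:1:0). Time\n"
          "elapsed 00:00:30. Still trying...")
print(parse_connect_failure(sample))
```

Grouping the parsed results by IP address or LID across all ranks is one way to see whether the failures point at a single node or link.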
If messages similar to the following appear, the program may be trying to
receive into an invalid (unallocated) memory address, perhaps because of a
logic error in the program, usually related to malloc/free:
ipath_update_tid_err: Failed TID update for rendezvous, allocation
problem
kernel: infinipath: get_user_pages (0x41 pages starting at
0x2aaaaeb50000
kernel: infinipath: Failed to lock addr 0002aaaaeb50000, 65 pages:
errno 12
TID is short for Token ID, and is part of the QLogic hardware. This error indicates
a failure of the program, not the hardware or driver.
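The two kernel lines above describe the same request in different notations. The short Python sketch below decodes the numbers using only the standard library. The 4 KiB page size is an assumption (it is not stated in the messages), and the errno name shown is the one used on Linux.

```python
import errno
import os

# Page count from the get_user_pages line, given in hexadecimal.
pages = int("0x41", 16)
print(pages)  # 65, matching "65 pages" in the "Failed to lock addr" line

# errno value from the "Failed to lock addr" line.
err = 12
print(errno.errorcode[err])  # symbolic name; ENOMEM on Linux
print(os.strerror(err))      # platform-specific description of the error

# Assumption: 4 KiB pages. Adjust if the system uses a different page size.
ASSUMED_PAGE_SIZE = 4096
region_bytes = pages * ASSUMED_PAGE_SIZE
print(region_bytes)          # approximate size of the region being locked
```

Decoding the values does not change the diagnosis: the address being received into should still be checked against the program's malloc/free logic.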
MPI Messages
Some MPI error messages are issued from the parts of the code inherited from
the MPICH implementation. See the MPICH documentation for message
descriptions. This section discusses the error messages specific to the QLogic
MPI implementation.
These messages appear in the mpirun output. Most are followed by an abort,
and possibly a backtrace. Each is preceded by the name of the function in which
the exception occurred.
The following message is always followed by an abort. The processlabel is
usually in the form of the host name followed by the process rank:
processlabel Fatal Error in filename line_no: error_string
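A script that collects mpirun output could split this message into its parts. The sketch below is in Python and assumes the fields are separated by single spaces and that line_no is a decimal number, as the format line suggests. The function name parse_fatal_error and the sample label, file name, and line number are hypothetical.

```python
import re

# Assumed layout: processlabel Fatal Error in filename line_no: error_string
# (hypothetical helper, not part of QLogic MPI).
FATAL_RE = re.compile(
    r"^(?P<processlabel>.+?) Fatal Error in "
    r"(?P<filename>\S+) (?P<line_no>\d+): (?P<error_string>.*)$"
)

def parse_fatal_error(line):
    """Return the message fields as a dict, or None if the line does not match."""
    m = FATAL_RE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    fields["line_no"] = int(fields["line_no"])
    return fields

# The label, file name, and line number here are invented for illustration.
sample_fatal = "node01:3 Fatal Error in example.c 142: Memory allocation failed."
print(parse_fatal_error(sample_fatal))
```

Because the exact processlabel format can vary, the pattern treats everything before "Fatal Error in" as the label rather than assuming a particular separator between host name and rank.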
At the time of publication, the possible error_strings are:
Illegal label format character.
Memory allocation failed.
Error creating shared memory object.
Error setting size of shared memory object.
Error mmapping shared memory.
Error opening shared memory object.
Error attaching to shared memory.
Node table has inconsistent len! Hdr claims %d not %d
Timeout waiting %d seconds to receive peer node table from mpirun