Broken Intermediate Link
Sometimes some message traffic passes through the fabric while other traffic
appears to be blocked; when this happens, MPI jobs fail to run.
In large cluster configurations, switches may be attached to other switches to
supply the necessary inter-node connectivity. Problems with these inter-switch (or
intermediate) links are sometimes more difficult to diagnose than failure of the
final link between a switch and a node. The failure of an intermediate link may
allow some traffic to pass through the fabric while other traffic is blocked or
degraded.
If you notice this behavior in a multi-layer fabric, check that all switch cable
connections are correct. Statistics for managed switches are available on a
per-port basis and may help with debugging. Contact your switch vendor for
more information.
QLogic recommends using FastFabric to help diagnose this problem. If
FastFabric is not installed in the fabric, there are two diagnostic tools, ibhosts
and ibtracert, that may also be helpful. The tool ibhosts lists all the IB nodes
that the subnet manager recognizes. To check the IB path between two nodes,
use the ibtracert command.
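For example, a minimal diagnostic sequence with these tools might look like
the following sketch; the LIDs shown are placeholder values, so substitute the
LIDs reported for your own nodes:

    # List all IB host nodes known to the subnet manager
    ibhosts

    # Trace the IB path between two nodes, identified here by
    # source and destination LID (2 and 23 are placeholders)
    ibtracert 2 23

If ibtracert reports an unreachable port partway along the route, the failing
intermediate link lies at that hop.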
Performance Issues
The following sections discuss known performance issues.
Large Message Receive Side Bandwidth Varies with Socket Affinity on Opteron Systems
On Opteron systems, when using the QLE7240 or QLE7280 in DDR mode, there
is a receive-side bandwidth bottleneck for CPUs that are not adjacent to the PCI
Express root complex. As a result, performance varies with the socket affinity of
the receiving process. The bottleneck is most pronounced when using SendDMA
with large messages on the sockets farthest from the root complex; the best case
for SendDMA is when both sender and receiver are on the closest sockets.
Overall performance for PIO (and smaller messages) is better than with
SendDMA.
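One way to reduce this variability is to pin the process to a specific socket with
a NUMA binding tool. The sketch below uses numactl and assumes that NUMA
node 0 is the socket adjacent to the PCI Express root complex; verify the actual
topology on your system before choosing a node. The application name is a
placeholder:

    # Display the NUMA topology so the socket nearest the root
    # complex can be identified (assumed to be node 0 here)
    numactl --hardware

    # Run the application with CPUs and memory bound to node 0
    # (./my_app is a placeholder for your own program)
    numactl --cpunodebind=0 --membind=0 ./my_app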