Troubleshooting guide

Appendix C: Troubleshooting Performance
If you suspect a performance anomaly, we suggest the following steps:
1. Run mx_dmabench or gm_debug -L on each node in the cluster to ensure that
all nodes report consistent read/write performance and PCI speed.
2. If you are using the Fabric Management System (FMS), check whether
fm_show_alerts reports a significant number of badcrcs in the fabric.
Alternatively, check for badcrcs in the mx_counters or gm_counters output, as
well as in the hardware counters on the switch.
If you see a large number of badcrcs (hundreds or thousands), then you may have a
failing hardware component (cable, port on switch, or port on NIC) that needs to
be isolated and replaced.
3. Run mx_pingpong or gm_allsize to test performance.
Is the performance comparable to that reported on the Myrinet Performance
webpage (http://www.myri.com/scs/performance/Myrinet-2000/)?
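The comparisons in steps 1 and 2 can be automated once the per-node figures and badcrc counts have been collected. The following Python sketch is illustrative only: the function names, the 10% tolerance, the threshold of 100 and the sample values are all assumptions, and it does not parse the actual output of mx_dmabench, mx_counters or gm_counters.

```python
from statistics import median

def find_outlier_nodes(value_by_node, tolerance=0.10):
    """Return the nodes whose value differs from the cluster median
    by more than the given fraction (assumed tolerance: 10%)."""
    mid = median(value_by_node.values())
    return sorted(
        node for node, value in value_by_node.items()
        if abs(value - mid) > tolerance * mid
    )

def links_with_many_badcrcs(badcrcs_by_link, threshold=100):
    """Return the links whose badcrc count reaches the threshold.
    The threshold of 100 is an assumption based on 'hundreds'."""
    return sorted(
        link for link, count in badcrcs_by_link.items()
        if count >= threshold
    )

# Made-up figures, collected by hand from each node.
dma_read = {"node01": 1010.0, "node02": 1005.0,
            "node03": 640.0, "node04": 1012.0}
badcrcs = {"node01:0": 0, "node02:0": 3, "node03:0": 4812}

print(find_outlier_nodes(dma_read))
print(links_with_many_badcrcs(badcrcs))
```

A node or link flagged by either function is a candidate for closer inspection; the sketch does not identify which component (cable, switch port, or NIC port) is at fault.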
The test program mx_pingpong can be run to test the MX PingPong latency and
unidirectional bandwidth between two hosts. Adding the -V flag to the
mx_pingpong command will augment the test with verification of the contents of
all messages, at the cost of significantly degraded performance. For a list of all
options to mx_pingpong, type mx_pingpong -help.
Latency and Unidirectional Bandwidth
To test the MX PingPong latency and unidirectional bandwidth between two
hosts (host1 and host2), type the following on host1:
mx_pingpong
and on host2 type:
mx_pingpong -d host1:0
The output from this command will consist of three columns of data: the first
column lists the message size (in bytes), the second column lists the latency (in
microseconds), and the third column lists the unidirectional bandwidth (in MB/s).
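To compare results between runs or against published figures, the three-column output can be saved to a file and summarized. The following Python sketch assumes only the column layout described above (message size, latency, bandwidth, separated by whitespace); the function name and the sample numbers are made up for illustration, and any line that does not consist of three numeric fields is skipped.

```python
def parse_pingpong_output(text):
    """Parse lines of the form '<size> <latency_us> <bandwidth_MBs>'.
    Lines that do not hold exactly three numeric fields are ignored."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        try:
            size = int(fields[0])
            latency_us = float(fields[1])
            bandwidth_mbs = float(fields[2])
        except ValueError:
            continue  # header or other non-numeric line
        rows.append((size, latency_us, bandwidth_mbs))
    return rows

# Made-up sample in the assumed layout, not real measurements.
sample = """\
size latency bandwidth
0 2.5 0.0
1024 6.1 167.9
1048576 4290.0 244.4
"""

rows = parse_pingpong_output(sample)
lowest_latency = min(row[1] for row in rows)
peak_bandwidth = max(row[2] for row in rows)
print(f"lowest latency: {lowest_latency} us")
print(f"peak bandwidth: {peak_bandwidth} MB/s")
```

The lowest latency is normally seen at the smallest message sizes and the peak bandwidth at the largest, so these two figures are the ones to compare with the published numbers.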
Similarly, the test program gm_allsize can be used to measure the GM latency
and bandwidth. Adding the --verify flag to any gm_allsize command will
augment the test with verification of the contents of all messages, at the cost of
© 2007 Myricom, Inc. DRAFT