Troubleshooting guide

4. If you're using GM-1, run the gm_allsize "hardware loopback test" as follows:
gm_counters [--board=n]
gm_simpleroute --loopback [--board=n]
gm_allsize --geometric --exit-on-error [--board=n]
gm_counters [--board=n]
The --board flag is only necessary if the board number is other than 0.
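The sequence above can be sketched as a small shell helper. This is a minimal sketch, not part of the GM distribution: the function name loopback_sequence is hypothetical, and only the command names and flags listed in this step are used. It prints the four commands and appends --board only when the board number is not 0.

```shell
#!/bin/sh
# Print the GM-1 hardware loopback test sequence for one board.
# Usage: loopback_sequence [board]    (board defaults to 0)
# The function name is illustrative; the commands and flags are the
# ones listed in the step above.
loopback_sequence() {
    board="${1:-0}"
    flag=""
    # --board is only necessary when the board number is other than 0
    if [ "$board" != "0" ]; then
        flag=" --board=$board"
    fi
    echo "gm_counters$flag"                              # counters before the test
    echo "gm_simpleroute --loopback$flag"                # route the NIC to itself
    echo "gm_allsize --geometric --exit-on-error$flag"   # the loopback test
    echo "gm_counters$flag"                              # counters after the test
}
```

For example, "loopback_sequence 1" prints the sequence for board 1 so it can be reviewed first, and "loopback_sequence 1 | sh -e" would execute it on a host where the GM tools are on the PATH, stopping at the first command that fails.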
If the hardware loopback test completes successfully, and the value for Bad CRC8
reported by mx_counters (or for badcrc__invalid or badcrc_cnt reported by gm_counters)
does not increase significantly between the two counter readings, then the Myrinet NIC
is not the point of failure. The problem may reside with the cable or the Myrinet
switch port.
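Comparing the counters taken before and after the test can be scripted. The following is a rough sketch under a stated assumption: each saved counter dump is assumed to contain one counter per line, with the counter name as the first field and its value as the last field. The real gm_counters output may be laid out differently, so the awk pattern may need adjusting. The function names counter_value and counter_delta are hypothetical.

```shell
#!/bin/sh
# ASSUMPTION: each dump file has one counter per line, with the counter
# name as the first whitespace-separated field and the value as the last.
# Adjust the awk pattern if the real gm_counters output differs.

# counter_value NAME FILE: print the value of counter NAME in FILE.
counter_value() {
    awk -v name="$1" '$1 == name { print $NF; exit }' "$2"
}

# counter_delta NAME BEFORE_FILE AFTER_FILE: print how much NAME grew.
# A counter missing from a file is treated as 0.
counter_delta() {
    before=$(counter_value "$1" "$2")
    after=$(counter_value "$1" "$3")
    echo $(( ${after:-0} - ${before:-0} ))
}
```

With the first gm_counters run redirected to before.txt and the second to after.txt, "counter_delta badcrc_cnt before.txt after.txt" prints the increase during the test. What counts as a significant increase is not defined here and remains a judgment call.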
Note that after running gm_simpleroute, the GM-1 mapper must be re-run to restore the
routes to other nodes in the system. For GM-2, gm_simpleroute --enable-software-loopback
must be run before restarting the gm_mapper on the host.
If the foregoing procedure is not feasible, you can try installing the suspect NIC in
another PCI slot or in another host. Does the problem follow the suspect NIC? If you use
an alternative NIC, does the problem disappear?
If the questionable NIC fails in a PCI slot that works with another Myrinet NIC
(especially another NIC of the same class), then the questionable NIC has probably failed.
If a NIC is identified as the point of failure, contact help@myri.com to return this NIC
for repair/replacement. You will be assigned a "Return Material Authorization" (RMA)
number. The information required for an RMA is outlined in the Myrinet FAQ
(http://www.myri.com/scs/FAQ/).
© 2007 Myricom, Inc. DRAFT