Troubleshooting guide
B.3. How do I determine if a Myrinet NIC has failed?
If exchanging the cable and the port on the switch line card does not eliminate the errors,
then the Myrinet NIC may be the point of failure. Here are some suggestions for
determining whether a Myrinet NIC has failed.
First, try using the NIC in isolation by running the mx_pingpong "hardware loopback
test" or gm_allsize "hardware loopback test".
The hardware loopback test is performed as follows:
1. Disconnect the standard Myrinet cable from the NIC and attach a fiber loopback
cable/plug.
M3F-L Fiber Loopback cable (plug)
2. If you're using MX, run the mx_pingpong "hardware loopback test" as follows:
mx_counters [-b <n>] | grep Bad
su root
mx_stop_mapper
env MX_DISABLE_SELF="1" MX_DISABLE_SHMEM="1" mx_pingpong [-b <n>] -e 0 -r 1 &
env MX_DISABLE_SELF="1" MX_DISABLE_SHMEM="1" mx_pingpong [-b <n>] -e 1 -r 0 -d <hostname>:0
mx_counters [-b <n>] | grep Bad
where <hostname> is the name of the host on which the test is being run, and the [-b
<n>] option is only necessary if the board number is other than 0.
3. If you're using GM-2, run the gm_allsize "hardware loopback test" as follows:
gm_counters [--board=n]
su root
killall gm_mapper
gm_simpleroute --disable-software-loopback [--board=n]
gm_allsize --geometric --exit-on-error [--board=n]
gm_counters [--board=n]
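Both procedures above read the counters once before the loopback test and once after it, presumably so that the two readings can be compared. The following is a minimal sketch of that comparison, not a Myricom tool. It assumes each snapshot has been saved to a file and that every counter line ends in an integer value preceded by the counter name (for example "Bad CRC32 (Port 0): 0"). That line format, the sample counter name, and the helper name counter_delta are all assumptions made for illustration; the real mx_counters or gm_counters output may be laid out differently, in which case the parsing needs adjusting.

```shell
#!/bin/sh
# Hypothetical helper: print every counter whose value increased between
# two saved snapshots.
# ASSUMPTION: each line looks like "<counter name>[:] <integer>". This
# format is a guess, not the documented output of mx_counters/gm_counters.
counter_delta() {
    # $1 = snapshot saved before the test, $2 = snapshot saved after it
    awk '
        $NF ~ /^[0-9]+$/ {
            value = $NF
            name = $0
            sub(/[ \t]*[0-9]+[ \t]*$/, "", name)  # drop the trailing value
            sub(/:[ \t]*$/, "", name)             # drop a trailing colon
            sub(/^[ \t]+/, "", name)              # drop leading whitespace
            if (FNR == NR) {                      # first file: remember values
                before[name] = value
                next
            }
            # second file: report counters that grew during the test
            if ((name in before) && value + 0 > before[name] + 0)
                printf "%s: %d -> %d\n", name, before[name], value
        }
    ' "$1" "$2"
}

# Possible usage with the MX commands from step 2 (file names are arbitrary):
#   mx_counters | grep Bad > before.txt
#   ... run the loopback test ...
#   mx_counters | grep Bad > after.txt
#   counter_delta before.txt after.txt
```

If the function prints nothing, no counter present in both snapshots increased during the test. Any line it does print names a counter that grew while the NIC was talking only to itself through the loopback plug.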
© 2007 Myricom, Inc. DRAFT