Troubleshooting guide

VIII. Testing/Validation
Once the MX, GM-2, or GM-1 firmware is running on all hosts in the cluster, and all
host-to-switch and switch-to-switch cables have been connected, you are ready to verify
the health of all Myrinet hardware components in the installation by performing the
following sequence of tests. The Fabric Management System (FMS) is the recommended
diagnostic tool for Myrinet-2000 networks. Requirements for installing FMS are
summarized on the FMS webpage (http://www.myri.com/scs/fms/).
Run fm_status to check the current status of the FMS
Run fm_switch to ensure that the FMS database includes all switches
Run fm_db2wirelist to look for any missing hosts
Check the LEDs on each switch port and NIC port
Test performance between each host and NIC
Test performance between each host and the switch
Run mpi_stress to stress all of the connections in the fabric
Run fm_show_alerts for diagnostic information on any damaged or failing hardware components
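Before starting the sequence, it can help to confirm that the tools named in the list above can actually be found on the PATH of the host you are working from. The following is a minimal sketch under that assumption; the function name `check_fms_tools` is hypothetical and not part of FMS, and it only checks that the commands exist, not that FMS is installed correctly or running.

```shell
# Hypothetical helper (not part of FMS).
# Prints the name of each tool from the test sequence that cannot be found
# on the current PATH, one per line. Returns 0 only if none are missing.
check_fms_tools() {
    missing=0
    for tool in fm_status fm_switch fm_db2wirelist fm_show_alerts mpi_stress; do
        # command -v succeeds only if the shell can locate the command
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_fms_tools || echo "Some tools are missing; check the FMS installation and PATH"` lists anything that needs attention before you continue.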
If FMS cannot be installed, refer to the diagnostic procedures in the “Troubleshooting”
section of the FAQ: http://www.myri.com/cgi-bin/fom?file=481.
These steps are detailed below and are also described in the “Troubleshooting” section of
the FAQ (http://www.myri.com/scs/FAQ/). Once all of these tests pass, you can be confident
that the Myrinet installation is healthy.
1. Run fm_status to check the current status of the FMS.
$ fm_status
If you are using Myrinet-2000 M3-CLOS-ENCL or M3-SPINE-ENCL switches, it
should take less than 30 seconds to map the Myrinet fabric. If it takes longer, please
submit a bug report to help@myri.com.
If you are using Myrinet-2000 M3-E* switches, it may take up to five minutes to map the
Myrinet fabric. If it takes longer, please submit a bug report to help@myri.com.
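The time limits above can be captured in a small shell function if you want to check a measured mapping time against them. This is a sketch only: the function name `mapping_time_ok` is hypothetical, you must supply the switch model and the elapsed mapping time in whole seconds yourself, and the function does not query FMS. It encodes "less than 30 seconds" for M3-CLOS-ENCL and M3-SPINE-ENCL switches and "up to five minutes" (300 seconds) for M3-E* switches.

```shell
# Hypothetical helper (not part of FMS).
# Usage: mapping_time_ok MODEL SECONDS
# Returns 0 if SECONDS is within the expected mapping time for MODEL,
# 1 if it is longer (consider submitting a bug report to help@myri.com),
# 2 if MODEL or SECONDS is not recognized.
mapping_time_ok() {
    model=$1
    elapsed=$2
    case $elapsed in
        ''|*[!0-9]*)
            echo "elapsed time must be a whole number of seconds" >&2
            return 2 ;;
    esac
    case $model in
        M3-CLOS-ENCL|M3-SPINE-ENCL)
            # Expected: less than 30 seconds
            [ "$elapsed" -lt 30 ] && return 0 ;;
        M3-E*)
            # Expected: up to five minutes
            [ "$elapsed" -le 300 ] && return 0 ;;
        *)
            echo "unknown switch model: $model" >&2
            return 2 ;;
    esac
    # Known model, but mapping took longer than expected
    return 1
}
```

For example, `mapping_time_ok M3-CLOS-ENCL 45` returns 1, because 45 seconds exceeds the 30-second expectation for that enclosure.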
2. Run fm_switch to ensure that the FMS database includes all switches.
To view a list of all of the switch enclosures currently defined in the FMS database, type
$ fm_switch
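If you keep a list of the switch enclosures you expect in the fabric, comparing it against the names reported by fm_switch shows which ones are missing. The sketch below assumes two plain-text files with one switch name per line: one that you maintain, and one that you extract yourself from the fm_switch output (its exact format is not described here, so no parsing is attempted). The function name `missing_switches` is hypothetical and not part of FMS.

```shell
# Hypothetical helper (not part of FMS).
# Usage: missing_switches EXPECTED_FILE FOUND_FILE
# Prints each name in EXPECTED_FILE that does not appear as a whole line
# in FOUND_FILE. Returns 0 if none are missing, 1 if some are, 2 on error.
missing_switches() {
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
    grep -Fxv -f "$2" "$1"
    case $? in
        0) return 1 ;;  # names were printed: some switches are missing
        1) return 0 ;;  # nothing printed: every expected switch was found
        *) return 2 ;;  # grep error, e.g. an unreadable file
    esac
}
```

Matching whole lines (`-x`) avoids treating one switch name as present just because it is a prefix of another.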
If any switches are missing from the database, add each missing switch to the
database by issuing the command
© 2007 Myricom, Inc. DRAFT