HP XC System Software Administration Guide Version 2.1
InfiniBand
http://www.voltaire.com
Other tools have been written specifically for use with the HP XC system.
To use the diagnostic tools, the system interconnect must be properly configured. The IP
addresses must be configured and the /etc/hosts file must be updated with the switch
names, for example MR0N00 for Myrinet system interconnect and QR0N00 for Quadrics system
interconnect. These topics are discussed in the HP XC System Software Installation Guide.
_________________________ Note _________________________
Link errors are common when a node boots or reboots. During boot, the system
interconnect driver is initialized, which puts the system interconnect into a full reset.
This puts the link into reset and always causes an error on the switch connected to the
system interconnect.
Diagnostic tools for the following system interconnects are described in this section:
• Myrinet system interconnect (Section 15.4.1)
• Quadrics system interconnect (Section 15.4.2)
• Gigabit Ethernet system interconnect (Section 15.4.3)
15.4.1 HP XC Diagnostic Tools for the Myrinet System Interconnect
This section describes tools that were developed specifically for diagnosing the Myrinet system
interconnect on the HP XC system. See your system’s hardware Installation and Operation
Guide for information about standard diagnostic tools.
15.4.1.1 The gm_prodmode_mon Diagnostic Tool
This program monitors the GM2.1 switch, reads current environmental parameters, and
generates alerts if the values are outside the operating parameters recommended by the
manufacturer. These parameters are:
bad Crcs      Should be 0 (zero).
Temperature   The temperature should be less than 104°F (40°C).
Voltage       The voltage should be within +/- 10% of nominal voltage.
Fan speed     The fan speed should be above the minimum.
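The limits listed above can be expressed as a small check. The following Python sketch is illustrative only and is not the implementation of gm_prodmode_mon; the function name, the argument names, and the idea of passing the nominal voltage and minimum fan speed as arguments are assumptions, because this guide does not state those values.

```python
def check_switch_readings(bad_crcs, temp_c, voltage, nominal_voltage,
                          fan_speed, min_fan_speed):
    """Return alert strings for readings outside the limits in this section.

    All names here are hypothetical. nominal_voltage and min_fan_speed must
    be supplied by the caller; the guide does not give their values.
    """
    alerts = []
    # Bad CRC count should be 0 (zero).
    if bad_crcs != 0:
        alerts.append(f"bad CRCs: {bad_crcs} (should be 0)")
    # Temperature should be less than 40 degrees C (104 degrees F).
    if temp_c >= 40.0:
        alerts.append(f"temperature: {temp_c} C (should be below 40 C)")
    # Voltage should be within +/- 10% of the nominal voltage.
    if abs(voltage - nominal_voltage) > 0.10 * abs(nominal_voltage):
        alerts.append(f"voltage: {voltage} V (nominal {nominal_voltage} V)")
    # Fan speed should be above the minimum.
    if fan_speed <= min_fan_speed:
        alerts.append(f"fan speed: {fan_speed} (minimum {min_fan_speed})")
    return alerts


# Example readings are made up for illustration.
for alert in check_switch_readings(2, 41.0, 2.5, 3.3, 1000, 3000):
    print(alert)
```

An empty list means that every reading is within the recommended operating parameters.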
The gm_prodmode_mon diagnostic tool searches /etc/hosts for entries whose name
matches the regular expression “MR0[NT][0-9][0-9]”.
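To see which /etc/hosts entries this expression selects, the following Python sketch applies the same pattern to hosts-file text. It is an illustration, not the tool's own code: the function name is hypothetical, the sample addresses are made up, and it assumes that the whole host name must match the pattern.

```python
import re

# Pattern quoted in this section: MR0, then N or T, then two digits.
SWITCH_NAME = re.compile(r"MR0[NT][0-9][0-9]")


def find_myrinet_switches(hosts_text):
    """Return host names in hosts-file text that fully match the pattern."""
    names = []
    for line in hosts_text.splitlines():
        # Drop comments and skip blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # The first field is the IP address; the rest are names and aliases.
        for name in line.split()[1:]:
            if SWITCH_NAME.fullmatch(name) and name not in names:
                names.append(name)
    return names


# Sample content with made-up addresses, for illustration only.
sample = """\
127.0.0.1   localhost
172.20.0.1  MR0N00
172.20.0.2  MR0T00  # another switch entry
172.20.0.3  QR0N00
172.20.0.4  MR0N001
"""
print(find_myrinet_switches(sample))
```

With this sample, only MR0N00 and MR0T00 are selected; the Quadrics name QR0N00 and the three-digit name MR0N001 do not match.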
The tool uses the links -dump command to obtain the current values from each switch and
parses the output. The gm_prodmode_mon diagnostic tool generates an alert if any errors
are found. All alerts are logged in the system log file /var/log/nodename.
The format of this command is:
gm_prodmode_mon [-help][-verbose][-d directory-name]
The output from gm_prodmode_mon is logged to
/var/log/diag/myrinet/gm_prodmode_mon/links.log by
default, but you can specify another directory with the -d option.
Output is also displayed on stdout to show the progress of the diagnostic test.