HP XC System Software Administration Guide Version 3.1

$ bsub -n max -o ./ mpirun -prot -TCP -srun -v -n max \
/opt/hptc/contrib/bin/dgemm.x
The max parameter is the maximum number of processors available to you in the lsf partition.
No warning messages appear when all the specified nodes are performing at their peak efficiency.
19.4 Using the System Interconnect Diagnostic Tools
Various tools enable you to diagnose the system interconnect. Some tools are provided by the system
interconnect manufacturer and are discussed in the Installation and Operation Guide (the hardware
documentation) for your system. Be sure to consult the appropriate Web page for these system interconnect
tools:
Myrinet      http://www.myrinet.com
Quadrics     http://www.quadrics.com
InfiniBand   http://www.voltaire.com
Other tools have been written specifically for use with the HP XC system.
To use the diagnostic tools, you must ensure that the system interconnect is properly configured. The IP
addresses must be configured and the /etc/hosts file must be updated with the switch names, for
example, MR0N00 for the Myrinet system interconnect and QR0N00 for the Quadrics system interconnect. These
topics are discussed in the HP XC System Software Installation Guide.
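As a quick sanity check before running the diagnostic tools, a sketch like the following can confirm that the expected switch names are present in the hosts file. The function name check_switch_entries is a hypothetical helper and is not part of the HP XC software; the switch names are the examples given above.

```shell
#!/bin/bash
# Hypothetical helper: report whether each named switch has an entry in a
# hosts file. Usage: check_switch_entries HOSTS_FILE NAME [NAME ...]
check_switch_entries() {
    local hosts_file=$1
    shift
    local name status=0
    for name in "$@"; do
        # Ignore comment lines and inline comments; compare the name against
        # every host name column (field 2 onward) of each entry.
        if awk -v n="$name" '
                /^[[:space:]]*#/ { next }
                {
                    for (i = 2; i <= NF; i++) {
                        if ($i ~ /^#/) break
                        if ($i == n) found = 1
                    }
                }
                END { exit !found }' "$hosts_file"; then
            echo "$name: found"
        else
            echo "$name: MISSING"
            status=1
        fi
    done
    return $status
}
```

For example, check_switch_entries /etc/hosts MR0N00 prints one line for the switch and returns a nonzero status if the entry is absent.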
Note:
Link errors are common when a node boots or reboots. During boot, the system interconnect driver is
initiated, putting the system interconnect into a full reset. This puts the link into reset and always causes
an error on the switch connected to the system interconnect.
This section describes the following diagnostic tools:
“HP XC Diagnostic Tools for the Myrinet System Interconnect”
“Using Diagnostic Tools for the Quadrics System Interconnect”
“Using Diagnostic Tools for the Gigabit Ethernet System Interconnect”
19.4.1 HP XC Diagnostic Tools for the Myrinet System Interconnect
This section describes tools that were developed specifically for diagnosing the Myrinet system interconnect
(from Myricom, Inc.) on the HP XC system. See your system's hardware installation and operation guide
for information about standard diagnostic tools.
19.4.1.1 The gm_prodmode_mon Diagnostic Tool
This program monitors the GM2.1 switch, reads current environment parameters, and generates alerts if
the values of the following parameters are outside the operating ranges recommended by the manufacturer:
bad Crcs      The value should be zero (0).
Temperature   The temperature should be less than 104°F (40°C).
Voltage       The voltage should be within +/- 10 percent of nominal voltage.
Fan speed     The fan speed should be above the minimum.
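The operating ranges listed above amount to simple range tests. The following sketch shows that logic in isolation; it does not query a switch and is not the tool's actual code. The function name check_switch_env and its argument order are hypothetical, and the nominal voltage and minimum fan speed are passed in as arguments because this section does not state them.

```shell
#!/bin/bash
# Hypothetical range check mirroring the parameter list above.
# Usage: check_switch_env BAD_CRCS TEMP_C VOLTS NOMINAL_VOLTS FAN_SPEED MIN_FAN_SPEED
check_switch_env() {
    awk -v crc="$1" -v temp="$2" -v volts="$3" -v nominal="$4" \
        -v fan="$5" -v fan_min="$6" 'BEGIN {
        alerts = 0
        # Bad CRC count should be zero.
        if (crc != 0) { print "ALERT: bad CRCs = " crc; alerts++ }
        # Temperature should be less than 40 degrees C (104 degrees F).
        if (temp >= 40) { print "ALERT: temperature " temp " C is not below 40 C"; alerts++ }
        # Voltage should be within +/- 10 percent of nominal.
        dev = volts - nominal
        if (dev < 0) dev = -dev
        if (dev > 0.10 * nominal) { print "ALERT: voltage " volts " V is outside 10 percent of " nominal " V"; alerts++ }
        # Fan speed should be above the minimum.
        if (fan <= fan_min) { print "ALERT: fan speed " fan " is not above minimum " fan_min; alerts++ }
        if (alerts == 0) print "OK"
        exit (alerts > 0)
    }'
}
```

The function prints one ALERT line per out-of-range value and returns a nonzero status if any alert was raised, so it can be used directly in an if statement.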
The gm_prodmode_mon diagnostic tool searches /etc/hosts for entries whose names match the regular
expression “MR0[NT][0-9][0-9]”.
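To preview which hosts file entries such a search would select, a sketch along the following lines may help. It assumes the pattern is meant to match MR0, then N or T, then two decimal digits (that is, MR0[NT][0-9][0-9]), and that the whole host name must match; both are assumptions about the tool's behavior. The function name find_myrinet_switches is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: list host names in a hosts file that match the
# Myrinet switch naming pattern (assumed to be MR0[NT][0-9][0-9]).
# Usage: find_myrinet_switches [HOSTS_FILE]   (defaults to /etc/hosts)
find_myrinet_switches() {
    local hosts_file=${1:-/etc/hosts}
    # Strip comments, put each address or name on its own line,
    # keep only names that match the pattern in full, and de-duplicate.
    sed 's/#.*//' "$hosts_file" |
        tr -s ' \t' '\n' |
        grep -E '^MR0[NT][0-9][0-9]$' |
        sort -u
}
```

Running find_myrinet_switches with no argument reads /etc/hosts; an empty result suggests that the switch names have not yet been added to the file.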
This command uses the links -dump command to obtain the current values and parses the output. The
gm_prodmode_mon diagnostic tool generates an alert if any errors are found. All alerts are logged in the
/var/log/messages file.
The format of this command is:
gm_prodmode_mon [-help] [-verbose] [-d directory-name]