HP XC System Software Administration Guide Version 3.1

19.4.2.4 The qsnet2_drain_test Diagnostic Tool
This tool runs up to six tests for the Quadrics switches in an HP XC system:
• Runs the qsctrl utility to verify that the system interconnects are running within the proper
environmental parameters for operation.
• Runs qsnet2_level_test at level 1.
• Runs qsnet2_level_test at level 2.
• Runs qsnet2_level_test at level 3.
• Runs qsportmap on federated systems to test the link cable connectivity.
• Runs qsnet2_level_test at level 4 on federated systems.
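The following Python sketch only models the test sequence listed above, to show why the count is "up to six": as this guide describes them, four tests always run, and the last two run only on federated systems. The function name drain_test_plan is hypothetical, and the function does not invoke qsctrl, qsnet2_level_test, or qsportmap.

```python
from typing import List


def drain_test_plan(federated: bool) -> List[str]:
    """Return, in order, the tests this guide says qsnet2_drain_test runs.

    Hypothetical helper: it models the documented sequence only and does
    not launch any of the tools.
    """
    plan = [
        "qsctrl",                     # environmental check of the interconnect
        "qsnet2_level_test level 1",
        "qsnet2_level_test level 2",
        "qsnet2_level_test level 3",
    ]
    if federated:
        # Cable connectivity and the level 4 test apply to federated systems only.
        plan.append("qsportmap")
        plan.append("qsnet2_level_test level 4")
    return plan


print(len(drain_test_plan(federated=False)))  # 4
print(len(drain_test_plan(federated=True)))   # 6
```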
Note:
You must launch this command from the head node. Because this diagnostic tool uses the adapter and the
link 100 percent of the time during the test, it has a severe effect on machine performance. Run this
command only during allocated preventive maintenance time frames.
The command format for the qsnet2_drain_test utility is as follows:
qsnet2_drain_test [-help] [-d logdirectory]
The -help option displays the command line options.
The -d option enables you to specify a log directory. The output from the qsnet2_drain_test utility
and from the individual tests is bundled into a tar file, compressed with the gzip utility, and placed in
the specified log directory. If the -d option is not specified, the
/var/log/diag/quadrics/qsnet2_drain_test directory is used by default.
Note:
You must manually decompress and extract the tar file, and then examine the extracted files for errors.
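Any tar-capable tool can unpack and search the archive. The Python sketch below uses the standard tarfile module to extract a .tar.gz file and list every line that contains a keyword. This guide does not give the archive file name, and treating lines that contain "error" as problems is an assumption about the log format. scan_drain_archive is a hypothetical helper.

```python
import tarfile
from pathlib import Path
from typing import List, Tuple


def scan_drain_archive(archive: str, dest: str,
                       keyword: str = "error") -> List[Tuple[str, int, str]]:
    """Extract a gzip-compressed tar file into dest and return
    (file path, line number, line text) for every line containing
    keyword, ignoring case. The keyword is an assumed error marker."""
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python releases support extraction filters; "data"
            # rejects members that would be written outside dest.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)

    needle = keyword.lower()
    hits = []
    for path in sorted(Path(dest).rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps going if a log contains non-text bytes.
        with path.open("r", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if needle in line.lower():
                    hits.append((str(path), lineno, line.rstrip("\n")))
    return hits
```

For example, scan_drain_archive("/var/log/diag/quadrics/qsnet2_drain_test/<archive>.tar.gz", "/tmp/qsnet2_drain") would unpack an archive from the default log directory into /tmp/qsnet2_drain and return the matching lines; replace <archive> with the actual file name.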
19.4.3 Using Diagnostic Tools for the InfiniBand Interconnect
The ib_prodmode_mon diagnostic tool monitors the InfiniBand switches for network errors. It generates
an alert and notifies you if it detects any of the following errors:
• Links running at 1X speed instead of the normal 4X speed
• Links reporting excessive receive errors (RcvErrs)
• Links reporting IB_TIMEOUT, which means that the node is down
• Links reporting a state other than PORT_ACTIVE, which means that the link is down
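The four conditions above can be expressed as independent checks on a per-link record. In this Python sketch, LinkRecord and link_problems are hypothetical and do not reflect ib_prodmode_mon's internal data; the receive-error threshold of 2400 is taken from the sample output in this section and may not be the value your installation uses.

```python
from dataclasses import dataclass
from typing import List, Optional

# Threshold shown in the sample ib_prodmode_mon output; an assumption here.
RCV_ERR_THRESHOLD = 2400


@dataclass
class LinkRecord:
    # Hypothetical fields for illustration only.
    name: str
    width: str                # link width, e.g. "4X" or "1X"
    rcv_errs: int             # receive-error counter
    status: Optional[str]     # e.g. "IB_TIMEOUT" when the node does not answer
    port_state: str           # e.g. "PORT_ACTIVE"


def link_problems(link: LinkRecord) -> List[str]:
    """Return a description of each documented error condition the link meets."""
    problems = []
    if link.width == "1X":
        problems.append("running at 1X")
    if link.rcv_errs > RCV_ERR_THRESHOLD:
        problems.append(
            f"{link.rcv_errs} RcvErrs, above threshold of {RCV_ERR_THRESHOLD}")
    if link.status == "IB_TIMEOUT":
        problems.append("IB_TIMEOUT (node down)")
    if link.port_state != "PORT_ACTIVE":
        problems.append(f"state {link.port_state} (link down)")
    return problems
```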
The output that ib_prodmode_mon produces identifies the bad links so that you can take corrective
action. It resembles the following:
date time node ib_prodmode_mon:
IR0N00 - Link xc9n1 GUID 0008f10403961325 (LID 3 PORT 1)
<==> GUID 0008f10400410876 (LID 1 PORT 1 ) running at 1X
ibt1 - Link ibt1 SLOT 1 PORT 14 GUID 0008f104003f0726 (LID 2 PORT 23)
<==> R1C5-IB14 PORT 7 GUID 0008f10400410a10 (LID 617 PORT 7 ) reporting
4297 RcvErrs, which is above threshold of 2400
ibt1 - Link 0008f1040396cd64 PORT 1 GUID 0008f1040396cd65 (LID 950 PORT 1)
<==> R3C10-IB58 PORT 18 GUID 0008f104004108cc (LID 589 PORT 18 )
reporting Status ALERT IB_TIMEOUT
The ib_prodmode_mon diagnostic tool searches /etc/hosts for entries whose name matches the regular
expression "IR0[NT][09][09]".
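A minimal Python sketch of the /etc/hosts search, assuming substring matching (as grep would do) rather than a match against the whole host name. The pattern is copied verbatim from this guide; as written, each [09] character class matches only the digit 0 or 9. find_switch_hosts is a hypothetical name.

```python
import re
from typing import List

# Pattern taken verbatim from this guide.
SWITCH_PATTERN = re.compile(r"IR0[NT][09][09]")


def find_switch_hosts(hosts_text: str) -> List[str]:
    """Return host names in /etc/hosts-style text that contain the pattern."""
    names = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        for name in fields[1:]:                # fields[0] is the IP address
            if SWITCH_PATTERN.search(name):
                names.append(name)
    return names
```

On a live system, the input would be the contents of /etc/hosts, for example Path("/etc/hosts").read_text().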
The tool uses the wget command to obtain the PortCounters.csv file from the switch and then parses that
file. If it finds any errors, it generates an alert. All alerts are logged in both the /var/log/messages
file and the ib_prodmode_mon.log file.
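The parsing step can be sketched in Python with the standard csv module. This sketch does not fetch the file (ib_prodmode_mon uses wget for that) and does not write to /var/log/messages; it only flags rows whose receive-error count exceeds a threshold. The column names GUID, Port, and RcvErrs, the function name rcv_err_alerts, and the default threshold of 2400 (from the sample output in this section) are assumptions, not the documented layout of PortCounters.csv.

```python
import csv
import io
from typing import List, Tuple

# Taken from the sample output; the tool's real threshold may differ.
RCV_ERR_THRESHOLD = 2400


def rcv_err_alerts(csv_text: str,
                   threshold: int = RCV_ERR_THRESHOLD) -> List[Tuple[str, str, int]]:
    """Return (guid, port, count) for each row whose receive-error count
    exceeds threshold. Column names are assumptions; check the header of
    the PortCounters.csv file your switch actually returns."""
    alerts = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        count = int(row["RcvErrs"])
        if count > threshold:
            alerts.append((row["GUID"], row["Port"], count))
    return alerts
```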
Example 4 227