HP XC System Software Administration Guide Version 4.0

Typically, this entry reports the number of alerts in a specified period of time and allows you to access
the most recent log.
A warning or critical message indicates that one or more rules defined in the
/opt/hptc/nagios/etc/syslogAlertRules file match the specified node's consolidated log file.
Take the appropriate action based on the message.
Service: System Event Log
Status Information: Node Syslog alerts information
A warning or critical message indicates that one or more rules defined in the
/opt/hptc/nagios/etc/selRules file match the specified node's firmware System Event Log.
Take the appropriate action based on the System Event Log message.
Service: System Free Space
Status Information: Node / and /var free space
This entry typically displays the status of the /, /var, and /hptc_cluster file systems on the node.
A warning or critical message indicates that the thresholds for the specific node were exceeded.
Clean up disk space on the affected file system to bring its usage back under the threshold.
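As a quick manual check of the same file systems, a short shell sketch such as the following can list those above a usage limit. The 90 percent limit and the over_threshold helper name are assumptions made for illustration; they are not taken from the Nagios configuration, whose actual thresholds may differ.

```shell
#!/bin/sh
# Sketch: report file systems whose usage exceeds a limit.
# THRESHOLD=90 is an illustrative assumption, not the value Nagios uses.

# over_threshold LIMIT
# Reads `df -P` output on stdin and prints "mountpoint use%" for each
# entry whose use percentage is strictly greater than LIMIT.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)              # strip the trailing percent sign
        if (use + 0 > limit + 0) print $6, $5
    }'
}

THRESHOLD=90
for fs in / /var /hptc_cluster; do
    # Skip paths that do not exist on this node.
    if [ -d "$fs" ]; then
        # df -P prints a header line plus one POSIX-format line per file system.
        df -P "$fs" | over_threshold "$THRESHOLD"
    fi
done
```

If the loop prints nothing, none of the three file systems is above the assumed limit. Because / and /var can reside on the same file system, the same mount point may be listed twice.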
21.4 System Interconnect Troubleshooting
This section describes the troubleshooting steps for the following supported system interconnects:
“Myrinet System Interconnect Troubleshooting” (page 254)
“Quadrics System Interconnect Troubleshooting” (page 255)
“OFED Troubleshooting Procedures” (page 256)
21.4.1 Myrinet System Interconnect Troubleshooting
The following troubleshooting information applies to the Myrinet system interconnect. Perform
these steps on any node on which you suspect a problem to determine if your HP XC system is
configured properly. If these tests pass but you are still experiencing difficulty, see Chapter 20:
Using Diagnostic Tools (page 233).
1. Run the gm_board_info test:
# /opt/gm/bin/gm_board_info
This command displays all the nodes in the HP XC system.
2. Make sure that you are running an HP XC kernel. The HP XC kernels are identified by the
presence of XC in the kernel name:
# uname -a
Linux n16 2.4.21-15.7hp.XCsmp #1 SMP date ... GNU/Linux
3. Make sure that your system has Myrinet boards installed:
# lspci -v | grep Myrinet
05:0d.0 Network controller: MYRICOM Inc. Myrinet 2000 . . .
Subsystem: MYRICOM Inc. Myrinet 2000 Scalable Cluster Interconnect
4. Run the gm_debug test:
# /opt/gm/bin/gm_debug
This command should complete without errors, and no counter whose name contains the
string bad should have a nonzero value.
5. Make sure all the Myrinet RPMs are installed:
# rpm -q -a
.
.
.