HP XC System Software Administration Guide Version 3.0

Using the dgemm Utility to Analyze Performance.....................................................................................151
Using the System Interconnect Diagnostic Tools.......................................................................................152
HP XC Diagnostic Tools for the Myrinet System Interconnect.................................................................152
The gm_prodmode_mon Diagnostic Tool.....................................................................................152
The gm_drain_test Diagnostic Tool..............................................................................................153
Using Diagnostic Tools for the Quadrics System Interconnect................................................................153
The swmlogger Daemon............................................................................................................153
The qselantestp Diagnostic Tool..................................................................................................154
The qsnet2_level_test Diagnostic Tool..........................................................................................154
The qsnet2_drain_test Diagnostic Tool.........................................................................................156
Using Diagnostic Tools for the Gigabit Ethernet System Interconnect......................................................157
17 Troubleshooting
System Interconnect Troubleshooting......................................................................................................159
Myrinet System Interconnect Troubleshooting......................................................................................159
Quadrics System Interconnect Troubleshooting...................................................................................160
InfiniBand System Interconnect Troubleshooting..................................................................................161
SLURM Troubleshooting.......................................................................................................................163
SLURM Configuration Issues............................................................................................................163
SLURM Run-Time Troubleshooting.....................................................................................................164
LSF-HPC Troubleshooting......................................................................................................................165
18 Servicing the HP XC System
Adding a Node..................................................................................................................................167
Replacing a Client Node.....................................................................................................................168
Replacing a System Interconnect Board in a CP6000 System...................................................................169
A Installing LSF-HPC for SLURM into an Existing Standard LSF Cluster..............171
Assumptions.......................................................................................................................................171
Requirement.......................................................................................................................................172
Sample Case......................................................................................................................................172
HP XC Preparation..............................................................................................................................172
Installing LSF-HPC...............................................................................................................................175
Perform Post Installation Tasks...............................................................................................................178
Configuring the LSF Alias.....................................................................................................................179
Starting LSF on the HP XC System..........................................................................................................179
Sample Running Jobs...........................................................................................................................180
Troubleshooting..................................................................................................................................181
B Installing Standard LSF on a Subset of Nodes.............................................183
Requirements......................................................................................................................................183
Assumptions.......................................................................................................................................183
Sample Case......................................................................................................................................184
Instructions.........................................................................................................................................184
Glossary..................................................................................................189
Index.......................................................................................................195