HP XC System Software Administration Guide Version 2.1
15.4.2.1
The swmlogger Daemon ...................................................... 15-6
15.4.2.2
The qselantestp Diagnostic Tool .......................................... 15-7
15.4.2.3
The qsnet2_level_test Diagnostic Tool ................................... 15-8
15.4.2.4
The qsnet2_drain_test Diagnostic Tool ................................... 15-10
15.4.3
Using Diagnostic Tools for the Gigabit Ethernet System Interconnect ..... 15-11
16 Troubleshooting
16.1
System Interconnect Troubleshooting ............................................... 16-1
16.1.1
Myrinet System Interconnect Troubleshooting ................................ 16-1
16.1.2
Quadrics System Interconnect Troubleshooting .............................. 16-2
16.1.3
InfiniBand Troubleshooting .................................................... 16-3
16.2
SLURM Troubleshooting ............................................................ 16-5
16.2.1
SLURM Configuration Issues .................................................. 16-5
16.2.2
SLURM Run-Time Troubleshooting ........................................... 16-6
16.3
LSF Troubleshooting ................................................................. 16-7
17 Servicing the HP XC System
17.1
Adding a Node ........................................................................ 17-1
17.2
Replacing a Node ..................................................................... 17-2
17.3
Replacing a System Interconnect Card in an XC6000 System .................... 17-4
Glossary
Index
Examples
12-1
A Basic Job Launch Without the JOB_STARTER Script Configured ........... 12-7
12-2
Launching Another Job Without the JOB_STARTER Script Configured ........ 12-8
12-3
Launching a Job Successfully Without the JOB_STARTER Script Using srun .. 12-8
12-4
Launching a Job Successfully Without the JOB_STARTER Script Using mpirun .. 12-8
12-5
A Basic Job Launch With the JOB_STARTER Script Configured ................ 12-8
14-1
Unedited fstab.proto File ............................................................. 14-2
14-2
The fstab.proto File Edited for Internal File System Mounting ................... 14-7
14-3
The fstab.proto File Edited for Remote File System Mounting ................... 14-10
Figures
1-1
LVS View of Cluster .................................................................. 1-8
1-2
HP XC File System Hierarchy ....................................................... 1-9
1-3
HP XC Hierarchy Under /opt/hptc .................................................. 1-11
6-1
Tiered Structure for Node Events .................................................... 6-1
6-2
Nagios Main Window ................................................................ 6-4
6-3
Nagios Login Screen ................................................................. 6-4
6-4
Nagios Menu .......................................................................... 6-6
14-1
Internal File System Mounting Example ........................................... 14-4
14-2
Remote File System Mounting Example ........................................... 14-8
Tables
1-1
HP XC System Commands .......................................................... 1-4
Contents vii