HP XC System Software Administration Guide Version 2.1

gsi 64304 6 [ipoib-ud ats ibat cm_2]
adaptor-tavor 148496 1
vverbs-base 54576 0 [sdp ipoib-ud ats devucm q_mng ibat cm_2 gsi adaptor-tavor]
mlog 9792 0 [sockets sdp ipoib-ud ats devucm q_mng ibat cm_2 gsi adaptor-tavor vverbs-base]
repository 82208 0 [sockets sdp ipoib-ud ats q_mng ibat cm_2 gsi adaptor-tavor vverbs-base mlog]
hadump 2240 0 [gsi]
mod_ib_mgt 76992 0 (unused)
mod_vapi 137824 0 [adaptor-tavor mod_ib_mgt]
mod_vipkl 235776 0 [mod_vapi]
mod_thh 248064 0 [mod_vapi]
mod_hh 26896 0 [mod_vipkl mod_thh]
mod_vapi_common 74824 0 [mod_ib_mgt mod_vapi mod_vipkl mod_thh]
mod_mpga 28832 0 [mod_vapi]
mosal 140768 0 [mod_ib_mgt mod_vapi mod_vipkl mod_thh mod_vapi_common mod_mpga]
7. The InfiniBand ipoib0 IP interface should be up. Use the ifconfig command to
display the interface network configuration.
# ifconfig ipoib0
ipoib0 Link encap:Ethernet HWaddr 00:02:C9:50:FF:80
inet addr:172.22.0.3 Bcast:172.22.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1
RX packets:7 errors:0 dropped:0 overruns:0 frame:0
TX packets:4 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:420 (420.0 b) TX bytes:240 (240.0 b)
To verify connectivity, try to ping other nodes that are connected to the network.
8. Additional information about InfiniBand is available under the /proc/voltaire directory.
Use the find command to display it.
# find /proc/voltaire -type f -print -exec cat {} \;
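The interface check in step 7 can also be scripted. The following is a minimal sketch, not part of the XC software: iface_state is a hypothetical helper name, and it assumes the ifconfig output format shown in step 7, where the flags appear as separate words on one line.

```shell
#!/bin/sh
# iface_state (hypothetical helper): reads ifconfig output on standard
# input and prints "up" if any single line carries both the UP and the
# RUNNING flags as separate words; otherwise prints "down".
iface_state() {
    awk '
        {
            up = 0; running = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "UP") up = 1
                if ($i == "RUNNING") running = 1
            }
            if (up && running) found = 1
        }
        END { print (found ? "up" : "down") }
    '
}

# Typical use on a node (not executed here):
#   ifconfig ipoib0 | iface_state
```

Reading from standard input keeps the helper independent of the interface name, so the same function can be applied to saved ifconfig output from several nodes.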
16.2 SLURM Troubleshooting
The following sections discuss SLURM troubleshooting in terms of configuration issues and
run-time troubleshooting.
16.2.1 SLURM Configuration Issues
SLURM consists of the following primary components:
slurmctld
A master/backup daemon.
slurmd
A slave daemon.
command binaries
The sinfo, srun, scancel, squeue, and scontrol commands.
slurm.conf
The SLURM configuration file, /hptc_cluster/slurm/etc/slurm.conf.
This file contains all the information necessary to understand how
SLURM is configured on XC, including the following:
Logging (syslog is the default logging mechanism)
Debug level (The debug levels range from 1 to 7; the default
debug level is 3)
Nodes (all nodes are listed by default)
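When reviewing these settings, it can help to extract only the relevant lines from the configuration file. The following is a minimal sketch: show_slurm_settings is a hypothetical helper name, and it assumes that the file uses the standard SLURM parameter names SlurmctldDebug, SlurmdDebug, SlurmctldLogFile, SlurmdLogFile, and NodeName. Parameters that are not set explicitly in the file do not appear in the output.

```shell
#!/bin/sh
# show_slurm_settings (hypothetical helper): reads slurm.conf text on
# standard input and prints only the uncommented lines that set the
# debug levels, the log files, or the node definitions.
show_slurm_settings() {
    grep -Ei '^[[:space:]]*(Slurmctld(Debug|LogFile)|Slurmd(Debug|LogFile)|NodeName)[[:space:]]*='
}

# Typical use on the head node (not executed here):
#   show_slurm_settings < /hptc_cluster/slurm/etc/slurm.conf
```

The scontrol show config command reports the configuration that the running slurmctld daemon is using, which is a useful cross-check against the contents of the file.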