HP XC System Software Installation Guide Version 3.2

# qsctrl
qsctrl: QR0N00:00:0:0 <--> Elan:0:0 state 3 should be 4
qsctrl: QR0N00:00:0:1 <--> Elan:0:1 state 3 should be 4
qsctrl: QR0N00:00:0:2 <--> Elan:0:2 state 3 should be 4
qsctrl: QR0N00:00:0:3 <--> Elan:0:3 state 3 should be 4
qsctrl: QR0N00:00:1:0 <--> Elan:0:4 state 3 should be 4
qsctrl: QR0N00:00:1:1 <--> Elan:0:5 state 3 should be 4
qsctrl: QR0N00:00:1:2 <--> Elan:0:6 state 3 should be 4
qsctrl: QR0N00:00:1:3 <--> Elan:0:7 state 3 should be 4
qsctrl: QR0N00:00:2:0 <--> Elan:0:8 state 3 should be 4
qsctrl: QR0N00:00:2:1 <--> Elan:0:9 state 3 should be 4
qsctrl: QR0N00:00:2:2 <--> Elan:0:10 state 3 should be 4
qsctrl: QR0N00:00:2:3 <--> Elan:0:11 state 3 should be 4
qsctrl: QR0N00:00:3:0 <--> Elan:0:12 state 3 should be 4
qsctrl: QR0N00:00:3:1 <--> Elan:0:13 state 3 should be 4
qsctrl: QR0N00:00:3:2 <--> Elan:0:14 state 3 should be 4
qsctrl: QR0N00:00:3:3 <--> Elan:0:15 state 3 should be 4
qsctrl: QR0N00:01:0:0 <--> Elan:0:16 state 3 should be 4
qsctrl: QR0N00:01:0:1 <--> Elan:0:17 state 3 should be 4
qsctrl: QR0N00:01:0:2 <--> Elan:0:18 state 3 should be 4
qsctrl: QR0N00:01:0:3 <--> Elan:0:19 state 3 should be 4
qsctrl: Warning: failed link state check on 1 modules
To work around this issue, configure out the unconnected links using the qsctrl -o command.
For example,
# qsctrl -o QR0N00:00:0:0
# qsctrl -o QR0N00:00:0:1
# qsctrl -o QR0N00:00:0:2
NOTE: Configure out only those links whose destination field contains the Elan designator,
that is, links that connect to an Elan port. For example,
qsctrl: QR0N00:02:3:3 <--> Elan:0:47 link state normal
Links that have the QR0Nxx designator in both the origin and destination fields must not be
configured out. Doing so causes the whole chip to go into reset. For example, do not configure
out a link that is reported as follows:
qsctrl: QR0N00:04:0:3 <--> QR0N00:03:7:4 link state reset
If you later attach nodes to any of the ports you configured out, you must configure those ports
back in before the links can be used. For example:
# qsctrl -i QR0N00:00:0:0
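The configure-out rule above can be applied to saved qsctrl output before you run any commands. The following is a minimal sketch, not an HP-supplied tool: the function name elan_links_to_configure_out and the file name qsctrl.out are hypothetical, and the parsing assumes the line format shown in this section (qsctrl:, the origin port, <-->, the destination, then the state text). It only prints qsctrl -o commands, and only for links whose destination begins with Elan: and whose state check failed; links with a QR0Nxx destination are never selected.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print, but do not run, a "qsctrl -o" command
# for every link that failed the link state check and whose destination is
# an Elan port. Assumes lines of the form:
#   qsctrl: QR0N00:00:0:0 <--> Elan:0:0 state 3 should be 4
# Switch-to-switch links (QR0Nxx destination) are never selected.
elan_links_to_configure_out() {
    awk '
        $1 == "qsctrl:" && $3 == "<-->" && $4 ~ /^Elan:/ && / should be / {
            print "qsctrl -o " $2
        }
    ' "$@"
}

# Usage (qsctrl.out is a hypothetical file name):
#   qsctrl > qsctrl.out 2>&1
#   elan_links_to_configure_out qsctrl.out
```

Review the printed commands before running them; the sketch does not check whether a node is actually cabled to each port.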
12.6 Troubleshooting SLURM
This section provides suggestions for troubleshooting nodes that the SLURM sinfo command
reports to be in the down state after an installation or upgrade. For example:
# sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
lsf up infinite 15 idle n[1-14,16]
lsf up infinite 1 down n15
Use the following command to determine why SLURM marked this node as being down:
# sinfo -R
REASON NODELIST
Low RealMemory [slurm@Mar 02 22:34] n15
The most common reason reported by the sinfo -R command is Not Responding, which means
that something is wrong with the communication between the primary slurmctld daemon
and the slurmd daemon on the affected node or nodes. In that situation, log in to the affected
node or nodes and troubleshoot the slurmd daemon.
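When many nodes are down, the reasons reported by sinfo -R can be sorted into Not Responding cases, which call for troubleshooting slurmd, and everything else. The following minimal sketch assumes the two-column layout shown above, where the reason text (which can contain spaces and a bracketed user and timestamp) is followed by the node list as the last field. The function name triage_down_nodes is hypothetical, and the match on "not responding" is case-insensitive as a precaution, because the exact capitalization may differ between SLURM versions.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print "nodelist: reason" for each line of
# "sinfo -R" output and flag the nodes reported as not responding.
# Assumes the node list is the last whitespace-separated field on each line.
triage_down_nodes() {
    awk '
        $1 == "REASON" { next }            # skip the header line
        NF >= 2 {
            nodes = $NF                    # last field is the node list
            reason = $0
            sub(/[ \t]+[^ \t]+[ \t]*$/, "", reason)   # drop the node list
            if (tolower(reason) ~ /not responding/)
                print nodes ": " reason " (troubleshoot the slurmd daemon)"
            else
                print nodes ": " reason
        }
    ' "$@"
}

# Usage: sinfo -R | triage_down_nodes
```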