HP XC System Software Administration Guide Version 4.0

# scontrol update NodeName=nodelist State=drain Reason="describe reason here"
See “The nodelist Parameter” (page 33) for a discussion on the use of the nodelist parameter.
The reason that you provide for the node draining is displayed by the sinfo command. Be brief
but descriptive.
Here, node n17 is drained so that it can be removed from service for maintenance:
# scontrol update nodename=n17 state=drain reason="maintenance"
After the node has drained, use the scontrol command to remove it from service. The
following command removes the drained node from the example, n17:
# scontrol update nodename=n17 state=down
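The drain-then-remove sequence above can be scripted. The following is a minimal sketch, not an HP-supplied tool: the function names drain_and_down and node_state, the SCONTROL, SINFO, and POLL_SECONDS variables, and the assumption that the compact sinfo state reads drain (or drain*) once no jobs remain on the node are all illustrative and should be checked against the SLURM version on your system.

```shell
# Commands are held in variables so they can be replaced for a dry run.
# (Hypothetical variable names; not part of SLURM or HP XC.)
SCONTROL=${SCONTROL:-scontrol}
SINFO=${SINFO:-sinfo}
POLL_SECONDS=${POLL_SECONDS:-30}

# Print the compact sinfo state of one node. Only the first line is kept,
# in case the node is listed in more than one partition.
node_state() {
    "$SINFO" -h -n "$1" -o "%t" | head -n 1
}

# Drain a node, wait until it reports a drained state, then mark it DOWN.
# Usage: drain_and_down NODE REASON
drain_and_down() {
    node=$1
    reason=$2
    "$SCONTROL" update NodeName="$node" State=drain Reason="$reason" || return 1
    while :; do
        case "$(node_state "$node")" in
            drain|drain\*)
                break ;;                      # assumed "fully drained" states
            "")
                echo "cannot read state of $node" >&2
                return 1 ;;
        esac
        sleep "$POLL_SECONDS"                 # jobs still running; poll again
    done
    "$SCONTROL" update NodeName="$node" State=down
}
```

For the example node, the call would be drain_and_down n17 "maintenance". Polling sinfo rather than setting the DOWN state immediately lets running jobs finish before the node is removed.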
The scontrol command can also return nodes to the IDLE state so that they can be reused. The
following command returns the nodes in nodelist to service; to return node n17, specify n17 as
the nodelist:
# scontrol update NodeName=nodelist State=resume
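As a hedged illustration of returning nodes to service, the sketch below resumes a nodelist and then runs sinfo -R, which lists the nodes that are still out of service along with their reasons; a resumed node should no longer appear there. The function name resume_nodes and the SCONTROL and SINFO variables are hypothetical names introduced for this example.

```shell
# Commands are held in variables so they can be replaced for a dry run.
# (Hypothetical variable names; not part of SLURM or HP XC.)
SCONTROL=${SCONTROL:-scontrol}
SINFO=${SINFO:-sinfo}

# Return the given nodes to service, then list any nodes that remain
# out of service together with the reason recorded for each.
# Usage: resume_nodes NODELIST
resume_nodes() {
    "$SCONTROL" update NodeName="$1" State=resume || return 1
    "$SINFO" -R
}
```

For the example node, the call would be resume_nodes n17.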
When removing a node from service, HP recommends that you set the state to DRAIN rather than
DOWN, even if no jobs are currently running. This has two advantages:
• It is easier to recognize nodes that are down unexpectedly when skimming the output of
  the sinfo command.
• If the node is rebooted accidentally or as part of the maintenance procedure, the DRAIN
  state persists. The DOWN state may or may not persist, depending on the setting of the
  NodeName/State parameter in the slurm.conf file.
Table 15-4 shows the corresponding meaning of the output of the sinfo command for various
transitions:
Table 15-4 Output of the sinfo command for Various Transitions
Transition Cause:             sinfo shows:  Meaning:
Transient Network Congestion  alloc         The node is running a job.
                              alloc*        The slurmctld daemon has lost contact with the node.
                              alloc         Contact between the node and the slurmctld daemon
                                            has been restored.
Node fails while no job is    idle          The node is ready to accept a job.
running on the node           idle*         The slurmctld daemon lost contact with the node.
                              down*         The slurmctld daemon has removed the node from
                                            service (see sinfo -R).
                              idle          The node has been returned to service.
Node fails while a job is     alloc         The node is running a job.
running on the node           alloc*        The slurmctld daemon lost contact with the node.
                              down*         The slurmctld daemon has removed the node from
                                            service (see sinfo -R).
                              idle          The node has been returned to service.
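The state-to-meaning mapping in Table 15-4 can be captured in a small lookup, for example to annotate sinfo output in a monitoring script. This is only a sketch: the function name describe_state is hypothetical, and it covers only the states that appear in the table.

```shell
# Map a compact sinfo node state to the meaning given in Table 15-4.
# A trailing asterisk indicates that slurmctld cannot contact the node.
# Usage: describe_state STATE
describe_state() {
    case "$1" in
        alloc)
            echo "The node is running a job" ;;
        idle)
            echo "The node is ready to accept a job" ;;
        alloc\*|idle\*)
            echo "The slurmctld daemon has lost contact with the node" ;;
        down\*)
            echo "The slurmctld daemon has removed the node from service (see sinfo -R)" ;;
        *)
            echo "State not covered by Table 15-4" ;;
    esac
}
```

For example, describe_state 'idle*' reports that the slurmctld daemon has lost contact with the node. Quote the argument so that the shell does not expand the asterisk.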
15.6 Draining Nodes 183