HP XC System Software Administration Guide Version 3.0

Table 12-4. Output of the sinfo command for Various Transitions

Transition Cause: Transient network congestion.
  sinfo shows:  Meaning:
  alloc         The node is running a job.
  alloc*        The slurmctld daemon has lost contact with the node.
  alloc         Contact between the node and the slurmctld daemon has been restored.

Transition Cause: Node fails while no job is running on the node.
  sinfo shows:  Meaning:
  idle          The node is ready to accept a job.
  idle*         The slurmctld daemon lost contact with the node.
  down*         The slurmctld daemon has removed the node from service (see sinfo -R).
  idle          The node has been returned to service.

Transition Cause: Node fails while a job is running on the node.
  sinfo shows:  Meaning:
  alloc         The node is running a job.
  alloc*        The slurmctld daemon lost contact with the node.
  down*         The slurmctld daemon has removed the node from service (see sinfo -R).
  idle          The node has been returned to service.

Transition Cause: The System Administrator sets the node state to down.
  sinfo shows:  Meaning:
  idle          The node is ready to accept a job.
  down          The slurmctld daemon has removed the node from service.
  down*         The slurmctld daemon lost contact with the node (see sinfo -R).
  idle          The node has been returned to service.

Transition Cause: The System Administrator sets the node state to drain while a job is running on the node.
  sinfo shows:  Meaning:
  alloc         The node is running a job.
  drng          SLURM is waiting for the job or jobs to finish.
  drain         SLURM removed the node from service.
  drain*        The slurmctld daemon lost contact with the node (see sinfo -R).
  idle          The node has been returned to service.

Transition Cause: The System Administrator sets the node state to drain while no job is running on the node.
  sinfo shows:  Meaning:
  idle          The node is ready to accept a job.
  drain         SLURM removed the node from service.
  drain*        The slurmctld daemon lost contact with the node (see sinfo -R).
  idle          The node has been returned to service.

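As a rough illustration of how the state strings in Table 12-4 can be read, the following Python sketch splits a sinfo state into its base state and the trailing asterisk, which in the table marks a node that the slurmctld daemon cannot contact. The names BASE_STATES and describe_state are hypothetical and are not part of SLURM or HP XC; the meanings are paraphrased from the table, and only the five states shown there are covered.

```python
# Illustrative sketch only: BASE_STATES and describe_state are hypothetical
# names, and the meanings are paraphrased from Table 12-4.
BASE_STATES = {
    "alloc": "The node is running a job.",
    "idle": "The node is ready to accept a job.",
    "down": "The node has been removed from service.",
    "drng": "SLURM is waiting for the job or jobs to finish.",
    "drain": "SLURM removed the node from service.",
}


def describe_state(state):
    """Return (base_state, responding, meaning) for a sinfo state string.

    A trailing '*' is treated as "slurmctld has lost contact with the node".
    """
    state = state.strip()
    responding = not state.endswith("*")
    base = state.rstrip("*")
    meaning = BASE_STATES.get(base, "State not listed in Table 12-4.")
    return base, responding, meaning


if __name__ == "__main__":
    for s in ("alloc", "alloc*", "down*", "drng"):
        base, responding, meaning = describe_state(s)
        contact = "responding" if responding else "no contact with slurmctld"
        print(f"{s:7} {contact}: {meaning}")
```

For any state that ends in an asterisk or shows down or drain, sinfo -R remains the place to look for the recorded reason.
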
Configuring the SLURM Epilog Script
SLURM can automatically kill rogue processes at the end of a job by using an epilog script.
When configured, the SLURM epilog script is launched after the user's job on the node completes. This script
checks whether the user has another job assigned to this node and, if not, sends a SIGKILL signal to all the
processes that belong to that user on all the nodes in the user's allocation.
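The decision described above can be sketched in Python as follows. This is not the HP XC epilog script itself, only a minimal model of its logic under stated assumptions: the function names, the (user, node) job list, and the host names n1 through n4 are all hypothetical, and the exclusion list is modeled on the EPILOG_EXCLUDE_NODES variable described in the note that follows. The sketch only decides whether to kill; it sends no signals.

```python
# Minimal model of the epilog decision; all names here are hypothetical
# and nothing in this sketch is taken from the actual epilog file.

def parse_exclude_nodes(value):
    """Split a space-separated host list (as in EPILOG_EXCLUDE_NODES)."""
    return value.split()


def should_kill_user_processes(user, node, remaining_jobs, exclude_nodes=()):
    """Return True if the user's processes on this node should get SIGKILL.

    remaining_jobs: iterable of (user, node) pairs for jobs that are still
    assigned after the finished job ends (an assumed input format).
    """
    if node in exclude_nodes:
        # Login nodes listed by the administrator are left alone.
        return False
    # Kill only when the user has no other job assigned to this node.
    return not any(u == user and n == node for u, n in remaining_jobs)


if __name__ == "__main__":
    excluded = parse_exclude_nodes("n1 n2")  # hypothetical login nodes
    jobs = [("alice", "n4")]                 # alice still has a job on n4
    for node in ("n1", "n3", "n4"):
        print(node, should_kill_user_processes("alice", node, jobs, excluded))
```

Keeping the decision separate from the act of signalling makes the logic easy to check without touching any running processes.
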
NOTE: If the user logged in from a node that is also a compute node, the epilog script also terminates the
user's login. You can avoid this problem by editing the EPILOG_EXCLUDE_NODES variable in the epilog
file. This variable is empty by default. Specify the host names of the login nodes, separated by spaces, so that
the epilog script does not kill the user's processes on those nodes; for example: