Table 15-4 Output of the sinfo Command for Various Transitions (continued)

Transition Cause | sinfo shows | Meaning
---------------- | ----------- | -------
| idle | The node is ready to accept a job.
The System Administrator sets the node state to down. | down | The slurmctld daemon has removed the node from service.
The slurmctld daemon lost contact with the node (see sinfo -R). | down* |
The node has been returned to service. | idle |
| alloc | The node is running a job.
The System Administrator sets the node state to drain while a job is running on the node. | drng | SLURM is waiting for the job or jobs to finish.
| drain | SLURM removed the node from service.
The slurmctld daemon lost contact with the node (see sinfo -R). | drain* |
The node has been returned to service. | idle |
| idle | The node is ready to accept a job.
The System Administrator sets the node state to drain while no job is running on the node. | drain | SLURM removed the node from service.
The slurmctld daemon lost contact with the node (see sinfo -R). | drain* |
The node has been returned to service. | idle |
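For example, the following sequence drains a node, displays the reason that SLURM records for it, and then returns the node to service, producing the drain and idle transitions shown in the table. The node name n15 is only an illustration; scontrol update and sinfo -R are the standard SLURM commands for these operations, although their behavior can vary slightly between SLURM versions:
# scontrol update NodeName=n15 State=drain Reason="scheduled maintenance"
# sinfo -R
# scontrol update NodeName=n15 State=resume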
15.7 Configuring the SLURM Epilog Script
SLURM can automatically kill rogue processes at the end of a job by using an epilog script.
When configured, the SLURM epilog script is launched after the user's job on the node completes.
This script verifies that the user has another job assigned to this node, and, if not, sends a SIGKILL
signal to all the processes that belong to that user on all the nodes in the user's allocation.
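The following is a minimal sketch of that logic, not the shipped script; see /opt/hptc/slurm/etc/slurm.epilog.clean for the actual implementation. It assumes the SLURM_UID and SLURM_JOB_ID environment variables that SLURM sets when it runs an epilog, and squeue options (such as --nodelist) whose exact names can vary between SLURM versions:
#!/bin/sh
# Illustrative sketch only; the shipped script is slurm.epilog.clean.
# Exit quietly if SLURM did not provide the expected environment.
[ -z "$SLURM_UID" ] && exit 0
[ -z "$SLURM_JOB_ID" ] && exit 0
# Never touch system accounts.
[ "$SLURM_UID" -lt 100 ] && exit 0
# If the user still has another job assigned to this node, do nothing.
for job in $(squeue --noheader --format=%i --user="$SLURM_UID" --nodelist="$(hostname -s)")
do
    [ "$job" != "$SLURM_JOB_ID" ] && exit 0
done
# Otherwise, send SIGKILL to every remaining process owned by the user.
pkill -KILL -U "$SLURM_UID"
exit 0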
NOTE: If the user logged in from a node that is also a compute node, the epilog script also ends
the user's login. You can avoid this problem by editing the EPILOG_EXCLUDE_NODES variable
in the epilog file. It is empty by default. Specify the host names of the login nodes, separated by
spaces, so that the epilog script does not kill the user's processes on those nodes; for example:
EPILOG_EXCLUDE_NODES="n101 n102 n103 n104 n105"
The SLURM epilog file is initially located at /opt/hptc/slurm/etc/slurm.epilog.clean.
You can maintain the file in this directory, move it to another directory, or move it to a shared
directory. If you decide to maintain this file in a local directory on each node, be sure to propagate
the SLURM epilog file to all the nodes in the HP XC system. The following example moves the
SLURM epilog file to a shared directory:
# mv /opt/hptc/slurm/etc/slurm.epilog.clean \
/hptc_cluster/slurm/slurm.epilog.clean
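If you instead keep the epilog in a local directory on each node, you must copy the file to every node yourself whenever it changes. For example, one way to do so, assuming the pdcp utility (part of the pdsh package) is installed and using the hypothetical node names n1 through n128:
# pdcp -w n[1-128] /opt/hptc/slurm/etc/slurm.epilog.clean /opt/hptc/slurm/etc/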
Enable this script by configuring it in the SLURM configuration file,
/hptc_cluster/slurm/etc/slurm.conf. Edit the Epilog declaration line in this file as follows:
Epilog=/hptc_cluster/slurm/slurm.epilog.clean
Be sure to restart SLURM after changing the configuration file.
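The change does not take effect until the SLURM daemons reread the configuration file. For example, assuming the standard slurm init script is installed (your restart procedure may differ), you can restart SLURM on a node as follows; repeat on every node that runs a SLURM daemon, or use your site's cluster-wide service tools:
# service slurm restart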