HP XC System Software Administration Guide Version 3.0
EPILOG_EXCLUDE_NODES="n101 n102 n103 n104 n105"
Initially, the SLURM epilog is located at /opt/hptc/slurm/etc/slurm.epilog.clean. You can
maintain the file in this directory, move it to another directory, or move it to a shared directory. If you
maintain this file in a local directory on each node, be sure to propagate the SLURM epilog file to all the
nodes in the HP XC system. The following example moves the SLURM epilog file to a shared directory:
# mv /opt/hptc/slurm/etc/slurm.epilog.clean \
/hptc_cluster/slurm/slurm.epilog.clean
Enable this script by configuring it in the SLURM configuration file,
/hptc_cluster/slurm/etc/slurm.conf. Edit the Epilog declaration line in this file as follows:
Epilog=/hptc_cluster/slurm/slurm.epilog.clean
Be sure to restart SLURM.
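The Epilog edit can also be scripted, which helps if you maintain the configuration file with other tools. The following is a minimal sketch, not an HP XC or SLURM utility. It assumes GNU sed (for in-place editing with -i) and a slurm.conf that contains at most one uncommented Epilog= line. The function name set_epilog is hypothetical. The function saves a backup copy, replaces an existing Epilog line, or appends one if none exists.

```shell
#!/bin/bash
# Hypothetical helper (not part of HP XC or SLURM): point the Epilog
# declaration in a slurm.conf file at a new epilog path.
# Assumes GNU sed (-i) and at most one uncommented Epilog= line.
set_epilog() {
    local conf="$1" epilog="$2"
    # Keep a backup of the original file; stop if it cannot be made
    cp -p "$conf" "$conf.bak" || return 1
    if grep -q '^[[:space:]]*Epilog=' "$conf"; then
        # Replace the existing declaration in place
        sed -i "s|^[[:space:]]*Epilog=.*|Epilog=${epilog}|" "$conf"
    else
        # No declaration yet: append one
        echo "Epilog=${epilog}" >> "$conf"
    fi
}
```

For example, run `set_epilog /hptc_cluster/slurm/etc/slurm.conf /hptc_cluster/slurm/slurm.epilog.clean`, then restart SLURM.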
SLURM Daemon Log Maintenance
By default, SLURM daemon logs are stored in /var/slurm/log/ on each node that runs SLURM daemons.
The slurmctld controller daemon writes to the slurmctld.log file, and the slurmd daemon writes to
the slurmd.log file. These log files and their locations are configured in the slurm.conf file. You
can view this information with the scontrol command, as follows:
# scontrol show config | grep LogFile
SlurmctldLogFile = /var/slurm/log/slurmctld.log
SlurmdLogFile = /var/slurm/log/slurmd.log
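Maintenance scripts can read the log locations from this output instead of hard-coding them. The following is a minimal sketch that assumes the `Name = value` layout shown above. The function name slurm_log_files is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the daemon log file paths found in
# "scontrol show config" output read from standard input.
# Assumes lines of the form "SlurmctldLogFile = /path/to/file".
slurm_log_files() {
    # Split each line on "=" plus surrounding spaces; keep the value
    awk -F' *= *' '/^Slurm(ctld|d)LogFile/ { print $2 }'
}
```

On a node where scontrol is available, `scontrol show config | slurm_log_files` prints one path per line.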
Over time, these logs become large, particularly if you increase the SLURM daemon debug levels, which you can display as follows:
# scontrol show config | grep -i debug
SlurmctldDebug = 3
SlurmdDebug = 3
The daemon debug value ranges from 1 to 7, with 7 being the most verbose. The default value is 3.
To cache these log files without disrupting SLURM operation, rename them. If you intend to archive them,
choose names that clearly identify their contents:
# mv /var/slurm/log/slurmctld.log{,.old}
# mv /var/slurm/log/slurmd.log{,.old}
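To rename the files on a clusterwide basis, use the pdsh command. First run scontrol ping to identify the nodes that run the primary and backup slurmctld daemons (n16 and n15 in this example). Then rename the slurmctld.log file on those nodes and the slurmd.log file on all nodes: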
# scontrol ping
Slurmctld(primary/backup) at n16/n15 are UP/UP
# pdsh -w n[15-16] 'mv /var/slurm/log/slurmctld.log{,.old}'
# pdsh -a 'mv /var/slurm/log/slurmd.log{,.old}'
Note that the SLURM daemons continue to write to the renamed files, because they still hold them open. To have
the daemons write to new log files, run the following command:
# scontrol reconfig
The SLURM daemons then write to new log files that have the original names. You can either archive the old
files or delete them.
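The rename step above can be packaged as a small script that works on any node. The following is a minimal sketch, not an HP-supplied tool. It assumes the default log directory shown above and appends a date stamp instead of .old, so successive runs do not overwrite earlier copies. It also skips files that do not exist on a given node. The function name rotate_slurm_logs and the variable LOG_DIR are hypothetical. The script does not run scontrol reconfig itself, because that command needs to run only once, after the files are renamed on every node.

```shell
#!/bin/bash
# Hypothetical helper: rename the SLURM daemon logs on this node,
# appending a date stamp so that successive runs keep earlier copies.
# The daemons keep writing to the renamed files until
# "scontrol reconfig" is run.
LOG_DIR="${LOG_DIR:-/var/slurm/log}"    # default location shown above

rotate_slurm_logs() {
    local stamp f
    stamp=$(date +%Y%m%d%H%M%S)
    for f in slurmctld.log slurmd.log; do
        # slurmctld.log normally exists only on the controller nodes;
        # skip any file that is not present on this node
        if [ -f "$LOG_DIR/$f" ]; then
            mv "$LOG_DIR/$f" "$LOG_DIR/$f.$stamp"
        fi
    done
}
```

For example, if the file is saved in a shared directory as /hptc_cluster/slurm/rotate_slurm_logs.sh (a hypothetical name), run `pdsh -a '. /hptc_cluster/slurm/rotate_slurm_logs.sh && rotate_slurm_logs'` and then `scontrol reconfig`.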
You can automate the procedure for caching SLURM log files with a cron job on the head node, scheduled at
an interval appropriate for your site.
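A root crontab entry on the head node might look like the following. This is a minimal sketch that reuses the commands shown earlier. It assumes the controller nodes n15 and n16 from that example and that pdsh and scontrol are on cron's PATH (use full paths otherwise). The weekly schedule is for illustration only. Each run overwrites the previous .old files, so archive them first if you need to keep them.

```
# Hypothetical root crontab entry on the head node (edit with "crontab -e"):
# at 02:00 every Sunday, rename the daemon logs on all nodes, then reconfigure.
0 2 * * 0 pdsh -w 'n[15-16]' 'mv /var/slurm/log/slurmctld.log{,.old}'; pdsh -a 'mv /var/slurm/log/slurmd.log{,.old}'; scontrol reconfig
```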