that the slurmctld daemon failed, the backup daemon assumes the responsibilities of the
primary slurmctld daemon. On returning to service, the primary slurmctld daemon regains
control of the SLURM subsystem from the backup slurmctld daemon.
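A quick way to check which controller is currently serving requests is the scontrol ping command. The following is a minimal sketch; the exact output format varies with the SLURM release:

    # Report whether the primary and backup slurmctld daemons are responding.
    scontrol ping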
SLURM offers a set of utilities that provide information about SLURM configuration, state, and
jobs, most notably scontrol, squeue, and sinfo. See scontrol(1), squeue(1), and sinfo(1) for
more information about these utilities.
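The following sketch shows typical invocations of these utilities; the node name n15 is illustrative only and should be replaced with a node name from your system:

    # Summarize the state of partitions and their nodes.
    sinfo
    # List jobs that are queued or running.
    squeue
    # Show the detailed state of a single node (n15 is a placeholder).
    scontrol show node n15
    # Show the SLURM configuration values currently in effect.
    scontrol show config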
SLURM enables you to collect and analyze job accounting information. “Configuring Job
Accounting” (page 180) describes how to configure job accounting on the HP XC system.
“SLURM Troubleshooting” (page 262) provides SLURM troubleshooting information.
15.2 Configuring SLURM
The HP XC system provides global and local directories for SLURM files:
•   The /hptc_cluster/slurm directory is the shared location for SLURM files that must be
    accessible from all nodes. The slurmctld state files, the job logging files, and the
    slurm.conf configuration file reside there.
•   The /var/slurm directory holds SLURM files that remain local to a given node; files in
    this directory are not shared between nodes. All SLURM daemon logs and slurmd state
    information are stored there.
As a resource manager on an HP XC system, SLURM allocates exclusive or nonexclusive access
to resources on compute nodes for users to perform work; it provides a framework to start,
execute, and monitor work (normally parallel jobs) on the set of allocated nodes.
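For example, a simple parallel command can be launched on an allocation of compute nodes with srun; the node count shown is illustrative:

    # Run the hostname command on two allocated compute nodes.
    srun -N 2 hostname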
All SLURM configuration options are set and stored in the
/hptc_cluster/slurm/etc/slurm.conf file. For information about the available options,
see slurm.conf(5). The slurm.conf file also contains useful commentary on the purpose of
each setting.
The following SLURM configuration settings are preset statically on HP XC systems:
StateSaveLocation=/hptc_cluster/slurm/state
SlurmdSpoolDir=/var/slurm/state
SlurmUser=slurm
SlurmctldLogFile=/var/slurm/log/slurmctld.log
SlurmdLogFile=/var/slurm/log/slurmd.log
SlurmctldPidFile=/var/slurm/run/slurmctld.pid
SlurmdPidFile=/var/slurm/run/slurmd.pid
AuthType=auth/munge
JobCompType=jobcomp/filetxt
JobCompLoc=/hptc_cluster/slurm/job/slurm.job.log
ProctrackType=proctrack/pgid
PropagatePrioProcess=1
PropagateResourceLimitsExcept=NPROC
JobCredentialPrivateKey=/opt/hptc/slurm/etc/keys/.slurm.key
JobCredentialPublicCertificate=/opt/hptc/slurm/etc/keys/slurm.cert
MinJobAge=1200
MaxJobCount=40000
ReturnToService=1
The slurm.conf file must be available on each node of the HP XC system.
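Because slurm.conf resides in the shared /hptc_cluster file system, you can confirm its availability by verifying that every node can read the same copy of the file. The following is a minimal sketch, assuming the pdsh utility is installed and that n[1-16] matches the node names on your system:

    # Every node should report the same checksum for the shared slurm.conf file.
    pdsh -w n[1-16] md5sum /hptc_cluster/slurm/etc/slurm.conf | dshbak -c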
Table 15-1 displays the SLURM configuration settings that are set (and, if necessary, adjusted)
during the execution of the cluster_config utility.
Table 15-1 SLURM Configuration Settings
Setting               Default Value*
ControlMachine        Lowest-numbered resource management node
BackupController      Second-lowest resource management node (if available)