HP XC System Software Administration Guide Version 2.1
11
SLURM Administration
The HP XC system uses the Simple Linux Utility for Resource Management (SLURM). This
chapter discusses issues specific to the HP XC system.
• An overview of SLURM on the HP XC system (Section 11.1)
• A discussion on SLURM configuration (Section 11.2)
• An overview of limiting user access (Section 11.3)
• Information on job accounting (Section 11.4)
• A discussion on how to monitor and manage SLURM (Section 11.5)
For your convenience, the HP XC Documentation CD contains the SLURM Reference Manual,
which is also available from the following web site:
http://www.llnl.gov/LCdocs/slurm/
11.1 An Overview of SLURM
SLURM provides a simple, lightweight, scalable infrastructure for managing the computing
resources of the HP XC system. SLURM contains a job launcher, srun, that offers much
flexibility in requesting resources and dispatching serial or parallel applications. SLURM also
features a Pluggable Authentication Module which, when enabled, can provide more control
over access to the computing resources.
SLURM uses two daemons on the HP XC system:
slurmd
This daemon runs on each compute node in the HP XC system and
is responsible for the following:
• Starting each job on its node
• Monitoring the job’s resource use
• Enforcing limits (for example, memory size)
• Freeing up resources when the job completes
slurmctld
This SLURM controller daemon is responsible for the following:
• Monitoring the availability of the compute nodes
• Managing node characteristics and node partitions
• Managing jobs, that is, queuing, scheduling, and maintaining
the state of jobs
SLURM also allows you to configure a backup slurmctld daemon. If present, this backup
daemon monitors the state of the primary slurmctld daemon. If the backup daemon detects
that the primary slurmctld daemon has failed, the backup daemon assumes the responsibilities
of the primary slurmctld daemon. On returning to service, the primary slurmctld daemon
regains control of the SLURM subsystem from the backup slurmctld daemon.
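The takeover and hand-back behavior described above can be illustrated with a small
model. This is a minimal sketch only, not SLURM code: the Controller class and the
active_controller function are hypothetical names invented for this illustration, and
the real daemons detect failure through their own monitoring rather than a flag.

```python
from dataclasses import dataclass


@dataclass
class Controller:
    """Toy stand-in for a slurmctld instance (hypothetical, not the real daemon)."""
    name: str
    alive: bool = True


def active_controller(primary: Controller, backup: Controller) -> str:
    """Return the name of the controller that should be in charge.

    The primary is in control whenever it is running; the backup takes
    over only while the primary is down.
    """
    if primary.alive:
        return primary.name
    if backup.alive:
        return backup.name
    raise RuntimeError("no slurmctld instance is available")


primary = Controller("primary")
backup = Controller("backup")

timeline = []
timeline.append(active_controller(primary, backup))  # normal operation
primary.alive = False                                 # primary fails
timeline.append(active_controller(primary, backup))  # backup assumes control
primary.alive = True                                  # primary returns to service
timeline.append(active_controller(primary, backup))  # primary regains control

print(timeline)  # ['primary', 'backup', 'primary']
```

The model deliberately gives the primary priority whenever it is alive, which mirrors
the statement that the primary regains control on returning to service.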
SLURM offers a set of utilities that provide information about SLURM configuration, state, and
jobs, most notably scontrol, squeue, and sinfo. See scontrol(1), squeue(1), and sinfo(1)
for more information about these utilities.
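As an illustration of how these three utilities might be combined into a quick status
report, the following sketch runs each one and collects its output. It assumes that the
utilities are on the PATH of the node where the script runs and that the installed
scontrol accepts the "show config" subcommand; check the manpages for your SLURM version.
The COMMANDS table and the query and report functions are hypothetical helpers written
for this example, not part of SLURM or the HP XC software.

```python
import shutil
import subprocess

# Utilities named in this section. sinfo and squeue are run with no options;
# "scontrol show config" is assumed to be supported by the installed version.
COMMANDS = {
    "configuration": ["scontrol", "show", "config"],
    "node and partition state": ["sinfo"],
    "job queue": ["squeue"],
}


def query(argv):
    """Run one utility and return its standard output, or None if it is not installed."""
    if shutil.which(argv[0]) is None:
        return None
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout


def report():
    """Collect the output of each utility, keyed by what it describes."""
    return {label: query(argv) for label, argv in COMMANDS.items()}


if __name__ == "__main__":
    for label, output in report().items():
        print(f"== {label} ==")
        print(output if output is not None else "(utility not found on this host)")
```

On a host without SLURM installed, each entry is reported as not found rather than
raising an error, which makes the script safe to try on an administrative workstation.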