16.2.1 Integration of LSF-HPC with SLURM
The LSF component of the LSF-HPC with SLURM product acts primarily as the workload
scheduler and node allocator running on top of SLURM. The SLURM component provides a job
execution and monitoring layer for LSF-HPC with SLURM. LSF-HPC with SLURM uses SLURM
interfaces to perform the following:
• Query system topology information for scheduling purposes.
• Create allocations for user jobs.
• Dispatch and launch user jobs.
• Monitor user job status.
• Signal user jobs and cancel allocations.
• Gather user job accounting information.
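For reference, an administrator can query much of the same information directly with standard
SLURM commands. The following commands are a minimal sketch, not an HP XC-specific
procedure; the output depends on your SLURM configuration:
# sinfo              (report partitions, nodes, and node states)
# squeue             (list jobs and allocations known to SLURM)
# sacct -j jobid     (show accounting information for the SLURM job jobid)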
The major difference between LSF-HPC with SLURM and Standard LSF-HPC is that the LSF-HPC
with SLURM daemons run on only one node in the HP XC system; that node is known as the
LSF execution host. The LSF-HPC with SLURM daemons rely on SLURM to provide information
on the other computing resources (nodes) in the system. The LSF-HPC with SLURM daemons
consolidate this information into one entity, such that these daemons present the HP XC system
as one virtual LSF host.
Note:
LSF-HPC with SLURM operates only with the nodes in the SLURM lsf partition. As mentioned
in the previous paragraph, LSF-HPC with SLURM groups these nodes into one virtual LSF host,
presenting the HP XC system as a single, large SMP host. If there is no lsf partition in SLURM,
then LSF-HPC with SLURM sets the processor count to 1 and closes this single virtual HP XC
host.
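To verify that the lsf partition is defined and to see which nodes it contains, you can query
SLURM directly; this is a generic SLURM command, shown here for illustration:
# sinfo -p lsf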
Example 16-1 shows how to use the controllsf command to determine which node is the LSF
execution host.
Example 16-1 Determining the LSF Execution Host
# controllsf show current
LSF is currently running on n16, and assigned to n16
All LSF-HPC with SLURM administration must be done from the LSF execution host. You can
run the lsadmin and badmin commands only on this host; they are not intended to be run on
any other node in the HP XC system and may produce false results if they are.
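For example, after logging in to the LSF execution host reported by controllsf, you can check
the LSF configuration files with the standard LSF verification commands; this is a typical
usage sketch:
# lsadmin ckconfig
# badmin ckconfig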
When the LSF-HPC with SLURM scheduler determines that it is time to dispatch a job, it requests
an allocation of nodes from SLURM. After the successful allocation, LSF-HPC with SLURM
prepares the job environment with the necessary SLURM allocation variables, SLURM_JOBID
and SLURM_NPROCS. SLURM_JOBID is a 32-bit integer that uniquely identifies a SLURM allocation
in the system; note that a SLURM_JOBID value can be reused over time. The job dispatch depends
on the type of job:
• For a batch job:
LSF-HPC with SLURM submits the job to SLURM as a batch job and passively monitors it
with the squeue command.
• For an interactive job:
LSF-HPC with SLURM launches the user's job locally on the LSF execution host.
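The following sketch shows how these environment variables appear inside a dispatched batch
job. The script name and submission options are illustrative only and assume a typical bsub
submission:
# bsub -n 4 -o myjob.out ./myscript.sh
where myscript.sh might contain:
#!/bin/sh
# SLURM_JOBID and SLURM_NPROCS are set in the job environment
echo "Allocation $SLURM_JOBID provides $SLURM_NPROCS processors"
# srun runs tasks on the nodes of the allocation identified by SLURM_JOBID
srun hostname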
An LSF-HPC with SLURM job starter script for LSF queues is provided and configured by default
on the HP XC system to launch interactive jobs on the first allocated node. This ensures that
interactive jobs behave just as they would if they were batch jobs. The job starter script is discussed
in more detail in “Job Starter Scripts” (page 196).
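As a hedged illustration, a job starter script is attached to an LSF queue with the JOB_STARTER
parameter in the lsb.queues file; the queue name and script path shown here are hypothetical,
and the default configuration is described in “Job Starter Scripts” (page 196):
Begin Queue
QUEUE_NAME   = normal
JOB_STARTER  = /opt/hptc/lsf/bin/job_starter.sh
End Queue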