16.2.1 Integration of LSF-HPC with SLURM
The LSF component of the LSF-HPC with SLURM product acts primarily as the workload
scheduler and node allocator running on top of SLURM. The SLURM component provides a job
execution and monitoring layer for LSF-HPC with SLURM. LSF-HPC with SLURM uses SLURM
interfaces to perform the following:
• Query system topology information for scheduling purposes.
• Create allocations for user jobs.
• Dispatch and launch user jobs.
• Monitor user job status.
• Signal user jobs and cancel allocations.
• Gather user job accounting information.
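For reference, an administrator can query much of the same information directly with standard
SLURM commands. The following commands are a minimal sketch, not an HP XC-specific
procedure; the output depends on your SLURM configuration:
# sinfo              (report partitions, nodes, and node states)
# squeue             (list jobs and allocations known to SLURM)
# sacct -j jobid     (show accounting information for the SLURM job jobid)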
The major difference between LSF-HPC with SLURM and Standard LSF-HPC is that the LSF-HPC
with SLURM daemons run on only one node in the HP XC system; that node is known as the
LSF execution host. The LSF-HPC with SLURM daemons rely on SLURM to provide information
on the other computing resources (nodes) in the system. The LSF-HPC with SLURM daemons
consolidate this information into one entity, such that these daemons present the HP XC system
as one virtual LSF host.
Note:
LSF-HPC with SLURM operates only with the nodes in the SLURM lsf partition. As mentioned
in the previous paragraph, LSF-HPC with SLURM groups these nodes into one virtual LSF host,
presenting the HP XC system as a single, large SMP host. If there is no lsf partition in SLURM,
then LSF-HPC with SLURM sets the processor count to 1 and closes this single virtual HP XC
host.
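To verify that the lsf partition is defined and to see which nodes it contains, you can query
SLURM directly; this is a generic SLURM command, shown here for illustration:
# sinfo -p lsf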
Example 16-1 shows how to use the controllsf command to determine which node is the LSF
execution host.
Example 16-1 Determining the LSF Execution Host
# controllsf show current
LSF is currently running on n16, and assigned to n16
All LSF-HPC with SLURM administration must be done from the LSF execution host. You can
run the lsadmin and badmin commands only on this host; they are not intended to be run on
any other node in the HP XC system and may produce false results if they are.
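For example, after logging in to the LSF execution host reported by controllsf, you can check
the LSF configuration files with the standard LSF verification commands; this is a typical
usage sketch:
# lsadmin ckconfig
# badmin ckconfig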
When the LSF-HPC with SLURM scheduler determines that it is time to dispatch a job, it requests
an allocation of nodes from SLURM. After the successful allocation, LSF-HPC with SLURM
prepares the job environment with the necessary SLURM allocation variables, SLURM_JOBID
and SLURM_NPROCS. SLURM_JOBID is a 32-bit integer that uniquely identifies a SLURM allocation
in the system; note that a SLURM_JOBID value can be reused over time. The job dispatch depends
on the type of job:
• For a batch job:
LSF-HPC with SLURM submits the job to SLURM as a batch job and passively monitors it
with the squeue command.
• For an interactive job:
LSF-HPC with SLURM launches the user's job locally on the LSF execution host.
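The following sketch shows how these environment variables appear inside a dispatched batch
job. The script name and submission options are illustrative only and assume a typical bsub
submission:
# bsub -n 4 -o myjob.out ./myscript.sh
where myscript.sh might contain:
#!/bin/sh
# SLURM_JOBID and SLURM_NPROCS are set in the job environment
echo "Allocation $SLURM_JOBID provides $SLURM_NPROCS processors"
# srun runs tasks on the nodes of the allocation identified by SLURM_JOBID
srun hostname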
An LSF-HPC with SLURM job starter script for LSF queues is provided and configured by default
on the HP XC system to launch interactive jobs on the first allocated node. This ensures that
interactive jobs behave just as they would if they were batch jobs. The job starter script is discussed
in more detail in “Job Starter Scripts” (page 196).
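As a hedged illustration, a job starter script is attached to an LSF queue with the JOB_STARTER
parameter in the lsb.queues file; the queue name and script path shown here are hypothetical,
and the default configuration is described in “Job Starter Scripts” (page 196):
Begin Queue
QUEUE_NAME   = normal
JOB_STARTER  = /opt/hptc/lsf/bin/job_starter.sh
End Queue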