SLURM provides the resource management and monitoring layer for LSF with SLURM. LSF with SLURM uses SLURM interfaces to perform the following tasks (the corresponding SLURM commands are sketched after this list):
•   Query system topology information for scheduling purposes
•   Create allocations for user jobs
•   Dispatch and launch user jobs
•   Monitor user job status
•   Signal user jobs and cancel allocations
•   Gather user job accounting information
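Although LSF with SLURM calls these interfaces programmatically, each task has a familiar SLURM command-line counterpart. The following sketch is illustrative only; the job ID 53 is a placeholder:
# sinfo --partition=lsf      (query node and partition state)
# squeue --jobs=53           (monitor the status of job 53)
# scancel 53                 (signal job 53 and cancel its allocation)
# sacct -j 53                (gather accounting information for job 53)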
The major difference between LSF with SLURM and Standard LSF is that the LSF with SLURM
daemons run on only one node in the HP XC system; that node is known as the LSF execution
host. The LSF with SLURM daemons rely on SLURM for information about the other computing
resources (nodes) in the system and consolidate this information into a single entity, so that the
daemons present the HP XC system as one virtual LSF host.
Note:
LSF with SLURM operates only with the nodes in the SLURM lsf partition. As mentioned in
the previous paragraph, LSF with SLURM groups these nodes into one virtual LSF host, presenting
the HP XC system as a single, large SMP host. If there is no lsf partition in SLURM, then LSF
with SLURM sets the processor count to 1 and closes this single virtual HP XC host.
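You can verify that the lsf partition exists and list its nodes with the SLURM sinfo command. The partition shown here is a hypothetical 16-node configuration:
# sinfo --partition=lsf
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
lsf          up   infinite     16   idle n[1-16]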
Example 16-1 shows how to use the controllsf command to determine which node is the LSF
execution host.
Example 16-1 Determining the LSF Execution Host
# controllsf show current
LSF is currently running on n16, and assigned to n16
All LSF with SLURM administration must be performed from the LSF execution host. You can run
the lsadmin and badmin commands only on this host; they are not intended to be run on any
other node in the HP XC system and may return incorrect results if they are.
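For example, running the LSF bhosts command on the execution host shows the entire HP XC system consolidated into one virtual LSF host; the host name and slot counts below are illustrative:
# bhosts
HOST_NAME           STATUS   JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
lsfhost.localdomain ok          -   32      0    0      0      0    0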
When the LSF with SLURM scheduler determines that it is time to dispatch a job, it requests an
allocation of nodes from SLURM. After a successful allocation, LSF with SLURM prepares the
job environment with the necessary SLURM allocation variables, SLURM_JOBID and
SLURM_NPROCS. SLURM_JOBID is a 32-bit integer that uniquely identifies a SLURM allocation
in the system; a SLURM_JOBID value can be reused after its allocation ends. How the job is
dispatched depends on the job type (an example showing these variables follows the list below):
•   For a batch job, LSF with SLURM submits the job to SLURM as a batch job and passively
    monitors it with the squeue command.
•   For an interactive job, LSF with SLURM launches the user's job locally on the LSF execution
    host.
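You can observe the allocation variables by running a job that prints its environment. In this hedged example, a four-processor interactive job is submitted with bsub; the job ID, host name, and variable values are illustrative, and the output is trimmed to the SLURM variables:
$ bsub -I -n4 env
Job <105> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
SLURM_JOBID=53
SLURM_NPROCS=4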
An LSF with SLURM job starter script for LSF queues is provided and configured by default on
the HP XC system to launch interactive jobs on the first allocated node. This ensures that
interactive jobs behave just as they would if they were batch jobs. The job starter script is discussed
in more detail in “Job Starter Scripts” (page 192).
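The following is a minimal sketch of what such a job starter script might look like; it is not the script shipped with HP XC, which performs additional environment setup. It assumes the script runs inside the SLURM allocation that LSF with SLURM created:
#!/bin/sh
# Illustrative job starter sketch: run the user's command as a single
# task on one node of the existing SLURM allocation, so the interactive
# job starts on the first allocated node rather than the execution host.
exec srun -n1 -N1 "$@"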
The environment in which the job is launched contains SLURM and LSF with SLURM environment
variables that describe the job's allocation. SLURM srun commands in the user's job use the
SLURM environment variables to distribute the tasks throughout the allocation.
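For example, an interactive job that calls srun inherits the allocation that LSF with SLURM created, so its tasks are distributed across the allocated nodes; the job ID and node names in this sketch are illustrative:
$ bsub -I -n4 srun hostname
Job <106> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
n1
n1
n2
n2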
The integration of LSF with SLURM has one drawback: the bsub command's -i option for
providing input to the user job is not supported. A workaround is to provide any file input