• To signal user jobs and cancel allocations.
• To gather user job accounting information.
The major difference between LSF-HPC and Standard LSF is that the LSF-HPC daemons run on only one node in
the HP XC system; that node is known as the LSF execution host. The LSF-HPC daemons rely on SLURM to
provide information about the other computing resources (nodes) in the system. The LSF-HPC daemons consolidate
this information so that they present the HP XC system as one virtual LSF host.
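For example, one way to confirm this single-virtual-host view from the command line is with the standard LSF host listing commands, which on an HP XC system should show one entry rather than one entry per node:
$ lshosts     # should list a single virtual host whose ncpus covers all nodes in the lsf partition
$ bhosts      # similarly shows one batch host with the consolidated processor count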
Note
LSF-HPC operates only with the nodes in the SLURM lsf partition. As mentioned in the previous paragraph,
LSF-HPC groups these nodes into one virtual LSF host, presenting the HP XC system as a single, large SMP
host. If there is no lsf partition in SLURM, then LSF-HPC sets the processor count to 1 and closes this single
virtual HP XC host.
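One way to confirm that the lsf partition exists, and to see which nodes it contains, is with the standard SLURM sinfo command. The node names and counts in the following output are illustrative only:
# sinfo -p lsf
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
lsf          up   infinite     16   idle n[1-16]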
Example 13-1 shows how to use the controllsf command to determine which node is the LSF-HPC
execution host.
Example 13-1. Determining the LSF-HPC Execution Host
# controllsf show current
LSF is currently running on n16, and assigned to n16
All LSF-HPC administration must be done from the LSF-HPC execution host. You can run the lsadmin and
badmin commands only on this host; they are not intended to be run on any other nodes in the HP XC
system and may produce false results if they are.
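For example, a routine reconfiguration after editing the LSF configuration files would be performed on the execution host reported by controllsf, along these lines:
# ssh n16            # log in to the current LSF-HPC execution host
# lsadmin reconfig   # reconfigure the LIM daemons
# badmin reconfig    # reconfigure mbatchd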
When the LSF-HPC scheduler determines that it is time to dispatch a job, it requests an allocation of nodes
from SLURM. After a successful allocation, LSF-HPC prepares the job environment with the necessary SLURM
allocation variables, SLURM_JOBID and SLURM_NPROCS. SLURM_JOBID is a 32-bit integer
that uniquely identifies a SLURM allocation in the system at any given time; a SLURM_JOBID value can be
reused for a later allocation. The job dispatch depends on the type of job:
For a batch job: LSF-HPC submits the job to SLURM as a batch job and passively monitors
it with the squeue command.
For an interactive job: LSF-HPC launches the user's job locally on the LSF-HPC execution host.
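As an illustration of the batch case, a job submitted with bsub is visible both as an LSF-HPC job and, once dispatched, as a SLURM allocation in the lsf partition. The job IDs, script name, and node names below are invented for the example:
$ bsub -n 4 -o out.%J ./myjob.sh
Job <101> is submitted to default queue <normal>.
$ squeue
  JOBID PARTITION     NAME  USER  ST   TIME  NODES NODELIST(REASON)
    123       lsf myjob.sh  user   R   0:05      4 n[1-4]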
A job starter script for the LSF-HPC queues is provided and configured by default on the HP XC system
to launch interactive jobs on the first allocated node. This ensures that interactive jobs behave just as they
would if they were batch jobs. The job starter script is discussed in more detail in “Job Starter Scripts”
(page 119).
The environment in which the job is launched contains SLURM and LSF-HPC environment variables that
describe the job's allocation. SLURM srun commands in the user's job make use of the SLURM environment
variables to distribute the tasks throughout the allocation.
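As a minimal sketch, a batch script submitted with a command such as bsub -n 8 ./myjob.sh (the script and program names are placeholders) can report its allocation and then let srun distribute tasks across it:
#!/bin/sh
# myjob.sh -- LSF-HPC sets SLURM_JOBID and SLURM_NPROCS before this script runs.
echo "Allocation: SLURM_JOBID=$SLURM_JOBID SLURM_NPROCS=$SLURM_NPROCS"
# srun picks up the allocation from these environment variables and
# launches one task per allocated processor.
srun hostname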
The integration of SLURM and LSF-HPC has one drawback: the bsub command's -i option for providing
input to the user job is not supported. As a workaround, provide the input file directly to the job itself. The
SLURM srun command supports an --input option (also available in its short form, -i) that
provides input to all tasks.
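For example, instead of relying on bsub -i, the input file can be named on the srun line inside the job script; the file and program names here are placeholders:
#!/bin/sh
# Submitted as: bsub -n 4 ./myjob.sh
# bsub -i is not supported, so pass the input file to srun directly.
srun --input=input.dat ./myprog
# Equivalent short form: srun -i input.dat ./myprog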
Job Starter Scripts
LSF-HPC dispatches all jobs locally. The default installation of LSF-HPC for SLURM on the HP XC system
provides a job starter script that is configured for use by all LSF-HPC queues. This job starter script adjusts
the LSB_HOSTS and LSB_MCPU_HOSTS environment variables to the correct resource values in the allocation.
Then, the job starter script uses the srun command to launch the user task on the first node in the allocation.
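The essential behavior can be sketched as follows. This is not the script shipped with HP XC, which also rewrites LSB_HOSTS and LSB_MCPU_HOSTS as described above; the sketch shows only the final launch step:
#!/bin/sh
# Simplified job starter sketch: run the user's command ("$@") as a
# single task on one node of the existing SLURM allocation.
exec srun -N1 -n1 "$@"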
If this job starter script is not configured for a queue, user jobs in that queue begin execution locally on the LSF-HPC
execution host. In this case, it is recommended that the user job use one or more srun commands to make
use of the resources allocated to the job. Work done on the LSF-HPC execution host competes for core time
with the LSF-HPC daemons and can affect the overall performance of LSF-HPC on the HP XC system.
The bqueues -l command displays the full queue configuration, including whether or not a job starter
script has been configured. See the Platform LSF documentation or the bqueues(1) manpage for more
information on the use of this command.
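For example, to check a particular queue (the queue name here is illustrative) for a configured job starter:
$ bqueues -l normal | grep JOB_STARTER
JOB_STARTER:  <path to the configured job starter script>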