(nodes) in the system. The LSF-HPC with SLURM daemons consolidate this information into one entity,
such that these daemons present the HP XC system as one virtual LSF host.
Note:
LSF-HPC with SLURM operates only with the nodes in the SLURM lsf partition. As mentioned in the
previous paragraph, LSF-HPC with SLURM groups these nodes into one virtual LSF host, presenting the
HP XC system as a single, large SMP host. If there is no lsf partition in SLURM, then LSF-HPC with
SLURM sets the processor count to 1 and closes this single virtual HP XC host.
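For example, you can confirm that SLURM has an lsf partition by querying SLURM directly; the following command is only a sketch, and its output depends on the partitions configured on your system:
# sinfo -p lsf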
Example 15-1 shows how to use the controllsf command to determine which node is the LSF execution
host.
Example 15-1 Determining the LSF Execution Host
# controllsf show current
LSF is currently running on n16, and assigned to n16
All LSF-HPC with SLURM administration must be done from the LSF execution host. You can run the
lsadmin and badmin commands only on this host; they are not intended to be run on any other node
in the HP XC system and may produce false results if they are.
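For example, after identifying the LSF execution host (n16 in Example 15-1), you might log in to that node before checking the LSF configuration; the following command sequence is only a sketch:
# ssh n16
# lsadmin ckconfig
# badmin ckconfig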
When the LSF-HPC with SLURM scheduler determines that it is time to dispatch a job, it requests an
allocation of nodes from SLURM. After the successful allocation, LSF-HPC with SLURM prepares the job
environment with the necessary SLURM allocation variables, namely SLURM_JOBID and SLURM_NPROCS.
The SLURM_JOBID is a 32-bit integer that uniquely identifies a SLURM allocation in the system; note that
SLURM can reuse this value. The job dispatch, illustrated in the example that follows this list, depends
on the type of job:
• For a batch job:
LSF-HPC with SLURM submits the job to SLURM as a batch job and passively monitors it with the
squeue command.
• For an interactive job:
LSF-HPC with SLURM launches the user's job locally on the LSF execution host.
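The first command below sketches a batch submission and the second an interactive one; the processor count, the output file name, and the script name myjob.sh are placeholders rather than part of the default configuration:
$ bsub -n4 -o myjob.out ./myjob.sh
$ bsub -n4 -I srun hostname
In the first case, LSF-HPC with SLURM submits myjob.sh to SLURM and monitors it with the squeue command; in the second, the command is launched from the LSF execution host within the allocation.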
An LSF-HPC with SLURM job starter script for LSF queues is provided and configured by default on the
HP XC system to launch interactive jobs on the first allocated node. This ensures that interactive jobs
behave just as they would if they were batch jobs. The job starter script is discussed in more detail in “Job
Starter Scripts” (page 179).
The environment in which the job is launched contains SLURM and LSF-HPC with SLURM environment
variables that describe the job's allocation. SLURM srun commands in the user's job use the SLURM
environment variables to distribute the tasks throughout the allocation.
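As an illustration, a user job script might contain lines like the following, where ./my_app is a hypothetical executable; because SLURM_NPROCS is set in the job environment, srun typically starts one task per allocated processor without additional options:
#!/bin/sh
# Distribute the tasks across the SLURM allocation created for this job.
srun ./my_app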
The integration of LSF-HPC with SLURM has one drawback: the bsub command's -i option for providing
input to the user job is not supported. A workaround is to provide any file input directly to the job. The
SLURM srun command supports an --input option (also available in its short form as the -i option)
that provides input to all tasks.
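For example, rather than using bsub -i, a job can pass its input file to srun; the file name input.dat and the executable ./a.out are placeholders:
$ bsub -n4 -I srun --input=input.dat ./a.out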
15.2.1.1 Job Starter Scripts
LSF-HPC with SLURM dispatches all jobs locally. The default installation of LSF-HPC with SLURM on
the HP XC system provides a job starter script that is configured for use by all LSF queues. This job starter
script adjusts the LSB_HOSTS and LSB_MCPU_HOSTS environment variables to the correct resource values
in the allocation. Then, the job starter script uses the srun command to launch the user task on the first
node in the allocation.
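The following shell fragment is a simplified sketch of such a job starter script, not the script shipped with the HP XC system, and it omits the adjustment of the LSB_HOSTS and LSB_MCPU_HOSTS variables described above:
#!/bin/sh
# Launch the user command as a single task on one node of the SLURM allocation.
exec srun -n1 -N1 "$@"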
If this job starter script is not configured for a queue, user jobs begin execution locally on the LSF
execution host. In this case, it is recommended that the user job use one or more srun commands to make
use of the resources allocated to the job. Work done on the LSF execution host competes for core time with
the LSF-HPC with SLURM daemons and can affect the overall performance of LSF-HPC with SLURM
on the HP XC system.