LSF Version 7.3 - Administering Platform LSF

Running Parallel Jobs
Configuring memory reservation for pending parallel jobs
Use the RESOURCE_RESERVE parameter in lsb.queues to reserve host memory
for pending jobs, as described in Memory Reservation for Pending Jobs on page
408.
lsb.queues
1 Set the RESOURCE_RESERVE parameter in a queue defined in lsb.queues.
The RESOURCE_RESERVE parameter overrides the SLOT_RESERVE
parameter. If both RESOURCE_RESERVE and SLOT_RESERVE are defined in
the same queue, RESOURCE_RESERVE takes effect, so both job slot reservation
and memory reservation are enabled. SLOT_RESERVE is ignored, and an error is
displayed when the cluster is reconfigured. Backfill on memory may still take place.
The following queue enables both memory reservation and backfill in the same
queue:
Begin Queue
QUEUE_NAME = reservation_backfill
DESCRIPTION = For resource reservation and backfill
PRIORITY = 40
RESOURCE_RESERVE = MAX_RESERVE_TIME[20]
BACKFILL = Y
End Queue
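Because a queue that defines both RESOURCE_RESERVE and SLOT_RESERVE produces an error at reconfiguration, it can help to scan the queue definitions beforehand. The following is a minimal sketch, not an LSF tool: the function name find_reserve_conflicts, the queue name demo, and the SLOT_RESERVE value are hypothetical, and the script assumes each parameter sits on its own line inside a Begin Queue / End Queue block.

```shell
#!/bin/sh
# Hypothetical pre-check, not part of LSF: reads lsb.queues-style text on
# stdin and prints the name of each queue that defines both
# RESOURCE_RESERVE and SLOT_RESERVE.
find_reserve_conflicts() {
    awk '
        /^[ \t]*Begin[ \t]+Queue/        { name = ""; rr = 0; sr = 0 }
        /^[ \t]*QUEUE_NAME[ \t]*=/       { split($0, kv, "="); name = kv[2]; gsub(/[ \t]/, "", name) }
        /^[ \t]*RESOURCE_RESERVE[ \t]*=/ { rr = 1 }
        /^[ \t]*SLOT_RESERVE[ \t]*=/     { sr = 1 }
        /^[ \t]*End[ \t]+Queue/          { if (rr && sr) print name }
    '
}

# Made-up queue for illustration; in this queue SLOT_RESERVE would be ignored.
find_reserve_conflicts <<'EOF'
Begin Queue
QUEUE_NAME = demo
RESOURCE_RESERVE = MAX_RESERVE_TIME[20]
SLOT_RESERVE = MAX_RESERVE_TIME[5]
End Queue
EOF
```

Commented-out lines are skipped because each pattern is anchored at the start of the line, and RESOURCE_RESERVE_PER_SLOT is not mistaken for RESOURCE_RESERVE because the pattern requires the equals sign to follow the parameter name directly.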
Enable per-slot memory reservation
By default, memory is reserved for parallel jobs on a per-host basis. For example, by
default, the command:
bsub -n 4 -R "rusage[mem=500]" -q reservation myjob
requires the job to reserve 500 MB on each host where the job runs.
1 To enable per-slot memory reservation, define RESOURCE_RESERVE_PER_SLOT=y in
lsb.params. In this example, if per-slot reservation is enabled, the job must reserve
500 MB of memory for each job slot (4 * 500 MB = 2000 MB, about 2 GB) on the host
in order to run.
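The difference between the two reservation modes comes down to simple arithmetic, sketched below. This is illustrative only and not an LSF command: the function name reserved_mem_mb and its arguments are hypothetical, and the sketch assumes all of the job's slots on a given host are counted together.

```shell
#!/bin/sh
# Illustrative arithmetic only; this is not an LSF command.
# reserved_mem_mb MEM_MB SLOTS_ON_HOST PER_SLOT
#   MEM_MB         value given in rusage[mem=...]
#   SLOTS_ON_HOST  job slots the job occupies on one host
#   PER_SLOT       y if RESOURCE_RESERVE_PER_SLOT=y, anything else for the default
reserved_mem_mb() {
    mem_mb=$1
    slots=$2
    per_slot=$3
    if [ "$per_slot" = "y" ]; then
        # Per-slot: the rusage value is multiplied by the slots on the host.
        echo $(( mem_mb * slots ))
    else
        # Default (per-host): the rusage value is reserved once per host.
        echo "$mem_mb"
    fi
}

reserved_mem_mb 500 4 n   # prints 500
reserved_mem_mb 500 4 y   # prints 2000
```

With the bsub example above and all four slots on one host, the default mode reserves 500 MB on that host, while per-slot mode reserves 2000 MB.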
Backfill Scheduling: Allowing Jobs to Use Reserved Job Slots
By default, a reserved job slot cannot be used by another job. To make better use of
resources and improve the performance of LSF, you can configure backfill scheduling.
About backfill scheduling
Backfill scheduling allows other jobs to use the reserved job slots, as long as those
jobs do not delay the start of the job that holds the reservation. Backfilling, together
with processor reservation, allows large parallel jobs to run without leaving resources
underutilized.
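The condition that a backfill job must not delay another job can be pictured as a simple time comparison. The sketch below is a simplified model, not LSF's scheduler logic: the function name can_backfill is hypothetical, all times are in minutes, and it assumes the candidate job runs for its full run limit and that the start time of the reserving job is known.

```shell
#!/bin/sh
# Hypothetical helper, not part of LSF: decides whether a candidate job could
# use reserved slots without delaying the job that holds the reservation.
# can_backfill NOW RUN_LIMIT RESERVED_START   (all values in minutes)
can_backfill() {
    now=$1
    run_limit=$2
    reserved_start=$3
    # Assume the worst case: the candidate runs until its run limit expires.
    [ $(( now + run_limit )) -le "$reserved_start" ]
}

# The reserving job is expected to start 45 minutes from now.
if can_backfill 0 30 45; then echo "backfill"; else echo "wait"; fi   # prints backfill
if can_backfill 0 60 45; then echo "backfill"; else echo "wait"; fi   # prints wait
```

In this model a 30-minute job fits into the 45-minute gap and may use the reserved slots, while a 60-minute job would overrun the gap and has to wait.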
In a busy cluster, processor reservation helps to schedule large parallel jobs sooner.
However, by default, reserved processors remain idle until the large job starts. This
degrades the performance of LSF because the reserved resources are idle while jobs
are waiting in the queue.