Platform LSF Administration Guide Version 6.2

Allowing Jobs to Use Reserved Job Slots
Administering Platform LSF
By default, a reserved job slot cannot be used by another job. To make better use of
resources and improve performance of LSF, you can configure backfill scheduling.
About backfill scheduling
Backfill scheduling allows other jobs to use the reserved job slots, as long as those
jobs do not delay the start of the job that holds the reservation. Backfilling, together
with processor reservation, allows large parallel jobs to run without underutilizing resources.
In a busy cluster, processor reservation helps to schedule large parallel jobs sooner.
However, by default, reserved processors remain idle until the large job starts. This
degrades the performance of LSF because the reserved resources are idle while jobs are
waiting in the queue.
Backfill scheduling allows the reserved job slots to be used by small jobs that can run
and finish before the large job starts. This improves the performance of LSF because it
increases the utilization of resources.
How backfilling works
For backfill scheduling, LSF assumes that a job will run until its run limit expires. Backfill
scheduling works most efficiently when all the jobs in the cluster have a run limit.
Because jobs with a shorter run limit have a better chance of being scheduled as backfill
jobs, users who specify appropriate run limits in a backfill queue are rewarded with
improved turnaround time.
Once the big parallel job has reserved sufficient job slots, LSF calculates the start time
of the big job, based on the run limits of the jobs currently running in the reserved slots.
LSF cannot backfill if the big job is waiting for a job that has no run limit defined.
If LSF can backfill the idle job slots, only jobs with run limits that expire before the start
time of the big job will be allowed to use the reserved job slots. LSF cannot backfill with
a job that has no run limit.
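The rules above can be sketched in a few lines of Python. This is only an illustrative model of the decision described in this section, not LSF code: the Job class, the functions estimated_start and can_backfill, and the 4-slot parallel job and the candidate jobs are all hypothetical names and values introduced for the sketch. Times are hours on a 24-hour clock. Treating a candidate that would finish exactly at the big job's start time as acceptable is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    name: str
    slots: int
    run_limit: Optional[float]      # hours; None means no run limit is defined
    start: Optional[float] = None   # start time in hours; set for running jobs


def estimated_start(running: List[Job], total_slots: int,
                    needed: int, now: float) -> Optional[float]:
    """Estimate when `needed` slots become free, assuming every running
    job runs until its run limit expires. Returns None when the big job
    would have to wait for a job that has no run limit."""
    free = total_slots - sum(j.slots for j in running)
    if free >= needed:
        return now
    # Only jobs with a run limit have a predictable end time.
    finite = sorted((j for j in running if j.run_limit is not None),
                    key=lambda j: j.start + j.run_limit)
    for j in finite:
        free += j.slots
        if free >= needed:
            return j.start + j.run_limit
    return None  # start time unknown, so backfilling is not possible


def can_backfill(candidate: Job, idle_slots: int,
                 big_start: Optional[float], now: float) -> bool:
    """A candidate may use reserved slots only if it fits in the idle
    slots and its run limit expires by the big job's start time."""
    if big_start is None or candidate.run_limit is None:
        return False
    if candidate.slots > idle_slots:
        return False
    # Assumption: finishing exactly at big_start counts as "before".
    return now + candidate.run_limit <= big_start


# job1 from the example: sequential, 2-hour run limit, started at 8:00.
running = [Job("job1", slots=1, run_limit=2.0, start=8.0)]
now = 8.5  # hypothetical: a 4-slot parallel job reserves slots at 8:30
big_start = estimated_start(running, total_slots=4, needed=4, now=now)
idle = 4 - sum(j.slots for j in running)

short_job = Job("short", slots=1, run_limit=1.0)
long_job = Job("long", slots=1, run_limit=2.0)
no_limit = Job("nolimit", slots=1, run_limit=None)

for job in (short_job, long_job, no_limit):
    print(job.name, can_backfill(job, idle, big_start, now))
```

In this sketch the big job's estimated start is the moment job1's run limit expires, so only the candidate whose run limit ends by then is accepted. The job without a run limit is always rejected, which mirrors the rule that LSF cannot backfill with a job that has no run limit.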
Example
In this scenario, assume the cluster consists of a 4-CPU multiprocessor host.
1. A sequential job (job1) with a run limit of 2 hours is submitted and gets started at
   8:00 am (figure a).