LSF Version 7.3 - Administering Platform LSF
Backfill Scheduling: Allowing Jobs to Use Reserved Job Slots
Backfill scheduling allows reserved job slots to be used by small jobs that can
run and finish before the large job starts. This improves resource utilization,
because slots that would otherwise sit idle while reserved are put to work.
How backfilling works
For backfill scheduling, LSF assumes that a job can run until its run limit expires.
Backfill scheduling works most efficiently when all the jobs in the cluster have a run
limit.
Because jobs with shorter run limits are more likely to be scheduled as backfill
jobs, users who specify appropriate run limits in a backfill queue are rewarded
with improved turnaround time.
Once the big parallel job has reserved sufficient job slots, LSF calculates the start
time of the big job, based on the run limits of the jobs currently running in the
reserved slots. LSF cannot backfill if the big job is waiting for a job that has no run
limit defined.
If LSF can backfill the idle job slots, only jobs with run limits that expire before the
start time of the big job are allowed to use the reserved job slots. LSF cannot backfill
with a job that has no run limit.
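The two rules above can be illustrated with a minimal Python sketch. This is not LSF code: the function names estimated_start and can_backfill, and the representation of a running job as a (start time, run limit) pair, are assumptions made only for illustration. The sketch treats "expire before the start time" as a strict comparison, which is also an assumption about the boundary case.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence, Tuple

# Hypothetical model: (start time, run limit) for each job currently
# running in the reserved slots. A run limit of None means "no run limit".
RunningJob = Tuple[datetime, Optional[timedelta]]


def estimated_start(running: Sequence[RunningJob]) -> Optional[datetime]:
    """Estimate when the big job can start.

    Assumes every running job runs until its run limit expires.
    Returns None if the list is empty or if any job has no run limit,
    because then no start time can be calculated.
    """
    finishes = []
    for started, limit in running:
        if limit is None:
            return None
        finishes.append(started + limit)
    return max(finishes) if finishes else None


def can_backfill(now: datetime,
                 run_limit: Optional[timedelta],
                 big_job_start: Optional[datetime]) -> bool:
    """Return True if a candidate job may use the reserved slots."""
    if run_limit is None or big_job_start is None:
        # No run limit on the candidate, or no known start time
        # for the big job: backfilling is not possible.
        return False
    # Assumption: the run limit must expire strictly before the start time.
    return now + run_limit < big_job_start
```

A job without a run limit fails the check on either side: as a running job it makes the start time unknown, and as a candidate it can never be shown to finish in time.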
Example
In this scenario, assume the cluster consists of a 4-CPU multiprocessor host.
1 A sequential job (job1) with a run limit of 2 hours is submitted and gets started
at 8:00 am (figure a).
2 Shortly afterwards, a parallel job (job2) requiring all 4 CPUs is submitted. It
cannot start right away because job1 is using one CPU, so it reserves the
remaining 3 processors (figure b).
3 At 8:30 am, another parallel job (job3) is submitted requiring only two
processors and with a run limit of 1 hour. Since job2 cannot start until 10:00 am
(when job1 finishes), its reserved processors can be backfilled by job3 (figure
c). job3 can therefore complete before job2's start time, making use of the idle
processors.
4 Job3 finishes at 9:30 am and job1 at 10:00 am, allowing job2 to start shortly
after 10:00 am. In this example, if job3's run limit were 2 hours, it would not be
able to backfill job2's reserved slots, and would have to run after job2 finishes.
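The arithmetic of this example can be checked with a short Python sketch. The helper names at and job3_fits are hypothetical, the calendar date is arbitrary, and the strict "finishes before" comparison is an assumption about how the boundary case is handled.

```python
from datetime import datetime, timedelta

DAY = datetime(2000, 1, 1)  # arbitrary date; only the clock times matter


def at(hour, minute=0):
    """Clock time on the example day."""
    return DAY.replace(hour=hour, minute=minute)


# job1: started at 8:00 am with a run limit of 2 hours.
job1_start = at(8)
job1_limit = timedelta(hours=2)

# job2 needs all 4 CPUs, so it can start only when job1's run limit expires.
job2_start = job1_start + job1_limit

# job2 has reserved the 3 processors that job1 is not using.
reserved_idle = 3


def job3_fits(submit, run_limit, procs):
    """True if job3 fits in the reserved processors and ends before job2 starts."""
    return procs <= reserved_idle and submit + run_limit < job2_start


print(job2_start.strftime("%H:%M"))                  # 10:00
print(job3_fits(at(8, 30), timedelta(hours=1), 2))   # True: ends at 9:30
print(job3_fits(at(8, 30), timedelta(hours=2), 2))   # False: would end at 10:30
```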