Platform LSF Administration Guide Version 6.2
Chapter 28
Running Parallel Jobs
Administering Platform LSF
Reserving Processors
About processor reservation
When parallel jobs have to compete with sequential jobs for job slots, the slots that
become available are likely to be taken immediately by sequential jobs. A parallel job needs
multiple job slots to be available at the same time before it can be dispatched. If the
cluster is always busy, a large parallel job could remain pending indefinitely. The more
processors a parallel job requires, the worse the problem becomes.
Processor reservation solves this problem by reserving job slots as they become
available, until there are enough reserved job slots to run the parallel job.
You might want to configure processor reservation if your cluster has a lot of sequential
jobs that compete for job slots with parallel jobs.
How processor reservation works
Processor reservation is disabled by default.
If processor reservation is enabled, and a parallel job cannot be dispatched because there
are not enough job slots to satisfy its minimum processor requirements, the job slots that
are currently available are reserved for that job and accumulated.
A reserved job slot is unavailable to any other job. To avoid deadlock situations in which
the system reserves job slots for multiple parallel jobs and none of them can acquire
sufficient resources to start, a parallel job will give up all its reserved job slots if it has
not accumulated enough to start within a specified time. The reservation time starts
from the time the first slot is reserved. When the reservation time expires, the job cannot
reserve any slots for one scheduling cycle, but then the reservation process can begin
again.
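The reservation cycle described above can be sketched as a small simulation. This is only an illustrative model, not LSF code: the class PendingParallelJob, the function schedule_cycle, and the status strings are hypothetical names, and the sketch assumes that the cycle in which the reservation time expires is also the one cycle in which the job reserves nothing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingParallelJob:
    """Hypothetical model of a pending parallel job; not an LSF data structure."""
    min_slots: int                            # minimum processors needed to start
    max_reserve_time: int                     # seconds the job may hold reservations
    reserved: int = 0                         # job slots currently reserved
    first_reserved_at: Optional[int] = None   # time the first slot was reserved


def schedule_cycle(job: PendingParallelJob, now: int, free_slots: int) -> str:
    """Apply one scheduling cycle to the job and report what happened."""
    # Enough slots (already reserved plus newly free): the job can be dispatched.
    if job.reserved + free_slots >= job.min_slots:
        return "started"
    # The reservation time runs from the first reserved slot. On expiry the job
    # gives up all reserved slots and reserves nothing in this cycle (assumption).
    if (job.first_reserved_at is not None
            and now - job.first_reserved_at >= job.max_reserve_time):
        job.reserved = 0
        job.first_reserved_at = None
        return "released"
    # Otherwise reserve whatever is free and keep accumulating.
    if free_slots > 0:
        if job.first_reserved_at is None:
            job.first_reserved_at = now
        job.reserved += free_slots
        return "reserving"
    return "waiting"


if __name__ == "__main__":
    job = PendingParallelJob(min_slots=4, max_reserve_time=120)
    for now in range(0, 240, 60):  # one slot frees up in every 60-second cycle
        print(now, schedule_cycle(job, now, free_slots=1), job.reserved)
```

In this run the job reserves one slot in each of the first two cycles, gives up both when the 120-second limit is reached, and begins accumulating again in the following cycle.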
Configuring processor reservation
To enable processor reservation, set SLOT_RESERVE in lsb.queues and specify the
reservation time (a job cannot hold any reserved slots after its reservation time expires).
Syntax
SLOT_RESERVE=MAX_RESERVE_TIME[n]
where n is an integer by which to multiply MBD_SLEEP_TIME. MBD_SLEEP_TIME
is defined in lsb.params; the default value is 60 seconds.
Example
Begin Queue
.
PJOB_LIMIT=1
SLOT_RESERVE = MAX_RESERVE_TIME[5]
.
End Queue
In this example, if MBD_SLEEP_TIME is 60 seconds, a job can reserve job slots for
5 * 60 = 300 seconds, or 5 minutes. If MBD_SLEEP_TIME is 30 seconds, a job can reserve
job slots for 5 * 30 = 150 seconds, or 2.5 minutes.
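The arithmetic in this example can be checked with a short script. The following is a minimal sketch, not an LSF utility: reservation_seconds is a hypothetical helper that accepts only the MAX_RESERVE_TIME[n] form shown in the syntax above and multiplies n by an MBD_SLEEP_TIME value that you supply (60 seconds if omitted, matching the documented default).

```python
import re


def reservation_seconds(slot_reserve: str, mbd_sleep_time: int = 60) -> int:
    """Return the reservation time in seconds for a SLOT_RESERVE value.

    Hypothetical helper: parses only the form MAX_RESERVE_TIME[n].
    """
    match = re.fullmatch(r"\s*MAX_RESERVE_TIME\[\s*(\d+)\s*\]\s*", slot_reserve)
    if match is None:
        raise ValueError(f"unrecognized SLOT_RESERVE value: {slot_reserve!r}")
    # n is the multiplier applied to MBD_SLEEP_TIME.
    return int(match.group(1)) * mbd_sleep_time


if __name__ == "__main__":
    for sleep_time in (60, 30):
        seconds = reservation_seconds("MAX_RESERVE_TIME[5]", sleep_time)
        print(f"MBD_SLEEP_TIME={sleep_time}: {seconds} seconds ({seconds / 60} minutes)")
```

A check like this can help when MBD_SLEEP_TIME is changed in lsb.params, because the effective reservation time of every queue that sets SLOT_RESERVE changes with it.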