LSF Version 7.3 - Administering Platform LSF
Job Scheduling and Dispatch
Submitted jobs sit in queues until they are scheduled and dispatched to a host for
execution. When a job is submitted to LSF, many factors control when and where
the job starts to run:
◆ Active time window of the queue or hosts
◆ Resource requirements of the job
◆ Availability of eligible hosts
◆ Various job slot limits
◆ Job dependency conditions
◆ Fairshare constraints
◆ Load conditions
Scheduling policies
First-Come, First-Served (FCFS) scheduling
By default, jobs in a queue are dispatched in first-come, first-served (FCFS) order.
That is, jobs are dispatched according to their order in the queue. Because jobs
in a queue are ordered by job priority, this is not necessarily the order in which
they were submitted. Users and administrators can also change the order of jobs
in the queue.
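
The ordering described above can be sketched in Python as a sort by priority and
then by submission order. This is only an illustration, not LSF's implementation:
the Job class, its fields, and the assumption that a larger priority value is
dispatched first are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    job_id: int    # hypothetical: IDs are assigned in submission order
    priority: int  # assumption: a larger value is dispatched earlier


def dispatch_order(jobs):
    """Order jobs by priority first, then by submission order (FCFS)."""
    # Negating the priority sorts high-priority jobs first; job_id breaks
    # ties so that equal-priority jobs keep their submission order.
    return sorted(jobs, key=lambda j: (-j.priority, j.job_id))


jobs = [Job(101, 50), Job(102, 80), Job(103, 50)]
print([j.job_id for j in dispatch_order(jobs)])  # prints [102, 101, 103]
```

Job 102 was submitted second but runs first because of its higher priority; jobs
101 and 103 share a priority and so fall back to submission order.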
Service level agreement (SLA) scheduling
An SLA in LSF is a “just-in-time” scheduling policy that defines an agreement
between LSF administrators and LSF users. The SLA scheduling policy defines how
many jobs should be run from each SLA to meet the configured goals.
Fairshare scheduling and other policies
If a fairshare scheduling policy has been specified for the queue or if host partitions
have been configured, jobs are dispatched in accordance with these policies instead.
To solve diverse problems, LSF allows multiple scheduling policies in the same
cluster. LSF has several queue scheduling policies such as exclusive, preemptive,
fairshare, and hierarchical fairshare.
Scheduling and dispatch
Jobs are scheduled at regular intervals (5 seconds by default, configured by the
parameter JOB_SCHEDULING_INTERVAL in lsb.params). Once jobs are scheduled, they
can be immediately dispatched to hosts.
To prevent overloading any host, LSF waits a short time between dispatching jobs
to the same host. The delay is configured by the JOB_ACCEPT_INTERVAL parameter
in lsb.params or lsb.queues; the default is 60 seconds. If JOB_ACCEPT_INTERVAL
is set to zero, more than one job can be started on a host at a time.
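
The interaction between the scheduling interval and the per-host delay can be
sketched as a small simulation. This is a simplified model under stated
assumptions, not LSF's scheduler: dispatch_times is a hypothetical helper, both
intervals are expressed directly in seconds, all jobs are assumed to be pending
at time 0, and slot limits and load conditions are ignored.

```python
def dispatch_times(n_jobs, accept_interval, scheduling_interval=5):
    """Return the times (in seconds) at which n_jobs start on one host."""
    times = []
    t = 0        # time of the current scheduling session
    last = None  # time of the most recent dispatch to this host
    while len(times) < n_jobs:
        if accept_interval == 0:
            # No delay: every remaining job can start in this session.
            times.extend([t] * (n_jobs - len(times)))
        elif last is None or t - last >= accept_interval:
            # The host accepts one job, then must wait accept_interval.
            times.append(t)
            last = t
        t += scheduling_interval
    return times


print(dispatch_times(3, 60))  # prints [0, 60, 120]
print(dispatch_times(3, 0))   # prints [0, 0, 0]
```

With a 60-second delay the three jobs start a minute apart, even though a
scheduling session occurs every 5 seconds. With the delay set to zero, all
three start in the same session.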
For large clusters, define LSF_SERVER_HOSTS in lsf.conf to decrease the load
on the master LIM.
Some operating systems, such as Linux and AIX, let you increase the number of file
descriptors that can be allocated to the master host. You do not need to limit the
number of file descriptors to 1024 if you want fast job dispatching. To take
advantage of the greater number of file descriptors, you must set the parameter
LSB_MAX_JOB_DISPATCH_PER_SESSION in lsf.conf to a value greater than