LSF Version 7.3 - Administering Platform LSF
Suspending Conditions
When jobs are running on a host, LSF periodically checks the load levels on that
host. If any load index exceeds the corresponding per-host or per-queue
suspending threshold for a job, LSF suspends the job. The job remains suspended
until the load levels satisfy the scheduling thresholds.
At regular intervals, LSF gets the load levels for that host. The period is defined by
the SBD_SLEEP_TIME parameter in the lsb.params file. Then, for each job
running on the host, LSF compares the load levels against the host suspending
conditions and the queue suspending conditions. If any suspending condition at
either the corresponding host or queue level is satisfied as a result of increased load,
the job is suspended. A job is only suspended if the load levels are too high for that
particular job’s suspending thresholds.
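The comparison described above can be sketched in Python. This is only an illustrative model, not LSF code: the function names, the dictionary layout, and the sample threshold values are assumptions made for the example. It also assumes that it, tmp, swp, and mem are indices whose values fall as the host gets busier, so their thresholds are crossed from above.

```python
# Illustrative model only; names and sample values are assumptions, not LSF internals.

# Load indices assumed to decrease as the host gets busier.
DECREASING = {"it", "tmp", "swp", "mem"}

def crossed(index, value, threshold):
    """Return True when value is past the suspending threshold for index."""
    if index in DECREASING:
        return value < threshold
    return value > threshold

def should_suspend(load, host_stop, queue_stop):
    """Suspend if any host-level or queue-level suspending threshold is crossed."""
    for thresholds in (host_stop, queue_stop):
        for index, limit in thresholds.items():
            if index in load and crossed(index, load[index], limit):
                return True
    return False

load = {"r1m": 4.8, "pg": 12.0}   # current load on the host (sample values)
host_stop = {"r1m": 4.5}          # hypothetical per-host suspending threshold
queue_stop = {"pg": 50.0}         # hypothetical per-queue suspending threshold
print(should_suspend(load, host_stop, queue_stop))  # r1m exceeds the host threshold
```

Checking both levels in one loop reflects the rule that a single satisfied condition at either level is enough to suspend the job.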
There is a time delay between when LSF suspends a job and when the changes to
host load are seen by the LIM. To allow time for load changes to take effect, LSF
suspends no more than one job at a time on each host.
Jobs from the lowest priority queue are checked first. If two jobs are running on a
host and the host is too busy, the lower priority job is suspended and the higher
priority job is allowed to continue. If the load levels are still too high at the next
check, the higher priority job is also suspended.
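The one-job-per-check rule and the priority ordering can be modeled with a short Python sketch. The job dictionaries, the suspend_one helper, and the always_busy predicate are hypothetical and exist only for this example. The sketch assumes that a larger queue priority number means a higher priority and uses SSUSP as the label for a job suspended by the system.

```python
# Illustrative model only; the data layout and helper names are assumptions.

def suspend_one(jobs, is_over_threshold):
    """Suspend at most one running job per check, lowest queue priority first."""
    for job in sorted(jobs, key=lambda j: j["queue_priority"]):
        if job["state"] == "RUN" and is_over_threshold(job):
            job["state"] = "SSUSP"
            return job["id"]
    return None  # nothing to suspend in this check

jobs = [
    {"id": 101, "queue_priority": 40, "state": "RUN"},
    {"id": 102, "queue_priority": 20, "state": "RUN"},
]

def always_busy(job):
    """Pretend the host stays over every job's suspending thresholds."""
    return True

first = suspend_one(jobs, always_busy)    # first check: lower priority job 102
second = suspend_one(jobs, always_busy)   # next check: higher priority job 101
print(first, second)
```

Returning after the first suspension leaves time for the load change to become visible before another job is considered.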
If a job is suspended because of its own load, the load drops as soon as the job is
suspended. When the load goes back within the thresholds, the job is resumed until
it causes itself to be suspended again.
Exceptions
In some special cases, LSF does not automatically suspend jobs because of load
levels. LSF does not suspend a job:
◆ If it was forced to run with brun -f.
◆ If it is the only job running on a host, unless the host is being used interactively.
When only one job is running on a host, it is not suspended for any reason
except that the host is not interactively idle (the it interactive idle time load
index is less than one minute). This means that once a job is started on a host,
at least one job continues to run unless there is an interactive user on the host.
Once the job is suspended, it is not resumed until all the scheduling conditions
are met, so it should not interfere with the interactive user.
◆ Because of the paging rate, unless the host is being used interactively. When a
host has interactive users, LSF suspends jobs with high paging rates, to improve
the response time on the host for interactive users. When a host is idle, the pg
(paging rate) load index is ignored. The PG_SUSP_IT parameter in
lsb.params controls this behaviour. If the host has been idle for more than
PG_SUSP_IT minutes, the pg load index is not checked against the suspending
threshold.
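The three exceptions can be summarized as a single check, sketched here in Python. This is one reading of the rules above, not LSF's implementation: the function name and its parameters are hypothetical, idle time is expressed in minutes as in the text, and the single-job rule is interpreted as allowing suspension only while the it index is below one minute.

```python
# Illustrative model only; names and parameters are assumptions, not LSF internals.

def may_suspend_for_load(forced, jobs_on_host, index, idle_minutes, pg_susp_it):
    """Return True if a crossed threshold on index may suspend the job."""
    if forced:
        # Jobs started with brun -f are not suspended because of load.
        return False
    if index == "pg" and idle_minutes > pg_susp_it:
        # The host has been idle long enough, so the paging rate is ignored.
        return False
    if jobs_on_host == 1 and idle_minutes >= 1:
        # The only job on an interactively idle host keeps running.
        return False
    return True

# Two jobs, an interactive user is active, paging threshold crossed.
print(may_suspend_for_load(False, 2, "pg", 0.5, 3))
# Only job on a host that has been idle for five minutes.
print(may_suspend_for_load(False, 1, "r1m", 5, 3))
```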
Suspending Conditions
LSF provides several ways to configure suspending conditions. At the host level,
suspending conditions are configured as load thresholds. At the queue level, they
are configured as load thresholds, with the STOP_COND parameter in the
lsb.queues file, or both.