LSF Version 7.3 - Administering Platform LSF
Chapter 35
Load Thresholds
Contents
◆ Automatic Job Suspension
◆ Suspending Conditions
Automatic Job Suspension
Jobs running under LSF can be suspended based on the load conditions on the
execution hosts. Each host and each queue can be configured with a set of
suspending conditions. If the load conditions on an execution host exceed either
the corresponding host or queue suspending conditions, one or more jobs running
on that host are suspended to reduce the load.
When LSF suspends a job, it invokes the SUSPEND action. The default SUSPEND
action is to send the signal SIGSTOP.
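Because the default SUSPEND action is an ordinary POSIX signal, its effect on a process can be shown outside LSF. The minimal Python sketch below assumes a POSIX host with a `sleep` executable and does not involve LSF at all. It stops a child process with SIGSTOP and then continues it with SIGCONT, which is the standard POSIX way to resume a stopped process. `demo_suspend_resume` is a hypothetical helper for illustration only.

```python
import os
import signal
import subprocess


def demo_suspend_resume():
    """Stop a child process with SIGSTOP, then continue it with SIGCONT.

    Hypothetical illustration: assumes a POSIX system with a `sleep`
    executable; LSF itself is not used here.
    """
    child = subprocess.Popen(["sleep", "30"])
    try:
        # Same signal as the default SUSPEND action: the process stops running.
        os.kill(child.pid, signal.SIGSTOP)
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        stopped = os.WIFSTOPPED(status) and os.WSTOPSIG(status) == signal.SIGSTOP

        # SIGCONT is the POSIX signal that lets a stopped process run again.
        os.kill(child.pid, signal.SIGCONT)
        _, status = os.waitpid(child.pid, os.WCONTINUED)
        continued = os.WIFCONTINUED(status)
        return stopped, continued
    finally:
        # Clean up the demo process whether or not the checks succeeded.
        child.kill()
        child.wait()
```

A stopped process keeps its memory and open files; it simply receives no CPU time until it is continued, which is why suspension reduces the run queue length on the host.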
By default, jobs are resumed when load levels fall below the suspending conditions.
Alternatively, each host and queue can be configured so that suspended
checkpointable or rerunnable jobs are automatically migrated to another host
instead of waiting to be resumed on the original host.
If no suspending threshold is configured for a load index, LSF does not check the
value of that load index when deciding whether to suspend jobs.
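The suspend decision described above can be sketched as a single check: suspend if any configured host threshold or any configured queue threshold is exceeded, and skip every index that has no threshold. This is a minimal Python sketch, not LSF's implementation. The mapping layout and the name `should_suspend` are hypothetical, and the sketch assumes every load index behaves like r1m, where a higher value means a busier host.

```python
from typing import Mapping, Optional

# Hypothetical representation: load index name -> suspending threshold.
# A missing key or a value of None means no threshold is configured.
Thresholds = Mapping[str, Optional[float]]


def should_suspend(load: Mapping[str, float],
                   host_stop: Thresholds,
                   queue_stop: Thresholds) -> bool:
    """Return True if the host load exceeds any configured host or queue
    suspending threshold.

    Simplification: every index is treated like r1m (higher means busier).
    Load values for indices without a configured threshold are never read.
    """
    for conditions in (host_stop, queue_stop):
        for index, threshold in conditions.items():
            if threshold is None:
                continue  # no threshold configured: do not check this index
            if load[index] > threshold:
                return True
    return False
```

For instance, with r1m at 2.25 and a queue r1m suspending threshold of 1.75, `should_suspend({"r1m": 2.25}, {}, {"r1m": 1.75})` returns `True`, while a host with no thresholds configured is never suspended, however high its load.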
Suspending thresholds can also be used to enforce inter-queue priorities. For
example, suppose you configure a low-priority queue with an r1m (1-minute CPU run
queue length) scheduling threshold of 0.25 and an r1m suspending threshold of 1.75.
This queue starts one job when the machine is idle. If the job is CPU-intensive, it
increases the run queue length from 0.25 to roughly 1.25. A high-priority queue
configured with a scheduling threshold of 1.5 and an unlimited suspending threshold
can still send a second job to the same host, because 1.25 is below 1.5; this
increases the run queue length to 2.25. That exceeds the suspending threshold of the
low-priority queue, so the low-priority job is stopped. The run queue length then
stays above 0.25, the low-priority queue's scheduling threshold, until the
high-priority job exits. After the high-priority job exits, the run queue length
drops back to the idle level, so the low-priority job is resumed.
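The walk-through above can be replayed step by step. The Python sketch below is a toy model under stated assumptions, not LSF's scheduler: each CPU-intensive running job is assumed to add exactly 1.0 to r1m (the text says "roughly"), a load equal to a scheduling threshold is treated as within it so that the idle level of 0.25 qualifies, and a suspended job is resumed only once r1m is back within its queue's scheduling threshold, which is what the example's sequence of events implies. `Queue`, `r1m`, and `simulate` are hypothetical names.

```python
from dataclasses import dataclass

IDLE_R1M = 0.25  # idle run queue length used in the example
JOB_LOAD = 1.0   # assumption: each CPU-intensive running job adds 1.0 to r1m


@dataclass
class Queue:
    name: str
    sched: float  # r1m scheduling threshold
    stop: float   # r1m suspending threshold (float("inf") means unlimited)


def r1m(running):
    """Toy r1m model: idle load plus a fixed amount per running job."""
    return IDLE_R1M + JOB_LOAD * len(running)


def simulate():
    low = Queue("low", sched=0.25, stop=1.75)
    high = Queue("high", sched=1.5, stop=float("inf"))
    running, suspended, log = [], [], []

    def record(event):
        log.append((event, r1m(running)))

    def try_resume():
        # Assumption: resume once r1m is within the queue's scheduling threshold.
        for q in list(suspended):
            if r1m(running) <= q.sched:
                suspended.remove(q)
                running.append(q)
                record(f"resume {q.name}")

    # Dispatch: each queue starts a job only if r1m is within its scheduling threshold.
    for q in (low, high):
        if r1m(running) <= q.sched:
            running.append(q)
            record(f"start {q.name}")

    # Suspend any job whose queue's suspending threshold is now exceeded.
    for q in list(running):
        if r1m(running) > q.stop:
            running.remove(q)
            suspended.append(q)
            record(f"suspend {q.name}")

    try_resume()  # high-priority job still running: r1m is 1.25, nothing resumes
    running.remove(high)
    record("exit high")
    try_resume()  # back at the idle level: the low-priority job resumes
    return log


for event, load in simulate():
    print(f"{event:<14} r1m={load}")
```

Each log entry pairs an event with r1m just after it: the low-priority job starts (1.25), the high-priority job starts (2.25), the low-priority job is suspended (1.25), the high-priority job exits (0.25), and the low-priority job resumes (1.25). Keeping the low-priority queue's scheduling threshold well below its suspending threshold is what stops the suspended job from being resumed while the high-priority job is still running.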