LSF Version 7.3 - Administering Platform LSF

Suspending Conditions
Configure load thresholds consistently across queues. If a low priority queue
has higher suspension thresholds than a high priority queue, then jobs in the
higher priority queue are suspended before jobs in the low priority queue.
Configuring load thresholds at host level
A shared resource cannot be used as a load threshold in the Hosts section of the
lsf.cluster.cluster_name file.
Configuring suspending conditions at queue level
The condition for suspending a job can be specified using the queue-level
STOP_COND parameter. It is defined by a resource requirement string. Only the
select section of the resource requirement string is considered when stopping a
job. All other sections are ignored.
This parameter provides functionality similar to loadStop, but is more flexible.
If loadStop thresholds have been specified, then a job is suspended if either the
STOP_COND is TRUE or the loadStop thresholds are exceeded.
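The rule above can be sketched in Python as a simple OR of the two checks. This
is an illustrative model only, not LSF code: the function names and the
DECREASING set are hypothetical, and the sketch assumes that it, tmp, swp, and
mem fall as a host gets busier while the other indices rise.

```python
# Hypothetical model of the suspend decision; not part of LSF.
# Assumption: these indices decrease as the host becomes more loaded.
DECREASING = {"it", "tmp", "swp", "mem"}


def threshold_exceeded(index, value, threshold):
    """Return True if the load value has crossed its loadStop threshold."""
    if index in DECREASING:
        return value < threshold
    return value > threshold


def should_suspend(stop_cond_true, load, load_stop):
    """Suspend if STOP_COND is TRUE or any configured loadStop threshold is exceeded."""
    over = any(
        threshold_exceeded(index, load[index], limit)
        for index, limit in load_stop.items()
        if index in load
    )
    return stop_cond_true or over
```

With no loadStop thresholds configured, load_stop is empty and the decision
rests on STOP_COND alone.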
Example
This queue suspends a job based on the idle time for desktop machines and based
on availability of swap and memory on compute servers. Assume cs is a Boolean
resource defined in the lsf.shared file and configured in the
lsf.cluster.cluster_name file to indicate that a host is a compute server:
Begin Queue
.
STOP_COND= select[((!cs && it < 5) || (cs && mem < 15 && swap < 50))]
.
End Queue
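To make the logic of this select string easier to follow, the same Boolean
expression can be written as a small Python predicate. This is a sketch for
reasoning about the condition, not something LSF runs; the function name is
hypothetical and the arguments stand for the cs, it, mem, and swap values that
LSF would read for the host.

```python
# Hypothetical helper mirroring the example STOP_COND expression; not part of LSF.
def stop_cond(cs, it, mem, swap):
    """Evaluate (!cs && it < 5) || (cs && mem < 15 && swap < 50)."""
    desktop_in_use = (not cs) and it < 5
    server_short_of_memory = cs and mem < 15 and swap < 50
    return bool(desktop_in_use or server_short_of_memory)


# A desktop that was idle for only 2 time units triggers suspension.
print(stop_cond(cs=False, it=2, mem=500, swap=500))
# A compute server low on memory but with ample swap does not.
print(stop_cond(cs=True, it=0, mem=10, swap=200))
```

On a compute server both the memory and the swap test must hold, so a host that
is short of memory but still has swap available keeps its jobs running.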
Viewing host-level and queue-level suspending conditions
The suspending conditions are displayed by the bhosts -l and bqueues -l
commands.
Viewing job-level suspending conditions
The thresholds that apply to a particular job are the more restrictive of the host and
queue thresholds, and are displayed by the bjobs -l command.
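One way to read "more restrictive" is that, for each load index, the threshold
that would trigger suspension sooner wins. The Python sketch below illustrates
that reading under stated assumptions: it is not LSF code, the function name and
the DECREASING set are hypothetical, and it assumes that it, tmp, swp, and mem
fall as load rises, so the higher threshold is the stricter one for them.

```python
# Hypothetical model of combining host and queue thresholds; not part of LSF.
# Assumption: these indices decrease as the host becomes more loaded.
DECREASING = {"it", "tmp", "swp", "mem"}


def effective_thresholds(host, queue):
    """Pick, per load index, the threshold that suspends a job sooner."""
    merged = {}
    for index in host.keys() | queue.keys():
        values = [limits[index] for limits in (host, queue) if index in limits]
        # Decreasing index: the larger limit is reached first.
        # Increasing index: the smaller limit is reached first.
        merged[index] = max(values) if index in DECREASING else min(values)
    return merged
```

An index configured at only one level keeps that level's threshold unchanged.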
Viewing suspend reason
The bjobs -lp command shows the load threshold that caused LSF to suspend a
job, together with the scheduling parameters.
The use of STOP_COND affects the suspending reasons as displayed by the bjobs
command. If STOP_COND is specified in the queue and the loadStop thresholds
are not specified, the suspending reasons for each individual load index are not
displayed.
Resuming suspended jobs
Jobs are suspended to prevent overloading hosts, to prevent batch jobs from
interfering with interactive use, or to allow a more urgent job to run. When the host
is no longer overloaded, suspended jobs should continue running.