Platform LSF Administration Guide Version 6.2

Suspending Conditions
LSF provides several ways to configure suspending conditions. At the host level,
suspending conditions are configured as load thresholds. At the queue level, they are
configured as load thresholds, with the STOP_COND parameter in the lsb.queues file,
or both.
The load indices most commonly used for suspending conditions are the CPU run
queue lengths (r15s, r1m, and r15m), paging rate (pg), and idle time (it). The swp
and tmp indices are also considered for suspending jobs.
To give priority to interactive users, set the suspending threshold on the it (idle time)
load index to a non-zero value. Jobs are stopped when any user is active, and resumed
when the host has been idle for the time given in the it scheduling condition.
To tune the suspending threshold for paging rate, it helps to know the behaviour of
your application. On an otherwise idle machine, check the paging rate using lsload,
and then start your application. Watch the paging rate as the application runs.
Subtracting the idle paging rate from the active paging rate gives the paging rate of
your application. The suspending threshold should allow at least 1.5 times that amount.
A job can be scheduled at any paging rate up to the scheduling threshold, so the
suspending threshold should be at least the scheduling threshold plus 1.5 times the
application paging rate. This prevents the system from scheduling a job and then
immediately suspending it because of its own paging.
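The guideline above can be worked through as a small calculation. The following is a
minimal sketch in Python and is not part of LSF: the function name and the sample
readings are hypothetical, and the paging rates are assumed to have been read by hand
from lsload output.

```python
def suspend_threshold(sched_threshold, idle_rate, active_rate, factor=1.5):
    """Return the lowest suspending threshold suggested by the guideline.

    sched_threshold: scheduling threshold configured for the load index
    idle_rate:       index value on the otherwise idle host
    active_rate:     index value while the application is running
    factor:          safety multiplier; 1.5 is the minimum the guideline gives
    """
    # Load contributed by the application itself.
    app_rate = active_rate - idle_rate
    if app_rate < 0:
        raise ValueError("active rate is lower than idle rate; measure again")
    return sched_threshold + factor * app_rate


# Hypothetical readings: idle pg = 2, pg with the application running = 12,
# scheduling threshold = 20.
print(suspend_threshold(20, 2, 12))  # 35.0
```

With these sample numbers the application pages at a rate of 10, so any suspending
threshold below 35 risks suspending a job that was legitimately scheduled.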
The effective CPU run queue length condition should be configured in the same way as
the paging rate condition.
For CPU-intensive sequential jobs, the effective run queue length indices increase by
approximately one for each job. For jobs that use more than one process, you should
make some test runs to determine your job’s effect on the run queue length indices.
Again, the suspending threshold should be equal to at least the scheduling threshold plus
1.5 times the load for one job.
Configuring load thresholds at queue level
The queue definition (lsb.queues) can contain thresholds for zero or more of the load
indices. Any load index that does not have a configured threshold has no effect on job
scheduling.
Syntax
Each load index is configured on a separate line with the format:
load_index = loadSched/loadStop
Specify the name of the load index, for example r1m for the 1-minute CPU run queue
length or pg for the paging rate. loadSched is the scheduling threshold for this load
index. loadStop is the suspending threshold. The loadSched condition must be
satisfied by a host before a job is dispatched to it and also before a job suspended on a
host can be resumed. If the loadStop condition is satisfied, a job is suspended.
The loadSched and loadStop thresholds permit the specification of conditions
using simple AND/OR logic. For example, the specification:
MEM=100/10
SWAP=200/30
translates into a loadSched condition of mem >= 100 && swap >= 200 and a
loadStop condition of mem < 10 || swap < 30.
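The AND/OR translation can be illustrated with a short evaluator. This is a minimal
sketch in Python, not LSF code: the function names and the AVAILABILITY_INDICES set
are hypothetical. It assumes that mem and swap measure available resource (so larger
is better, as in the example above), and that every other index, such as r1m or pg,
measures load, so its comparisons run the other way.

```python
# Indices where a larger value means more free resource (assumption for this
# sketch, taken from the mem/swap example): the host qualifies when the value
# is at or above loadSched, and a job is suspended when it drops below loadStop.
AVAILABILITY_INDICES = {"mem", "swap"}


def parse_thresholds(lines):
    """Parse 'index=loadSched/loadStop' lines into {index: (sched, stop)}."""
    thresholds = {}
    for line in lines:
        name, _, values = line.partition("=")
        sched, _, stop = values.partition("/")
        thresholds[name.strip().lower()] = (float(sched), float(stop))
    return thresholds


def can_schedule(thresholds, load):
    """Every configured loadSched condition must hold (logical AND)."""
    for name, (sched, _stop) in thresholds.items():
        value = load[name]
        ok = value >= sched if name in AVAILABILITY_INDICES else value <= sched
        if not ok:
            return False
    return True


def should_suspend(thresholds, load):
    """Any single configured loadStop condition is enough (logical OR)."""
    for name, (_sched, stop) in thresholds.items():
        value = load[name]
        hit = value < stop if name in AVAILABILITY_INDICES else value > stop
        if hit:
            return True
    return False


queue = parse_thresholds(["MEM=100/10", "SWAP=200/30"])
print(can_schedule(queue, {"mem": 150, "swap": 250}))   # True
print(can_schedule(queue, {"mem": 150, "swap": 100}))   # False
print(should_suspend(queue, {"mem": 50, "swap": 20}))   # True
print(should_suspend(queue, {"mem": 50, "swap": 100}))  # False
```

Only indices that appear in the parsed thresholds are examined, which mirrors the rule
that an index without a configured threshold has no effect.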