LSF Version 7.3 - Administering Platform LSF
Load Thresholds
The load indices most commonly used for suspending conditions are the CPU run
queue lengths (r15s, r1m, and r15m), paging rate (pg), and idle time (it). The swp
(available swap space) and tmp (available temporary disk space) indices are also
considered for suspending jobs.
To give priority to interactive users, set the suspending threshold on the it (idle
time) load index to a non-zero value. Jobs are then suspended as soon as any user is
active on the host, and resumed once the host has been idle for the time given in the
it scheduling condition.
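The suspend/resume behaviour of the it index can be sketched as a small state machine. This is a minimal Python sketch, not LSF code: the function name simulate_it and the thresholds (a loadSched of 5 minutes and a loadStop of 1 minute) are hypothetical. It also ignores every other load index that LSF would check before resuming a job.

```python
def simulate_it(samples, sched=5, stop=1):
    """Return the job state after each idle-time sample.

    samples: idle time readings in minutes (hypothetical values).
    sched:   assumed it loadSched threshold; resuming needs idle >= sched.
    stop:    assumed it loadStop threshold; suspend when idle < stop.
    Other load indices are deliberately ignored in this sketch.
    """
    running = True
    states = []
    for idle in samples:
        if running and idle < stop:
            running = False  # a user became active: suspend the job
        elif not running and idle >= sched:
            running = True   # host idle long enough: resume the job
        states.append("RUN" if running else "SUSP")
    return states


print(simulate_it([10, 0, 2, 4, 6]))
```

The job stays suspended at idle times of 2 and 4 minutes because resuming requires the higher scheduling threshold. The gap between the two thresholds keeps the job from flapping between states.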
To tune the suspending threshold for paging rate, it helps to know the behaviour of
your application. On an otherwise idle machine, check the paging rate using lsload,
then start your application and watch the paging rate as it runs. Subtracting the idle
paging rate from the active paging rate gives the paging rate of your application.
The suspending threshold should allow at least 1.5 times that amount. Because a job
can be scheduled at any paging rate up to the scheduling threshold, the suspending
threshold should be at least the scheduling threshold plus 1.5 times the application
paging rate. This prevents the system from scheduling a job and then immediately
suspending it because of its own paging.
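The rule above is simple arithmetic. Here is a minimal sketch with a hypothetical helper name and made-up measurements: an idle paging rate of 2.0, an active paging rate of 14.0, and a pg scheduling threshold of 20.

```python
def paging_stop(sched_threshold, idle_pg, active_pg, factor=1.5):
    """Lowest recommended pg suspending threshold.

    idle_pg:   paging rate on the otherwise idle machine (from lsload)
    active_pg: paging rate while the application runs
    factor:    safety margin; the text recommends at least 1.5
    """
    app_pg = active_pg - idle_pg  # paging caused by the application itself
    if app_pg < 0:
        raise ValueError("active paging rate is below the idle paging rate")
    return sched_threshold + factor * app_pg


print(paging_stop(20, idle_pg=2.0, active_pg=14.0))  # 20 + 1.5 * 12.0 = 38.0
```

With these numbers, a line such as pg = 20/38 (or one with a higher loadStop) would leave room for the job's own paging.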
The effective CPU run queue length condition should be configured like the paging
rate. For CPU-intensive sequential jobs, the effective run queue length indices
increase by approximately one for each job. For jobs that use more than one process,
make some test runs to determine your job's effect on the run queue length indices.
Again, the suspending threshold should be at least the scheduling threshold plus 1.5
times the load for one job.
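For a multi-process job, the test runs give a per-job load figure to plug into the same rule. This sketch averages the increase in an effective run queue index (such as r1m) observed over several test runs of one job. The helper name, the sample increases, and the scheduling threshold of 0.7 are all hypothetical values chosen for illustration.

```python
from statistics import mean


def runqueue_stop(sched_threshold, per_job_increases, factor=1.5):
    """Lowest recommended suspending threshold for a run queue index.

    per_job_increases: rise in the effective run queue length (for
    example r1m as shown by lsload -E) measured in separate test runs
    of one job. For a CPU-intensive sequential job each value is close to 1.
    """
    if not per_job_increases:
        raise ValueError("need at least one test run")
    load_per_job = mean(per_job_increases)
    return sched_threshold + factor * load_per_job


# Hypothetical test runs of a job that keeps about two processes busy.
print(runqueue_stop(0.7, [1.9, 2.1, 2.0]))  # approximately 3.7
```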
Configuring load thresholds at queue level
The queue definition (lsb.queues) can contain thresholds for zero or more of the load
indices. Any load index that does not have a configured threshold has no effect on
job scheduling.
Syntax
Each load index is configured on a separate line with the format:
load_index = loadSched/loadStop
Specify the name of the load index, for example r1m for the 1-minute CPU run queue
length or pg for the paging rate. loadSched is the scheduling threshold for this load
index, and loadStop is the suspending threshold. A host must satisfy the loadSched
condition before a job is dispatched to it, and also before a job suspended on that
host can be resumed. If the loadStop condition is satisfied, the job is suspended.
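To make the format concrete, here is a minimal Python sketch that splits threshold lines into (load_index, loadSched, loadStop) values. It is not LSF's own parser: the function names are hypothetical, and it assumes both thresholds are always given as plain numbers.

```python
import re

# load_index = loadSched/loadStop, with optional spaces around "=" and "/"
THRESHOLD_RE = re.compile(r"^\s*(\w+)\s*=\s*([0-9.]+)\s*/\s*([0-9.]+)\s*$")


def parse_threshold(line):
    """Return (load_index, loadSched, loadStop) for one threshold line."""
    match = THRESHOLD_RE.match(line)
    if match is None:
        raise ValueError(f"not a load_index = loadSched/loadStop line: {line!r}")
    name, sched, stop = match.groups()
    return name, float(sched), float(stop)  # float() rejects inputs like "1.2.3"


def parse_thresholds(text):
    """Parse one threshold per non-blank line into {index: (sched, stop)}."""
    thresholds = {}
    for line in text.splitlines():
        if line.strip():
            name, sched, stop = parse_threshold(line)
            thresholds[name] = (sched, stop)
    return thresholds


print(parse_thresholds("r1m = 0.7/2.0\npg = 4.0/8.0\n"))
```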
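The loadSched and loadStop thresholds permit conditions to be specified using simple
AND/OR logic. For example, the specification:
MEM=100/10
SWAP=200/30
translates into a loadSched condition of mem>=100 && swap>=200 and a loadStop
condition of mem < 10 || swap < 30. In other words, a host must satisfy every
loadSched condition before a job is dispatched, while meeting any one loadStop
condition is enough to suspend a job.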
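The AND/OR semantics can be expressed in a few lines. In this sketch (hypothetical function names, not LSF code), mem and swp are treated as indices where a larger value means a less loaded host. The scheduling check therefore requires value >= loadSched and the suspending check fires when value < loadStop, matching the example. All other indices are assumed to use the opposite direction. Modelling the example's SWAP line with the lsload index name swp is also an assumption.

```python
# Assumption: for these indices a larger value means more free resource
# or more idle time; every other index is treated as "larger = busier".
HIGHER_IS_BETTER = {"it", "tmp", "swp", "mem"}


def passes_sched(name, value, sched):
    return value >= sched if name in HIGHER_IS_BETTER else value <= sched


def hits_stop(name, value, stop):
    return value < stop if name in HIGHER_IS_BETTER else value > stop


def can_dispatch(load, thresholds):
    """loadSched: every configured index must pass (AND)."""
    return all(passes_sched(n, load[n], s) for n, (s, _) in thresholds.items())


def should_suspend(load, thresholds):
    """loadStop: any single configured index crossing its threshold (OR)."""
    return any(hits_stop(n, load[n], t) for n, (_, t) in thresholds.items())


# MEM=100/10 and SWAP=200/30 from the example; unconfigured indices are ignored.
thresholds = {"mem": (100, 10), "swp": (200, 30)}
print(can_dispatch({"mem": 150, "swp": 250}, thresholds))   # True
print(should_suspend({"mem": 50, "swp": 25}, thresholds))   # True: swp < 30
```

A host whose mem value is 50 keeps an already running job, because no loadStop condition is met. It would not receive a new job, however, since 50 is below the loadSched value of 100.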
Theory
◆ The r15s, r1m, and r15m CPU run queue length conditions are compared to the
effective queue length as reported by lsload -E, which is normalised for
multiprocessor hosts. Thresholds for these parameters should therefore be set at
levels appropriate for single-processor hosts.