LSF Version 7.3 - Administering Platform LSF

At the queue level, suspending conditions are defined by STOP_COND, as described in
lsb.queues, or by the suspending load thresholds. At the host level, suspending
conditions are defined by the stop load thresholds, as described in lsb.hosts.
Resuming conditions
These conditions determine when a suspended job can be resumed. When these
conditions are met, a RESUME action is performed on a suspended job.
At the queue level, resume conditions are defined by RESUME_COND in
lsb.queues, or by the loadSched thresholds for the queue if RESUME_COND is
not defined.
Types of load indices
To reduce interference between jobs effectively, choose appropriate load indices and
set their thresholds carefully. Below are examples of a few frequently used load indices.
Paging rate (pg)
The paging rate (pg) load index relates strongly to the perceived interactive
performance. If a host is paging applications to disk, the user interface feels very
slow.
The paging rate is also a reflection of a shortage of physical memory. When an
application is being paged in and out frequently, the system is spending a lot of time
performing overhead, resulting in reduced performance.
The paging rate load index can be used as a threshold to either stop sending more
jobs to the host, or to suspend an already running batch job to give priority to
interactive users.
This load index can be used in different configuration files to achieve different
purposes. By defining a paging rate threshold in lsf.cluster.cluster_name, the
host becomes busy from LIM's point of view; therefore, LIM will not advise running
any more jobs on this host.
By including paging rate in queue or host scheduling conditions, jobs can be
prevented from starting on machines with a heavy paging rate, or can be suspended
or even killed if they are interfering with the interactive user on the console.
A job suspended due to the pg threshold will not be resumed, even if the resume
conditions are met, unless the machine is interactively idle for more than
PG_SUSP_IT seconds.
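
The extra idle-time requirement for jobs suspended by the pg threshold can be
sketched as a small decision function. This is only an illustrative model of the
rule stated above, not LSF code: the function name may_resume, its arguments, and
the example value of 180 seconds for PG_SUSP_IT are assumptions made for the sketch.

```python
def may_resume(resume_cond_met: bool,
               suspended_by_pg: bool,
               idle_seconds: float,
               pg_susp_it: float = 180.0) -> bool:
    """Hypothetical model of the resume rule for a suspended job.

    resume_cond_met -- True if the queue's resume conditions are satisfied
    suspended_by_pg -- True if the job was suspended because of the pg threshold
    idle_seconds    -- how long the host has been interactively idle, in seconds
    pg_susp_it      -- stand-in for PG_SUSP_IT (180 is an example value only)
    """
    # The resume conditions must be met in every case.
    if not resume_cond_met:
        return False
    # A job suspended by the pg threshold also needs the host to have been
    # interactively idle for more than PG_SUSP_IT seconds.
    if suspended_by_pg:
        return idle_seconds > pg_susp_it
    return True
```

With these assumptions, a job suspended by the pg threshold on a host that has
been idle for only 60 seconds stays suspended even though its resume conditions
are met, while a job suspended for another reason resumes as soon as they are met.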
Interactive idle time (it)
Strict control can be achieved using the idle time (it) index. This index measures
the number of minutes since any interactive terminal activity. Interactive terminals
include hard wired ttys, rlogin and lslogin sessions, and X shell windows such as
xterm. On some hosts, LIM also detects mouse and keyboard activity.
This index is typically used to prevent batch jobs from interfering with interactive
activities. By defining the suspending condition in the queue as it<1 && pg>50, a
job from this queue will be suspended if the machine is not interactively idle and
the paging rate is higher than 50 pages per second. Furthermore, by defining the
resuming condition as it>5 && pg<10 in the queue, a suspended job from the
queue will not resume unless the machine has been interactively idle for more than
five minutes and the paging rate is less than ten pages per second.
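
The gap between the suspending and resuming conditions keeps a job from flipping
between states when the load hovers near a single threshold. The following sketch
models that behavior for the two example conditions above. It is a simplified
illustration, not how LSF is implemented: the Load class, the state names RUN and
SUSP, and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Load:
    it: float  # interactive idle time, in minutes
    pg: float  # paging rate, in pages per second


def should_suspend(load: Load) -> bool:
    # Mirrors the example suspending condition: it<1 && pg>50
    return load.it < 1 and load.pg > 50


def should_resume(load: Load) -> bool:
    # Mirrors the example resuming condition: it>5 && pg<10
    return load.it > 5 and load.pg < 10


def next_state(state: str, load: Load) -> str:
    """Return the job state ("RUN" or "SUSP") after one load sample."""
    if state == "RUN" and should_suspend(load):
        return "SUSP"
    if state == "SUSP" and should_resume(load):
        return "RUN"
    # Between the two conditions the job keeps its current state.
    return state
```

In this model, a running job on a host with it=0.5 and pg=80 is suspended. It then
stays suspended at it=3 and pg=5, because the host has not yet been idle for more
than five minutes, and resumes only once a sample such as it=6 and pg=5 arrives.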