Platform LSF Administration Guide Version 6.2

Chapter 35
Interactive Jobs with bsub
Performance Tuning for Interactive Batch Jobs
LSF is often used on systems that support both interactive and batch users. On one
hand, users are often concerned that load sharing will overload their workstations and
slow down their interactive tasks. On the other hand, some users want to dedicate some
machines for critical batch jobs so that they have guaranteed resources. Even if your
entire workload consists of batch jobs, you still want to reduce resource contention and
operating system overhead to maximize the use of your resources.
Numerous parameters can be used to control your resource allocation and to avoid
undesirable contention.
Types of load conditions
Because interference between jobs usually shows up in the load indices, LSF responds to
load changes to avoid or reduce contention. LSF can act on jobs to reduce interference
before or after the jobs are started. These actions are triggered by different load
conditions. Most of the conditions can be configured at both the queue level and at the
host level. Conditions defined at the queue level apply to all hosts used by the queue,
while conditions defined at the host level apply to all queues using the host.
Scheduling conditions
These conditions, if met, trigger the start of more jobs. The scheduling conditions are
defined in terms of load thresholds or resource requirements.
At the queue level, scheduling conditions are configured as either resource requirements
or scheduling load thresholds, as described in lsb.queues. At the host level, the
scheduling conditions are defined as scheduling load thresholds, as described in
lsb.hosts.
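The way queue-level and host-level scheduling thresholds combine can be sketched in a few lines of Python. This is an illustrative model only, not LSF code: the function names below_thresholds and can_dispatch are hypothetical, the threshold values are invented, and the sketch assumes load indices for which a higher value means a busier host (such as pg). It also assumes that a host is eligible only when it satisfies both the queue's and the host's thresholds, and that a value equal to the threshold still qualifies.

```python
from typing import Mapping


def below_thresholds(load: Mapping[str, float],
                     thresholds: Mapping[str, float]) -> bool:
    """Return True if every thresholded index is at or below its limit.

    Assumption: all indices here are 'higher means busier' (for example pg).
    Indices without a configured threshold are ignored.
    """
    return all(load[name] <= limit for name, limit in thresholds.items())


def can_dispatch(load: Mapping[str, float],
                 queue_sched: Mapping[str, float],
                 host_sched: Mapping[str, float]) -> bool:
    """Hypothetical check: the host may receive more jobs from the queue only
    if its current load meets both the queue-level and the host-level
    scheduling thresholds."""
    return (below_thresholds(load, queue_sched)
            and below_thresholds(load, host_sched))
```

For example, with a current load of {"pg": 8.0, "r1m": 0.5}, a queue threshold of {"pg": 10.0} and a host threshold of {"r1m": 1.0} allow dispatch, while tightening either threshold below the current load blocks it.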
Suspending conditions
These conditions affect running jobs. When these conditions are met, a SUSPEND
action is performed on a running job.
At the queue level, suspending conditions are defined by STOP_COND, as described in
lsb.queues, or as suspending load thresholds. At the host level, suspending conditions
are defined as stop load thresholds, as described in lsb.hosts.
Resuming conditions
These conditions determine when a suspended job can be resumed. When these
conditions are met, a RESUME action is performed on a suspended job.
At the queue level, resume conditions are defined by RESUME_COND in
lsb.queues, or by the loadSched thresholds for the queue if RESUME_COND is
not defined.
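The interplay between suspending and resuming conditions can be illustrated with a small Python model. It is a sketch under stated assumptions, not LSF's implementation: the Threshold class, the next_state function, and the state labels "RUN" and "SUSPENDED" are hypothetical, the single index is assumed to be one where a higher value means more load (such as pg), and RESUME_COND is assumed to be undefined, so the scheduling threshold doubles as the resume threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Hypothetical pair of thresholds for one 'higher is busier' index."""
    load_sched: float  # scheduling threshold, reused here as the resume threshold
    load_stop: float   # suspending (stop) load threshold


def next_state(state: str, value: float, t: Threshold) -> str:
    """Return the job state after observing the current index value.

    A running job is suspended once the load exceeds the stop threshold.
    A suspended job is resumed only after the load falls below the
    scheduling threshold. Between the two thresholds the state is unchanged.
    """
    if state == "RUN" and value > t.load_stop:
        return "SUSPENDED"
    if state == "SUSPENDED" and value < t.load_sched:
        return "RUN"
    return state
```

Keeping the resume threshold lower than the suspend threshold leaves a gap in which nothing changes, so a job does not flip between running and suspended when the load hovers near a single value.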
Types of load indices
To reduce interference between jobs effectively, choose the load indices that reflect the
contention you want to control. Below are examples of a few frequently used load indices.
Paging rate (pg)
The paging rate (pg) load index relates strongly to perceived interactive
performance. If a host is paging applications to disk, the user interface feels very slow.
The paging rate also reflects a shortage of physical memory. When an
application is paged in and out frequently, the system spends much of its time
on paging overhead, which reduces performance.