Platform LSF Administration Guide Version 6.2
Cluster Concepts
Administering Platform LSF
Resources
Resource usage
The LSF system uses built-in and configured resources to track resource availability and
usage. Jobs are scheduled according to the resources available on individual hosts.
LSF monitors the resources used by each job while the job is running. This information
is used to enforce resource limits and load thresholds, and for fairshare scheduling.
LSF collects information such as:
◆ Total CPU time consumed by all processes in the job
◆ Total resident memory usage in KB of all currently running processes in a job
◆ Total virtual memory usage in KB of all currently running processes in a job
◆ Currently active process group ID in a job
◆ Currently active processes in a job
On UNIX, job-level resource usage is collected through the Process Information Manager (PIM).
Commands
◆ lsinfo: View the resources available in your cluster
◆ bjobs -l: View current resource usage of a job
Configuration
◆ SBD_SLEEP_TIME in lsb.params: Configures how often resource usage information is
sampled by PIM, collected by sbatchd, and sent to mbatchd
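The per-job figures listed above are totals over the job's processes. The following Python sketch shows that aggregation under stated assumptions: `ProcessSample`, `JobUsage`, and `aggregate_job_usage` are hypothetical names, not part of PIM or any LSF interface. It also assumes that CPU time from processes that have already exited is carried forward separately, since the list counts CPU time for all processes in the job but memory only for currently running ones.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class ProcessSample:
    """One sample for a single running process (hypothetical record layout)."""
    pid: int
    pgid: int
    cpu_time_s: float  # CPU time consumed so far by this process, in seconds
    resident_kb: int   # resident memory, in KB
    virtual_kb: int    # virtual memory, in KB


@dataclass
class JobUsage:
    """Job-level totals corresponding to the items LSF collects."""
    cpu_time_s: float
    resident_kb: int
    virtual_kb: int
    pgids: set[int] = field(default_factory=set)
    pids: set[int] = field(default_factory=set)


def aggregate_job_usage(running: Iterable[ProcessSample],
                        exited_cpu_time_s: float = 0.0) -> JobUsage:
    """Sum samples of currently running processes into job-level totals.

    CPU time also includes exited_cpu_time_s, the CPU time of processes
    that have already finished (an assumption of this sketch); memory
    totals count currently running processes only.
    """
    running = list(running)
    return JobUsage(
        cpu_time_s=exited_cpu_time_s + sum(p.cpu_time_s for p in running),
        resident_kb=sum(p.resident_kb for p in running),
        virtual_kb=sum(p.virtual_kb for p in running),
        pgids={p.pgid for p in running},
        pids={p.pid for p in running},
    )
```

For example, two running processes in the same process group using 1000 KB and 500 KB of resident memory give a job total of 1500 KB and a single active process group ID.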
Load indices
Load indices measure the availability of dynamic, non-shared resources on hosts in the
cluster. Load indices built into the LIM are updated at fixed time intervals.
Commands
◆ lsload -l: View all load indices
◆ bhosts -l: View load levels on a host
External load indices
External load indices are defined and configured by the LSF administrator and collected
by an External Load Information Manager (ELIM) program. The ELIM updates the LIM
whenever it receives new values.
Commands
◆ lsinfo: View external load indices
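An ELIM is an ordinary program supplied by the administrator. The Python sketch below assumes the commonly documented ELIM reporting convention: each report is a single line on standard output giving the number of indices, followed by name/value pairs. Verify the exact format against the documentation for your LSF version. The index name `scratch`, the measured directory, the 15-second interval, and the function names are hypothetical, and any external index must also be defined in the cluster configuration before LSF will use it.

```python
import shutil
import sys
import time

REPORT_INTERVAL_S = 15   # hypothetical reporting interval, in seconds
SCRATCH_DIR = "/tmp"     # hypothetical directory whose free space we report


def format_elim_line(indices: dict[str, float]) -> str:
    """Format one report: the index count, then name/value pairs."""
    parts = [str(len(indices))]
    for name, value in indices.items():
        parts.append(name)
        parts.append(f"{value:g}")
    return " ".join(parts)


def sample_indices() -> dict[str, float]:
    """Measure the hypothetical 'scratch' index: free space in MB."""
    free_mb = shutil.disk_usage(SCRATCH_DIR).free / (1024 * 1024)
    return {"scratch": round(free_mb, 1)}


def main() -> None:
    while True:
        sys.stdout.write(format_elim_line(sample_indices()) + "\n")
        # Output is read through a pipe, so flush each report immediately
        # instead of letting it sit in the stdout buffer.
        sys.stdout.flush()
        time.sleep(REPORT_INTERVAL_S)


if __name__ == "__main__":
    main()
```

Keeping the formatting in its own function makes the report format easy to check without running the endless reporting loop.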
Static resources
Static resources are built-in resources that represent host information that does not
change over time, such as the maximum RAM available to user processes or the number of
processors in a machine. Most static resources are determined by the LIM at start-up time.
Static resources can be used to select appropriate hosts for particular jobs based on
binary architecture, relative CPU speed, and system configuration.
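As an illustration of selecting hosts by static resources, the following Python sketch filters a hypothetical host table by architecture, processor count, and memory, then prefers faster CPUs. The field names mirror LSF static resource names (`type`, `cpuf`, `ncpus`, `maxmem`), but the `HostStatic` record, the `select_hosts` function, the architecture strings, and the host values are assumptions for illustration; LSF itself expresses such choices through resource requirements, not this code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostStatic:
    """Static attributes of one host (hypothetical record layout)."""
    name: str
    type: str     # binary architecture
    cpuf: float   # relative CPU speed factor
    ncpus: int    # number of processors
    maxmem: int   # maximum RAM available to user processes, in MB (assumed unit)


def select_hosts(hosts: list[HostStatic], arch: str, min_ncpus: int = 1,
                 min_maxmem: int = 0) -> list[HostStatic]:
    """Return hosts of the given architecture that meet the minimums,
    ordered from fastest to slowest relative CPU speed."""
    matches = [
        h for h in hosts
        if h.type == arch and h.ncpus >= min_ncpus and h.maxmem >= min_maxmem
    ]
    return sorted(matches, key=lambda h: h.cpuf, reverse=True)
```

Because static values do not change while the host runs, a filter like this can be evaluated once per job rather than at every load sample.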
Load thresholds
Two types of load thresholds can be configured by your LSF administrator to schedule
jobs in queues. Each load threshold specifies a load index value:
◆ loadSched determines the load condition for dispatching pending jobs. If a host's load
is beyond any defined loadSched threshold, a job will not be started on the host. This
threshold is also used as the condition for resuming suspended jobs.
◆ loadStop determines when running jobs should be suspended.
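The loadSched and loadStop rules can be sketched as threshold checks over a host's load indices. This Python sketch assumes that, as with LSF's built-in indices, some indices (idle time `it` and available `tmp`, `swp`, and `mem`) indicate heavier load as their value falls, so "beyond" a threshold means below it for those indices and above it for the rest. The function names and the index subset are hypothetical; thresholds that are not defined are simply not checked.

```python
# Indices where a LOWER value means a busier host (assumed subset).
DECREASING = {"it", "tmp", "swp", "mem"}


def beyond(index: str, value: float, threshold: float) -> bool:
    """True if value is past threshold in the direction of heavier load."""
    if index in DECREASING:
        return value < threshold
    return value > threshold


def exceeded(load: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Names of indices whose current load is beyond a defined threshold."""
    return [i for i, t in thresholds.items() if i in load and beyond(i, load[i], t)]


def can_dispatch(load: dict[str, float], load_sched: dict[str, float]) -> bool:
    """A pending job may start only if no loadSched threshold is exceeded."""
    return not exceeded(load, load_sched)


def can_resume(load: dict[str, float], load_sched: dict[str, float]) -> bool:
    """Suspended jobs resume under the same loadSched condition."""
    return can_dispatch(load, load_sched)


def should_suspend(load: dict[str, float], load_stop: dict[str, float]) -> bool:
    """A running job is suspended if any loadStop threshold is exceeded."""
    return bool(exceeded(load, load_stop))
```

Keeping loadStop less strict than loadSched leaves a gap between the start and suspend conditions, which avoids repeatedly suspending and resuming the same job.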