Platform LSF Reference Version 6.2
lsb.queues
LSF has two methods of enforcing memory usage:
◆ OS Memory Limit Enforcement
◆ LSF Memory Limit Enforcement
OS memory limit enforcement
OS memory limit enforcement is the default MEMLIMIT behavior and does not require
further configuration. OS enforcement usually allows the process to eventually run to
completion. LSF passes MEMLIMIT to the OS, which uses it as a guide for the system
scheduler and memory allocator. The system may allocate more memory to a process if
there is a surplus. When memory is low, the system takes memory from a process that
has exceeded its declared MEMLIMIT and lowers its scheduling priority (re-nice).
Only available on systems that support RLIMIT_RSS for setrlimit().
Not supported on:
◆ Sun Solaris 2.x
◆ Windows
LSF memory limit enforcement
To enable LSF memory limit enforcement, set LSB_MEMLIMIT_ENFORCE in
lsf.conf to y. LSF memory limit enforcement explicitly sends a signal to kill a
running process once it has allocated memory past MEMLIMIT.
You can also enable LSF memory limit enforcement by setting LSB_JOB_MEMLIMIT in
lsf.conf to y. The two settings differ as follows. With LSB_JOB_MEMLIMIT set to y,
only the per-job memory limit enforced by LSF is enabled; the per-process memory
limit enforced by the OS is disabled. With LSB_MEMLIMIT_ENFORCE set to y, both the
per-job memory limit enforced by LSF and the per-process memory limit enforced by
the OS are enabled.
Available for all systems on which LSF collects total memory usage.
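As an illustration, the two lsf.conf settings described above might look like the following sketch. The source gives only the parameter names and the value y; the comment lines, and the choice to show the second setting commented out as an alternative, are illustrative.

```
# lsf.conf (sketch)
# Enable both the per-job limit enforced by LSF and the
# per-process limit enforced by the OS:
LSB_MEMLIMIT_ENFORCE=y

# Alternative: enable only the per-job limit enforced by LSF
# (the per-process limit enforced by the OS is disabled):
# LSB_JOB_MEMLIMIT=y
```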
Example
The following configuration defines a queue with a memory limit of 5000 KB:
Begin Queue
QUEUE_NAME = default
DESCRIPTION = Queue with memory limit of 5000 kbytes
MEMLIMIT = 5000
End Queue
Default
Unlimited
MIG
Syntax
MIG=minutes
Description
Enables automatic job migration and specifies the migration threshold, in minutes.
Does not affect MultiCluster jobs that are forwarded to a remote cluster.
If a checkpointable or rerunnable job dispatched to the host is suspended (SSUSP state)
for longer than the specified number of minutes, the job is migrated (unless another job
on the same host is being migrated). A value of 0 (zero) specifies that a suspended job
should be migrated immediately.
If a migration threshold is defined at both host and queue levels, the lower threshold is
used.
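For illustration, a queue that migrates suspended jobs after 30 minutes could be sketched as follows. The queue name, description, and threshold value are hypothetical, and RERUNNABLE is included on the assumption that jobs in the queue should qualify for migration by being rerunnable.

```
Begin Queue
QUEUE_NAME  = migrate_demo
DESCRIPTION = Rerunnable jobs migrate after 30 minutes in SSUSP
RERUNNABLE  = yes
MIG         = 30
End Queue
```

With this sketch, if the execution host also defined a migration threshold of 10 minutes, the lower value (10 minutes) would apply to jobs dispatched to that host.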