Platform LSF Reference Version 6.2
Parameters
◆ LSF-enforced per-job limit: When the total memory allocated to all processes in
the job exceeds the memory limit, LSF sends the following signals to kill the job:
SIGINT, SIGTERM, then SIGKILL. The interval between signals is 10 seconds by
default.
On UNIX, the interval between SIGINT, SIGTERM, and SIGKILL can be
configured with the parameter JOB_TERMINATE_INTERVAL in lsb.params.
◆ OS-enforced per-process limit: When the memory allocated to one process of the
job exceeds the memory limit, the operating system enforces the limit. LSF passes
the memory limit to the operating system. Some operating systems apply the
memory limit to each process, and some do not enforce the memory limit at all.
OS memory limit enforcement is only available on systems that support
RLIMIT_RSS for setrlimit().
The following operating systems do not support the memory limit at the OS level,
so jobs on them run without a memory limit:
❖ Windows
❖ Sun Solaris 2.x
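The two enforcement styles described above can be compared with a minimal Python sketch. It is an illustration only, not LSF's implementation: the function names, the memory units, and the injected send_signal and wait callbacks are all hypothetical. Only the signal order (SIGINT, SIGTERM, SIGKILL) and the 10-second default interval come from the text. The sketch also ignores the possibility that the job exits before the last signal is sent.

```python
import signal
from typing import Callable, Iterable, List

# Signal order taken from the text: SIGINT, then SIGTERM, then SIGKILL.
TERMINATE_SEQUENCE = (signal.SIGINT, signal.SIGTERM, signal.SIGKILL)


def exceeds_per_job_limit(process_mem: Iterable[int], limit: int) -> bool:
    """Per-job check: the total memory of all processes is compared to the limit."""
    return sum(process_mem) > limit


def exceeds_per_process_limit(process_mem: Iterable[int], limit: int) -> bool:
    """Per-process check: any single process over the limit is a violation."""
    return any(mem > limit for mem in process_mem)


def terminate_job(
    send_signal: Callable[[int], None],
    wait: Callable[[int], None],
    interval_seconds: int = 10,  # default interval stated in the text
) -> List[int]:
    """Send each signal in order, waiting between consecutive signals.

    send_signal and wait are hypothetical callbacks so the sketch can be
    exercised without killing a real process.
    """
    sent: List[int] = []
    for index, sig in enumerate(TERMINATE_SEQUENCE):
        if index > 0:
            wait(interval_seconds)
        send_signal(sig)
        sent.append(sig)
    return sent
```

With three processes using 400 units each and a limit of 1000, the per-job check reports a violation (the total is 1200) while the per-process check does not, which is the practical difference between the two limits.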
Default
Undefined. The per-process memory limit enforced by the OS is enabled; the per-job
memory limit enforced by LSF is disabled.
Notes
To make LSB_JOB_MEMLIMIT take effect, use the command badmin hrestart
all to restart all sbatchds in the cluster.
If LSB_JOB_MEMLIMIT is set, it overrides the parameter LSB_MEMLIMIT_ENFORCE,
which is then ignored.
The difference between LSB_JOB_MEMLIMIT set to y and
LSB_MEMLIMIT_ENFORCE set to y is that with LSB_JOB_MEMLIMIT, only the
per-job memory limit enforced by LSF is enabled. The per-process memory limit
enforced by the OS is disabled. With LSB_MEMLIMIT_ENFORCE set to y, both the
per-job memory limit enforced by LSF and the per-process memory limit enforced by
the OS are enabled.
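The interaction between the two parameters can be summarized as a small decision function. This is a hedged sketch of the rules as stated in this section, not LSF code: the names Enforcement and resolve_enforcement are hypothetical, None stands for an undefined parameter, and the case-insensitive comparison is an assumption. Treating LSB_JOB_MEMLIMIT=n as still overriding LSB_MEMLIMIT_ENFORCE is a literal reading of the text and is also an assumption.

```python
from typing import NamedTuple, Optional


class Enforcement(NamedTuple):
    per_job_by_lsf: bool      # per-job limit enforced by LSF
    per_process_by_os: bool   # per-process limit enforced by the OS


def resolve_enforcement(
    job_memlimit: Optional[str],      # value of LSB_JOB_MEMLIMIT, None if undefined
    memlimit_enforce: Optional[str],  # value of LSB_MEMLIMIT_ENFORCE, None if undefined
) -> Enforcement:
    """Return which limits are active for a given pair of settings."""
    if job_memlimit is not None:
        # LSB_JOB_MEMLIMIT is set, so LSB_MEMLIMIT_ENFORCE is ignored.
        if job_memlimit.lower() == "y":
            return Enforcement(per_job_by_lsf=True, per_process_by_os=False)
        # Assumption: an explicit "n" behaves like the default (OS limit only).
        return Enforcement(per_job_by_lsf=False, per_process_by_os=True)
    if memlimit_enforce is not None and memlimit_enforce.lower() == "y":
        return Enforcement(per_job_by_lsf=True, per_process_by_os=True)
    # Default: only the OS-enforced per-process limit applies.
    return Enforcement(per_job_by_lsf=False, per_process_by_os=True)
```

For example, resolve_enforcement("y", "y") yields only the per-job limit, because LSB_JOB_MEMLIMIT takes precedence, while resolve_enforcement(None, "y") yields both limits.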
Changing the default Terminate job control action: You can define a different
Terminate action in lsb.queues with the parameter JOB_CONTROLS if you do not
want the job to be killed. For more details on job controls, see Administering
Platform LSF.
Limitations
If the parameter is changed while a job is running, LSF cannot reset the type of limit
enforcement for that job.
◆ If the parameter is changed from per-process limit enforced by the OS to per-job
limit enforced by LSF (LSB_JOB_MEMLIMIT=n or undefined changed to
LSB_JOB_MEMLIMIT=y), both the per-process limit and the per-job limit affect the
running job. Signals may be sent to the job either when the memory allocated to an
individual process exceeds the memory limit or when the total memory allocated to
all processes of the job exceeds the limit. A running job may be killed by LSF.
◆ If the parameter is changed from per-job limit enforced by LSF to per-process limit
enforced by the OS (LSB_JOB_MEMLIMIT=y changed to