LSF Version 7.3 - Administering Platform LSF
Configuring Job Controls
TERMINATE job actions
Use caution when configuring TERMINATE job actions that do more than just kill
a job. For example, when a resource usage limit terminates a job, LSF changes the
job state to SSUSP while it waits for the job to end. If the TERMINATE action does
not actually kill the job, the job remains suspended indefinitely.
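As a minimal sketch of a TERMINATE action that does extra work but still ends by killing the job, the fragment below runs a cleanup script and then sends SIGKILL to the job's processes. The queue name cleanup_demo and the script path /usr/local/bin/job_cleanup.sh are hypothetical; $LSB_JOBPIDS is used the same way as in the night queue example later in this section.

```
# Hypothetical queue: the cleanup script runs first, and the final kill
# makes sure the job really ends instead of staying in SSUSP.
Begin Queue
QUEUE_NAME   = cleanup_demo
JOB_CONTROLS = TERMINATE[ /usr/local/bin/job_cleanup.sh; kill -KILL $LSB_JOBPIDS ]
End Queue
```

Ending the action with an unconditional kill means a failing or slow cleanup step cannot leave the job suspended.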
TERMINATE_WHEN parameter (lsb.queues)
In certain situations you may want to terminate the job instead of calling the default
SUSPEND action. For example, you may want to kill jobs if the run window of the
queue is closed. Use the TERMINATE_WHEN parameter to configure the queue
to invoke the TERMINATE action instead of SUSPEND.
See the Platform LSF Configuration Reference for information about the
lsb.queues file and the TERMINATE_WHEN parameter.
Syntax
TERMINATE_WHEN = [LOAD] [PREEMPT] [WINDOW]
Example
The following defines a night queue that will kill jobs if the run window closes.
Begin Queue
QUEUE_NAME = night
RUN_WINDOW = 20:00-08:00
TERMINATE_WHEN = WINDOW
JOB_CONTROLS = TERMINATE[ kill -KILL $LSB_JOBPIDS;
echo "job $LSB_JOBID killed by queue run window" |
mail $USER ]
End Queue
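The syntax allows more than one keyword on the same line. As a sketch only, the following hypothetical queue (the name night_strict is an assumption) asks LSF to invoke TERMINATE instead of SUSPEND both when a job is preempted and when the run window closes. It defines no JOB_CONTROLS, so the default TERMINATE action would apply.

```
# Hypothetical queue: terminate rather than suspend on preemption
# or when the run window closes.
Begin Queue
QUEUE_NAME     = night_strict
RUN_WINDOW     = 20:00-08:00
TERMINATE_WHEN = PREEMPT WINDOW
End Queue
```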
LSB_SIGSTOP parameter (lsf.conf)
Use LSB_SIGSTOP to configure the signal that the default SUSPEND action sends
in place of SIGSTOP.
If LSB_SIGSTOP is set to anything other than SIGSTOP, the SIGTSTP signal that is
normally sent by the SUSPEND action is not sent. For example, if
LSB_SIGSTOP=SIGKILL, the three default signals sent by the TERMINATE action
(SIGINT, SIGTERM, and SIGKILL) are sent 10 seconds apart.
See the Platform LSF Configuration Reference for information about the
lsf.conf file.
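For illustration, the SIGKILL setting mentioned above would appear in lsf.conf roughly as follows. This is a sketch of the single line only; any other settings a site needs alongside it are not shown.

```
# lsf.conf
# The default SUSPEND action sends SIGKILL instead of SIGSTOP.
LSB_SIGSTOP=SIGKILL
```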
Avoiding signal and action deadlock
Do not configure a job control action to contain the signal or command that itself
invokes that action. Doing so causes a deadlock between the signal or command
and the action.
For example, the bkill command uses the TERMINATE action, so a deadlock
results when the TERMINATE action itself contains the bkill command.
Any of the following job control specifications will cause a deadlock:
◆ JOB_CONTROLS=TERMINATE[bkill]
◆ JOB_CONTROLS=TERMINATE[brequeue]
◆ JOB_CONTROLS=RESUME[bresume]
◆ JOB_CONTROLS=SUSPEND[bstop]
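One way to avoid these deadlocks, sketched here under the assumption that plain signals are enough for the job, is to signal the job's processes directly with kill and $LSB_JOBPIDS, as the night queue example does, rather than calling the LSF command that maps back to the same action. The particular signals chosen below are illustrative.

```
# Illustrative only: each action signals the job's processes directly
# and does not call bstop, bresume, bkill, or brequeue.
JOB_CONTROLS = SUSPEND[ kill -STOP $LSB_JOBPIDS ] RESUME[ kill -CONT $LSB_JOBPIDS ] TERMINATE[ kill -KILL $LSB_JOBPIDS ]
```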