LSF Version 7.3 - Administering Platform LSF
Custom Job Controls for Resource Preemption
Customizing the SUSPEND action
Ask your application vendor which job control signals or actions make your
application suspend a job and release the preemption resources. You need to
replace the default SUSPEND action (the SIGSTOP signal) with another signal or
script that your application handles correctly when it suspends the job. SIGSTOP
cannot be caught or ignored, so the application gets no chance to release a
resource before it stops. For example, your application might be able to catch
SIGTSTP instead.
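Because SIGSTOP is uncatchable, an application that must release a resource needs a catchable suspend signal such as SIGTSTP. The Python sketch below shows one pattern an application might use if the SUSPEND action is switched to SIGTSTP: the handler releases the resource first and only then stops the process with SIGSTOP. `LicenseHandle`, `on_suspend`, and `install_suspend_handler` are hypothetical names; a real application would call its own license API in `release()`.

```python
import os
import signal


class LicenseHandle:
    """Hypothetical stand-in for an application's license checkout."""

    def __init__(self) -> None:
        self.held = True  # assume the license was checked out at start-up

    def release(self) -> None:
        # A real application would return the license to its server here.
        self.held = False


license_handle = LicenseHandle()


def stop_self() -> None:
    # SIGSTOP cannot be caught, so sending it to our own process
    # suspends it until a SIGCONT arrives.
    os.kill(os.getpid(), signal.SIGSTOP)


def on_suspend(signum, frame, stop=stop_self) -> None:
    """SIGTSTP handler: free the preemption resource, then stop."""
    license_handle.release()
    stop()


def install_suspend_handler() -> None:
    # Registering a handler for SIGSTOP instead would raise OSError,
    # because SIGSTOP cannot be caught.
    signal.signal(signal.SIGTSTP, on_suspend)
```

The application would call `install_suspend_handler()` at start-up. The `stop` parameter exists only so the handler can be exercised without actually suspending the caller.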
By default, LSF sends SIGCONT to resume suspended jobs. Find out whether this
causes your application to take the resources back when it resumes the job (for
example, by checking out a license again). If it does not, you also need to modify
the RESUME action.
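SIGCONT always continues a stopped process, but an application can also catch it and run a handler once it is running again. A minimal Python sketch of an application that reacquires its resource on resume, assuming a hypothetical `LicenseHandle` whose `checkout()` stands in for the real license call:

```python
import signal


class LicenseHandle:
    """Hypothetical stand-in for an application's license checkout."""

    def __init__(self) -> None:
        self.held = False  # assume the license was released when suspended

    def checkout(self) -> None:
        # A real application would request the license from its server here
        # and may have to wait or retry if none is free.
        self.held = True


license_handle = LicenseHandle()


def on_resume(signum, frame) -> None:
    """SIGCONT handler: the process is already running again, so take the
    preemption resource back before doing more work."""
    if not license_handle.held:
        license_handle.checkout()


def install_resume_handler() -> None:
    # Catching SIGCONT does not prevent it from continuing a stopped process;
    # the handler simply runs once execution resumes.
    signal.signal(signal.SIGCONT, on_resume)
```

The `held` check keeps the handler harmless if SIGCONT arrives while the process was not stopped. If your application has no equivalent hook, that is the case in which the RESUME action itself must change.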
Any change you make to the SUSPEND job control affects all suspended jobs in the
queue, including preempted jobs, jobs that are suspended because of load
thresholds, and jobs that you suspend using LSF commands. Similarly, changes made
to the RESUME job control affect the whole queue.
Killing Preempted Jobs
If you want to use resource preemption but cannot get your application to release
or take back the resource, you can configure LSF to kill the low-priority job instead
of suspending it. This method is less efficient: when you kill a job, all the work it
has done is lost, and the job has to restart from the beginning.
◆ You can configure LSF to kill and requeue suspended jobs (use brequeue as the
SUSPEND job control in lsb.queues). This kills and requeues every job suspended
in the queue, not just preempted jobs.
◆ You can configure LSF to kill preempted jobs instead of suspending them
(TERMINATE_WHEN=PREEMPT in lsb.queues). In this case, LSF does not
restart the preempted job; you have to resubmit it manually.
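For the first option, the SUSPEND job control is a command rather than a signal. The Python sketch below is a hypothetical wrapper script you might name as that command instead of calling brequeue directly. It assumes, as for other LSF job control actions, that the command runs with the job's environment variables set, including LSB_JOBID. The function names and the wrapper itself are illustrative assumptions, and array jobs are deliberately not handled.

```python
import os
import subprocess
from typing import List, Mapping


def build_requeue_command(env: Mapping[str, str]) -> List[str]:
    """Return the brequeue command line for the job described by env."""
    job_id = env.get("LSB_JOBID")
    if not job_id:
        # Refuse to guess: without a job ID, brequeue has nothing to act on.
        raise RuntimeError("LSB_JOBID is not set; expected to run as an LSF job action")
    # Assumption: plain (non-array) jobs only; for an array element the job ID
    # alone may refer to the whole array.
    return ["brequeue", job_id]


def requeue_current_job() -> int:
    """Kill and requeue the job whose environment this script inherited."""
    result = subprocess.run(build_requeue_command(os.environ))
    return result.returncode
```

The script's entry point would call `requeue_current_job()` and exit with its return code. Because it is still the queue's SUSPEND action, it applies to every job suspended in the queue, not only preempted ones.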