LSF Version 7.3 - Administering Platform LSF

Backfill Scheduling: Allowing Jobs to Use Reserved Job Slots
Configure REQUEUE_EXIT_VALUES for the queue so that resubmission is
automatic. To terminate completely, jobs must have specific exit values:
If jobs are checkpointable, use their checkpoint exit value.
If jobs periodically save data on their own, use the SIGTERM exit value.
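As a rough illustration of what "the SIGTERM exit value" usually means, the sketch below applies the common Unix convention that a process killed by signal N is reported with exit value 128 + N. The helper names signal_exit_value and requeue_line are hypothetical, and it is an assumption that your jobs report this value when interrupted. Check the actual exit value with bjobs or bhist before adding it to the queue.

```python
import signal


def signal_exit_value(sig: signal.Signals) -> int:
    """Exit value conventionally reported for a process killed by `sig`.

    Assumption: the usual 128 + signal-number convention applies to the job.
    """
    return 128 + int(sig)


def requeue_line(exit_values) -> str:
    """Format a REQUEUE_EXIT_VALUES line for a queue definition (hypothetical helper)."""
    unique_sorted = sorted(set(exit_values))
    return "REQUEUE_EXIT_VALUES = " + " ".join(str(v) for v in unique_sorted)


sigterm_exit = signal_exit_value(signal.SIGTERM)
print(requeue_line([sigterm_exit]))
```

A job that traps SIGTERM and exits with its own status will report that status instead, so the value computed here is only a starting point.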
View the run limits for interruptible backfill jobs (bjobs and bhist)
1 Use bjobs to display the run limit calculated based on the configured
queue-level run limit.
For example, the interruptible backfill queue lazy configures RUNLIMIT=60:
bjobs -l 135
Job <135>, User <user1>, Project <default>, Status <RUN>, Queue <lazy>, Command
<myjob>
Mon Nov 21 11:49:22: Submitted from host <hostA>, CWD <$HOME/H
PC/jobs>;
RUNLIMIT
59.5 min of hostA
Mon Nov 21 11:49:26: Started on <hostA>, Execution Home </home
/user1>, Execution CWD </home/user1/HPC/jobs>;
2 Use bhist to display job-level run limit if specified.
For example, job 135 was submitted with a run limit of 3 hours:
bsub -n 1 -q lazy -W 3:0 myjob
Job <135> is submitted to queue <lazy>.
bhist displays the job-level run limit:
bhist -l 135
Job <135>, User <user1>, Project <default>, Command <myjob>
Mon Nov 21 11:49:22: Submitted from host <hostA>, to Queue <la
zy>, CWD <$HOME/HPC/jobs>;
RUNLIMIT
180.0 min of hostA
Mon Nov 21 11:49:26: Dispatched to <hostA>;
Mon Nov 21 11:49:26: Starting (Pid 2746);
Mon Nov 21 11:49:27: Interruptible backfill runtime limit is 59.5 minutes;
Mon Nov 21 11:49:27: Running with execution home </home/user1>, Execution CWD
...
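The two limits shown in the bhist example (the job-level run limit and the interruptible backfill runtime limit) can be pulled out of the text output with a small script. The following is a minimal sketch, not an LSF API: the function names w_to_minutes and run_limits are hypothetical, the sample text is abbreviated from the example above, and the patterns assume each matched phrase sits on a single line. The conversion handles only the plain [hour:]minute form of bsub -W, without a host or model suffix.

```python
import re


def w_to_minutes(spec: str) -> float:
    """Convert a bsub -W value in [hour:]minute form to minutes."""
    parts = spec.split(":")
    if len(parts) == 1:
        return float(parts[0])
    if len(parts) == 2:
        hours, minutes = parts
        return float(hours) * 60 + float(minutes)
    raise ValueError(f"unsupported run limit format: {spec!r}")


# "RUNLIMIT" header followed by a line such as "180.0 min of hostA".
RUNLIMIT_RE = re.compile(r"RUNLIMIT\s*\n\s*([\d.]+)\s+min")
# Event line that bhist prints for interruptible backfill jobs.
BACKFILL_RE = re.compile(
    r"Interruptible backfill runtime limit is\s+([\d.]+)\s+minutes"
)


def run_limits(text: str):
    """Return (run limit, backfill runtime limit) in minutes; None if absent."""
    run = RUNLIMIT_RE.search(text)
    backfill = BACKFILL_RE.search(text)
    return (
        float(run.group(1)) if run else None,
        float(backfill.group(1)) if backfill else None,
    )


# Abbreviated from the bhist -l 135 output in the example above.
sample_bhist = """\
RUNLIMIT
180.0 min of hostA
Mon Nov 21 11:49:26: Dispatched to <hostA>;
Mon Nov 21 11:49:27: Interruptible backfill runtime limit is 59.5 minutes;
"""

print(run_limits(sample_bhist))
print(w_to_minutes("3:0"))
```

Applied to bjobs -l output, the same function would return only the first value, since bjobs shows the limit calculated from the queue-level run limit rather than the event line.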