LSF Version 7.3 - Administering Platform LSF
Backfill Scheduling: Allowing Jobs to Use Reserved Job Slots
You should configure REQUEUE_EXIT_VALUES for the queue so that
resubmission is automatic. In order to terminate completely, jobs must have
specific exit values:
◆ If jobs are checkpointable, use their checkpoint exit value.
◆ If jobs periodically save data on their own, use the SIGTERM exit value.
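For the second case, a job that saves its own data can be sketched as a small wrapper script. This is an illustration only, not code from LSF: the state file path, the step loop, and the function names are hypothetical, and it assumes that the SIGTERM exit value is 143 (128 plus signal number 15, the usual shell convention) and that the queue lists that value in REQUEUE_EXIT_VALUES.

```shell
#!/bin/bash
# Sketch of a self-saving job for an interruptible backfill queue.
# Assumptions (not taken from the LSF documentation): the state file path,
# the step loop, and 143 (128 + signal 15) as the SIGTERM exit value.

SIGTERM_EXIT=$((128 + 15))
STATE_FILE="${STATE_FILE:-./myjob.state}"   # hypothetical location
TOTAL_STEPS="${TOTAL_STEPS:-100}"           # hypothetical amount of work
step=0

save_state() {
    # Record the number of completed steps so a requeued job can resume.
    echo "step=$step" > "$STATE_FILE"
}

on_term() {
    # Save progress, then exit with the value the queue is assumed to requeue on.
    save_state
    exit "$SIGTERM_EXIT"
}

run_job() {
    trap on_term TERM
    # Resume from saved state if an earlier run was interrupted.
    if [ -f "$STATE_FILE" ]; then . "$STATE_FILE"; fi
    while [ "$step" -lt "$TOTAL_STEPS" ]; do
        sleep 1                 # placeholder for one unit of real work
        step=$((step + 1))
        save_state              # periodic save
    done
    rm -f "$STATE_FILE"
    # Returning 0 here is not a requeue value, so the job terminates completely.
}

# Run the job only when asked to, so the functions can also be sourced.
if [ "${1:-}" = "--run" ]; then run_job; fi
```

Submitted with the `--run` argument, the script exits with 143 when it is interrupted and with 0 when all steps are done, so only the interrupted runs would match the requeue value.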
View the run limits for interruptible backfill jobs (bjobs and bhist)
1 Use bjobs to display the run limit calculated based on the configured
queue-level run limit.
For example, the interruptible backfill queue lazy configures RUNLIMIT=60:
bjobs -l 135
Job <135>, User <user1>, Project <default>, Status <RUN>, Queue <lazy>, Command <myjob>
Mon Nov 21 11:49:22: Submitted from host <hostA>, CWD <$HOME/HPC/jobs>;
RUNLIMIT
59.5 min of hostA
Mon Nov 21 11:49:26: Started on <hostA>, Execution Home </home/user1>, Execution CWD </home/user1/HPC/jobs>;
2 Use bhist to display the job-level run limit, if one is specified.
For example, job 135 was submitted with a run limit of 3 hours:
bsub -n 1 -q lazy -W 3:0 myjob
Job <135> is submitted to queue <lazy>.
bhist displays the job-level run limit:
bhist -l 135
Job <135>, User <user1>, Project <default>, Command <myjob>
Mon Nov 21 11:49:22: Submitted from host <hostA>, to Queue <lazy>, CWD <$HOME/HPC/jobs>;
RUNLIMIT
180.0 min of hostA
Mon Nov 21 11:49:26: Dispatched to <hostA>;
Mon Nov 21 11:49:26: Starting (Pid 2746);
Mon Nov 21 11:49:27: Interruptible backfill runtime limit is 59.5 minutes;
Mon Nov 21 11:49:27: Running with execution home </home/user1>, Execution CWD
...
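The two limits shown above (the run limit under RUNLIMIT and the interruptible backfill runtime limit in the bhist event line) can be pulled out of the command output with standard text tools. The sketch below is an illustration only: the function names are hypothetical, and it assumes the output is laid out as in the examples above, with the value on the line after RUNLIMIT and the event line unwrapped.

```shell
#!/bin/bash
# Hypothetical helpers for reading run limits from `bjobs -l` or `bhist -l` text.
# Assumption: the output is laid out as in the examples above.

run_limit_minutes() {
    # Print the first field of the line that follows the RUNLIMIT heading.
    awk 'found { print $1; exit } $1 == "RUNLIMIT" { found = 1 }'
}

backfill_limit_minutes() {
    # Print the number from the "Interruptible backfill runtime limit" event.
    sed -n 's/.*Interruptible backfill runtime limit is \([0-9.][0-9.]*\) minutes.*/\1/p'
}

# Sample text copied from the bhist example for job 135.
sample='RUNLIMIT
 180.0 min of hostA
Mon Nov 21 11:49:27: Interruptible backfill runtime limit is 59.5 minutes;'

printf '%s\n' "$sample" | run_limit_minutes        # prints 180.0
printf '%s\n' "$sample" | backfill_limit_minutes   # prints 59.5
```

Against a live cluster the same functions would be fed from the commands themselves, for example `bhist -l 135 | backfill_limit_minutes`.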