LSF Version 7.3 - Administering Platform LSF

Understanding Platform LSF Job Exit Information
LSF RMS integration exit values
For the RMS integrations with LSF (HP AlphaServer SC and Linux QsNet), LSF jobs running through RMS return the rms_run() return code as the job exit code. RMS documents certain exit codes and their corresponding job exit reasons; see the rms_run() man page for more information.
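Each example below lists the termination cause, the resulting LSB_JOBEXIT_STAT and LSB_JOBEXIT_INFO values, and example bhist output.

Example termination cause: Job being migrated
  Command: bmig -m togni
  Command output: Job <213> is being migrated
  LSB_JOBEXIT_STAT: 33280
  LSB_JOBEXIT_INFO: SIGNAL -1 SIG_CHKPNT
  Example bhist output:
    Fri Feb 14 15:04:42: Migration requested by user or administrator <iayaz>; Specified Hosts <togni>;
    Fri Feb 14 15:04:44: Job is being requeued;
    Fri Feb 14 15:05:01: Job has been requeued;
    Fri Feb 14 15:05:01: Pending: Migrating job is waiting for rescheduling;

Example termination cause: Job killed due to REQUEUE_EXIT_VALUES
  Command: bsub "sleep 100;exit 34"
  LSB_JOBEXIT_STAT: 8704
  LSB_JOBEXIT_INFO: Undefined
  Example bhist output:
    Fri Feb 14 15:10:21: Pending: Requeued job is waiting for rescheduling;(exit code 34)>;

Example termination cause: Job killed by LSF when LSF enforces CPULIMIT
  LSB_JOBEXIT_STAT: 158
  LSB_JOBEXIT_INFO: SIGNAL -24 SIG_TERM_CPULIMIT
  Example bhist output:
    Wed Feb 19 14:18:13: Exited by signal 30. The CPU time used is 89.4 seconds.

Example termination cause: Job killed because the queue-level CPULIMIT is reached
  LSB_JOBEXIT_STAT: 40448
  LSB_JOBEXIT_INFO: Undefined
  Example bhist output:
    Fri Feb 14 15:30:01: Exited with exit code 158. The CPU time used is 61.2 seconds;

Example termination cause: Job killed because the queue-level RUNLIMIT is reached
  LSB_JOBEXIT_STAT: 37120
  LSB_JOBEXIT_INFO: Undefined
  Example bhist output:
    Fri Feb 14 15:37:44: Exited with exit code 145. The CPU time used is 0.2 seconds;

Example termination cause: Job killed due to checkpointing
  Command: bchkpnt -k 838
  Command output: Job <838> is being checkpointed
  LSB_JOBEXIT_STAT: 9
  LSB_JOBEXIT_INFO: SIGNAL -1 SIG_CHKPNT
  Example bhist output:
    Fri Feb 14 17:59:12: Checkpoint succeeded (actpid 25298);
    Fri Feb 14 17:59:12: Exited by signal 9. The CPU time used is 0.1 seconds;

Example termination cause: Job killed when it reaches MEMLIMIT
  Command: bsub -M 5 "/home/iayaz/script/memwrite -m 10 -r 2"
  LSB_JOBEXIT_STAT: 2
  LSB_JOBEXIT_INFO: SIGNAL -25 SIG_TERM_MEMLIMIT
  Example bhist output:
    Fri Feb 21 10:50:50: Exited by signal 2. The CPU time used is 0.1 seconds;

Example termination cause: Job killed when the termination time approaches
  Command: bsub -t 21:11:10 sleep 500;date
  LSB_JOBEXIT_STAT: 37120
  LSB_JOBEXIT_INFO: Undefined
  Example bhist output:
    Exited with exit code 145. The CPU time used is 0.2 seconds;

Example termination cause: Job killed when TERMINATE_WHEN = LOAD
  LSB_JOBEXIT_STAT: 33280
  LSB_JOBEXIT_INFO: SIGNAL -15 SIG_TERM_LOAD
  Example bhist output:
    Exited with exit code 130. The CPU time used is 7.2 seconds.

Example termination cause: Job killed when TERMINATE_WHEN = PREEMPT
  LSB_JOBEXIT_STAT: 33280
  LSB_JOBEXIT_INFO: SIGNAL -16 SIG_TERM_PREEMPT
  Example bhist output:
    Exited with exit code 130. The CPU time used is 0.3 seconds;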
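Every LSB_JOBEXIT_STAT value in the examples above is consistent with the conventional Unix wait() status word. A normal exit stores the exit code in bits 8 to 15 (8704 = 34 × 256 for exit code 34, 33280 = 130 × 256 for exit code 130), while a signal termination stores the signal number in the low 7 bits (9 and 2 for signals 9 and 2; 158 = 128 + 30 is signal 30 with bit 0x80, the conventional core-dump flag, set). The Python sketch below decodes the value under that assumption and also splits LSB_JOBEXIT_INFO into its three fields. The helper names decode_jobexit_stat and parse_jobexit_info are hypothetical, not part of LSF, and so is the assumption that an "Undefined" LSB_JOBEXIT_INFO appears as an unset or empty environment variable. Treat it as a sketch for something like a post-execution script, not as LSF's own decoding logic.

```python
import os
from typing import NamedTuple, Optional, Tuple


class JobExit(NamedTuple):
    exited: bool               # True if the job exited normally
    exit_code: Optional[int]   # exit code (bits 8-15) for a normal exit
    signal: Optional[int]      # signal number (low 7 bits) for a signal kill
    core_dumped: bool          # bit 0x80, the conventional core-dump flag


def decode_jobexit_stat(status: int) -> JobExit:
    """Hypothetical helper: decode LSB_JOBEXIT_STAT, ASSUMING it uses the
    Unix wait() status layout (consistent with the example values)."""
    sig = status & 0x7F
    if sig == 0:
        # Normal exit: the exit code is stored in the second byte.
        return JobExit(True, (status >> 8) & 0xFF, None, False)
    if sig == 0x7F:
        # 0x7F in the low 7 bits marks a stopped process, not a finished job.
        raise ValueError(f"status {status} describes a stopped process")
    # Signal termination: the low 7 bits hold the signal number.
    return JobExit(False, None, sig, bool(status & 0x80))


def parse_jobexit_info(info: Optional[str]) -> Optional[Tuple[str, int, str]]:
    """Hypothetical helper: split e.g. "SIGNAL -1 SIG_CHKPNT" into
    ("SIGNAL", -1, "SIG_CHKPNT"). Returns None for an unset or empty
    value, ASSUMED to correspond to "Undefined" in the examples."""
    if info is None or not info.strip():
        return None
    kind, number, name = info.split(maxsplit=2)  # ValueError if malformed
    return kind, int(number), name


if __name__ == "__main__":
    # Assumption: both variables are exported to the script's environment.
    raw_stat = os.environ.get("LSB_JOBEXIT_STAT")
    if raw_stat is None:
        print("LSB_JOBEXIT_STAT is not set")
    else:
        result = decode_jobexit_stat(int(raw_stat))
        if result.exited:
            print(f"Exited with exit code {result.exit_code}")
        else:
            print(f"Exited by signal {result.signal}")
        print("LSB_JOBEXIT_INFO:",
              parse_jobexit_info(os.environ.get("LSB_JOBEXIT_INFO")))
```

Decoding both variables matters because the status alone can be ambiguous: 33280 appears for migration, TERMINATE_WHEN = LOAD, and TERMINATE_WHEN = PREEMPT, and only LSB_JOBEXIT_INFO (SIG_CHKPNT, SIG_TERM_LOAD, SIG_TERM_PREEMPT) tells them apart.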