LSF Version 7.3 - Platform LSF Configuration Reference

Example termination cause: Job killed when it reaches the MEMLIMIT
    bsub -M 5 "/home/iayaz/script/memwrite -m 10 -r 2"
LSB_JOBEXIT_STAT: 2
LSB_JOBEXIT_INFO: SIGNAL -25 SIG_TERM_MEMLIMIT
Example bhist output: Fri Feb 21 10:50:50: Exited by signal 2. The CPU time used is 0.1 seconds;

Example termination cause: Job killed when termination time approaches
    bsub -t 21:11:10 sleep 500;date
LSB_JOBEXIT_STAT: 37120
LSB_JOBEXIT_INFO: Undefined
Example bhist output: Exited with exit code 145. The CPU time used is 0.2 seconds;

Example termination cause: Job killed when TERMINATE_WHEN = LOAD
LSB_JOBEXIT_STAT: 33280
LSB_JOBEXIT_INFO: SIGNAL -15 SIG_TERM_LOAD
Example bhist output: Exited with exit code 130. The CPU time used is 7.2 seconds.

Example termination cause: Job killed when TERMINATE_WHEN = PREEMPT
LSB_JOBEXIT_STAT: 33280
LSB_JOBEXIT_INFO: SIGNAL -16 SIG_TERM_PREEMPT
Example bhist output: Exited with exit code 130. The CPU time used is 0.3 seconds;
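
The LSB_JOBEXIT_STAT values in the examples above are consistent with a conventional POSIX wait status: the low 7 bits hold the terminating signal and the next byte holds the exit code. The following Python sketch decodes a value under that assumption. The function name decode_jobexit_stat is hypothetical and is not part of LSF; the bit layout is inferred from the example values, not taken from an LSF specification.

```python
def decode_jobexit_stat(stat: int) -> str:
    """Describe a status word, ASSUMING the usual POSIX wait-status layout."""
    signal_number = stat & 0x7F      # low 7 bits: terminating signal, 0 if none
    exit_code = (stat >> 8) & 0xFF   # next byte up: exit code of the process
    if signal_number != 0:
        return f"exited by signal {signal_number}"
    return f"exited with exit code {exit_code}"


# Values taken from the LSB_JOBEXIT_STAT examples above.
for stat in (2, 37120, 33280):
    print(stat, "->", decode_jobexit_stat(stat))
```

Under this reading, 2 decodes to signal 2, 37120 to exit code 145, and 33280 to exit code 130, which matches the example bhist output for each job.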
LSF RMS integration exit values
For the RMS integrations with LSF (HP AlphaServer SC and Linux QsNet), LSF jobs that run
through RMS return the rms_run() return code as the job exit code. RMS documents certain
exit codes and the corresponding job exit reasons.
See the rms_run() man page for more information.
Upon successful completion, rms_run() returns the global OR of the exit status values of the
processes in the parallel program. If one of the processes is killed, rms_run() returns a status
value of 128 plus the signal number. It can also return the following codes:
Return Code   RMS Meaning
0             A process exited with the code 127 (GLOBAL EXIT), which indicates success,
              causing all of the processes to exit.
123           A process exited with the code 123 (GLOBAL ERROR), causing all of the
              processes to exit.
124           The node the job was executing on has been removed from the system.
125           One or more processes were still running when the exit timeout expired.
126           The resource is inadequate for the request.
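
As an illustration of how a script might interpret these values, the following Python sketch maps an rms_run() return code to the meanings listed above. The names RMS_MEANINGS and describe_rms_exit are hypothetical and are not part of LSF or RMS. The sketch assumes that any value above 128 means a process was killed by a signal; because a successful run returns the global OR of the process exit statuses, that reading can be ambiguous in practice.

```python
# Meanings copied from the return code table above.
RMS_MEANINGS = {
    0: "success, or a process exited with code 127 (GLOBAL EXIT)",
    123: "a process exited with code 123 (GLOBAL ERROR)",
    124: "the node the job was executing on was removed from the system",
    125: "processes were still running when the exit timeout expired",
    126: "the resource is inadequate for the request",
}


def describe_rms_exit(code: int) -> str:
    """Return a human-readable reason for an rms_run() return code."""
    if code in RMS_MEANINGS:
        return RMS_MEANINGS[code]
    if code > 128:
        # ASSUMPTION: treat the value as 128 plus the signal number.
        return f"a process was killed by signal {code - 128}"
    # Anything else is taken to be the global OR of the process exit statuses.
    return f"global OR of process exit statuses is {code}"


print(describe_rms_exit(126))
print(describe_rms_exit(137))
```

A lookup table keeps the documented codes separate from the two computed cases, so codes documented in the rms_run() man page but not listed here can be added without changing the logic.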
Understanding Platform LSF job exit information