LSF Version 7.3 - Platform LSF Configuration Reference

Termination signals are operating system dependent, so signal 5
may not be SIGTRAP and 11 may not be SIGSEGV on all UNIX
and Linux systems. Check the execution host type to translate the
exit value correctly if the job has been signaled.
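One way to check the numbering on a given host is to ask the host itself. The following is a minimal sketch, assuming Python is available on the execution host; the helper name local_signal_table is hypothetical and is not part of LSF.

```python
import signal


def local_signal_table():
    """Return {signal name: number} as defined on the host running this code."""
    # signal.Signals is an enumeration of the signals known on this platform,
    # so the numbers reflect this host only, not any other execution host.
    return {sig.name: int(sig) for sig in signal.Signals}
```

For example, local_signal_table().get("SIGTRAP") returns the number that this host assigns to SIGTRAP, or None if the host does not define that signal.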
bhist and bjobs output
In most cases, bjobs and bhist show the application exit value (128 + signal). In some cases,
bjobs and bhist show the actual signal value.
If LSF sends catchable signals to the job, bjobs and bhist display the exit value. For example,
if you run bkill jobID to kill the job, LSF passes SIGINT, which causes the job to exit with
exit code 130 (SIGINT is 2 on most systems, 128+2 = 130).
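The 128 + signal arithmetic can be sketched as follows. This is an illustration only: decode_exit_value is a hypothetical helper, and the name lookup uses the numbering of the host where the code runs, which may differ from the execution host.

```python
import signal


def decode_exit_value(exit_value):
    """Return (signal_number, signal_name), or (None, None) if not signal-derived."""
    # Exit values of 128 or less are treated as ordinary application exit codes.
    if exit_value <= 128:
        return None, None
    signum = exit_value - 128
    try:
        # Name lookup uses the LOCAL host's signal numbering (an assumption
        # that it matches the execution host).
        name = signal.Signals(signum).name
    except ValueError:
        name = None  # no signal with this number on the local host
    return signum, name
```

With this sketch, decode_exit_value(130) yields signal number 2 and decode_exit_value(139) yields signal number 11; the names attached to those numbers depend on the host.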
If LSF sends uncatchable signals to the job, then the entire process group for the job exits with
the corresponding signal. For example, if you run bkill -s SEGV jobID to kill the job,
bjobs and bhist show:
Exited by signal 7
Example
The following example shows a job that exited with exit code 139, which means that the job
was terminated with signal 11 (SIGSEGV on most systems, 139-128=11). This means that the
application had a core dump.
bjobs -l 2012
Job <2012>, User , Project , Status , Queue , Command
Fri Dec 27 22:47:28: Submitted from host , CWD <$HOME>;
Fri Dec 27 22:47:37: Started on , Execution Home , Execution CWD ;
Fri Dec 27 22:48:02: Exited with exit code 139. The CPU time used is 0.2 seconds.
SCHEDULING PARAMETERS:
           r15s   r1m  r15m   ut    pg    io   ls   it   tmp   swp   mem
loadSched    -     -     -     -     -     -    -    -     -     -     -
loadStop     -     -     -     -     -     -    -    -     -     -     -

           cpuspeed   bandwidth
loadSched      -          -
loadStop       -          -
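When this output is post-processed by a script, the exit information can be pulled out with a pattern match. The sketch below recognizes only the two message forms quoted in this section ("Exited with exit code N" and "Exited by signal N"); other LSF versions may word these messages differently, and parse_exit_line is a hypothetical helper, not an LSF interface.

```python
import re

# Patterns taken from the two message forms shown in this section.
_EXIT_CODE = re.compile(r"Exited with exit code (\d+)")
_EXIT_SIGNAL = re.compile(r"Exited by signal (\d+)")


def parse_exit_line(text):
    """Return ("exit_code", n), ("signal", n), or None if neither form is found."""
    match = _EXIT_CODE.search(text)
    if match:
        return "exit_code", int(match.group(1))
    match = _EXIT_SIGNAL.search(text)
    if match:
        return "signal", int(match.group(1))
    return None
```

Applied to the line "Exited with exit code 139. The CPU time used is 0.2 seconds." this returns ("exit_code", 139).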
LSF job termination reason logging
When LSF takes action on a job, it may send multiple signals. In the case of job termination,
LSF sends SIGINT, SIGTERM, and SIGKILL in succession until the job has terminated.
As a result, the job may exit with any of the corresponding exit values at the system level.
Other actions may send "warning" signals, such as SIGUSR2, to applications. For specific signal
sequences, refer to the LSF documentation for that feature.
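The general idea of sending signals in succession can be illustrated with a short sketch. This is not LSF's implementation: the function name terminate_escalating and the grace interval between signals are assumptions made for illustration, and the code applies to UNIX and Linux hosts only.

```python
import signal
import subprocess
import sys


def terminate_escalating(proc, grace=2.0):
    """Send SIGINT, SIGTERM, then SIGKILL until proc exits.

    Returns the name of the last signal sent, or None if the process had
    already exited before any signal was sent.
    """
    last_sent = None
    for sig in (signal.SIGINT, signal.SIGTERM, signal.SIGKILL):
        if proc.poll() is not None:
            break  # process has already terminated
        proc.send_signal(sig)
        last_sent = sig.name
        try:
            # Give the process a grace period (assumed value) to exit.
            proc.wait(timeout=grace)
            break
        except subprocess.TimeoutExpired:
            pass  # still running: escalate to the next signal
    return last_sent
```

Because the process may exit after any of the three signals, the value it returns to the system can correspond to any of them, which is why the reported exit value does not always match the signal that was requested.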
Run bhist to see the actions that LSF takes on a job:
bhist -l 1798
Job <1798>, User <user1>, Command <sleep 10000>
Tue Feb 25 16:35:31: Submitted from host <hostA>, to Queue <normal>, CWD <$H
OME/lsf_7.0/conf/lsbatch/lsf_7.0/configdir>;
Tue Feb 25 16:35:51: Dispatched to <hostA>;
Tue Feb 25 16:35:51: Starting (Pid 12955);
Tue Feb 25 16:35:53: Running with execution home </home/user1>, Execution CWD <
/home/user1/Testing/lsf_7.0/conf/lsbatch/lsf_7.0/configdir>,
Execution Pid <12955>;
Tue Feb 25 16:38:20: Signal <KILL> requested by user or administrator <user1>;
Tue Feb 25 16:38:22: Exited with exit code 130. The CPU time used is 0.1 seconds;
Summary of time in seconds spent in various states by Tue Feb 25 16:38:22
PEND PSUSP RUN USUSP SSUSP UNKWN TOTAL