LSF Version 7.3 - Using Platform LSF HPC

JOB_CONTROLS = TERMINATE[kill -CONT -$LSB_JOBRES_PID; kill -TERM -$LSB_JOBRES_PID]
If pam and the job RES are in different process groups (for example, because pam is
started by a wrapper that sets its own PGID), use both LSB_JOBRES_PID and
LSB_PAMPID to make sure your parallel jobs are cleaned up:
JOB_CONTROLS = TERMINATE[kill -CONT -$LSB_JOBRES_PID -$LSB_PAMPID; kill -TERM -$LSB_JOBRES_PID -$LSB_PAMPID]
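The leading minus sign in these kill commands matters: a negative ID tells kill to signal every process in that process group, not just one process. The following is a minimal sketch of the same CONT-then-TERM sequence written as a shell function. The function name signal_groups and the KILL_CMD override are hypothetical and not part of LSF; the sketch assumes each argument is a process group ID such as the value of LSB_JOBRES_PID.

```sh
#!/bin/sh
# Sketch only: signal_groups and KILL_CMD are illustrative names, not LSF features.
# KILL_CMD can be pointed at a logging command for a dry run.
KILL_CMD=${KILL_CMD:-kill}

# Send SIGCONT to every listed process group, then SIGTERM to every listed
# process group, mirroring the order of the two kill commands in JOB_CONTROLS.
signal_groups() {
    for sig in CONT TERM; do
        for pgid in "$@"; do
            # Skip empty, zero, or non-numeric IDs. An ID of 0 would make
            # kill signal the caller's own process group.
            case "$pgid" in
                ''|0|*[!0-9]*) continue ;;
            esac
            # "--" ends option parsing; the "-" prefix selects the process group.
            $KILL_CMD -s "$sig" -- "-$pgid"
        done
    done
}
```

A termination script could then call signal_groups "$LSB_JOBRES_PID" "$LSB_PAMPID". Resuming the groups with SIGCONT first lets suspended processes receive and act on the SIGTERM that follows.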
See the Platform LSF Configuration Guide for information about JOB_CONTROLS in the
lsb.queues file.
See Administering Platform LSF for information about configuring job controls.
Sample job termination script for queue job control
By default, LSF sends a SIGUSR2 signal to terminate a job that has reached its run limit
or deadline. Some applications do not respond to the SIGUSR2 signal (for example,
LAM/MPI), so jobs may not exit immediately when a job run limit is reached. You
should configure your queues with a custom job termination action specified by the
JOB_CONTROLS parameter.
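One common shape for a custom termination action is to escalate: send SIGTERM, wait a short grace period, and send SIGKILL only if the process is still alive. The following is a minimal sketch of that pattern. The function name terminate_with_grace and the default grace period of 10 seconds are assumptions made for illustration; they are not taken from LSF.

```sh
#!/bin/sh
# Hypothetical helper, not part of LSF: terminate a process with escalation.
#   $1  process ID to terminate
#   $2  grace period in seconds before SIGKILL (assumed default: 10)
terminate_with_grace() {
    pid=$1
    grace=${2:-10}

    # Ask the process to exit. If the signal cannot be sent, the process
    # is already gone (or is not ours to signal), so there is nothing to do.
    kill -s TERM "$pid" 2>/dev/null || return 0

    # Poll once per second until the process disappears or time runs out.
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        grace=$((grace - 1))
    done

    # Still present after the grace period: force termination.
    kill -s KILL "$pid" 2>/dev/null || true
    return 0
}
```

For example, terminate_with_grace "$LSB_PAM_PID" 5 would give pam five seconds to exit before it is killed.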
Use the following sample job termination control script for the TERMINATE job
control in the hpc_linux queue for LAM/MPI jobs:
#!/bin/sh
#JOB_CONTROL_LOG=job.control.log.$LSB_BATCH_JID
JOB_CONTROL_LOG=/dev/null
kill -CONT -$LSB_JOBRES_PID >>$JOB_CONTROL_LOG 2>&1
if [ "$LSB_PAM_PID" != "" -a "$LSB_PAM_PID" != "0" ]; then
kill -TERM $LSB_PAM_PID >>$JOB_CONTROL_LOG 2>&1
MACHINETYPE=`uname -a | cut -d" " -f 5`
while [ "$LSB_PAM_PID" != "0" -a "$LSB_PAM_PID" != "" ] # pam is running
do
if [ "$MACHINETYPE" = "CRAY" ]; then
PIDS=`(ps -ef; ps auxww) 2>/dev/null | egrep ".*[/\[ \t]pam[] \t]*$"| sed -n "/grep/d;s/^ *[^ \t]* *\([0-9]*\).*/\1/p" | sort -u`
else
PIDS=`(ps -ef; ps auxww) 2>/dev/null | egrep " pam |/pam | pam$|/pam$"| sed -n "/grep/d;s/^ *[^ \t]* *\([0-9]*\).*/\1/p" | sort -u`
fi
echo PIDS=$PIDS >> $JOB_CONTROL_LOG
if [ "$PIDS" = "" ]; then # no pam is running
break;
fi