LSF Version 7.3 - Administering Platform LSF
Configuring Job Control Actions
CHKPNT Checkpoint the job. Only valid for SUSPEND and TERMINATE actions.
◆ If the SUSPEND action is CHKPNT, the job is checkpointed and then stopped
by sending the SIGSTOP signal to the job automatically.
◆ If the TERMINATE action is CHKPNT, then the job is checkpointed and killed
automatically.
command A /bin/sh command line.
◆ Do not quote the command line inside an action definition.
◆ Do not specify a signal followed by an action that triggers the same signal (for
example, do not specify JOB_CONTROLS=TERMINATE[bkill] or
JOB_CONTROLS=TERMINATE[brequeue]). This causes a deadlock between
the signal and the action.
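As an illustration of the rules above, a queue definition might look like the following sketch. The queue name checkpoint_q and the script path /usr/local/lsf/bin/stop_app.sh are hypothetical. The fragment assumes that JOB_CONTROLS is set inside a Begin Queue / End Queue section of lsb.queues and that jobs in the queue are checkpointable.

```
Begin Queue
QUEUE_NAME   = checkpoint_q
# SUSPEND: checkpoint the job, then LSF stops it with SIGSTOP.
# TERMINATE: an unquoted /bin/sh command line (hypothetical script).
# The command must not call bkill or brequeue.
JOB_CONTROLS = SUSPEND[CHKPNT] TERMINATE[/usr/local/lsf/bin/stop_app.sh $LSB_JOBPIDS]
End Queue
```

The command line is written without surrounding quotes, and $LSB_JOBPIDS is left for /bin/sh to expand when the action runs.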
Using a command as a job control action
◆ The command line for the action is run with /bin/sh -c so you can use shell
features in the command.
◆ The command is run as the user of the job.
◆ All environment variables set for the job are also set for the command action.
The following additional environment variables are set:
❖ LSB_JOBPGIDS—a list of current process group IDs of the job
❖ LSB_JOBPIDS—a list of current process IDs of the job
◆ For the SUSPEND action command, the environment variables
LSB_SUSP_REASONS and LSB_SUSP_SUBREASONS are also set. Use them
together in your custom job control to determine the exact load threshold that
caused a job to be suspended.
❖ LSB_SUSP_REASONS—an integer representing a bitmap of suspending
reasons as defined in lsbatch.h. The command can use this value to take
different actions depending on why the job was suspended.
❖ LSB_SUSP_SUBREASONS—an integer representing the load index that
caused the job to be suspended. When the suspending reason
SUSP_LOAD_REASON (suspended by load) is set in
LSB_SUSP_REASONS, LSB_SUSP_SUBREASONS is set to one of the load
index values defined in lsf.h.
◆ The standard input, output, and error of the command are redirected to the
null device, so you cannot tell directly whether the command ran correctly.
The default null device on UNIX is /dev/null.
◆ Verify that the command line is correct before you put it into production. To
see the output from the command line while testing, redirect the output to a
file inside the command line.
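The points above can be combined in a small SUSPEND action script. The following is a minimal sketch, not a definitive implementation. The script name, the log path, and the ACTION_LOG, LOAD_MASK, and DRY_RUN variables are hypothetical and are not set by LSF. The sketch assumes that LSB_JOBPGIDS is a space-separated list and that a custom SUSPEND command must stop the job itself. LOAD_MASK must be set to the SUSP_LOAD_REASON value from your copy of lsbatch.h.

```shell
#!/bin/sh
# Hypothetical SUSPEND action script; all names and paths are assumptions.
# Possible queue setting: JOB_CONTROLS = SUSPEND[/path/to/suspend_action.sh]

# Succeed if any bit of mask $2 is set in bitmap $1.
has_reason() {
    [ $(( $1 & $2 )) -ne 0 ]
}

# Send SIGSTOP to each process group in $1.
# With DRY_RUN=1 (an assumption, for testing) only print the commands.
stop_job_groups() {
    for pgid in $1; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "kill -s STOP -- -$pgid"
        else
            kill -s STOP -- "-$pgid"
        fi
    done
}

main() {
    # The action's stdout and stderr go to the null device, so log to a file.
    log=${ACTION_LOG:-/tmp/suspend_action.$$.log}
    {
        echo "reasons=${LSB_SUSP_REASONS:-unset} subreasons=${LSB_SUSP_SUBREASONS:-unset}"
        # LOAD_MASK is not provided by LSF; take the value from lsbatch.h.
        if [ -n "${LOAD_MASK:-}" ] &&
           has_reason "${LSB_SUSP_REASONS:-0}" "$LOAD_MASK"; then
            echo "suspended by load, load index ${LSB_SUSP_SUBREASONS:-unset}"
        fi
        stop_job_groups "$LSB_JOBPGIDS"
    } >>"$log" 2>&1
}

# Run only when the job's process groups have been supplied.
if [ -n "${LSB_JOBPGIDS:-}" ]; then
    main
fi
```

While testing, set DRY_RUN=1 and inspect the log file to confirm that the command line behaves as expected before using it in a queue definition.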