LSF Version 7.3 - Platform LSF Configuration Reference

Commands to monitor
Command Description
bacct -l
Displays accounting statistics for finished jobs, including termination
reasons. TERM_CHKPNT indicates that a job was checkpointed and
killed.
If JOB_CONTROLS is defined for a queue, LSF does not display the
result of the action.
bhist -l
Displays the actions that LSF took on a completed job, including job
checkpoint, restart, and migration to another host.
bjobs -l
Displays information about pending, running, and suspended jobs,
including the checkpoint directory, the checkpoint period, and the
checkpoint method (either application or default).
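The three monitoring commands are usually run against a single job ID. The following Python sketch builds their long-format (-l) command lines and runs them through subprocess only when explicitly asked. The helper names monitor_argvs and run_all and the job ID 1234 are hypothetical. When dry_run is False, the sketch assumes the LSF commands are on PATH.

```python
import shlex
import subprocess
from typing import List

# Monitoring commands from the table above, each run in long format (-l).
MONITOR_COMMANDS = ("bacct", "bhist", "bjobs")


def monitor_argvs(job_id: str) -> List[List[str]]:
    """Return one argv list per monitoring command for a single job."""
    return [[cmd, "-l", job_id] for cmd in MONITOR_COMMANDS]


def run_all(argvs: List[List[str]], dry_run: bool = True) -> None:
    """Print each command line; execute it only when dry_run is False."""
    for argv in argvs:
        print(shlex.join(argv))
        if not dry_run:
            # check=False so one failing query (for example, an unknown
            # job ID) does not stop the remaining queries.
            subprocess.run(argv, check=False)


if __name__ == "__main__":
    run_all(monitor_argvs("1234"))  # 1234 is a hypothetical job ID
```

Use bjobs -l while the job is still pending, running, or suspended. Use bhist -l and bacct -l after it finishes.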
Commands to control
Command Description
bmod -k "checkpoint_dir [checkpoint_period] [method=echkpnt_application]"
Resubmits a job and changes the checkpoint directory, checkpoint
period, and the checkpoint and restart executables associated with
the job.
bmod -kn
Dissociates the checkpoint directory from a job, which makes the job
no longer checkpointable.
bchkpnt
Checkpoints the most recently submitted checkpointable job. To
checkpoint specific jobs, specify their job IDs or other bchkpnt
options.
bchkpnt -p checkpoint_period job_ID
Checkpoints a job immediately and changes the checkpoint period for
the job.
bchkpnt -k job_ID
Checkpoints a job immediately and kills the job.
bchkpnt -p 0 job_ID
Checkpoints a job immediately and disables periodic checkpointing.
brestart
Restarts a checkpointed job on the first available host.
brestart -m
Restarts a checkpointed job on the specified host or host group.
bmig
Migrates one or more running jobs from one host to another. The jobs
must be checkpointable or rerunnable. For checkpointable jobs, bmig
checkpoints, kills, and restarts each job.
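The control commands above differ mainly in their option syntax. This Python sketch assembles each command line as an argv list for subprocess.run. Building the list directly avoids shell-quoting mistakes, such as splitting the bmod -k specification into several arguments.

The helper names, job ID, checkpoint directory, and host name are hypothetical. Two further details are assumptions rather than statements from this table:

- The checkpoint period is in minutes.
- brestart takes the checkpoint directory followed by the job ID, as in the usual brestart synopsis.

Confirm both against the man pages for your LSF version.

```python
import shlex
from typing import List, Optional


def bmod_checkpoint(job_id: str, checkpoint_dir: str,
                    period: Optional[int] = None,
                    method: Optional[str] = None) -> List[str]:
    """bmod -k "checkpoint_dir [checkpoint_period] [method=...]" job_ID"""
    spec = [checkpoint_dir]
    if period is not None:
        spec.append(str(period))  # assumed to be minutes
    if method is not None:
        spec.append(f"method={method}")
    # The whole specification is ONE argument; in a shell it must be quoted.
    return ["bmod", "-k", " ".join(spec), job_id]


def bmod_no_checkpoint(job_id: str) -> List[str]:
    """bmod -kn job_ID: the job is no longer checkpointable."""
    return ["bmod", "-kn", job_id]


def bchkpnt(job_id: str, period: Optional[int] = None,
            kill: bool = False) -> List[str]:
    """Checkpoint now; -p changes the period (0 disables it), -k kills the job."""
    argv = ["bchkpnt"]
    if kill:
        argv.append("-k")
    if period is not None:
        argv += ["-p", str(period)]
    argv.append(job_id)
    return argv


def brestart(checkpoint_dir: str, job_id: str,
             host: Optional[str] = None) -> List[str]:
    """Restart a checkpointed job; -m names a host or host group."""
    argv = ["brestart"]
    if host is not None:
        argv += ["-m", host]
    argv += [checkpoint_dir, job_id]  # assumed argument order
    return argv


def bmig(job_id: str) -> List[str]:
    """Migrate a running checkpointable or rerunnable job."""
    return ["bmig", job_id]


if __name__ == "__main__":
    # Hypothetical job ID, checkpoint directory, and host name.
    for argv in (bmod_checkpoint("1234", "/share/ckpt", 30),
                 bchkpnt("1234", period=0),
                 brestart("/share/ckpt", "1234", host="hostA"),
                 bmig("1234")):
        print(shlex.join(argv))
```

When printed with shlex.join, the first command line is bmod -k '/share/ckpt 30' 1234. This shows that the checkpoint specification is passed as a single quoted argument.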
Feature: Job checkpoint and restart