Platform LSF Administration Guide Version 6.2
Chapter 25
Job Checkpoint, Restart, and Migration
Enabling Periodic Checkpointing
Periodic checkpointing creates a checkpoint file at regular intervals while your job
runs. You can enable periodic checkpointing manually on the command line or
automatically through configuration. Automatic periodic checkpointing is discussed in
“Automatically Checkpointing Jobs” on page 404.
LSF can checkpoint only jobs that are checkpointable, as described in “Making
Jobs Checkpointable” on page 401.
To enable periodic checkpointing manually, specify a checkpoint period in minutes.
At job submission
LSF uses the -k "checkpoint_dir checkpoint_period" option of bsub to
enable periodic checkpointing at job submission. For example, to periodically
checkpoint my_job every 2 hours (120 minutes):
% bsub -k "my_dir 120" my_job
Job <123> is submitted to default queue <default>.
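Because the checkpoint period is given in minutes, a small wrapper can convert from hours and assemble the value passed to -k. The following is a minimal sketch, not part of LSF: the function name ckpt_opt is hypothetical, and the script only prints the bsub command line so that it can be reviewed before it is run.

```sh
#!/bin/sh
# Hypothetical helper (not an LSF command): build the value for bsub -k
# from a checkpoint directory and a period expressed in whole hours.
ckpt_opt() {
    dir=$1
    hours=$2
    # bsub -k expects the checkpoint period in minutes.
    minutes=$((hours * 60))
    printf '%s %s\n' "$dir" "$minutes"
}

# Print the command rather than running it, so it can be reviewed first.
echo "bsub -k \"$(ckpt_opt my_dir 2)\" my_job"
```

With the values from the example above, the script prints the same bsub command line shown there.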
After job submission
LSF uses the -p period option of bchkpnt to enable periodic checkpointing after
submission. When a checkpoint period is specified after submission, LSF checkpoints
the job immediately, then checkpoints it again after the specified period of time. For
example, to periodically checkpoint a job with job ID 123 every 2 hours (120 minutes):
% bchkpnt -p 120 123
Job <123> is being checkpointed
You can also use the -p option of bchkpnt to change a checkpoint period. For
example, to change the checkpoint period of a job with job ID 123 to every 4 hours (240
minutes):
% bchkpnt -p 240 123
Job <123> is being checkpointed
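When the period comes from a script or from user input, it can help to check the value before passing it to bchkpnt. The following is a minimal sketch under stated assumptions: the function name ckpt_period_cmd is hypothetical, the digits-only check is a rule chosen for this sketch rather than one documented by LSF, and the function only prints the command line instead of running it.

```sh
#!/bin/sh
# Hypothetical helper (not an LSF command): validate a checkpoint period
# and print the matching bchkpnt command line.
ckpt_period_cmd() {
    job_id=$1
    minutes=$2
    # Assumption for this sketch: accept only a whole number of minutes.
    case $minutes in
        ''|*[!0-9]*)
            echo "period must be a whole number of minutes" >&2
            return 1
            ;;
    esac
    echo "bchkpnt -p $minutes $job_id"
}

# Change the checkpoint period of job 123 to every 4 hours (240 minutes).
ckpt_period_cmd 123 240
```

Keep in mind that running the printed command causes LSF to checkpoint the job immediately, as described above, before the new period takes effect.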
Disabling periodic checkpointing
To disable periodic checkpointing, specify a period of 0 (zero). For example, to disable
periodic checkpointing for a job with job ID 123:
% bchkpnt -p 0 123
Job <123> is being checkpointed
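To disable periodic checkpointing for several jobs, the same command can be generated once per job ID. The following is a minimal sketch: the function name disable_ckpt_cmds is hypothetical, the job IDs 124 and 125 are illustrative only, and the commands are printed rather than run.

```sh
#!/bin/sh
# Hypothetical helper (not an LSF command): print one bchkpnt command per
# job ID, each with a period of 0 to disable periodic checkpointing.
disable_ckpt_cmds() {
    for job_id in "$@"; do
        echo "bchkpnt -p 0 $job_id"
    done
}

# Job IDs 124 and 125 are illustrative only.
disable_ckpt_cmds 123 124 125
```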