LSF Version 7.3 - Platform LSF Configuration Reference

If the command line is…   And…                      Then…

bsub -k "my_dir"          In lsb.applications,      LSF saves the checkpoint file to
                          CHKPNT_PERIOD=360         my_dir/job_ID every 360 minutes

bsub -k "240"             In lsb.applications,      LSF saves the checkpoint file to
                          CHKPNT_DIR=app_dir        app_dir/job_ID every 240 minutes
                          CHKPNT_PERIOD=360
                          In lsb.queues,
                          CHKPNT=other_dir
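
The two rows above suggest a precedence order: a value given at the job level with bsub -k wins over lsb.applications, which in turn wins over lsb.queues. The following is a minimal shell sketch of that reading. The function name resolve_chkpnt and its arguments are hypothetical and are not part of LSF; the sketch models only the two cases shown in the table.

```shell
#!/bin/sh
# Hypothetical helper, not an LSF command: models the precedence implied
# by the table (job level, then application level, then queue level).
# Arguments: job-level value, application-level value, queue-level value.
# An empty string means "not set at that level".
resolve_chkpnt() {
    job_value=$1
    app_value=$2
    queue_value=$3
    # The first non-empty value wins.
    echo "${job_value:-${app_value:-$queue_value}}"
}

# Row 1: bsub -k "my_dir", CHKPNT_PERIOD=360 in lsb.applications
dir=$(resolve_chkpnt "my_dir" "" "")
period=$(resolve_chkpnt "" "360" "")
echo "$dir/job_ID every $period minutes"   # my_dir/job_ID every 360 minutes

# Row 2: bsub -k "240", CHKPNT_DIR=app_dir and CHKPNT_PERIOD=360 in
# lsb.applications, CHKPNT=other_dir in lsb.queues
dir=$(resolve_chkpnt "" "app_dir" "other_dir")
period=$(resolve_chkpnt "240" "360" "")
echo "$dir/job_ID every $period minutes"   # app_dir/job_ID every 240 minutes
```

The nested ${a:-${b:-c}} expansion keeps the fallback order visible in a single expression. Cases not covered by the table are outside what this sketch claims.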
Configuration to modify job checkpoint and restart
Configuration parameters modify various aspects of job checkpoint and restart behavior by:
• Specifying mandatory application-level checkpoint and restart executables that apply to all checkpointable batch jobs in the cluster
• Specifying the directory that contains customized application-level checkpoint and restart executables
• Saving standard output and standard error to files in the checkpoint directory
• Automatically checkpointing jobs before suspending or terminating them
• For Cray systems only, copying all open job files to the checkpoint directory
Configuration to specify mandatory application-level executables
You can specify mandatory checkpoint and restart executables by defining the parameter
LSB_ECHKPNT_METHOD in lsf.conf or as an environment variable.
Configuration file   Parameter and syntax                       Behavior

lsf.conf             LSB_ECHKPNT_METHOD="echkpnt_application"   • The specified echkpnt runs for all batch
                                                                  jobs submitted to the cluster. At restart,
                                                                  the corresponding erestart runs.
                                                                • For example, if LSB_ECHKPNT_METHOD=fluent,
                                                                  at checkpoint, LSF runs echkpnt.fluent and
                                                                  at restart, LSF runs erestart.fluent.
                                                                • If an LSF user specifies a different
                                                                  echkpnt_application at the job level using
                                                                  bsub -k or bmod -k, the job level value
                                                                  overrides the value in lsf.conf.
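
The naming rule and the job-level override described above can be sketched in shell as follows. The function chkpnt_executables is a hypothetical illustration, not an LSF command; it only derives the executable names from a method name in the way the fluent example describes, and it does not call LSF.

```shell
#!/bin/sh
# Hypothetical helper, not part of LSF: prints the checkpoint and restart
# executable names that would apply to a job.
# Arguments: cluster-wide method (LSB_ECHKPNT_METHOD in lsf.conf),
#            job-level method (empty string if the user gave none).
chkpnt_executables() {
    cluster_method=$1
    job_method=$2
    # A job-level method overrides the value from lsf.conf.
    method=${job_method:-$cluster_method}
    echo "echkpnt.$method erestart.$method"
}

# LSB_ECHKPNT_METHOD=fluent and no job-level method
chkpnt_executables "fluent" ""        # echkpnt.fluent erestart.fluent

# Same cluster setting, but the job names another method
# ("myapp" is a made-up method name used only for illustration)
chkpnt_executables "fluent" "myapp"   # echkpnt.myapp erestart.myapp
```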
Feature: Job checkpoint and restart