LSF Version 7.3 - Platform LSF Configuration Reference

LSF uses the default executables echkpnt.default and erestart.default for kernel-level checkpoint and restart.
User-level checkpoint and restart
For systems that do not support kernel-level checkpoint and restart, LSF provides a job checkpoint and restart
implementation that is transparent to your applications and does not require you to rewrite code. User-level job
checkpoint and restart is enabled by linking your application files to the LSF checkpoint libraries in LSF_LIBDIR. LSF
uses the default executables echkpnt.default and erestart.default for user-level checkpoint and restart.
Application-level checkpoint and restart
Different applications have different checkpointing implementations that require the use of customized external
executables (echkpnt.application and erestart.application). Application-level checkpoint and restart enables you
to configure LSF to use specific echkpnt.application and erestart.application executables for a job, queue, or
cluster. You can write customized checkpoint and restart executables for each application that you use.
LSF always pairs a checkpoint executable with the restart executable that has the same application suffix. For example,
if you use echkpnt.fluent to checkpoint a particular job, LSF uses erestart.fluent to restart the checkpointed job.
You cannot override this behavior or configure LSF to use a specific restart executable.
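The naming rule above can be illustrated with a minimal Python sketch. It is not part of LSF. The function name restart_executable_for and the prefix constants are hypothetical and exist only for this illustration. The sketch assumes that the pairing depends solely on the suffix after "echkpnt.", as the echkpnt.fluent example describes.

```python
# Illustrative sketch only: models the echkpnt.<application> / erestart.<application>
# naming rule described in the text. These names are hypothetical, not an LSF API.

CHECKPOINT_PREFIX = "echkpnt."
RESTART_PREFIX = "erestart."


def restart_executable_for(checkpoint_executable: str) -> str:
    """Return the restart executable name that pairs with a checkpoint executable name."""
    if not checkpoint_executable.startswith(CHECKPOINT_PREFIX):
        raise ValueError(
            f"expected a name starting with {CHECKPOINT_PREFIX!r}, "
            f"got {checkpoint_executable!r}"
        )
    # Everything after the prefix identifies the application (for example "fluent").
    application = checkpoint_executable[len(CHECKPOINT_PREFIX):]
    if not application:
        raise ValueError("checkpoint executable name has no application suffix")
    return RESTART_PREFIX + application


if __name__ == "__main__":
    for name in ("echkpnt.fluent", "echkpnt.default"):
        print(name, "->", restart_executable_for(name))
```

Because the restart name is derived entirely from the checkpoint name, the sketch has no parameter for choosing a different restart executable, which mirrors the restriction stated above.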
Scope
Applicability      Details

Operating system   Kernel-level checkpoint and restart using the LSF checkpoint libraries works only
                   with supported operating system versions and architecture for:
                     SGI IRIX 6.4 and later
                     SGI Altix ProPack 3 and later

Job types          Non-interactive batch jobs submitted with bsub or bmod
                   Non-interactive batch jobs, including chunk jobs, checkpointed with bchkpnt
                   Non-interactive batch jobs migrated with bmig
                   Non-interactive batch jobs restarted with brestart
Feature: Job checkpoint and restart