LSF Version 7.3 - Platform LSF Configuration Reference

Applicability Details
Dependencies
UNIX and Windows user accounts must be valid on all hosts in the cluster, or the
correct type of account mapping must be enabled:
  - For a mixed UNIX/Windows cluster, UNIX/Windows user account mapping
    must be enabled.
  - For a cluster with a non-uniform user name space, between-host account
    mapping must be enabled.
  - For a MultiCluster environment with a non-uniform user name space,
    cross-cluster user account mapping must be enabled.
The checkpoint and restart executables run under the user account of the user who
submits the job. User accounts must have the correct permissions to:
  - Successfully run executables located in LSF_SERVERDIR or
    LSB_ECHKPNT_METHOD_DIR
  - Write to the checkpoint directory
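The two permission requirements above can be checked before submitting a job. The following is a minimal sketch, not part of LSF: the function name check_chkpnt_perms and the example checkpoint directory are hypothetical, and the sketch assumes the default executable names echkpnt and erestart. Run it as the user who submits the job.

```shell
#!/bin/sh
# check_chkpnt_perms EXEC_DIR CHKPNT_DIR   (hypothetical helper, not an LSF command)
# Returns 0 if the current user can execute echkpnt and erestart in EXEC_DIR
# and can write to CHKPNT_DIR; otherwise reports each problem and returns 1.
check_chkpnt_perms() {
    exec_dir=$1
    chkpnt_dir=$2
    status=0
    for exe in echkpnt erestart; do
        # Must be a regular file that the current user may execute.
        if [ ! -f "$exec_dir/$exe" ] || [ ! -x "$exec_dir/$exe" ]; then
            echo "cannot execute: $exec_dir/$exe" >&2
            status=1
        fi
    done
    # Must be an existing directory that the current user may write to.
    if [ ! -d "$chkpnt_dir" ] || [ ! -w "$chkpnt_dir" ]; then
        echo "cannot write to: $chkpnt_dir" >&2
        status=1
    fi
    return $status
}

# Example call (the checkpoint directory path is an assumption):
# check_chkpnt_perms "${LSB_ECHKPNT_METHOD_DIR:-$LSF_SERVERDIR}" "$HOME/chkpnt"
```

The check only covers file permissions on the local host. It does not confirm that the executables actually run to completion for a given application.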
The erestart.application executable must have access to the original command
line used to submit the job.
For user-level checkpoint and restart, you must have access to your application
object (.o) files.
To allow restart of a checkpointed job on a different host than the host on which
the job originally ran, both the original and the new hosts must:
  - Be binary compatible
  - Run the same dot version of the operating system for predictable results
  - Have network connectivity and read/execute permissions to the checkpoint and
    restart executables (in LSF_SERVERDIR by default)
  - Have network connectivity and read/write permissions to the checkpoint
    directory and the checkpoint file
  - Have access to all files open during job execution so that LSF can locate them
    using an absolute path name
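As a rough first-pass check of the operating system requirement, the `uname -srm` output of the two hosts can be compared. This is only a sketch under stated assumptions: the function name same_platform is hypothetical, the host names and ssh access in the usage comment are assumptions, and matching uname output is a proxy that does not prove binary compatibility or shared file access.

```shell
#!/bin/sh
# same_platform UNAME_A UNAME_B   (hypothetical helper, not an LSF command)
# Each argument is the output of `uname -srm` collected on one host.
# Returns 0 only if both strings are non-empty and identical.
same_platform() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Hypothetical usage; hostA, hostB, and ssh access are assumptions:
# a=$(ssh hostA uname -srm)
# b=$(ssh hostB uname -srm)
# same_platform "$a" "$b" || echo "hosts differ: '$a' vs '$b'" >&2
```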
Limitations
bmod cannot change the echkpnt and erestart executables associated with a
job.
Configuration to enable job checkpoint and restart
The job checkpoint and restart feature requires that a job be made checkpointable at the job
or queue level. LSF users can make jobs checkpointable by submitting jobs using bsub -k
and specifying a checkpoint directory. Queue administrators can make all jobs in a queue
checkpointable by specifying a checkpoint directory for the queue.
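The job-level form can be sketched as follows. The helper build_bsub_k is hypothetical and only prints the command line instead of running it, so it can be tried on a host without LSF. The directory /share/chkpnt, the 30-minute checkpoint period, the queue name, and the job command are all assumptions chosen for illustration. The queue-level equivalent, the CHKPNT parameter in lsb.queues, is shown in comments because it is configuration rather than shell.

```shell
#!/bin/sh
# build_bsub_k CHKPNT_DIR PERIOD_MINUTES COMMAND [ARGS...]   (hypothetical helper)
# Prints a bsub command line that makes the job checkpointable.
# Pass an empty PERIOD_MINUTES to specify only the checkpoint directory.
build_bsub_k() {
    chkpnt_dir=$1
    period=$2
    shift 2
    if [ -n "$period" ]; then
        k_arg="$chkpnt_dir $period"
    else
        k_arg=$chkpnt_dir
    fi
    # The -k value is quoted so the directory and period reach bsub as one argument.
    printf 'bsub -k "%s" %s\n' "$k_arg" "$*"
}

build_bsub_k /share/chkpnt 30 ./my_job input.dat
# prints: bsub -k "/share/chkpnt 30" ./my_job input.dat

# Queue level, in lsb.queues (queue name and path are assumptions):
#   Begin Queue
#   QUEUE_NAME = chkpnt_queue
#   CHKPNT     = /share/chkpnt 30
#   End Queue
```

Printing the command first makes it easy to review the quoting of the -k argument before submitting a real job.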