LSF Version 7.3 - Using Platform LSF HPC
LSF installs echkpnt.ls_dyna and erestart.ls_dyna, which are special versions of
echkpnt and erestart to allow checkpointing with LS-Dyna. Use bsub -a ls_dyna
to make sure your job uses these files.
The method name ls_dyna uses the esub for LS-Dyna jobs, which sets the
checkpointing method LSB_ECHKPNT_METHOD="ls_dyna" so that echkpnt.ls_dyna and
erestart.ls_dyna are used.
When you submit a checkpointing job, you specify a checkpoint directory.
Before the job starts running, LSF sets the environment variable LSB_CHKPNT_DIR
to a subdirectory of the checkpoint directory specified on the command line or by the
CHKPNT parameter in lsb.queues. This subdirectory is identified by the job ID and
contains only files related to the submitted job.
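As a rough sketch of such a submission, the helper below only prints a bsub
command line so that it can be reviewed before it is run. It assumes that the
bsub -k option is used to name the checkpoint directory. The function name, the
checkpoint path, and the job script name are hypothetical placeholders and do
not come from this guide.

```shell
# Hypothetical helper: print (do not run) a bsub command for a checkpointable
# LS-Dyna job. Assumes -k names the checkpoint directory and -a ls_dyna
# selects the LS-Dyna esub; adjust to match your site's configuration.
print_lsdyna_submit() {
    chkpnt_dir=$1    # checkpoint directory (placeholder path)
    job_script=$2    # job script redirected to the standard input of bsub
    printf 'bsub -a ls_dyna -k "%s" < %s\n' "$chkpnt_dir" "$job_script"
}

# Example with made-up paths:
print_lsdyna_submit /share/chkpnt ./lsdyna_job.sh
```

Printing the command rather than running it keeps the sketch safe to try on a
host where LSF is not installed.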
For checkpointing to work when running an LS-Dyna job from LSF, you must cd to
the directory that LSF sets in $LSB_CHKPNT_DIR after submitting LS-Dyna jobs. You
must change to this directory whether submitting a single job or multiple jobs. LS-Dyna
puts all its output files in this directory.
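One way a job script might make this directory change is sketched below. The
function name is hypothetical; LSB_CHKPNT_DIR is the variable LSF sets before
the job starts. The guard against an unset variable is an added precaution, not
something this guide requires.

```shell
# Sketch of the first step of an LS-Dyna job script: move into the directory
# that LSF chose for this job. The function name is hypothetical.
enter_chkpnt_dir() {
    if [ -z "${LSB_CHKPNT_DIR:-}" ]; then
        # The variable is only set for jobs submitted with checkpointing.
        echo "LSB_CHKPNT_DIR is not set" >&2
        return 1
    fi
    cd "$LSB_CHKPNT_DIR" || return 1
}

# In a job script this would come before the LS-Dyna command, for example:
#   enter_chkpnt_dir || exit 1
```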
When you checkpoint a job, LSF creates a checkpoint trigger file named D3KIL in the
working directory of the job.
The D3KIL file contains an entry depending on the desired checkpoint outcome:
◆ sw1. causes the job to checkpoint and exit. LS-Dyna writes to a restart data file
d3dump and exits.
◆ sw3. causes the job to checkpoint and continue running. LS-Dyna writes to a
restart data file d3dump and continues running until the next checkpoint.
LS-Dyna does not remove the D3KIL trigger file after checkpointing the job.
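The trigger file described above can be pictured with the sketch below. It is
an illustration only: LSF creates D3KIL itself when a job is checkpointed, and
the function name and the "exit" and "continue" keywords are hypothetical. The
entries sw1. and sw3. are copied as they appear in the text above.

```shell
# Illustration only: mimic the creation of the D3KIL trigger file so that its
# contents are visible. In normal use LSF writes this file, not the user.
write_d3kil() {
    workdir=$1    # working directory of the job
    action=$2     # hypothetical keyword: "exit" or "continue"
    case $action in
        exit)     entry="sw1." ;;   # checkpoint, then exit
        continue) entry="sw3." ;;   # checkpoint, then keep running
        *)
            echo "unknown action: $action" >&2
            return 1
            ;;
    esac
    printf '%s\n' "$entry" > "$workdir/D3KIL"
}
```

Because LS-Dyna leaves the trigger file in place, each call simply overwrites
the previous entry.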
If a job is restarted, LSF attempts to restart the job with the -r restart_file
option, which replaces any existing -i or -r options in the original LS-Dyna command.
LS-Dyna uses the checkpointed data to restart the process from that checkpoint,
rather than starting the entire job from the beginning.
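The option replacement described above can be sketched as follows. This is not
the erestart.ls_dyna implementation, only an illustration of the idea. It
assumes each -i or -r option is followed by its file name as a separate word;
the function name, the executable name my_lsdyna, and the input file name are
hypothetical.

```shell
# Hypothetical helper: print an LS-Dyna command line with every "-i FILE" or
# "-r FILE" pair removed and "-r RESTART_FILE" appended.
# Arguments are joined with spaces, so words containing spaces are not preserved.
rewrite_for_restart() {
    restart_file=$1
    shift
    out=""
    while [ "$#" -gt 0 ]; do
        case $1 in
            -i|-r)
                # Drop the option and, if present, the file name after it.
                shift
                if [ "$#" -gt 0 ]; then
                    shift
                fi
                ;;
            *)
                out="${out:+$out }$1"
                shift
                ;;
        esac
    done
    printf '%s\n' "${out:+$out }-r $restart_file"
}

# Example with made-up names:
rewrite_for_restart d3dump my_lsdyna -i input.k
```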
Each time a job is restarted, it is assigned a new job ID, and a new job subdirectory is
created in the checkpoint directory. Files in the checkpoint directory are never deleted
by LSF, but you may choose to remove old files once the LS-Dyna job is finished and
the job history is no longer required.
Submitting LS-Dyna jobs
To submit LS-Dyna jobs, redirect a job script to the standard input of bsub, including
parameters required for checkpointing. With job scripts, you can manage two limitations
of LS-Dyna job submissions: