LSF Version 7.3 - Administering Platform LSF

Chunk Job Dispatch
Migrating jobs with bmig changes the dispatch sequence of the chunk job members.
They are not redispatched in the order they were originally submitted.
Rerunnable chunk jobs
If the execution host becomes unavailable, rerunnable chunk job members are
removed from the queue and dispatched to a different execution host.
See Chapter 28, “Job Requeue and Job Rerun” for more information about
rerunnable jobs.
Checkpointing chunk jobs
Only running chunk jobs can be checkpointed. If bchkpnt -k is used, the job is also
killed after the checkpoint file has been created. If a chunk job in WAIT state is
checkpointed, mbatchd rejects the checkpoint request.
See Chapter 29, “Job Checkpoint, Restart, and Migration” for more information
about checkpointing jobs.
Fairshare policies and chunk jobs
Fairshare queues can use job chunking. Jobs are accumulated in the chunk job so
that priority is assigned to jobs correctly according to the fairshare policy that
applies to each user. Jobs belonging to other users are dispatched in other chunks.
TERMINATE_WHEN job control action
If the TERMINATE_WHEN job control action is applied to a chunk job, sbatchd
kills the chunk job element that is running and puts the rest of the waiting elements
into pending state to be rescheduled later.
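The effect described above can be pictured with a small toy model. This is only an illustrative sketch: the ChunkJob class, the apply_terminate_when function, and the job IDs are hypothetical names invented for this example and are not part of LSF or sbatchd.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ChunkJob:
    """Hypothetical model of a chunk job; not an LSF data structure."""
    running: Optional[str]                            # element currently in RUN state
    waiting: List[str] = field(default_factory=list)  # elements in WAIT state


def apply_terminate_when(chunk: ChunkJob) -> Dict[str, List[str]]:
    """Model the TERMINATE_WHEN action: kill the running element and
    return the waiting elements to pending state for later rescheduling."""
    killed = [chunk.running] if chunk.running is not None else []
    pending = list(chunk.waiting)
    chunk.running = None
    chunk.waiting = []
    return {"killed": killed, "pending": pending}


# Example with made-up job IDs: 101 is running, 102 and 103 are waiting.
chunk = ChunkJob(running="101", waiting=["102", "103"])
result = apply_terminate_when(chunk)
print(result)
```

In this model only the running element is lost; the waiting elements keep their order when they return to pending state, which is an assumption of the sketch rather than something the text above guarantees.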
Enforce resource usage limits on chunk jobs
By default, resource usage limits are not enforced for chunk jobs because chunk jobs
are typically too short to allow LSF to collect resource usage.
1 To enforce resource limits for chunk jobs, define LSB_CHUNK_RUSAGE=Y in
lsf.conf. Limits may not be enforced for chunk jobs that take less than a
minute to run.
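As a quick sanity check, a short script can confirm that the parameter is set in a copy of lsf.conf. The following is a minimal sketch, not an LSF tool: the function name chunk_rusage_enabled and the sample file contents are hypothetical, and it assumes simple KEY=VALUE lines with # comments, that the last definition wins, and that a lowercase y is also accepted.

```python
def chunk_rusage_enabled(lsf_conf_text: str) -> bool:
    """Return True if LSB_CHUNK_RUSAGE is set to Y in the given lsf.conf text."""
    value = None
    for raw_line in lsf_conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == "LSB_CHUNK_RUSAGE":
            value = val.strip().strip('"')        # assumption: last definition wins
    return value is not None and value.upper() == "Y"


# Hypothetical excerpt; the log directory path is made up for illustration.
sample_conf = """\
# excerpt of a hypothetical lsf.conf
LSF_LOGDIR=/var/log/lsf
LSB_CHUNK_RUSAGE=Y
"""
print(chunk_rusage_enabled(sample_conf))
```

The check reads text rather than a file path so that it can be tried on a pasted excerpt before touching a live configuration.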
Action (Command)         Job State   Effect on Job (State)
                         WAIT        Job finishes (NJOBS -1, PEND -1)
Resume (bresume)         USUSP       Entire chunk is resumed (RUN +1, USUSP -1)
Migrate (bmig)           WAIT        Removed from chunk
Switch queue (bswitch)   RUN         Job is removed from the chunk and switched; all other
                                     WAIT jobs are requeued to PEND
                         WAIT        Only the WAIT job is removed from the chunk and
                                     switched, and requeued to PEND
Checkpoint (bchkpnt)     RUN         Job is checkpointed normally
Modify (bmod)            PEND        Removed from the chunk to be scheduled later
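For scripts that need to reason about these effects, the table rows that name a command can be transcribed into a simple lookup. This is an illustrative sketch only: CHUNK_ACTION_EFFECTS and effect_of are hypothetical names, and the mapping covers just the command and state combinations listed here, so a missing entry means "not listed", not "not allowed".

```python
from typing import Dict, Optional, Tuple

# (command, job state) -> effect, copied from the table rows that name a command.
CHUNK_ACTION_EFFECTS: Dict[Tuple[str, str], str] = {
    ("bresume", "USUSP"): "Entire chunk is resumed (RUN +1, USUSP -1)",
    ("bmig", "WAIT"): "Removed from chunk",
    ("bswitch", "RUN"): ("Job is removed from the chunk and switched; "
                         "all other WAIT jobs are requeued to PEND"),
    ("bswitch", "WAIT"): ("Only the WAIT job is removed from the chunk and "
                          "switched, and requeued to PEND"),
    ("bchkpnt", "RUN"): "Job is checkpointed normally",
    ("bmod", "PEND"): "Removed from the chunk to be scheduled later",
}


def effect_of(command: str, state: str) -> Optional[str]:
    """Return the documented effect, or None if the combination is not listed."""
    return CHUNK_ACTION_EFFECTS.get((command, state))


print(effect_of("bmig", "WAIT"))
print(effect_of("bswitch", "RUN"))
```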