Platform LSF Administration Guide Version 6.2

Submitting and Controlling Chunk Jobs
Administering Platform LSF
Migrating jobs with bmig changes the dispatch sequence of the chunk job members.
They are not redispatched in the order in which they were originally submitted.
Rerunnable chunk jobs
If the execution host becomes unavailable, rerunnable chunk job members are removed
from the queue and dispatched to a different execution host.
See Chapter 24, “Job Requeue and Job Rerun” for more information about rerunnable
jobs.
Checkpointing chunk jobs
Only running chunk jobs can be checkpointed. If bchkpnt -k is used, the job is also
killed after the checkpoint file has been created. If a chunk job in WAIT state is
checkpointed, mbatchd rejects the checkpoint request.
See Chapter 25, “Job Checkpoint, Restart, and Migration” for more information about
checkpointing jobs.
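Because mbatchd rejects checkpoint requests for chunk job members in WAIT state, a
wrapper script can check the job state before calling bchkpnt. The following is a
minimal sketch only and is not part of LSF. The function name checkpoint_if_running
is hypothetical, and the sketch assumes the default bjobs output layout, in which a
header line is followed by a line whose third column is the job state.

```shell
# Hypothetical helper (not an LSF command): checkpoint a chunk job member
# only if bjobs reports it in the RUN state.
# Assumption: default bjobs layout, a header line followed by
# "JOBID USER STAT ..." for the requested job.
checkpoint_if_running() {
    job_id=$1
    # Take the STAT column (third field) from the first line after the header.
    state=$(bjobs "$job_id" 2>/dev/null | awk 'NR == 2 { print $3 }')
    if [ "$state" = "RUN" ]; then
        bchkpnt "$job_id"
    else
        echo "Job $job_id is in state '${state:-unknown}'; not checkpointed" >&2
        return 1
    fi
}
```

For example, checkpoint_if_running 1234 calls bchkpnt 1234 only when job 1234 is
running. For a job in WAIT state, it prints a message and returns a nonzero status
without contacting mbatchd.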
Fairshare policies and chunk jobs
Fairshare queues can use job chunking. Jobs are accumulated in the chunk job so that
priority is assigned to jobs correctly according to the fairshare policy that applies to each
user. Jobs belonging to other users are dispatched in other chunks.
TERMINATE_WHEN job control action
If the TERMINATE_WHEN job control action is applied to a chunk job, sbatchd
kills the chunk job element that is running and puts the rest of the waiting elements into
pending state to be rescheduled later.
Enforcing resource usage limits on chunk jobs
By default, resource usage limits are not enforced for chunk jobs because chunk jobs are
typically too short to allow LSF to collect resource usage.
To enforce resource limits for chunk jobs, define LSB_CHUNK_RUSAGE=Y in
lsf.conf. Limits may not be enforced for chunk jobs that take less than a minute to
run.
See Chapter 29, “Runtime Resource Usage Limits” for more information.
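To confirm that the setting described above is present, a small shell check can
search lsf.conf for the parameter. This is a sketch under stated assumptions, not an
LSF tool: the function name chunk_rusage_enabled is hypothetical, and it only looks
for an uncommented LSB_CHUNK_RUSAGE=Y line in the file it is given. It does not
verify that the daemons have been reconfigured to pick up the change.

```shell
# Hypothetical helper (not an LSF command): succeed if the given lsf.conf
# contains an uncommented LSB_CHUNK_RUSAGE=Y line, fail otherwise.
chunk_rusage_enabled() {
    conf_file=$1
    # Lines starting with '#' do not match, so commented-out settings are ignored.
    grep -Eq '^[[:space:]]*LSB_CHUNK_RUSAGE=Y[[:space:]]*$' "$conf_file"
}
```

A possible use, assuming LSF_ENVDIR points to the directory that contains lsf.conf:
chunk_rusage_enabled "$LSF_ENVDIR/lsf.conf" && echo "chunk job limits enabled"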
Action (Command)        Job State   Effect on Job (State)
Migrate (bmig)          WAIT        Removed from chunk
Switch queue (bswitch)  RUN         Job is removed from the chunk and switched; all
                                    other WAIT jobs are requeued to PEND
                        WAIT        Only the WAIT job is removed from the chunk and
                                    switched, and requeued to PEND
Checkpoint (bchkpnt)    RUN         Job is checkpointed normally
Modify (bmod)           PEND        Removed from the chunk to be scheduled later