LSF Version 7.3 - Administering Platform LSF
Submitting and Controlling Chunk Jobs
When a job is submitted to a queue or application profile configured with the
CHUNK_JOB_SIZE parameter, LSF attempts to place the job in an existing chunk.
A job is added to an existing chunk if it has the same characteristics as the first job
in the chunk:
◆ Submitting user
◆ Resource requirements
◆ Host requirements
◆ Queue or application profile
◆ Job priority
If a suitable host is found to run the job, but there is no chunk available with the
same characteristics, LSF creates a new chunk.
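One way to picture the matching rule is to group jobs by a key built from the five characteristics above. The following is a minimal Python sketch under stated assumptions. Jobs arrive in submission order, and a suitable host is always available, so host selection is omitted. An open chunk also stops accepting jobs once it holds CHUNK_JOB_SIZE members. Job, chunk_key, and place_jobs are hypothetical names and are not part of any LSF interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Job:
    """Hypothetical job record holding only the attributes chunking compares."""
    job_id: int
    user: str        # submitting user
    res_req: str     # resource requirements
    host_req: str    # host requirements
    queue: str       # queue or application profile
    priority: int    # job priority


ChunkKey = Tuple[str, str, str, str, int]


def chunk_key(job: Job) -> ChunkKey:
    """The characteristics a job must share with the first job in a chunk."""
    return (job.user, job.res_req, job.host_req, job.queue, job.priority)


def place_jobs(jobs: List[Job], chunk_job_size: int) -> List[List[Job]]:
    """Add each job to the open chunk with a matching key. Start a new chunk
    when none exists or the open one already holds chunk_job_size jobs."""
    chunks: List[List[Job]] = []
    open_chunks: Dict[ChunkKey, List[Job]] = {}
    for job in jobs:
        key = chunk_key(job)
        chunk = open_chunks.get(key)
        if chunk is None or len(chunk) >= chunk_job_size:
            chunk = []
            chunks.append(chunk)
            open_chunks[key] = chunk
        chunk.append(job)
    return chunks


jobs = [
    Job(1, "alice", "select[mem>100]", "", "short", 10),
    Job(2, "alice", "select[mem>100]", "", "short", 10),
    Job(3, "bob", "select[mem>100]", "", "short", 10),
    Job(4, "alice", "select[mem>100]", "", "short", 10),
]
print([[j.job_id for j in c] for c in place_jobs(jobs, chunk_job_size=2)])
# prints [[1, 2], [3], [4]]
```

Job 3 gets its own chunk because its submitting user differs. Job 4 matches the first chunk but starts a new one because that chunk is already full.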
Resources reserved for any member of the chunk are reserved at the time the chunk
is dispatched and held until the whole chunk finishes running. Other jobs requiring
the same resources are not dispatched until the chunk job is done.
For example, if all jobs in the chunk require a software license, the license is checked
out and each chunk job member uses it in turn. The license is not released until the
last chunk job member is finished running.
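The following timeline illustrates the license example. It is a minimal Python sketch that assumes a single license and known member run times. Because members run back to back, the license stays checked out from dispatch until the last member ends. The function names are hypothetical and do not come from LSF.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def chunk_timeline(dispatch: float, runtimes: List[float]) -> Tuple[List[Interval], Interval]:
    """Return each member's (start, end) and the interval the license is held.

    Members run one after another, so each starts when the previous one ends.
    The license is checked out once at dispatch and released only when the
    last member finishes.
    """
    members: List[Interval] = []
    t = dispatch
    for runtime in runtimes:
        members.append((t, t + runtime))
        t += runtime
    return members, (dispatch, t)


def can_dispatch_other_job(now: float, license_held: Interval) -> bool:
    """Another job needing the same (single) license waits until release."""
    return now >= license_held[1]


members, held = chunk_timeline(0.0, [10.0, 5.0, 20.0])
print(members)                             # [(0.0, 10.0), (10.0, 15.0), (15.0, 35.0)]
print(held)                                # (0.0, 35.0)
print(can_dispatch_other_job(15.0, held))  # False
```

At time 15.0, the second member has finished, but the license is still held for the third member. Any other job needing it waits until 35.0.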
WAIT status
When sbatchd receives a chunk job, it does not start all member jobs at once. A
chunk job occupies a single job slot. Even if other slots are available, the chunk job
members must run one at a time in the job slot they occupy. The remaining jobs in
the chunk that are waiting to run are displayed as WAIT by bjobs. Any jobs in WAIT
status are included in the count of pending jobs by bqueues and busers. In the
NJOBS column, the bhosts command counts the entire chunk job as the single job
slot it occupies.
The bhist -l command shows jobs in WAIT status as Waiting ...
The bjobs -l command does not display a WAIT reason in the list of pending jobs.
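The single-slot behavior can be modeled as follows. In this minimal Python sketch, the first unfinished member is RUN and every later member is WAIT. WAIT members count as pending, and the chunk as a whole occupies one slot. It is an illustrative model only. The function names and the "DONE" bookkeeping are assumptions, not how sbatchd, bqueues, or bhosts are implemented.

```python
from typing import Dict, List


def member_states(member_ids: List[int], finished: int) -> Dict[int, str]:
    """States of chunk members after `finished` of them have completed.

    Only one member runs at a time in the chunk's single slot. Later
    members show as WAIT until their turn comes.
    """
    states: Dict[int, str] = {}
    for position, job_id in enumerate(member_ids):
        if position < finished:
            states[job_id] = "DONE"
        elif position == finished:
            states[job_id] = "RUN"
        else:
            states[job_id] = "WAIT"
    return states


def pending_count(states: Dict[int, str]) -> int:
    """Pending count in the style of bqueues/busers: WAIT members are included."""
    return sum(1 for s in states.values() if s in ("PEND", "WAIT"))


def slots_occupied(states: Dict[int, str]) -> int:
    """The whole chunk occupies one job slot while any member is unfinished."""
    return 1 if any(s in ("RUN", "WAIT") for s in states.values()) else 0


states = member_states([101, 102, 103], finished=1)
print(states)                  # {101: 'DONE', 102: 'RUN', 103: 'WAIT'}
print(pending_count(states))   # 1
print(slots_occupied(states))  # 1
```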
Controlling chunk jobs
Job controls affect the state of the members of a chunk job. You can perform the
following actions on jobs in a chunk job:
Action (Command)  Job State  Effect on Job (State)
Suspend (bstop)   PEND       Removed from chunk (PSUSP)
                  RUN        All jobs in the chunk are suspended
                             (NRUN -1, NSUSP +1)
                  USUSP      No change
                  WAIT       Removed from chunk (PSUSP)
Kill (bkill)      PEND       Removed from chunk (NJOBS -1, PEND -1)
                  RUN        Job finishes, next job in the chunk starts if one exists
                             (NJOBS -1, PEND -1)
                  USUSP      Job finishes, next job in the chunk starts if one exists
                             (NJOBS -1, PEND -1, SUSP -1, RUN +1)
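The job-control table can also be read as a lookup from (command, job state) to an effect and a set of counter changes. The following minimal Python sketch transcribes only the rows listed in this section. Combinations the table does not cover raise KeyError instead of being guessed. CHUNK_CONTROL_EFFECTS and apply_control are hypothetical names, and the counter names are copied from the table as written.

```python
from typing import Dict, Tuple

# (command, job state) -> (effect on job, counter deltas), transcribed from the
# job-control table in this section. Unlisted combinations are deliberately absent.
CHUNK_CONTROL_EFFECTS: Dict[Tuple[str, str], Tuple[str, Dict[str, int]]] = {
    ("bstop", "PEND"): ("Removed from chunk (PSUSP)", {}),
    ("bstop", "RUN"): ("All jobs in the chunk are suspended", {"NRUN": -1, "NSUSP": 1}),
    ("bstop", "USUSP"): ("No change", {}),
    ("bstop", "WAIT"): ("Removed from chunk (PSUSP)", {}),
    ("bkill", "PEND"): ("Removed from chunk", {"NJOBS": -1, "PEND": -1}),
    ("bkill", "RUN"): ("Job finishes, next job in the chunk starts if one exists",
                       {"NJOBS": -1, "PEND": -1}),
    ("bkill", "USUSP"): ("Job finishes, next job in the chunk starts if one exists",
                         {"NJOBS": -1, "PEND": -1, "SUSP": -1, "RUN": 1}),
}


def apply_control(command: str, state: str,
                  counters: Dict[str, int]) -> Tuple[str, Dict[str, int]]:
    """Look up the documented effect and return it with updated counters.

    Raises KeyError for combinations the table does not cover.
    """
    effect, deltas = CHUNK_CONTROL_EFFECTS[(command, state)]
    updated = dict(counters)  # leave the caller's counters untouched
    for name, delta in deltas.items():
        updated[name] = updated.get(name, 0) + delta
    return effect, updated


effect, counters = apply_control("bkill", "USUSP",
                                 {"NJOBS": 3, "PEND": 2, "SUSP": 1, "RUN": 0})
print(effect)    # Job finishes, next job in the chunk starts if one exists
print(counters)  # {'NJOBS': 2, 'PEND': 1, 'SUSP': 0, 'RUN': 1}
```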