Platform LSF Administration Guide Version 6.2
Chapter 26
Chunk Job Dispatch
Submitting and Controlling Chunk Jobs
When a job is submitted to a queue configured with the CHUNK_JOB_SIZE
parameter, LSF attempts to place the job in an existing chunk. A job is added to an
existing chunk if it has the same characteristics as the first job in the chunk:
◆ Submitting user
◆ Resource requirements
◆ Host requirements
◆ Queue
If a suitable host is found to run the job, but there is no chunk available with the same
characteristics, LSF creates a new chunk.
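The placement rule above can be sketched in Python as follows. This is an illustrative model only, not LSF's scheduler code: the Job and Chunk classes, the place_job function, and the chunk_job_size argument (standing in for the queue's CHUNK_JOB_SIZE value) are hypothetical names, and treating CHUNK_JOB_SIZE as a cap on the number of members in a chunk is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Job:
    # Hypothetical job record; field names are illustrative only.
    job_id: int
    user: str
    res_req: str
    host_req: str
    queue: str

    def key(self) -> Tuple[str, str, str, str]:
        # The four characteristics compared against the first job in a chunk.
        return (self.user, self.res_req, self.host_req, self.queue)


@dataclass
class Chunk:
    members: List[Job] = field(default_factory=list)


def place_job(job: Job, chunks: List[Chunk], chunk_job_size: int) -> Chunk:
    """Add the job to a matching chunk that has room, or create a new chunk."""
    for chunk in chunks:
        # Assumption: a full chunk (chunk_job_size members) accepts no more jobs.
        if (
            chunk.members
            and chunk.members[0].key() == job.key()
            and len(chunk.members) < chunk_job_size
        ):
            chunk.members.append(job)
            return chunk
    # No chunk with the same characteristics (or none with room): start a new one.
    new_chunk = Chunk(members=[job])
    chunks.append(new_chunk)
    return new_chunk
```

In this model, two jobs from the same user with identical requirements and queue land in the same chunk, while a job from a different user starts a chunk of its own.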
Resources reserved for any member of the chunk are reserved at the time the chunk is
dispatched and held until the whole chunk finishes running. Other jobs requiring the
same resources are not dispatched until the chunk job is done.
For example, if all jobs in the chunk require a software license, the license is checked out
and each chunk job member uses it in turn. The license is not released until the last
chunk job member is finished running.
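To make the cost of this behavior concrete, the sketch below estimates how long a reserved resource stays held when chunk members run one after another. The function name and the run times are hypothetical, and the model ignores dispatch overhead between members.

```python
from typing import Sequence


def reservation_hold_time(member_run_times: Sequence[float]) -> float:
    """Return how long a resource reserved by a chunk stays held.

    Members run one at a time, and the reservation is released only
    after the last member finishes, so the hold time is the sum of
    all member run times, even if only one member needs the resource.
    """
    return float(sum(member_run_times))


# Three members with run times of 2, 5, and 3 minutes (made-up values).
print(reservation_hold_time([2, 5, 3]))  # 10.0
```

Under these assumptions, a license needed by a single 2-minute member would still be unavailable to other jobs for the full 10 minutes.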
WAIT status
When sbatchd receives a chunk job, it will not start all member jobs at once. A chunk
job occupies a single job slot. Even if other slots are available, the chunk job members
must run one at a time in the job slot they occupy. The remaining jobs in the chunk that
are waiting to run are displayed as WAIT by bjobs. Any jobs in WAIT status are included
in the count of pending jobs by bqueues and busers. The bhosts command shows
the single job slot occupied by the entire chunk job in the number of jobs shown in the
NJOBS column.
The bhist -l command shows jobs in WAIT status as Waiting ...
The bjobs -l command does not display a WAIT reason in the list of pending jobs.
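The accounting described in this section can be modeled with a short Python sketch. It does not call or parse any LSF command; the function names and the dictionary keys are hypothetical and only mirror the counting rules stated above.

```python
from collections import Counter
from typing import Dict, List


def chunk_member_states(n_members: int) -> List[str]:
    """States of a freshly started chunk: the first member runs, the rest wait."""
    if n_members < 1:
        return []
    return ["RUN"] + ["WAIT"] * (n_members - 1)


def summarize(states: List[str]) -> Dict[str, int]:
    """Summarize one chunk the way the text says the commands count it."""
    counts = Counter(states)
    return {
        # WAIT members are included in the pending count (bqueues, busers).
        "pending_count": counts["PEND"] + counts["WAIT"],
        "running_count": counts["RUN"],
        # The whole chunk occupies a single job slot on the host (bhosts).
        "slots_used": 1 if states else 0,
    }
```

For a four-member chunk, this model reports one running job, three jobs counted as pending, and one occupied job slot.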
Controlling chunk jobs
Job controls affect the state of the members of a chunk job. You can perform the
following actions on jobs in a chunk job:
Action (Command)    Job State    Effect on Job (State)
Suspend (bstop)     PEND         Removed from chunk (PSUSP)
                    RUN          All jobs in the chunk are suspended
                                 (NRUN -1, NSUSP +1)
                    USUSP        No change
                    WAIT         Removed from chunk (PSUSP)
Kill (bkill)        PEND         Removed from chunk (NJOBS -1, PEND -1)
                    RUN          Job finishes, next job in the chunk starts if one exists
                                 (NJOBS -1, PEND -1)
                    USUSP        Job finishes, next job in the chunk starts if one exists
                                 (NJOBS -1, PEND -1, SUSP -1, RUN +1)
                    WAIT         Job finishes (NJOBS -1, PEND -1)
Resume (bresume)    USUSP        Entire chunk is resumed (RUN +1, USUSP -1)
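For scripting or quick reference, the table above can be transcribed into a lookup keyed by command and current job state. This is a hedged convenience sketch: the CHUNK_JOB_EFFECTS dictionary and the effect_of function are hypothetical names, and the sketch only returns the documented text without contacting LSF.

```python
from typing import Dict, Tuple

# (command, current job state) -> effect, transcribed from the table above.
CHUNK_JOB_EFFECTS: Dict[Tuple[str, str], str] = {
    ("bstop", "PEND"): "Removed from chunk (PSUSP)",
    ("bstop", "RUN"): "All jobs in the chunk are suspended (NRUN -1, NSUSP +1)",
    ("bstop", "USUSP"): "No change",
    ("bstop", "WAIT"): "Removed from chunk (PSUSP)",
    ("bkill", "PEND"): "Removed from chunk (NJOBS -1, PEND -1)",
    ("bkill", "RUN"): "Job finishes, next job in the chunk starts if one exists "
                      "(NJOBS -1, PEND -1)",
    ("bkill", "USUSP"): "Job finishes, next job in the chunk starts if one exists "
                        "(NJOBS -1, PEND -1, SUSP -1, RUN +1)",
    ("bkill", "WAIT"): "Job finishes (NJOBS -1, PEND -1)",
    ("bresume", "USUSP"): "Entire chunk is resumed (RUN +1, USUSP -1)",
}


def effect_of(command: str, state: str) -> str:
    """Return the documented effect, or raise if the table has no such row."""
    try:
        return CHUNK_JOB_EFFECTS[(command, state)]
    except KeyError:
        raise ValueError(
            f"{command} on a {state} chunk job member is not listed in the table"
        ) from None
```

Raising an error for unlisted combinations, rather than guessing, keeps the sketch from implying behavior the table does not document.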