LSF Version 7.3 - Administering Platform LSF

Chapter 32
Running Parallel Jobs
Contents
How LSF Runs Parallel Jobs
Preparing Your Environment to Submit Parallel Jobs to LSF
Submitting Parallel Jobs
Starting Parallel Tasks with LSF Utilities
Job Slot Limits For Parallel Jobs
Specifying a Minimum and Maximum Number of Processors
Specifying a First Execution Host
Controlling Processor Allocation Across Hosts
Running Parallel Processes on Homogeneous Hosts
Limiting the Number of Processors Allocated
Reserving Processors
Reserving Memory for Pending Parallel Jobs
Backfill Scheduling: Allowing Jobs to Use Reserved Job Slots
Parallel Fairshare
How Deadline Constraint Scheduling Works For Parallel Jobs
Optimized Preemption of Parallel Jobs
How LSF Runs Parallel Jobs
When LSF runs a job, it sets the LSB_HOSTS environment variable to the names of
the hosts running the batch job. For a parallel batch job, LSB_HOSTS contains the
complete list of hosts that LSF has allocated to that job.
LSF starts one controlling process for the parallel batch job on the first host in the
host list. Your parallel application is responsible for reading the LSB_HOSTS
environment variable to get the list of hosts, and for starting the parallel job
components on all the other allocated hosts.
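The division of work described above (LSF supplies the host list, the application does the rest) can be sketched as follows. This is a minimal illustration, assuming LSB_HOSTS is a whitespace-separated list of host names; the function names read_allocated_hosts and plan_components are hypothetical, and the sketch only prints a plan instead of actually starting remote components.

```python
import os


def read_allocated_hosts(env=None):
    """Return the host list from LSB_HOSTS.

    Assumption: LSB_HOSTS holds whitespace-separated host names.
    Returns an empty list when the variable is not set.
    """
    env = os.environ if env is None else env
    return env.get("LSB_HOSTS", "").split()


def plan_components(hosts):
    """Split the allocation into the controlling host and the rest.

    The first host in the list runs the controlling process; every
    remaining entry is a place where the application (not LSF) must
    start a parallel component itself.
    """
    if not hosts:
        raise RuntimeError("LSB_HOSTS is empty; is this running under LSF?")
    return hosts[0], hosts[1:]


if __name__ == "__main__":
    # Sample value for illustration; under LSF this comes from the job environment.
    hosts = read_allocated_hosts({"LSB_HOSTS": "hostA hostB hostC"})
    controller, others = plan_components(hosts)
    print("controlling process on", controller)
    for i, host in enumerate(others, start=1):
        # A real application would start component i on `host` here,
        # for example through an LSF remote-execution utility.
        print(f"component {i} -> {host}")
```

Keeping the parsing separate from the launch step makes it easy to swap in whatever remote-start mechanism a given parallel package uses.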
LSF provides a generic interface to parallel programming packages, so any
parallel package can be supported by writing shell scripts or wrapper programs.
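A typical wrapper translates LSB_HOSTS into whatever input the parallel package expects. The sketch below assumes that a host appears once in LSB_HOSTS for each processor allocated on it, and that the target package reads a "host:count" machine file; both the assumed format and the function name machine_file_lines are hypothetical and should be adapted to the package actually in use.

```python
import os
from collections import Counter


def machine_file_lines(lsb_hosts):
    """Collapse repeated host names into 'host:count' lines.

    Assumption: each occurrence of a host in LSB_HOSTS is one allocated
    processor. Counter keeps first-seen order (Python 3.7+), so the
    first host in LSB_HOSTS stays first in the output.
    """
    counts = Counter(lsb_hosts.split())
    return [f"{host}:{n}" for host, n in counts.items()]


if __name__ == "__main__":
    # Fall back to a sample value when run outside an LSF job.
    lines = machine_file_lines(os.environ.get("LSB_HOSTS", "hostA hostA hostB"))
    # A wrapper would write these lines to a file and hand it to the
    # package's own launcher; here they are just printed.
    print("\n".join(lines))
```

Preserving the original order matters because LSF runs the controlling process on the first host in the list.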