LSF Version 7.3 - Administering Platform LSF
Using Goal-Oriented SLA Scheduling
Service classes
SLA definitions consist of service-level goals that are expressed in individual service
classes. A service class is the actual configured policy that sets the service-level goals
for the LSF system. The SLA defines the workload (jobs or other services) and the users
who need the work done, while the service class that addresses the SLA defines the
individual goals and a time window during which the service class is active.
Service-level goals
You configure the following kinds of goals:
Deadline goals
A specified number of jobs should be completed within a specified time window.
For example, run all jobs submitted over a weekend.
Velocity goals
Expressed as concurrently running jobs. For example: maintain 10 running jobs
between 9:00 a.m. and 5:00 p.m. Velocity goals are well suited for short jobs (run
time less than one hour). Such jobs leave the system quickly, and configuring a
velocity goal ensures a steady flow of jobs through the system.
Throughput goals
Expressed as number of finished jobs per hour. For example: finish 15 jobs per hour
between the hours of 6:00 p.m. and 7:00 a.m. Throughput goals are suitable for
medium to long running jobs. These jobs stay longer in the system, so you typically
want to control their rate of completion rather than their flow.
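The relationship between a throughput goal and the number of concurrently running jobs can be estimated with Little's law (concurrency = completion rate x average run time). The sketch below is only an illustration under a steady-state assumption; `slots_for_throughput` is a hypothetical helper and does not represent how LSF itself performs the calculation.

```python
import math


def slots_for_throughput(jobs_per_hour: float, avg_run_hours: float) -> int:
    """Estimate how many jobs must run concurrently to sustain a finish rate.

    Applies Little's law: concurrency = rate * time in system.
    Illustrative estimate only; not LSF's internal calculation.
    """
    if jobs_per_hour < 0 or avg_run_hours < 0:
        raise ValueError("rate and run time must be non-negative")
    return math.ceil(jobs_per_hour * avg_run_hours)


# The throughput example from the text: finish 15 jobs per hour.
print(slots_for_throughput(15, 2.0))  # 2-hour jobs   -> 30
print(slots_for_throughput(15, 0.5))  # 30-minute jobs -> 8
```

Under this estimate, the longer the jobs run, the more of them must be running at once to hold the same completion rate, which is one reason a rate-based goal suits longer jobs better than a fixed count of running jobs.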
Combining different types of goals
You might want to set velocity goals to maximize quick work during the day, and
set deadline and throughput goals to manage longer running work on nights and
over weekends.
How service classes perform goal-oriented scheduling
Goal-oriented scheduling makes use of other, lower-level LSF policies like queues
and host partitions to satisfy the service-level goal that the service class expresses.
Service class decisions are considered before any queue or host partition
decisions. Limits are still enforced with respect to lower-level scheduling
objects like queues, hosts, and users.
Optimum number of running jobs
As jobs are submitted, LSF determines the optimum number of job slots (or
concurrently running jobs) needed for the service class to meet its service-level
goals. LSF schedules a number of jobs at least equal to the optimum number of slots
calculated for the service class.
LSF attempts to meet SLA goals in the most efficient way, using the optimum
number of job slots so that other service classes or other types of work in the cluster
can still progress. For example, in a service class that defines a deadline goal, LSF
spreads the work over the entire time window for the goal. It does not allocate as
many slots as possible at the beginning in order to finish ahead of the deadline,
which would block other work.
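To make the idea of spreading deadline work over the whole time window concrete, here is a minimal sketch of one way to estimate the smallest number of slots that still meets a deadline. It assumes every job has the same run time and that each slot runs jobs back to back. The function `slots_for_deadline` and its parameters are hypothetical and are not LSF's actual algorithm.

```python
import math


def slots_for_deadline(pending_jobs: int, avg_run_hours: float,
                       hours_left: float) -> int:
    """Smallest slot count that finishes all pending jobs before the deadline.

    Assumes equal run times and that each slot runs jobs one after another.
    Illustrative only; not LSF's internal calculation.
    """
    if pending_jobs < 0 or avg_run_hours <= 0 or hours_left < 0:
        raise ValueError("invalid input")
    if pending_jobs == 0:
        return 0
    # Number of whole jobs a single slot can complete in the remaining time.
    jobs_per_slot = int(hours_left // avg_run_hours)
    if jobs_per_slot == 0:
        raise ValueError("not enough time left to finish even one job")
    return math.ceil(pending_jobs / jobs_per_slot)


# 120 one-hour jobs with a 12-hour window: 10 slots are enough,
# which leaves the rest of the cluster free for other work.
print(slots_for_deadline(120, 1.0, 12.0))  # 10
# Halfway through, with 90 jobs still pending, more slots are needed.
print(slots_for_deadline(90, 1.0, 6.0))    # 15
```

Re-evaluating the estimate as the window shrinks shows why the number of slots in use can rise over time: work that falls behind has to be made up with more concurrent jobs before the deadline.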