LSF Version 7.3 - Administering Platform LSF
Goal-Oriented SLA-Driven Scheduling
LSF host partitions are typically used to implement user-based fairshare policies.
Complete the following steps to allow the EGO-enabled SLA to allocate hosts to
specific users.
1 Log on to the LSF master host as the cluster administrator.
2 Log on to the Platform Management Console:
a Define an EGO resource group that contains the selected hosts.
b Define an EGO consumer that is associated with this resource group.
3 Edit lsb.users and define a user group that configures the fairshare policy.
4 Edit lsb.serviceclasses and define a service class that uses the consumer and user group.
5 Run badmin reconfig to reconfigure mbatchd.
With this configuration, when a job is submitted to the defined SLA, it is scheduled according to the user-based fairshare policy within the selected hosts.
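As a sketch of step 3, the fairshare user group in lsb.users might look like the following (the group name, user names, and share values are hypothetical):

```
# lsb.users -- a user group whose members share the SLA hosts 2:1
Begin UserGroup
GROUP_NAME    GROUP_MEMBER       USER_SHARES
ugroup1       (user1 user2)      ([user1, 2] [user2, 1])
End UserGroup
```

After badmin reconfig, jobs are submitted to the service class with the -sla option, for example bsub -sla sla1 myjob (sla1 is a hypothetical service class name).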
Advanced configuration
lsb.serviceclasses
◆ EGO_RES_REQ=res_req—EGO resource requirement that specifies the characteristics of the hosts that EGO will assign to the SLA. Must be a valid EGO resource requirement. The EGO resource requirement string supports the select section, but the format is different from LSF resource requirements. For example:
EGO_RES_REQ=select(linux && maxmem > 100)
◆ MAX_HOST_IDLE_TIME=seconds—number of seconds that the SLA holds on to hosts that have no jobs running before LSF releases them to EGO. Each SLA can configure a different idle time. The default is 120 seconds. Do not set this parameter to a small value, or LSF may release hosts too quickly.
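Both parameters belong in a ServiceClass section of lsb.serviceclasses. A minimal stanza might look like the following (the service class name, consumer name, and velocity goal are illustrative, not prescribed values):

```
# lsb.serviceclasses -- hypothetical EGO-enabled service class
Begin ServiceClass
NAME               = sla_linux
CONSUMER           = linux_consumer
GOALS              = [VELOCITY 5 timeWindow ()]
EGO_RES_REQ        = select(linux && maxmem > 100)
MAX_HOST_IDLE_TIME = 300
End ServiceClass
```

Here idle hosts are held for 300 seconds rather than the 120-second default, trading slower release to EGO for fewer host reallocations.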
lsb.params
◆ DEFAULT_SLA_VELOCITY—the number of slots that the SLA should request for parallel jobs running in the SLA. By default, an EGO-enabled SLA requests slots from EGO based on the number of jobs the SLA needs to run, not the number of slots required by the jobs. If the jobs themselves require more than one slot, they will remain pending. To avoid this for parallel jobs, set DEFAULT_SLA_VELOCITY to the total number of slots that parallel jobs are expected to use.
◆ MBD_EGO_CONNECT_TIMEOUT—timeout, in seconds, for network I/O when connecting to the EGO vemkd. The default is 3 seconds.
◆ MBD_EGO_READ_TIMEOUT—timeout, in seconds, for network I/O when reading from the EGO vemkd. The default is 3 seconds.
◆ MBD_EGO_TIME2LIVE—how long EGO should keep information about host allocations in case mbatchd restarts. The default is 1440 minutes (24 hours).
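These parameters are set in the Parameters section of lsb.params. The values below are a sketch, not recommendations; the defaults described above apply when a parameter is omitted:

```
# lsb.params -- hypothetical EGO-related settings
Begin Parameters
DEFAULT_SLA_VELOCITY    = 8     # request 8 slots so an 8-way parallel job can start
MBD_EGO_CONNECT_TIMEOUT = 5     # seconds to wait when connecting to vemkd
MBD_EGO_READ_TIMEOUT    = 5     # seconds to wait for a reply from vemkd
MBD_EGO_TIME2LIVE       = 1440  # minutes EGO keeps allocation info across mbatchd restarts
End Parameters
```

Run badmin reconfig after editing lsb.params for the changes to take effect.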