Administering Platform LSF®
Version 6.2
February 2006
Comments to: doc@platform.com
Copyright © 1994-2006 Platform Computing Corporation All rights reserved. We’d like to hear from you You can help us make this document better by telling us what you think of the content, organization, and usefulness of the information. If you find an error, or just want to make a suggestion for improving this document, please address your comments to doc@platform.com. Your comments should pertain only to Platform documentation. For product support, contact support@platform.com.
Contents
Welcome
    About This Guide
    What’s New in the Platform LSF Version 6.2
    What’s New in Platform LSF Version 6.1
    What’s New in Platform LSF Version 6.0
4   Working with Hosts
    Host Status
    Viewing Host Information
    Controlling Hosts
8   Platform LSF Licensing
    The LSF License File
    How LSF Permanent Licensing Works
    Installing a Demo License
    Installing a Permanent License
    Updating a License
    FLEXlm Basics
Part III: Scheduling Policies
12  Time Syntax and Configuration
    Specifying Time Values
    Specifying Time Windows
16  Fairshare Scheduling
    Typical Slot Allocation Scenarios
    Users Affected by Multiple Fairshare Policies
    Ways to Configure Fairshare
    Using Historical and Committed Run Time
17  Goal-Oriented SLA-Driven Scheduling
24  Job Requeue and Job Rerun
    About Job Requeue
    Automatic Job Requeue
    Reverse Requeue
28  Running Parallel Jobs
    How LSF Runs Parallel Jobs
    Preparing Your Environment to Submit Parallel Jobs to LSF
    Submitting Parallel Jobs
    Starting Parallel Tasks with LSF Utilities
    Job Slot Limits For Parallel Jobs
33  External Job Submission and Execution Controls
    Understanding External Executables
    Using esub
    Working with eexec
34  Configuring Job Controls
39  Tuning the Cluster
    Tuning LIM
    Adjusting LIM Parameters
    Load Thresholds
Part VIII: LSF Utilities
45  Using lstcsh
    Task Lists
    Automatic Remote Execution
Welcome
Contents
◆ “About This Guide”
◆ “What’s New in the Platform LSF Version 6.2”
◆ “What’s New in Platform LSF Version 6.1”
◆ “What’s New in Platform LSF Version 6.0”
◆ “Upgrade and Compatibility Notes”
◆ “Learn About Platform Products”
◆ “Get Technical Support”
About This Guide About This Guide Last update February 22 2006 Latest version www.platform.com/Support/Documentation.htm Purpose of this guide This guide describes how to manage and configure Platform LSF® software (“LSF”).
Typographical conventions

Typeface         Meaning                                                           Example
Courier          The names of on-screen computer output, commands, files,          The lsid command
                 and directories
Bold Courier     What you type, exactly as shown                                   Type cd /bin
Italics          ◆ Book titles, new words or terms, or words to be emphasized      The queue specified by queue_name
                 ◆ Command-line place holders—replace with a real name or value
Bold Sans Serif  ◆ Names of GUI elements that you manipulate                       Click OK

Command notation

Notation         Meaning                                                           Example
Quotes "
What’s New in the Platform LSF Version 6.2
Welcome Since the total file size of the info directory is divided among its subdirectories, your cluster can process more job operations before reaching the total size limit of the job files. ◆ Improved job forwarding for remote Platform LSF MultiCluster resources— Job submission forwarding policy can now consider remote resource availability before forwarding jobs.
Platform LSF License Scheduler
License optimization and policy enhancements
◆ Grid license management plugin—Integration with Macrovision® FLEXnet Manager™ Grid Filter ensures greater productivity by optimizing licenses, both within and outside a Platform LSF grid environment.
New and improved license optimization features, improved infrastructure, and new LSF Analytics cubes:
◆ FLEXnet Manager data collection—Integration with Macrovision® FLEXnet Manager™ provides more detailed and accurate historical license usage information for greater resource usage visibility across an entire organization.
New Datamart ETL flows:
◆ Hourly FLEX License
◆ Daily FLEX License
◆ Hourly Job Slot Usage by Group
◆ Daily Job Slot Usage by Group
◆ License Statistics by Feature and Server
◆ Job Statistics by Feature
◆ Top 10 Projects or Users by Feature
◆ License Denial vs.
Platform LSF HPC
Interruptible backfill
Designed to improve cluster utilization, the new interruptible backfill scheduling policy allows reserved job slots to be used by low priority small jobs that will be terminated when the higher priority large jobs are about to start.
To compute the estimated start time, after making a reservation the scheduler sorts all running jobs in ascending order by finish time and walks through this sorted job list, adding up the slots used by each running job until the minimum job slot request is satisfied. The finish time of the last visited job becomes the estimated start time of the job. Reservation decisions made by greedy slot reservation do not have an accurate estimated start time or future allocation information.
The Intel® MPI Library (“Intel MPI”) is a high-performance message-passing library for developing applications that can run on multiple cluster interconnects chosen by the user at runtime. It is based on the MPICH2 specification and supports TCP, shared memory, and high-speed interconnects like InfiniBand and Myrinet. Intel MPI supports all MPI-1 features and many MPI-2 features, including file I/O, generalized requests, and preliminary thread support.
What’s New in Platform LSF Version 6.1
Welcome ❖ LSF_HOST_ADDR_RANGE in lsf.cluster.cluster_name LSF_DYNAMIC_HOST_TIMEOUT in lsf.conf is obsolete. Use the badmin command to add dynamic hosts to the host group or remove dynamic hosts from the host group (You need to run badmin mbdrestart for these changes to take effect).
◆ Slot-based parallel job scheduling—by default, you cannot specify more slots than the system's eligible number of processors. Since you can define more than one slot for each CPU, slot-based job scheduling allows you to submit parallel jobs based on the number of available slots rather than processors. For example, hostA has 2 processors, but has 4 slots defined:
% lshosts
HOST_NAME   type    model     cpuf
hostA       SOL64   Ultra5F   18.
Reliability and usability
◆ Pending job management—limits the number of pending jobs by setting cluster-wide, user-level, and user group-level settings. This prevents the overloading of the cluster.
◆ Use the -oo and -oe options of bsub and bmod to overwrite the LSF output and error files if they already exist. The original behavior of appending to the output and error files is still available with the -o and -e options.
License ownership and distribution
◆ Slot release on license preemption—slots will no longer be held for a job when a license has been preempted. This is enabled by default, but can be disabled by setting the following keyword in the lsf.conf file: LSF_LIC_SCHED_PREEMPT_SLOT_RELEASE = n
◆ Default License Scheduler projects—allows job submission without an explicit project assigned.
Welcome ❖ bjobs -Lp lp_name displays all jobs belonging to the license project ❖ bhist -Lp lp_name displays all jobs belonging to the license project ❖ bacct -Lp lp_name displays all the jobs that ran in the license project ❖ blstat -Lp lp_name displays all license usage in the license project The -P option for this command is no longer available and is replaced by the -Lp option.
What’s New in Platform LSF Version 6.1 To manage interactive (non-LSF) tasks in License Scheduler projects, you require the LSF Task Manager, taskman. The Task Manager utility is supported by but not shipped with License Scheduler. For more information about taskman, contact Platform. Set the ALLOCATION keyword in the Features section of lsf.licensescheduler. This feature ignores the global setting of the ENABLE_INTERACTIVE parameter because ALLOCATION is configured for the feature.
Data collection
◆ Data collection framework—new, lightweight method of collecting data from multiple clusters.
◆ Month/level/date dimension—“Month” dimension is now included in all cubes.
◆ Pending time ranking—Workload cube includes “seconds”-level ranking for pending time.
◆ UNIX exit code dimension—Workload cube collects data about the UNIX exit code.
◆ Configuration data logging—collect data about LSF host group and user group configuration for data logging.
Database integration
What’s New in Platform LSF Version 6.0
You use the bsla command to track the progress of your projects and see whether they are meeting the goals of your policy. See Chapter 17, “Goal-Oriented SLA-Driven Scheduling” for more information.
Platform LSF License Scheduler
Platform LSF License Scheduler ensures that higher priority work never has to wait for a license. Prioritized sharing of application licenses allows you to make policies that control the way software licenses are shared among different users in your organization.
Queue-based fairshare
Prevents starvation of low-priority work and ensures high-priority jobs get the resources they require by sharing resources among queues. Queue-based fairshare extends your existing user- and project-based fairshare policies by enabling flexible slot allocation per queue based on slot share units you configure. See Chapter 16, “Fairshare Scheduling” for more information.
Welcome ◆ Predefined ptile value with optional multiple ptile values, per host type or host model. For example: span[ptile='!',HP:8,SGI:8,LINUX:2] same[type] The job requests 8 processors on a host of type HP or SGI, and 2 processors on a host of type LINUX, and the predefined maximum job slot limit in lsb.hosts (MXJ) for other host types. See Chapter 15, “Specifying Resource Requirements” for more information.
See Chapter 3, “Working with Your Cluster”, Chapter 4, “Working with Hosts”, and Chapter 5, “Working with Queues” for more information.
Platform LSF Reports
Understand cluster operations better, so that you can improve performance and troubleshoot configuration problems. Platform LSF Reports provides a lightweight reporting package for single LSF clusters.
Non-normalized job run time limit
Presents consistent job run time limits no matter which host runs the job. With non-normalized job run time limit configured, job run time is not normalized by CPU factor. If ABS_RUNLIMIT=Y is defined in lsb.params, the run time limit is not normalized by the host CPU factor. Absolute wall-clock run time is used for all jobs submitted with a run limit. See Chapter 29, “Runtime Resource Usage Limits” for more information.
Upgrade and Compatibility Notes
UPGRADE document
To upgrade to LSF Version 6.2, follow the steps in Upgrading Platform LSF on UNIX and Linux (lsf6.2_upgrade.pdf or upgrade.html).
Maintenance pack availability
At release, Platform LSF Version 6.2 includes all bug fixes and solutions up to and including the August 2005 Platform LSF Version 6.1 Maintenance Pack. Fixes after August 2005 will be included in the first Platform LSF Version 6.2 Maintenance Pack.
Server host compatibility
Platform LSF
To use new features introduced in Platform LSF Version 6.2, you must upgrade all hosts in your cluster to 6.2. LSF 6.x and 5.x servers are compatible with Version 6.2 master hosts. All LSF 6.x and 5.x features are supported by 6.2 master hosts.
Cross-product compatibility
See “Platform LSF Family of Products Compatibility” at www.platform.com/Support/Compatibility.Upgrade.htm for more information.
New 64-bit Linux host types
The pre-6.
Changed behavior
Open advance reservations
Open advance reservations allow jobs to run even after the associated reservation expires. A job with the open advance reservation will only be treated as an advance reservation job during the reservation window, after which it becomes a normal job. This prevents the job from being killed and ensures that LSF does not prevent any previously suspended jobs from running or interfere with any existing scheduling policies.
Welcome New configuration parameters and environment variables The following new parameters and environment variables have been added for LSF Version 6.2: lsb.hosts ◆ Condensed host groups To define condensed host groups, add a CONDENSE column to the HostGroup section.
Upgrade and Compatibility Notes from hostD1 to hostD100, hostD102, all hosts from hostD201 to hostD300, and hostD320): ... (hostD[1-100,102,201-300,320]) Restrictions ❖ You cannot use more than one set of square brackets in a single host group definition. The following example is not correct: ... (hostA[1-10]B[1-20] hostC[101-120]) The following example is correct: ... (hostA[1-20] hostC[101-120]) ❖ You cannot define subgroups that contain wildcards and special characters.
The not operator can only be used with the all keyword or to exclude users from user groups. The not operator does not exclude LSF administrators from the queue definition.
lsf.conf
◆ LSB_MAX_PROBE_SBD=integer
Specifies the maximum number of sbatchd instances that can be polled by mbatchd in the interval MBD_SLEEP_TIME/10. Use this parameter in large clusters to reduce the time it takes for mbatchd to probe all sbatchds.
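For illustration only, such a parameter is set as a single line in lsf.conf; the value below is an arbitrary example, not a recommendation:

LSB_MAX_PROBE_SBD=20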
Upgrade and Compatibility Notes lsf.licensescheduler ◆ ◆ Parameters section: ❖ DISTRIBUTION_POLICY_VIOLATION_ACTION=(PERIOD reporting_period CMD reporting_command) Optional. Defines how License Scheduler handles distribution policy violations. Distribution policy violations are caused by non-LSF workloads because LSF License Scheduler explicitly follows its distribution policies.
Optional. Defines the priority for each project, where “0” is the lowest priority and a higher number specifies a higher priority. This column allows the user to override the default behavior, which is to look at the cumulative inuse value to determine the preemption order (the project using the most license tokens is preempted first).
Environment variables
◆ CLEARCASE_DRIVE=drive_letter : CLEARCASE_MOUNTDIR=path
◆ LS_LICENSE_SERVER_feature="domain:server:num_available ...
Upgrade and Compatibility Notes blstat ◆ -G Displays dynamic hierarchical license information. blstat -G also works with the -t option to only display hierarchical information for the specified feature names. ◆ -s Displays license usage of the LSF and non-LSF workloads. Workload distributions are defined by WORKLOAD_DISTRIBUTION in lsf.licensescheduler. If there are any distribution policy violations, blstat marks these with an asterisk (*) at the beginning of the line.
Welcome brsvadd -o ◆ -o Creates an open advance reservation. A job with an open advance reservation will only have the advance reservation property during the reservation window, after which the job becomes a normal job, not subject to termination when the reservation window closes. You must specify the same information as for normal advance reservations. This prevents jobs from being killed if the reservation window is too small.
Upgrade and Compatibility Notes New files added to installation The following new files have been added to the Platform LSF Version 6.2 License Scheduler installation: ◆ LSF_BINDIR/bltasks Displays current information about interactive tasks managed by License Scheduler (submitted using taskman). By default, displays information about all tasks. ◆ LSF_BINDIR/blkill Terminates a running or waiting interactive task in License Scheduler. Users can kill their own tasks.
Learn About Platform Products
Information about Platform products is available from the following sources:
◆ “World Wide Web and FTP”
◆ “Release notes and UPGRADE”
◆ “Platform documentation”
◆ “Platform training”
World Wide Web and FTP
The latest information about all supported releases of Platform LSF is available on the Platform Web site at www.platform.com.
Get Technical Support
Contact Platform
Contact Platform Computing or your LSF vendor for technical support. Use one of the following to contact Platform technical support:
Email: support@platform.com
World Wide Web: www.platform.com
Mail: Platform Support, Platform Computing Corporation, 3760 14th Avenue, Markham, Ontario, Canada L3R 3T7
When contacting Platform, please include the full name of your company. See the Platform Web site at www.platform.com/Company/Contact.Us.
Chapter 1: About Platform LSF
Contents
◆ “Cluster Concepts”
◆ “Job Life Cycle”
Cluster Concepts
[Figure: an LSF cluster. Commands are issued from a submission host, the master host coordinates the cluster, and jobs run on compute hosts.]
Clusters, jobs, and queues
Cluster
A group of computers (hosts) running LSF that work together as a single unit, combining computing power and sharing workload and resources. A cluster provides a single-system image for disparate computing resources. Hosts can be grouped into clusters in a number of ways.
Chapter 1 About Platform LSF Job slot A job slot is a bucket into which a single unit of work is assigned in the LSF system. Hosts are configured to have a number of job slots available and queues dispatch jobs to fill job slots. Commands ◆ bhosts —View job slot limits for hosts and host groups ◆ bqueues —View job slot limits for queues ◆ busers —View job slot limits for users and user groups Configuration ◆ Define job slot limits in lsb.resources.
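As a sketch only, a job slot limit could be configured in lsb.resources with a Limit section along these lines. The user name, queue name, and slot count are hypothetical, and the exact set of columns supported depends on your LSF version, so check the Platform LSF Reference before using it:

Begin Limit
USERS     QUEUES     SLOTS
user1     normal     10
End Limit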
Cluster Concepts Hosts Host An individual computer in the cluster. Each host may have more than 1 processor. Multiprocessor hosts are used to run parallel jobs. A multiprocessor host with a single process queue is considered a single machine, while a box full of processors that each have their own process queue is treated as a group of separate machines.
Chapter 1 About Platform LSF Master host Where the master LIM and mbatchd run. An LSF server host that acts as the overall coordinator for that cluster. Each cluster has one master host to do all job scheduling and dispatch. If the master host goes down, another LSF server in the cluster becomes the master host. All LSF daemons run on the master host. The LIM on the master host is the master LIM.
Cluster Concepts ◆ badmin hrestart —Restarts sbatchd Configuration ◆ Port number defined in lsf.conf res Remote Execution Server (RES) running on each server host. Accepts remote execution requests to provide transparent and secure remote execution of jobs and tasks. Commands ◆ lsadmin resstartup —Starts res ◆ lsadmin resshutdown —Shuts down res ◆ lsadmin resrestart —Restarts res Configuration ◆ Port number defined in lsf.conf lim Load Information Manager (LIM) running on each server host.
Chapter 1 About Platform LSF Master LIM The LIM running on the master host. Receives load information from the LIMs running on hosts in the cluster. Forwards load information to mbatchd, which forwards this information to mbschd to support scheduling decisions. If the master LIM becomes unavailable, a LIM on another host automatically takes over.
Cluster Concepts The bsub command stops display of output from the shell until the job completes, and no mail is sent to you by default. Use Ctrl-C at any time to terminate the job. Commands ◆ bsub -I —Submit an interactive job Interactive task A command that is not submitted to a batch queue and scheduled by LSF, but is dispatched immediately. LSF locates the resources needed by the task and chooses the best host among the candidate hosts that has the required resources and is lightly loaded.
Chapter 1 About Platform LSF All computers that run the same operating system on the same computer architecture are of the same type—in other words, binary-compatible with each other. Each host type usually requires a different set of LSF binary files. Commands ◆ lsinfo -t —View all host types defined in lsf.shared Configuration ◆ ◆ Defined in lsf.shared Mapped to hosts in lsf.cluster.cluster_name Host model The combination of host type and CPU speed (CPU factor) of the computer.
Cluster Concepts Resources Resource usage The LSF system uses built-in and configured resources to track resource availability and usage. Jobs are scheduled according to the resources available on individual hosts. Jobs submitted through the LSF system will have the resources they use monitored while they are running. This information is used to enforce resource limits and load thresholds as well as fairshare scheduling.
Chapter 1 About Platform LSF To schedule a job on a host, the load levels on that host must satisfy both the thresholds configured for that host and the thresholds for the queue from which the job is being dispatched. The value of a load index may either increase or decrease with load, depending on the meaning of the specific load index. Therefore, when comparing the host load conditions with the threshold values, you need to use either greater than (>) or less than (<), depending on the load index.
Job Life Cycle
[Figure: the job life cycle. (1) A job is submitted with bsub from the submission host. (2) The pending job waits in a queue. (3, 4) The master host schedules the job and dispatches it to a compute host. (5) The job runs. (6) A job report (output, errors, info) is emailed to the user.]
1 Submit a job
You submit a job from an LSF client or server with the bsub command. If you do not specify a queue when submitting the job, the job is submitted to the default queue. Jobs are held in a queue waiting to be scheduled and have the PEND state.
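For example, a job might be submitted to a named queue with its output written to a file (the queue name and command here are only placeholders):

% bsub -q normal -o myjob.out myjob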
Chapter 1 About Platform LSF 4 Run job sbatchd handles job execution.
Chapter 2: How the System Works
LSF can be configured in different ways that affect the scheduling of jobs. By default, this is how LSF handles a new job:
1 Receive the job. Create a job file. Return the job ID to the user.
2 Schedule the job and select the best available host.
3 Dispatch the job to a selected host. Set the environment on the host.
4 Start the job.
Contents
◆ “Job Submission”
◆ “Job Scheduling and Dispatch”
◆ “Host Selection”
◆ “Job Execution Environment”
◆ “Fault Tolerance”
Job Submission Job Submission The life cycle of a job starts when you submit the job to LSF. On the command line, bsub is used to submit jobs, and you can specify many options to bsub to modify the default behavior. Jobs must be submitted to a queue. Queues Queues represent a set of pending jobs, lined up in a defined order and waiting for their opportunity to use resources. Queues implement different job scheduling and control policies.
Chapter 2 How the System Works Automatic queue selection Typically, a cluster has multiple queues. When you submit a job to LSF you might define which queue the job will enter. If you submit a job without specifying a queue name, LSF considers the requirements of the job and automatically chooses a suitable queue from a list of candidate default queues. If you did not define any candidate default queues, LSF will create a new queue using all the default settings, and submit the job to that queue.
Job Scheduling and Dispatch Job Scheduling and Dispatch Submitted jobs sit in queues until they are scheduled and dispatched to a host for execution.
Chapter 2 How the System Works Dispatch order Jobs are not necessarily dispatched in order of submission. Each queue has a priority number set by an LSF Administrator when the queue is defined. LSF tries to start jobs from the highest priority queue first. By default, LSF considers jobs for dispatch in the following order: For each queue, from highest to lowest priority. If multiple queues have the same priority, LSF schedules all the jobs from these queues in first-come, first-served order.
Host Selection
Each time LSF attempts to dispatch a job, it checks to see which hosts are eligible to run the job. A number of conditions determine whether a host is eligible:
◆ Host dispatch windows
◆ Resource requirements of the job
◆ Resource requirements of the queue
◆ Host list of the queue
◆ Host load levels
◆ Job slot limits of the host
A host is only eligible to run a job if all the conditions are met.
Chapter 2 How the System Works Job Execution Environment When LSF runs your jobs, it tries to make it as transparent to the user as possible. By default, the execution environment is maintained to be as close to the submission environment as possible. LSF will copy the environment from the submission host to the execution host.
Fault Tolerance Fault Tolerance LSF is designed to continue operating even if some of the hosts in the cluster are unavailable. One host in the cluster acts as the master, but if the master host becomes unavailable another host takes over. LSF is available as long as there is one available host in the cluster. LSF can tolerate the failure of any host or group of hosts in the cluster. When a host crashes, all jobs running on that host are lost. No other pending or running jobs are affected.
Chapter 2 How the System Works Host failure If an LSF server host fails, jobs running on that host are lost. No other jobs are affected. Jobs can be submitted so that they are automatically rerun from the beginning or restarted from a checkpoint on another host if they are lost because of a host failure. If all of the hosts in a cluster go down, all running jobs are lost. When a host comes back up and takes over as master, it reads the lsb.events file to get the state of all batch jobs.
Part I: Managing Your Cluster
Contents
◆ Chapter 3, “Working with Your Cluster”
◆ Chapter 4, “Working with Hosts”
◆ Chapter 5, “Working with Queues”
◆ Chapter 6, “Managing Jobs”
◆ Chapter 7, “Managing Users and User Groups”
◆ Chapter 8, “Platform LSF Licensing”
Chapter 3: Working with Your Cluster
Contents
◆ “Viewing Cluster Information”
◆ “Default Directory Structures”
◆ “Cluster Administrators”
◆ “Controlling Daemons”
◆ “Controlling mbatchd”
◆ “Reconfiguring Your Cluster”
Viewing Cluster Information Viewing Cluster Information LSF provides commands for users to get information about the cluster. Cluster information includes the cluster master host, cluster name, cluster resource definitions, cluster administrator, and so on. To view the ... Run ...
Chapter 3 Working with Your Cluster % bparams -l System default queues for automatic queue selection: DEFAULT_QUEUE = normal idle The interval for dispatching jobs by master batch daemon: MBD_SLEEP_TIME = 20 (seconds) The interval for checking jobs by slave batch daemon: SBD_SLEEP_TIME = 15 (seconds) The interval for a host to accept two batch jobs subsequently: JOB_ACCEPT_INTERVAL = 1 (* MBD_SLEEP_TIME) The idle time of a host for resuming pg suspended jobs: PG_SUSP_IT = 180 (seconds) The amount of time d
Default Directory Structures
UNIX and Linux
The following diagram shows a typical directory structure for a new UNIX or Linux installation with lsfinstall. Depending on which products you have installed and platforms you have selected, your directory structure may vary.
[Figure: directory tree under LSF_TOP, showing the conf, work, log, lsbatch, and version directories, per-cluster (cluster_name) subdirectories, and configuration files such as lsb.hosts, lsb.params, lsb.queues, license.dat, lsf.cluster.cluster_name, lsf.conf, lsf.shared, lsf.task, and profile.lsf.]
Chapter 3 Working with Your Cluster Pre-4.2 UNIX and Linux installation directory structure The following diagram shows a cluster installed with lsfsetup. It uses the pre-4.2 directory structure.
Microsoft Windows
The following diagram shows the directory structure for a default Windows installation.
[Figure: directory tree under the lsf installation directory, showing the bin, etc, conf, examples, html, include, lib, logs, scripts, and work directories; command and daemon binaries such as badmin, bjobs, lsadmin, xbmod, xlsadmin, xlsbatch, lim, res, and sbatchd; and configuration files such as lsb.alarms, lsb.calendar, lsb.hosts, lsb.nqsmaps, lsb.params, lsb.queues, lsb.users, license.dat, lsf.cluster.cluster_name, lsf.install, and lsf.shared.]
Cluster Administrators
Primary cluster administrator
Required. The first cluster administrator, specified during installation. The primary LSF administrator account owns the configuration and log files. The primary LSF administrator has permission to perform clusterwide operations, change configuration files, reconfigure the cluster, and control jobs submitted by all users.
Cluster administrators
Optional. May be configured during or after installation.
Controlling Daemons
Prerequisites
To control all daemons in the cluster, you must:
◆ Be logged on as root or a user listed in the /etc/lsf.sudoers file. See the Platform LSF Reference for configuration details of lsf.sudoers.
◆ Be able to run the rsh or ssh commands across all LSF hosts without having to enter a password. See your operating system documentation for information about configuring the rsh and ssh commands. The shell command specified by LSF_RSH in lsf.conf
Daemon   Action                   Command                                   Permissions
LIM      Start                    lsadmin limstartup [host_name ...|all]    Must be root or a user listed in
         Shut down                lsadmin limshutdown [host_name ...|all]   lsf.sudoers for the startup command;
         Restart                  lsadmin limrestart [host_name ...|all]    must be the LSF administrator for
         Restart all in cluster   lsadmin reconfig                          other commands
sbatchd  Restarting sbatchd on a host does not affect jobs that are running on that host.
Controlling mbatchd
Use the badmin command to control mbatchd.
Reconfiguring mbatchd
Run badmin reconfig. When you reconfigure the cluster, mbatchd is not restarted. Only configuration files are reloaded. If you add a host to a host group, a host to a queue, or change resource configuration in the Hosts section of lsf.cluster.cluster_name, the change is not recognized by jobs that were submitted before you reconfigured.
Reconfiguring Your Cluster
After changing LSF configuration files, you must tell LSF to reread the files to update the configuration. Use the following commands to reconfigure a cluster:
◆ lsadmin reconfig
◆ badmin reconfig
◆ badmin mbdrestart
The reconfiguration commands you use depend on which files you change in LSF. The following table is a quick reference.
After making changes to ...   Use ...   Which ...
hosts
lsb.hosts
lsb.modules
lsb.nqsmaps
lsb.params
lsb.
Reconfiguring Your Cluster % badmin reconfig Checking configuration files ... No errors found. Do you want to reconfigure? [y/n] y Reconfiguration initiated The badmin reconfig command checks for configuration errors. If no fatal errors are found, you are asked to confirm reconfiguration. If fatal errors are found, reconfiguration is aborted. Reconfiguring the cluster by restarting mbatchd Run badmin mbdrestart to restart mbatchd: % badmin mbdrestart Checking configuration files ... No errors found.
Chapter 4: Working with Hosts
Contents
◆ “Host Status”
◆ “Viewing Host Information”
◆ “Controlling Hosts”
◆ “Adding a Host”
◆ “Removing a Host”
◆ “Adding Hosts Dynamically”
◆ “Adding Host Types and Host Models to lsf.shared”
◆ “Registering Service Ports”
◆ “Host Naming”
◆ “Hosts with Multiple Addresses”
◆ “Host Groups”
◆ “Tuning CPU Factors”
◆ “Handling Host-level Job Exceptions”
Host Status Host Status Host status describes the ability of a host to accept and run batch jobs in terms of daemon states, load levels, and administrative controls. The bhosts and lsload commands display host status. bhosts Displays the current status of the host: STATUS Description ok unavail unreach closed unlicensed Host is available to accept and run new batch jobs. Host is down, or LIM and sbatchd are unreachable. LIM is running but sbatchd is unreachable. Host will not accept new jobs.
Chapter 4 Working with Hosts Total 1.0 -0.0 -0.0 4% Reserved 0.0 0.0 0.0 0% LOAD THRESHOLD USED FOR SCHEDULING: r15s r1m r15m ut loadSched loadStop - 9.4 0.0 pg - 148 0 io - 2 0 ls - 3 4231M 0 0M it - 698M 0M tmp - 233M 0M swp - mem - lsload Displays the current state of the host: Status Description ok -ok busy Host is available to accept and run batch jobs and remote tasks. LIM is running but RES is unreachable. Does not affect batch jobs, only used for remote task placement (i.e., lsrun).
Viewing Host Information Viewing Host Information LSF uses some or all of the hosts in a cluster as execution hosts. The host list is configured by the LSF administrator. Use the bhosts command to view host information. Use the lsload command to view host load information. To view... Run...
Chapter 4 Working with Hosts Viewing detailed server host information Run bhosts -l host_name and lshosts -l host_name to display all information about each server host such as the CPU factor and the load thresholds to start, suspend, and resume jobs. For example: % bhosts -l hostB HOST hostB STATUS CPUF JL/U MAX NJOBS RUN SSUSP USUSP RSV DISPATCH_WINDOWS ok 20.20 0 0 0 0 0 CURRENT LOAD USED FOR SCHEDULING: r15s r1m r15m ut pg io ls it tmp swp mem Total 0.1 0.1 0.1 9% 0.7 24 17 0 394M 396M 12M Reserved 0.
Viewing Host Information Viewing host architecture information An LSF cluster may consist of hosts of differing architectures and speeds. The lshosts command displays configuration information about hosts. All these parameters are defined by the LSF administrator in the LSF configuration files, or determined by the LIM directly from the system. Host types represent binary compatible hosts; all hosts of the same type can run the same executable.
Chapter 4 Working with Hosts % lsinfo -m MODEL_NAME PC1133 HP9K735 HP9K778 Ultra5S Ultra2 Enterprise3000 CPU_FACTOR 23.10 4.50 5.50 10.30 20.20 20.00 ARCHITECTURE x6_1189_PentiumIIICoppermine HP9000735_125 HP9000778 SUNWUltra510_270_sparcv9 SUNWUltra2_300_sparc SUNWUltraEnterprise_167_sparc Run lsinfo -M to display all host models defined in lsf.shared: % lsinfo -M MODEL_NAME CPU_FACTOR UNKNOWN_AUTO_DETECT 1.00 DEFAULT 1.00 LINUX133 2.50 PC200 4.50 Intel_IA64 12.00 Ultra5S 10.30 PowerPC_G4 12.
Viewing Host Information LOAD THRESHOLD USED FOR SCHEDULING: r15s r1m r15m ut loadSched loadStop - pg - io - ls - it - tmp - swp - mem - THRESHOLD AND LOAD USED FOR EXCEPTIONS: JOB_EXIT_RATE Threshold 4.00 Load 0.00 Use bhosts -x to see hosts whose job exit rate has exceeded the threshold for longer than JOB_EXIT_RATE_DURATION, and are still high. By default, these hosts will be closed the next time LSF checks host exceptions and invokes eadmin.
Chapter 4 Working with Hosts Controlling Hosts Hosts are opened and closed by an LSF Administrator or root issuing a command or through configured dispatch windows. Closing a host Run badmin hclose: % badmin hclose hostB Close ...... done If the command fails, it may be because the host is unreachable through network problems, or because the daemons on the host are not running. Opening a host Run badmin hopen: % badmin hopen hostB Open ......
Controlling Hosts followed by % badmin hclose -C "Weekly backup" hostA will generate records in lsb.events: "HOST_CTRL" "6.0 1050082346 1 "hostA" 32185 "lsfadmin" "backup" "HOST_CTRL" "6.0 1050082373 1 "hostA" 32185 "lsfadmin" "Weekly backup" Use badmin hist or badmin hhist to display administrator comments for closing and opening hosts. For example: % badmin hhist Fri Apr 4 10:35:31: Host closed by administrator Weekly backup.
Chapter 4 Working with Hosts Adding a Host Use lsfinstall to add a host to an LSF cluster. Contents ◆ “Adding a host of an existing type using lsfinstall” on page 101 “Adding a host of a new type using lsfinstall” on page 102 See the Platfor m LSF Refer ence for more information about lsfinstall. ◆ Adding a host of an existing type using lsfinstall Compatibility lsfinstall is not compatible with clusters installed with lsfsetup.
Adding a Host ❖ If any host type or host model is DEFAULT, follow the steps in “DEFAULT host type or model” on page 608 to fix the problem. Adding a host of a new type using lsfinstall Compatibility lsfinstall is not compatible with clusters installed with lsfsetup. To add a Notice host to a cluster originally installed with lsfsetup, you must upgrade your cluster. 1 2 3 4 Verify that the host type does not already exist in your cluster: a Log on to any host in the cluster. You do not need to be root.
Chapter 4 Working with Hosts Removing a Host Removing a host from LSF involves preventing any additional jobs from running on the host, removing the host from LSF, and removing the host from the cluster. CAUTION Never remove the master host from LSF. If you want to remove your current default master from LSF, change lsf.cluster.cluster_name to assign a different default master host. Then remove the host that was once the master host. 1 2 3 4 5 Log on to the LSF host as root.
Adding Hosts Dynamically Adding Hosts Dynamically By default, all configuration changes made to LSF are static. You must manually change the configuration and restart the cluster (or at least all master candidates). Dynamic host configuration allows you to add hosts to the cluster without manually changing the configuration. To enable dynamic host configuration, you must define the following parameters: ◆ ◆ LSF_MASTER_LIST and LSF_DYNAMIC_HOST_WAIT_TIME in lsf.conf LSF_HOST_ADDR_RANGE in lsf.cluster.
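The sketch below shows one way these parameters might be set; the master host names, wait time, and address range are illustrative assumptions only, not values from this guide.

In lsf.conf:
LSF_MASTER_LIST="hostM hostN"
LSF_DYNAMIC_HOST_WAIT_TIME=60

In lsf.cluster.cluster_name:
LSF_HOST_ADDR_RANGE=*.*.*.*

The range shown accepts any IP address; a production cluster would normally restrict it to the subnets that are allowed to join dynamically.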
Chapter 4 Working with Hosts Upon startup, slave hosts wait to receive the acknowledgement from the master LIM. This acknowledgement indicates to the slave host that it has already been added to the cluster. If it does not receive the acknowledgement within the time specified by LSF_DYNAMIC_HOST_WAIT_TIME in lsf.conf, the slave host contacts the master LIM to add itself to the cluster. If you did not define LSF_DYNAMIC_HOST_WAIT_TIME in lsf.
Adding Hosts Dynamically Adding dynamic hosts in a shared file system If the new dynamically added hosts share the same set of configuration and binary files with normal hosts, you only need to start the LSF daemons on that host and the host is recognized by the master as an LSF host. New installation 1 Specify the installation options in install.config. The following parameters are required: ❖ ❖ ❖ ❖ 2 Existing 1 installation LSF_TOP="/path" LSF_ADMINS="user_name [user_name ...
Chapter 4 Working with Hosts Adding dynamic hosts in a non-shared file system (slave hosts) If each dynamic slave host has its own LSF binaries and local lsf.conf and shell environment scripts (cshrc.lsf and profile.lsf), you must install LSF on each slave host. 1 Specify installation options in the slave.config file. The following parameters are required: LSF_SERVER_HOSTS="host_name [host_name ...
Adding Hosts Dynamically You only need to run hostsetup if you want LSF to automatically start when the host is rebooted. For example: # cd /usr/local/lsf/6.2/install # ./hostsetup --top="/usr/local/lsf" --boot="y" For complete hostsetup usage, enter hostsetup -h. 5 Use the following commands start LSF: # lsadmin limstartup # lsadmin resstartup # badmin hstartup Limitation The first time a non-shared slave host joins the cluster, daemons on the new host can only be started on local host.
Adding Host Types and Host Models to lsf.shared
The lsf.shared file contains a list of host type and host model names for most operating systems. You can add to this list or customize the host type and host model names. A host type and host model name can be any alphanumeric string up to 29 characters long.
Adding a custom host type or model
1 Log on as the LSF administrator on any host in the cluster.
2 Edit lsf.shared.
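For example, a new host type and host model could be added to lsf.shared along the following lines; the type name, model name, CPU factor, and architecture string are placeholders for illustration:

Begin HostType
TYPENAME
NEWTYPE
End HostType

Begin HostModel
MODELNAME   CPUFACTOR   ARCHITECTURE   # keyword
NewModel    9.0         (NewArchString)
End HostModel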
Registering Service Ports Registering Service Ports LSF uses dedicated UDP and TCP ports for communication. All hosts in the cluster must use the same port numbers to communicate with each other. The service port numbers can be any numbers ranging from 1024 to 65535 that are not already used by other services. To make sure that the port numbers you supply are not already used by applications registered in your service database check /etc/services or use the command ypcat services.
Chapter 4 Working with Hosts # /etc/services entries for LSF daemons # res 3878/tcp # remote execution server lim 3879/udp # load information manager mbatchd 3881/tcp # master lsbatch daemon sbatchd 3882/tcp # slave lsbatch daemon # # Add this if ident is not already defined # in your /etc/services file ident 113/tcp auth tap # identd 3 4 5 Run lsadmin reconfig to reconfigure LIM. Run badmin reconfig to reconfigure mbatchd. Run lsfstartup to restart all daemons in the cluster.
Registering Service Ports 9 Run badmin reconfig to reconfigure mbatchd. 10 Run lsfstartup to restart all daemons in the cluster.
Chapter 4 Working with Hosts Host Naming LSF needs to match host names with the corresponding Internet host addresses. LSF looks up host names and addresses the following ways: ◆ ◆ ◆ In the /etc/hosts file Sun Network Information Service/Yellow Pages (NIS or YP) Internet Domain Name Service (DNS). DNS is also known as the Berkeley Internet Name Domain (BIND) or named, which is the name of the BIND daemon. Each host is configured to use one or more of these mechanisms.
Host Naming 177.16.1.1 atlasD0 atlas0 atlas1 atlas2 atlas3 atlas4 ... atlas31 177.16.1.2 atlasD1 atlas32 atlas33 atlas34 atlas35 atlas36 ... atlas63 ... In the new format, you still map the nodes to the LSF hosts, so the number of lines remains the same, but the format is simplified because you only have to specify ranges for the nodes, not each node individually as an alias: ... 177.16.1.1 atlasD0 atlas[0-31] 177.16.1.2 atlasD1 atlas[32-63] ...
Chapter 4 Working with Hosts Hosts with Multiple Addresses Multi-homed Hosts that have more than one network interface usually have one Internet address for hosts each interface. Such hosts are called multi-homed hosts. LSF identifies hosts by name, so it needs to match each of these addresses with a single host name. To do this, the host name information must be configured so that all of the Internet addresses for a host resolve to the same name.
Hosts with Multiple Addresses Example /etc/hosts entries No unique official The following example is for a host with two interfaces, where the host does not have a name unique official name. # Address Official name # Interface on network A AA.AA.AA.AA host-AA.domain # Interface on network B BB.BB.BB.BB host-BB.domain Aliases host.domain host-AA host host-BB host Looking up the address AA.AA.AA.AA finds the official name host-AA.domain. Looking up address BB.BB.BB.BB finds the name host-BB.domain.
Chapter 4 Working with Hosts # address AA.AA.AA.AA.in-addr.arpa BB.BB.BB.BB.in-addr.arpa class IN IN type PTR PTR name host.domain host.domain If it is not possible to change the system host name database, create the hosts file local to the LSF system, and configure entries for the multi-homed hosts only. Host names and addresses not found in the hosts file are looked up in the standard name system on your host.
Host Groups Host Groups You can define a host group within LSF or use an external executable to retrieve host group members. Use bhosts to view a list of existing hosts. Use bmgroup to view host group membership use. Where to use host groups LSF host groups can be used in defining the following parameters in LSF configuration files: ◆ ◆ HOSTS in lsb.queues for authorized hosts for the queue HOSTS in lsb.
Chapter 4 Working with Hosts Using wildcards and special characters to define host names You can use special characters when defining host group members under the GROUP_MEMBER column to specify hosts. These are useful to define several hosts in a single entry, such as for a range of hosts, or for all host names with a certain text string. If a host matches more than one host group, that host is a member of all groups.
Host Groups Begin HostGroup GROUP_NAME GROUP_MEMBER groupA (hostA*) groupB (groupA) End HostGroup Defining condensed host groups You can define condensed host groups to display information for its hosts as a summary for the entire group. This is useful because it allows you to see the total statistics of the host group as a whole instead of having to add up the data yourself. This allows you to better plan the distribution of jobs submitted to the hosts and host groups in your cluster.
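For example, a condensed host group might be defined by adding the CONDENSE column to the HostGroup section, as in this sketch (the group and host names are hypothetical):

Begin HostGroup
GROUP_NAME   CONDENSE   GROUP_MEMBER
hg1          Y          (host1 host2 host3)
End HostGroup

With such a definition, commands like bjobs and bhosts report the condensed group name in place of the individual host names, as in the bjobs output that follows.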
% bjobs
JOBID  USER    STAT   QUEUE    FROM_HOST   EXEC_HOST   JOB_NAME     SUBMIT_TIME
520    user1   RUN    normal   host5       hg1         sleep 1001   Apr 15 13:50
521    user1   RUN    normal   host5       hg1         sleep 1001   Apr 15 13:50
522    user1   PEND   normal   host5                   sleep 1001   Apr 15 13:51
External host group requirements (egroup)
An external host group is a host group for which membership is not statically configured, but is instead retrieved by running an external executable with the name egroup.
Tuning CPU Factors
CPU factors are used to differentiate the relative speed of different machines. LSF runs jobs on the best possible machines so that response time is minimized. To achieve this, it is important that you define correct CPU factors for each machine model in your cluster.
How CPU factors affect performance
Incorrect CPU factors can reduce performance in the following ways.
Begin HostModel
MODELNAME   CPUFACTOR   ARCHITECTURE   # keyword
#HPUX (HPPA)
HP9K712S    2.5         (HP9000712_60)
HP9K712M    2.5         (HP9000712_80)
HP9K712F    4.0         (HP9000712_100)
End HostModel
See the Platform LSF Reference for information about the lsf.shared file.
3 Save the changes to lsf.shared.
4 Run lsadmin reconfig to reconfigure LIM.
5 Run badmin reconfig to reconfigure mbatchd.
Handling Host-level Job Exceptions Handling Host-level Job Exceptions You can configure hosts so that LSF detects exceptional conditions while jobs are running, and take appropriate action automatically. You can customize what exceptions are detected, and the corresponding actions. By default, LSF does not detect any exceptions. eadmin script When an exception is detected, LSF takes appropriate action by running the script LSF_SERVERDIR/eadmin on the master host.
Chapter 4 Working with Hosts Tuning Tune JOB_EXIT_RATE_DURATION carefully. Shorter values may raise false alarms, longer values may not trigger exceptions frequently enough. Example Exit rate hostA exit rate Threshold Time t0 t1 t2 t3 In the diagram, the job exit rate of hostA exceeds the configured threshold. LSF monitors hostA from time t1 to time t2 (t2=t1 + JOB_EXIT_RATE_DURATION in lsb.params). At t2, the exit rate is still high, and a host exception is detected.
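As a sketch only, the per-host exit rate threshold is typically set with an EXIT_RATE value in the Host section of lsb.hosts, and the monitoring window with JOB_EXIT_RATE_DURATION in lsb.params. The layout and values below are illustrative assumptions; check the Platform LSF Reference for the exact syntax in your version:

# lsb.hosts
Begin Host
HOST_NAME    MXJ    EXIT_RATE   # Keywords
hostA        !      4
End Host

# lsb.params
Begin Parameters
JOB_EXIT_RATE_DURATION=10
End Parameters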
Chapter 5: Working with Queues
Contents
◆ “Queue States”
◆ “Viewing Queue Information”
◆ “Controlling Queues”
◆ “Adding and Removing Queues”
◆ “Managing Queues”
◆ “Handling Job Exceptions”
Queue States Queue States Queue states, displayed by bqueues, describe the ability of a queue to accept and start batch jobs using a combination of the following states: ◆ ◆ ◆ ◆ Open queues accept new jobs Closed queues do not accept new jobs Active queues start jobs on available hosts Inactive queues hold all jobs State Description Open:Active Open:Inact Closed:Active Closed:Inact Accepts and starts new jobs—normal processing Accepts and holds new jobs—collecting Does not accept new jobs, but continu
Chapter 5 Working with Queues Viewing Queue Information The bqueues command displays information about queues. The bqueues -l option also gives current statistics about the jobs in a particular queue such as the total number of jobs in the queue, the number of jobs running, suspended, and so on. To view the... Run...
Viewing Queue Information 20000 K 20000 K 2048 K SCHEDULING PARAMETERS r15s r1m r15m loadSched 0.7 1.0 loadStop 1.5 2.5 20000 K ut 0.2 - pg 4.0 8.
Chapter 5 Working with Queues PRIO NICE STATUS 30 20 Open:Active MAX JL/U JL/P JL/H NJOBS 0 PEND 0 RUN SSUSP USUSP 0 0 0 RSV 0 STACKLIMIT MEMLIMIT 2048 K 5000 K SCHEDULING PARAMETERS r15s r1m loadSched loadStop - r15m - ut - pg - io - ls - it - tmp - swp - mem - JOB EXCEPTION PARAMETERS OVERRUN(min) UNDERRUN(min) IDLE(cputime/runtime) Threshold 5 2 0.
Controlling Queues Controlling Queues Queues are controlled by an LSF Administrator or root issuing a command or through configured dispatch and run windows. Closing a queue Run badmin qclose: % badmin qclose normal Queue is closed When a user tries to submit a job to a closed queue the following message is displayed: % bsub -q normal ...
Chapter 5 Working with Queues % badmin qhist Fri Apr 4 10:50:36: Queue closed by administrator change configuration. bqueues -l also displays the comment text: % bqueues -l normal QUEUE: normal -- For normal low priority jobs, running only if hosts are lightly loaded. Th is is the default queue.
Begin Queue
QUEUE_NAME      = queue1
PRIORITY        = 45
DISPATCH_WINDOW = 4:30-12:00
End Queue
3 Reconfigure the cluster using:
  a lsadmin reconfig
  b badmin reconfig
4 Run bqueues -l to display the dispatch windows.
Run Windows
A run window specifies one or more time periods during which jobs dispatched from a queue are allowed to run. When a run window closes, running jobs are suspended, and pending jobs remain pending. The suspended jobs are resumed when the window opens again.
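A run window is configured per queue with the RUN_WINDOW parameter in lsb.queues, for example (the queue name and time window below are only illustrations):

Begin Queue
QUEUE_NAME = night
PRIORITY   = 40
RUN_WINDOW = 20:00-8:30
End Queue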
Chapter 5 Working with Queues Adding and Removing Queues Adding a queue 1 2 3 4 Log in as the LSF administrator on any host in the cluster. Edit lsb.queues to add the new queue definition. You can copy another queue definition from this file as a starting point; remember to change the QUEUE_NAME of the copied queue. Save the changes to lsb.queues. Run badmin reconfig to reconfigure mbatchd. Adding a queue does not affect pending or running jobs.
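For reference, a minimal queue definition added to lsb.queues might look like the following sketch; the queue name, priority, and description are placeholders:

Begin Queue
QUEUE_NAME  = myqueue
PRIORITY    = 30
DESCRIPTION = Example queue added by the cluster administrator
End Queue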
Managing Queues Managing Queues Restricting host use by queues You may want a host to be used only to run jobs submitted to specific queues. For example, if you just added a host for a specific department such as engineering, you may only want jobs submitted to the queues engineering1 and engineering2 to be able to run on the host. 1 2 Log on as root or the LSF administrator on any host in the cluster. Edit lsb.queues, and add the host to the HOSTS parameter of specific queues.
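For example, the engineering queues might name the host explicitly in their HOSTS parameters, as in this sketch (the queue and host names are hypothetical, and the sketch assumes other queues do not list hostC):

Begin Queue
QUEUE_NAME = engineering1
HOSTS      = hostC
...
End Queue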
Chapter 5 Working with Queues Handling Job Exceptions You can configure queues so that LSF detects exceptional conditions while jobs are running, and take appropriate action automatically. You can customize what exceptions are detected, and the corresponding actions. By default, LSF does not detect any exceptions. eadmin script When an exception is detected, LSF takes appropriate action by running the script LSF_SERVERDIR/eadmin on the master host.
Handling Job Exceptions JOB_OVERRUN Specifies a threshold for job overrun. If a job runs longer than the specified run time, LSF invokes eadmin to trigger the action for a job overrun exception. JOB_UNDERRUN Specifies a threshold for job underrun. If a job exits before the specified number of minutes, LSF invokes eadmin to trigger the action for a job underrun exception. Example The following queue defines thresholds for all job exceptions: Begin Queue ... JOB_UNDERRUN = 2 JOB_OVERRUN = 5 JOB_IDLE = 0.10 .
Chapter 6: Managing Jobs
Contents
◆ “Understanding Job States”
◆ “Viewing Job Information”
◆ “Changing Job Order Within Queues”
◆ “Switching Jobs from One Queue to Another”
◆ “Forcing Job Execution”
◆ “Suspending and Resuming Jobs”
◆ “Killing Jobs”
◆ “Sending a Signal to a Job”
◆ “Using Job Groups”
Understanding Job States Understanding Job States The bjobs command displays the current state of the job.
◆ Dispatch windows during which the queue can dispatch and qualified hosts can accept jobs
◆ Run windows during which jobs from the queue can run
◆ Limits on the number of job slots configured for a queue, a host, or a user
◆ Relative priority to other users and jobs
◆ Availability of the specified resources
◆ Job dependency and pre-execution conditions
Maximum pending job threshold
If the user or user group submitting the job has reached the pending job threshold as specified by MAX_PEND_JOBS
Understanding Job States WAIT state (chunk jobs) If you have configured chunk job queues, members of a chunk job that are waiting to run are displayed as WAIT by bjobs. Any jobs in WAIT status are included in the count of pending jobs by bqueues and busers, even though the entire chunk job has been dispatched and occupies a job slot. The bhosts command shows the single job slot occupied by the entire chunk job in the number of jobs shown in the NJOBS column.
Chapter 6 Managing Jobs Viewing Job Information The bjobs command is used to display job information. By default, bjobs displays information for the user who invoked the command. For more information about bjobs, see the LSF Refer ence and the bjobs(1) man page. Viewing all jobs for all users Run bjobs -u all to display all jobs for all users.
Viewing Job Information % bjobs -x -l -a Job <2>, User , Project , Status , Queue , Command Wed Aug 13 14:23:35: Submitted from host , CWD <$HOME>, Output File , Specified Hosts ; Wed Aug 13 14:23:43: Started on , Execution Home , Execution CWD ; Resource usage collected. IDLE_FACTOR(cputime/runtime): 0.
Chapter 6 Managing Jobs Changing Job Order Within Queues By default, LSF dispatches jobs in a queue in the order of arrival (that is, first-come-first-served), subject to availability of suitable server hosts. Use the btop and bbot commands to change the position of pending jobs, or of pending job array elements, to affect the order in which jobs are considered for dispatch. Users can only change the relative position of their own jobs, and LSF administrators can change the position of any users’ jobs.
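For example, to move a pending job to the top of its queue, or back to the bottom (the job ID is hypothetical):

% btop 5311
% bbot 5311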
Switching Jobs from One Queue to Another Switching Jobs from One Queue to Another You can use the command bswitch to change jobs from one queue to another. This is useful if you submit a job to the wrong queue, or if the job is suspended because of queue thresholds or run windows and you would like to resume the job. Switching a single job Run bswitch to move pending and running jobs from queue to queue.
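For example, a single job can be switched to another queue by job ID (the destination queue name and job ID are hypothetical):

% bswitch priority 5309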
Chapter 6 Managing Jobs Forcing Job Execution A pending job can be forced to run with the brun command. This operation can only be performed by an LSF administrator. You can force a job to run on a particular host, to run until completion, and other restrictions. For more information, see the brun command. When a job is forced to run, any other constraints associated with the job such as resource requirements or dependency conditions are ignored.
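For example, an administrator might force a pending job to run immediately on a particular host (the host name and job ID are hypothetical):

% brun -m hostA 3640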
Suspending and Resuming Jobs Suspending and Resuming Jobs A job can be suspended by its owner or the LSF administrator. These jobs are considered user-suspended and are displayed by bjobs as USUSP. If a user suspends a high priority job from a non-preemptive queue, the load may become low enough for LSF to start a lower priority job in its place. The load created by the low priority job can prevent the high priority job from resuming. This can be avoided by configuring preemptive queues.
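For example, assuming job 3421 (illustrative), its owner or the LSF administrator can suspend it and resume it later with:
% bstop 3421
% bresume 3421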
Chapter 6 Managing Jobs Killing Jobs The bkill command cancels pending batch jobs and sends signals to running jobs. By default, on UNIX, bkill sends the SIGKILL signal to running jobs. Before SIGKILL is sent, SIGINT and SIGTERM are sent to give the job a chance to catch the signals and clean up. The signals are forwarded from mbatchd to sbatchd. sbatchd waits for the job to exit before reporting the status.
Killing Jobs Forcing removal of a job from LSF Run bkill -r to force the removal of the job from LSF. Use this option when a job cannot be killed in the operating system. The bkill -r command removes a job from the LSF system without waiting for the job to terminate in the operating system.
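For example, assuming job 3421 (illustrative), the first command kills the job normally; if the job cannot be killed in the operating system, the second removes it from LSF anyway:
% bkill 3421
% bkill -r 3421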
Chapter 6 Managing Jobs Sending a Signal to a Job LSF uses signals to control jobs, to enforce scheduling policies, or in response to user requests. The principal signals LSF uses are SIGSTOP to suspend a job, SIGCONT to resume a job, and SIGKILL to terminate a job. Occasionally, you may want to override the default actions. For example, instead of suspending a job, you might want to kill or checkpoint it.
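For example, assuming job 3421 (illustrative) and a platform where the TSTP signal name is recognized, the following sends SIGTSTP to the job instead of the default suspend action:
% bkill -s TSTP 3421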
Using Job Groups Using Job Groups A collection of jobs can be organized into job groups for easy management. A job group is a container for jobs in much the same way that a directory in a file system is a container for files. For example, a payroll application may have one group of jobs that calculates weekly payments, another job group for calculating monthly salaries, and a third job group that handles the salaries of part-time or contract employees.
Chapter 6 Managing Jobs ◆ % bgadd /risk_group creates a job group named risk_group under the root group /. ◆ % bgadd /risk_group/portfolio1 creates a job group named portfolio1 under job group /risk_group. ◆ % bgadd /risk_group/portfolio1/current creates a job group named current under job group /risk_group/portfolio1.
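Once the groups exist, jobs can be attached to a group at submission time. For example (the job group path and command are illustrative):
% bsub -g /risk_group/portfolio1/current myjob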
Using Job Groups % bjobs -l -g /risk_group Job <101>, User , Project , Job Group , Status , Queue , Command Tue Jun 17 16:21:49: Submitted from host , CWD ; ...
Chapter 6 Managing Jobs % bhist -l 105 Job <105>, User , Project , Job Group , Command Wed May 14 15:24:07: Submitted from host , to Queue , CWD <$HOME/lsf51/5.1/sparc-sol7-64/bin>; Wed May 14 15:24:10: Parameters of Job are changed: Job group changes to: /risk_group/portfolio2/monthly; Wed May 14 15:24:17: Dispatched to ; Wed May 14 15:24:17: Starting (Pid 8602); ...
Chapter 7: Managing Users and User Groups

Contents
◆ "Viewing User and User Group Information" on page 158
◆ "About User Groups" on page 160
◆ "Existing User Groups as LSF User Groups" on page 161
◆ "LSF User Groups" on page 162
Viewing User and User Group Information Viewing User and User Group Information You can display information about LSF users and user groups using the busers and bugroup commands. The busers command displays information about users and user groups. The default is to display information about the user who invokes the command.
% bugroup -l
GROUP_NAME: testers
USERS: user1 user2
SHARES: [user1, 4] [others, 10]

GROUP_NAME: engineers
USERS: user3 user4 user10 user9
SHARES: [others, 10] [user9, 4]

GROUP_NAME: system
USERS: all users
SHARES: [user9, 10] [others, 15]

GROUP_NAME: develop
USERS: user4 user10 user11 engineers/
SHARES: [engineers, 40] [user4, 15] [user10, 34] [user11, 16]
About User Groups About User Groups User groups act as aliases for lists of users. The administrator can also limit the total number of running jobs belonging to a user or a group of users. You can define user groups in LSF in several ways: Use existing user groups in the configuration files Create LSF-specific user groups ◆ Use an external executable to retrieve user group members If desired, you can use all three methods, provided the user and group names are different.
Chapter 7 Managing Users and User Groups Existing User Groups as LSF User Groups User groups already defined in your operating system often reflect existing organizational relationships among users. It is natural to control computer resource access using these existing groups. You can specify existing UNIX user groups anywhere an LSF user group can be specified. How LSF recognizes UNIX user groups Only group members listed in the /etc/group file or the file group.byname NIS map are accepted.
LSF User Groups LSF User Groups You can define an LSF user group within LSF or use an external executable to retrieve user group members. Use bugroup to view user groups and members, use busers to view all users in the cluster. Where to use LSF user groups LSF user groups can be used in defining the following parameters in LSF configuration files: ◆ USERS in lsb.queues for authorized queue users ◆ USER_NAME in lsb.users for user job slot limits USER_SHARES (optional) in lsb.
Chapter 7 Managing Users and User Groups External user group requirements (egroup) An external user group is a user group for which membership is not statically configured, but is instead retrieved by running an external executable with the name egroup. The egroup executable must be in the directory specified by LSF_SERVERDIR. This feature allows a site to maintain group definitions outside LSF and import them into LSF configuration at initialization time.
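The following is a minimal sketch of such an executable, assuming that LSF passes the requested group name as the last command-line argument and expects a space-separated list of member names on standard output (check the exact calling convention in the LSF Reference); the group and user names are illustrative:
#!/bin/sh
# egroup: print the members of the requested group on stdout
for arg in "$@"; do GROUP="$arg"; done    # assume the group name is the last argument
case "$GROUP" in
    design)  echo "user1 user2 user3" ;;
    support) echo "user4 user5" ;;
    *)       echo "" ;;
esac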
Chapter 8: Platform LSF Licensing

Contents
◆ "The LSF License File" on page 166
◆ "How LSF Permanent Licensing Works" on page 171
◆ "Installing a Demo License" on page 173
◆ "Installing a Permanent License" on page 175
◆ "Updating a License" on page 180
◆ "FLEXlm Basics" on page 182
◆ "Multiple FLEXlm License Server Hosts" on page 185
◆ "Partial Licensing" on page 187
◆ "Floating Client Licenses" on page 191
◆ "Troubleshooting License Issues" on page 197
The LSF License File The LSF License File This section helps you to understand the types of LSF licenses and the contents of the LSF license file. It does not contain information required to install your license. Evaluation (demo) license You can use a demo license to install Platform LSF and get it running temporarily, then switch to the permanent license before the evaluation period expires with no interruption in service, as described in “Installing a permanent license for the first time” on page 175.
Enforcement of grid license management plugin licenses
The new license optimization features enabled by the grid license management plugin require the lsf_mv_grid_filter license feature. The number of lsf_mv_grid_filter licenses should be at least the number of LSF License Scheduler licenses.

Banded licensing
You can use permanent licenses with restrictions on operating system and hardware configurations.
The LSF License File Format of the demo license file This is intended to familiarize you with the demo license file. You do not need to read this section if you are only interested in installing the license. LSF licenses are stored in a text file. The default name of the license file is license.dat. The license.dat file for a demo license contains a FEATURE line for each LSF product. Each feature contains an expiry date and ends with the string DEMO. For example: FEATURE lsf_base lsf_ld 6.
Chapter 8 Platform LSF Licensing Example demo The following is an example of a demo license file. This file licenses LSF 6.2, advance license file reservation, and Platform Make. The license is valid until February 24, 2006. Version Product FEATURE FEATURE FEATURE FEATURE FEATURE FEATURE FEATURE Expiry date DEMO license lsf_base lsf_ld 6.200 24-Feb-2006 0 9C4CF8EDE8WL096AAF77 "Platform" DEMO lsf_manager lsf_ld 6.200 24-Feb-2006 0 BC0CE84D6165BFD64E4A "Platform" DEMO lsf_sched_fairshare lsf_ld 6.
The LSF License File Example permanent license file The following is an example of a permanent license file. Server name LSF vendor daemon Host ID (lmhostid) License vendor daemon path License port number SERVER hosta 880a0748a 1700 DAEMON lsf_ld /usr/share/lsf/lsf_62/6.2/sparc-sol2/etc/lsf_ld FEATURE lsf_base lsf_ld 6.200 1-jun-0000 10 DCF7C3D92A5471A12345 "Platform" FEATURE lsf_manager lsf_ld 6.200 1-jun-0000 10 4CF7D37944B023A12345 "Platform" FEATURE lsf_sched_fairshare lsf_ld 6.
Chapter 8 Platform LSF Licensing How LSF Permanent Licensing Works This section is intended to give you a better understanding of how LSF licensing works in a production environment with a permanent license. It does not contain information required to install your license. Platform LSF uses the FLEXlm license management product from GLOBEtrotter Software to control its licenses. LSF licenses are controlled centrally through the LSF master LIM.
How LSF Permanent Licensing Works When the slave LIMs start, they contact the master host to get the licenses they need. 3 Check out licenses needed for client hosts listed in LSF_CONFDIR/lsf.cluster.cluster_name. If the license checkout fails for any host, that host is unlicensed. The master LIM tries to check out the license later. LSF license grace period If the master LIM finds the license server daemon has gone down or is unreachable, LSF has a grace period before the whole cluster is unlicensed.
Chapter 8 Platform LSF Licensing Installing a Demo License This section includes instructions for licensing LSF with a new demo license. Most new users should follow the procedure under “Installing and licensing LSF for the first time” on page 173. If you already have LSF installed, see “Installing a demo license manually” on page 173.
Installing a Demo License Getting a demo license To get a demo license from Platform or your Platform LSF vendor, complete the evaluation form on the Platform Web site at www.platform.com. If you have any questions about your demo license, contact license@platform.com. Location of the LSF license file for a demo license For a demo license, each LSF host must be able to read the license file.
Chapter 8 Platform LSF Licensing Installing a Permanent License This section includes instructions for licensing LSF with a new permanent license. If you have not yet installed LSF, you can use a demo license to get started. See “Installing a Demo License” on page 173. If you already have LSF, see “Installing a permanent license for the first time” on page 175.
Installing a Permanent License See “LSF_LICENSE_FILE parameter” on page 177. 7 Start the license server daemon. See “Starting the license daemons” on page 182.
Chapter 8 Platform LSF Licensing Viewing and editing the license file Your LSF license should be a text file (normally named license.dat). Use any text editor such as vi or emacs to open a copy of your license file for viewing or editing. For example, If you receive your license from Platform as text, you must create a new file and copy the text into the file. ◆ You might have to modify lines in the license, such as the path in the DAEMON line when you install a new permanent license.
Installing a Permanent License SERVER hosta 68044d20 1700 LSF_LICENSE_FILE would be: LSF_LICENSE_FILE="1700@hosta" FLEXlm default If you installed FLEXlm separately from LSF to manage other software licenses, the default FLEXlm installation puts the license file in the following location: /usr/local/flexlm/licenses/license.
Chapter 8 Platform LSF Licensing FLEXlm license server host A permanent LSF license is tied to the host ID of a particular license server host and cannot be used on another host. If you are already running FLEXlm to support other software licenses, you can use the existing license server host to manage LSF also. In this case, you will add your Platform LSF license key to the existing FLEXlm license file.
Updating a License Updating a License This section is intended for those who are updating an existing LSF license file. To switch your demo license to a permanent license, see “Installing a Permanent License” on page 175. To update a license: 1 2 Contact Platform to get the license. See “Requesting a new license” on page 180.
Chapter 8 Platform LSF Licensing % lsadmin reconfig % lsadmin restart on the master LIM The license file is re-read and the changes accepted by LSF. At this point, the LSF license has been updated. However, some products may also require installation or upgrade of LSF software before you can use the new functionality. Updating a license with INCREMENT lines If you received one or more INCREMENT lines, update your license by adding the lines to your existing license file. 1 2 3 Edit your license.
FLEXlm Basics FLEXlm Basics This section is for users installing a permanent license, as FLEXlm is not used with demo licenses. Users who already know how to use FLEXlm will not need to read this section. FLEXlm is used by many UNIX software packages because it provides a simple and flexible method for controlling access to licensed software. A single FLEXlm license server daemon can handle licenses for many software packages, even if those packages come from different vendors.
Chapter 8 Platform LSF Licensing Checking the license server status If you are using a permanent LSF license, use the lmstat command to check the status of the license server daemon. This check can tell you whether or not your attempt to start your license server daemon succeeded. If your attempt failed, see “lmgrd fails with message "Port already in use"” on page 200. The lmstat command is in LSF_SERVERDIR. For example: /usr/share/lsf/lsf_62/6.
FLEXlm Basics FLEXlm log file Read this to familiarize yourself with the FLEXlm log file. The FLEXlm license server daemons log messages about the state of the license server hosts, and when licenses are checked in or out. This log helps to resolve problems with the license server hosts and to track license use. The log file grows over time. You can remove or rename the existing FLEXlm log file at any time. You must choose a location for the log file when you start the license daemon.
Chapter 8 Platform LSF Licensing Multiple FLEXlm License Server Hosts This section applies to permanent licenses only. Read this section if you are interested in the various ways you can distribute your licenses. This is valuable if you are interested in having some form of backup in case of failure. Compare with “Selecting a license server host” on page 179 to make an educated decision.
Multiple FLEXlm License Server Hosts 3 4 See “Starting the license daemons” on page 182. Start lmgrd on all license server hosts, not just one. To allow the new permanent licenses to take effect, reconfigure the cluster with the commands: ❖ ❖ lsadmin reconfig badmin reconfig Redundant license server hosts Configuring multiple license server hosts is optional. It provides a way to keep LSF running if a license server host goes down. There are two ways to configure multiple license servers.
Chapter 8 Platform LSF Licensing Partial Licensing This section applies to permanent licenses. Read this if you have a cluster in which not all of the hosts will require licenses for the same LSF products. In this section, you will learn how to save money through distributing your licenses efficiently. Not all hosts in the cluster need to be licensed for the same set of LSF products.
Partial Licensing LOAD_THRESHOLDS: r15s r1m r15m console 3.5 0.0 ut pg io ls it - - - - - tmp - swp mem - - tmp2 - nio - Example of partial licensing Here is an example that will allow you to better visualize the concept of partial licensing. Through this example, you can learn how to configure your hosts to use partial licensing. Scenario In the following configuration, the license file contains licenses for LSF, Platform Make and Platform LSF MultiCluster.
Chapter 8 Platform LSF Licensing LOAD_THRESHOLDS: r15s r1m r15m HOST_NAME: type SUNSOL ut - pg - io - ls - it - tmp - swp - mem - hostB model DEFAULT cpuf ncpus ndisks maxmem maxswp maxtmp rexpri server 1.
Partial Licensing HOST_NAME: type SUNSOL hostB model DEFAULT cpuf ncpus ndisks maxmem maxswp maxtmp rexpri server 1.
Chapter 8 Platform LSF Licensing Floating Client Licenses LSF floating client is valuable if you have a cluster in which not all of the hosts will be active at the same time. In this section, you will learn how to save money through distributing your licenses efficiently. An LSF floating client license is a type of LSF license to be shared among several client hosts at different times. Floating client licenses are not tied to specific hosts.
Floating Client Licenses Administration Since LSF floating client hosts are not listed in lsf.cluster.cluster_name, some commands administration commands will not work if issued from LSF floating client hosts. Always run administration commands from server hosts. Floating client hosts and host types/models This differentiates between client hosts and floating client hosts in terms of the restrictions on host types or models. For LSF client hosts, you can list the host type and model in lsf.cluster.
Chapter 8 Platform LSF Licensing ... Begin Parameters PRODUCTS= LSF_Base LSF_Manager LSF_Sched_Fairshare LSF_Sched_Parallel LSF_Sched_Preemption LSF_Sched_Resource_Reservation LSF_Make FLOAT_CLIENTS= 25 End Parameters ... The FLOAT_CLIENTS parameter sets the size of your license pool in the cluster. When the master LIM starts up, the number of licenses specified in FLOAT_CLIENTS (or fewer) can be checked out for use as floating client licenses. If the parameter FLOAT_CLIENTS is not specified in lsf.
Floating Client Licenses Configuring security for LSF floating client licenses Read this section to learn how to configure security against the issues presented in “Security issues with floating client licenses” on page 193. To resolve these security issues, the LSF administrator can limit which client hosts submit requests in the cluster by adding a domain or a range of domains in lsf.cluster.cluster_name with the parameter FLOAT_CLIENTS_ADDR_RANGE.
Chapter 8 Platform LSF Licensing ◆ FLOAT_CLIENTS_ADDR_RANGE=100.172.1.13 100.*.30-54 124.24*.1.*-34 All client hosts belonging to a domain with the address 100.172.1.13 will be allowed access. All client hosts belonging to domains starting with 100, then any number, then a range of 30 to 54 will be allowed access. All client hosts belonging to domains starting with 124, then from 24 onward, then 1, then from 0 to 34 will be allowed access. ◆ FLOAT_CLIENTS_ADDR_RANGE=12.23.45.
Floating Client Licenses 3 Submit a command from a host not listed in lsf.cluster.cluster_name. For example, if you submitted the following job from hostC: % bsub sleep 1000 You would get the following response: Job <104> is submitted to default queue . 4 % lshosts HOST_NAME hostA hostB hostC type SUNSOL SUNSOL UNKNOWN From any LSF host, with LSF_ENVDIR set to this cluster, enter the lshosts command: model DEFAULT DEFAULT UNKNOWN cpuf ncpus maxmem maxswp server 1.0 1 128M 602M Yes 1.0 No 1.
Chapter 8 Platform LSF Licensing Troubleshooting License Issues ◆ ◆ ◆ ◆ ◆ ◆ “"lsadmin reconfig" gives "User permission denied" message” on page 197 “Primary cluster administrator receives email “Your cluster has experienced license overuse” message” on page 197 “lsadmin command fails with "ls_gethostinfo: Host does not have a software license"” on page 197 “LSF commands give "Host does not have a software license"” on page 199 “LSF commands fail with "ls_initdebug: Unable to open file lsf.
Troubleshooting License Issues or kill -9 lim_PID 198 Administering Platform LSF
Chapter 8 Platform LSF Licensing 3 After the old LIM has died, start the new LIM on the master host using: ❖ lsadmin limstartup or ❖ LSF_SERVERDIR/lim as root. LSF commands give "Host does not have a software license" You may see this message after running lsid, lshosts, or other ls* commands. Typical problems: ◆ Your demo license (not tied to FLEXlm server) has expired. Solution: a ◆ Check the license.dat file to check the expiry date.
LSF commands fail with "ls_initdebug: Unable to open file lsf.conf"
You might see this message after running lsid. This message indicates that the LSF commands cannot access the lsf.conf file, or lsf.conf does not exist in /etc.
Solution:
◆ Use LSF_CONFDIR/cshrc.lsf or LSF_CONFDIR/profile.lsf to set up your LSF environment.
or
◆ If you know the location of lsf.conf, set the LSF_ENVDIR environment variable to point to the directory containing the lsf.conf file.
Part II: Working with Resources

Contents
◆ Chapter 9, "Understanding Resources"
◆ Chapter 10, "Adding Resources"
◆ Chapter 11, "Managing Software Licenses with LSF"
Chapter 9: Understanding Resources

Contents
◆ "About LSF Resources" on page 204
◆ "How Resources are Classified" on page 206
◆ "How LSF Uses Resources" on page 208
◆ "Load Indices" on page 209
◆ "Static Resources" on page 212
◆ "Automatic Detection of Hardware Reconfiguration" on page 213
About LSF Resources About LSF Resources The LSF system uses built-in and configured resources to track job resource requirements and schedule jobs according to the resources available on individual hosts. Viewing available resources lsinfo Use lsinfo to list the resources available in your cluster.
MODEL_NAME  CPU_FACTOR
DEC3000     10.00
R10K        14.00
PENT200      6.00
IBM350       7.00
SunSparc     6.00
HP735        9.00
HP715        5.00

lshosts
Use lshosts to get a list of the resources defined on a specific host:
% lshosts hostA
HOST_NAME  type    model   cpuf  ncpus  maxmem  maxswp  server  RESOURCES
hostA      SOL732  Ultra2  20. ...
How Resources are Classified How Resources are Classified By values Boolean resources Numerical resources String resources By the way values change Dynamic Resources Static Resources By definitions By scope Resources that denote the availability of specific features Resources that take numerical values, such as all the load indices, number of processors on a host, or host CPU factor Resources that take string values, such as host type, host model, host status Resources that change their values dynami
Chapter 9 Understanding Resources Resource Name Describes Meaning of Example Name solaris frame Operating system Available software Solaris operating system FrameMaker license Shared resources Shared resources are configured resources that are not tied to a specific host, but are associated with the entire cluster, or a specific subset of hosts within the cluster.
How LSF Uses Resources How LSF Uses Resources Jobs submitted through the LSF system will have the resources they use monitored while they are running. This information is used to enforce resource usage limits and load thresholds as well as for fairshare scheduling.
Load Indices
Load indices are built-in resources that measure the availability of dynamic, non-shared resources on hosts in the LSF cluster. Load indices built into the LIM are updated at fixed time intervals. External load indices are defined and configured by the LSF administrator. An External Load Information Manager (ELIM) program collects the values of site-defined external load indices and updates LIM when new values are received.

Load indices collected by LIM
Load Indices CPU run queue lengths (r15s, r1m, r15m) The r15s, r1m and r15m load indices are the 15-second, 1-minute and 15-minute average CPU run queue lengths. This is the average number of processes ready to use the CPU during the given interval. On UNIX, run queue length indices are not necessarily the same as the load averages printed by the uptime(1) command; uptime load averages on some platforms also include processes that are in short-term wait states (such as paging or disk I/O).
Chapter 9 Understanding Resources ◆ /tmp on UNIX ◆ C:\temp on Windows Swap space (swp) The swp index gives the currently available virtual memory (swap space) in MB. This represents the largest process that can be started on the host. Memory (mem) The mem index is an estimate of the real memory currently available to user processes. This represents the approximate size of the largest process that could be started on a host without causing the host to start paging.
Static Resources Static Resources Static resources are built-in resources that represent host information that does not change over time, such as the maximum RAM available to user processes or the number of processors in a machine. Most static resources are determined by the LIM at start-up time, or when LSF detects hardware configuration changes. Static resources can be used to select appropriate hosts for particular jobs based on binary architecture, relative CPU speed, and system configuration.
Chapter 9 Understanding Resources Automatic Detection of Hardware Reconfiguration Some UNIX operating systems support dynamic hardware reconfiguration—that is, the attaching or detaching of system boards in a live system without having to reboot the host. Supported platforms LSF is able to recognize changes in ncpus, maxmem, maxswp, maxtmp in the following platforms: ◆ ◆ ◆ ◆ ◆ Sun Solaris 2.5+ HP-UX 10.10+ Compaq Alpha 5.0+ IBM AIX 4.0+ SGI IRIX 6.
Automatic Detection of Hardware Reconfiguration How dynamic hardware changes affect LSF LSF uses ncpus, maxmem, maxswp, maxtmp to make scheduling and load decisions. When processors are added or removed, LSF licensing is affected because LSF licenses are based on the number of processors. If you put a processor offline: Per host or per-queue load thresholds may be exceeded sooner. This is because LSF uses the number of CPUS and relative CPU speeds to calculate effective run queue length.
Chapter 10: Adding Resources

Contents
◆ "About Configured Resources" on page 216
◆ "Adding New Resources to Your Cluster" on page 217
◆ "Static Shared Resource Reservation" on page 221
◆ "External Load Indices and ELIM" on page 222
◆ "Modifying a Built-In Load Index" on page 227
About Configured Resources
LSF schedules jobs based on available resources. There are many resources built into LSF, but you can also add your own resources, and then use them the same way as built-in resources. For maximum flexibility, you should characterize your resources clearly enough so that users have satisfactory choices.
Chapter 10 Adding Resources Adding New Resources to Your Cluster To add host resources to your cluster, use the following steps: 1 2 3 Log in to any host in the cluster as the LSF administrator. Define new resources in the Resource section of lsf.shared. Specify at least a name and a brief description, which will be displayed to a user by lsinfo. See “Configuring lsf.shared Resource Section” on page 218.
Configuring lsf.shared Resource Section Configuring lsf.shared Resource Section Configured resources are defined in the Resource section of lsf.shared. There is no distinction between shared and non-shared resources. You must specify at least a name and description for the resource, using the keywords RESOURCENAME and DESCRIPTION. ◆ ◆ A resource name cannot begin with a number. A resource name cannot contain any of the following characters : ◆ .
Chapter 10 Adding Resources Configuring lsf.cluster.cluster_name ResourceMap Section Resources are associated with the hosts for which they are defined in the ResourceMap section of lsf.cluster.cluster_name. For each resource, you must specify the name and the hosts that have it. If the ResourceMap section is not defined, then any dynamic resources specified in lsf.shared are not tied to specific hosts, but are shared across all hosts in the cluster.
Configuring lsf.cluster.cluster_name ResourceMap Section ◆ ◆ ◆ ◆ ◆ Type square brackets around the list of hosts, as shown. You can omit the parenthesis if you only specify one set of hosts. Each set of hosts within square brackets specifies an instance of the resource. The same host cannot be in more than one instance of a resource. All hosts within the instance share the quantity of the resource indicated by its value. The keyword all refers to all the server hosts in the cluster, collectively.
Chapter 10 Adding Resources Static Shared Resource Reservation You must use resource reservation to prevent over-committing static shared resources when scheduling. The usual situation is that you configure single-user application licenses as static shared resources, and make that resource one of the job requirements. You should also reserve the resource for the duration of the job.
External Load Indices and ELIM External Load Indices and ELIM The LSF Load Information Manager (LIM) collects built-in load indices that reflect the load situations of CPU, memory, disk space, I/O, and interactive activities on individual hosts. While built-in load indices might be sufficient for most jobs, you might have special workload or resource dependencies that require custom exter nal load indices defined and configured by the LSF administrator.
Chapter 10 Adding Resources Configuring your application-specific SELIM The master ELIM is installed as LSF_SERVERDIR/melim. After installation: 1 2 3 Define the external resources you need. Write your application-specific SELIM to track these resources, as described in “Writing an ELIM” on page 224. Put your ELIM in LSF_SERVERIR. Naming your ELIM Use the following naming conventions: ◆ On UNIX, LSF_SERVERDIR/elim.application For example, elim.license ◆ On Windows, LSF_SERVERDIR\elim.application.
External Load Indices and ELIM Environment When LIM starts, the following environment variables are set for ELIM: variables ◆ ◆ LSF_MASTER: This variable is defined if the ELIM is being invoked on the master host. It is undefined otherwise. This can be used to test whether the ELIM should report on cluster-wide resources that only need to be collected on the master host. LSF_RESOURCES: This variable contains a list of resource names (separated by spaces) on which the ELIM is expected to report.
Chapter 10 Adding Resources Example 1 Write an ELIM. The following sample ELIM (LSF_SERVERDIR/elim.mysrc) sets the value of myrsc resource to 2. In a real ELIM, you would have a command to retrieve whatever value you want to retrieve and set the value. #!/bin/sh while : do # set the value for resource "myrsc" val=2 # create an output string in the format: # number_indices index1_name index1_value...
External Load Indices and ELIM Additional Example code for an ELIM is included in the LSF_MISC/examples directory. The examples elim.c file is an ELIM written in C. You can modify this example to collect the load indices you want. Debugging an ELIM Set the parameter LSF_ELIM_DEBUG=y in the Parameters section of lsf.cluster.cluster_name to log all load information received by LIM from the ELIM in the LIM log file. Set the parameter LSF_ELIM_BLOCKTIME=seconds in the Parameters section of lsf.cluster.
Chapter 10 Adding Resources Modifying a Built-In Load Index The ELIM can return values for the built-in load indices. In this case the value produced by the ELIM overrides the value produced by the LIM. Considerations ◆ ◆ ◆ The ELIM must ensure that the semantics of any index it supplies are the same as that of the corresponding index returned by the lsinfo(1) command. The name of an external load index must not be one of the resource name aliases: cpu, idle, login, or swap.
Modifying a Built-In Load Index 228 Administering Platform LSF
C H A P T E R 11 Managing Software Licenses with LSF Software licenses are valuable resources that must be fully utilized. This section discusses how LSF can help manage licensed applications to maximize utilization and minimize job failure due to license problems.
Using Licensed Software with LSF Using Licensed Software with LSF Many applications have restricted access based on the number of software licenses purchased. LSF can help manage licensed software by automatically forwarding jobs to licensed hosts, or by holding jobs in batch queues until licenses are available.
Chapter 11 Managing Software Licenses with LSF Host-locked Licenses Host-locked software licenses allow users to run an unlimited number of copies of the product on each of the hosts that has a license. Configuring host-locked licenses You can configure a Boolean resource to represent the software license, and configure your application to require the license resource. When users run the application, LSF chooses the best host from the set of licensed hosts.
Counted Host-Locked Licenses Counted Host-Locked Licenses Counted host-locked licenses are only available on specific licensed hosts, but also place a limit on the maximum number of copies available on the host. Configuring counted host-locked licenses You configure counted host-locked licenses by having LSF determine the number of licenses currently available.
Chapter 11 Managing Software Licenses with LSF Network Floating Licenses A network floating license allows a fixed number of machines or users to run the product at the same time, without restricting which host the software can run on. Floating licenses are cluster-wide resources; rather than belonging to a specific host, they belong to all hosts in the cluster.
Network Floating Licenses Licenses used outside of LSF control To handle the situation where application licenses are used by jobs outside of LSF, use an ELIM to dynamically collect the actual number of licenses available instead of relying on a statically configured value. The ELIM periodically informs LSF of the number of available licenses, and LSF takes this into consideration when scheduling jobs.
Chapter 11 Managing Software Licenses with LSF If the Verilog licenses are not cluster-wide, but can only be used by some hosts in the cluster, the resource requirement string should include the defined() tag in the select section: select[defined(verilog)] rusage[verilog=1] Preventing underutilization of licenses One limitation to using a dedicated queue for licensed jobs is that if a job does not actually use the license, then the licenses will be under-utilized.
Network Floating Licenses #!/bin/sh # lic_starter: If application fails with no license, exit 99, # otherwise, exit 0. The application displays # "no license" when it fails without license available.
P A R III T Scheduling Policies Contents ◆ ◆ ◆ ◆ ◆ ◆ Chapter 12, “Time Syntax and Configuration” Chapter 13, “Deadline Constraint and Exclusive Scheduling” Chapter 14, “Preemptive Scheduling” Chapter 15, “Specifying Resource Requirements” Chapter 16, “Fairshare Scheduling” Chapter 17, “Goal-Oriented SLA-Driven Scheduling”
C H A P T E 12 R Time Syntax and Configuration Contents ◆ ◆ ◆ ◆ “Specifying Time Values” on page 240 “Specifying Time Windows” on page 241 “Specifying Time Expressions” on page 242 “Using Automatic Time-based Configuration” on page 243 Administering Platform LSF 239
Specifying Time Values Specifying Time Values To specify a time value, a specific point in time, specify at least the hour. Day and minutes are optional. Time value syntax time = hour | hour:minute | day:hour:minute hour integer from 0 to 23, representing the hour of the day. minute integer from 0 to 59, representing the minute of the hour. If you do not specify the minute, LSF assumes the first minute of the hour (:00).
Chapter 12 Time Syntax and Configuration Specifying Time Windows To specify a time window, specify two time values separated by a hyphen (-), with no space in between. time_window = time1-time2 Time 1 is the start of the window and time 2 is the end of the window. Both time values must use the same syntax.
Specifying Time Expressions Specifying Time Expressions Time expressions use time windows to specify when to change configurations. For more details on time windows, see “Specifying Time Windows” on page 241. Time expression syntax A time expression is made up of the time keyword followed by one or more spaceseparated time windows enclosed in parenthesis. Time expressions can be combined using the &&, ||, and ! logical operators.
Chapter 12 Time Syntax and Configuration Using Automatic Time-based Configuration Variable configuration is used to automatically change LSF configuration based on time windows. It is supported in the following files: ◆ ◆ ◆ ◆ ◆ lsb.hosts lsb.params lsb.queues lsb.resources lsb.users You define automatic configuration changes in configuration files by using if-else constructs and time expressions. After you change the files, reconfigure the cluster with the badmin reconfig command.
Using Automatic Time-based Configuration # - 10 jobs can run from normal queue # - any number can run from short queue between 18:30 and 19:30 # all other hours you are limited to 100 slots in the short queue # - each other queue can run 30 jobs Begin Limit PER_QUEUE normal # if time(18:30-19:30) short #else short #endif (all ~normal ~short) End Limit HOSTS license1 SLOTS 10 license1 - license1 100 license1 30 # Example lsb.users example From 12 - 1 p.m.
Chapter 12 Time Syntax and Configuration #if time(expression) statement #elif time(expression) statement #elif time(expression) statement #else statement #endif Verifying configuration Use the following LSF commands to verify configuration: ◆ ◆ ◆ ◆ bparams(1) busers(1) bhosts(1) bqueues(1) Administering Platform LSF 245
Chapter 13: Deadline Constraint and Exclusive Scheduling

Contents
◆ "Using Deadline Constraint Scheduling" on page 248
◆ "Using Exclusive Scheduling" on page 249
Using Deadline Constraint Scheduling
Deadline constraints suspend or terminate running jobs at a certain time.
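For example, a queue whose jobs should run only overnight could define a run window in lsb.queues (the queue name and window are illustrative); jobs dispatched from the queue are suspended when the window closes:
Begin Queue
QUEUE_NAME = night
RUN_WINDOW = 19:00-7:00
...
End Queue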
Using Exclusive Scheduling
Exclusive scheduling gives a job exclusive use of the host that it runs on. LSF dispatches the job to a host that has no other jobs running, and does not place any more jobs on the host until the exclusive job is finished.

How exclusive scheduling works
When an exclusive job (bsub -x) is submitted to an exclusive queue (EXCLUSIVE = Y in lsb.queues) and dispatched to a host, LSF does not place any other jobs on that host until the exclusive job finishes.
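For example, with an exclusive queue configured in lsb.queues as follows (the queue name is illustrative), a user requests exclusive execution at submission time:
Begin Queue
QUEUE_NAME = exclusive_q
EXCLUSIVE = Y
...
End Queue
% bsub -x -q exclusive_q myjob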
Chapter 14: Preemptive Scheduling

Contents
◆ "About Preemptive Scheduling" on page 252
◆ "How Preemptive Scheduling Works" on page 253
◆ "Configuring Preemptive Scheduling" on page 255
About Preemptive Scheduling About Preemptive Scheduling Preemptive scheduling lets a pending high-priority job take job slots away from a running job of lower priority. When two jobs compete for the same job slots, LSF automatically suspends the low-priority job to make slots available to the high-priority job. The low-priority job is resumed as soon as possible. Use preemptive scheduling if you have long-running low-priority jobs causing highpriority jobs to wait an unacceptably long time.
Chapter 14 Preemptive Scheduling How Preemptive Scheduling Works Preemptive scheduling occurs when two jobs compete for the same job slots. If a highpriority job is pending, LSF can suspend a lower priority job that is running, and then start the high-priority job using the job slot that becomes available. For this to happen, the high-priority job must be pending in a preemptive queue, or the low-priority job must belong to a preemptable queue.
How Preemptive Scheduling Works Job slot limits specified at the queue level are never affected by preemptive scheduling; they are always enforced for both running and suspended jobs. Preemption of multiple job slots If multiple slots are required, LSF can preempt multiple jobs, until sufficient slots are available. For example, one or more jobs might be preempted for a job that needs multiple job slots.
Chapter 14 Preemptive Scheduling Configuring Preemptive Scheduling To configure preemptive scheduling, make at least one queue in the cluster preemptive (not the lowest-priority queue) or preemptable (not the highest-priority queue). To make a queue preemptive or preemptable, set PREEMPTION in the queue definition (lsb.queues) to PREEMPTIVE or PREEMPTABLE. A queue can be both: its jobs can always preempt jobs in lower priority queues, and can always be preempted by jobs from higher priority queues.
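For example, the following pair of lsb.queues definitions (queue names and priorities illustrative) lets jobs in the high queue preempt running jobs from the low queue:
Begin Queue
QUEUE_NAME = high
PRIORITY   = 70
PREEMPTION = PREEMPTIVE
End Queue

Begin Queue
QUEUE_NAME = low
PRIORITY   = 30
PREEMPTION = PREEMPTABLE
End Queue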
Configuring Preemptive Scheduling Configuring additional job slot limits for preemptive scheduling The following job slot limits are always affected by preemptive scheduling: ◆ ◆ Total job slot limit for hosts, specified at the host level (SLOTS and HOSTS in lsb.resources) Total job slot limit for individual users, specified at the user level (SLOTS and USERS in lsb.
Chapter 14 Preemptive Scheduling Jobs from low_q1 are preferred first for preemption before jobs from low_q2 and low_q3. If preemptable queue preference and preemption on jobs with least run time are both enabled, the queue preference for the job is considered first, then the job run time. Preempting jobs with the least run time By default, when more than one preemptable job exists (low-priority jobs holding the required job slots), LSF preempts a job from the least-loaded host.
Chapter 15: Specifying Resource Requirements

Contents
◆ "About Resource Requirements" on page 260
◆ "Queue-level Resource Requirements" on page 261
◆ "Job-level Resource Requirements" on page 262
◆ "About Resource Requirement Strings" on page 263
◆ "Selection String" on page 265
◆ "Order String" on page 268
◆ "Usage String" on page 269
◆ "Span String" on page 273
◆ "Same String" on page 274
About Resource Requirements About Resource Requirements Resource requirements define which hosts a job can run on. Each job has its resource requirements. Hosts that match the resource requirements are the candidate hosts. When LSF schedules a job, it uses the load index values of all the candidate hosts. The load values for each host are compared to the scheduling conditions. Jobs are only dispatched to a host if all load values are within the scheduling thresholds.
Chapter 15 Specifying Resource Requirements Queue-level Resource Requirements Each queue can define resource requirements that will be applied to all the jobs in the queue. When resource requirements are specified for a queue, and no job-level resource requirement is specified, the queue-level resource requirements become the default resource requirements for the job.
Job-level Resource Requirements Job-level Resource Requirements Each job can specify resource requirements. Job-level resource requirements override any resource requirements specified in the remote task list. In some cases, the queue specification sets an upper or lower bound on a resource. If you attempt to exceed that bound, your job will be rejected. Syntax To specify resource requirements for your job, use bsub -R and specify the resource requirement string as usual.
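For example, the following submits a job (the command is illustrative) that requires a host with more than 200 MB of available memory and reserves 200 MB of memory for the job:
% bsub -R "select[mem>200] rusage[mem=200]" myjob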
Chapter 15 Specifying Resource Requirements About Resource Requirement Strings Most LSF commands accept a -R res_req argument to specify resource requirements. The exact behaviour depends on the command. For example, specifying a resource requirement for the lsload command displays the load levels for all hosts that have the requested resources. Specifying resource requirements for the lsrun command causes LSF to select the best host out of the set of hosts that have the requested resources.
About Resource Requirement Strings ◆ ◆ ◆ ◆ 264 Administering Platform LSF In a select string, a host must satisfy both queue-level and job-level requirements for the job to be dispatched. order and span sections defined at the queue level are ignored if different order and span requirements are specified at the job level. The default order string is r15s:pg. An order section defined at the queue level is ignored if different order requirements are specified at the job level.
Chapter 15 Specifying Resource Requirements Selection String The selection string specifies the characteristics a host must have to match the resource requirement. It is a logical expression built from a set of resource names. The selection string is evaluated for each host; if the result is non-zero, then that host is selected. Syntax The selection string can combine resource names with logical and arithmetic operators.
Selection String Operators These operators can be used in selection strings. The operators are listed in order of decreasing precedence.
Chapter 15 Specifying Resource Requirements Specifying exclusive resources An exclusive resource may be used in the resource requirement string of any placement or scheduling command, such as bsub, lsplace, lsrun, or lsgrun. An exclusive resource is a special resource that is assignable to a host. This host will not receive a job unless that job explicitly requests the host.
Order String Order String The order string allows the selected hosts to be sorted according to the values of resources. The values of r15s, r1m, and r15m used for sorting are the normalized load indices returned by lsload -N. The order string is used for host sorting and selection. The ordering begins with the rightmost index in the order string and proceeds from right to left.
Chapter 15 Specifying Resource Requirements Usage String This string defines the expected resource usage of the job. It is used to specify resource reservations for jobs, or for mapping jobs on to hosts and adjusting the load when running interactive jobs. By default, no resources are reserved. Batch jobs The resource usage (rusage) section can be specified at the job level or with the queue configuration parameter RES_REQ. Syntax rusage[usage_string [, usage_string][|| usage_string] ...
Usage String This example indicates that 50 MB memory should be reserved for the job. As the job runs, the amount reserved will decrease at approximately 0.5 MB per minute until the 100 minutes is up. How queue-level and job-level rusage sections are resolved Job-level rusage overrides the queue-level specification: ◆ ◆ ◆ For internal load indices (r15s, r1m, r15m, ut, pg, io, ls, it, tmp, swp, and mem), the job-level value cannot be larger than the queue-level value. For external load indices (e.g.
Chapter 15 Specifying Resource Requirements ◆ A queue with the same resource requirements could specify: RES_REQ = rusage[mem=20, license=1:duration=2] ◆ The following job requests 20 MB of memory and 50 MB of swap space for 1 hour, and 1 license to be reserved for 2 minutes: % bsub -R "rusage[mem=20:swp=50:duration=1h, license=1:duration=2]" myjob ◆ The following job requests 50 MB of swap space, linearly decreasing the amount reserved over a duration of 2 hours, and requests 1 license to be reserved
Usage String res is one of the resources whose value is returned by the lsload command. rusage[r1m=0.5:mem=20:swp=40] The above example indicates that the task is expected to increase the 1-minute run queue length by 0.5, consume 20 MB of memory and 40 MB of swap space. If no value is specified, the task is assumed to be intensive in using that resource. In this case no more than one task will be assigned to a host regardless of how many CPUs it has. The default resource usage for a task is r15s=1.
Chapter 15 Specifying Resource Requirements Span String A span string specifies the locality of a parallel job. If span is omitted, LSF allocates the required processors for the job from the available set of processors. Syntax Two kinds of span string are supported: ◆ span[hosts=1] Indicates that all the processors allocated to this job must be on the same host. ◆ span[ptile=value ] Indicates the number of processors on each host that should be allocated to the job.
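For example, the following submits an 8-way parallel job (the command is illustrative) and asks LSF to allocate exactly 4 processors on each host, so the job spans 2 hosts:
% bsub -n 8 -R "span[ptile=4]" myjob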
Same String Same String You must have the parallel batch job scheduler plugin installed in order to use the same string. Parallel jobs run on multiple hosts. If your cluster has heterogeneous hosts, some processes from a parallel job may for example, run on Solaris and some on SGI IRIX. However, for performance reasons you may want all processes of a job to run on the same type of host instead of having some processes run on one type of host and others on another type of host.
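For example, the following submits a 6-way parallel job (the command is illustrative) and requires that all of the allocated hosts be of the same host type:
% bsub -n 6 -R "same[type]" myjob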
Chapter 16: Fairshare Scheduling

To configure any kind of fairshare scheduling, you should understand the following concepts:
◆ User share assignments
◆ Dynamic share priority
◆ Job dispatch order
You can configure fairshare at either host level or queue level. If you require more control, you can implement hierarchical fairshare. You can also set some additional restrictions when you submit a job.
Understanding Fairshare Scheduling Understanding Fairshare Scheduling By default, LSF considers jobs for dispatch in the same order as they appear in the queue (which is not necessarily the order in which they are submitted to the queue). This is called first-come, first-served (FCFS) scheduling. Fairshare scheduling divides the processing power of the LSF cluster among users and queues to provide fair access to resources.
Chapter 16 Fairshare Scheduling User Share Assignments Both queue-level and host partition fairshare use the following syntax to define how shares are assigned to users or user groups. Syntax [user , number_shar es ] Enclose each user share assignment in square brackets, as shown. Separate multiple share assignments with a space between each set of square brackets. ◆ user Specify users of the queue or host partition.
User Share Assignments ❖ ❖ ◆ If there are 3 users in total, the single remaining user has all 8 shares, and is almost as important as User1 and User2. If there are 12 users in total, then 10 users compete for those 8 shares, and each of them is significantly less important than User1 and User2. [User1, 10] [User2, 6] [default, 4] The relative percentage of shares held by a user will change, depending on the number of users who are granted shares by default.
Chapter 16 Fairshare Scheduling Dynamic User Priority LSF calculates a dynamic user priority for individual users or for a group, depending on how the shares are assigned. The priority is dynamic because it changes as soon as any variable in formula changes. By default, a user’s dynamic priority gradually decreases after a job starts, and the dynamic priority immediately increases when the job finishes.
Dynamic User Priority cpu_time The cumulative CPU time used by the user (measured in hours). LSF calculates the cumulative CPU time using the actual (not normalized) CPU time and a decay factor such that 1 hour of recently-used CPU time decays to 0.1 hours after an interval of time specified by HIST_HOURS in lsb.params (5 hours by default). run_time The total run time of running jobs (measured in hours). job_slots The number of job slots reserved and in use.
Chapter 16 Fairshare Scheduling How Fairshare Affects Job Dispatch Order Within a queue, jobs are dispatched according to the queue’s scheduling policy. For FCFS queues, the dispatch order depends on the order of jobs in the queue (which depends on job priority and submission time, and can also be modified by the job owner).
Host Partition User-based Fairshare Host Partition User-based Fairshare User-based fairshare policies configured at the host level handle resource contention across multiple queues. You can define a different fairshare policy for every host partition. If multiple queues use the host partition, a user has the same priority across multiple queues. To run a job on a host that has fairshare, users must have a share assignment (USER_SHARES in the HostPartition section of lsb.hosts).
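For example, the following HostPartition section in lsb.hosts (the partition name, hosts, and share values are illustrative) assigns three shares to the group groupA and one share collectively to all other users of hostA and hostB:
Begin HostPartition
HPART_NAME = Partition1
HOSTS = hostA hostB
USER_SHARES = [groupA, 3] [others, 1]
End HostPartition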
Chapter 16 Fairshare Scheduling Queue-level User-based Fairshare User-based fairshare policies configured at the queue level handle resource contention among users in the same queue. You can define a different fairshare policy for every queue, even if they share the same hosts. A user’s priority is calculated separately for each queue. To submit jobs to a fairshare queue, users must be allowed to use the queue (USERS in lsb.queues) and must have a share assignment (FAIRSHARE in lsb.queues).
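For example, the following lsb.queues fragment (the queue name and share values are illustrative) enables fairshare among the queue's users, giving user1 ten shares and every other user one share each:
Begin Queue
QUEUE_NAME = fair_q
FAIRSHARE  = USER_SHARES[[user1, 10] [default, 1]]
...
End Queue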
Cross-queue User-based Fairshare Cross-queue User-based Fairshare User-based fairshare policies configured at the queue level handle resource contention across multiple queues. Applying the same fairshare policy to several queues With cross-queue fairshare, the same user-based fairshare policy can apply to several queues can at the same time.
Chapter 16 Fairshare Scheduling Examples % bqueues QUEUE_NAME normal short license PRIO STATUS 30 Open:Active 40 Open:Active 50 Open:Active MAX JL/U JL/P JL/H NJOBS 1 4 2 1 10 1 1 1 PEND 1 0 0 RUN 0 1 1 SUSP 0 0 0 % bqueues -l normal QUEUE: normal -- For normal low priority jobs, running only if hosts are lightly loaded. This is the default queue.
Cross-queue User-based Fairshare Configuring cross-queue fairshare Considerations ◆ ◆ ◆ Steps 1 2 FAIRSHARE must be defined in the master queue. If it is also defined in the queues listed in FAIRSHARE_QUEUES, it will be ignored. Cross-queue fairshare can be defined more than once within lsb.queues. You can define several sets of master-slave queues. However, a queue cannot belong to more than one master-slave set.
Chapter 16 Fairshare Scheduling Begin Queue QUEUE_NAME = queue3 PRIORITY = 50 NICE = 10 PREEMPTION = PREEMPTIVE QJOB_LIMIT = 10 UJOB_LIMIT = 1 PJOB_LIMIT = 1 End Queue Controlling job dispatch order in cross-queue fairshare DISPATCH_ORDER Use DISPATCH_ORDER=QUEUE in the master queue to define an order ed crossparameter queue fairshare set. DISPATCH_ORDER indicates that jobs are dispatched according (lsb.queues) to the order of queue priorities, not user fairshare priority.
Hierarchical User-based Fairshare Hierarchical User-based Fairshare For both queue and host partitions, hierarchical user-based fairshare lets you allocate resources to users in a hierarchical manner. By default, when shares are assigned to a group, group members compete for resources according to FCFS policy. If you use hierarchical fairshare, you control the way shares that are assigned collectively are divided among group members.
Chapter 16 Fairshare Scheduling ◆ ◆ ◆ ◆ ◆ ◆ Number of shares Dynamic share priority (LSF compares dynamic priorities of users who belong to same group, at the same level) Number of started jobs Number of reserved jobs CPU time, in seconds (cumulative CPU time for all members of the group, recursively) Run time, in seconds (historical and actual run time for all members of the group, recursively) Example % bhpart -r Partition1 HOST_PARTITION_NAME: Partition1 HOSTS: HostA SHARE_INFO_FOR: Partition1/ USER/GR
Hierarchical User-based Fairshare Begin UserGroup GROUP_NAME GROUP_MEMBER GroupB (User1 User2) GroupC (User3 User4) GroupA (GroupB GroupC User5) End UserGroup ◆ ◆ USER_SHARES () ([User3, 3] [User4, 4]) ([User5, 1] [default, 10]) User groups must be defined before they can be used (in the GROUP_MEMBER column) to define other groups. Enclose the share assignment list in parentheses, as shown, even if you do not specify any user share assignments.
Chapter 16 Fairshare Scheduling Queue-based Fairshare When a priority is set in a queue configuration, a high priority queue tries to dispatch as many jobs as it can before allowing lower priority queues to dispatch any job. Lower priority queues are blocked until the higher priority queue cannot dispatch any more jobs. However, it may be desirable to give some preference to lower priority queues and regulate the flow of jobs from the queue.
Queue-based Fairshare Interaction with other scheduling policies ◆ ◆ ◆ Queues participating in a queue-based fairshare pool cannot be preemptive or preemptable. You should not configure slot reservation (SLOT_RESERVE) in queues that use queue-based fairshare. Cross-queue user-based fairshare (FAIRSHARE_QUEUES) can undo the dispatching decisions of queue-based fairshare. Cross-queue user-based fairshare queues should not be part of a queue-based fairshare pool.
Chapter 16 Fairshare Scheduling Configuring Slot Allocation per Queue Configure as many pools as you need in lsb.queues. SLOT_SHARE parameter The SLOT_SHARE parameter represents the percentage of running jobs (job slots) in use from the queue. SLOT_SHARE must be greater than zero (0) and less than or equal to 100. The sum of SLOT_SHARE for all queues in the pool does not need to be 100%. It can be more or less, depending on your needs.
Begin Queue
QUEUE_NAME = queue3
PRIORITY   = 46
SLOT_POOL  = poolA
SLOT_SHARE = 20
HOSTS      = groupA
...
End Queue
◆ The following configures a pool named poolB, with three queues with equal shares, using the hosts in host group groupB:
Begin Queue
QUEUE_NAME =
PRIORITY   =
SLOT_POOL  =
SLOT_SHARE =
HOSTS      =
...
End Queue
Begin Queue
QUEUE_NAME =
PRIORITY   =
SLOT_POOL  =
SLOT_SHARE =
HOSTS      =
...
Viewing Queue-based Fairshare Allocations
View configured job slot share
Use bqueues -l to show the job slot share (SLOT_SHARE) and the hosts participating in the share pool (SLOT_POOL):

QUEUE: queue1
PARAMETERS/STATISTICS
PRIO NICE STATUS      MAX JL/U JL/P JL/H NJOBS PEND RUN SSUSP USUSP RSV
50   20   Open:Active -   -    -    -    0     0    0   0     0     0
Interval for a host to accept two jobs is 0 seconds

STACKLIMIT MEMLIMIT
2048 K     5000 K

SCHEDULING PARAMETERS
          r15s r1m r15m ut pg
loadSched -    -   -    -  -
loadStop  -    -   -    -  -
poolB contains:
◆ queue4
◆ queue5
◆ queue6
bqueues shows the number of running jobs in each queue:
% bqueues
QUEUE_NAME PRIO STATUS      MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP
queue1     50   Open:Active -   -    -    -    492   484  8   0
queue2     48   Open:Active -   -    -    -    500   495  5   0
queue3     46   Open:Active -   -    -    -    498   496  2   0
queue4     44   Open:Active -   -    -    -    985   980  5   0
queue5     43   Open:Active -   -    -    -    985   980  5   0
queue6     42   Open:Active -   -    -    -    985   980  5   0
How to interpret the shares
◆ queue1 has a 50% share—and can run 8 jobs
◆ queue2 has
Typical Slot Allocation Scenarios
3 queues with SLOT_SHARE 50%, 30%, 20%, with 15 job slots
This scenario has three phases:
1 % bqueues
  QUEUE_NAME PRIO STATUS      MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP
  Roma       50   Open:Active -   -    -    -    1000  992  8   0
  Verona     48   Open:Active -   -    -    -    995   990  5   0
  Genova     48   Open:Active -   -    -    -    996   994  2   0
2 When queue Verona has done its work, queues Roma and Genova get their respective shares of 8 and 3.
  % bqueues
Typical Slot Allocation Scenarios The following figure illustrates phases 1, 2, and 3: 2 pools, 30 job slots, and 2 queues out of any pool ◆ poolA uses 15 slots and contains queues Roma (50% share, 8 slots), Verona (30% share, 5 slots), and Genova (20% share, 2 remaining slots to total 15). ◆ poolB with 15 slots containing queues Pisa (30% share, 5 slots), Venezia (30% share, 5 slots), and Bologna (30% share, 5 slots).
QUEUE_NAME PRIO STATUS      MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP
Roma       50   Open:Active -   -    -    -    1000  992  8   0
Verona     48   Open:Active -   -    -    -    1000  995  5   0
Genova     48   Open:Active -   -    -    -    1000  998  2   0
Pisa       44   Open:Active -   -    -    -    1000  995  5   0
Milano     43   Open:Active -   -    -    -    2     2    0   0
Parma      43   Open:Active -   -    -    -    2     2    0   0
Venezia    43   Open:Active -   -    -    -    1000  995  5   0
Bologna    43   Open:Active -   -    -    -    1000  995  5   0
When Milano and Parma have jobs, their higher priority reduces the share of slots free and in use by Venezia and Bologna:
QUEUE_NAME PRIO STATUS      MAX JL/U JL/P JL/H NJOBS PEND RUN SUSP
Roma       50   Open:Active -   -    -    -    992   984  8   0
Verona     48   Open:Active -   -    -    -    993   990  3   0
Genova     48   Open:Active -   -    -    -    996   994  2   0
Pisa       44   Open:Active -   -    -    -    995   990  5   0
Milano     43   Open:Active -   -    -    -    10    7    3   0
Parma      43   Open:Active -   -    -    -    11    8    3   0
Venezia    43   Open:Active -   -    -    -    995   995  2   0
Bologna    43   Open:Active -   -    -    -    995   995  2   0
Round-robin slot distribution—13 queues and 2 pools
Pool poolA has 3 hosts each with 7 slots for a total of 21 slots to be shared.
Chapter 16 Fairshare Scheduling Initially, queues Livorno, Palermo, and Venezia in poolB are not assigned any slots because the first 7 higher priority queues have used all 21 slots available for allocation. As jobs run and each queue accumulates used slots, LSF favors queues that have not run jobs yet. As jobs finish in the first 7 queues of poolB, slots are redistributed to the other queues that originally had no jobs (queues Livorno, Palermo, and Venezia).
3 queues in one pool with 50%, 30%, 20% shares
A pool configures 3 queues:
◆ queue1 50% with short-running jobs
◆ queue2 20% with short-running jobs
◆ queue3 30% with longer running jobs
As queue1 and queue2 finish their jobs, the number of jobs in queue3 expands, and as queue1 and queue2 get more work, LSF rebalances the usage:
10 queues sharing 10% each of 50
In this example, queue1 (the curve with the highest peaks) has the longer running jobs and so has less accumula
Chapter 16 Fairshare Scheduling Using Historical and Committed Run Time By default, as a job is running, the dynamic priority decreases gradually until the job has finished running, then increases immediately when the job finishes. In some cases this can interfere with fairshare scheduling if two users who have the same priority and the same number of shares submit jobs at the same time.
Using Historical and Committed Run Time Without the historical run time, the dynamic priority increases suddenly as soon as the job finishes running because the run time becomes zero, which gives no chance for jobs pending for other users to start.
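One way these controls are typically enabled is through lsb.params; the following fragment is only a sketch, and the parameter values shown are illustrative and should be tuned for your workload:

Begin Parameters
ENABLE_HIST_RUN_TIME      = y     # include decayed historical run time in the dynamic priority
HIST_HOURS                = 5     # decay rate for historical run time and CPU time
RUN_TIME_FACTOR           = 0.7   # weight of actual run time
COMMITTED_RUN_TIME_FACTOR = 0.5   # weight of committed run time (for example, bsub -W)
End Parameters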
Chapter 16 Fairshare Scheduling Limitation If you use queue-level fairshare, and a running job has a committed run time, you should not switch that job to or from a fairshare queue (using bswitch). The fairshare calculations will not be correct. Run time displayed by bqueues and bhpart The run time displayed by bqueues and bhpart is the sum of the actual, accumulated run time and the historical run time, but does not include the committed run time.
Using Historical and Committed Run Time When a committed run time factor is included in the priority calculation, the dynamic priority drops as soon as the job is dispatched, rather than gradually dropping as the job runs: 306 Administering Platform LSF
Chapter 16 Fairshare Scheduling Users Affected by Multiple Fairshare Policies If you belong to multiple user groups, which are controlled by different fairshare policies, each group probably has a different dynamic share priority at any given time. By default, if any one of these groups becomes the highest priority user, you could be the highest priority user in that group, and LSF would attempt to place your job.
Ways to Configure Fairshare
Global fairshare
Global fairshare balances resource usage across the entire cluster according to one single fairshare policy. Resources used in one queue affect job dispatch order in another queue. If two users compete for resources, their dynamic share priority is the same in every queue.
Configuring
To configure global fairshare, you must use host partition fairshare.
Chapter 16 Fairshare Scheduling Example Begin HostPartition HPART_NAME = equal_share_partition HOSTS = all USER_SHARES = [default, 1] End HostPartition Priority user and static priority fairshare There are two ways to configure fairshare so that a more important user’s job always overrides the job of a less important user, regardless of resource use.
Static priority fairshare
Static priority fairshare assigns resources to the user with the most shares. Resource usage is ignored. If two users compete for resources, the most important user’s job always runs first.
Configuring
To implement static priority fairshare, edit lsb.params and set all the weighting factors used in the dynamic priority formula to 0 (zero).
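A minimal sketch of such an lsb.params configuration follows; it assumes the standard weighting factors and simply sets them all to zero so that only the number of shares determines priority:

Begin Parameters
CPU_TIME_FACTOR = 0
RUN_TIME_FACTOR = 0
RUN_JOB_FACTOR  = 0
HIST_HOURS      = 0
End Parameters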
C H A P T E R 17 Goal-Oriented SLA-Driven Scheduling Contents ◆ ◆ ◆ ◆ “Using Goal-Oriented SLA Scheduling” on page 312 ❖ “Service-level agreements in LSF” on page 312 ❖ “Service classes” on page 312 ❖ “Service-level goals” on page 312 ❖ “How service classes perform goal-oriented scheduling” on page 313 ❖ “Submitting jobs to a service class” on page 313 ❖ “Modifying SLA jobs (bmod)” on page 314 “Configuring Service Classes for SLA Scheduling” on page 315 ❖ “User groups for service classes” on pag
Using Goal-Oriented SLA Scheduling Using Goal-Oriented SLA Scheduling Goal-oriented scheduling policies help you configure your workload so that your jobs are completed on time and reduce the risk of missed deadlines. They enable you to focus on the “what and when” of your projects, not the low-level details of “how” resources need to be allocated to satisfy various workloads.
Chapter 17 Goal-Oriented SLA-Driven Scheduling How service classes perform goal-oriented scheduling Goal-oriented scheduling makes use of other, lower level LSF policies like queues and host partitions to satisfy the service-level goal that the service class expresses. The decisions of a service class are considered first before any queue or host partition decisions. Limits are still enforced with respect to lower level scheduling objects like queues, hosts, and users.
Using Goal-Oriented SLA Scheduling Modifying SLA jobs (bmod) Use the -sla option of bmod to modify the service class a job is attached to, or to attach a submitted job to a service class. Use bmod -slan to detach a job from a service class. For example: % bmod -sla Kyuquot 2307 Attaches job 2307 to the service class Kyuquot. % bmod -slan 2307 Detaches job 2307 from the service class Kyuquot.
Chapter 17 Goal-Oriented SLA-Driven Scheduling Configuring Service Classes for SLA Scheduling Configure service classes in LSB_CONFDIR/cluster_name/configdir/lsb.serviceclasses. Each service class is defined in a ServiceClass section. Each service class section begins with the line Begin ServiceClass and ends with the line End ServiceClass.
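For example, a service class with a single, always-active throughput goal might look like the following sketch; the name, priority, and goal value are placeholders:

Begin ServiceClass
NAME        = Sooke                         # illustrative service class name
PRIORITY    = 20
GOALS       = [THROUGHPUT 10 timeWindow ()]
DESCRIPTION = "throughput of 10 finished jobs per hour"
End ServiceClass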
Configuring Service Classes for SLA Scheduling ◆ The service class Nanaimo defines a deadline goal that is active during the weekends and at nights.
Chapter 17 Goal-Oriented SLA-Driven Scheduling Begin ServiceClass NAME = Tevere PRIORITY = 20 GOALS = [VELOCITY 100 timeWindow (9:00-17:00)] \ [DEADLINE timeWindow (17:30-8:30 5:17:30-1:8:30)] DESCRIPTION = "nine to five" End ServiceClass Administering Platform LSF 317
Viewing Information about SLAs and Service Classes Viewing Information about SLAs and Service Classes Monitoring the progress of an SLA (bsla) Use bsla to display the properties of service classes configured in lsb.serviceclasses and dynamic state information for each service class. Examples ◆ One velocity goal of service class Tofino is active and on time. The other configured velocity goal is inactive.
GOAL: VELOCITY 8
ACTIVE WINDOW: (9:00-17:30)
STATUS: Active:On time
SLA THROUGHPUT: 0.00 JOBS/CLEAN_PERIOD

GOAL: DEADLINE
ACTIVE WINDOW: (17:30-9:00)
STATUS: Inactive
SLA THROUGHPUT: 0.00 JOBS/CLEAN_PERIOD

NJOBS PEND RUN SSUSP USUSP FINISH
0     0    0   0     0     0

◆ The throughput goal of service class Inuvik is always active.
GOAL: THROUGHPUT 3
ACTIVE WINDOW: Always Open
STATUS: Active:On time
SLA THROUGHPUT: 4.00 JOBs/CLEAN_PERIOD
OPTIMUM NUMBER OF RUNNING JOBS: 4

NJOBS PEND RUN SSUSP USUSP FINISH
104   96   4   0     0     4

These two service classes have the following historical performance. For SLA Inuvik, bacct shows a total throughput of 8.94 jobs per hour over a period of 20.
Total CPU time consumed: 18.0
Average CPU time consumed: 0.2
Maximum CPU time of a job: 0.3
Minimum CPU time of a job: 0.1
Total wait time in queues: 2371955.0
Average wait time in queue: 27263.8
Maximum wait time in queue: 39125.0
Minimum wait time in queue: 7.0
Average turnaround time: 30596 (seconds/job)
Maximum turnaround time: 44778
Minimum turnaround time: 3355
Average hog factor of a job: 0.00 ( cpu time / turnaround time )
Maximum hog factor of a job: 0.
Understanding Service Class Behavior Understanding Service Class Behavior A simple deadline goal The following service class configures an SLA with a simple deadline goal with a half hour time window. Begin ServiceClass NAME = Quadra PRIORITY = 20 GOALS = [DEADLINE timeWindow (16:15-16:45)] DESCRIPTION = short window End ServiceClass Six jobs submitted with a run time of 5 minutes each will use 1 slot for the half hour time window.
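Such jobs might be submitted to the service class with the -sla and -W options of bsub, for example (the job command is a placeholder):

% bsub -sla Quadra -W 5 myjob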
Chapter 17 Goal-Oriented SLA-Driven Scheduling When the finished job curve (nfinished) meets the total number of jobs curve (njobs) the deadline is met. All jobs are finished well ahead of the actual configured deadline, and the goal of the SLA was met.
Understanding Service Class Behavior The following illustrates the progress of the deadline SLA Qualicum running 280 jobs overnight with random runtimes until the morning deadline. As with the simple deadline goal example, when the finished job curve (nfinished) meets the total number of jobs curve (njobs) the deadline is met with all jobs completed ahead of the configured deadline.
Chapter 17 Goal-Oriented SLA-Driven Scheduling The following illustrates the progress of the velocity SLA Comox running 100 jobs with random runtimes over a 14 hour period. When an SLA is missing its goal Use the CONTROL_ACTION parameter in your service class to configure an action to be run if the SLA goal is delayed for a specified number of minutes. CONTROL_ACTION (lsb.
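As a sketch, a service class could run a notification command once its goal has been delayed for more than 10 minutes; the service class name, goal, and command below are illustrative, and the action follows the VIOLATION_PERIOD/CMD form:

Begin ServiceClass
NAME           = Nitinat                    # illustrative service class name
PRIORITY       = 20
GOALS          = [VELOCITY 10 timeWindow ()]
CONTROL_ACTION = VIOLATION_PERIOD[10] CMD [echo `date`: SLA Nitinat is in violation >> /tmp/sla_violations.log]
DESCRIPTION    = "example control action"
End ServiceClass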
Understanding Service Class Behavior SLA statistics files Each active SLA goal generates a statistics file for monitoring and analyzing the system. When the goal becomes inactive the file is no longer updated. The files are created in the LSB_SHAREDIR/cluster_name/logdir/SLA directory. Each file name consists of the name of the service class and the goal type. For example the file named Quadra.deadline is created for the deadline goal of the service class name Quadra. The following file named Tofino.
P A R T IV
Job Scheduling and Dispatch
Contents
◆ Chapter 18, "Resource Allocation Limits"
◆ Chapter 19, "Reserving Resources"
◆ Chapter 20, "Advance Reservation"
◆ Chapter 21, "Dispatch and Run Windows"
◆ Chapter 22, "Job Dependencies"
◆ Chapter 23, "Job Priorities"
◆ Chapter 24, "Job Requeue and Job Rerun"
◆ Chapter 25, "Job Checkpoint, Restart, and Migration"
◆ Chapter 26, "Chunk Job Dispatch"
◆ Chapter 27, "Job Arrays"
◆ Chapter 28, "Running Parallel Jobs"
C H A P T E R 18 Resource Allocation Limits Contents ◆ ◆ “About Resource Allocation Limits” on page 330 “Configuring Resource Allocation Limits” on page 333 Administering Platform LSF 329
About Resource Allocation Limits About Resource Allocation Limits Contents ◆ ◆ ◆ ◆ “What resource allocation limits do” on page 330 “How LSF enforces limits” on page 331 “How LSF counts resources” on page 331 “Limits for resource consumers” on page 332 What resource allocation limits do By default, resource consumers like users, hosts, queues, or projects are not limited in the resources available to them for running jobs. Resource allocation limits configured in lsb.
Resource allocation limits and resource usage limits
Resource allocation limits are not the same as resource usage limits, which are enforced during job run time. For example, you set CPU limits, memory limits, and other limits that take effect after a job starts running. See Chapter 29, "Runtime Resource Usage Limits" for more information.
About Resource Allocation Limits Limits for resource consumers Host groups If a limit is specified for a host group, the total amount of a resource used by all hosts in that group is counted. If a host is a member of more than one group, each job running on that host is counted against the limit for all groups to which the host belongs. Limits for users Jobs are normally queued on a first-come, first-served (FCFS) basis.
Chapter 18 Resource Allocation Limits Configuring Resource Allocation Limits Contents ◆ ◆ ◆ ◆ ◆ ◆ “lsb.resources file” on page 333 “Enabling resource allocation limits” on page 334 “Configuring cluster-wide limits” on page 334 “Compatibility with pre-version 6.2 job slot limits” on page 334 “How resource allocation limits map to pre-version 6.2 job slot limits” on page 334 “Example limit configurations” on page 336 lsb.
Configuring Resource Allocation Limits Enabling resource allocation limits Resource allocation limits scheduling plugin To enable resource allocation limits in your cluster, configure the resource allocation limits scheduling plugin schmod_limit in lsb.modules. Configuring lsb.
◆ TMP
◆ LICENSE
◆ RESOURCE
How conflicting limits are resolved
Similar conflicting limits
For similar limits configured in lsb.resources, lsb.users, lsb.hosts, or lsb.queues, the most restrictive limit is used. For example, a slot limit of 3 for all users is configured in lsb.resources:
Begin Limit
NAME = user_limit1
USERS = all
SLOTS = 3
End Limit
This is similar, but not equivalent, to an existing MAX_JOBS limit of 2 configured in lsb.users.
Only one job (827) remains pending because the more restrictive limit of 3 in lsb.resources is enforced:
% bjobs -p
JOBID  USER   STAT  QUEUE   FROM_HOST  JOB_NAME    SUBMIT_TIME
827    user1  PEND  normal  hostA      sleep 1000  Jan 22 16:38
 Resource (slot) limit defined cluster-wide has been reached;
Equivalent conflicting limits
New limits in lsb.resources that are equivalent to existing limits in lsb.users, lsb.hosts, or lsb.
Chapter 18 Resource Allocation Limits Begin Limit HOSTS license1 license1 license1 End Limit SLOTS 10 30 MEM 200 300 PER_QUEUE normal short (all ~normal ~short) Example 4 All users in user group ugroup1 except user1 using queue1 and queue2 and running jobs on hosts in host group hgroup1 are limited to 2 job slots per processor on each host: Begin Limit NAME = limit1 # Resources: SLOTS_PER_PROCESSOR = 2 #Consumers: QUEUES = queue1 queue2 USERS = ugroup1 ~user1 PER_HOST = hgroup1 End Limit Example 5 use
Configuring Resource Allocation Limits Example 8 Limit users in the develop group to 1 job on each host, and 50% of the memory on the host. Begin Limit NAME = develop_group_limit # Resources: SLOTS = 1 MEM = 50% #Consumers: USERS = develop PER_HOST = all End Limit Example 9 Limit software license lic1, with quantity 100, where user1 can use 90 licenses and all other users are restricted to 10.
Chapter 18 Resource Allocation Limits Viewing Information about Resource Allocation Limits Your job may be pending because some configured resource allocation limit has been reached. Use the blimits command to show the dynamic counters of resource allocation limits configured in Limit sections in lsb.resources. blimits displays the current resource usage to show what limits may be blocking your job.
Viewing Information about Resource Allocation Limits SWP = 50% MEM = 10% End Limit Begin Limit NAME = limit_ext1 PER_HOST = all RESOURCE = ([user1_num,30] [hc_num,20]) End Limit blimits displays the following: % blimits INTERNAL RESOURCE LIMITS: NAME limit1 limit1 limit1 USERS user1 user1 user1 QUEUES q2 q3 q4 HOSTS hostA hostA hostC PROJECTS - SLOTS - MEM 10/25 - TMP 30/2953 40/590 SWP 10/258 - EXTERNAL RESOURCE LIMITS: NAME limit_ext1 limit_ext1 USERS ◆ ◆ 340 Administering Platform LSF QUEUE
C H A P T E R 19
Reserving Resources
Contents
◆ "About Resource Reservation" on page 342
◆ "Using Resource Reservation" on page 343
◆ "Memory Reservation for Pending Jobs" on page 345
◆ "Time-based Slot Reservation" on page 348
◆ "Viewing Resource Reservation Information" on page 355
About Resource Reservation About Resource Reservation When a job is dispatched, the system assumes that the resources that the job consumes will be reflected in the load information. However, many jobs do not consume the resources they require when they first start. Instead, they will typically use the resources over a period of time. For example, a job requiring 100 MB of swap is dispatched to a host having 150 MB of available swap.
Chapter 19 Reserving Resources Using Resource Reservation Queue-level resource reservation At the queue level, resource reservation allows you to specify the amount of resources to reserve for jobs in the queue. It also serves as the upper limits of resource reservation if a user also specifies it when submitting a job. Queue-level resource reservation and pending reasons The use of RES_REQ affects the pending reasons as displayed by bjobs.
Configuring per-resource reservation
To enable greater flexibility in how numeric resources are reserved by jobs, configure the ReservationUsage section in lsb.resources to reserve resources like license tokens per resource as PER_JOB, PER_SLOT, or PER_HOST. For example:
Begin ReservationUsage
RESOURCE   METHOD
licenseX   PER_JOB
licenseY   PER_HOST
licenseZ   PER_SLOT
End ReservationUsage
Only user-defined numeric resources can be reserved.
Chapter 19 Reserving Resources Memory Reservation for Pending Jobs About memory reservation for pending jobs By default, the rusage string reserves resources for running jobs. Because resources are not reserved for pending jobs, some memory-intensive jobs could be pending indefinitely because smaller jobs take the resources immediately before the larger jobs can start running. The more memory a job requires, the worse the problem is.
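Memory reservation for pending jobs is enabled at the queue level with the RESOURCE_RESERVE parameter in lsb.queues. A minimal sketch follows; the queue name and reservation time are illustrative:

Begin Queue
QUEUE_NAME       = reservation
PRIORITY         = 40
RESOURCE_RESERVE = MAX_RESERVE_TIME[20]   # hold the reservation for at most 20 dispatch cycles
HOSTS            = all
End Queue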
See "Examples" on page 346 for examples of jobs that use memory reservation.
How memory reservation for pending jobs works
Amount of memory reserved
The amount of memory reserved is based on the currently available memory when the job is pending. For example, if LIM reports that a host has 300 MB of memory available, the job submitted by the following command:
% bsub -R "rusage[mem=400]" -q reservation my_job
will be pending and reserve the 300 MB of available memory.
Submitting a third job with the same requirements will reserve one job slot, and reserve all free memory, if the amount of free memory is between 20 MB and 200 MB (some free memory may be used by the operating system or other software).
Time-based Slot Reservation
Existing LSF slot reservation works in simple environments, where the host-based MXJ limit is the only constraint on the job slot request. In complex environments, where more than one constraint exists (for example, job topology or a generic slot limit):
◆ Estimated job start time becomes inaccurate
◆ The scheduler makes a reservation decision that can postpone the estimated job start time or decrease cluster utilization
Chapter 19 Reserving Resources Time-based reservation and greedy reservation compared Start time prediction Time-based reservation Greedy reservation Backfill scheduling if free slots are available Correct with no job topology Correct for job topology requests Correct based on resource allocation limits Correct for memory requests When no slots are free for reservation Future allocation and reservation based on earliest start time bjobs displays best estimate bjobs displays predicted future allocation A
Configuring time-based slot reservation
Greedy slot reservation is the default slot reservation mechanism and time-based slot reservation is disabled.
LSB_TIME_RESERVE_NUMJOBS (lsf.conf)
Use LSB_TIME_RESERVE_NUMJOBS=maximum_reservation_jobs in lsf.conf to enable time-based slot reservation. The value must be a positive integer. LSB_TIME_RESERVE_NUMJOBS controls the maximum number of jobs using time-based slot reservation.
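For example, to allow the four highest-priority pending jobs to use time-based slot reservation, you might set the following in lsf.conf (the value is illustrative):

LSB_TIME_RESERVE_NUMJOBS=4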
3 Job4 finishes, freeing 4 more CPUs for future allocation, 2 on host A, and 2 on host C.
4 Job1 finishes, freeing 2 more CPUs for future allocation, 1 on host C, and 1 on host D.
5 Job5 can now be placed with 2 CPUs on host A, 2 CPUs on host C, and 2 CPUs on host
◆ If preemptive scheduling is used, the estimated start time may not be accurate. The scheduler may calculate an estimated time, but it may actually preempt other jobs to start earlier.
Slot limit enforcement
The following slot limits are enforced:
◆ Slot limits configured in lsb.resources (SLOTS, PER_SLOT)
◆ MXJ, JL/U in lsb.hosts
◆ PJOB_LIMIT, HJOB_LIMIT, QJOB_LIMIT, UJOB_LIMIT in lsb.queues
Memory request
To request memory resources, configure RESOURCE_RESERVE in lsb.
Chapter 19 Reserving Resources To minimize the overhead in recalculating the predicted start times to include previously skipped jobs, you should configure a small value for LSB_TIME_RESERVE_NUMJOBS in lsf.conf. Reservation scenarios Scenario 1 Even though no running jobs finish and no host status in cluster are changed, a job’s future allocation may still change from time to time.
Time-based Slot Reservation Examples Example 1 Three hosts, 4 CPUs each: qat24, qat25, and qat26. Job 11895 uses 4 slots on qat24 (10 hours). Job 11896 uses 4 slots on qat25 (12 hours), and job 11897 uses 2 slots on qat26 (9 hours). Job 11898 is submitted and requests -n 6 -R "span[ptile=2]". % bjobs -l 11898 Job <11898>, User , Project , Status , Queue , Job Priority <50>, Command .. RUNLIMIT 840.
Chapter 19 Reserving Resources Viewing Resource Reservation Information Viewing host-level resource information (bhosts) Use bhosts -l to show the amount of resources reserved on each host. In the following example, 143 MB of memory is reserved on hostA, and no memory is currently available on the host. $ bhosts -l hostA HOST hostA STATUS CPUF JL/U DOW ok 20.00 CURRENT LOAD USED r15s p mem Total 1.5 M 0M Reserved 0.
Viewing Resource Reservation Information $ bsub -m hostA -n 2 -q reservation -R"rusage[mem=60]" sleep 8888 Job <3> is submitted to queue .
C H A P T E R 20
Advance Reservation
Contents
◆ "Understanding Advance Reservations" on page 358
◆ "Configuring Advance Reservation" on page 359
◆ "Using Advance Reservation" on page 361
Understanding Advance Reservations Understanding Advance Reservations Advance reservations ensure access to specific hosts during specified times. An advance reservation is essentially a lock on a number of processors. Each reservation consists of the number of processors to reserve, a list of hosts for the reservation, a start time, an end time, and an owner. You can also specify a resource requirement string instead of or in addition to a list of hosts.
Chapter 20 Advance Reservation Configuring Advance Reservation Advance reservation plugin To enable advance reservation in your cluster, configure the advance reservation scheduling plugin schmod_advrsv in lsb.modules. Configuring lsb.
Configuring Advance Reservation ◆ All users in user group ugroup1 except user1 can make advance reservations on any host in hgroup1, except hostB, between 10:00 p.m. and 6:00 a.m. every day: Begin ResourceReservation NAME = nightPolicy USERS = ugroup1 ~user1 HOSTS = hgroup1 ~hostB TIME_WINDOW = 20:00-8:00 End ResourceReservation The not operator (~) does not exclude LSF administrators from the policy. USER_ADVANCE_RESERVATION is obsolete (lsb.params) USER_ADVANCE_RESERVATION in lsb.
Chapter 20 Advance Reservation Using Advance Reservation Advance reservation commands Use the following commands to work with advance reservations: brsvadd Add a reservation brsvdel Delete a reservation brsvs View reservations Adding reservations By default, only LSF administrators or root can add or delete advance reservations. brsvadd command Use brsvadd to create new advance reservations.
Using Advance Reservation Adding a one-time reservation brsvadd -b and -e Use the -b and -e options of brsvadd to specify the begin time and end time of a one- time advance reservation. One-time reservations are useful for dedicating hosts to a specific user or group for critical projects.
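For example, the following sketch reserves 8 processors on two hosts for user1 from 6:00 a.m. to 8:00 a.m. today; the host names and times are placeholders:

% brsvadd -n 8 -m "hostA hostB" -u user1 -b 6:0 -e 8:0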
[day:]hour[:minute]
with the following ranges:
◆ day of the week: 0-6
◆ hour: 0-23
◆ minute: 0-59
Specify a time window in one of the following ways:
◆ hour-hour
◆ hour:minute-hour:minute
◆ day:hour:minute-day:hour:minute
You must specify at least the hour. Day of the week and minute are optional. Both the start time and end time values must use the same syntax. If you do not specify a minute, LSF assumes the first minute of the hour (:00).
Using Advance Reservation Examples ◆ The following command creates a one-time open advance reservation for 1024 processors on a host of any type for user user1 between 6:00 a.m. and 8:00 a.m. today: % brsvadd -o -n 1024 -R "type==any" -u user1 -b 6:0 -e 8:0 Reservation "user1#1" is created ◆ The following command creates an open advance reservation for 1024 processors on hostB for user user3 every weekday from 12:00 noon to 2:00 p.m.
% brsvs -p all
RSVID     TYPE    USER    NCPUS  RSV_HOSTS   TIME_WINDOW
user1#0   user    user1   1024   hostA:1024  11/12/6/0-11/12/8/0
user2#0   user    user2   1024   hostA:1024  12:0-14:0 *
groupA#0  group   groupA  2048   hostA:1024  3:0:0-3:3:0 *
                                 hostB:1024
system#0  system  sys     1024   hostA:1024  5:18:0-5:20:0 *

HOST: hostA (MAX = 1024)
Week: 11/11/2006 - 11/17/2006
Hour:Min   Sun   Mon   Tue   Wed   Thu   Fri   Sat
-------------------------------------------------------------------
0:0        0     0     0     1024  0     0     0
0:10       0     0     0     1024
Using Advance Reservation 17:40 17:50 18:0 18:10 18:20 ... 19:30 19:40 19:50 20:0 20:10 20:20 ...
Chapter 20 Advance Reservation Reservation Type: OPEN EXIT Jobs: 716 RSVID groupA#0 TYPE group USER groupA NCPUS 2048 RSV_HOSTS hostA:1024 hostB:1024 TIME_WINDOW 3:0:0-3:3:0 * USER system NCPUS 1024 RSV_HOSTS hostA:1024 TIME_WINDOW 5:18:0-5:20:0 * Reservation Type: OPEN EXIT Jobs: 717 PEND Jobs: 718 719 RUN Jobs: 720 RSVID system#0 TYPE sys Reservation Type: CLOSED RUN Jobs: 721 SUSP Jobs: 722 bjobs command Use bjobs -l to show the reservation ID used by a job: % bjobs -l Job <1152>, User
Using Advance Reservation Jobs referencing the reservation are killed when the reservation expires. LSF administrators can prevent running jobs from being killed when the reservation expires by changing the termination time of the job using the reservation (bmod -t) before the reservation window closes. Modifying jobs Administrators can use the -U option of bmod to change a job to another reservation (bmod -U) ID. For example: % bmod -U user1#0 1234 To cancel the reservation, use the -Un option of bmod.
Chapter 20 Advance Reservation At 15:50, both Job 122 in reservation user#17 and Job 245 in reservation user#18 are suspended. At this point, the existing scheduling policies determine which job runs; in this example, Job 122 in reservation user#17 runs. IMPORTANT bmod -t will not change the termination time of a pending job. Job resource A job using a reservation is subject to all job resource usage limits.
Example
% bacct -U user1#2
Accounting for:
- advanced reservation IDs: user1#2,
- advanced reservations created by user1,
-----------------------------------------------------------------------------
RSVID    TYPE  CREATOR  USER   NCPUS  RSV_HOSTS  TIME_WINDOW
user1#2  user  user1    user1  1      hostA:1    9/16/17/36-9/16/17/38
SUMMARY:
Total number of jobs: 4
Total CPU time consumed: 0.5 second
Maximum memory of a job: 4.2 MB
Maximum swap of a job: 5.
C H A P T E R 21 Dispatch and Run Windows Contents ◆ ◆ ◆ “Dispatch and Run Windows” on page 372 “Run Windows” on page 373 “Dispatch Windows” on page 374 Administering Platform LSF 371
Dispatch and Run Windows
Both dispatch and run windows are time windows that control when LSF jobs start and run.
◆ Dispatch windows can be defined in lsb.hosts.
◆ Dispatch and run windows can be defined in lsb.queues.
◆ Hosts can only have dispatch windows.
◆ Queues can have dispatch windows and run windows.
◆ Both windows affect job starting; only run windows affect the stopping of jobs.
Chapter 21 Dispatch and Run Windows Run Windows Queues can be configured with a run window, which specifies one or more time periods during which jobs in the queue are allowed to run. Once a run window is configured, jobs in the queue cannot run outside of the run window. Jobs can be submitted to a queue at any time; if the run window is closed, the jobs remain pending until it opens again. If the run window is open, jobs are placed and dispatched as usual.
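Run windows are configured with the RUN_WINDOW parameter in lsb.queues. As a sketch, the following queue lets its jobs run only overnight on weekdays and all weekend; the queue name and time windows are illustrative:

Begin Queue
QUEUE_NAME = night
PRIORITY   = 40
RUN_WINDOW = 5:19:00-1:8:30 20:00-8:30
HOSTS      = all
End Queue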
Dispatch Windows Dispatch Windows Queues can be configured with a dispatch window, which specifies one or more time periods during which jobs are accepted. Hosts can be configured with a dispatch window, which specifies one or more time periods during which jobs are allowed to start. Once a dispatch window is configured, LSF cannot dispatch jobs outside of the window. By default, no dispatch windows are configured (the windows are always open).
C H A P T E R 22
Job Dependencies
Contents
◆ "Job Dependency Scheduling" on page 376
◆ "Dependency Conditions" on page 378
Job Dependency Scheduling Job Dependency Scheduling About job dependency scheduling Sometimes, whether a job should start depends on the result of another job. For example, a series of jobs could process input data, run a simulation, generate images based on the simulation output, and finally, record the images on a high-resolution film output device. Each step can only be performed after the previous step finishes successfully, and all subsequent steps must be aborted if any step fails.
Chapter 22 Job Dependencies ◆ In the job name, specify the wildcard character (*) at the end of a string, to indicate all jobs whose name begins with the string. For example, if you use jobA* as the job name, it specifies jobs named jobA, jobA1, jobA_test, jobA.log, etc. Multiple jobs with the same name By default, if you use the job name to specify a dependency condition, and more than one of your jobs has the same name, all of your jobs that have that name must satisfy the test.
Dependency Conditions
The following dependency conditions can be used with any job:
◆ done(job_ID | "job_name")
◆ ended(job_ID | "job_name")
◆ exit(job_ID [,[op] exit_code])
◆ exit("job_name"[,[op] exit_code])
◆ external(job_ID | "job_name", "status_text")
◆ job_ID | "job_name"
◆ post_done(job_ID | "job_name")
◆ post_err(job_ID | "job_name")
◆ started(job_ID | "job_name")
done
Syntax done(job_ID | "job_name")
Description The job state is DONE.
Chapter 22 Job Dependencies external Syntax external(job_ID | "job_name ", "status_text ") Specify the first word of the job status or message description (no spaces). Only the first word is evaluated. Description The job has the specified job status, or the text of the job’s status begins with the specified word.
Dependency Conditions The submitted job will not start until the job with the job ID of 312 has completed successfully, and either the job named Job2 has started, or the job named 99Job has terminated abnormally. ◆ -w '"210"' The submitted job will not start unless the job named 210 is finished. The numeric job name should be doubly quoted, since the UNIX shell treats -w "210" the same as -w 210, which would evaluate the job with the job ID of 210.
C H A P T E R 23 Job Priorities Contents ◆ ◆ “User-Assigned Job Priority” on page 382 “Automatic Job Priority Escalation” on page 384 Administering Platform LSF 381
User-Assigned Job Priority User-Assigned Job Priority User-assigned job priority provides controls that allow users to order their jobs in a queue. Job order is the first consideration to determine job eligibility for dispatch. Jobs are still subject to all scheduling policies regardless of job priority. Jobs with the same priority are ordered first come first served. The job owner can change the priority of their own jobs. LSF and queue administrators can change the priority of all jobs in a queue.
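A brief sketch of how this is typically set up: define MAX_USER_PRIORITY in lsb.params to enable user-assigned priorities, then assign a priority at submission with bsub -sp (the values are illustrative):

Begin Parameters
MAX_USER_PRIORITY = 100   # users can assign priorities from 1 to 100
End Parameters

% bsub -sp 80 myjob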
Chapter 23 Job Priorities LSF and queue administrators can specify priorities beyond MAX_USER_PRIORITY. ◆ -spn Sets the job priority to the default priority of MAX_USER_PRIORITY/2 (displayed by bparams -l). Viewing job priority information Use the following commands to view job history, the current status and system configurations: bhist -l job_ID Displays the history of a job including changes in job priority. bjobs -l [job_ID] Displays the current job priority and the job priority at submission time.
Automatic Job Priority Escalation Automatic Job Priority Escalation Automatic job priority escalation automatically increases job priority of jobs that have been pending for a specified period of time. User-assigned job priority (see “UserAssigned Job Priority” on page 382) must also be configured. As long as a job remains pending, LSF will automatically increase the job priority beyond the maximum priority specified by MAX_USER_PRIORITY.
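Escalation is typically controlled by the JOB_PRIORITY_OVER_TIME parameter in lsb.params. The following sketch raises a pending job's priority by 3 every 20 minutes; the increment and interval are illustrative:

Begin Parameters
MAX_USER_PRIORITY      = 100
JOB_PRIORITY_OVER_TIME = 3/20   # increment/interval, interval in minutes
End Parameters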
C H A P T E R 24 Job Requeue and Job Rerun Contents ◆ ◆ ◆ ◆ ◆ ◆ “About Job Requeue” on page 386 “Automatic Job Requeue” on page 387 “Reverse Requeue” on page 388 “Exclusive Job Requeue” on page 389 “User-Specified Job Requeue” on page 390 “Automatic Job Rerun” on page 391 Administering Platform LSF 385
About Job Requeue About Job Requeue A networked computing environment is vulnerable to any failure or temporary conditions in network services or processor resources. For example, you might get NFS stale handle errors, disk full errors, process table full errors, or network connectivity problems. Your application can also be subject to external conditions such as a software license problems, or an occasional failure due to a bug in your application.
Automatic Job Requeue
About automatic job requeue
You can configure a queue to automatically requeue a job if it exits with a specified exit value.
◆ The job is requeued to the head of the queue from which it was dispatched, unless the LSB_REQUEUE_TO_BOTTOM parameter in lsf.conf is set.
◆ When a job is requeued, LSF does not save the output from the failed run.
◆ When a job is requeued, LSF does not notify the user by sending mail.
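Automatic requeue is enabled with the REQUEUE_EXIT_VALUES parameter in lsb.queues; the queue name and exit values in this sketch are illustrative:

Begin Queue
QUEUE_NAME          = requeue_q
REQUEUE_EXIT_VALUES = 99 100   # jobs that exit with 99 or 100 are requeued
End Queue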
Reverse Requeue Reverse Requeue About reverse requeue By default, if you use automatic job requeue, jobs are requeued to the head of a queue. You can have jobs requeued to the bottom of a queue instead. The job priority does not change. Configuring reverse requeue You must already use automatic job requeue (REQUEUE_EXIT_VALUES in lsb.queues). To configure reverse requeue: 1 2 Set LSB_REQUEUE_TO_BOTTOM in lsf.conf to 1.
Exclusive Job Requeue
About exclusive job requeue
You can configure automatic job requeue so that a failed job is not rerun on the same host.
Limitations
◆ If mbatchd is restarted, this feature might not work properly, since LSF forgets which hosts have been excluded.
◆ If a job ran on a host and exited with an exclusive exit code before mbatchd was restarted, the job could be dispatched to the same host again after mbatchd is restarted.
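Exclusive requeue is expressed with the EXCLUDE keyword in REQUEUE_EXIT_VALUES. In the following sketch, jobs that exit with 30 are requeued normally, while jobs that exit with 20 are requeued but not re-dispatched to the host where they failed; the queue name and values are illustrative:

Begin Queue
QUEUE_NAME          = excl_requeue_q
REQUEUE_EXIT_VALUES = 30 EXCLUDE(20)
End Queue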
User-Specified Job Requeue User-Specified Job Requeue About user-specified job requeue You can use brequeue to kill a job and requeue it. When the job is requeued, it is assigned the PEND status and the job’s new position in the queue is after other jobs of the same priority. Requeuing a job To requeue one job, use brequeue. ◆ ◆ ◆ You can only use brequeue on running (RUN), user-suspended (USUSP), or system-suspended (SSUSP) jobs. Users can only requeue their own jobs.
Chapter 24 Job Requeue and Job Rerun Automatic Job Rerun Job requeue vs. job rerun Automatic job requeue occurs when a job finishes and has a specified exit code (usually indicating some type of failure). Automatic job rerun occurs when the execution host becomes unavailable while a job is running. It does not occur if the job itself fails. About job rerun When a job is rerun or restarted, it is first returned to the queue from which it was dispatched with the same options as the original job.
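Rerun can be enabled for a whole queue with RERUNNABLE in lsb.queues, or for a single job with the -r option of bsub; a brief sketch (the queue name is illustrative):

Begin Queue
QUEUE_NAME = rerun_q
RERUNNABLE = YES
End Queue

% bsub -r myjob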
C H A P T E R 25 Job Checkpoint, Restart, and Migration Contents ◆ ◆ ◆ ◆ ◆ “Checkpointing Jobs” on page 394 “Approaches to Checkpointing” on page 395 “Creating Custom echkpnt and erestart for Application-level Checkpointing” on page 396 “Restarting Checkpointed Jobs” on page 405 “Migrating Jobs” on page 406 Administering Platform LSF 393
Checkpointing Jobs Checkpointing Jobs Checkpointing a job involves capturing the state of an executing job, the data necessary to restart the job, and not wasting the work done to get to the current stage. The job state information is saved in a checkpoint file. There are many reasons why you would want to checkpoint a job. Fault tolerance To provide job fault tolerance, checkpoints are taken at regular intervals (periodically) during the job’s execution.
Chapter 25 Job Checkpoint, Restart, and Migration Approaches to Checkpointing LSF provides support for most checkpoint and restart implementations through uniform interfaces, echkpnt and erestart. All interaction between LSF and the checkpoint implementations are handled by these commands. See the echkpnt(8) and erestart(8) man pages for more information.
Creating Custom echkpnt and erestart for Application-level Checkpointing Creating Custom echkpnt and erestart for Application-level Checkpointing Different applications may have different checkpointing implementations and custom echkpnt and erestart programs. You can write your own echkpnt and erestart programs to checkpoint your specific applications and tell LSF which program to use for which application.
Chapter 25 Job Checkpoint, Restart, and Migration stderr and stdout are ignored by LSF. You can save these to a file by setting LSB_ECHKPNT_KEEP_OUTPUT=y in lsf.conf or as an environment variable. Return values for erestart.method_name erestart.method_name creates the file checkpoint_dir/$LSB_JOBID/.
Creating Custom echkpnt and erestart for Application-level Checkpointing % bsub -k "mydir method=myapp" job1 2 Copy your echkpnt.method_name and erestart.method_name to LSF_SERVERDIR. OR If you want to specify a different directory than LSF_SERVERDIR, in lsf.conf or as an environment variable set LSB_ECHKPNT_METHOD_DIR= absolute path to the directory in which your echkpnt.method_name and erestart.method_name are located.
Chapter 25 Job Checkpoint, Restart, and Migration Checkpointing a Job Before LSF can checkpoint a job, it must be made checkpointable. LSF provides automatic and manual controls to make jobs checkpointable and to checkpoint jobs. When working with checkpointable jobs, a checkpoint directory must always be specified. Optionally, a checkpoint period can be specified to enable periodic checkpointing.
The Checkpoint Directory The Checkpoint Directory A checkpoint directory must be specified for every checkpointable job and is used to store the files to restart a job. The directory must be writable by the job owner. To restart the job on another host (job migration), the directory must be accessible by both hosts. LSF does not delete the checkpoint files; checkpoint file maintenance is the user’s responsibility.
Chapter 25 Job Checkpoint, Restart, and Migration Making Jobs Checkpointable Making a job checkpointable involves specifying the location of a checkpoint directory to LSF. This can be done manually on the command line or automatically through configuration. Manually Manually making a job checkpointable involves specifying the checkpoint directory on the command line. LSF will create the directory if it does not exist. A job can be made checkpointable at job submission or after submission.
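For example, the following sketch submits a job with a checkpoint directory (the directory path is a placeholder):

% bsub -k "/share/checkpoint_dir" myjob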
Manually Checkpointing Jobs Manually Checkpointing Jobs LSF provides the bchkpnt command to manually checkpoint jobs. LSF can only perform a checkpoint for checkpointable jobs as described in “Making Jobs Checkpointable” on page 401. For example, to checkpoint a job with job ID 123: % bchkpnt 123 Job <123> is being checkpointed Interactive jobs (bsub -I) cannot be checkpointed. Checkpointing and killing a job By default, after a job has been successfully checkpointed, it continues to run.
Chapter 25 Job Checkpoint, Restart, and Migration Enabling Periodic Checkpointing Periodic checkpointing involves creating a checkpoint file at regular time intervals during the execution of your job. LSF provides the ability to enable periodic checkpointing manually on the command line and automatically through configuration. Automatic periodic checkpointing is discussed in “Automatically Checkpointing Jobs” on page 404.
Automatically Checkpointing Jobs Automatically Checkpointing Jobs Automatically checkpointing jobs involves submitting a job to a queue that is configured for periodic checkpointing. To configure a queue, edit lsb.queues and specify a checkpoint directory and a checkpoint period for the CHKPNT parameter for a queue. The checkpoint directory must already exist, LSF will not create the directory. The checkpoint period is specified in minutes.
Chapter 25 Job Checkpoint, Restart, and Migration Restarting Checkpointed Jobs LSF can restart a checkpointed job on a host other than the original execution host using the information saved in the checkpoint file to recreate the execution environment. Only jobs that have been checkpointed successfully can be restarted from a checkpoint file.
Migrating Jobs Migrating Jobs Migration is the process of moving a checkpointable or rerunnable job from one host to another host. Checkpointing enables a migrating job to make progress by restarting it from its last checkpoint. Rerunnable non-checkpointable jobs are restarted from the beginning. LSF provides the ability to manually migrate jobs from the command line and automatically through configuration.
Chapter 25 Job Checkpoint, Restart, and Migration Automatically Migrating Jobs Automatic job migration works on the premise that if a job is suspended (SSUSP) for an extended period of time, due to load conditions or any other reason, the execution host is heavily loaded. To allow the job to make progress and to reduce the load on the host, a migration threshold is configured. LSF allows migration thresholds to be configured for queues and hosts. The threshold is specified in minutes.
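Migration thresholds are set with the MIG parameter. The following sketch of a queue in lsb.queues migrates checkpointable or rerunnable jobs that have been suspended for more than 30 minutes; the queue name and threshold are illustrative:

Begin Queue
QUEUE_NAME = chkpnt_migrate_q
MIG        = 30   # migration threshold in minutes
End Queue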
C H A P T E R 26
Chunk Job Dispatch
Contents
◆ "About Job Chunking" on page 410
◆ "Configuring a Chunk Job Dispatch" on page 411
◆ "Submitting and Controlling Chunk Jobs" on page 413
About Job Chunking About Job Chunking LSF supports job chunking, where jobs with similar resource requirements submitted by the same user are grouped together for dispatch. The CHUNK_JOB_SIZE parameter in lsb.queues specifies the maximum number of jobs allowed to be dispatched together in a chunk job.
Chapter 26 Chunk Job Dispatch Configuring a Chunk Job Dispatch CHUNK_JOB_SIZE (lsb.queues) To configure a queue to dispatch chunk jobs, specify the CHUNK_JOB_SIZE parameter in the queue definition in lsb.queues. For example, the following configures a queue named chunk, which dispatches up to 4 jobs in a chunk: Begin Queue QUEUE_NAME = chunk PRIORITY = 50 CHUNK_JOB_SIZE = 4 End Queue After adding CHUNK_JOB_SIZE to lsb.queues, use badmin reconfig to reconfigure your cluster.
Configuring a Chunk Job Dispatch If CHUNK_JOB_DURATION is set in lsb.
Chapter 26 Chunk Job Dispatch Submitting and Controlling Chunk Jobs When a job is submitted to a queue configured with the CHUNK_JOB_SIZE parameter, LSF attempts to place the job in an existing chunk. A job is added to an existing chunk if it has the same characteristics as the first job in the chunk: Submitting user Resource requirements ◆ Host requirements ◆ Queue If a suitable host is found to run the job, but there is no chunk available with the same characteristics, LSF creates a new chunk.
Submitting and Controlling Chunk Jobs Action (Command) Job State Effect on Job (State) Migrate (bmig) Switch queue (bswitch) WAIT RUN RUN Removed from chunk Job is removed from the chunk and switched; all other WAIT jobs are requeued to PEND Only the WAIT job is removed from the chunk and switched, and requeued to PEND Job is checkpointed normally PEND Removed from the chunk to be scheduled later WAIT Checkpoint (bchkpnt) Modify (bmod) Migrating jobs with bmig will change the dispatch sequence of
C H A P T E R 27
Job Arrays
LSF provides a structure called a job array that allows a sequence of jobs that share the same executable and resource requirements, but have different input files, to be submitted, controlled, and monitored as a single unit. Using the standard LSF commands, you can also control and monitor individual jobs and groups of jobs submitted from a job array. After the job array is submitted, LSF independently schedules and dispatches the individual jobs.
Creating a Job Array Creating a Job Array A job array is created at job submission time using the -J option of bsub. For example, the following command creates a job array named myArray made up of 1000 jobs. % bsub -J "myArray[1-1000]" myJob Job <123> is submitted to default queue . Syntax The bsub syntax used to create a job array follows: % bsub -J "arrayName[indexList, ...]" myJob Where: -J "arrayName[indexList, ...]" Names and creates the job array.
Chapter 27 Job Arrays Maximum size of a job array A large job array allows a user to submit a large number of jobs to the system with a single job submission. By default, the maximum number of jobs in a job array is 1000, which means the maximum size of a job array can never exceed 1000 jobs. To make a change to the maximum job array value, set MAX_JOB_ARRAY_SIZE in lsb.params to any positive integer between 1 and 2147483646.
Handling Input and Output Files Handling Input and Output Files LSF provides methods for coordinating individual input and output files for the multiple jobs created when submitting a job array. These methods require your input files to be prepared uniformly. To accommodate an executable that uses standard input and standard output, LSF provides runtime variables (%I and %J) that are expanded at runtime.
Chapter 27 Job Arrays Redirecting Standard Input and Output The variables %I and %J are used as substitution strings to support file redirection for jobs submitted from a job array. At execution time, %I is expanded to provide the job array index value of the current job, and %J is expanded at to provide the job ID of the job array. Standard input Use the -i option of bsub and the %I variable when your executable reads from standard input.
Passing Arguments on the Command Line Passing Arguments on the Command Line The environment variable LSB_JOBINDEX is used as a substitution string to support passing job array indices on the command line. When the job is dispatched, LSF sets LSB_JOBINDEX in the execution environment to the job array index of the current job. LSB_JOBINDEX is set for all jobs. For non-array jobs, LSB_JOBINDEX is set to zero (0).
Chapter 27 Job Arrays Job Array Dependencies Like all jobs in LSF, a job array can be dependent on the completion or partial completion of a job or another job array. A number of job-array-specific dependency conditions are provided by LSF. Whole array dependency To make a job array dependent on the completion of a job or another job array use the -w "dependency_condition" option of bsub.
Monitoring Job Arrays Monitoring Job Arrays Use bjobs and bhist to monitor the current and past status of job arrays. Job array status To display summary information about the currently running jobs submitted from a job array, use the -A option of bjobs.
Chapter 27 Job Arrays Specific job status Current To display the current status of a specific job submitted from a job array, specify in quotes, the job array job ID and an index value with bjobs.
Controlling Job Arrays Controlling Job Arrays You can control the whole array, all the jobs submitted from the job array, with a single command. LSF also provides the ability to control individual jobs and groups of jobs submitted from a job array. When issuing commands against a job array, use the job array job ID instead of the job array name. Job names are not unique in LSF, and issuing a command using a job array name may result in unpredictable behavior.
Chapter 27 Job Arrays Requeuing a Job Array Use brequeue to requeue a job array. When the job is requeued, it is assigned the PEND status and the job’s new position in the queue is after other jobs of the same priority. You can requeue: Jobs in DONE job state Jobs EXIT job state ◆ All jobs regardless of job state in a job array. ◆ EXIT, RUN, DONE jobs to PSUSP state ◆ Jobs in RUN job state brequeue is not supported across clusters.
Job Array Job Slot Limit Job Array Job Slot Limit The job array job slot limit is used to specify the maximum number of jobs submitted from a job array that are allowed to run at any one time. A job array allows a large number of jobs to be submitted with one command, potentially flooding a system, and job slot limits provide a way to limit the impact a job array may have on a system.
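The limit is appended to the array specification with a percent sign (%). For example, the following sketch submits a 1000-element job array but allows at most 10 of its jobs to run at the same time:

% bsub -J "myArray[1-1000]%10" myJob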
C H A P T E R 28 Running Parallel Jobs Contents ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ ◆ “How LSF Runs Parallel Jobs” on page 428 “Preparing Your Environment to Submit Parallel Jobs to LSF” on page 429 “Submitting Parallel Jobs” on page 430 “Starting Parallel Tasks with LSF Utilities” on page 431 “Job Slot Limits For Parallel Jobs” on page 432 “Specifying a Minimum and Maximum Number of Processors” on page 433 “Specifying a Mandatory First Execution Host” on page 434 “Controlling Processor Allocation Acro
How LSF Runs Parallel Jobs How LSF Runs Parallel Jobs When LSF runs a job, the LSB_HOSTS variable is set to the names of the hosts running the batch job. For a parallel batch job, LSB_HOSTS contains the complete list of hosts that LSF has allocated to that job. LSF starts one controlling process for the parallel batch job on the first host in the host list.
Chapter 28 Running Parallel Jobs Preparing Your Environment to Submit Parallel Jobs to LSF Getting the host list Some applications can take this list of hosts directly as a command line parameter. For other applications, you may need to process the host list. Example The following example shows a /bin/sh script that processes all the hosts in the host list, including identifying the host where the job script is executing.
Submitting Parallel Jobs Submitting Parallel Jobs LSF can allocate more than one host or processor to run a job and automatically keeps track of the job status, while a parallel job is running. Specifying the number of processors When submitting a parallel job that requires multiple processors, you can specify the exact number of processors to use. To submit a parallel job, use bsub -n and specify the number of processors the job requires.
Chapter 28 Running Parallel Jobs Starting Parallel Tasks with LSF Utilities For simple parallel jobs you can use LSF utilities to start parts of the job on other hosts. Because LSF utilities handle signals transparently, LSF can suspend and resume all components of your job without additional programming. The simplest parallel job runs an identical copy of the executable on every host. The lsgrun command takes a list of host names and runs the specified task on each host.
Job Slot Limits For Parallel Jobs Job Slot Limits For Parallel Jobs A job slot is the basic unit of processor allocation in LSF. A sequential job uses one job slot. A parallel job that has N components (tasks) uses N job slots, which can span multiple hosts. By default, running and suspended jobs count against the job slot limits for queues, users, hosts, and processors that they are associated with. With processor reservation, job slots reserved by pending jobs also count against all job slot limits.
Chapter 28 Running Parallel Jobs Specifying a Minimum and Maximum Number of Processors When submitting a parallel job, you can also specify a minimum number and a maximum number of processors. If you specify a maximum and minimum number of processors, the job will start as soon as the minimum number of processors is available, but it will use up to the maximum number of processors, depending on how many processors are available at the time.
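For example, the following sketch submits a job that starts as soon as 4 processors are available and uses up to 16:

% bsub -n 4,16 myjob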
Specifying a Mandatory First Execution Host Specifying a Mandatory First Execution Host In general, the first execution host satisfies certain resource requirements that might not be present on other available hosts. LSF normally selects the first execution host dynamically according to the resource availability and host load for a parallel job. You can also specify a mandator y first execution host.
Chapter 28 Running Parallel Jobs Job chunking Specifying a mandatory first execution host affects job chunking.
Controlling Processor Allocation Across Hosts Controlling Processor Allocation Across Hosts Sometimes you need to control how the selected processors for a parallel job are distributed across the hosts in the cluster. You can control this at the job level or at the queue level. The queue specification is ignored if your job specifies its own locality. Specifying parallel job locality at the job level By default, LSF will allocate the required processors for the job from the available set of processors.
Chapter 28 Running Parallel Jobs Specifying multiple ptile values In a span string with multiple ptile values, you must specify a predefined default value (ptile='!') and either host model or host type: ◆ For host type, you must specify same[type] in the resource requirement. For example: span[ptile='!',HP:8,SGI:8,LINUX:2] same[type] The job requests 8 processors on a host of type HP or SGI, and 2 processors on a host of type LINUX, and the predefined maximum job slot limit in lsb.
Controlling Processor Allocation Across Hosts Examples % bsub -n 4 -R "span[hosts=1]" myjob Runs the job on a host that has at least 4 processors currently eligible to run the 4 components of this job. % bsub -n 4 -R "span[ptile=2]" myjob Runs the job on 2 hosts, using 2 processors on each host. Each host may have more than 2 processors available. % bsub -n 4 -R "span[ptile=3]" myjob Runs the job on 2 hosts, using 3 processors on the first host and 1 processor on the second host.
Chapter 28 Running Parallel Jobs Running Parallel Processes on Homogeneous Hosts Parallel jobs run on multiple hosts. If your cluster has heterogeneous hosts, some processes from a parallel job may, for example, run on Solaris and some on SGI IRIX. However, for performance reasons you may want all processes of a job to run on the same type of host, instead of having some processes run on one type of host and others on another type of host.
Running Parallel Processes on Homogeneous Hosts If you want to specify the same resource requirement at the queue-level, define a custom resource in lsf.shared as in the previous example, map hosts to high-speed connection groups in lsf.cluster.cluster_name, and define the following queue in lsb.
Chapter 28 Running Parallel Jobs Using LSF Make to Run Parallel Jobs For parallel jobs that have a variety of different components to run, you can use Platform Make. Create a makefile that lists all the components of your batch job and then submit the Platform Make command to LSF. Example The following example shows a bsub command and makefile for a simple parallel job. % bsub -n 4 lsmake -f Parjob.makefile Job <3858> is submitted to default queue . Parjob.
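A minimal Parjob.makefile of this general shape (hypothetical contents; myjob and the data file names are placeholders, not taken from the original example) shows the idea:

all: part1 part2 part3 part4

part1 part2 part3 part4:
	myjob data.$@

lsmake can then build the independent targets in parallel on the hosts that LSF allocated to the job.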
Limiting the Number of Processors Allocated Limiting the Number of Processors Allocated Use the PROCLIMIT parameter in lsb.queues to limit the number of processors that can be allocated to a parallel job in the queue.
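For example, a queue definition in lsb.queues of the following general shape (a sketch; the queue name and values are examples only) limits parallel jobs to between 2 and 8 processors, with a default of 4:

Begin Queue
NAME        = parallel
PROCLIMIT   = 2 4 8
DESCRIPTION = parallel jobs using 2 to 8 processors
End Queue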
Chapter 28 Running Parallel Jobs Changing PROCLIMIT If you change the PROCLIMIT parameter, the new processor limit does not affect running jobs. Pending jobs with no processor requirements use the new default PROCLIMIT value.
Limiting the Number of Processors Allocated Examples
Maximum processor limit: PROCLIMIT is specified in the default queue in lsb.queues as:
PROCLIMIT = 3
The maximum number of processors that can be allocated for this queue is 3.
◆ % bsub -n 2 myjob: The job myjob runs on 2 processors.
◆ % bsub -n 4 myjob: The job myjob is rejected from the queue because it requires more than the maximum number of processors configured for the queue (3).
◆ A submission that specifies a minimum of 2 and a maximum of at least 3 processors: The job myjob runs on 2 or 3 processors.
Chapter 28 Running Parallel Jobs Reserving Processors About processor reservation When parallel jobs have to compete with sequential jobs for job slots, the slots that become available are likely to be taken immediately by a sequential job. Parallel jobs need multiple job slots to be available before they can be dispatched. If the cluster is always busy, a large parallel job could be pending indefinitely. The more processors a parallel job requires, the worse the problem is.
Reserving Processors Viewing information about reserved job slots Reserved slots can be displayed with the bjobs command. The number of reserved slots can be displayed with the bqueues, bhosts, bhpart, and busers commands. Look in the RSV column.
Chapter 28 Running Parallel Jobs Reserving Memory for Pending Parallel Jobs By default, the rusage string reserves resources for running jobs. Because resources are not reserved for pending jobs, some memory-intensive jobs could be pending indefinitely because smaller jobs take the resources immediately before the larger jobs can start running. The more memory a job requires, the worse the problem is.
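A sketch of how this is typically addressed (it assumes the RESOURCE_RESERVE queue parameter; check the Platform LSF Reference for the exact syntax and reservation-time units for your version) is to enable memory reservation in a queue in lsb.queues and submit the job with an rusage string:

Begin Queue
NAME             = reservation
RESOURCE_RESERVE = MAX_RESERVE_TIME[40]
DESCRIPTION      = queue that reserves memory for pending jobs
End Queue

% bsub -q reservation -R "rusage[mem=500]" myjob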
Allowing Jobs to Use Reserved Job Slots Allowing Jobs to Use Reserved Job Slots By default, a reserved job slot cannot be used by another job. To make better use of resources and improve performance of LSF, you can configure backfill scheduling. About backfill scheduling Backfill scheduling allows other jobs to use the reserved job slots, as long as the other jobs will not delay the start of another job.
Chapter 28 Running Parallel Jobs 2 Shortly afterwards, a parallel job (job2) requiring all 4 CPUs is submitted. It cannot start right away because job1 is using one CPU, so it reserves the remaining 3 processors (figure b). 3 At 8:30 am, another parallel job (job3) is submitted requiring only two processors and with a run limit of 1 hour. Since job2 cannot start until 10:00am (when job1 finishes), its reserved processors can be backfilled by job3 (figure c).
Allowing Jobs to Use Reserved Job Slots Using backfill on memory If BACKFILL is configured in a queue, and a run limit is specified with -W on bsub or with RUNLIMIT in the queue, backfill jobs can use the accumulated memory reserved by the other jobs, as long as the backfill job can finish before the predicted start time of the jobs with the reservation. Unlike slot reservation, which only applies to parallel jobs, backfill on memory applies to sequential and parallel jobs.
Chapter 28 Running Parallel Jobs Submitting a third job with same requirements will reserve one job slot, and reserve all free memory, if the amount of free memory is between 20 MB and 200 MB (some free memory may be used by the operating system or other software.) ◆ Job 4: % bsub -W 400 -q backfill -R "rusage[mem=50]" myjob4 The job will keep pending, since memory is reserved by job 3 and it will run longer than job 1 and job 2.
Allowing Jobs to Use Reserved Job Slots Using interruptible backfill Interruptible backfill scheduling can improve cluster utilization by allowing reserved job slots to be used by low-priority small jobs that will be terminated when the higher-priority large jobs are about to start.
Chapter 28 Running Parallel Jobs
◆ Killing other running jobs prematurely does not affect the calculated run limit of an interruptible backfill job.
◆ Slot-reserving jobs will not start sooner.
◆ While the queue is checked for the consistency of interruptible backfill, backfill, and runtime specifications, the requeue exit value clause is not verified, nor executed automatically. Configure requeue exit values according to your site policies.
Allowing Jobs to Use Reserved Job Slots Viewing the run limits for interruptible backfill jobs (bjobs and bhist) Use bjobs to display the run limit calculated based on the configured queue-level run limit. For example, the interruptible backfill queue lazy configures RUNLIMIT=60:
% bjobs -l 135
Job <135>, User , Project , Status , Queue , Command
Mon Nov 21 11:49:22: Submitted from host , CWD <$HOME/HPC/jobs>; RUNLIMIT 59.
Chapter 28 Running Parallel Jobs Parallel Fairshare LSF can consider the number of CPUs when using fairshare scheduling with parallel jobs. If the job is submitted with bsub -n, the following formula is used to calculate dynamic priority:
dynamic priority = number_shares / (cpu_time * CPU_TIME_FACTOR + run_time * number_CPUs * RUN_TIME_FACTOR + (1 + job_slots) * RUN_JOB_FACTOR)
where number_CPUs is the number of CPUs used by the job.
How Deadline Constraint Scheduling Works For Parallel Jobs How Deadline Constraint Scheduling Works For Parallel Jobs For information about deadline constraint scheduling, see “Using Deadline Constraint Scheduling” on page 248. Deadline constraint scheduling is enabled by default. If deadline constraint scheduling is enabled and a parallel job has a CPU limit but no run limit, LSF considers the number of processors when calculating how long the job will take.
Chapter 28 Running Parallel Jobs Optimized Preemption of Parallel Jobs You can configure preemption for parallel jobs to reduce the number of jobs suspended in order to run a large parallel job. When a high-priority parallel job preempts multiple low-priority parallel jobs, sometimes LSF preempts more low-priority jobs than are necessary to release sufficient job slots to start the high-priority job. The PREEMPT_FOR parameter in lsb.
P A R T V Controlling Job Execution Contents
◆ Chapter 29, “Runtime Resource Usage Limits”
◆ Chapter 30, “Load Thresholds”
◆ Chapter 31, “Pre-Execution and Post-Execution Commands”
◆ Chapter 32, “Job Starters”
◆ Chapter 33, “External Job Submission and Execution Controls”
◆ Chapter 34, “Configuring Job Controls”
C H A P T E R 29 Runtime Resource Usage Limits Contents
◆ "About Resource Usage Limits" on page 462
◆ "Specifying Resource Usage Limits" on page 464
◆ "Supported Resource Usage Limits and Syntax" on page 467
◆ "CPU Time and Run Time Normalization" on page 472
About Resource Usage Limits About Resource Usage Limits Resource usage limits control how much resource can be consumed by running jobs. Jobs that use more than the specified amount of a resource are signalled or have their priority lowered. Limits can be specified either at the queue level by your LSF administrator (lsb.queues) or at the job level when you submit a job. For example, by defining a high-priority short queue, you can allow short jobs to be scheduled earlier than long jobs.
Chapter 29 Runtime Resource Usage Limits Priority of resource usage limits If no limit is specified at job submission, then the following apply to all jobs submitted to the queue: If ... Then ...
Specifying Resource Usage Limits Specifying Resource Usage Limits Queues can enforce resource usage limits on running jobs. LSF supports most of the limits that the underlying operating system supports. In addition, LSF also supports a few limits that the underlying operating system does not support. Specify queue-level resource usage limits using parameters in lsb.queues. Specifying queue-level resource usage limits Limits configured in lsb.queues apply to all jobs submitted to the queue.
Chapter 29 Runtime Resource Usage Limits
◆ PROCESSLIMIT
◆ RUNLIMIT
◆ THREADLIMIT
Host specification with two limits: If default and maximum limits are specified for CPU time limits or run time limits, only one host specification is permitted.
Specifying Resource Usage Limits Specifying job-level resource usage limits To specify resource usage limits at the job level, use one of the following bsub options:
◆ -C core_limit
◆ -c cpu_limit
◆ -D data_limit
◆ -F file_limit
◆ -M mem_limit
◆ -p process_limit
◆ -W run_limit
◆ -S stack_limit
◆ -T thread_limit
◆ -v swap_limit
Job-level resource usage limits specified at job submission override the queue definitions.
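For example, the following submission (illustrative values) sets a 30-minute CPU limit, a 60-minute run limit, and a 100 MB memory limit (102400 KB):

% bsub -c 30 -W 60 -M 102400 myjob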
Chapter 29 Runtime Resource Usage Limits Supported Resource Usage Limits and Syntax
Core file size limit
Job syntax (bsub): -C core_limit
Queue syntax (lsb.queues): CORELIMIT=limit
Format/Units: integer KB
Sets a per-process (soft) core file size limit in KB for each process that belongs to this batch job. On some systems, no core file is produced if the image for the process is larger than the core limit. On other systems only the first core_limit KB of the image are dumped.
Supported Resource Usage Limits and Syntax
% bsub -c 10/DEC3000 myjob
See "CPU Time and Run Time Normalization" on page 472 for more information.
Data segment size limit
Job syntax (bsub): -D data_limit
Queue syntax (lsb.queues): DATALIMIT=[default] maximum
Format/Units: integer KB
Sets a per-process (soft) data segment size limit in KB for each process that belongs to this batch job. An sbrk() or malloc() call to extend the data segment beyond the data limit returns an error.
Chapter 29 Runtime Resource Usage Limits OS memory limit enforcement: OS enforcement usually allows the process to eventually run to completion. LSF passes mem_limit to the OS, which uses it as a guide for the system scheduler and memory allocator. The system may allocate more memory to a process if there is a surplus. When memory is low, the system takes memory from and lowers the scheduling priority (renice) of a process that has exceeded its declared mem_limit. Only available on systems that support RLIMIT_RSS for setrlimit().
Supported Resource Usage Limits and Syntax For example, if a job is submitted from a host with a CPU factor of 2 and executed on a host with a CPU factor of 3, the run limit is multiplied by 2/3 because the execution host can do the same amount of work as the submission host in 2/3 of the time. If the optional host name or host model is not given, the run limit is scaled based on the DEFAULT_HOST_SPEC specified in the lsb.params file.
Chapter 29 Runtime Resource Usage Limits Sets the total process virtual memory limit to swap_limit in KB for the whole job. The default is no limit. Exceeding the limit causes the job to terminate. This limit applies to the whole job, no matter how many processes the job may contain. Examples Queue-level limits ◆ CPULIMIT = 20/hostA 15 The first number is the default CPU limit. The second number is the maximum CPU limit.
CPU Time and Run Time Normalization CPU Time and Run Time Normalization To set the CPU time limit and run time limit for jobs in a platform-independent way, LSF scales the limits by the CPU factor of the hosts involved. When a job is dispatched to a host for execution, the limits are then normalized according to the CPU factor of the execution host.
C H A P T E R 30 Load Thresholds Contents
◆ "Automatic Job Suspension" on page 474
◆ "Suspending Conditions" on page 476
Automatic Job Suspension Automatic Job Suspension Jobs running under LSF can be suspended based on the load conditions on the execution hosts. Each host and each queue can be configured with a set of suspending conditions. If the load conditions on an execution host exceed either the corresponding host or queue suspending conditions, one or more jobs running on that host will be suspended to reduce the load. When LSF suspends a job, it invokes the SUSPEND action.
Chapter 30 Load Thresholds Exceptions In some special cases, LSF does not automatically suspend jobs because of load levels. ◆ ◆ ◆ LSF does not suspend a job forced to run with brun -f. LSF does not suspend the only job running on a host, unless the host is being used interactively. When only one job is running on a host, it is not suspended for any reason except that the host is not interactively idle (the it interactive idle time load index is less than one minute).
Suspending Conditions Suspending Conditions LSF provides different alternatives for configuring suspending conditions. Suspending conditions are configured at the host level as load thresholds, whereas suspending conditions are configured at the queue level as either load thresholds, or by using the STOP_COND parameter in the lsb.queues file, or both. The load indices most commonly used for suspending conditions are the CPU run queue lengths (r15s, r1m, and r15m), paging rate (pg), and idle time (it).
Chapter 30 Load Thresholds Theory ◆ ◆ The r15s, r1m, and r15m CPU run queue length conditions are compared to the effective queue length as reported by lsload -E, which is normalised for multiprocessor hosts. Thresholds for these parameters should be set at appropriate levels for single processor hosts. Configure load thresholds consistently across queues.
Suspending Conditions Resuming suspended jobs Jobs are suspended to prevent overloading hosts, to prevent batch jobs from interfering with interactive use, or to allow a more urgent job to run. When the host is no longer overloaded, suspended jobs should continue running. When LSF automatically resumes a job, it invokes the RESUME action. The default action for RESUME is to send the signal SIGCONT. If there are any suspended jobs on a host, LSF checks the load levels in each dispatch turn.
C H A P T E R 31 Pre-Execution and Post-Execution Commands Jobs can be submitted with optional pre- and post-execution commands. A pre- or post-execution command is an arbitrary command to run before the job starts or after the job finishes. Pre- and post-execution commands are executed in a separate environment from the job.
About Pre-Execution and Post-Execution Commands Each batch job can be submitted with optional pre- and post-execution commands. Pre- and post-execution commands can be any executable command lines to be run before a job is started or after a job finishes. Some batch jobs require resources that LSF does not directly support.
Chapter 31 Pre-Execution and Post-Execution Commands Job-level commands The bsub -E option specifies an arbitrary command to run before starting the batch job. When LSF finds a suitable host on which to run a job, the pre-execution command is executed on that host. If the pre-execution command runs successfully, the batch job is started. Job-level post-execution commands are not supported.
Configuring Pre- and Post-Execution Commands Configuring Pre- and Post-Execution Commands Pre- and post-execution commands can be configured at the job level or on a per-queue basis. Job-level commands Job-level pre-execution commands require no configuration. Use the bsub -E option to specify an arbitrary command to run before the job starts. Example The following example shows a batch job that requires a tape drive.
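A hedged illustration of such a submission (check_tape, the device name, and the queue name are hypothetical site-specific items, not part of LSF):

% bsub -E "/usr/local/bin/check_tape /dev/rmt0" -q tapeq myjob

If check_tape exits with 0, LSF considers the tape drive available and starts myjob on that host; a non-zero exit status means the job is not started there and is dispatched again later.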
Chapter 31 Pre-Execution and Post-Execution Commands ◆ If both queue and job-level pre-execution commands are specified, the job-level pre-execution is run after the queue-level pre-execution command. UNIX The entire contents of the configuration line of the pre- and post-execution commands are run under /bin/sh -c, so shell features can be used in the command. For example, the following is valid: PRE_EXEC = /usr/share/lsf/misc/testq_pre >> /tmp/pre.
C H A P T E R 32 Job Starters A job starter is a specified shell script or executable program that sets up the environment for a job and then runs the job. The job starter and the job share the same environment. This chapter discusses two ways of running job starters in LSF and how to set up and use them.
About Job Starters About Job Starters Some jobs have to run in a particular environment, or require some type of setup to be performed before they run. In a shell environment, job setup is often written into a wrapper shell script file that itself contains a call to start the desired job. A job starter is a specified wrapper script or executable program that typically performs environment setup for the job, then calls the job itself, which inherits the execution environment created by the job starter.
Chapter 32 Job Starters ◆ Setting a job starter to make clean causes the command make clean to be run before the user job.
Command-Level Job Starters Command-Level Job Starters A command-level job starter allows you to specify an executable file that does any necessary setup for the job and runs the job when the setup is complete. You can select an existing command to be a job starter, or you can create a script containing a desired set of commands to serve as a job starter. This section describes how to set up and use a command-level job starter to run interactive jobs.
Chapter 32 Job Starters #!/bin/sh set term = xterm eval "$*" Windows If you define the LSF_JOB_STARTER environment variable as follows: % set LSF_JOB_STARTER=C:\cmd.exe /C Then you run a simple DOS shell job: C:\> lsrun dir /p The command that actually runs is: C:\cmd.
Queue-Level Job Starters Queue-Level Job Starters LSF administrators can define a job starter for an individual queue to create a specific environment for jobs to run in. A queue-level job starter specifies an executable that performs any necessary setup, and then runs the job when the setup is complete. The JOB_STARTER parameter in lsb.queues specifies the command or script that is the job starter for the queue. This section describes how to set up and use a queue-level job starter.
Chapter 32 Job Starters
JOB_STARTER = /bin/csh -c %USRCMD
You can also enclose the %USRCMD string in quotes or follow it with additional commands. For example:
JOB_STARTER = /bin/csh -c "%USRCMD;sleep 10"
If a user submits the following job to the queue with this job starter:
% bsub myjob arguments
the command that actually runs is:
% /bin/csh -c "myjob arguments; sleep 10"
For more information: See the Platform LSF Reference for information about the JOB_STARTER parameter in the lsb.queues file.
Controlling Execution Environment Using Job Starters Controlling Execution Environment Using Job Starters In some cases, using bsub -L does not result in correct environment settings on the execution host. LSF provides the following two job starters: ◆ preservestarter —preserves the default environment of the execution host. It ◆ augmentstarter —augments the default user environment of the execution host does not include any submission host settings.
C H A P T E R 33 External Job Submission and Execution Controls This chapter describes the use of external job submission and execution controls called esub and eexec. These site-specific, user-written executables are used to validate, modify, or reject job submissions, and to pass data to and modify job execution environments.
Understanding External Executables Understanding External Executables About esub and eexec LSF provides the ability to validate, modify, or reject job submissions, modify execution environments, and pass data from the submission host directly to the execution host through the use of the esub and eexec executables. Both are site-specific and user written and must be located in LSF_SERVERDIR. Validate, modify, To validate, modify, or reject a job, an esub needs to be written.
Chapter 33 External Job Submission and Execution Controls Using esub About esub An esub, short for exter nal submission, is a user-written executable (binary or script) that can be used to validate, modify, or reject jobs. The esub is put into LSF_SERVERDIR (defined in lsf.conf) where LSF checks for its existence when a job is submitted, restarted, and modified. If LSF finds an esub, it is run by LSF. Whether the job is submitted, modified, or rejected depends on the logic built into the esub.
Using esub Option Description LSB_SUB_DEPEND_COND LSB_SUB_ERR_FILE LSB_SUB_EXCEPTION LSB_SUB_EXCLUSIVE LSB_SUB_EXTSCHED_PARAM LSB_SUB_HOST_SPEC LSB_SUB_HOSTS LSB_SUB_IN_FILE LSB_SUB_INTERACTIVE LSB_SUB_LOGIN_SHELL LSB_SUB_JOB_NAME LSB_SUB_JOB_WARNING_ACTION LSB_SUB_JOB_ACTION_WARNING_TIME LSB_SUB_MAIL_USER LSB_SUB_MAX_NUM_PROCESSORS LSB_SUB_MODIFY LSB_SUB_MODIFY_ONCE LSB_SUB_NOTIFY_BEGIN LSB_SUB_NOTIFY_END LSB_SUB_NUM_PROCESSORS LSB_SUB_OTHER_FILES Dependency condition Standard error file name Exception
Chapter 33 External Job Submission and Execution Controls Option Description LSB_SUB_RLIMIT_STACK LSB_SUB_RLIMIT_THREAD LSB_SUB_TERM_TIME LSB_SUB_TIME_EVENT LSB_SUB_USER_GROUP LSB_SUB_WINDOW_SIG LSB_SUB2_JOB_GROUP LSB_SUB2_LICENSE_PROJECT LSB_SUB2_SLA LSB_SUB2_USE_RSV Stack size limit Thread limit Termination time, in seconds, since 00:00:00 GMT, Jan.
Using esub External executables get called by several LSF commands (bsub, bmod, lsrun). This variable contains the name of the last LSF command to call the executable.
General esub logic
After esub runs, LSF checks:
1 Is the esub exit value LSB_SUB_ABORT_VALUE?
a Yes, go to step 2
b No, go to step 4
2 Reject the job
3 Go to step 5
4 Does LSB_SUB_MODIFY_FILE or LSB_SUB_MODIFY_ENVFILE exist?
❖ Apply changes
5 Done
Rejecting jobs Depending on your policies you may choose to reject a job.
Chapter 33 External Job Submission and Execution Controls
if [ "$LSB_SUB_PROJECT_NAME" != "proj1" -a "$LSB_SUB_PROJECT_NAME" != "proj2" ]; then
    echo "Incorrect project name specified"
    exit $LSB_SUB_ABORT_VALUE
fi
USER=`whoami`
if [ "$LSB_SUB_PROJECT_NAME" = "proj1" ]; then
    # Only user1 and user2 can charge to proj1
    if [ "$USER" != "user1" -a "$USER" != "user2" ]; then
        echo "You are not allowed to charge to this project"
        exit $LSB_SUB_ABORT_VALUE
    fi
fi
Modifying job submission parameters esub can be used to modify su
Using esub # Deny userC the ability to submit a job if [ $USER="userC" ]; then echo "You are not permitted to submit a job." exit $LSB_SUB_ABORT_VALUE fi Using bmod and brestart commands with mesub You can use the bmod command to modify job submission parameters, and brestart to restart checkpointed jobs. Like bsub, bmod and brestart also call mesub, which in turn invoke any existing esub executables in LSF_SERVERDIR. bmod and brestart cannot make changes to the job environment through mesub and esub.
Chapter 33 External Job Submission and Execution Controls
Example: LSB_ESUB_METHOD=dce is defined in lsf.conf. [Figure: a job is submitted with bsub -a fluent license; LSF_SERVERDIR contains mesub, esub, esub.dce, esub.fluent, and esub.license.]
In this example:
◆ esub.dce is defined as the only mandatory esub
◆ An executable named esub already exists in LSF_SERVERDIR
◆ Executables named esub.fluent and esub.
Using esub esub.user name is reserved: The file name esub.user is reserved for backward compatibility. Do not use the name esub.user for your application-specific esub.
Chapter 33 External Job Submission and Execution Controls Working with eexec About eexec The eexec program runs on the execution host at job start-up and completion time and when checkpointing is initiated. It is run as the user after the job environment variables have been set. The environment variable LS_EXEC_T is set to START, END, and CHKPNT, respectively, to indicate when eexec is invoked.
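A minimal eexec sketch (illustrative only; the log file location is an assumption, and LSB_JOBID is taken from the job environment) that distinguishes the three invocation points:

#!/bin/sh
# eexec runs on the execution host as the job owner, with the job
# environment already set. LS_EXEC_T indicates why it was invoked.
case "$LS_EXEC_T" in
    START)  echo "job $LSB_JOBID starting on `hostname`" >> $HOME/eexec.log ;;
    END)    echo "job $LSB_JOBID finished on `hostname`" >> $HOME/eexec.log ;;
    CHKPNT) echo "job $LSB_JOBID checkpointing" >> $HOME/eexec.log ;;
esac
exit 0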
C H A P T E R 34 Configuring Job Controls After a job is started, it can be killed, suspended, or resumed by the system, an LSF user, or LSF administrator. LSF job control actions cause the status of a job to change. This chapter describes how to configure job control actions to override or augment the default job control actions.
Default Job Control Actions Default Job Control Actions After a job is started, it can be killed, suspended, or resumed by the system, an LSF user, or LSF administrator. LSF job control actions cause the status of a job to change. LSF supports the following default actions for job controls: SUSPEND RESUME ◆ TERMINATE On successful completion of the job control action, the LSF job control commands cause the status of a job to change.
Chapter 34 Configuring Job Controls The scheduling thresholds of the queue and the execution host A closed run window of the queue opens again A preempted job finishes ❖ ◆ ◆ TERMINATE action Terminate a job. This usually causes the job change to EXIT status. The default action is to send SIGINT first, then send SIGTERM 10 seconds after SIGINT, then send SIGKILL 10 seconds after SIGTERM. The delay between signals allows user programs to catch the signals and clean up before the job terminates.
Configuring Job Control Actions Configuring Job Control Actions Several situations may require overriding or augmenting the default actions for job control. For example: Notifying users when their jobs are suspended, resumed, or terminated ◆ An application holds resources (for example, licenses) that are not freed by suspending the job. The administrator can set up an action to be performed that causes the license to be released before the job is suspended and re-acquired when the job is resumed.
Chapter 34 Configuring Job Controls JOB_CONTROLS=TERMINATE[brequeue]). This will cause a deadlock between the signal and the action.
Using a command as a job control action
◆ The command line for the action is run with /bin/sh -c so you can use shell features in the command.
◆ The command is run as the user of the job.
◆ All environment variables set for the job are also set for the command action.
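For example, a queue can notify users around suspend and resume, using the JOB_CONTROLS syntax shown in the queue example that follows (the commands inside the brackets are site-specific illustrations, not defaults):

Begin Queue
NAME         = notify_q
JOB_CONTROLS = SUSPEND[kill -TSTP $LSB_JOBPIDS; echo "job $LSB_JOBID suspended" | mail $USER] RESUME[kill -CONT $LSB_JOBPIDS; echo "job $LSB_JOBID resumed" | mail $USER]
End Queue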
Configuring Job Control Actions Syntax TERMINATE_WHEN = [LOAD] [PREEMPT] [WINDOW] Example The following defines a night queue that will kill jobs if the run window closes. Begin Queue NAME = night RUN_WINDOW = 20:00-08:00 TERMINATE_WHEN = WINDOW JOB_CONTROLS = TERMINATE[ kill -KILL $LSB_JOBPIDS; echo "job $LSB_JOBID killed by queue run window" | mail $USER ] End Queue LSB_SIGSTOP parameter (lsf.conf) Use LSB_SIGSTOP to configure the SIGSTOP signal sent by the default SUSPEND action.
Chapter 34 Configuring Job Controls Customizing Cross-Platform Signal Conversion LSF supports signal conversion between UNIX and Windows for remote interactive execution through RES. On Windows, the CTRL+C and CTRL+BREAK key combinations are treated as signals for console applications (these signals are also called console control actions). LSF supports these two Windows console signals for remote interactive execution. LSF regenerates these signals for user tasks on the execution host.
P A R T VI Interactive Jobs Contents
◆ Chapter 35, “Interactive Jobs with bsub”
◆ Chapter 36, “Running Interactive and Remote Tasks”
C H A P T E R 35 Interactive Jobs with bsub Contents ◆ ◆ ◆ ◆ ◆ ◆ ◆ “About Interactive Jobs” on page 516 “Submitting Interactive Jobs” on page 517 “Performance Tuning for Interactive Batch Jobs” on page 519 “Interactive Batch Job Messaging” on page 522 “Running X Applications with bsub” on page 524 “Writing Job Scripts” on page 525 “Registering utmp File Entries for Interactive Batch Jobs” on page 528 Administering Platform LSF 515
About Interactive Jobs About Interactive Jobs It is sometimes desirable from a system management point of view to control all workload through a single centralized scheduler. Running an interactive job through the LSF batch system allows you to take advantage of batch scheduling policies and host selection features for resource-intensive jobs. You can submit a job and the least loaded host is selected to run the job.
Chapter 35 Interactive Jobs with bsub Submitting Interactive Jobs Use the bsub -I option to submit batch interactive jobs, and the bsub -Is and -Ip options to submit batch interactive jobs in pseudo-terminals. Pseudo-terminals are not supported for Windows. For more details, see the bsub(1) man page. Finding out which queues accept interactive jobs Before you submit an interactive job, you need to find out which queues accept interactive jobs with the bqueues -l command.
Submitting Interactive Jobs When you specify the -Ip option, bsub submits a batch interactive job and creates a pseudo-terminal when the job starts. Some applications such as vi for example, require a pseudo-terminal in order to run correctly. For example: % bsub -Ip vi myfile Submits a batch interactive job to edit myfile. bsub -Is To submit a batch interactive job and create a pseudo-terminal with shell mode support, use the bsub -Is option.
Chapter 35 Interactive Jobs with bsub Performance Tuning for Interactive Batch Jobs LSF is often used on systems that support both interactive and batch users. On one hand, users are often concerned that load sharing will overload their workstations and slow down their interactive tasks. On the other hand, some users want to dedicate some machines for critical batch jobs so that they have guaranteed resources.
Performance Tuning for Interactive Batch Jobs The paging rate load index can be used as a threshold to either stop sending more jobs to the host, or to suspend an already running batch job to give priority to interactive users. This parameter can be used in different configuration files to achieve different purposes. By defining paging rate threshold in lsf.cluster.cluster_name, the host will become busy from LIM’s point of view; therefore, no more jobs will be advised by LIM to run on this host.
Chapter 35 Interactive Jobs with bsub For short to medium-length jobs, the r1m index should be used. For longer jobs, you might want to add an r15m threshold. An exception to this are high priority queues, where turnaround time is more important than total throughput. For high priority queues, an r1m scheduling threshold of 2.0 is appropriate. CPU utilization The ut parameter measures the amount of CPU time being used.
Interactive Batch Job Messaging Interactive Batch Job Messaging LSF can display messages to stderr or the Windows console when the following changes occur with interactive batch jobs: Job state ◆ Pending reason ◆ Suspend reason Other job status changes, like switching the job’s queue, are not displayed. ◆ Limitations Interactive batch job messaging is not supported in a MultiCluster environment. Windows Interactive batch job messaging is not fully supported on Windows.
Chapter 35 Interactive Jobs with bsub Job terminated by user: The following example shows messages displayed when a job in pending state is terminated by the user:
% bsub -m hostA -b 13:00 -Is sh
Job <2015> is submitted to default queue .
Job will be scheduled after Fri Nov 19 13:00:00 1999 <
Running X Applications with bsub Running X Applications with bsub You can start an X session on the least loaded host by submitting it as a batch job: % bsub xterm An xterm is started on the least loaded host in the cluster. When you run X applications using lsrun or bsub, the environment variable DISPLAY is handled properly for you. It behaves as if you were running the X application on the local machine.
Chapter 35 Interactive Jobs with bsub Writing Job Scripts You can build a job file one line at a time, or create it from another file, by running bsub without specifying a job to submit. When you do this, you start an interactive session in which bsub reads command lines from the standard input and submits them as a single batch job. You are prompted with bsub> for each line. You can use the bsub -Zs command to spool a file. For more details on bsub options, see the bsub(1) man page.
Writing Job Scripts % bsub < myscript Job <1234> submitted to queue . In this example, the myscript file contains job submission options as well as command lines to execute. When the bsub command reads a script from its standard input, it can be modified right after bsub returns for the next job submission. When the script is specified on the bsub command line, the script is not spooled: % bsub myscript Job <1234> submitted to default queue .
Chapter 35 Interactive Jobs with bsub For example: % bsub bsub> # This is a comment line. This tells the system to use /bin/csh to bsub> # interpret the script. bsub> bsub> setenv DAY ‘date | cut -d" " -f1‘ bsub> myjob bsub> ^D Job <1234> is submitted to default queue . If running jobs under a particular shell is required frequently, you can specify an alternate shell using a command-level job starter and run your jobs interactively.
Registering utmp File Entries for Interactive Batch Jobs Registering utmp File Entries for Interactive Batch Jobs LSF administrators can configure the cluster to track user and account information for interactive batch jobs submitted with bsub -Ip or bsub -Is. User and account information is registered as entries in the UNIX utmp file, which holds information for commands such as who. Registering user information for interactive batch jobs in utmp allows more accurate job accounting.
C H A P T E R 36 Running Interactive and Remote Tasks This chapter provides instructions for running tasks interactively and remotely with non-batch utilities such as lsrun, lsgrun, and lslogin.
Running Remote Tasks Running Remote Tasks lsrun is a non-batch utility to run tasks on a remote host. lsgrun is a non-batch utility to run the same task on many hosts, in sequence one after the other, or in parallel. The default for lsrun is to run the job on the host with the least CPU load (represented by the lowest normalized CPU run queue length) and the most available memory. Command-line arguments can be used to select other resource requirements or to specify the execution host.
Chapter 36 Running Interactive and Remote Tasks Resource usage Resource reservation is only available for batch jobs. If you run jobs using only LSF Base, LIM uses resource usage to determine the placement of jobs. Resource usage requests are used to temporarily increase the load so that a host is not overloaded. When LIM makes a placement advice, external load indices are not considered in the resource usage string. In this case, the syntax of the resource usage string is res[=value]:res[=value]: ...
Running Remote Tasks Running tasks on hosts specified by a file
lsgrun -f host_file: The lsgrun -f host_file option reads the host_file file to get a list of hosts on which to run the task.
Chapter 36 Running Interactive and Remote Tasks Interactive Tasks LSF supports transparent execution of tasks on all server hosts in the cluster. You can run your program on the best available host and interact with it just as if it were running directly on your workstation. Keyboard signals such as CTRL-Z and CTRL-C work as expected. Interactive tasks communicate with the user in real time. Programs like vi use a text-based terminal interface.
Interactive Tasks Tasks that read and write files access the files on the remote host. For load sharing to be transparent, your files should be available on all hosts in the cluster using a file sharing mechanism such as NFS or AFS. When your files are available on all hosts in the cluster, you can run your tasks on any host without worrying about how your task will access files. LSF can operate correctly in cases where these conditions are not met, but the results may not be what you expect.
Chapter 36 Running Interactive and Remote Tasks Load Sharing Interactive Sessions There are different ways to use LSF to start an interactive session on the best available host. Logging on to the least loaded host To log on to the least loaded host, use the lslogin command. When you use lslogin, LSF automatically chooses the best host and does an rlogin to that host.
Load Sharing X Applications Load Sharing X Applications Starting an xterm If you are using the X Window System, you can start an xterm that opens a shell session on the least loaded host by entering: % lsrun sh -c xterm & The & in this command line is important as it frees resources on the host once xterm is running, by running the X terminal in the background. In this example, no processes are left running on the local host.
Chapter 36 Running Interactive and Remote Tasks 4 5 Set description to be Best. Click the Install button in the Xstart window. This installs Best as an icon in the program group you chose (for example, xterm). The user can now log on to the best host by clicking Best in the Xterm program group. Starting an xterm in Exceed To start an xterm: ◆ Double-click the Best icon. You will get an xterm started on the least loaded host in the cluster and displayed on your screen.
Load Sharing X Applications lsxterm -display your_PC:0.
P A R T VII Monitoring Your Cluster Contents
◆ Chapter 37, “Achieving Performance and Scalability”
◆ Chapter 38, “Event Generation”
◆ Chapter 39, “Tuning the Cluster”
◆ Chapter 40, “Authentication”
◆ Chapter 41, “Job Email, and Job File Spooling”
◆ Chapter 42, “Non-Shared File Systems”
◆ Chapter 43, “Error and Event Logging”
◆ Chapter 44, “Troubleshooting and Error Messages”
C H A P T E R 37 Achieving Performance and Scalability Contents ◆ ◆ ◆ “Optimizing Performance in Large Sites” on page 542 “Tuning UNIX for Large Clusters” on page 543 “Tuning LSF for Large Clusters” on page 544 Administering Platform LSF 541
Optimizing Performance in Large Sites Optimizing Performance in Large Sites As your site grows, you must tune your LSF cluster to support a large number of hosts and an increased workload. This chapter discusses how to efficiently tune querying, scheduling, and event logging in a large cluster that scales to 5000 hosts and 100,000 jobs at any one time. To target performance optimization to a cluster with 5000 hosts and 100,000 jobs, you must: ◆ ◆ Configure your operating system.
Chapter 37 Achieving Performance and Scalability Tuning UNIX for Large Clusters The following hardware and software specifications are requirements for a large cluster that supports 5,000 hosts and 100,000 jobs at any one time. Hardware recommendation LSF master host: ◆ ◆ 3 GHz CPU speed 4 CPUs, one each for: ❖ ❖ ❖ mbatchd mbschd lim operating system 10 GB Ram ❖ ◆ Software requirement To meet the performance requirements of a large cluster, increase the file descriptor limit of the operating system.
Tuning LSF for Large Clusters Tuning LSF for Large Clusters To enable and sustain large clusters, you need to tune LSF for efficient querying, dispatching, and event log management. Managing scheduling performance For fast job dispatching in a large cluster, configure the following parameters: ◆ LSB_MAX_JOB_DISPATCH_PER_SESSION in lsf.
Chapter 37 Achieving Performance and Scalability Limiting the number of batch queries In large clusters, job querying can grow very quickly. If your site sees a lot of high traffic job querying, you can tune LSF to limit the number of job queries that mbatchd can handle. This helps decrease the load on the master host. If a job information query is sent after the limit has been reached, an error message is displayed and mbatchd keeps retrying, in one second intervals.
Tuning LSF for Large Clusters Managing the number of pending reasons For efficient, scalable management of pending reasons, use CONDENSE_PENDING_REASONS in lsb.params to condense all the hostbased pending reasons into one generic pending reason.
Chapter 37 Achieving Performance and Scalability Important For automatic tuning of the loading interval, make sure the parameter EXINTERVAL in lsf.cluster.cluster_name file is not defined. Do not configure your cluster to load the information at specific intervals. Managing the I/O performance of the info directory In large clusters, there are large numbers of jobs submitted by its users.
C H A P T E R 38 Event Generation Contents
◆ "Event Generation" on page 550
◆ "Enabling event generation" on page 550
◆ "Events list" on page 550
◆ "Arguments passed to the LSF event program" on page 551
Event Generation Event Generation LSF detects events occurring during the operation of LSF daemons. LSF provides a program which translates LSF events into SNMP traps. You can also write your own program that runs on the master host to interpret and respond to LSF events in other ways.
Chapter 38 Event Generation 7 mbatchd comes up and is ready to schedule jobs (detected by mbatchd). 8 mbatchd goes down (detected by mbatchd). 9 mbatchd receives a reconfiguration request and is being reconfigured (detected by mbatchd). 10 LSB_SHAREDIR becomes full (detected by mbatchd). Arguments passed to the LSF event program If LSF_EVENT_RECEIVER is defined, a function called ls_postevent() allows specific daemon operations to generate LSF events.
C H A P T E R 39 Tuning the Cluster Contents
◆ "Tuning LIM" on page 554
◆ "Tuning mbatchd on UNIX" on page 563
Tuning LIM Tuning LIM LIM provides critical services to all LSF components. In addition to the timely collection of resource information, LIM provides host selection and job placement policies. If you are using Platform MultiCluster, LIM determines how different clusters should exchange load and resource information. You can tune LIM policies and parameters to improve performance. LIM uses load thresholds to determine whether to place remote jobs on a host.
Chapter 39 Tuning the Cluster Adjusting LIM Parameters There are two main goals in adjusting LIM configuration parameters: improving response time, and reducing interference with interactive use. To improve response time, tune LSF to correctly select the best available host for each job. To reduce interference, tune LSF to avoid overloading any host. LIM policies are advisory information for applications.
Load Thresholds Load Thresholds Load threshold parameters define the conditions beyond which a host is considered busy by LIM and are a major factor in influencing performance. No jobs will be dispatched to a busy host by LIM’s policy. Each of these parameters is a load index value, so that if the host load goes beyond that value, the host becomes busy. LIM uses load thresholds to determine whether to place remote jobs on a host.
Chapter 39 Tuning the Cluster
% lshosts -l
HOST_NAME: hostD
...
LOAD_THRESHOLDS:
 r15s  r1m  r15m  ut  pg  io  ls  it  tmp  swp  mem
 -     3.5  -     -   15  -   -   -   -    2M   1M
HOST_NAME: hostA
...
LOAD_THRESHOLDS:
 r15s  r1m  r15m  ut  pg  io  ls  it  tmp  swp  mem
 -     3.5  -     -   15  -   -   -   -    2M   1M
% lsload
HOST_NAME  status  r15s  r1m  r15m  ut   pg    ls
hostD      ok      0.0   0.0  0.0   0%   0.0   6
hostA      busy    1.9   2.1  1.9   47%  *69.
Load Thresholds Note that the normalized run queue length displayed by lsload -N is scaled by the number of processors. See “Load Indices” on page 209 and lsfintro(1) for the concept of effective and normalized run queue lengths.
Chapter 39 Tuning the Cluster Changing Default LIM Behavior to Improve Performance You may want to change the default LIM behavior in the following cases: ◆ ◆ In this section ◆ ◆ ◆ ◆ ◆ In very large sites. As the size of the cluster becomes large (500 hosts or more), reconfiguration of the cluster causes each LIM to re-read the configuration files. This can take quite some time. In sites where each host in the cluster cannot share a common configuration directory or exact replica.
Changing Default LIM Behavior to Improve Performance Reconfiguration and LSF_MASTER_LIST If you change LSF_MASTER_LIST Whenever you change the parameter LSF_MASTER_LIST, reconfigure the cluster with lsadmin reconfig and badmin mbdrestart. If you change lsf.cluster.cluster_name or lsf.
Chapter 39 Tuning the Cluster The order in which you specify hosts in LSF_MASTER_LIST is the preferred order for selecting hosts to become the master LIM. lsf.shared lsf.cluster.cluster_name hostB Master LIM hostC lsf.conf master LIM candidate hostD lsf.conf master LIM candidate hostE lsf.conf slave-only LIM hostF lsf.conf slave-only LIM hostG slave-only LIM hostH lsf.conf lsf.conf LSF_MASTER_LIST=hostC hostD hostE Considerations Generally, the files lsf.cluster.cluster_name and lsf.
Changing Default LIM Behavior to Improve Performance If you want the hosts that were rejected to be part of the cluster, ensure the number of load indices in lsf.cluster.cluster_name and lsf.shared are identical for all master candidates and restart LIMs on the master and all master candidates: % lsadmin limrestart hostA hostB hostC LSF_MASTER_LIST defined, and master host goes down If LSF_MASTER_LIST is defined and the elected master host goes down, and if the number of load indices in lsf.cluster.
Chapter 39 Tuning the Cluster Tuning mbatchd on UNIX On UNIX platforms that support thread programming, you can change default mbatchd behavior to use multithreading and increase performance of query requests when you use the bjobs command. Multithreading is beneficial for busy clusters with many jobs and frequent query requests. This may indirectly increase overall mbatchd performance. Operating system See the Online Support area of the Platform Computing Web site at support www.platform.
Tuning mbatchd on UNIX The child mbatchd continues to listen to the port number specified by LSB_QUERY_PORT and creates threads to service requests until the job status changes, a new job is submitted, or until the time specified in MBD_REFRESH_TIME in lsb.params has passed.
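A typical configuration (the port number and refresh time are examples only) enables the query port in lsf.conf and sets the refresh interval in lsb.params:

# lsf.conf
LSB_QUERY_PORT=6891

# lsb.params
Begin Parameters
MBD_REFRESH_TIME = 600
End Parameters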
C H A P T E 40 R Authentication Controlling access to remote execution has two requirements: ◆ Authenticate the user. When a user executes a remote command, the command must be run with that user’s permission. The LSF daemons need to know which user is requesting the remote execution. ◆ Check access controls on the remote host. The user must be authorized to execute commands remotely on the host. This chapter describes user, host, and daemon authentication in LSF.
About User Authentication About User Authentication LSF recognizes UNIX and Windows authentication environments, including different Windows domains and individual Windows workgroup hosts. ◆ ◆ ◆ In a UNIX environment, user accounts are validated at the system level, so your user account is valid on all hosts.
Chapter 40 Authentication eauth -s When the LSF daemon receives the request, it executes eauth -s under the primary LSF administrator user ID to process the user authentication data. If your site cannot run authentication under the primary LSF administrator user ID, configure the parameter LSF_EAUTH_USER in the /etc/lsf.sudoers file.
About User Authentication LSF_AUTH in If you do not define LSF_AUTH in lsf.conf, privileged ports (setuid) authentication is lsf.conf the default user authentication used by LSF. Installation with lsfinstall sets LSF_AUTH=eauth automatically. To use setuid authentication, you must remove LSF_AUTH from lsf.conf. LSF_AUTH=setuid is an incorrect configuration Identification daemon (identd) LSF also supports authentication using the RFC 931 or RFC 1413 identification protocols.
Chapter 40 Authentication LSF allows both the setuid and identification daemon methods to be in effect simultaneously. If the effective user ID of a load-sharing application is root, then a privileged port number is used in contacting RES. RES always accepts requests from a privileged port on a known host even if LSF_AUTH is defined to be ident.
About User Authentication format of LSF protocol messages and write a program that tries to communicate with an LSF server. The LSF default external authentication should be used where this security risk is a concern. Only the parameters LSF_STARTUP_USERS and LSF_STARTUP_PATH are used on Windows. You should ensure that only authorized users modify the files under the %SYSTEMROOT% directory. Once the LSF services on Windows are started, they will only accept requests from LSF cluster administrators.
Chapter 40 Authentication About Host Authentication When a batch job or a remote execution request is received, LSF first determines the user’s identity. Once the user’s identity is known, LSF decides whether it can trust the host from which the request comes from. Trust LSF host LSF normally allows remote execution by all users except root, from all hosts in the LSF cluster; LSF trusts all hosts that are configured into your cluster.
About Daemon Authentication About Daemon Authentication Daemon authentication By default, LSF calls the eauth program only for user authentication (authenticate LSF user requests to either RES or mbatchd).
Chapter 40 Authentication LSF in Multiple Authentication Environments In some environments, such as a UNIX system or a Windows domain, you can have one user account that works on all hosts. However, when you build an LSF cluster in a heterogeneous environment, you can have a different user account on each system, and each system does its own password authentication.
User Account Mapping User Account Mapping LSF allows user account mapping across a non-uniform user name space. By default, LSF assumes uniform user accounts throughout the cluster. This means that jobs will be executed on any host with exactly the same user ID and user login name. The LSF administrator can disable user account mapping. For information about account mapping between clusters in a MultiCluster environment, see the Using Platfor m LSF MultiCluster. Configuring user-level account mapping (.
Chapter 40 Authentication d 4 5 In the user_name Properties dialog box, click the Profile tab. e In the Home folder box, choose Local path. f In the Local path text box, type C:/isippc/users/user_name . g Click Apply, then click Close. h Close the Computer Management window. Log in as the user_name user. Create the following file: C:/isppc/users/user_name/.lsfhosts The .lsfhosts file must not use a file extension such as .txt 6 Add the following line to the .
User Account Mapping % cat ~lsfguest/.
C H A P T E R 41 Job Email, and Job File Spooling Contents ◆ ◆ “Mail Notification When a Job Starts” on page 578 “File Spooling for Job Input, Output, and Command Files” on page 581 Administering Platform LSF 577
Mail Notification When a Job Starts When a batch job completes or exits, LSF by default sends a job report by electronic mail to the submitting user account. The report includes the following information:
◆ Standard output (stdout) of the job
◆ Standard error (stderr) of the job
◆ LSF job information such as CPU, process and memory usage
The output from stdout and stderr is merged together in the order printed, as if the job was run interactively.
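You can also have the job read its input from a file and store its output and error in files by naming them at submission time; for example (an illustrative command, with placeholder file names):

% bsub -i job_in -o job_out -e job_err myjob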
Chapter 41 Job Email, and Job File Spooling The job reads its input from file job_in. Standard output is stored in file job_out, and standard error is stored in file job_err. Size of job email Some batch jobs can create large amounts of output. To prevent large job output files from interfering with your mail system, you can use the LSB_MAILSIZE_LIMIT parameter in lsf.conf to limit the size of the email containing the job output information.
Mail Notification When a Job Starts Specifying a directory for job output Make the final character in the path a slash (/) on UNIX, or a double backslash (\\) on Windows. If you omit the trailing slash or backslash characters, LSF treats the specification as a file name. If the specified directory does not exist, LSF creates it on the execution host when it creates the standard error and standard output files. By default, the output files have the following format: Standard output output_directory/job_ID.
Chapter 41 Job Email, and Job File Spooling File Spooling for Job Input, Output, and Command Files About job file spooling LSF enables spooling of job input, output, and command files by creating directories and files for buffering input and output for a job. LSF removes these files when the job completes. You can make use of file spooling when submitting jobs with the -is and -Zs options to bsub. Use similar options in bmod to modify or cancel the spool file specification for the job.
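For example (a sketch; the file names are placeholders, and the exact option behavior is described in the bsub man page):

% bsub -is data.in myjob
% bsub -Zs myscript.sh

The first command spools data.in and uses the spooled copy as the job's standard input; the second spools the job command file myscript.sh so that later changes to the original file do not affect the submitted job.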
File Spooling for Job Input, Output, and Command Files Use the bmod -Zs command if you need to change the command file after the job has been submitted. Changing the original input file does not affect the submitted job. Use bmod -Zsn to cancel the last spooled command file and use the original spooled file. The bsub -Zs option is not supported for embedded job commands because LSF is unable to determine the first command to be spooled in an embedded job command.
Chapter 41 Job Email, and Job File Spooling Modifying the job command file Use the -Z and -Zs options of bmod to modify the job command file specification. -Z modifies a command submitted without spooling, and -Zs modifies a spooled command file. The -Zsn option of bmod cancels the last job command file modification made with -Zs and uses the original spooled command.
C H A P T E R 42 Non-Shared File Systems Contents
◆ "About Directories and Files" on page 586
◆ "Using LSF with Non-Shared File Systems" on page 587
◆ "Remote File Access" on page 588
◆ "File Transfer Mechanism (lsrcp)" on page 590
About Directories and Files About Directories and Files LSF is designed for networks where all hosts have shared file systems, and files have the same names on all hosts. LSF includes support for copying user data to the execution host before running a batch job, and for copying results back after the job executes. In networks where the file systems are not shared, this can be used to give remote jobs access to local data.
Chapter 42 Non-Shared File Systems Using LSF with Non-Shared File Systems LSF installation To install LSF on a cluster without shared file systems, follow the complete installation procedure on every host to install all the binaries, man pages, and configuration files. Configuration files After you have installed LSF on every host, you must update the configuration files on all hosts so that they contain the complete cluster configuration. Configuration files must be the same on all hosts.
Remote File Access Remote File Access Using LSF with non-shared file space LSF is usually used in networks with shared file space. When shared file space is not available, use the bsub -f command to have LSF copy needed files to the execution host before running the job, and copy result files back to the submission host after the job completes. LSF attempts to run a job in the directory where the bsub command was invoked.
><, <>
Equivalent to performing the > and then the < operation. The file local_file is copied to remote_file before the job executes, and remote_file is copied back, overwriting local_file, after the job completes. <> is the same as ><.
If the submission and execution hosts have different directory structures, you must ensure that the directory where remote_file and local_file will be placed exists.
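A hedged sketch of how these copy operators appear on the bsub command line, assuming the bsub -f "local_file operator [remote_file]" form; the file names and the job command are hypothetical:

% bsub -f "data.in > data.in" -f "results.out < results.out" myapp
Copies data.in to the execution host before the job starts, and copies results.out back to the submission host after the job completes.
% bsub -f "state.dat <>" myapp
Copies state.dat to the execution host before the job starts, then copies it back, overwriting the local copy, after the job completes.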
File Transfer Mechanism (lsrcp)
The LSF remote file access mechanism (bsub -f) uses lsrcp to process the file transfer. The lsrcp command tries to connect to RES on the submission host to handle the file transfer. See "Remote File Access" on page 588 for more information about using bsub -f.

Limitations to lsrcp
Because LSF client hosts do not run RES, jobs that are submitted from client hosts should only specify bsub -f if rcp is allowed.
Chapter 43 Error and Event Logging

Contents
◆ "System Directories and Log Files" on page 592
◆ "Managing Error Logs" on page 593
◆ "System Event Log" on page 594
◆ "Duplicate Logging of Event Logs" on page 595
◆ "LSF Job Termination Reason Logging" on page 597
System Directories and Log Files
LSF uses directories for temporary work files, log files, transaction files, and spooling. LSF keeps track of all jobs in the system by maintaining a transaction log in the work subtree. The LSF log files are found in the directory LSB_SHAREDIR/cluster_name/logdir. The following files maintain the state of the LSF system:

lsb.events
LSF uses the lsb.events file to keep track of the state of all jobs.
Managing Error Logs
Error logs maintain important information about LSF operations. When you see any abnormal behavior in LSF, you should first check the appropriate error logs to find out the cause of the problem. LSF log files grow over time. These files should occasionally be cleared, either by hand or using automatic scripts.
System Event Log
The LSF daemons keep an event log in the lsb.events file. The mbatchd daemon uses this information to recover from server failures, host reboots, and mbatchd restarts. The lsb.events file is also used by the bhist command to display detailed information about the execution history of batch jobs, and by the badmin command to display the operational history of hosts, queues, and daemons. By default, mbatchd automatically backs up and rewrites the lsb.events file.
Duplicate Logging of Event Logs
To recover from server failures, host reboots, or mbatchd restarts, LSF uses information stored in lsb.events. To improve the reliability of LSF, you can configure LSF to maintain copies of these logs, to use as a backup. If the host that contains the primary copy of the logs fails, LSF will continue to operate using the duplicate logs. When the host recovers, LSF uses the duplicate logs to update the primary copies.
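A sketch of the lsf.conf setting that enables duplicate event logging, assuming the primary copies are kept in a local directory and replicated to LSB_SHAREDIR; the directory path is hypothetical:

LSB_LOCALDIR=/local/lsf/work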
If the network partitions, the original master host (M1) and the failover master host (M2) may both act as master, with M1 logging events to LSB_LOCALDIR and M2 logging to LSB_SHAREDIR. When connectivity is restored, the changes made by M2 to LSB_SHAREDIR will be lost when M1 updates LSB_SHAREDIR from its copy in LSB_LOCALDIR. The archived event files are only available in LSB_LOCALDIR, so in the case of network partitioning, commands such as bhist cannot access these files. As a precaution, you should periodically copy the archived files from LSB_LOCALDIR to LSB_SHAREDIR.
LSF Job Termination Reason Logging
When a job finishes, LSF reports the last job termination action it took against the job and logs it into lsb.acct. If a running job exits because of node failure, LSF sets the correct exit information in lsb.acct, lsb.events, and the job output file.

Viewing logged job exit information (bacct -l)
Use bacct -l to view job exit information logged to lsb.acct.
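For example (the job ID is hypothetical):

% bacct -l 1234
Displays the accounting record for job 1234, including the termination reason keyword described below.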
Termination reasons displayed by bacct
When LSF detects that a job is terminated, bacct -l displays one of the following termination reasons:

Keyword                  Reason
TERM_ADMIN               Job killed by root or LSF administrator
TERM_CHKPNT              Job killed after checkpointing
TERM_CPULIMIT            Job killed after reaching LSF CPU usage limit
TERM_DEADLINE            Job killed after deadline expires
TERM_EXTERNAL_SIGNAL     Job killed by a signal external to LSF
TERM_FORCE_ADMIN         Job killed by root or LSF administrator ...
Termination cause                      Termination reason in bacct -l / example bhist output
TERMINATE_WHEN                         Completed ; TERM_LOAD / TERM_WINDOWS / TERM_PREEMPT
Memory limit reached                   Completed ; TERM_MEMLIMIT
Run limit reached                      Completed ; TERM_RUNLIMIT
CPU limit                              Completed ; TERM_CPULIMIT
Swap limit                             Completed ; TERM_SWAPLIMIT
Regular job exits when host crashes    Rusage 0, Completed ; TERM_ZOMBIE
brequeue -r                            For each requeue, Completed ; TERM_R...
For instance, if your application had an explicit exit 129, you would see Exit code 129 in your output. When you send a signal that terminates the job, bhist reports either the signal or the value of signal+128. If the return status is greater than 128 and the job was terminated with a signal, then return_status - 128 = signal.

Example
For return status 133, the job was terminated with signal 5 (SIGTRAP on most systems; 133 - 128 = 5).
Chapter 44 Troubleshooting and Error Messages

Contents
◆ "Shared File Access" on page 604
◆ "Common LSF Problems" on page 605
◆ "Error Messages" on page 610
◆ "Setting Daemon Message Log to Debug Level" on page 616
◆ "Setting Daemon Timing Levels" on page 619
Shared File Access
A frequent problem with LSF is that files are not accessible because the file space is not uniform. If a task runs on a remote host where a file it requires cannot be accessed using the same name, an error results. Almost all interactive LSF commands fail if the user's current working directory cannot be found on the remote host.

Shared files on UNIX
If you are running NFS, rearranging the NFS mount table may solve the problem.
Common LSF Problems
This section lists some other common problems with the LIM, RES, mbatchd, sbatchd, and interactive applications. Most problems are due to incorrect installation or configuration. Check the error log files; often the log message points directly to the problem.

LIM dies quietly
Run the following command to check for errors in the LIM configuration files:
% lsadmin ckconfig -v
This displays most configuration errors.
RES does not start
Check the RES error log.
UNIX: If the RES is unable to read the lsf.conf file and does not know where to write error messages, it logs errors into syslog(3).
Windows: If the RES is unable to read the lsf.conf file and does not know where to write error messages, it logs errors into C:\temp.

User permission denied
If remote execution fails with the following error message, the remote host could not securely determine the user ID of the user requesting remote execution.
LSF can resolve most, but not all, problems using automount. The automount maps must be managed through NIS. Follow the instructions in your Release Notes for obtaining technical support if you are running automount and LSF is not able to locate directories on remote hosts.

Batch daemons die quietly
First, check the sbatchd and mbatchd error logs. Try running the following command to check the configuration:
% badmin ckconfig
This reports most errors.
UNKNOWN host type or model

Viewing UNKNOWN host type or model
Run lshosts. A model or type of UNKNOWN indicates the host is down or the LIM on the host is down. You need to take immediate action. For example:
% lshosts
HOST_NAME   type      model   cpuf  ncpus  maxmem  maxswp  server  RESOURCES
hostA       UNKNOWN   Ultra2  20.2  2      256M    710M    Yes     ()

Fixing UNKNOWN host type or model
1  Start the host.
2  Run lsadmin limstartup to start up the LIMs on the host.
Host Type            : sun4
Host Architecture    : SUNWUltra2_200_sparcv9
Matched Type         : DEFAULT
Matched Architecture : SUNWUltra2_300_sparc
Matched Model        : Ultra2
CPU Factor           : 20.2

Note the value of Host Type and Host Architecture.
2  Edit lsf.shared. In the HostType section, enter a new host type. Use the host type name detected with lim -t. For example:
   Begin HostType
   TYPENAME
   DEFAULT
   CRAYJ
   sun4
   ...
3  Save changes to lsf.shared.
4  Run lsadmin reconfig to reconfigure LIM.

Fixing DEFAULT host model
Error Messages Error Messages The following error messages are logged by the LSF daemons, or displayed by the following commands. lsadmin ckconfig badmin ckconfig General errors The messages listed in this section may be generated by any LSF daemon. can’t open file: error The daemon could not open the named file for the reason given by er r or. This error is usually caused by incorrect file permissions or missing files.
Set the LSF binaries to be owned by root with the setuid bit set, or define LSF_AUTH=ident and set up an ident server on all hosts in the cluster. If the binaries are on an NFS-mounted file system, make sure that the file system is not mounted with the nosuid flag.
The number of fields on a line in a configuration section does not match the number of keywords. This may be caused by not putting () in a column to represent the default value.

file: HostModel section missing or invalid
file: Resource section missing or invalid
file: HostType section missing or invalid
The HostModel, Resource, or HostType section in the lsf.shared file is either missing or contains an unrecoverable error.

file(line): Name name reserved or previously defined.
getLicense: Can't get software license for LIM from license file : feature not yet available.
Your LSF license is not yet valid. Check whether the system clock is correct.
authRequest: Submitter's name @ is different from name on this host
RES assumes that a user has the same userID and username on all the LSF hosts. These messages occur if this assumption is violated. If the user is allowed to use LSF for interactive remote execution, make sure the user's account has the same userID and username on all LSF hosts.
touchElogLock: Failed to open lock file : error
touchElogLock: close failed: error
mbatchd failed to create, remove, read, or write the log directory or a file in the log directory, for the reason given in error. Check that the LSF administrator has read, write, and execute permissions on the logdir directory.
Setting Daemon Message Log to Debug Level
The message log level for LSF daemons is set in lsf.conf with the parameter LSF_LOG_MASK. To include debugging messages, set LSF_LOG_MASK to one of:
◆ LOG_DEBUG
◆ LOG_DEBUG1
◆ LOG_DEBUG2
◆ LOG_DEBUG3
By default, LSF_LOG_MASK=LOG_WARNING and these debugging messages are not displayed.
The debugging log classes for LSF daemons are set in lsf.conf.
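A minimal lsf.conf sketch that enables debug-level logging; the log directory path is hypothetical, and the log classes chosen here (LC_MULTI and LC_PIM, as used in the example below) are only an illustration:

LSF_LOG_MASK=LOG_DEBUG
LSF_DEBUG_LIM="LC_MULTI LC_PIM"
LSF_LOGDIR=/var/log/lsf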
Examples
◆ % lsadmin limdebug -c "LC_MULTI LC_PIM" -f myfile hostA hostB
Log additional messages for the LIM daemon running on hostA and hostB, related to MultiCluster and PIM. Create log files in the LSF_LOGDIR directory with the names myfile.lim.log.hostA and myfile.lim.log.hostB. The debug level is the default value, LOG_DEBUG level in parameter LSF_LOG_MASK.
...LSF_DEBUG_LIM, LSB_DEBUG_MBD, LSB_DEBUG_SBD, and LSB_DEBUG_SCH. The log file is reset to the LSF system log file in the directory specified by LSF_LOGDIR, in the format daemon_name.log.host_name. For timing level examples, see "Setting Daemon Timing Levels" on page 619.
Setting Daemon Timing Levels
The timing log level for LSF daemons is set in lsf.conf with the parameters LSB_TIME_CMD, LSB_TIME_MBD, LSB_TIME_SBD, LSB_TIME_SCH, LSF_TIME_LIM, and LSF_TIME_RES. The location of log files is specified with the parameter LSF_LOGDIR in lsf.conf. Timing is included in the same log files as messages. To change the timing log level, you need to stop any running daemons, change lsf.conf, and then restart the daemons.
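A hedged lsf.conf sketch that enables timing logs for mbatchd and the LIM; the timing level value 1 and the log directory path are illustrative:

LSB_TIME_MBD=1
LSF_TIME_LIM=1
LSF_LOGDIR=/var/log/lsf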
Part VIII: LSF Utilities

Contents
◆ Chapter 45, "Using lstcsh"
Chapter 45 Using lstcsh
This chapter describes lstcsh, an extended version of the tcsh command interpreter. The lstcsh interpreter provides transparent load sharing of user jobs. This chapter is not a general description of the tcsh shell. Only load sharing features are described in detail. Interactive tasks, including lstcsh, are not supported on Windows.
About lstcsh
The lstcsh shell is a load-sharing version of the tcsh command interpreter. It is compatible with csh and supports many useful extensions. csh and tcsh users can use lstcsh to send jobs to other hosts in the cluster without needing to learn any new commands. You can run lstcsh from the command line, or use the chsh command to set it as your login shell.
Task Lists
LSF maintains two task lists for each user, a local list (.lsftask) and a remote list (lsf.task). Commands in the local list must be executed locally. Commands in the remote list can be executed remotely. See the Platform LSF Reference for information about the .lsftask and lsf.task files.

Changing task list membership
You can use the LSF commands lsltasks and lsrtasks to inspect and change the memberships of the local and remote task lists.
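A hedged sketch of inspecting and editing the remote task list, assuming the + and - forms of lsrtasks add and remove entries; the command names cc and compress are only examples:

% lsrtasks
Displays the current remote task list.
% lsrtasks + cc compress
Adds cc and compress to the remote task list.
% lsrtasks - compress
Removes compress from the remote task list.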
Local and Remote Modes
lstcsh has two modes of operation:
◆ Local
◆ Remote

Local mode
The local mode is the default mode. In local mode, a command line is eligible for remote execution only if all of the commands on the line are present in the remote task list, or if the @ character is specified on the command line to force it to be eligible. See "@ character" on page 632 for more details.
Automatic Remote Execution
Every time you enter a command, lstcsh looks in your task lists to determine whether the command can be executed on a remote host and to find the configured resource requirements for the command. See the Platform LSF Reference for information about task lists and the lsf.task file. If the command can be executed on a remote host, lstcsh contacts LIM to find the best available host.
Differences from Other Shells
When a command is running in the foreground on a remote host, all keyboard input (type-ahead) is sent to the remote host. If the remote command does not read the input, it is lost. lstcsh has no way of knowing whether the remote command reads its standard input. The only way to provide any input to the command is to send everything available on the standard input to the remote command in case the remote command needs it.
Limitations
A shell is a complex application in its own right, and lstcsh has certain limitations:

Native language system
The Native Language System is not supported. To use this feature of tcsh, you must compile tcsh with SHORT_STRINGS defined. This causes complications for characters flowing across machines.

Shell variables
Shell variables are not propagated across machines.
Starting lstcsh
If you normally use some other shell, you can start lstcsh from the command line. Make sure that the LSF commands are in your PATH environment variable, then enter:
% lstcsh
If you have a .cshrc file in your home directory, lstcsh reads it to set variables and aliases.

Exiting lstcsh
Use the exit command to get out of lstcsh.
Using lstcsh as Your Login Shell
If your system administrator allows it, you can use lstcsh as your login shell. The /etc/shells file contains a list of all the shells you are allowed to use as your login shell.

Setting your login shell
Using csh
The chsh command can set your login shell to any of those shells. If the /etc/shells file does not exist, you cannot set your login shell to lstcsh.
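A hedged example of switching your login shell; the lstcsh installation path is hypothetical, and on some systems chsh takes the user name and shell path as positional arguments rather than the -s option shown here:

% grep lstcsh /etc/shells
/usr/local/lsf/bin/lstcsh
% chsh -s /usr/local/lsf/bin/lstcsh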
Host Redirection
Host redirection overrides the task lists, so you can force commands from your local task list to execute on a remote host or override the resource requirements for a command. You can explicitly specify the eligibility of a command line for remote execution using the @ character. It may be anywhere in the command line except in the first position (@ as the first character on the line is used to set the value of shell variables).
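A hedged sketch of host redirection, assuming the @/ form introduces a resource requirement string; the host name hostB and the requirement type==alpha are illustrative:

% hostname @hostB
Forces the hostname command to run on hostB.
% hostname @/type==alpha
Runs hostname on any host that satisfies the resource requirement type==alpha.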
Task Control
Task control in lstcsh is the same as in tcsh except for remote background tasks. lstcsh numbers shell tasks separately for each execution host.

jobs command
The output of the built-in command jobs lists background tasks together with their execution hosts. This break of transparency is intentional to give you more control over your background tasks.
Built-in Commands
lstcsh supports two built-in commands to control load sharing, lsmode and connect.

In this section
◆ "lsmode" on page 634
◆ "connect" on page 635

lsmode
Syntax
lsmode [on|off] [local|remote] [e|-e] [v|-v] [t|-t]
Description
The lsmode command reports that LSF is enabled if lstcsh was able to contact LIM when it started up. If LSF is disabled, no load-sharing features are available. The lsmode command takes a number of arguments that control how lstcsh behaves.
This time includes all remote execution overhead. The csh time builtin does not include the remote execution overhead. This is an impartial way of comparing the response time of jobs submitted locally or remotely, because all the load sharing overhead is included in the displayed elapsed time. The default is off.

connect
Syntax
connect [host_name]
Description
lstcsh opens a connection to a remote host when the first command is executed remotely on that host.
Writing Shell Scripts in lstcsh
You should write shell scripts in /bin/sh and use the lstools commands for load sharing. However, lstcsh can be used to write load-sharing shell scripts. By default, an lstcsh script is executed as a normal tcsh script with load sharing disabled.
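A minimal sketch of such a script, assuming load sharing inside scripts is enabled with the -L option of lstcsh; the interpreter path, the -L assumption, and the commands in the script body are all hypothetical:

#!/usr/local/lsf/bin/lstcsh -L
# Each eligible command line below may be sent to a remote host
sim -o run1.out data1.in
sim -o run2.out data2.in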
Index 660 Administering Platform LSF