Administering Platform™ LSF™ Version 7 Update 3 Release date: May 2008 Last modified: May 16, 2008 Comments to: doc@platform.com Support: support@platform.com
Copyright © 1994-2008, Platform Computing Inc. Although the information in this document has been carefully reviewed, Platform Computing Inc. (“Platform”) does not warrant it to be free of errors or omissions. Platform reserves the right to make corrections, updates, revisions or changes to the information in this document.
Contents
1 About Platform LSF: Learn about Platform LSF; Cluster Concepts; Job Life Cycle
Hosts with Multiple Addresses; Using IPv6 Addresses; Host Groups; Tuning CPU Factors; Handling Host-level Job Exceptions
9 Managing LSF on Platform EGO: About LSF on Platform EGO; LSF and EGO directory structure; Configuring LSF and EGO; Managing LSF daemons through EGO; EGO control of PMC and PERF services; Administrative Basics for PMC and CLI; Logging and troubleshooting; Frequently asked questions
External Load Indices; Modifying a Built-In Load Index
13 Managing Software Licenses with LSF: Using Licensed Software with LSF; Host-locked Licenses; Counted Host-Locked Licenses; Network Floating Licenses
Queue-based Fairshare; Configuring Slot Allocation per Queue; View Queue-based Fairshare Allocations; Typical Slot Allocation Scenarios; Using Historical and Committed Run Time; Users Affected by Multiple Fairshare Policies; Ways to Configure Fairshare
24 Advance Reservation: Understanding Advance Reservations; Configure Advance Reservation; Using Advance Reservation
Passing Arguments on the Command Line; Job Array Dependencies; Monitoring Job Arrays; Individual job status; Specific job status; Controlling Job Arrays; Requeuing a Job Array; Job Array Job Slot Limit
35 Load Thresholds: Automatic Job Suspension; Suspending Conditions
Part VII: Monitoring Your Cluster
42 Achieving Performance and Scalability: Optimizing Performance in Large Sites; Tuning UNIX for Large Clusters; Tuning LSF for Large Clusters; Monitoring Performance Metrics in Real Time
48 Non-Shared File Systems: About Directories and Files; Using LSF with Non-Shared File Systems; Remote File Access; File Transfer Mechanism (lsrcp)
C H A P T E R 1 About Platform LSF Contents ◆ Learn about Platform LSF on page 14 ◆ Cluster Concepts on page 14 ◆ Job Life Cycle on page 26
Learn about Platform LSF Learn about Platform LSF Before using Platform LSF for the first time, you should download and read LSF Version 7 Release Notes for the latest information about what’s new in the current release and other important information. Cluster Concepts Clusters, jobs, and queues Cluster A group of computers (hosts) running LSF that work together as a single unit, combining computing power and sharing workload and resources.
Job: A unit of work run in the LSF system. A job is a command submitted to LSF for execution. LSF schedules, controls, and tracks the job according to configured policies. Jobs can be complex problems, simulation scenarios, extensive calculations, or anything else that needs compute power.
Commands:
◆ bjobs —View jobs in the system
◆ bsub —Submit jobs
Job slot: A job slot is a bucket into which a single unit of work is assigned in the LSF system.
Cluster Concepts When you submit a job to a queue, you do not need to specify an execution host. LSF dispatches the job to the best available execution host in the cluster to run that job. Queues implement different job scheduling and control policies. Commands: ◆ bqueues —View available queues ◆ bsub -q —Submit a job to a specific queue ◆ bparams —View default queues Configuration: ◆ Define queues in lsb.queues TIP: The names of your queues should be unique.
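For illustration, a minimal queue definition in lsb.queues might look like the following sketch; the queue name, priority, and description are placeholders and your cluster's queues will differ:

Begin Queue
# A simple queue for short jobs (all values are examples)
QUEUE_NAME  = short
PRIORITY    = 40
DESCRIPTION = For short jobs that need fast turnaround
End Queue

A job can then be sent to this queue explicitly with bsub -q short myjob. After editing lsb.queues, run badmin reconfig so that mbatchd picks up the new queue.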
Commands:
◆ bjobs —View where a job runs
Server host: Hosts that are capable of submitting and executing jobs. A server host runs sbatchd to execute server requests and apply local policies. An LSF cluster may consist of static and dynamic hosts. Dynamic host configuration allows you to add and remove hosts without manual reconfiguration. By default, all configuration changes made to LSF are static.
Cluster Concepts Master host Where the master LIM and mbatchd run. An LSF server host that acts as the overall coordinator for that cluster. Each cluster has one master host to do all job scheduling and dispatch. If the master host goes down, another LSF server in the cluster becomes the master host. All LSF daemons run on the master host. The LIM on the master host is the master LIM.
Configuration:
◆ Port number defined in lsf.conf
res: Remote Execution Server (RES) running on each server host. Accepts remote execution requests to provide transparent and secure remote execution of jobs and tasks.
Commands:
◆ lsadmin resstartup —Starts res
◆ lsadmin resshutdown —Shuts down res
◆ lsadmin resrestart —Restarts res
Configuration:
◆ Port number defined in lsf.conf
lim: Load Information Manager (LIM) running on each server host.
◆ lsadmin limrestart —Restarts LIM
◆ lsload —View dynamic load values
◆ lshosts —View static host load values
Configuration:
◆ Port number defined in lsf.conf
Master LIM: The LIM running on the master host. Receives load information from the LIMs running on hosts in the cluster. Forwards load information to mbatchd, which forwards this information to mbschd to support scheduling decisions. If the master LIM becomes unavailable, a LIM on another host automatically takes over.
◆ bsub —Submit jobs
Interactive batch job: A batch job that allows you to interact with the application and still take advantage of LSF scheduling policies and fault tolerance. All input and output are through the terminal that you used to type the job submission command. When you submit an interactive job, a message is displayed while the job is awaiting scheduling. A new job cannot be submitted until the interactive job is completed or terminated.
Host types and host models
Hosts in LSF are characterized by host type and host model. For example, the host type X86_64 includes host models such as Opteron240, Opteron840, Intel_EM64T, and Intel_IA64.
Host type: The combination of operating system version and host CPU architecture. All computers that run the same operating system on the same computer architecture are of the same type—in other words, binary-compatible with each other.
About Platform LSF Primary LSF administrator The first cluster administrator specified during installation and first administrator listed in lsf.cluster.cluster_name. The primary LSF administrator account owns the configuration and log files. The primary LSF administrator has permission to perform clusterwide operations, change configuration files, reconfigure the cluster, and control jobs submitted by all users.
◆ bhosts -l —View load levels on a host
External load indices: Defined and configured by the LSF administrator and collected by an External Load Information Manager (ELIM) program. The ELIM also updates LIM when new values are received.
Commands:
◆ lsinfo —View external load indices
Static resources: Built-in resources that represent host information that does not change over time, such as the maximum RAM available to user processes or the number of processors in a machine.
Resource allocation limits: Restrict the amount of a given resource that must be available during job scheduling for different classes of jobs to start, and which resource consumers the limits apply to. If all of the resource has been consumed, no more jobs can be started until some of the resource is released.
Configuration:
◆ lsb.resources —Configure queue-level resource allocation limits for hosts, users, queues, and projects
Resource requirements: Restrict which hosts the job can run on.
Job Life Cycle Job Life Cycle 1 Submit a job You submit a job from an LSF client or server with the bsub command. If you do not specify a queue when submitting the job, the job is submitted to the default queue. Jobs are held in a queue waiting to be scheduled and have the PEND state. The job is held in a job file in the LSF_SHAREDIR/cluster_name/logdir/info/ directory, or in one of its subdirectories if MAX_INFO_DIRS is defined in lsb.params.
About Platform LSF 4 Run job sbatchd handles job execution.
P A R T I Managing Your Cluster ◆ Working with Your Cluster on page 41 ◆ Working with Hosts on page 57 ◆ Working with Queues on page 101 ◆ Managing Jobs on page 113 ◆ Managing Users and User Groups on page 143 ◆ Platform LSF Licensing on page 149 ◆ Managing LSF on Platform EGO on page 185 ◆ Cluster Version Management and Patching on UNIX and Linux on page 213
C H A P T E R 2 How the System Works LSF can be configured in different ways that affect the scheduling of jobs. By default, this is how LSF handles a new job: 1 Receive the job. Create a job file. Return the job ID to the user. 2 Schedule the job and select the best available host. 3 Dispatch the job to a selected host. 4 Set the environment on the host. 5 Start the job.
Job Submission Job Submission The life cycle of a job starts when you submit the job to LSF. On the command line, bsub is used to submit jobs, and you can specify many options to bsub to modify the default behavior, including the use of a JSDL file. Jobs must be submitted to a queue. Queues Queues represent a set of pending jobs, lined up in a defined order and waiting for their opportunity to use resources. Queues implement different job scheduling and control policies.
How the System Works Automatic queue selection Typically, a cluster has multiple queues. When you submit a job to LSF you might define which queue the job will enter. If you submit a job without specifying a queue name, LSF considers the requirements of the job and automatically chooses a suitable queue from a list of candidate default queues. If you did not define any candidate default queues, LSF will create a new queue using all the default settings, and submit the job to that queue.
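Candidate default queues are defined with the DEFAULT_QUEUE parameter in lsb.params; a minimal sketch, with example queue names:

# lsb.params
# Queues considered when a job is submitted without -q
DEFAULT_QUEUE = normal short

With this setting, a job submitted without -q is matched against the normal and short queues, and LSF chooses the one whose requirements best suit the job.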
Job Scheduling and Dispatch Job Scheduling and Dispatch Submitted jobs sit in queues until they are scheduled and dispatched to a host for execution.
How the System Works 300 and less than or equal to one-half the value of MAX_SBD_CONNS defined in lsb.params. LSB_MAX_JOB_DISPATCH_PER_SESSION defines the maximum number of jobs that mbatchd can dispatch during one job scheduling session. You must restart mbatchd and sbatchd when you change the value of this parameter for the change to take effect. Dispatch order Jobs are not necessarily dispatched in order of submission.
Host Selection Host Selection Each time LSF attempts to dispatch a job, it checks to see which hosts are eligible to run the job. A number of conditions determine whether a host is eligible: ◆ Host dispatch windows ◆ Resource requirements of the job ◆ Resource requirements of the queue ◆ Host list of the queue ◆ Host load levels ◆ Job slot limits of the host. A host is only eligible to run a job if all the conditions are met.
Job Execution Environment
When LSF runs your jobs, it tries to make the process as transparent to the user as possible. By default, the execution environment is maintained to be as close to the submission environment as possible. LSF copies the environment from the submission host to the execution host.
Fault Tolerance Fault Tolerance LSF is designed to continue operating even if some of the hosts in the cluster are unavailable. One host in the cluster acts as the master, but if the master host becomes unavailable another host takes over. LSF is available as long as there is one available host in the cluster. LSF can tolerate the failure of any host or group of hosts in the cluster. When a host crashes, all jobs running on that host are lost. No other pending or running jobs are affected.
Host failure
If an LSF server host fails, jobs running on that host are lost. No other jobs are affected. Jobs can be submitted as rerunnable, so that they automatically run again from the beginning, or as checkpointable, so that they start again from a checkpoint on another host if they are lost because of a host failure. If all of the hosts in a cluster go down, all running jobs are lost. When a host comes back up and takes over as master, it reads the lsb.events file to recover the state of all batch jobs.
C H A P T E R 3 Working with Your Cluster Contents ◆ Viewing cluster information on page 42 ◆ Example directory structures on page 47 ◆ Cluster administrators on page 49 ◆ Controlling daemons on page 50 ◆ Controlling mbatchd on page 52 ◆ Reconfiguring your cluster on page 55
Viewing cluster information Viewing cluster information LSF provides commands for users to access information about the cluster. Cluster information includes the cluster master host, cluster name, cluster resource definitions, cluster administrator, and so on. To view the ... Run ...
Working with Your Cluster 2 Run bparams -l to display the information in long format, which gives a brief description of each parameter and the name of the parameter as it appears in lsb.params.
Viewing cluster information Viewing daemon parameter configuration 1 Display all configuration settings for running LSF daemons. ❖ Use lsadmin showconf to display all configured parameters and their values in lsf.conf or ego.conf for LIM. ❖ Use badmin showconf to display all configured parameters and their values in lsf.conf or ego.conf for mbatchd and sbatchd. In a MultiCluster environment, lsadmin showconf and badmin showconf only display the parameters of daemons on the local cluster.
Working with Your Cluster LSF_CONFDIR=/scratch/dev/lsf/user1/0604/conf LSF_LOG_MASK=LOG_WARNING LSF_ENVDIR=/scratch/dev/lsf/user1/0604/conf LSF_EGO_DAEMON_CONTROL=N … ◆ Show sbatchd configuration on a specific host: badmin showconf sbd hosta SBD configuration for host at Fri Jun 8 10:27:52 CST 2007 LSB_SHAREDIR=/scratch/dev/lsf/user1/0604/work LSF_CONFDIR=/scratch/dev/lsf/user1/0604/conf LSF_LOG_MASK=LOG_WARNING LSF_ENVDIR=/scratch/dev/lsf/user1/0604/conf LSF_EGO_DAEMON_CONTROL=N … ◆ Show sbatchd
Viewing cluster information LSF_CONFDIR=/scratch/dev/lsf/user1/0604/conf LSF_LOG_MASK=LOG_WARNING LSF_ENVDIR=/scratch/dev/lsf/user1/0604/conf LSF_EGO_DAEMON_CONTROL=N … ◆ Show lim configuration for all hosts: lsadmin showconf lim all LIM configuration for host at Fri Jun 8 10:27:52 CST 2007 LSB_SHAREDIR=/scratch/dev/lsf/user1/0604/work LSF_CONFDIR=/scratch/dev/lsf/user1/0604/conf LSF_LOG_MASK=LOG_WARNING LSF_ENVDIR=/scratch/dev/lsf/user1/0604/conf LSF_EGO_DAEMON_CONTROL=N … LIM configuration for ho
Working with Your Cluster Example directory structures UNIX and Linux The following figures show typical directory structures for a new UNIX or Linux installation with lsfinstall. Depending on which products you have installed and platforms you have selected, your directory structure may vary.
Example directory structures Microsoft Windows The following diagram shows an example directory structure for a Windows installation.
Working with Your Cluster Cluster administrators Primary cluster administrator Required. The first cluster administrator, specified during installation. The primary LSF administrator account owns the configuration and log files. The primary LSF administrator has permission to perform clusterwide operations, change configuration files, reconfigure the cluster, and control jobs submitted by all users. Other cluster administrators Optional. May be configured during or after installation.
Controlling daemons Controlling daemons Permissions required To control all daemons in the cluster, you must ◆ Be logged on as root or as a user listed in the /etc/lsf.sudoers file. See the Platform LSF Configuration Reference for configuration details of lsf.sudoers. ◆ Be able to run the rsh or ssh commands across all LSF hosts without having to enter a password. See your operating system documentation for information about configuring the rsh and ssh commands.
Working with Your Cluster sbatchd Restarting sbatchd on a host does not affect jobs that are running on that host. If sbatchd is shut down, the host is not available to run new jobs. Existing jobs running on that host continue, but the results are not sent to the user until sbatchd is restarted. LIM and RES Jobs running on the host are not affected by restarting the daemons. If a daemon is not responding to network connections, lsadmin displays an error message with the host name.
Controlling mbatchd Controlling mbatchd You use the badmin command to control mbatchd. Reconfigure mbatchd If you add a host to a host group, a host to a queue, or change resource configuration in the Hosts section of lsf.cluster.cluster_name, the change is not recognized by jobs that were submitted before you reconfigured. If you want the new host to be recognized, you must restart mbatchd. 1 Run badmin reconfig. When you reconfigure the cluster, mbatchd is not restarted.
Working with Your Cluster Shut down mbatchd 1 Run badmin hshutdown to shut down sbatchd on the master host. For example: badmin hshutdown hostD Shut down slave batch daemon on .... done 2 Run badmin mbdrestart: badmin mbdrestart Checking configuration files ... No errors found. This causes mbatchd and mbschd to exit. mbatchd cannot be restarted, because sbatchd is shut down. All LSF services are temporarily unavailable, but existing jobs are not affected.
Customize batch command messages Customize batch command messages LSF displays error messages when a batch command cannot communicate with mbatchd. Users see these messages when the batch command retries the connection to mbatchd. You can customize three of these messages to provide LSF users with more detailed information and instructions. 1 In the file lsf.conf, identify the parameter for the message that you want to customize.
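For example, assuming the LSB_MBD_DOWN_MSG parameter (one of the customizable message parameters described in the Platform LSF Configuration Reference) is the message you want to change, an lsf.conf entry could look like the following sketch; the message text is entirely up to you:

# lsf.conf
LSB_MBD_DOWN_MSG="LSF is down for scheduled maintenance until 14:00. Contact the LSF administrator for details."

Because the message contains spaces, it must be enclosed in quotation marks.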
Working with Your Cluster Reconfiguring your cluster After changing LSF configuration files, you must tell LSF to reread the files to update the configuration. Use the following commands to reconfigure a cluster: ◆ lsadmin reconfig ◆ badmin reconfig ◆ badmin mbdrestart The reconfiguration commands you use depend on which files you change in LSF. The following table is a quick reference. After making changes to ... Use ... Which ... hosts badmin reconfig license.
Reconfiguring your cluster If no errors are found, you are prompted to either restart lim on master host candidates only, or to confirm that you want to restart lim on all hosts. If fatal errors are found, reconfiguration is aborted. 3 Run badmin reconfig to reconfigure mbatchd: badmin reconfig The badmin reconfig command checks for configuration errors. If fatal errors are found, reconfiguration is aborted.
C H A P T E R 4 Working with Hosts Contents ◆ Host status on page 58 ◆ How LIM Determines Host Models and Types on page 60 ◆ Viewing Host Information on page 62 ◆ Controlling Hosts on page 68 ◆ Adding a Host on page 71 ◆ Remove a Host on page 73 ◆ Adding Hosts Dynamically on page 75 ◆ Add Host Types and Host Models to lsf.shared on page 83
Host status Host status Host status describes the ability of a host to accept and run batch jobs in terms of daemon states, load levels, and administrative controls. The bhosts and lsload commands display host status. bhosts Displays the current status of the host: STATUS Description ok Host is available to accept and run new batch jobs. Host is down, or LIM and sbatchd are unreachable. LIM is running but sbatchd is unreachable. Host will not accept new jobs. Use bhosts -l to display the reasons.
Working with Hosts loadSched loadStop r15s - r1m - r15m - ut - pg - cpuspeed bandwidth loadSched - - loadStop - - io - ls - it - tmp - swp - mem - lsload Displays the current state of the host: Status Description ok Host is available to accept and run batch jobs and remote tasks. LIM is running but RES is unreachable. Does not affect batch jobs, only used for remote task placement (i.e., lsrun). The value of a load index exceeded a threshold (configured in lsf.cluster.
How LIM Determines Host Models and Types How LIM Determines Host Models and Types The LIM (load information manager) daemon/service automatically collects information about hosts in an LSF cluster, and accurately determines running host models and types. At most, 1024 model types can be manually defined in lsf.shared. If lsf.
Working with Hosts Architecture name of running host What the lim reports Additional information about the lim process Similar to what is defined in Reports fuzzy match based on detection of 1or 2 fields in the input architecture string ◆ For input architecture strings with only one field, if LIM cannot detect an exact match for the input string, then it reports the best match. A best match is a model field with the most characters shared by the input string.
Viewing Host Information Viewing Host Information LSF uses some or all of the hosts in a cluster as execution hosts. The host list is configured by the LSF administrator. Use the bhosts command to view host information. Use the lsload command to view host load information. To view... Run...
Working with Hosts hostB ok 1 2 2 1 0 1 0 hostC ok - 3 0 0 0 0 0 hostE ok 2 4 2 1 0 0 1 hostF ok - 2 2 1 0 1 0 View detailed server host information Run bhosts -l host_name and lshosts -l host_name to display all information about each server host such as the CPU factor and the load thresholds to start, suspend, and resume jobs: 1 bhosts -l hostB HOST hostB STATUS CPUF ok 20.
Viewing Host Information View host load by host The lsload command reports the current status and load levels of hosts in a cluster. The lshosts -l command shows the load thresholds. The lsmon command provides a dynamic display of the load information. The LSF administrator can find unavailable or overloaded hosts with these tools. 1 Run lsload to see load levels for each host: lsload HOST_NAME status r15s r1m r15m ut pg ls it tmp swp hostD ok 1.3 1.2 0.9 92% 0.0 2 20 5M hostB -ok 0.1 0.
Working with Hosts a DEFAULT host type can be migrated to another DEFAULT host type. automatic detection of host type or model has failed, and the host type configured in lsf.shared cannot be found. View host history 1 Run badmin hhist to view the history of a host such as when it is opened or closed: badmin hhist hostB Wed Nov 20 14:41:58: Host closed by administrator . Wed Nov 20 15:23:39: Host opened by administrator .
Viewing Host Information Threads per Core : 2 License Needed : Class(B),Multi-cores Matched Type : NTX64 Matched Architecture : EM64T_3000 Matched Model : Intel_EM64T CPU Factor : 60.0 View job exit rate and load for hosts Run bhosts to display the exception threshold for job exit rate and the current load value for hosts.: 1 In the following example, EXIT_RATE for hostA is configured as 4 jobs per minute.
Working with Hosts 2 Use bhosts -x to see hosts whose job exit rate has exceeded the threshold for longer than JOB_EXIT_RATE_DURATION, and are still high. By default, these hosts are closed the next time LSF checks host exceptions and invokes eadmin. If no hosts exceed the job exit rate, bhosts -x displays: There is no exceptional host found View dynamic host information 1 Use lshosts to display information on dynamically added hosts. An LSF cluster may consist of static and dynamic hosts.
Controlling Hosts Controlling Hosts Hosts are opened and closed by an LSF Administrator or root issuing a command or through configured dispatch windows. Close a host 1 Run badmin hclose: badmin hclose hostB Close ...... done If the command fails, it may be because the host is unreachable through network problems, or because the daemons on the host are not running. Open a host 1 Run badmin hopen: badmin hopen hostB Open ......
Working with Hosts Log a comment when closing or opening a host Use the -C option of badmin hclose and badmin hopen to log an administrator comment in lsb.events: 1 badmin hclose -C "Weekly backup" hostB The comment text Weekly backup is recorded in lsb.events. If you close or open a host group, each host group member displays with the same comment string. A new event record is recorded for each host open or host close event.
Controlling Hosts THRESHOLD AND LOAD USED FOR EXCEPTIONS: JOB_EXIT_RATE Threshold 2.00 Load 0.
Working with Hosts How events are displayed and recorded in MultiCluster lease model In the MultiCluster resource lease model, host control administrator comments are recorded only in the lsb.events file on the local cluster. badmin hist and badmin hhist display only events that are recorded locally. Host control messages are not passed between clusters in the MultiCluster lease model. For example.
Adding a Host 4 Run badmin mbdrestart to restart mbatchd. 5 Run hostsetup to set up the new host and configure the daemons to start automatically at boot from /usr/share/lsf/7.0/install: ./hostsetup --top="/usr/share/lsf" --boot="y" 6 Start LSF on the new host: lsadmin limstartup lsadmin resstartup badmin hstartup 7 Run bhosts and lshosts to verify your changes. ◆ If any host type or host model is UNKNOWN, follow the steps in UNKNOWN host type or model on page 709 to fix the problem.
Working with Hosts 8 Run hostsetup to set up the new host and configure the daemons to start automatically at boot from /usr/share/lsf/7.0/install: ./hostsetup --top="/usr/share/lsf" --boot="y" 9 Start LSF on the new host: lsadmin limstartup lsadmin resstartup badmin hstartup 10 Run bhosts and lshosts to verify your changes. ◆ If any host type or host model is UNKNOWN, follow the steps in UNKNOWN host type or model on page 709 to fix the problem.
Remove a Host 10 If any users of the host use lstcsh as their login shell, change their login shell to tcsh or csh. Remove lstcsh from the /etc/shells file.
Working with Hosts Adding Hosts Dynamically By default, all configuration changes made to LSF are static. To add or remove hosts within the cluster, you must manually change the configuration and restart all master candidates. Dynamic host configuration allows you to add and remove hosts without manual reconfiguration. To enable dynamic host configuration, all of the parameters described in the following table must be defined. Parameter Defined in … Description LSF_MASTER_LIST lsf.
Adding Hosts Dynamically Dynamic hosts cannot be master host candidates. By defining the parameter LSF_MASTER_LIST, you ensure that LSF limits the list of master host candidates to specific, static hosts. mbatchd mbatchd gets host information from the master LIM; when it detects the addition or removal of a dynamic host within the cluster, mbatchd automatically reconfigures itself. TIP: After adding a host dynamically, you might have to wait for mbatchd to detect the host and reconfigure.
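As a sketch, two of the lsf.conf settings involved might look like the following; the host names and values are placeholders, and the table above lists the full set of parameters that must be defined:

# lsf.conf on the master host
# Static master host candidates (dynamic hosts cannot appear here)
LSF_MASTER_LIST="hostm hosta hostb"
# Seconds a dynamic host waits before sending a request to the master LIM
LSF_DYNAMIC_HOST_WAIT_TIME=60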
Working with Hosts Adding dynamic hosts Add a dynamic host in a shared file system environment In a shared file system environment, you do not need to install LSF on each dynamic host. The master host will recognize a dynamic host as an LSF host when you start the daemons on the dynamic host. 1 In lsf.conf on the master host, define the parameter LSF_DYNAMIC_HOST_WAIT_TIME, in seconds, and assign a value greater than zero.
Adding Hosts Dynamically ❖ For csh or tcsh: source LSF_TOP/conf/cshrc.lsf ❖ For sh, ksh, or bash: . LSF_TOP/conf/profile.lsf 6 Do you want LSF to start automatically when the host reboots? ❖ If no, go to step 7. ❖ If yes, run the hostsetup command. For example: cd /usr/share/lsf/7.0/install ./hostsetup --top="/usr/share/lsf" --boot="y" For complete hostsetup usage, enter hostsetup -h.
Working with Hosts Add local resources on a dynamic host to the cluster Prerequisites: Ensure that the resource name and type are defined in lsf.shared, and that the ResourceMap section of lsf.cluster.cluster_name contains at least one resource mapped to at least one static host. LSF can add local resources as long as the ResourceMap section is defined; you do not need to map the local resources. 1 In the slave.config file, define the parameter LSF_LOCAL_RESOURCES.
Adding Hosts Dynamically Configure dynamic host parameters 1 In lsf.conf on the master host, define the parameter LSF_DYNAMIC_HOST_WAIT_TIME, in seconds, and assign a value greater than zero. LSF_DYNAMIC_HOST_WAIT_TIME specifies the length of time a dynamic host waits before sending a request to the master LIM to add the host to the cluster. For example: LSF_DYNAMIC_HOST_WAIT_TIME=60 2 In lsf.conf on the master host, define the parameter LSF_DYNAMIC_HOST_TIMEOUT.
Working with Hosts 3 Do you want LSF to start automatically when the host reboots? ❖ If no, go to step 4. ❖ If yes, run the hostsetup command. For example: cd /usr/share/lsf/7.0/install ./hostsetup --top="/usr/share/lsf" --boot="y" For complete hostsetup usage, enter hostsetup -h. 4 Is this the first time the host is joining the cluster? ❖ If no, use the following commands to start LSF: lsadmin limstartup lsadmin resstartup badmin hstartup ❖ If yes, you must start the daemons from the local host.
Adding Hosts Dynamically Remove a host by editing the hostcache file Dynamic hosts remain in the cluster unless you intentionally remove them. Only the cluster administrator can modify the hostcache file. 1 Shut down the cluster. lsfshutdown This shuts down LSF on all hosts in the cluster and prevents LIMs from trying to write to the hostcache file while you edit it. 2 In the file $EGO_WORKDIR/lim/hostcache, delete the line for the dynamic host that you want to remove.
Add Host Types and Host Models to lsf.shared
The lsf.shared file contains a list of host type and host model names for most operating systems. You can add to this list or customize the host type and host model names. A host type and host model name can be any alphanumeric string up to 39 characters long.
Add a custom host type or model
1 Log on as the LSF administrator on any host in the cluster.
2 Edit lsf.shared.
Registering Service Ports 4 Run lsadmin reconfig to reconfigure LIM. 5 Run badmin reconfig to reconfigure mbatchd. Registering Service Ports LSF uses dedicated UDP and TCP ports for communication. All hosts in the cluster must use the same port numbers to communicate with each other. The service port numbers can be any numbers ranging from 1024 to 65535 that are not already used by other services.
Working with Hosts 3 Edit the /etc/services file by adding the contents of the LSF_TOP/version/install/instlib/example.services file: # /etc/services entries for LSF daemons # res 3878/tcp # remote execution server lim 3879/udp # load information manager mbatchd 3881/tcp # master lsbatch daemon sbatchd 3882/tcp # slave lsbatch daemon # # Add this if ident is not already defined # in your /etc/services file ident 113/tcp auth tap # identd 4 Run lsadmin reconfig to reconfigure LIM.
Host Naming 7 Use the following command: ypmake services On some hosts the master copy of the services database is stored in a different location. On systems running NIS+ the procedure is similar. Refer to your system documentation for more information. 8 Run lsadmin reconfig to reconfigure LIM. 9 Run badmin reconfig to reconfigure mbatchd. 10 Run lsfstartup to restart all daemons in the cluster. Host Naming LSF needs to match host names with the corresponding Internet host addresses.
Working with Hosts For example: atlasD0[0-3,4,5-6, ...] is equivalent to: atlasD0[0-6, ...] The node list does not need to be a continuous range (some nodes can be configured out). Node indices can be numbers or letters (both upper case and lower case). Example Some systems map internal compute nodes to single LSF host names. A host file might contains 64 lines, each specifying an LSF host name and 32 node names that correspond to each LSF host: ... 177.16.1.
Hosts with Multiple Addresses Hosts with Multiple Addresses Multi-homed hosts Hosts that have more than one network interface usually have one Internet address for each interface. Such hosts are called multi-homed hosts. For example, dual-stack hosts are multi-homed because they have both an IPv4 and an IPv6 network address. LSF identifies hosts by name, so it needs to match each of these addresses with a single host name.
Working with Hosts BB.BB.BB.BB host host-BB # second interface Example /etc/hosts entries No unique official name The following example is for a host with two interfaces, where the host does not have a unique official name. # Address Official name Aliases # Interface on network A AA.AA.AA.AA host-AA.domain host.domain host-AA host # Interface on network B BB.BB.BB.BB host-BB.domain host-BB host Looking up the address AA.AA.AA.AA finds the official name host-AA.domain. Looking up address BB.BB.BB.
Using IPv6 Addresses Sun Solaris example For example, Sun NIS uses the /etc/hosts file on the NIS master host as input, so the format for NIS entries is the same as for the /etc/hosts file. Since LSF can resolve this case, you do not need to create an LSF hosts file. DNS configuration The configuration format is different for DNS. The same result can be produced by configuring two address (A) records for each Internet address. Following the previous example: # name host.domain host.domain host-AA.
Working with Hosts ❖ 2003 ❖ 2000 with Service Pack 1 or higher ◆ AIX 5 ◆ HP-UX ❖ 11i ❖ 11iv1 ❖ 11iv2 ❖ 11.11 ◆ SGI Altix ProPack 3, 4, and 5 ◆ IRIX 6.5.19 and higher, Trusted IRIX 6.5.19 and higher ◆ Mac OS 10.2 and higher ◆ Cray XT3 ◆ IBM Power 5 Series Enable both IPv4 and IPv6 support 1 Configure the parameter LSF_ENABLE_SUPPORT_IPV6=Y in lsf.conf.
6 Test IPv6 communication between hosts using the command ping6.
Working with Hosts Host Groups You can define a host group within LSF or use an external executable to retrieve host group members. Use bhosts to view a list of existing hosts. Use bmgroup to view host group membership. Where to use host groups LSF host groups can be used in defining the following parameters in LSF configuration files: ◆ HOSTS in lsb.queues for authorized hosts for the queue ◆ HOSTS in lsb.
Host Groups b Run badmin mbdrestart if you want the new host to be recognized by jobs that were submitted before you reconfigured. Using wildcards and special characters to define host names You can use special characters when defining host group members under the GROUP_MEMBER column to specify hosts. These are useful to define several hosts in a single entry, such as for a range of hosts, or for all host names with a certain text string.
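For illustration, a HostGroup section in lsb.hosts that uses a name range and a wildcard might look like the following sketch; the group and host names are placeholders and assume your host names follow such patterns:

Begin HostGroup
GROUP_NAME     GROUP_MEMBER
groupA         (hostA hostD hostF)
groupB         (linuxrack[1-16] build*)
End HostGroup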
Working with Hosts You cannot define subgroups that contain wildcards and special characters. The following definition for groupB is not correct because groupA defines hosts with a wildcard: Begin HostGroup GROUP_NAME GROUP_MEMBER groupA (hostA*) groupB (groupA) End HostGroup Defining condensed host groups You can define condensed host groups to display information for its hosts as a summary for the entire group.
Tuning CPU Factors bjobs displays hg1 as the execution host instead of hg2: bjobs JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME 520 user1 RUN normal host5 hg1 sleep 1001 Apr 15 13:50 521 user1 RUN normal host5 hg1 sleep 1001 Apr 15 13:50 522 user1 PEND normal host5 sleep 1001 Apr 15 13:51 Importing external host groups (egroup) When the membership of a host group changes frequently, or when the group contains a large number of members, you can use an external exe
Working with Hosts View normalized ratings 1 Run lsload -N to display normalized ratings. LSF uses a normalized CPU performance rating to decide which host has the most available CPU power. Hosts in your cluster are displayed in order from best to worst. Normalized CPU run queue length values are based on an estimate of the time it would take each host to run one additional unit of work, given that an unloaded host with CPU factor 1 runs one unit of work in one unit of time.
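CPU factors themselves are set per host model in the HostModel section of lsf.shared; a sketch follows, in which the model names, factors, and architecture strings are illustrative placeholders and your values should reflect the benchmarked relative speed of each model:

Begin HostModel
MODELNAME   CPUFACTOR   ARCHITECTURE   # Keywords
ModelA      10.0        (architecture_string_A)
ModelB      60.0        (architecture_string_B)
End HostModel

After editing lsf.shared, run lsadmin reconfig and then badmin reconfig so that the new factors take effect.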
Handling Host-level Job Exceptions Handling Host-level Job Exceptions You can configure hosts so that LSF detects exceptional conditions while jobs are running, and take appropriate action automatically. You can customize what exceptions are detected, and the corresponding actions. By default, LSF does not detect any exceptions. Host exceptions LSF can detect If you configure host exception handling, LSF can detect jobs that exit repeatedly on a host.
Example
In the following example, the job exit rate of hostA exceeds the configured threshold (EXIT_RATE for hostA in lsb.hosts). LSF monitors hostA from time t1 to time t2 (t2=t1 + JOB_EXIT_RATE_DURATION in lsb.params). At t2, the exit rate is still high, and a host exception is detected. At t3 (EADMIN_TRIGGER_DURATION in lsb.params), LSF invokes eadmin and the host exception is handled. By default, LSF closes hostA and sends email to the LSF administrator.
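A sketch of the configuration behind this example follows; the host name and values are illustrative, the threshold is set per host in lsb.hosts, and the durations in lsb.params are in minutes:

# lsb.hosts -- at most 4 exited jobs per minute on hostA
Begin Host
HOST_NAME   MXJ   EXIT_RATE   # Keywords
hostA       !     4
default     !     ()
End Host

# lsb.params -- how long the rate must stay high, and how often eadmin is invoked
JOB_EXIT_RATE_DURATION = 5
EADMIN_TRIGGER_DURATION = 1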
C H A P T E R 5 Working with Queues Contents ◆ Queue States on page 101 ◆ Viewing Queue Information on page 102 ◆ Control Queues on page 104 ◆ Add and Remove Queues on page 107 ◆ Manage Queues on page 109 ◆ Handling Job Exceptions in Queues on page 110 Queue States Queue states, displayed by bqueues, describe the ability of a queue to accept and start batch jobs using a combination of the following states: ◆ Open: queues accept new jobs ◆ Closed: queues do not accept new jobs ◆ Acti
Viewing Queue Information Viewing Queue Information The bqueues command displays information about queues. The bqueues -l option also gives current statistics about the jobs in a particular queue, such as the total number of jobs in the queue, the number of jobs running, suspended, and so on. To view the... Run...
Working with Queues 20 min of IBM350 FILELIMIT 20000 K 342800 min of IBM350 DATALIMIT 20000 K STACKLIMIT 2048 K SCHEDULING PARAMETERS r15s r1m r15m loadSched 0.7 1.0 loadStop 1.5 2.5 ut 0.2 - CORELIMIT 20000 K pg 4.0 8.
Control Queues View queue administrators Run bqueues -l for the queue. 1 View exception status for queues (bqueues) Use bqueues to display the configured threshold for job exceptions and the current number of jobs in the queue in each exception state. 1 For example, queue normal configures JOB_IDLE threshold of 0.10, JOB_OVERRUN threshold of 5 minutes, and JOB_UNDERRUN threshold of 2 minutes.
Working with Queues Close a queue 1 Run badmin qclose: badmin qclose normal Queue is closed When a user tries to submit a job to a closed queue the following message is displayed: bsub -q normal ...
Control Queues 2 Use badmin hist or badmin qhist to display administrator comments for closing and opening hosts. badmin qhist Fri Apr 4 10:50:36: Queue closed by administrator change configuration. bqueues -l also displays the comment text: bqueues -l normal QUEUE: normal -- For normal low priority jobs, running only if hosts are lightly loaded. is is the default queue.
Working with Queues To configure a dispatch window: 1 Edit lsb.queues 2 Create a DISPATCH_WINDOW keyword for the queue and specify one or more time windows. Begin Queue QUEUE_NAME = queue1 PRIORITY = 45 DISPATCH_WINDOW = 4:30-12:00 End Queue 3 4 Reconfigure the cluster: a Run lsadmin reconfig. b Run badmin reconfig. Run bqueues -l to display the dispatch windows. Configure Run Windows A run window specifies one or more time periods during which jobs dispatched from a queue are allowed to run.
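Run windows are configured the same way, using the RUN_WINDOW parameter of the queue in lsb.queues; a sketch with an example time period follows:

Begin Queue
QUEUE_NAME = night
PRIORITY   = 50
RUN_WINDOW = 20:00-8:00
End Queue

Jobs dispatched from this queue are allowed to run only between 8 p.m. and 8 a.m.; outside the window LSF suspends them and resumes them when the window reopens. Run badmin reconfig after editing lsb.queues, then bqueues -l to display the run window.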
Add and Remove Queues You can copy another queue definition from this file as a starting point; remember to change the QUEUE_NAME of the copied queue. 3 Save the changes to lsb.queues. 4 Run badmin reconfig to reconfigure mbatchd. Adding a queue does not affect pending or running jobs. Remove a queue IMPORTANT: Before removing a queue, make sure there are no jobs in that queue. If there are jobs in the queue, move pending and running jobs to another queue, then remove the queue.
Working with Queues Manage Queues Restrict host use by queues You may want a host to be used only to run jobs submitted to specific queues. For example, if you just added a host for a specific department such as engineering, you may only want jobs submitted to the queues engineering1 and engineering2 to be able to run on the host. 1 Log on as root or the LSF administrator on any host in the cluster. 2 Edit lsb.queues, and add the host to the HOSTS parameter of specific queues.
Handling Job Exceptions in Queues Handling Job Exceptions in Queues You can configure queues so that LSF detects exceptional conditions while jobs are running, and take appropriate action automatically. You can customize what exceptions are detected, and the corresponding actions. By default, LSF does not detect any exceptions.
Configuring thresholds for job exception handling
By default, LSF checks for job exceptions every 1 minute. Use EADMIN_TRIGGER_DURATION in lsb.params to change how frequently LSF checks for overrun, underrun, and idle jobs.
TIP: Tune EADMIN_TRIGGER_DURATION carefully. Shorter values may raise false alarms; longer values may not trigger exceptions frequently enough.
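The thresholds themselves are set per queue in lsb.queues; the following sketch matches the values shown earlier for the normal queue, and only the exception-related lines are shown:

Begin Queue
QUEUE_NAME   = normal
# Jobs running longer than 5 minutes raise an overrun exception
JOB_OVERRUN  = 5
# Jobs finishing in less than 2 minutes raise an underrun exception
JOB_UNDERRUN = 2
# Jobs whose cputime/runtime factor drops below 0.10 raise an idle exception
JOB_IDLE     = 0.10
End Queue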
C H A P T E R 6 Managing Jobs Contents ◆ Understanding Job States on page 114 ◆ View Job Information on page 117 ◆ Changing Job Order Within Queues on page 120 ◆ Switch Jobs from One Queue to Another on page 121 ◆ Forcing Job Execution on page 122 ◆ Suspending and Resuming Jobs on page 123 ◆ Killing Jobs on page 124 ◆ Sending a Signal to a Job on page 126 ◆ Using Job Groups on page 127 ◆ Handling Job Exceptions on page 138
Understanding Job States Understanding Job States The bjobs command displays the current state of the job.
◆ Run windows during which jobs from the queue can run
◆ Limits on the number of job slots configured for a queue, a host, or a user
◆ Relative priority to other users and jobs
◆ Availability of the specified resources
◆ Job dependency and pre-execution conditions
Maximum pending job threshold
If the user or user group submitting the job has reached the pending job threshold as specified by MAX_PEND_JOBS (either in the User section of lsb.users, or cluster-wide in lsb.params), LSF rejects any subsequent job submission from that user or user group.
Understanding Job States Exited jobs An exited job ended with a non-zero exit status. A job might terminate abnormally for various reasons. Job termination can happen from any state. An abnormally terminated job goes into EXIT state. The situations where a job terminates abnormally include: ◆ The job is cancelled by its owner or the LSF administrator while pending, or after being dispatched to a host.
Managing Jobs View Job Information The bjobs command is used to display job information. By default, bjobs displays information for the user who invoked the command. For more information about bjobs, see the LSF Reference and the bjobs(1) man page. View all jobs for all users 1 Run bjobs -u all to display all jobs for all users.
View running jobs
1 Run bjobs -r to display running jobs.
View done jobs
1 Run bjobs -d to display recently completed jobs.
View pending job information
1 Run bjobs -p to display the reason why a job is pending.
2 Run busers -w all to see the maximum pending job threshold for all users.
View suspension reasons
1 Run bjobs -s to display the reason why a job was suspended.
View chunk job wait status and wait reason
1 Run bhist -l to display jobs in WAIT status.
Managing Jobs Wed Aug 13 14:23:35: Submitted from host , CWD <$HOME>, Output File , Specified Hosts ; Wed Aug 13 14:23:43: Started on , Execution Home , Execution CWD ; Resource usage collected. IDLE_FACTOR(cputime/runtime): MEM: 3 Mbytes; PGID: 5027; SWAP: 4 Mbytes; 0.
Changing Job Order Within Queues Changing Job Order Within Queues By default, LSF dispatches jobs in a queue in the order of arrival (that is, first-come-first-served), subject to availability of suitable server hosts. Use the btop and bbot commands to change the position of pending jobs, or of pending job array elements, to affect the order in which jobs are considered for dispatch.
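For example, assuming a pending job with ID 5311 (the job ID is a placeholder):

btop 5311

This moves job 5311 to the top of the submitting user's pending jobs in its queue. The matching bbot 5311 moves it back to the bottom.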
Managing Jobs Switch Jobs from One Queue to Another You can use the command bswitch to change jobs from one queue to another. This is useful if you submit a job to the wrong queue, or if the job is suspended because of queue thresholds or run windows and you would like to resume the job. Switch a single job to a different queue 1 Run bswitch to move pending and running jobs from queue to queue.
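For example, assuming job 5309 and queues named normal and priority (all placeholders):

bswitch priority 5309

This moves job 5309 to the priority queue. To switch all of your jobs currently in the normal queue at once:

bswitch -q normal priority 0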
Forcing Job Execution Forcing Job Execution A pending job can be forced to run with the brun command. This operation can only be performed by an LSF administrator. You can force a job to run on a particular host, to run until completion, and other restrictions. For more information, see the brun command. When a job is forced to run, any other constraints associated with the job such as resource requirements or dependency conditions are ignored.
Managing Jobs Suspending and Resuming Jobs A job can be suspended by its owner or the LSF administrator. These jobs are considered user-suspended and are displayed by bjobs as USUSP. If a user suspends a high priority job from a non-preemptive queue, the load may become low enough for LSF to start a lower priority job in its place. The load created by the low priority job can prevent the high priority job from resuming. This can be avoided by configuring preemptive queues.
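For example, to suspend and later resume a job you own (the job ID is a placeholder):

bstop 3421

Job 3421 is suspended and bjobs displays it as USUSP.

bresume 3421

Job 3421 becomes eligible to continue running.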
Killing Jobs Killing Jobs The bkill command cancels pending batch jobs and sends signals to running jobs. By default, on UNIX, bkill sends the SIGKILL signal to running jobs. Before SIGKILL is sent, SIGINT and SIGTERM are sent to give the job a chance to catch the signals and clean up. The signals are forwarded from mbatchd to sbatchd. sbatchd waits for the job to exit before reporting the status.
Managing Jobs If the -b option is used with bkill 0, it kills all applicable jobs and silently skips the jobs that cannot be killed. The -b option is ignored if used with -r or -s. Force removal of a job from LSF 1 Run bkill -r to force the removal of the job from LSF. Use this option when a job cannot be killed in the operating system. The bkill -r command removes a job from the LSF system without waiting for the job to terminate in the operating system.
Sending a Signal to a Job Sending a Signal to a Job LSF uses signals to control jobs, to enforce scheduling policies, or in response to user requests. The principal signals LSF uses are SIGSTOP to suspend a job, SIGCONT to resume a job, and SIGKILL to terminate a job. Occasionally, you may want to override the default actions. For example, instead of suspending a job, you might want to kill or checkpoint it.
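For example, to send a specific signal with bkill -s (the job ID is a placeholder):

bkill -s TSTP 3421

This sends SIGTSTP to job 3421 to suspend it rather than kill it. Similarly, bkill -s KILL 3421 sends SIGKILL directly, without the SIGINT and SIGTERM warning signals that a plain bkill sends first.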
Managing Jobs Using Job Groups A collection of jobs can be organized into job groups for easy management. A job group is a container for jobs in much the same way that a directory in a file system is a container for files. For example, a payroll application may have one group of jobs that calculates weekly payments, another job group for calculating monthly salaries, and a third job group that handles the salaries of part-time or contract employees.
Using Job Groups Root job group LSF maintains a single tree under which all jobs in the system are organized. The top-most level of the tree is represented by a top-level “root” job group, named “/”. The root group is owned by the primary LSF Administrator and cannot be removed. Users and administrators create new groups under the root group. By default, if you do not specify a job group path name when submitting a job, the job is created under the top-level “root” job group, named “/”.
Managing Jobs For example, a default job group name specified by DEFAULT_JOBGROUP=/canada/%p/%u is expanded to the value for the LSF project name and the user name of the job submission user (for example, /canada/projects/user1). Job group names must follow this format: ◆ Job group names must start with a slash character (/). For example, DEFAULT_JOBGROUP=/A/B/C is correct, but DEFAULT_JOBGROUP=A/B/C is not correct. ◆ Job group names cannot end with a slash character (/).
Using Job Groups Job group limits are not supported at job submission for job groups created automatically with bsub -g. Use bgadd -L before job submission. Jobs forwarded to the execution cluster in a MultiCluster environment are not counted towards the job group limit. Examples bgadd -L 6 /canada/projects/test If /canada is existing job group, and /canada/projects and /canada/projects/test are new groups, only the job group /canada/projects/test is limited to 6 running and suspended jobs.
Managing Jobs ◆ When a job is submitted to a job group, LSF checks the limits for the entire job group. For example, for a job is submitted to job group /canada/qa/auto, LSF checks the limits on groups /canada/qa/auto, /canada/qa and /canada. If any one limit in the branch of the hierarchy is exceeded, the job remains pending ◆ The zero (0) job limit for job group /canada/qa/manual means no job in the job group can enter running status 1 Use the bgadd command to create a new job group.
Using Job Groups Submit jobs under a job group Use the -g option of bsub to submit a job into a job group. 1 The job group does not have to exist before submitting the job. bsub -g /risk_group/portfolio1/current myjob Job <105> is submitted to default queue. Submits myjob to the job group /risk_group/portfolio1/current. If group /risk_group/portfolio1/current exists, job 105 is attached to the job group.
Managing Jobs /X/Y 0 0 0 0 0 0 () 0/5 user2 Specify a job group name to show the hierarchy of a single job group: 3 bjgroup -s /X GROUP_NAME NJOBS PEND RUN SSUSP USUSP FINISH SLA JLIMIT OWNER /X 25 0 25 0 0 0 puccini 25/100 user1 /X/Y 20 0 20 0 0 0 puccini 20/30 user1 /X/Z 5 0 5 0 0 0 puccini 5/10 user2 Specify a job group name with a trailing slash character (/) to show only the root job group: 4 bjgroup -s /X/ GROUP_NAME NJOBS /X PEND RUN SSUSP USU
Using Job Groups ; ... Control jobs in job groups Suspend and resume jobs in job groups, move jobs to different job groups, terminate jobs in job groups, and delete job groups.
Managing Jobs You cannot move job array elements from one job group to another, only entire job arrays. If any job array elements in a job array are running, you cannot move the job array to another group. A job array can only belong to one job group at a time. You cannot modify the job group of a job attached to a service class.
Using Job Groups Delete a job groups manually (bgdel) 1 Use the bgdel command to manually remove a job group. The job group cannot contain any jobs. bgdel /risk_group Job group /risk_group is deleted. deletes the job group /risk_group and all its subgroups. Normal users can only delete the empty groups they own that are specified by the requested job_group_name. These groups can be explicit or implicit. 2 Run bgdel 0 to delete all empty job groups you own. Theses groups can be explicit or implicit.
Managing Jobs Automatic job group cleanup When an implicitly created job group becomes empty, it can be automatically deleted by LSF. Job groups that can be automatically deleted cannot: ◆ Have limits specified including their child groups ◆ Have explicitly created child job groups ◆ Be attached to any SLA Configure JOB_GROUP_CLEAN=Y in lsb.params to enable automatic job group deletion.
Handling Job Exceptions Handling Job Exceptions You can configure hosts and queues so that LSF detects exceptional conditions while jobs are running, and take appropriate action automatically. You can customize what exceptions are detected, and the corresponding actions. By default, LSF does not detect any exceptions. Run bjobs -d -m host_name to see exited jobs for a particular host.
Managing Jobs In some environments, a job running 1 hour would be an overrun job, while this may be a normal job in other environments. If your configuration considers jobs running longer than 1 hour to be overrun jobs, you may want to close the queue when LSF detects a job that has run longer than 1 hour and invokes eadmin. Default eadmin actions For host-level exceptions, LSF closes the host and sends email to the LSF administrator.
Job exits excluded from exit rate calculation
Exit rate type    Includes
JOBINIT           Local job initialization failures; parallel job initialization failures on the first execution host
HPCINIT           Job initialization failures for Platform LSF HPC jobs
By default, jobs that are exited for non-host related reasons (user actions and LSF policies) are not counted in the exit rate calculation.
Parallel jobs
By default, or when EXIT_RATE_TYPE=JOBEXIT_NONLSF, job initialization failures on the first execution host do not count in the job exit rate calculation. Job initialization failures on hosts other than the first execution host are counted in the exit rate calculation. When EXIT_RATE_TYPE=JOBINIT, job initialization failures that happen on the first execution host are counted in the job exit rate calculation.
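The exit rate types are selected with EXIT_RATE_TYPE in lsb.params; a sketch that also counts initialization failures on the first execution host follows, with the value list as an example:

# lsb.params
EXIT_RATE_TYPE = JOBEXIT_NONLSF JOBINIT

With this setting, jobs that exit for reasons unrelated to user actions or LSF policies, plus job initialization failures on the first execution host, are counted toward each host's exit rate.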
C H A P T E R 7 Managing Users and User Groups Contents ◆ Viewing User and User Group Information on page 143 ◆ About User Groups on page 145 ◆ Existing User Groups as LSF User Groups on page 145 ◆ LSF User Groups on page 146 Viewing User and User Group Information You can display information about LSF users and user groups using the busers and bugroup commands. The busers command displays information about users and user groups.
Viewing User and User Group Information View user information 1 Run busers all. busers all USER/GROUP default user9 groupA JL/P 12 1 - MAX 12 100 NJOBS 34 20 PEND 22 7 RUN 10 11 SSUSP 2 1 USUSP 0 1 RSV 0 0 View user pending job threshold information 1 busers -w USER/GROUP default user9 groupA JL/P 12 1 - Run busers -w, which displays the pending job threshold column at the end of the busers all output.
Managing Users and User Groups
USERS:  user4 user10 user11 engineers/
SHARES: [engineers, 40] [user4, 15] [user10, 34] [user11, 16]
About User Groups
User groups act as aliases for lists of users. The administrator can also limit the total number of running jobs belonging to a user or a group of users.
LSF User Groups
Where to use existing user groups
Existing user groups can be used in defining the following parameters in LSF configuration files:
◆ USERS in lsb.queues for authorized queue users
◆ USER_NAME in lsb.users for user job slot limits
◆ USER_SHARES (optional) in lsb.hosts for host partitions or in lsb.queues or lsb.users for queue fairshare policies
LSF User Groups
You can define an LSF user group within LSF or use an external executable to retrieve user group members.
Managing Users and User Groups 7 Save your changes. 8 Run badmin ckconfig to check the new user group definition. If any errors are reported, fix the problem and check the configuration again. 9 Run badmin reconfig to reconfigure the cluster.
C H A P T E R 8 Platform LSF Licensing Contents ◆ The LSF License File on page 150 ◆ How LSF Permanent Licensing Works on page 154 ◆ Installing a Demo License on page 156 ◆ Installing a Permanent License on page 158 ◆ Updating a License on page 164 ◆ FLEXlm Basics on page 166 ◆ Multiple FLEXlm License Server Hosts on page 169 ◆ Partial Licensing on page 171 ◆ Floating Client Licenses on page 175 ◆ Troubleshooting License Issues on page 181 Administering Platform LSF 149
The LSF License File
You must have a valid license to run LSF. This section helps you to understand the types of LSF licenses and the contents of the LSF license file. It does not contain information required to install your license. TIP: To learn how to license a cluster that includes Windows hosts, see Using Platform LSF on Windows.
Platform LSF Licensing LICENSE CLASS NEEDED: Class(B), Multi-cores ... Enforcement of multicore processor licenses on Linux and Windows Multicore hosts running Linux or Windows must be licensed by the lsf_dualcore_x86 license feature. Each physical processor requires one standard LSF license and num_cores-1 lsf_dualcore_x86 licenses. For example, a processor with 4 cores requires 3 lsf_dualcore_x86 licenses. Use lshosts -l to see the number of multicore licenses enabled and needed.
The LSF License File In the LSF license file: FEATURE lsf_manager lsf_ld 6.200 8-may-2008 2 ADE2C12C1A81E5E8F29C \ VENDOR_STRING=Platform NOTICE=Class(S) FEATURE lsf_manager lsf_ld 6.200 8-may-2008 10 1DC2C1CCEF193E42B6DC \ VENDOR_STRING=Platform NOTICE=Class(E) Determining what licenses a host needs Use lim -t and lshosts -l to see the license requirements for a host.
Platform LSF Licensing Example demo license file The following is an example of a demo license file. This file licenses LSF 7, advance reservation, and Platform LSF Make. The license is valid until October 24, 2008. Format of the permanent license file A permanent license file has the same format as other products licensed with FLEXlm. If you are already familiar with FLEXlm license files, you can skip this section.
How LSF Permanent Licensing Works Example permanent license file The following is an example of a permanent license file. The license server daemon is configured to run on hosta, using TCP port 1700. It allows 10 single-processor hosts to run Platform LSF 7 and Platform LSF Make, with no expiry date. How LSF Permanent Licensing Works This section is intended to give you a better understanding of how LSF licensing works in a production environment with a permanent license.
Platform LSF Licensing LSF license checkout Only the master LIM can check out licenses. No other part of LSF has any contact with the FLEXlm license server daemon. Once LIM on the master host identifies itself as the master, it reads the LSF_CONFDIR/lsf.cluster.cluster_name file to get the host information to calculate the total number of licenses needed. Most LSF software is licensed per CPU, not per host or per cluster, so multi-processor hosts require multiple LSF licenses.
Installing a Demo License Installing a Demo License This section includes instructions for licensing LSF with a new demo license. Most new users should follow the procedure under Install and license LSF for the first time on page 156. If you already have LSF installed, see Install a demo license manually on page 156.
Platform LSF Licensing
lsadmin resstartup all
badmin hstartup all
b On any LSF host, run the script: LSF_BINDIR/lsfstartup
Get a demo license
Contact Platform Computing or your Platform LSF vendor to get a demo license.
Location of the LSF license file for a demo license
For a demo license, each LSF host must be able to read the license file. The installation program lsfinstall puts the LSF license file in a shared directory where it is available to all LSF hosts.
Installing a Permanent License Installing a Permanent License This section includes instructions for licensing LSF with a new permanent license. If you have not yet installed LSF, you can use a demo license to get started. See Installing a Demo License on page 156. If you already have LSF, see Install a permanent license for the first time on page 158.
Platform LSF Licensing See Start the license daemons on page 166. 8 To allow the new permanent license to take effect, reconfigure the cluster: lsadmin reconfig badmin mbdrestart 9 After the cluster starts, use the following commands to make sure LSF is up and running: lsid bhosts Getting a permanent license To install Platform LSF for production use, you must get a permanent license from Platform or your LSF vendor.
Installing a Permanent License For example, ◆ If you receive your license from Platform as text, you must create a new file and copy the text into the file. ◆ You might have to modify lines in the license, such as the path in the DAEMON line when you install a new permanent license. ◆ You might want to check that the license includes the correct features before you install it.
Platform LSF Licensing ◆ For a permanent license, the name of the license server host and TCP port number used by the lmgrd daemon, in the format port@host_name. For example: LSF_LICENSE_FILE="1700@hostD" ◆ For a license with redundant servers, use a comma to separate the port@host_names. The port number must be the same as that specified in the SERVER line of the license file.
Installing a Permanent License ◆ LSF_Client ◆ LSF_Float_Client LSF client hosts are licensed per host, not per CPU, so there is no difference between licensing a single-processor host and a multi-processor host. See Floating Client Licenses on page 175 for information about configuring LSF floating clients. FLEXlm license server host A permanent LSF license is tied to the host ID of a particular license server host and cannot be used on another host.
Platform LSF Licensing If your FLEXlm license server host is a different host type, you do not need the complete LSF distribution. You can download just the FLEXlm software from Platform’s FTP site, and copy it to any convenient location.
Updating a License Updating a License This section is intended for those who are updating an existing LSF license file. To switch your demo license to a permanent license, see Installing a Permanent License on page 158. To update a license: 1 Contact Platform to get the license. See Requesting a new license on page 164.
Platform LSF Licensing 3 If you want LSF 4.x and LSF 5.x clusters to share a license file, make sure your license includes the FEATURE line for lsf_batch version 4.x. 4 Reconfigure LSF using either of the following LSF commands: ❖ lsadmin reconfig ❖ lsadmin restart on the master LIM The license file is re-read and the changes accepted by LSF. At this point, the LSF license has been updated.
FLEXlm Basics FLEXlm Basics This section is for users installing a permanent license, as FLEXlm is not used with demo licenses. Users who already know how to use FLEXlm will not need to read this section. FLEXlm is used by many UNIX software packages because it provides a simple and flexible method for controlling access to licensed software. A single FLEXlm license server daemon can handle licenses for many software packages, even if those packages come from different vendors.
Platform LSF Licensing The lmstat command is in LSF_SERVERDIR. For example: /usr/share/lsf/lsf_62/7.0/sparc-sol2/etc/lmstat Run lmstat -a -c LSF_LICENSE_FILE from the FLEXlm license server and also from the LSF master host. You must use the -c option of lmstat to specify the path to the LSF license file.
FLEXlm Basics License management utilities FLEXlm provides several utility programs for managing software licenses. These utilities and their man pages are included in the Platform LSF software distribution. Because these utilities can be used to shut down the FLEXlm license server daemon, and can prevent licensed software from running, they are installed in the LSF_SERVERDIR directory. For security reasons, this directory should only be accessible to LSF administrators.
Platform LSF Licensing Multiple FLEXlm License Server Hosts This section applies to permanent licenses only. Read this section if you are interested in the various ways you can distribute your licenses. This is valuable if you are interested in having some form of backup in case of failure. Compare with Selecting a license server host on page 162 to make an educated decision.
Multiple FLEXlm License Server Hosts 3 See Start the license daemons on page 166. Start lmgrd on all license server hosts, not just one. 4 To allow the new permanent licenses to take effect, reconfigure the cluster with the commands: lsadmin reconfig badmin mbdrestart Redundant license server hosts Configuring multiple license server hosts is optional. It provides a way to keep LSF running if a license server host goes down. There are two ways to configure multiple license servers.
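For example, in a redundant arrangement where several license server hosts share one license file, the file begins with one SERVER line per host and lsf.conf points to all of them. The sketch below is illustrative only; the host names, host IDs, and daemon path are placeholders.
SERVER hosta 880a0748 1700
SERVER hostb 880b3529 1700
SERVER hostc 880c8119 1700
DAEMON lsf_ld /usr/share/lsf/7.0/sparc-sol2/etc/lsf_ld
# FEATURE lines follow as in any permanent license file
In lsf.conf:
LSF_LICENSE_FILE="1700@hosta,1700@hostb,1700@hostc"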
Platform LSF Licensing Partial Licensing This section applies to permanent licenses. Read this if you have a cluster in which not all of the hosts will require licenses for the same LSF products. In this section, you will learn how to save money through distributing your licenses efficiently. Not all hosts in the cluster need to be licensed for the same set of LSF products. For example, some hosts might be licensed only for Platform LSF Make, while others may also be licensed to run Platform MultiCluster.
Partial Licensing Example of partial licensing Here is an example that will allow you to better visualize the concept of partial licensing. Through this example, you can learn how to configure your hosts to use partial licensing. Scenario In the following configuration, the license file contains licenses for LSF, Platform LSF Make and Platform LSF MultiCluster.
Platform LSF Licensing
RUN_WINDOWS: (always open)
LICENSES_ENABLED: (LSF_Base LSF_Manager)
LOAD_THRESHOLDS:
r15s  r1m  r15m  ut  pg  io  ls  it  tmp  swp  mem
-     -    -     -   -   -   -   -   -    -    -
HOST_NAME: hostC
type     model    cpuf  ncpus  ndisks  maxmem  maxswp  maxtmp  rexpri  server  nprocs  ncores  nthreads
LINUX86  DEFAULT  116.
Partial Licensing
RESOURCES: Not defined
RUN_WINDOWS: (always open)
LICENSES_ENABLED: (LSF_Base LSF_Manager LSF_Make)
LOAD_THRESHOLDS:
r15s  r1m  r15m  ut  pg  io  ls  it  tmp  swp  mem
-     -    -     -   -   -   -   -   -    -    -
Note that hostC has now picked up the available Platform LSF Make license that was originally held by hostA.
Platform LSF Licensing Floating Client Licenses LSF floating client is valuable if you have a cluster in which not all of the hosts will be active at the same time. In this section, you will learn how to save money through distributing your licenses efficiently. An LSF floating client license is a type of LSF license to be shared among several client hosts at different times. Floating client licenses are not tied to specific hosts. They are assigned dynamically to any host that submits a request to LSF.
Floating Client Licenses
Administration commands
Since LSF floating client hosts are not listed in lsf.cluster.cluster_name, some administration commands will not work if issued from LSF floating client hosts. Always run administration commands from server hosts.
Floating client hosts and host types/models
This section explains how client hosts and floating client hosts differ in terms of the restrictions on host types or models. For LSF client hosts, you can list the host type and model in lsf.cluster.
Platform LSF Licensing FLOAT_CLIENTS= 25 End Parameters ... The FLOAT_CLIENTS parameter sets the size of your license pool in the cluster. When the master LIM starts up, the number of licenses specified in FLOAT_CLIENTS (or fewer) can be checked out for use as floating client licenses. If the parameter FLOAT_CLIENTS is not specified in lsf.cluster.cluster_name, or there is an error in either license.dat or in lsf.cluster.cluster_name, the floating LSF client license feature is disabled.
Floating Client Licenses FLOAT_CLIENTS_ADDR_RANGE parameter This optional parameter specifies an IP address or range of addresses of domains from which floating client hosts can submit requests. Multiple ranges can be defined, separated by spaces. The IP address can have either a dotted quad notation (IPv4) or IP Next Generation (IPv6) format. LSF supports both formats; you do not have to map IPv4 addresses to an IPv6 format.
Platform LSF Licensing All client hosts belonging to a domain with the address 100.172.1.13 will be allowed access. All client hosts belonging to domains starting with 100, then any number, then a range of 30 to 54 will be allowed access. All client hosts belonging to domains starting with 124, then from 24 onward, then 1, then from 0 to 34 will be allowed access. No IPv6 hosts are allowed. FLOAT_CLIENTS_ADDR_RANGE=12.23.45.* All client hosts belonging to domains starting with 12.23.45 are allowed.
Floating Client Licenses
lshosts
In the following example, only hostA and hostB are defined in lsf.cluster.cluster_name. HostA is a server and master host, and hostB is a static client. If you type the command from hostA or hostB, you will get the following output:
lshosts
HOST_NAME   type     model    cpuf  ncpus  maxmem  maxswp  server  RESOURCES
hostA       SUNSOL   DEFAULT  1.0   1      128M    602M    Yes     ()
hostB       SUNSOL   DEFAULT  1.0   -      -       -       No      ()
Submit a job from a host not listed in lsf.cluster.cluster_name.
Platform LSF Licensing Troubleshooting License Issues ◆ "lsadmin reconfig" gives "User permission denied" message on page 181 ◆ Primary cluster administrator receives email “Your cluster has experienced license overuse” message on page 181 ◆ lsadmin command fails with "ls_gethostinfo: Host does not have a software license" on page 181 ◆ LSF commands give "Host does not have a software license" on page 182 ◆ LSF commands fail with "ls_initdebug: Unable to open file lsf.
Troubleshooting License Issues
2 Kill the LIM, using one of the following commands:
kill lim_PID
kill -9 lim_PID
3 After the old LIM has died, start the new LIM on the master host using one of the following methods:
❖ lsadmin limstartup
❖ LSF_SERVERDIR/lim as root.
LSF commands give "Host does not have a software license"
You may see this message after running lsid, lshosts, or other ls* commands. Typical problems and their solutions:
If you experience this problem ...
Platform LSF Licensing
lmgrd fails with message "Port already in use"
The port number defined in LSF_LICENSE_FILE and license.dat is being used by another application (by default, LSF uses port number 1700). Possible causes:
If you experience this problem ...                    Do the following:
lmgrd is already running for this license             Use ps -ef and make sure that lmgrd and lsf_ld are not running.
lmgrd has been stopped and restarted too quickly      Wait a few minutes for the OS to clear this port.
C H A P T E R 9 Managing LSF on Platform EGO Contents ◆ About LSF on Platform EGO on page 186 ◆ LSF and EGO directory structure on page 189 ◆ Configuring LSF and EGO on page 193 ◆ Managing LSF daemons through EGO on page 196 ◆ Administrative Basics for PMC and CLI on page 199 ◆ Logging and troubleshooting on page 203 ◆ Frequently asked questions on page 210 Administering Platform LSF 185
About LSF on Platform EGO About LSF on Platform EGO LSF on Platform EGO allows EGO to serve as the central resource broker, enabling enterprise applications to benefit from sharing of resources across the enterprise grid. ◆ Scalability—EGO enhances LSF scalability. Currently, the LSF scheduler has to deal with a large number of jobs. EGO provides management functionality for multiple schedulers that co-exist in one EGO environment.
Managing LSF on Platform EGO EGO is only sensitive to the resource requirements of business services; EGO has no knowledge of any run-time dynamic parameters that exist for them. This means that EGO does not interfere with how a business service chooses to use the resources it has been allocated. How does Platform EGO work? Platform products work in various ways to match business service (consumer) demands for resources with an available supply of resources.
About LSF on Platform EGO Key EGO concepts Consumers A consumer represents an entity that can demand resources from the cluster. A consumer might be a business service, a business process that is a complex collection of business services, an individual user, or an entire line of business. EGO resources Resources are physical and logical entities that can be requested by a client. For example, an application (client) requests a processor (resource) in order to run. Resources also have attributes.
Managing LSF on Platform EGO
LSF and EGO directory structure
The following tables describe the purpose of each sub-directory and whether they are writable or non-writable by LSF.
LSF_TOP
Directory Path   Description                                           Attribute
LSF_TOP/7.0      LSF 7.0 binaries and other machine dependent files    Non-writable
LSF_TOP/conf     LSF 7.
LSF and EGO directory structure
EGO, GUI, and PERF directories
Directory Path                                        Description                                      Attribute
LSF_BINDIR                                            EGO binaries and other machine dependent files   Non-writable
LSF_LOGDIR/ego/cluster_name/eservice (EGO_ESRVDIR)    EGO services configuration and log files.
Managing LSF on Platform EGO Example directory structures UNIX and Linux The following figures show typical directory structures for a new UNIX or Linux installation with lsfinstall. Depending on which products you have installed and platforms you have selected, your directory structure may vary.
LSF and EGO directory structure Microsoft Windows The following diagram shows an example directory structure for a Windows installation.
Managing LSF on Platform EGO Configuring LSF and EGO EGO configuration files for LSF daemon management (res.xml and sbatchd.xml) The following files are located in EGO_ESRVDIR/esc/conf/services/: ◆ res.xml—EGO service configuration file for res. ◆ sbatchd.xml—EGO service configuration file for sbatchd.
Configuring LSF and EGO LSF and EGO corresponding parameters The following table summarizes existing LSF parameters that have corresponding EGO parameter names. You must continue to set other LSF parameters in lsf.conf. lsf.conf parameter ego.
Managing LSF on Platform EGO
LSF and EGO require exclusive use of certain ports for communication. EGO uses the same four consecutive ports on every host in the cluster. The first of these is called the base port, and EGO uses four consecutive ports starting from it. The default EGO base connection port is 7869, so by default EGO uses ports 7869-7872. To change the ports, customize the base port. For example, if the base port is 6880, EGO uses ports 6880-6883.
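As a sketch only, the base port is normally set in ego.conf on every host; the parameter name and value below are assumptions to verify against your EGO configuration reference before use.
# In ego.conf (assumption: EGO_LIM_PORT is the base connection port)
EGO_LIM_PORT=6880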
Managing LSF daemons through EGO
EGO daemons
Daemons in LSF_SERVERDIR   Description
vemkd                      Started by lim on master host
pem                        Started by lim on every host
egosc                      Started by vemkd on master host
Daemons in LSF_SERVERDIR   Description
lim                        lim runs on every host. On UNIX, lim is either started by lsadmin through rsh/ssh or started through rc file. On Windows, lim is started as a Windows service.
Managing LSF on Platform EGO If EGO Service Controller management is configured and you run badmin hshutdown and lsadmin resshutdown to manually shut down LSF, the LSF daemons are not restarted automatically by EGO. You must run lsadmin resstartup and badmin hstartup to start the LSF daemons manually. Permissions required for daemon control To control all daemons in the cluster, you must ◆ Be logged on as root or as a user listed in the /etc/lsf.sudoers file.
EGO control of PMC and PERF services EGO control of PMC and PERF services When EGO is enabled in the cluster, EGO may control services for components such as the Platform Management Console (PMC) or LSF Reports (PERF). This is recommended. It allows failover among multiple management hosts, and allows EGO cluster commands to start, stop, and restart the services. PMC not controlled by EGO For PMC, if it is not controlled by EGO, you must specify the host to run PMC.
Managing LSF on Platform EGO Administrative Basics for PMC and CLI See Administering and Using Platform EGO for detailed information about EGO administration. You can use the pmcadmin command to administer the Platform Management Console. For more information, see the command reference documentation. Log on to the Platform Management Console The Platform Management Console (PMC) allows you to monitor, administer, and configure your cluster. To log on, give the name and password of the LSF administrator.
Administrative Basics for PMC and CLI If Platform EGO is enabled in the LSF cluster (LSF_ENABLE_EGO=Y and LSF_EGO_ENVDIR are defined in lsf.conf), cshrc.lsf and profile.lsf set the following environment variables. ◆ EGO_BINDIR ◆ EGO_CONFDIR ◆ EGO_ESRVDIR ◆ EGO_LIBDIR ◆ EGO_LOCAL_CONFDIR ◆ EGO_SERVERDIR ◆ EGO_TOP See the Platform EGO Reference for more information about these variables. See the Platform LSF Configuration Reference for more information about cshrc.lsf and profile.lsf.
Managing LSF on Platform EGO Responding to service message Error Normally, Platform EGO attempts to start a service multiple times, up to the maximum threshold set in the service profile XML file (containing the service definition). If the service cannot start, you will receive a service error message. 1 Try stopping and then restarting the service. 2 Review the appropriate service instance log file to discover the cause of the error.
Administrative Basics for PMC and CLI Master host failover During master host failover, the system is unavailable for a few minutes while hosts are waiting to be contacted by the new master. The master candidate list defines which hosts are master candidates. By default, the list includes just one host, the master host, and there is no failover. If you configure additional candidates to enable failover, the master host is first in the list.
Managing LSF on Platform EGO Logging and troubleshooting LSF log files LSF event and account log location LSF uses directories for temporary work files, log files and transaction files and spooling. LSF keeps track of all jobs in the system by maintaining a transaction log in the work subtree. The LSF log files are found in the directory LSB_SHAREDIR/cluster_name/logdir. The following files maintain the state of the LSF system: lsb.events LSF uses the lsb.
Logging and troubleshooting
values for this parameter can be any log priority symbol that is defined in /usr/include/sys/syslog.h. The default value for LSF_LOG_MASK is LOG_WARNING.
LSF log directory permissions and ownership
Ensure that the LSF_LOGDIR directory is writable by root. The LSF administrator must own LSF_LOGDIR.
EGO log files
Log files contain important run-time information about the general health of EGO daemons, workload submissions, and other EGO system events.
Managing LSF on Platform EGO where the date is expressed in YYYY-MM-DD hh-mm-ss.sss. For example, 2006-03-14 11:02:44.000 Eastern Standard Time ERROR [2488:1036] vemkdexit: vemkd is halting. EGO log classes Every log entry belongs to a log class. You can use log class as a mechanism to filter log entries by area. Log classes in combination with log levels allow you to troubleshoot using log entries that only address, for example, configuration. Log classes are adjusted at run time using egosh debug.
Logging and troubleshooting
Level       Description
LOG_INFO    Log all informational messages and more serious messages.
LOG_DEBUG   Log all debug-level messages.
LOG_TRACE   Log all available messages.
EGO log level and class information retrieved from configuration files
When EGO is enabled, the pem and vemkd daemons read ego.conf to retrieve the following information (as corresponds to the particular daemon):
◆ EGO_LOG_MASK: The log level used to determine the amount of detail logged.
Managing LSF on Platform EGO ◆ For troubleshooting purposes, set your log level to LOG_DEBUG. Because of the quantity of messages you will receive when subscribed to this log level, change the level back to LOG_WARNING as soon as you are finished troubleshooting. TIP: If your log files are too long, you can always rename them for archive purposes. New, fresh log files will then be created and will log all new events.
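As a sketch, the corresponding settings are raised in lsf.conf (for the LSF daemons) and ego.conf (for the EGO daemons) while you troubleshoot; the log directory path below is a placeholder, and both levels should be returned to LOG_WARNING when you are finished.
# In lsf.conf
LSF_LOGDIR=/usr/share/lsf/log
LSF_LOG_MASK=LOG_DEBUG
# In ego.conf
EGO_LOG_MASK=LOG_DEBUG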
Logging and troubleshooting
Log file    Default location                          What it contains
vemkd.log   Linux: LSF_LOGDIR/vemkd.log.hostname      Logs aggregated host information about the state of individual resources, status of allocation requests, the consumer hierarchy, resource assignment to consumers, and started operating system-level processes.
            Windows: LSF_LOGDIR\vemkd.log.hostname
wsg.log     Linux:                                    Logs service failures surrounding web services interfaces for web service clients (applications).
Managing LSF on Platform EGO Matching service error messages and corresponding log files If you receive this message… This may be the problem… Review this log file failed to create vem working directory Cannot create work directory during vemkd startup failed to open lock file Cannot get lock file during startup vemkd failed to open host event file Cannot recover during startup because cannot open event file vemkd lim port is not defined EGO_LIM_PORT in ego.
Frequently asked questions Frequently asked questions Question Does LSF 7 on EGO support upgrade of the master host only? Answer Yes Question Under EGO Service Controller daemon management mode on Windows, does PEM start sbatchd and res directly or does it ask Windows to start sbatchd and res as Windows Services? Answer On Windows, LSF still installs sbatchd and res as Windows services. If EGO Service Controller daemon control is selected during installation, the Windows service will be set up as Manual.
Managing LSF on Platform EGO Question Can EGO consumer policies replace MultiCluster lease mode? Answer Conceptually, both define resource borrowing and lending policies. However, current EGO consumer policies can only work with slot resources within one EGO cluster. MultiCluster lease mode supports other load indices and external resources between multiple clusters.
C H A P T E R 10 Cluster Version Management and Patching on UNIX and Linux IMPORTANT: For LSF 7 Update 2 only, you cannot use the steps in this chapter to update your cluster from LSF 7 Update 1 to Update 3. You must follow the steps in “Migrating to LSF Version 7 Update 3 on UNIX and Linux” to manually migrate your LSF 7 cluster to Update 3.
Scope Scope Operating system ◆ Limitations pversions supports LSF Update 1 and later patchinstall supports LSF Update 1 and later For installation of a new cluster, see Installing Platform LSF on UNIX and Linux.
Cluster Version Management and Patching on UNIX and Linux Patch installation interaction diagram Patches may be installed using the patch installer or LSF installer. The same mechanism is used.
Patch rollback interaction diagram Patch rollback interaction diagram Use the patch installer to roll back the most recent patch in the cluster.
Cluster Version Management and Patching on UNIX and Linux Version management components Patches and distributions Products and versioning Platform products and components may be separately licensed and versioned. For example, LSF and the Platform Management Console are licensed together, but delivered as separate distributions and patched separately. Product version is a number identifying the release, such as LSF version 7.0.3.
Version management components Version command The version command pversions is a tool provided to query the patch history and deliver information about cluster and product version and patch levels. The version command includes functionality to query a cluster or check contents of a package. The version command is not located with other LSF commands so it may not be in your path. The command location is LSF_TOP/7.
Cluster Version Management and Patching on UNIX and Linux The patch backup directory is configurable during installation. See the PATCH_BACKUP_DIR parameter in install.config. Maintenance Over time, the backups accumulate. You may choose to manually delete old backups, starting with the oldest. Remember that rollback is performed one patch at a time, so your cluster’s rollback functionality stops at the point where a backup file is unavailable.
Version management concepts Installers The LSF installer installs full distributions and can modify configuration. The LSF installer incorporates the patch installer so the process of updating the files is the same as the patch installer. However, the LSF installer should be used to install an update because the update may require configuration changes that lsfinstall can do automatically. The patch installer installs all patches and never modifies configuration.
Cluster Version Management and Patching on UNIX and Linux Windows-UNIX clusters and Windows clusters If your cluster has both Windows and UNIX, patch the UNIX hosts in the cluster using the patch installer. Patch the Windows hosts using Windows tools. The Windows patch files should be installed in order from oldest to newest on every Windows host if you have more than one to install. To install a Windows patch, double click the .msp file for the OS you want and follow the wizard.
Cluster rollback behavior table Cluster rollback behavior table When … Actions... The result … Normal behavior. The installer replaces current files with previous backup. ◆ Success, cluster reverts to previous state. The patch history is missing (files are not found in the directory defined by the parameter PATCH_HISTORY_DIR in patch.conf) Without the history, the installer cannot determine which backups to use.
Cluster Version Management and Patching on UNIX and Linux Version management commands Commands to modify cluster Command Description lsfinstall This command: ◆ Creates a new cluster (using any full distribution including update releases) ◆ Patches a cluster with an update release (a full distribution) by installing binaries and updating configuration patchinstall This command: ◆ Patches a cluster by installing binaries from a full or partial distribution (does not update configuration, so lsfinstall i
Installing update releases on UNIX and Linux Installing update releases on UNIX and Linux To install an update release to the cluster. IMPORTANT: For LSF 7 Update 3, you cannot use the steps in this section to update your cluster from LSF 7 Update 1 to Update 3. You must follow the steps in “Migrating to LSF Version 7 Update 3 on UNIX and Linux” to manually migrate your LSF 7 cluster to Update 3.
Cluster Version Management and Patching on UNIX and Linux Installing fixes on UNIX and Linux To install fixes or fix packs to update the cluster. 1 To patch the reporting database, download the corresponding database update scripts and update the database schema first. 2 Download the patches from Platform. If hosts in your cluster have multiple binary types, you may require multiple distribution files to patch the entire cluster. Put the distribution files on any host.
Patching the Oracle database 3 Run LSF_TOP/7.0/install/pversions to determine the state of the cluster and find the build number of the last patch installed (roll back one patch at a time). 4 Run patchinstall with -r and specify the build number of the last patch installed (the patch to be removed). patchinstall -r 12345 5 If you were prompted to do so, restart the cluster. Patches that affect running daemons require you to restart manually. 6 If necessary, modify LSF cluster configuration manually.
Cluster Version Management and Patching on UNIX and Linux Patching the Derby database Prerequisites: The Derby database is properly configured and running: To patch the reporting database as part of patching the cluster, get the corresponding database update scripts and update the database schema first. 1 When you download the patches for your cluster, download the corresponding database update scripts from Platform. 2 In the command console, open the database schema directory.
P A R T II Working with Resources ◆ Understanding Resources on page 231 ◆ Adding Resources on page 251 ◆ Managing Software Licenses with LSF on page 261 Administering Platform LSF 229
C H A P T E R 11 Understanding Resources Contents ◆ About LSF Resources on page 232 ◆ How Resources are Classified on page 234 ◆ How LSF Uses Resources on page 237 ◆ Load Indices on page 239 ◆ Static Resources on page 242 ◆ Automatic Detection of Hardware Reconfiguration on page 249 Administering Platform LSF 231
About LSF Resources About LSF Resources The LSF system uses built-in and configured resources to track job resource requirements and schedule jobs according to the resources available on individual hosts. View available resources View cluster resources (lsinfo) 1 Use lsinfo to list the resources available in your cluster. The lsinfo command lists all the resource names and their descriptions.
Understanding Resources
View host load by resource
1 Run lshosts -s to view host load by shared resource:
lshosts -s
RESOURCE       VALUE   LOCATION
tot_lic        5       host1 host2
tot_scratch    500     host1 host2
The above output indicates that 5 licenses are available, and that the shared scratch directory currently contains 500 MB of space. The VALUE field indicates the amount of that resource. The LOCATION column shows the hosts which share this resource. The lshosts -s command displays static shared resources.
How Resources are Classified
Resource categories
By values
  Boolean resources      Resources that denote the availability of specific features
  Numerical resources    Resources that take numerical values, such as all the load indices, number of processors on a host, or host CPU factor
  String resources       Resources that take string values, such as host type, host model, host status
By the way values change
  Dynamic Resources      Resources that change their values dynamically:
  Static Resources
Understanding Resources
Some examples of Boolean resources:
Resource Name   Describes            Meaning of Example Name
cs              Role in cluster      Compute server
fs              Role in cluster      File server
solaris         Operating system     Solaris operating system
frame           Available software   FrameMaker license
Shared resources
Shared resources are configured resources that are not tied to a specific host, but are associated with the entire cluster, or a specific subset of hosts within the cluster.
How Resources are Classified
View shared resources for hosts
1 Run bhosts -s to view shared resources for hosts. For example:
bhosts -s
RESOURCE        TOTAL   RESERVED   LOCATION
tot_lic         5       0.0        hostA hostB
tot_scratch     500     0.0        hostA hostB
avail_lic       2       3.0        hostA hostB
avail_scratch   100     400.0      hostA hostB
The TOTAL column displays the value of the resource. For dynamic resources, the RESERVED column displays the amount that has been reserved by running jobs.
Understanding Resources How LSF Uses Resources Jobs submitted through the LSF system will have the resources they use monitored while they are running. This information is used to enforce resource usage limits and load thresholds as well as for fairshare scheduling.
Understanding Resources Load Indices Load indices are built-in resources that measure the availability of dynamic, non-shared resources on hosts in the LSF cluster. Load indices built into the LIM are updated at fixed time intervals. External load indices are defined and configured by the LSF administrator, who writes an external load information manager (elim) executable. The elim collects the values of the external load indices and sends these values to the LIM.
Load Indices CPU run queue lengths (r15s, r1m, r15m) The r15s, r1m and r15m load indices are the 15-second, 1-minute and 15-minute average CPU run queue lengths. This is the average number of processes ready to use the CPU during the given interval. On UNIX, run queue length indices are not necessarily the same as the load averages printed by the uptime(1) command; uptime load averages on some platforms also include processes that are in short-term wait states (such as paging or disk I/O).
Understanding Resources Temporary directories (tmp) The tmp index is the space available in MB on the file system that contains the temporary directory: ◆ /tmp on UNIX ◆ C:\temp on Windows Swap space (swp) The swp index gives the currently available virtual memory (swap space) in MB. This represents the largest process that can be started on the host. Memory (mem) The mem index is an estimate of the real memory currently available to user processes.
Static Resources Static Resources Static resources are built-in resources that represent host information that does not change over time, such as the maximum RAM available to user processes or the number of processors in a machine. Most static resources are determined by the LIM at start-up time, or when LSF detects hardware configuration changes. Static resources can be used to select appropriate hosts for particular jobs based on binary architecture, relative CPU speed, and system configuration.
Understanding Resources Host name (hname) Host name specifies the name with which the host identifies itself. CPU factor (cpuf ) The CPU factor (frequently shortened to cpuf) represents the speed of the host CPU relative to other hosts in the cluster. For example, if one processor is twice the speed of another, its CPU factor should be twice as large.
Static Resources Maximum temporary space (maxtmp) Maximum temporary space is the total temporary space a machine has, measured in megabytes (MB). How LIM detects cores, threads and processors Traditionally, the value of ncpus has been equal to the number of physical CPUs. However, many CPUs consist of multiple cores and threads, so the traditional 1:1 mapping is no longer useful.
Understanding Resources In cases where CPU architectures and operating system combinations may not support accurate processor, core, thread detection, LIM uses the defaults of 1 processor, 1 core per physical processor, and 1 thread per core. If LIM detects that it is running in a virtual environment (for example, VMware®), each detected processor is similarly reported (as a single-core, single-threaded, physical processor). LIM only detects hardware that is recognized by the operating system.
Static Resources By default, ncpus is set to procs (number of processors). NOTE: In clusters with older LIMs that do not recognize cores and threads, EGO_DEFINE_NCPUS is ignored. In clusters where only the master LIM recognizes cores and threads, the master LIM assigns default values (for example, in Platform LSF 6.2: 1 core, -1 thread). 3 Save and close lsf.conf or ego.conf. TIP: As a best practice, set EGO_DEFINE_NCPUS instead of EGO_ENABLE_DUALCORE.
Understanding Resources ◆ Windows: EGO_LOCAL_RESOURCES="[type NTX86] [resource define_ncpus_procs]" ◆ 3 Linux: EGO_LOCAL_RESOURCES="[resource define_ncpus_cores]" Save and close ego.conf. NOTE: In multi-cluster environments, if ncpus is defined on a per-host basis (thereby overriding the global setting) the definition is applied to all clusters that the host is a part of. In contrast, globally defined ncpus settings only take effect within the cluster for which EGO_DEFINE_NCPUS is defined.
Static Resources Interaction with LSF_LOCAL_RESOURCES in lsf.conf If EGO is enabled, and EGO_LOCAL_RESOURCES is set in ego.conf and LSF_LOCAL_RESOURCES is set in lsf.conf, EGO_LOCAL_RESOURCES takes precedence.
Understanding Resources Automatic Detection of Hardware Reconfiguration Some UNIX operating systems support dynamic hardware reconfiguration—that is, the attaching or detaching of system boards in a live system without having to reboot the host. Supported platforms LSF is able to recognize changes in ncpus, maxmem, maxswp, maxtmp in the following platforms: ◆ Sun Solaris 2.5+ ◆ HP-UX 10.10+ ◆ IBM AIX 4.0+ ◆ SGI IRIX 6.
Automatic Detection of Hardware Reconfiguration How dynamic hardware changes affect LSF LSF uses ncpus, maxmem, maxswp, maxtmp to make scheduling and load decisions. When processors are added or removed, LSF licensing is affected because LSF licenses are based on the number of processors. If you put a processor offline: ◆ Per host or per-queue load thresholds may be exceeded sooner. This is because LSF uses the number of CPUS and relative CPU speeds to calculate effective run queue length.
C H A P T E R 12 Adding Resources Contents ◆ About Configured Resources on page 252 ◆ Add New Resources to Your Cluster on page 253 ◆ Static Shared Resource Reservation on page 259 ◆ External Load Indices on page 260 ◆ Modifying a Built-In Load Index on page 260 Administering Platform LSF 251
About Configured Resources
LSF schedules jobs based on available resources. There are many resources built into LSF, but you can also add your own resources, and then use them the same way as built-in resources. For maximum flexibility, you should characterize your resources clearly enough so that users have satisfactory choices.
Adding Resources Add New Resources to Your Cluster 1 Log in to any host in the cluster as the LSF administrator. 2 Define new resources in the Resource section of lsf.shared. Specify at least a name and a brief description, which will be displayed to a user by lsinfo. See Configuring lsf.shared Resource Section on page 254.
Configuring lsf.shared Resource Section Configuring lsf.shared Resource Section Configured resources are defined in the Resource section of lsf.shared. There is no distinction between shared and non-shared resources. You must specify at least a name and description for the resource, using the keywords RESOURCENAME and DESCRIPTION. ◆ A resource name cannot begin with a number. ◆ A resource name cannot contain any of the following characters : ◆ .
Adding Resources
❖ All numeric resources are consumable.
❖ String and boolean resources are not consumable.
You should only specify consumable resources in the rusage section of a resource requirement string. Non-consumable resources are ignored in rusage sections. A non-consumable resource should not be releasable. Non-consumable numeric resources can be used in the order, select, and same sections of a resource requirement string. When LSF_STRICT_RESREQ=Y in lsf.
Configuring lsf.cluster.cluster_name Host Section bandwidth Numeric 60 Y (IndividualNetworkBandwidth) End Resource Configuring lsf.cluster.cluster_name Host Section The Host section is the only required section in lsf.cluster.cluster_name. It lists all the hosts in the cluster and gives configuration information for each host. Define the resource names as strings in the Resource section of lsf.shared. You may list any number of resources, enclosed in parentheses and separated by blanks or tabs.
Adding Resources Configuring lsf.cluster.cluster_name ResourceMap Section Resources are associated with the hosts for which they are defined in the ResourceMap section of lsf.cluster.cluster_name. For each resource, you must specify the name and the hosts that have it. If the ResourceMap section is not defined, then any dynamic resources specified in lsf.shared are not tied to specific hosts, but are shared across all hosts in the cluster. Example A cluster consists of hosts host1, host2, and host3.
Configuring lsf.cluster.cluster_name ResourceMap Section ◆ Each host in the cluster has the resource ◆ The resource is shared by all hosts in the cluster ◆ There are multiple instances of a resource within the cluster, and each instance is shared by a unique subset of hosts. Syntax ([resource_value@][host_name... | all [~host_name]... | others | default] ...) ◆ For static resources, you must include the resource value, which indicates the quantity of the resource.
Adding Resources Static Shared Resource Reservation You must use resource reservation to prevent over-committing static shared resources when scheduling. The usual situation is that you configure single-user application licenses as static shared resources, and make that resource one of the job requirements. You should also reserve the resource for the duration of the job.
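For illustration, assuming a static shared resource named app_lic has been configured for such a license, a job submission could both require and reserve it; the resource name, job name, and values here are hypothetical.
bsub -R "select[app_lic>0] rusage[app_lic=1]" my_licensed_app
A queue can impose the same behaviour for every job it dispatches with a queue-level requirement such as RES_REQ = select[app_lic>0] rusage[app_lic=1] in lsb.queues.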
External Load Indices External Load Indices If you have specific workload or resource requirements at your site, the LSF administrator can define external resources. You can use both built-in and external resources for LSF job scheduling and host selection. External load indices report the values of dynamic external resources. A dynamic external resource is a site-specific resource with a numeric value that changes over time, such as the space available in a directory.
C H A P T E R 13 Managing Software Licenses with LSF Software licenses are valuable resources that must be fully utilized. This section discusses how LSF can help manage licensed applications to maximize utilization and minimize job failure due to license problems.
Network Floating Licenses Configuring counted host-locked licenses You configure counted host-locked licenses by having LSF determine the number of licenses currently available. Use either of the following to count the host-locked licenses: Using an External LIM (ELIM) ◆ External LIM (ELIM) ◆ A check_licenses shell script To use an external LIM (ELIM) to get the number of licenses currently available, configure an external load index licenses giving the number of free licenses on each host.
Managing Software Licenses with LSF ◆ All license jobs are run through LSF ◆ Licenses are managed outside of LSF control All licenses used through LSF If all jobs requiring licenses are submitted through LSF, then LSF could regulate the allocation of licenses to jobs and ensure that a job is not started if the required license is not available. A static resource is used to hold the total number of licenses that are available.
Network Floating Licenses lsf.cluster.cluster_name Begin ResourceMap RESOURCENAME LOCATION verilog ([all]) End ResourceMap The INTERVAL in the lsf.shared file indicates how often the ELIM is expected to update the value of the Verilog resource—in this case every 60 seconds. Since this resource is shared by all hosts in the cluster, the ELIM only needs to be started on the master host.
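The elim itself is an executable (typically installed as LSF_SERVERDIR/elim) that periodically writes the number of external indices, their names, and their values to standard output. The following Bourne shell sketch is illustrative only; the way the free verilog count is obtained (parsing lmstat output) and the awk field positions are assumptions you would adapt to your license server's actual output.
#!/bin/sh
# Illustrative elim: reports one external index named "verilog"
# in the format "<number_of_indices> <name> <value>" every 60 seconds,
# matching the INTERVAL configured in lsf.shared.
while true
do
    # Assumption: free count = licenses issued - licenses in use, taken from a
    # line like "Users of verilog: (Total of N licenses issued; Total of M licenses in use)"
    FREE=`lmutil lmstat -f verilog 2>/dev/null | \
          grep "Users of verilog" | awk '{print $6 - $11}'`
    if [ -z "$FREE" ]; then
        FREE=0
    fi
    echo "1 verilog $FREE"
    sleep 60
done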
Managing Software Licenses with LSF Use the duration keyword in the queue resource requirement specification to release the shared resource after the specified number of minutes expires. This prevents multiple jobs started in a short interval from over-using the available licenses. By limiting the duration of the reservation and using the actual license usage as reported by the ELIM, underutilization is also avoided and licenses used outside of LSF can be accounted for.
P A R T III Job Scheduling Policies ◆ Time Syntax and Configuration on page 269 ◆ Deadline Constraint and Exclusive Scheduling on page 275 ◆ Preemptive Scheduling on page 277 ◆ Specifying Resource Requirements on page 279 ◆ Fairshare Scheduling on page 295 ◆ Goal-Oriented SLA-Driven Scheduling on page 341 ◆ Working with Application Profiles on page 371 Administering Platform LSF 267
C H A P T E R 14 Time Syntax and Configuration Contents ◆ Specifying Time Values on page 269 ◆ Specifying Time Windows on page 269 ◆ Specifying Time Expressions on page 270 ◆ Using Automatic Time-based Configuration on page 271 Specifying Time Values To specify a time value, a specific point in time, specify at least the hour. Day and minutes are optional. Time value syntax time = hour | hour:minute | day:hour:minute hour Integer from 0 to 23, representing the hour of the day.
Specifying Time Expressions where all fields are numbers with the following ranges: ◆ day of the week: 0-6 (0 is Sunday) ◆ hour: 0-23 ◆ minute: 0-59 Specify a time window one of the following ways: ◆ hour-hour ◆ hour:minute-hour:minute ◆ day:hour:minute-day:hour:minute The default value for minute is 0 (on the hour); the default value for day is every day of the week. You must specify at least the hour. Day of the week and minute are optional.
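For example, a queue dispatch window in lsb.queues uses this time window syntax; in the following sketch the queue name is a placeholder, and the two windows cover weeknights (8:00 p.m. to 8:30 a.m.) and the weekend (Friday 6:30 p.m. to Monday 8:30 a.m.).
Begin Queue
QUEUE_NAME = night
RUN_WINDOW = 20:00-8:30 5:18:30-1:8:30
DESCRIPTION = Jobs are dispatched only at night and on weekends
End Queue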
Time Syntax and Configuration The syntax for a time expression is: expression = time(time_window[ time_window ...]) | expression && expression | expression || expression | !expression Example Both of the following expressions specify weekends (Friday evening at 6:30 p.m. until Monday morning at 8:30 a.m.) and nights (8:00 p.m. to 8:30 a.m. daily).
Using Automatic Time-based Configuration # for all other hours, normal is the default queue #if time(18:30-19:30) DEFAULT_QUEUE=short #else DEFAULT_QUEUE=normal #endif lsb.queues example Begin Queue ... #if time(8:30-18:30) INTERACTIVE = ONLY #endif ... End Queue # interactive only during day shift lsb.
Time Syntax and Configuration default 1 - #endif End User lsf.licensescheduler example Begin Feature NAME = f1 #if time(5:16:30-1:8:30 20:00-8:30) DISTRIBUTION=Lan(P1 2/5 P2 1) #elif time(3:8:30-3:18:30) DISTRIBUTION=Lan(P3 1) #else DISTRIBUTION=Lan(P1 1 P2 2/5) #endif End Feature Creating if-else constructs The if-else construct can express single decisions and multi-way decisions by including elif statements in the construct.
❖ bladmin ckconfig
❖ blimits -c
❖ blinfo
❖ blstat
❖ bparams
❖ bqueues
❖ bresources
❖ busers
C H A P T E R 15 Deadline Constraint and Exclusive Scheduling Contents ◆ Using Deadline Constraint Scheduling on page 275 ◆ Using Exclusive Scheduling on page 276 Using Deadline Constraint Scheduling Deadline constraints will suspend or terminate running jobs at a certain time.
Using Exclusive Scheduling Disabling deadline constraint scheduling Deadline constraint scheduling is enabled by default. To disable it for a queue, set IGNORE_DEADLINE=y in lsb.queues. Example LSF will schedule jobs in the liberal queue without observing the deadline constraints. Begin Queue QUEUE_NAME = liberal IGNORE_DEADLINE=y End Queue Using Exclusive Scheduling Exclusive scheduling gives a job exclusive use of the host that it runs on.
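As a brief sketch of how this is typically configured (the queue name and job command are placeholders): the queue must allow exclusive jobs, and the job must request exclusive execution with bsub -x.
Begin Queue
QUEUE_NAME = excl
PRIORITY   = 50
EXCLUSIVE  = Y
DESCRIPTION = Accepts exclusive jobs submitted with bsub -x
End Queue
bsub -q excl -x myjob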
C H A P T E R 16 Preemptive Scheduling Contents ◆ About Preemptive Scheduling on page 277 About Preemptive Scheduling Preemptive scheduling lets a pending high-priority job take job slots away from a running job of lower priority. When two jobs compete for the same job slots, LSF automatically suspends the low-priority job to make slots available to the high-priority job. The low-priority job is resumed as soon as possible.
About Preemptive Scheduling Preemptable jobs Preemptable jobs are running in a low-priority queue and are holding the specified job slot. Their queue must be able to be preempted by the high-priority queue.
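As an illustrative sketch (queue names and priorities are placeholders), preemption between two queues is typically expressed in lsb.queues by marking the high-priority queue PREEMPTIVE over the low-priority one:
Begin Queue
QUEUE_NAME = high
PRIORITY   = 70
PREEMPTION = PREEMPTIVE[low]
DESCRIPTION = Jobs in this queue can preempt jobs from the low queue
End Queue
Begin Queue
QUEUE_NAME = low
PRIORITY   = 20
DESCRIPTION = Jobs in this queue can be preempted by the high queue
End Queue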
C H A P T E R 17 Specifying Resource Requirements Contents ◆ About Resource Requirements on page 279 ◆ Queue-level Resource Requirements on page 280 ◆ Job-level Resource Requirements on page 281 ◆ About Resource Requirement Strings on page 283 ◆ Selection String on page 284 ◆ Order String on page 287 ◆ Usage String on page 288 ◆ Span String on page 292 ◆ Same String on page 293 About Resource Requirements Resource requirements define which hosts a job can run on.
Queue-level Resource Requirements To best place a job with optimized performance, resource requirements can be specified for each application. This way, you do not have to specify resource requirements every time you submit a job. The LSF administrator may have already configured the resource requirements for your jobs, or you can put your executable name together with its resource requirements into your personal remote task list.
Specifying Resource Requirements loadStop The suspending condition that determines when running jobs should be suspended. Thresholds can be configured for each queue, for each host, or a combination of both. To schedule a job on a host, the load levels on that host must satisfy both the thresholds configured for that host and the thresholds for the queue from which the job is being dispatched.
Job-level Resource Requirements
This runs myjob on an HP-UX host that is lightly loaded (CPU utilization) and has at least 15 MB of swap memory available.
bsub -R "select[swp > 15]" -R "select[hpux] order[r15m]" -R "order[r15m]" -R "rusage[mem=100]" -R "order[ut]" -R "same[type]" -R "rusage[tmp=50:duration=60]" -R "same[model]" myjob
LSF merges the multiple -R options into one string and dispatches the job if all of the resource requirements can be met.
Specifying Resource Requirements About Resource Requirement Strings Most LSF commands accept a -R res_req argument to specify resource requirements. The exact behaviour depends on the command. For example, specifying a resource requirement for the lsload command displays the load levels for all hosts that have the requested resources. Specifying resource requirements for the lsrun command causes LSF to select the best host out of the set of hosts that have the requested resources.
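A single requirement string can combine several of the sections described in this chapter (select, order, rusage, span, same). For example, the following sketch (the job name and values are placeholders) selects a host with enough swap and memory, orders candidates by CPU load, reserves memory for 30 minutes, and keeps all processors on one host.
bsub -R "select[swp>35 && mem>100] order[r1m] rusage[mem=100:duration=30] span[hosts=1]" myjob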
Selection String By default, memory (mem) and swap (swp) limits in select[] and rusage[] sections are specified in MB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to specify a larger unit for the these limits (MB, GB, TB, PB, or EB). How queue-level and job-level requirements are resolved If job-level resource requirements are specified together with queue-level resource requirements: ◆ In a select string, a host must satisfy both queue-level and job-level requirements for the job to be dispatched.
Specifying Resource Requirements Specifying multiple -R options bsub accepts multiple -R options for the select section. You can specify multiple resource requirement strings instead of using the && operator. For example: bsub -R "select[swp > 15]" -R "select[hpux]" LSF merges the multiple -R options into one string and dispatches the job if all of the resource requirements can be met.
Selection String
Syntax   Meaning
a && b   Logical AND: 1 if both a and b are non-zero, 0 otherwise
a || b   Logical OR: 1 if either a or b is non-zero, 0 otherwise
Examples
select[(swp > 50 && type == MIPS) || (swp > 35 && type == ALPHA)]
select[((2*r15s + 3*r1m + r15m) / 6 < 1.0) && !fs && (cpuf > 4.0)]
Specifying shared resources with the keyword "defined"
A shared resource may be used in the resource requirement string of any LSF command.
Specifying Resource Requirements Specifying exclusive resources An exclusive resource may be used in the resource requirement string of any placement or scheduling command, such as bsub, lsplace, lsrun, or lsgrun. An exclusive resource is a special resource that is assignable to a host. This host will not receive a job unless that job explicitly requests the host.
Usage String Syntax [-]resource_name [:[-]resource_name]... You can specify any built-in or external load index. When an index name is preceded by a minus sign ‘-’, the sorting order is reversed so that hosts are ordered from worst to best on that index. Specifying multiple -R options bsub accepts multiple -R options for the order section. You can specify multiple resource requirement strings instead of using the && operator.
Specifying Resource Requirements ◆ By default, duration is specified in minutes. For example, the following specify a duration of 1 hour: ❖ duration=60 ❖ duration=1h ❖ duration=3600s TIP: Duration is not supported for static shared resources. If the shared resource is defined in an lsb.resources Limit section, then duration is not applied. Decay The decay value indicates how the reserved amount should decrease over the duration.
Usage String The resulting requirement for the job is rusage[mem=100:lic=1] where mem=100 specified by the job overrides mem=200 specified by the queue. However, lic=1 from queue is kept, since job does not specify it. For the following queue-level RES_REQ (decay and duration defined): RES_REQ = rusage[mem=200:duration=20:decay=1] ... and job submission (no decay or duration): bsub -R "rusage[mem=100]" ...
Specifying Resource Requirements ◆ The following job requests two resources with the same duration but different decay: bsub -R "rusage[mem=20:duration=30:decay=1, lic=1:duration=30]" myjob Specifying alternative usage strings If you use more than one version of an application, you can specify the version you prefer to use together with a legacy version you can use if the preferred version is not available.
Span String host is not overloaded. When LIM makes a placement advice, external load indices are not considered in the resource usage string. In this case, the syntax of the resource usage string is res[=value]:res[=value]: ... :res[=value] res is one of the resources whose value is returned by the lsload command. rusage[r1m=0.5:mem=20:swp=40] The above example indicates that the task is expected to increase the 1-minute run queue length by 0.5, consume 20 MB of memory and 40 MB of swap space.
Specifying Resource Requirements ❖ For host type, you must specify same[type] in the resource requirement. In the following example, the job requests 8 processors on a host of type HP or SGI, and 2 processors on a host of type LINUX, and the predefined maximum job slot limit in lsb.hosts (MXJ) for other host types: span[ptile='!',HP:8,SGI:8,LINUX:2] same[type] ❖ For host model, you must specify same[model] in the resource requirement.
Same String If hosts do not always have both resources, it is interpreted as allocate processors either on hosts that have the same value for resource1, or on hosts that have the same value for resource2, or on hosts that have the same value for both resource1 and resource2. Specifying multiple -R options bsub accepts multiple -R options for the same section. You can specify multiple resource requirement strings instead of using the && operator.
C H A P T E R 18 Fairshare Scheduling To configure any kind of fairshare scheduling, you should understand the following concepts: ◆ User share assignments ◆ Dynamic share priority ◆ Job dispatch order You can configure fairshare at either host level or queue level. If you require more control, you can implement hierarchical fairshare. You can also set some additional restrictions when you submit a job.
Understanding Fairshare Scheduling ❖ Using Historical and Committed Run Time on page 319 ❖ Users Affected by Multiple Fairshare Policies on page 323 ❖ Ways to Configure Fairshare on page 324 Understanding Fairshare Scheduling By default, LSF considers jobs for dispatch in the same order as they appear in the queue (which is not necessarily the order in which they are submitted to the queue). This is called first-come, first-served (FCFS) scheduling.
Fairshare Scheduling The order of jobs in the queue is secondary. The most important thing is the dynamic priority of the user who submitted the job. When fairshare scheduling is used, LSF tries to place the first job in the queue that belongs to the user with the highest dynamic priority. User Share Assignments Both queue-level and host partition fairshare use the following syntax to define how shares are assigned to users or user groups.
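For example, the assignment described next could be written with the [user_name, number_shares] share syntax, something like the following sketch (the enclosing keyword is FAIRSHARE = USER_SHARES[...] in lsb.queues or USER_SHARES in a HostPartition section of lsb.hosts):
[User1, 10] [User2, 9] [others, 8]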
Dynamic User Priority Assigns 27 shares: 10 to User1, 9 to User2, and 8 to the remaining users, as a group. User1 is slightly more important than User2. Each of the remaining users has equal importance. ◆ If there are 3 users in total, the single remaining user has all 8 shares, and is almost as important as User1 and User2. ◆ If there are 12 users in total, then 10 users compete for those 8 shares, and each of them is significantly less important than User1 and User2.
Fairshare Scheduling ◆ ❖ For queue-level fairshare, LSF measures the resource consumption of all the user’s jobs in the queue. This means a user’s dynamic priority can be different in every queue. ❖ For host partition fairshare, LSF measures resource consumption for all the user’s jobs that run on hosts in the host partition. This means a user’s dynamic priority is the same in every queue that uses hosts in the same partition.
How Fairshare Affects Job Dispatch Order
RUN_JOB_FACTOR: The job slots weighting factor. Default: 3
HIST_HOURS: Interval for collecting resource consumption history. Default: 5
How Fairshare Affects Job Dispatch Order
Within a queue, jobs are dispatched according to the queue's scheduling policy.
◆ For FCFS queues, the dispatch order depends on the order of jobs in the queue (which depends on job priority and submission time, and can also be modified by the job owner).
Fairshare Scheduling ◆ If any of these queues uses cross-queue fairshare, the other queues must also use cross-queue fairshare and belong to the same set, or they cannot have the same queue priority. For more information, see Cross-queue User-based Fairshare on page 302. Host Partition User-based Fairshare User-based fairshare policies configured at the host level handle resource contention across multiple queues. You can define a different fairshare policy for every host partition.
Queue-level User-based Fairshare ◆ Optional: Use the reserved host name all to configure a single partition that applies to all hosts in a cluster. ◆ Optional: Use the not operator (~) to exclude hosts or host groups from the list of hosts in the host partition. ◆ Hosts in a host partition cannot participate in queue-based fairshare. Hosts that are not included in any host partition are controlled by FCFS scheduling policy instead of fairshare scheduling policy.
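As a sketch, a host partition section in lsb.hosts that applies these points might look like the following (partition name, host names, and share values are illustrative):
Begin HostPartition
HPART_NAME  = Partition1
HOSTS       = all ~hostK
USER_SHARES = [groupA, 7] [groupB, 3] [default, 1]
End HostPartition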
Fairshare Scheduling In this way, if a user submits jobs to different queues, user priority is calculated by taking into account all the jobs the user has submitted across the defined queues. To submit jobs to a fairshare queue, users must be allowed to use the queue (USERS in lsb.queues) and must have a share assignment (FAIRSHARE in lsb.queues). Even cluster and queue administrators cannot submit jobs to a fairshare queue if they do not have a share assignment.
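A minimal fairshare queue definition in lsb.queues might look like this sketch (the queue name and share values are illustrative):
Begin Queue
QUEUE_NAME = fairshare_q
PRIORITY   = 40
USERS      = all
FAIRSHARE  = USER_SHARES[[default, 1]]
End Queue
Here every user allowed by USERS receives an equal share assignment, so all of them can submit jobs to the queue.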
Cross-queue User-based Fairshare
SCHEDULING PARAMETERS
           r15s   r1m   r15m   ut    pg    io    ls    it    tmp   swp   mem
 loadSched   -     -      -     -     -     -     -     -     -     -     -
 loadStop    -     -      -     -     -     -     -     -     -     -     -

           cpuspeed   bandwidth
 loadSched     -           -
 loadStop      -           -

SCHEDULING POLICIES:  FAIRSHARE
FAIRSHARE_QUEUES:  normal short license
USER_SHARES:  [user1, 100] [default, 1]

SHARE_INFO_FOR: normal/
 USER/GROUP   SHARES   PRIORITY   STARTED   RESERVED   CPU_TIME   RUN_TIME
 user1         100      9.645        2         0          0.2       7034

USERS: all users
HOSTS:  all
...
Fairshare Scheduling ❖ In master queue normal: FAIRSHARE_QUEUES=short license ❖ In master queue priority: FAIRSHARE_QUEUES= night owners You cannot, however, define night, owners, or priority as slaves in the normal queue; or normal, short and license as slaves in the priority queue; or short, license, night, owners as master queues of their own. Configure cross-queue fairshare ◆ Cross-queue fairshare cannot be used with host partition fairshare. It is part of queue-level fairshare.
Hierarchical User-based Fairshare Controlling job dispatch order in cross-queue fairshare DISPATCH_ORDER parameter (lsb.queues) Use DISPATCH_ORDER=QUEUE in the master queue to define an ordered cross-queue fairshare set. DISPATCH_ORDER indicates that jobs are dispatched according to the order of queue priorities, not user fairshare priority. Priority range in cross-queue fairshare By default, the range of priority defined for queues in cross-queue fairshare cannot be used with any other queues.
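Putting these pieces together, a master queue for an ordered cross-queue fairshare set might be sketched as follows (the priority value is illustrative; FAIRSHARE_QUEUES and USER_SHARES echo the normal/short/license example above):
Begin Queue
QUEUE_NAME       = normal
PRIORITY         = 30
FAIRSHARE        = USER_SHARES[[user1, 100] [default, 1]]
FAIRSHARE_QUEUES = short license
DISPATCH_ORDER   = QUEUE
End Queue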
Fairshare Scheduling How hierarchical fairshare affects job dispatch order LSF uses the dynamic share priority of a user or group to find out which user's job to run next. If you use hierarchical fairshare, LSF works through the share tree from the top level down, and compares the dynamic priority of users and groups at each level, until the user with the highest dynamic priority is a single user, or a group that has no subgroups.
Hierarchical User-based Fairshare
SHARE_INFO_FOR: Partition1/
 USER/GROUP   SHARES   PRIORITY   STARTED   RESERVED   CPU_TIME   RUN_TIME
 group1        40       1.867        5         0          48.4      17618
 group2        20       0.775        6         0         607.7      24664
SHARE_INFO_FOR: Partition1/group2/
 USER/GROUP   SHARES   PRIORITY   STARTED   RESERVED   CPU_TIME   RUN_TIME
 user1          8       1.144        1         0           9.6       5108
 user2          2       0.667        0         0           0.0          0
 others         1       0.046        5         0         598.1      19556
Configuring hierarchical fairshare
To define a hierarchical fairshare policy, configure the top-level share assignment in lsb.queues or lsb.hosts.
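For instance, the Partition1 share tree shown above could come from a configuration sketched like this (group membership is illustrative):
In lsb.hosts:
Begin HostPartition
HPART_NAME  = Partition1
HOSTS       = all
USER_SHARES = [group1, 40] [group2, 20]
End HostPartition
In lsb.users:
Begin UserGroup
GROUP_NAME   GROUP_MEMBER                  USER_SHARES
group2       (user1 user2 user3 user4)     ([user1, 8] [user2, 2] [others, 1])
End UserGroup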
Fairshare Scheduling The Development group gets the largest share (50%) of the resources in the event of contention. Shares assigned to the Development group can be further divided among the Systems, Application, and Test groups, which receive 15%, 35%, and 50%, respectively. At the lowest level, individual users compete for these shares as usual. One way to measure a user’s importance is to multiply their percentage of the resources at every level of the share tree.
Queue-based Fairshare Managing pools of queues You can configure your queues into a pool, which is a named group of queues using the same set of hosts. A pool is entitled to a slice of the available job slots. You can configure as many pools as you need, but each pool must use the same set of hosts. There can be queues in the cluster that do not belong to any pool yet share some hosts used by a pool.
Fairshare Scheduling Four queues using two hosts each with maximum job slot limit of 6 for a total of 12 slots; queue4 does not belong to any pool. ◆ queue1 shares 50% of slots to be allocated = 2 * 6 * 0.5 = 6 ◆ queue2 shares 30% of slots to be allocated = 2 * 6 * 0.3 = 3.6 -> 4 ◆ queue3 shares 20% of slots to be allocated = 2 * 6 * 0.2 = 2.4 -> 2
Configuring Slot Allocation per Queue
b SLOT_POOL
2 Optional: Define the following in lsb.queues for each queue that uses queue-based fairshare:
a HOSTS to list the hosts that can receive jobs from the queue. If no hosts are defined for the queue, the default is all hosts.
TIP: Hosts for queue-based fairshare cannot be in a host partition.
b PRIORITY to indicate the priority of the queue.
3 For each host used by the pool, define a maximum job slot limit, either in lsb.hosts (MXJ) or lsb.
Fairshare Scheduling
Begin Queue
QUEUE_NAME =
PRIORITY   =
SLOT_POOL  =
SLOT_SHARE =
HOSTS      =
...
End Queue
Begin Queue
QUEUE_NAME =
PRIORITY   =
SLOT_POOL  =
SLOT_SHARE =
HOSTS      =
...
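Filled in to match the three-queue scenario above, the first two definitions might look like this sketch (host names and priorities are illustrative):
Begin Queue
QUEUE_NAME = queue1
PRIORITY   = 50
SLOT_POOL  = poolA
SLOT_SHARE = 50
HOSTS      = hostA hostB
End Queue
Begin Queue
QUEUE_NAME = queue2
PRIORITY   = 48
SLOT_POOL  = poolA
SLOT_SHARE = 30
HOSTS      = hostA hostB
End Queue
queue3 would be defined the same way with SLOT_SHARE = 20, and queue4 would simply omit SLOT_POOL and SLOT_SHARE.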
Typical Slot Allocation Scenarios View slot allocation of running jobs Use bhosts, bmgroup, and bqueues to verify how LSF maintains the configured percentage of running jobs in each queue.
Fairshare Scheduling
2 bqueues
QUEUE_NAME   PRIO   STATUS
Roma          50    Open:Active
Verona        48    Open:Active
Genova        48    Open:Active
3 bqueues
QUEUE_NAME   Roma   Verona   Genova
When queue Verona has done its work, queues Roma and Genova get their respective shares of 8 and 3. This leaves 4 slots to be redistributed to queues according to their shares: 50% (2 slots) to Roma, 20% (1 slot) to Genova.
Typical Slot Allocation Scenarios The queues Milano and Parma run very short jobs that get submitted periodically in bursts.
Fairshare Scheduling Round-robin slot distribution—13 queues and 2 pools ◆ Pool poolA has 3 hosts each with 7 slots for a total of 21 slots to be shared. The first 3 queues are part of the pool poolA sharing the CPUs with proportions 50% (11 slots), 30% (7 slots) and 20% (3 remaining slots to total 21 slots). ◆ The other 10 queues belong to pool poolB, which has 3 hosts each with 7 slots for a total of 21 slots to be shared. Each queue has 10% of the pool (3 slots).
Typical Slot Allocation Scenarios
[bqueues output for the queues in the two pools (Verona, Genova, Pisa, Milano, Parma, Bologna, Sora, Ferrara, Napoli, Livorno, Palermo, Venezia), all Open:Active with priorities from 48 down to 40, showing their pending and running job counts]
The following figure illustrates the round-robin distribution of slots among these queues.
Fairshare Scheduling 10 queues sharing 10% each of 50 slots In this example, queue1 (the curve with the highest peaks) has the longest-running jobs and so has fewer accumulated slots in use over time. LSF accordingly rebalances the load when all queues compete for jobs to maintain a configured 10% usage share. Using Historical and Committed Run Time By default, as a job is running, the dynamic priority decreases gradually until the job has finished running, then increases immediately when the job finishes.
Using Historical and Committed Run Time ◆ Historical run time decay ◆ Committed run time Historical run time decay By default, historical run time does not affect the dynamic priority. You can configure LSF so that the user’s dynamic priority increases gradually after a job finishes. After a job is finished, its run time is saved as the historical run time of the job and the value can be used in calculating the dynamic priority, the same way LSF considers historical CPU time in calculating priority.
Fairshare Scheduling Committed run time weighting factor Committed run time is the run time requested at job submission with the -W option of bsub, or in the queue configuration with the RUNLIMIT parameter. By default, committed run time does not affect the dynamic priority. While the job is running, the actual run time is subtracted from the committed run time. The user’s dynamic priority decreases immediately to its lowest expected value, and is maintained at that value until the job finishes.
Using Historical and Committed Run Time Example The following fairshare parameters are configured in lsb.
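A minimal sketch of such a configuration, assuming the parameters below are set in lsb.params (the values shown are illustrative, not the original example):
ENABLE_HIST_RUN_TIME = Y         # factor decayed historical run time into dynamic priority
HIST_HOURS = 5                   # interval over which resource consumption history is collected
RUN_TIME_FACTOR = 0.7            # weighting of run time in the dynamic priority calculation
COMMITTED_RUN_TIME_FACTOR = 0.5  # weighting of committed run time (bsub -W or RUNLIMIT)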
Fairshare Scheduling Users Affected by Multiple Fairshare Policies If you belong to multiple user groups, which are controlled by different fairshare policies, each group probably has a different dynamic share priority at any given time. By default, if any one of these groups becomes the highest priority user, you could be the highest priority user in that group, and LSF would attempt to place your job.
Ways to Configure Fairshare ◆ User1 cannot associate the job with GroupC, because GroupC includes a subgroup. ◆ User1 cannot associate the job with his individual user account, because bsub -G only accepts group names. Ways to Configure Fairshare Global fairshare Global fairshare balances resource usage across the entire cluster according to one single fairshare policy. Resources used in one queue affect job dispatch order in another queue.
Fairshare Scheduling 2 Configure a host partition for the host, and assign the shares appropriately. Begin HostPartition HPART_NAME = big_servers HOSTS = hostH USER_SHARES = [eng_users, 7] [acct_users, 3] End HostPartition Equal Share Equal share balances resource usage equally between users. Configure equal share 1 To configure equal share, use the keyword default to define an equal share for every user.
Ways to Configure Fairshare 2 Configure fairshare and assign the overwhelming majority of shares to the key users: Begin Queue QUEUE_NAME = production FAIRSHARE = USER_SHARES[[key_users@, 2000] [others, 1]] ... End Queue In the above example, key users have 2000 shares each, while other users together have only 1 share. This makes it virtually impossible for other users’ jobs to get dispatched unless none of the users in the key_users group has jobs waiting to run.
C H A P T E R 19 Resource Preemption Contents ◆ About Resource Preemption on page 328 ◆ Requirements for Resource Preemption on page 329 ◆ Custom Job Controls for Resource Preemption on page 329 ◆ Resource Preemption Steps on page 331 ◆ Configure Resource Preemption on page 333 ◆ License Preemption Example on page 335 ◆ Memory Preemption Example on page 337 Administering Platform LSF 327
About Resource Preemption About Resource Preemption Preemptive Scheduling and Resource Preemption Resource preemption is a special type of preemptive scheduling. It is similar to job slot preemption. Job Slot Preemption and Resource Preemption If you enable preemptive scheduling, job slot preemption is always enabled. Resource preemption is optional. With resource preemption, you can configure preemptive scheduling based on other resources in addition to job slots.
Resource Preemption
Dynamic Resources
Specify duration: If the preemption resource is dynamic, you must specify the duration part of the resource reservation string when you submit a preempting or preemptable job.
Resources outside the control of LSF: If an ELIM is needed to determine the value of a dynamic resource (such as the number of software licenses available), LSF preempts jobs as necessary, then waits for the ELIM to report that the resources are available before starting the high-priority job.
Custom Job Controls for Resource Preemption Customizing the SUSPEND action Ask your application vendor what job control signals or actions cause your application to suspend a job and release the preemption resources. You need to replace the default SUSPEND action (the SIGSTOP signal) with another signal or script that works properly with your application when it suspends the job. For example, your application might be able to catch SIGTSTP instead of SIGSTOP.
Resource Preemption
Resource Preemption Steps
To make resource preemption useful, you may need to work through all of these steps.
1 Read. Before you set up resource preemption, you should understand the following:
◆ Preemptive Scheduling
◆ Resource Preemption
◆ Resource Reservation
◆ Customizing Resources
◆ Customizing Job Controls
2 Plan.
Resource Preemption Steps
◆ Optional. Set PREEMPT_JOBTYPE to enable preemption of exclusive and backfill jobs. Specify one or both of the keywords EXCLUSIVE and BACKFILL. By default, exclusive and backfill jobs are only preempted if the exclusive low priority job is running on a host that is different from the one used by the preemptive high priority job.
d lsf.cluster.cluster_name
Define how the custom resource is shared in the ResourceMap section.
e lsf.task.cluster_name
Optional.
Resource Preemption
Configure Resource Preemption
1 Configure preemptive scheduling (PREEMPTION in lsb.queues).
2 Configure the preemption resources (PREEMPTABLE_RESOURCES in lsb.params). Job slots are the default preemption resource. To define additional resources to use with preemptive scheduling, set PREEMPTABLE_RESOURCES in lsb.params, and specify the names of the custom resources as a space-separated list.
3 Customize the preemption action.
Configure Resource Preemption For example, to make LSF wait for 8 minutes, specify PREEMPTION_WAIT_TIME=480
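Pulled together, steps 1 and 2 might look like the following sketch in a cluster with a high and a low queue (queue names are illustrative; LicenseA and pre_mem are the resources used in the examples that follow):
In lsb.queues:
Begin Queue
QUEUE_NAME = high
PRIORITY   = 70
PREEMPTION = PREEMPTIVE[low]
End Queue
In lsb.params:
Begin Parameters
PREEMPTABLE_RESOURCES = LicenseA pre_mem
PREEMPTION_WAIT_TIME  = 480
End Parameters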
Resource Preemption
License Preemption Example
Configuration
This example uses LicenseA as the name of the preemption resource.
lsf.shared
Add the resource to the Resource section.
Begin Resource
RESOURCENAME   TYPE      INTERVAL   INCREASING   DESCRIPTION
LicenseA       Numeric   60         N            (custom application)
...
End Resource
lsf.cluster.cluster_name
Add the resource to the ResourceMap section.
Begin ResourceMap
RESOURCENAME   LOCATION
LicenseA       [all]
...
End ResourceMap
lsb.queues
License Preemption Example
DESCRIPTION=jobs preempted by jobs in higher-priority queues
...
End Queue
ELIM
Write an ELIM to report the current available number of Application A licenses. This ELIM starts on the master host.
Operation
◆ Check how many LicenseA resources are available: check the number of LicenseA existing in the cluster by using bhosts -s LicenseA. In this example, 2 licenses are available.
◆ Using up all LicenseA resources
Resource Preemption
Memory Preemption Example
Configuration
This example uses pre_mem as the name of the preemption resource.
lsf.shared
Add the resource to the Resource section.
Begin Resource
RESOURCENAME   TYPE      INTERVAL   INCREASING   DESCRIPTION
pre_mem        Numeric   60         N            (external memory usage reporter)
...
End Resource
lsf.cluster.cluster_name
Add the resource to the "ResourceMap" section.
Begin ResourceMap
RESOURCENAME   LOCATION
pre_mem        ([hostA] [hostB] ...
Memory Preemption Example DESCRIPTION=jobs may be preempted by jobs in higher-priority queues ... End Queue ELIM This is an example of an ELIM that reports the current value of pre_mem. This ELIM starts on all the hosts that have the pre_mem resource.
Resource Preemption
Preempting the job for pre_mem resources
Submit a job to a high-priority queue to preempt a job from a low-priority queue to get the resource pre_mem:
bsub -J second -q high -R "rusage[pre_mem=100:duration=2]" mem_app
After a while, the second job is running and the first job is suspended.
C H A P T E R 20 Goal-Oriented SLA-Driven Scheduling Contents ◆ Using Goal-Oriented SLA Scheduling on page 341 ◆ Configuring Service Classes for SLA Scheduling on page 344 ◆ View Information about SLAs and Service Classes on page 346 ◆ Understanding Service Class Behavior on page 350 ◆ EGO-enabled SLA scheduling on page 355 Using Goal-Oriented SLA Scheduling Goal-oriented SLA scheduling policies help you configure your workload so that your jobs are completed on time and reduce the risk of missed deadlines.
Using Goal-Oriented SLA Scheduling Service classes SLA definitions consist of service-level goals that are expressed in individual service classes. A service class is the actual configured policy that sets the service-level goals for the LSF system. The SLA defines the workload (jobs or other services) and users that need the work done, while the service class that addresses the SLA defines individual goals, and a time window when the service class is active.
Goal-Oriented SLA-Driven Scheduling Submit jobs to a service class You submit jobs to a service class as you would to a queue, except that a service class is a higher level scheduling policy that makes use of other, lower level LSF policies like queues and host partitions to satisfy the service-level goal that the service class expresses. The service class name where the job is to run is configured in lsb.serviceclasses.
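For example, a submission to the Kyuquot service class used in the examples below might look like this (myjob is a placeholder command):
bsub -sla Kyuquot myjob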
Configuring Service Classes for SLA Scheduling Modify SLA jobs (bmod) 1 Run bmod -sla to modify the service class a job is attached to, or to attach a submitted job to a service class. Run bmod -slan to detach a job from a service class: bmod -sla Kyuquot 2307 Attaches job 2307 to the service class Kyuquot. bmod -slan 2307 Detaches job 2307 from the service class Kyuquot.
Goal-Oriented SLA-Driven Scheduling Service class priority A higher value indicates a higher priority, relative to other service classes. Similar to queue priority, service classes access the cluster resources in priority order. LSF schedules jobs from one service class at a time, starting with the highest-priority service class. If multiple service classes have the same priority, LSF runs all the jobs from these service classes in first-come, first-served order.
View Information about SLAs and Service Classes [VELOCITY 30 timeWindow (17:30-8:30)] DESCRIPTION = "day and night velocity" End ServiceClass ◆ The service class Kyuquot defines a velocity goal that is active during working hours (9:00 a.m. to 5:30 p.m.) and a deadline goal that is active during off-hours (5:30 p.m. to 9:00 a.m.) Only users user1 and user2 can submit jobs to this service class.
Goal-Oriented SLA-Driven Scheduling
 NJOBS   PEND   RUN   SSUSP   USUSP   FINISH
  300    280    10      0       0       10
◆ The deadline goal of service class Uclulet is not being met, and bsla displays status Active:Delayed:
bsla
SERVICE CLASS NAME:  Uclulet
 -- working hours
PRIORITY:  20
GOAL:  DEADLINE
ACTIVE WINDOW: (8:30-19:00)
STATUS:  Active:Delayed
SLA THROUGHPUT:  0.
View Information about SLAs and Service Classes NJOBS 110 PEND 95 RUN 5 SSUSP 0 USUSP 0 FINISH 10 View jobs running in an SLA (bjobs) 1 Run bjobs -sla to display jobs running in a service class: bjobs -sla Inuvik JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME 136 user1 RUN normal hostA hostA sleep 100 Sep 28 13:24 137 user1 RUN normal hostA hostB sleep 100 Sep 28 13:25 Use -sla with -g to display job groups attached to a service class.
Goal-Oriented SLA-Driven Scheduling - submitted by users user1, - accounted on all projects. - completed normally or exited - executed on all hosts. - submitted to all queues. - accounted on service classes Inuvik, -----------------------------------------------------------------------------SUMMARY: ( time unit: second ) Total number of done jobs: 183 Total number of exited jobs: 1 Total CPU time consumed: 40.0 Average CPU time consumed: 0.2 Maximum CPU time of a job: 0.3 Minimum CPU time of a job: 0.
Understanding Service Class Behavior Understanding Service Class Behavior A simple deadline goal The following service class configures an SLA with a simple deadline goal with a half hour time window. Begin ServiceClass NAME = Quadra PRIORITY = 20 GOALS = [DEADLINE timeWindow (16:15-16:45)] DESCRIPTION = short window End ServiceClass Six jobs submitted with a run time of 5 minutes each will use 1 slot for the half hour time window.
Goal-Oriented SLA-Driven Scheduling An overnight run with two service classes bsla shows the configuration and status of two service classes Qualicum and Comox: ◆ Qualicum has a deadline goal with a time window which is active overnight: bsla Qualicum SERVICE CLASS NAME: PRIORITY: 23 Qualicum GOAL: VELOCITY 8 ACTIVE WINDOW: (8:00-18:00) STATUS: Inactive SLA THROUGHPUT: 0.
Understanding Service Class Behavior ◆ Comox has a velocity goal of 2 concurrently running jobs that is always active: bsla Comox SERVICE CLASS NAME: PRIORITY: 20 Comox GOAL: VELOCITY 2 ACTIVE WINDOW: Always Open STATUS: Active:On time SLA THROUGHPUT: 2.00 JOBS/CLEAN_PERIOD NJOBS 100 PEND 98 RUN 2 SSUSP 0 USUSP 0 FINISH 0 The following illustrates the progress of the velocity SLA Comox running 100 jobs with random runtimes over a 14 hour period.
Goal-Oriented SLA-Driven Scheduling When an SLA is missing its goal 1 Use the CONTROL_ACTION parameter in your service class to configure an action to be run if the SLA goal is delayed for a specified number of minutes. CONTROL_ACTION (lsb.serviceclasses) CONTROL_ACTION=VIOLATION_PERIOD[minutes] CMD [action] Example CONTROL_ACTION=VIOLATION_PERIOD[10] CMD [echo `date`: SLA is in violation >> ! /tmp/sla_violation.
Understanding Service Class Behavior SLA statistics files Each active SLA goal generates a statistics file for monitoring and analyzing the system. When the goal becomes inactive the file is no longer updated. The files are created in the LSB_SHAREDIR/cluster_name/logdir/SLA directory. Each file name consists of the name of the service class and the goal type. For example the file named Quadra.deadline is created for the deadline goal of the service class name Quadra. The following file named Tofino.
Goal-Oriented SLA-Driven Scheduling EGO-enabled SLA scheduling By default, all host management for scheduling SLA jobs is handled by LSF. Under EGO-enabled SLA scheduling, LSF uses EGO resource allocation facilities to get the hosts it needs to run SLA jobs. Host allocation is the responsibility of EGO, while job management remains managed by LSF. EGO-enabled SLA scheduling is a new scheduling paradigm that replaces other existing LSF scheduling policies.
EGO-enabled SLA scheduling Key concepts ENABLE_DEFAULT_EGO_SLA in lsb.params is required to turn on EGO-enabled SLA scheduling. The host resources assigned to LSF are subject to EGO resource allocation policies. Hosts are given to an SLA based on EGO allocation decisions; LSF decides which jobs and how many from each SLA will run. All LSF compute host resource management is delegated to Platform EGO, and all LSF hosts are under EGO control.
Goal-Oriented SLA-Driven Scheduling SLA goal behavior Only velocity goals are supported in EGO-enabled SLAs. Deadline goals and throughput goals are not supported. For EGO-enabled SLA, the configured velocity value is considered to be a minimum number of jobs that should be in run state from the SLA. This is different from a regular SLA, where once the velocity is reached, no more jobs are dispatched by the SLA. Under EGO-enabled SLA, if pending jobs exist, the SLA will try to run them all.
EGO-enabled SLA scheduling
Turn on basic EGO-enabled SLA scheduling
Prerequisites: To use EGO-enabled SLA scheduling, all hosts that the SLA will use must be dynamically allocated by EGO:
◆ Edit lsb.hosts to remove static hosts and host groups containing the hosts you want EGO to allocate.
IMPORTANT: The lsb.hosts file must contain a "default" host line.
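A Host section reduced to just the required default line might be sketched as follows (the column set is the usual lsb.hosts layout; "!" sets MXJ to the number of processors):
Begin Host
HOST_NAME   MXJ   r1m   pg   ls   tmp   DISPATCH_WINDOW
default     !     ()    ()   ()   ()    ()
End Host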
Goal-Oriented SLA-Driven Scheduling Configure service classes for multiple SLAs that map to different EGO consumers. 1 Log in as the LSF administrator on any host in the cluster. 2 Edit lsb.serviceclasses to add the service class definition for EGO-enabled SLA.
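For example, a definition that maps one SLA to one EGO consumer might be sketched like this (the names and the velocity value are illustrative; only velocity goals apply to EGO-enabled SLAs):
Begin ServiceClass
NAME        = sla1
CONSUMER    = consumer1
PRIORITY    = 20
GOALS       = [VELOCITY 10 timeWindow ()]
DESCRIPTION = EGO-enabled SLA mapped to consumer1
End ServiceClass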
EGO-enabled SLA scheduling You can create resource groups by resource requirement (dynamic—specific member hosts not specified) or by host name (static—only hosts currently in the cluster are members; if you add new hosts to the cluster, you must manually add them to a static resource group.) NOTE: Unlike host groups in LSF, hosts should not overlap between resource groups. Resource groups determine the distribution of resources in your EGO resource plan.
Goal-Oriented SLA-Driven Scheduling 4 0 4 0 0 0 If the SLA is under reclaim, bsla displays the following keywords: ◆ NUM_RECALLED_HOSTS ◆ RECALLED_HOSTS_TIMEOUT If a default system service class is configured with ENABLE_DEFAULT_EGO_SLA in lsb.params but no other service classes are explicitly configured in lsb.serviceclasses, bsla only displays information for the default SLA.
EGO-enabled SLA scheduling LSF preemption also supports queues that can be both preemptable and preemptive. Under the EGO reclaim mechanism, resource ownership guarantees that some jobs cannot be reclaimed. Job and queue priority Queue priority is respected by EGO-enabled SLA scheduling. You can also configure priority among SLAs. Configure the consumer rank in your EGO consumer tree to reflect the priories in your SLAs.
Goal-Oriented SLA-Driven Scheduling For example: special Boolean () () (resource locked for group 1) bigmem Boolean () () (machine with big memory) 4 Edit lsb.queues and remove any existing resource requirements (RES_REQ) that specify these shared resources. 5 Edit lsb.serviceclasses and define an SLA that uses LSF_SLA as the consumer, and requires the shared resources.
EGO-enabled SLA scheduling Complete the following steps to configure an SLA to schedule jobs using specific hosts. 1 Log on to the LSF master host as the LSF administrator (lsfadmin). 2 Let all LSF workload run to completion. 3 Shut down the LSF cluster. 4 Edit lsb.queues and remove any hosts and host groups from the queues you want to submit EGO-enabled SLA jobs to.
Goal-Oriented SLA-Driven Scheduling LSF host partitions are typically used to implement user-based fairshare policies. Complete the following steps to allow the EGO-enabled SLA to allocate hosts to specific users. 1 Log on to the LSF master host as the cluster administrator. 2 Log on to the Platform Management Console: a Define an EGO resource group that contains the selected hosts. b Define an EGO consumer that is associated with this resource group. 3 Edit lsb.
EGO-enabled SLA scheduling
◆ SLA_TIMER: controls how often each service class is evaluated and a network message is sent to EGO communicating host demand. The default is 10 seconds.
LSF MXJ and EGO slots
In LSF, you configure the maximum number of job slots (MXJ) in lsb.hosts. By default, the MXJ equals the number of processors on the host. LSF schedules jobs on that host based on the MXJ.
Goal-Oriented SLA-Driven Scheduling
Job-level host preference (bsub -m)
Specific job-level host requests are similar to bsub -R (essentially the same as bsub -R "select host_name"). The specified host is not guaranteed to be allocated by EGO. The job remains pending until the specified host is actually allocated. Use EGO_RES_REQ=res_req in the service class configuration to specify all job resource requirements.
P A R T IV Job Scheduling and Dispatch ◆ Resource Allocation Limits on page 387 ◆ Reserving Resources on page 405 ◆ Advance Reservation on page 421 ◆ Dispatch and Run Windows on page 445 ◆ Job Dependencies on page 449 ◆ Job Priorities on page 455 ◆ Job Requeue and Job Rerun on page 467 ◆ Job Checkpoint, Restart, and Migration on page 477 ◆ Chunk Job Dispatch on page 483 ◆ Job Arrays on page 489 ◆ Running Parallel Jobs on page 499 ◆ Submitting Jobs Using JSDL on page 525 Administe
C H A P T E R 21 Working with Application Profiles Application profiles improve the management of applications by separating scheduling policies (preemption, fairshare, etc.) from application-level requirements, such as pre-execution and post-execution commands, resource limits, or job controls, job chunking, etc.
Manage application profiles Manage application profiles About application profiles Use application profiles to map common execution requirements to application-specific job containers. For example, you can define different job types according to the properties of the applications that you use; your FLUENT jobs can have different execution requirements from your CATIA jobs, but they can all be submitted to the same queue.
Working with Application Profiles Remove an application profile Prerequisites: Before removing an application profile, make sure there are no pending jobs associated with the application profile. If there are jobs in the application profile, use bmod -app to move pending jobs to another application profile, then remove the application profile. Running jobs are not affected by removing the application profile associated with them. NOTE: You cannot remove a default application profile.
Manage application profiles
4 Run badmin reconfig to reconfigure mbatchd.
Understanding successful application exit values
Jobs that exit with one of the exit codes specified by SUCCESS_EXIT_VALUES in an application profile are marked as DONE. These exit values are not counted in the EXIT_RATE calculation. 0 always indicates application success regardless of SUCCESS_EXIT_VALUES. If both SUCCESS_EXIT_VALUES and REQUEUE_EXIT_VALUES are defined, the job will be set to PEND state and requeued.
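For example, an application profile in lsb.applications that declares its own success exit codes might be sketched as follows (the profile name and exit values mirror the fluent examples later in this chapter):
Begin Application
NAME                = fluent
DESCRIPTION         = FLUENT jobs
SUCCESS_EXIT_VALUES = 230 222 12
End Application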
Working with Application Profiles Use application profiles Submit jobs to application profiles Use the -app option of bsub to specify an application profile for the job. 1 Run bsub -app to submit jobs to an application profile. bsub -app fluent -q overnight myjob LSF rejects the job if the specified application profile does not exist. Modify the application profile associated with a job Prerequisites: You can only modify the application profile for pending jobs.
Use application profiles Kills all jobs associated with the application profile fluent for the current user.
Working with Application Profiles
View application profile information
To view the...                                                                 Run...
Available application profiles                                                bapp
Detailed application profile information                                      bapp -l
Jobs associated with an application profile                                   bjobs -l -app application_profile_name
Accounting information for all jobs associated with an application profile    bacct -l -app application_profile_name
Job success and requeue exit code information                                 bapp -l, bacct -l, bhist -l, bjobs -l
View available application profiles
1 Run bapp.
View application profile information
9
FILELIMIT   DATALIMIT   STACKLIMIT   CORELIMIT   MEMLIMIT   SWAPLIMIT   PROCESSLIMIT   THREADLIMIT
800 K       500         100 K        900 K       700 K      300 K       1000 K         400
RERUNNABLE: Y
CHUNK_JOB_SIZE: 5
View jobs associated with application profiles
Run bjobs -l -app application_profile_name.
Working with Application Profiles , Queue , Command Wed May 31 16:52:42: Submitted from host , CWD <$HOME/src/mainline/lsbatch /cmd>; Wed May 31 16:52:48: Dispatched to 10 Hosts/Processors <10*hostA> Wed May 31 16:52:48: Completed . Accounting information about this job: CPU_T WAIT TURNAROUND STATUS HOG_FACTOR MEM SWAP 0.02 6 6 done 0.0035 2M 5M -----------------------------------------------------------------------------...
View application profile information PARAMETERS: SUCCESS_EXIT_VALUES: 230 222 12 ... 3 Run bhist -l to show command-line specified requeue exit values with bsub and modified requeue exit values with bmod.
Working with Application Profiles How application profiles interact with queue and job parameters Application profiles operate in conjunction with queue and job-level options. In general, you use application profile definitions to refine queue-level settings, or to exclude some jobs from queue-level parameters.
How application profiles interact with queue and job parameters Processor limits PROCLIMIT in an application profile specifies the maximum number of slots that can be allocated to a job. For parallel jobs, PROCLIMIT is the maximum number of processors that can be allocated to the job. You can optionally specify the minimum and default number of processors.
Working with Application Profiles If CHUNK_JOB_DURATION is set in lsb.params, chunk jobs are accepted regardless of the value of CPULIMIT, RUNLIMIT or RUNTIME. Rerunnable jobs RERUNNABLE in an application profile overrides queue-level job rerun, and allows you to submit rerunnable jobs to a non-rerunnable queue. Job-level rerun (bsub -r or bsub -rn) overrides both the application profile and the queue.
How application profiles interact with queue and job parameters ◆ Job chunking ◆ Advanced reservation ◆ SLA ◆ Slot reservation ◆ Backfill Define a runtime estimate Define the RUNTIME parameter at the application level. Use the bsub -We option at the job level. You can specify the runtime estimate as hours and minutes, or minutes only.
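For instance, an estimate of 3 hours 30 minutes normalized to a particular host model might be written in lsb.applications as the following sketch (the value is illustrative; the model name matches the normalization example that follows):
RUNTIME = 3:30/Ultra5S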
Working with Application Profiles LSF calculates the normalized run time using the CPU factor defined for host model Ultra5S. TIP: Use lsinfo to see host name and host model information. Guidelines for defining a runtime estimate 1 You can define an estimated run time, along with a runtime limit (job level with bsub -W, application level with RUNLIMIT in lsb.applications, or queue level with RUNLIMIT in lsb.queues).
How application profiles interact with queue and job parameters
◆ Job runtime estimate T1; job run limit T2<=T1*ratio; queue default run limit T5; queue hard run limit T6 (no application-level values): the job is accepted; jobs running longer than T2 are killed; if T2>T6, the job is rejected.
◆ Job runtime estimate T1; application runtime estimate T3; application run limit T4 (no job run limit or queue limits): the job is accepted; jobs running longer than T1*ratio are killed; T2 overrides T4, or T1*ratio overrides T4; T1 overrides T3.
◆ Job runtime estimate T1; queue default run limit T5; queue hard run limit T6 (no job run limit or application-level values): the job is accepted; jobs runnin
C H A P T E R 22 Resource Allocation Limits Contents ◆ About Resource Allocation Limits on page 388 ◆ Configuring Resource Allocation Limits on page 393 ◆ Viewing Information about Resource Allocation Limits on page 402 Administering Platform LSF 387
About Resource Allocation Limits About Resource Allocation Limits Contents ◆ What resource allocation limits do on page 388 ◆ How LSF enforces limits on page 389 ◆ How LSF counts resources on page 389 ◆ Limits for resource consumers on page 391 What resource allocation limits do By default, resource consumers like users, hosts, queues, or projects are not limited in the resources available to them for running jobs. Resource allocation limits configured in lsb.
Resource Allocation Limits 604 user1 RUN normal hostA sleep 100 Aug 12 14:09 Resource allocation Resource allocation limits are not the same as resource usage limits, which are limits and resource enforced during job run time. For example, you set CPU limits, memory limits, and other limits that take effect after a job starts running. See Chapter 34, “Runtime usage limits Resource Usage Limits” for more information.
About Resource Allocation Limits Job limits Job limits, specified by JOBS in a Limit section in lsb.resources, correspond to the maximum number of running and suspended jobs that can run at any point in time. If both job limits and job slot limits are configured, the most restrictive limit is applied. Resource reservation and backfill When processor or memory reservation occurs, the reserved resources count against the limits for users, queues, hosts, projects, and processors.
Resource Allocation Limits bjobs shows 1 job running in the short queue, and two jobs running in the normal queue: bjobs JOBID USER STAT QUEUE FROM_HOST 17 user1 RUN normal 18 user1 PEND 19 user1 16 user1 20 21 EXEC_HOST JOB_NAME SUBMIT_TIME hosta hosta sleep 1000 Aug 30 16:26 normal hosta sleep 1000 Aug 30 16:26 RUN short hosta hosta sleep 1000 Aug 30 16:26 RUN normal hosta hosta sleep 1000 Aug 30 16:26 user1 PEND short hosta sleep 1000 Aug 30 16:26 user1 PEND
About Resource Allocation Limits
Limits for users and user groups
Jobs are normally queued on a first-come, first-served (FCFS) basis. It is possible for some users to abuse the system by submitting a large number of jobs; jobs from other users must wait until these jobs complete. Limiting resources by user prevents users from monopolizing all the resources.
Resource Allocation Limits Configuring Resource Allocation Limits Contents ◆ lsb.resources file on page 393 ◆ Enable resource allocation limits on page 394 ◆ Configure cluster-wide limits on page 394 ◆ Compatibility with pre-version 7 job slot limits on page 394 ◆ How resource allocation limits map to pre-version 7 job slot limits on page 395 ◆ How conflicting limits are resolved on page 396 ◆ Example limit configurations on page 398 lsb.
Configuring Resource Allocation Limits Consumer parameters For jobs submitted ... Set in a Limit section of lsb.resources ...
Resource Allocation Limits How resource allocation limits map to pre-version 7 job slot limits Job slot limits are the only type of limit you can configure in lsb.users, lsb.hosts, and lsb.queues. You cannot configure limits for user groups, host groups, and projects in lsb.users, lsb.hosts, and lsb.queues. You should not configure any new resource allocation limits in lsb.users, lsb.hosts, and lsb.queues. Use lsb.resources to configure all new resource allocation limits, including job slot limits.
Configuring Resource Allocation Limits How conflicting limits are resolved LSF handles two kinds of limit conflicts: ◆ Similar conflicting limits ◆ Equivalent conflicting limits Similar conflicting limits For similar limits configured in lsb.resources, lsb.users, lsb.hosts, or lsb.queues, the most restrictive limit is used. For example, a slot limit of 3 for all users is configured in lsb.
Resource Allocation Limits 825 user1 RUN normal hostA hostA sleep 1000 Jan 22 16:38 826 user1 RUN normal hostA hostA sleep 1000 Jan 22 16:38 827 user1 PEND normal hostA sleep 1000 Jan 22 16:38 Only one job (827) remains pending because the more restrictive limit of 3 in lsb.
Configuring Resource Allocation Limits Reservation and backfill Reservation and backfill are still made at the job slot level, but despite a slot reservation being satisfied, the job may ultimately not run because the JOBS limit has been reached. This similar to a job not running because a license is not available. Other jobs ◆ brun forces a pending job to run immediately on specified hosts. A job forced to run with brun is counted as a running job, which may violate JOBS limits.
Resource Allocation Limits ◆ Each other queue can run 30 jobs, each queue using up to 300 MB of memory in total Begin Limit HOSTS SLOTS MEM PER_QUEUE license1 10 - normal license1 - 200 short license1 30 300 (all ~normal ~short) End Limit Example 4 All users in user group ugroup1 except user1 using queue1 and queue2 and running jobs on hosts in host group hgroup1 are limited to 2 job slots per processor on each host: Begin Limit NAME = limit1 # Resources: SLOTS_PER_PROCESSOR = 2 #Consum
Configuring Resource Allocation Limits Example 7 All users in user group ugroup1 except user1 can use all queues but queue1 and run jobs with a limit of 10% of available memory on each host in host group hgroup1: Begin Limit NAME = 10_percent_mem # Resources: MEM = 10% QUEUES = all ~queue1 USERS = ugroup1 ~user1 PER_HOST = hgroup1 End Limit Example 8 Limit users in the develop group to 1 job on each host, and 50% of the memory on the host.
Resource Allocation Limits Example 11 Limit all hosts to 1 job slot per processor: Begin Limit NAME = default_limit SLOTS_PER_PROCESSOR = 1 PER_HOST = all End Limit Example 12 The short queue can have at most 200 running and suspended jobs: Begin Limit NAME = shortq_limit QUEUES = short JOBS = 200 End Limit Administering Platform LSF 401
Viewing Information about Resource Allocation Limits Viewing Information about Resource Allocation Limits Your job may be pending because some configured resource allocation limit has been reached. Use the blimits command to show the dynamic counters of resource allocation limits configured in Limit sections in lsb.resources. blimits displays the current resource usage to show what limits may be blocking your job.
Resource Allocation Limits Examples For the following limit definitions: Begin Limit NAME = limit1 USERS = user1 PER_QUEUE = all PER_HOST = hostA hostC TMP = 30% SWP = 50% MEM = 10% End Limit Begin Limit NAME = limit_ext1 PER_HOST = all RESOURCE = ([user1_num,30] [hc_num,20]) End Limit Begin Limit NAME = limit2 QUEUES = short JOBS = 200 End Limit blimits displays the following: blimits INTERNAL RESOURCE LIMITS: NAME USERS QUEUES HOSTS PROJECTS SLOTS MEM limit1 user1 q2 hostA@cluster1 - - 10/25
Viewing Information about Resource Allocation Limits ◆ In limit policy limit_ext1, external resource user1_num is limited to 30 per host and external resource hc_num is limited to 20 per host. Again, no limits have been reached, so the jobs requesting those resources should run. ◆ In limit policy limit2, the short queue can have at most 200 running and suspended jobs. 50 jobs are running or suspended against the 200 job limit. The limit has not been reached, so jobs can run in the short queue.
C H A P T E R 23 Reserving Resources Contents ◆ About Resource Reservation on page 405 ◆ Using Resource Reservation on page 406 ◆ Memory Reservation for Pending Jobs on page 408 ◆ Time-based Slot Reservation on page 410 ◆ Viewing Resource Reservation Information on page 417 About Resource Reservation When a job is dispatched, the system assumes that the resources that the job consumes will be reflected in the load information.
Using Resource Reservation For example: bsub -R "rusage[tmp=30:duration=30:decay=1]" myjob will reserve 30 MB of temp space for the job. As the job runs, the amount reserved will decrease at approximately 1 MB/minute such that the reserved amount is 0 after 30 minutes. Queue-level and job-level resource reservation The queue level resource requirement parameter RES_REQ may also specify the resource reservation.
Reserving Resources This will allow a job to be scheduled on any host that the queue is configured to use. The job will attempt to reserve 20 MB of memory, or 10 MB of memory and 20 MB of swap if the 20 MB of memory is unavailable. Job-level resource reservation 1 To specify resource reservation at the job level, use bsub -R and include the resource usage section in the resource requirement string.
Memory Reservation for Pending Jobs Memory Reservation for Pending Jobs About memory reservation for pending jobs By default, the rusage string reserves resources for running jobs. Because resources are not reserved for pending jobs, some memory-intensive jobs could be pending indefinitely because smaller jobs take the resources immediately before the larger jobs can start running. The more memory a job requires, the worse the problem is.
Reserving Resources Example queues The following queue enables memory reservation for pending jobs: Begin Queue QUEUE_NAME = reservation DESCRIPTION = For resource reservation PRIORITY=40 RESOURCE_RESERVE = MAX_RESERVE_TIME[20] End Queue Use memory reservation for pending jobs 1 Use the rusage string in the -R option to bsub or the RES_REQ parameter in lsb.queues to specify the amount of memory required for the job. Submit the job to a queue with RESOURCE_RESERVE configured.
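For example, a submission that uses the reservation queue above might look like this sketch (the memory amount follows the example that comes next):
bsub -R "rusage[mem=400]" -q reservation myjob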
Time-based Slot Reservation Examples lsb.queues The following queues are defined in lsb.queues: Begin Queue QUEUE_NAME = reservation DESCRIPTION = For resource reservation PRIORITY=40 RESOURCE_RESERVE = MAX_RESERVE_TIME[20] End Queue Assumptions Assume one host in the cluster with 10 CPUs and 1 GB of free memory currently available. Sequential jobs Each of the following sequential jobs requires 400 MB of memory and runs for 300 minutes.
Reserving Resources Time-based slot reservation vs. greedy slot reservation With time-based reservation, a set of pending jobs will get future allocation and an estimated start time, so that job can be placed in the future. Reservations are made based on future allocation to guarantee estimated start time. Start time and future allocation The estimated start time for a future allocation is the earliest start time when all considered job constraints are satisfied in the future.
Time-based Slot Reservation Start time prediction Time-based reservation Greedy reservation Absolute predicted start time for all jobs Advance reservation considered No No No No Greedy reservation example A cluster has four hosts: A, B, C and D, with 4 CPUs each. Four jobs are running in the cluster: Job1, Job2, Job3 and Job4. According to calculated job estimated start time, the job finish times (FT) have this order: FT(Job2) < FT(Job4) < FT(Job1) < FT(Job3).
Reserving Resources Some scheduling examples 1 Job5 requests –n 6 –R “span[ptile=2]”, which will require three hosts with 2 CPUs on each host. As in the greedy slot reservation example, four jobs are running in the cluster: Job1, Job2, Job3 and Job4.
Time-based Slot Reservation 5 Job5 can now be placed with 2 CPUs on host A, 2 CPUs on host C, and 2 CPUs on host D. The estimated start time is shown as the finish time of Job1: Assumptions and limitations ◆ To get an accurate estimated start time, you must specify a run limit at the job level using the bsub -W option, in the queue by configuring RUNLIMIT in lsb.queues, or in the application by configuring RUNLIMIT in lsb.
Reserving Resources Memory request ◆ MXJ, JL/U in lsb.hosts ◆ PJOB_LIMIT, HJOB_LIMIT, QJOB_LIMIT, UJOB_LIMIT in lsb.queues To request memory resources, configure RESOURCE_RESERVE in lsb.queues. When RESOURCE_RESERVE is used, LSF will consider memory and slot requests during time-based reservation calculation. LSF will not reserve slot or memory if any other resources are not satisfied.
Time-based Slot Reservation Reservation scenarios Scenario 1 Even though no running jobs finish and no host status in cluster are changed, a job’s future allocation may still change from time to time. Why this happens Each scheduling cycle, the scheduler recalculates a job’s reservation information, estimated start time and opportunity for future allocation. The job candidate host list may be reordered according to current load.
Reserving Resources Job 3876 is submitted and requests -n 6 -ext "RMS[nodes=3]". bjobs -l 3876 Job <3876>, User , Project , Status , Queue , Extsch ed , Command Fri Apr 22 15:35:28: Submitted from host , CWD <$HOME>, 6 Processors R equested; RUNLIMIT 840.0 min of sierraa Fri Apr 22 15:35:46: Reserved <4> job slots on host(s) <4*sierrab>; Sat Apr 23 01:34:12: Estimated job start time; rms_alloc=2*sierra[0,2-3] ...
Viewing Resource Reservation Information View queue-level resource information (bqueues) 1 Use bqueues -l to see the resource usage configured at the queue level.
Reserving Resources
[tail of the bqueues -l scheduling parameters display: loadSched and loadStop values for cpuspeed and bandwidth]
View per-resource reservation (bresources)
1 Use bresources to display per-resource reservation configurations from lsb.resources.
C H A P T E R 24 Advance Reservation Contents ◆ Understanding Advance Reservations on page 422 ◆ Configure Advance Reservation on page 424 ◆ Using Advance Reservation on page 426 Administering Platform LSF 421
Understanding Advance Reservations Understanding Advance Reservations Advance reservations ensure access to specific hosts during specified times. During the time that an advanced reservation is active only users or groups associated with the reservation have access to start new jobs on the reserved hosts. Only LSF administrators or root can create or delete advance reservations. Any LSF user can view existing advance reservations.
Advance Reservation If a non-advance reservation job is submitted while the open reservation is active, it remains pending until the reservation expires. Any advance reservation jobs that were suspended and became normal jobs when the reservation expired are resumed first before dispatching the non-advance reservation job submitted while the reservation was active.
Configure Advance Reservation Configure Advance Reservation Enable advance reservation 1 To enable advance reservation in your cluster, make sure the advance reservation scheduling plugin schmod_advrsv is configured in lsb.modules. Begin PluginModule SCH_PLUGIN RB_PLUGIN SCH_DISABLE_PHASES schmod_default () () schmod_advrsv () () End PluginModule Allow users to create advance reservations By default, only LSF administrators or root can add or delete advance reservations.
Advance Reservation All users in user group ugroup1 except user1 can make advance reservations on any host in hgroup1, except hostB, between 8:00 p.m. and 8:00 a.m. every day: Begin ResourceReservation NAME = nightPolicy USERS = ugroup1 ~user1 HOSTS = hgroup1 ~hostB TIME_WINDOW = 20:00-8:00 End ResourceReservation IMPORTANT: The not operator (~) does not exclude LSF administrators from the policy.
Using Advance Reservation Using Advance Reservation Advance reservation commands Use the following commands to work with advance reservations: brsvadd Add a reservation brsvdel Delete a reservation brsvmod Modify a reservation brsvs View reservations Add reservations NOTE: By default, only LSF administrators or root can add or delete advance reservations. 1 Run brsvadd to create new advance reservations.
Advance Reservation ❖ The -R option selects hosts for the reservation according to a resource requirements string. Only hosts that satisfy the resource requirement expression are reserved. -R accepts any valid resource requirement string, but only the select string takes effect. If you also specify a host list with the -m option, -R is optional. Add a one-time reservation 1 Use the -b and -e options of brsvadd to specify the begin time and end time of a one-time advance reservation.
Using Advance Reservation The following command creates a one-time advance reservation that reserves 12 slots on hostA between 6:00 p.m. on 01 December 2003 and 6:00 a.m. on 31 January 2004: brsvadd -n 12 -m hostA -u user1 -b 2003:12:01:18:00 -e 2004:01:31:06:00 Reservation user1#2 is created Add a recurring reservation 1 Use the -t option of brsvadd to specify a recurring advance reservation. The -t option specifies a time window for the reservation.
Advance Reservation The following command creates a system reservation on hostA every Friday from 6:00 p.m. to 8:00 p.m.: brsvadd -n 1024 -m hostA -s -t "5:18:0-5:20:0" Reservation "system#0" is created While the system reservation is active, no other jobs can use the reserved hosts, and LSF does not dispatch jobs to the specified hosts.
Using Advance Reservation If a job already exists that references a reservation with the specified name, an error message is returned: The specified reservation name is referenced by a job. Modify an advance reservation 1 Use brsvmod to modify reservations. Specify the reservation ID for the reservation you want to modify. For example, run the following command to extend the duration from 6:00 a.m. to 9:00 a.m.
Advance Reservation ◆ Time means the time window of the reservation ◆ t1 is the begin time of the reservation ◆ t2 is the end time of the reservation ◆ The reservation size means the resources that are reserved, such as hosts (slots) or host groups Use brsvmod to shift, extend or reduce the time window horizontally; grow or shrink the size vertically. Extending the duration The following command creates a one-time advance reservation for 1024 job slots on host hostA for user user1 between 6:00 a.
Using Advance Reservation RSVID TYPE USER groupA#0 user groupA NCPUS RSV_HOSTS TIME_WINDOW 0/1024 hostA:0/256 3:3:0-3:3:0 * hostB:0/768 The following commands reserve 512 slots from each host for the reservation: brsvmod addhost -n 256 -m "hostA" groupA#0 Reservation "groupA#0" is modified brsvmod rmhost -n 256 -m "hostB" groupA#0 Reservation "groupA#0" is modified Removing hosts from a reservation allocation Use brsvmod rmhost to remove hosts or slots on hosts from the original reservation a
Advance Reservation Modifying closed reservations The following command creates an open advance reservation for 1024 job slots on host hostA for user user1 between 6:00 a.m. and 8:00 a.m. today. brsvadd -o -n 1024 -m hostA -u user1 -b 6:0 -e 8:0 Reservation "user1#0" is created Run the following command to close the reservation when it expires.
Using Advance Reservation Run the following command to disable the reservation instance that is active between Dec 1 and Dec 10, 2007.
Advance Reservation
The command …                        Checks policies for …
                                     Creator   Host   TimeWindow
brsvadd                              Yes       Yes    Yes
brsvdel                              No        No     No
brsvmod
  -u or -g (changing user)           No        No     No
  addhost                            Yes       Yes    Yes
  rmhost                             No        No     No
  -b, -e, -t (change timeWindow)     Yes       Yes    Yes
  -d (description)                   No        No     No
  -o or -on                          No        No     No
Reservation policies are checked when:
◆ Modifying the reservation time window
◆ Adding hosts to the reservation
Reservation policies are not checked when:
◆ Running brsvmod to remove host
Using Advance Reservation ◆ A one-time reservation displays fields separated by slashes (month/day/hour/minute). For example: 11/12/14/0-11/12/18/0 ◆ A recurring reservation displays fields separated by colons (day:hour:minute). An asterisk (*) indicates a recurring reservation. For example: 5:18:0-5:20:0 * Show a weekly planner 1 Use brsvs -p to show a weekly planner for specified hosts using advance reservation. The all keyword shows the planner for all hosts with reservations.
Advance Reservation
[Sample brsvs -p weekly planner output omitted: the planner lists time-of-day columns in 10-minute increments (for example 7:30, 7:40, 7:50, 8:0, ...) and shows the number of reserved slots on each host in each interval.]
Using Advance Reservation 2 Use brsvs -z instead of brsvs -p to show only the weekly items that have reservation configurations. Lines that show all zero (0) are omitted.
Advance Reservation
Reservation Status: Inactive
Reservation disabled for these dates:
Fri Feb 15 2008
Wed Feb 20 2008 - Mon Feb 25 2008
Next Active Period:
Sat Feb 16 10:00:00 2008 - Sat Feb 16 12:00:00 2008
Creator: user1
Reservation Type: CLOSED
RSVID    TYPE   USER    NCPUS   RSV_HOSTS    TIME_WINDOW
user1#6  user   user1   0/1     hostA:0/1    10:00-13:00 *
Reservation Status: Active
Creator: user1
Reservation Type: CLOSED
Show reservation ID
1 Use bjobs -l to show the reservation ID used by a job:
bjobs -l
Job <
Using Advance Reservation by jobs referencing user2#0, hostA is no longer available to other jobs using reservation user2#0. Any single user or user group can have a maximum of 100 reservation IDs. Jobs referencing the reservation are killed when the reservation expires. Modify job reservation ID Prerequisites: You must be an administrator to perform this task. 1 Use the -U option of bmod to change a job to another reservation ID.
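For example, a sketch of the modification (the job ID 1234 and the reservation ID user1#2 are illustrative):
bmod -U user1#2 1234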
Advance Reservation If an advance reservation is modified, preemption is done for an active advance reservation after every modification of the reservation when there is at least one pending or suspended job associated with the reservation. When slots are added to an advance reservation with brsvmod, LSF preempts running non-reservation jobs if necessary to provide slots for jobs belonging to the reservation.
Using Advance Reservation ◆ The user names of users who used the brsvadd command to create the advance reservations ◆ The user names of the users who can use the advance reservations (with bsub -U) ◆ Number of slots reserved ◆ List of hosts for which job slots are reserved ◆ Time window for the reservation. ❖ A one-time reservation displays fields separated by slashes (month/day/hour/minute).
Advance Reservation
RSVID    TYPE   CREATOR   USER    NCPUS   RSV_HOSTS
user1#1  user   user1     user1   2       hostA:2
Active time with this configuration: 0 hour 1 minute 34 second
------------------------ Configuration 4 ------------------------
RSVID    TYPE   CREATOR   USER    NCPUS   RSV_HOSTS
user1#1  user   user1     user1   1       hostA:2
Active time with this configuration: 0 hour 2 minute 30 second
The following reservation (user2#0) has one time modification during its lifetime.
C H A P T E R 25 Dispatch and Run Windows Contents ◆ Dispatch and Run Windows on page 445 ◆ Run Windows on page 445 ◆ Dispatch Windows on page 446 Dispatch and Run Windows Both dispatch and run windows are time windows that control when LSF jobs start and run. ◆ Dispatch windows can be defined in lsb.hosts. Dispatch and run windows can be defined in lsb.queues. ◆ Hosts can only have dispatch windows. Queues can have dispatch windows and run windows.
Dispatch Windows Jobs can be submitted to a queue at any time; if the run window is closed, the jobs remain pending until it opens again. If the run window is open, jobs are placed and dispatched as usual. When an open run window closes, running jobs are suspended, and pending jobs remain pending. The suspended jobs are resumed when the window opens again. Configure run windows 1 To configure a run window, set RUN_WINDOW in lsb.queues. For example, to specify that the run window will be open from 4:30 a.
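A minimal lsb.queues sketch (the queue name and the closing time of the window are illustrative):
Begin Queue
QUEUE_NAME = night
RUN_WINDOW = 4:30-12:00
...
End Queue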
Dispatch and Run Windows Configure dispatch windows Dispatch windows can be defined for both queues and hosts. The default is no restriction, or always open. Configure host dispatch windows 1 To configure dispatch windows for a host, set DISPATCH_WINDOW in lsb.hosts and specify one or more time windows. If no host dispatch window is configured, the window is always open. Configure queue dispatch windows 1 To configure dispatch windows for queues, set DISPATCH_WINDOW in lsb.queues and specify one or more time windows.
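For example, a queue definition in lsb.queues might include a dispatch window such as the following (the queue name and window are illustrative):
Begin Queue
QUEUE_NAME = daytime
DISPATCH_WINDOW = 8:00-18:00
...
End Queue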
C H A P T E R 26 Job Dependencies Contents ◆ Job Dependency Scheduling on page 449 ◆ Dependency Conditions on page 450 Job Dependency Scheduling About job dependency scheduling Sometimes, whether a job should start depends on the result of another job. For example, a series of jobs could process input data, run a simulation, generate images based on the simulation output, and finally, record the images on a high-resolution film output device.
Dependency Conditions The dependency expression is a logical expression composed of one or more dependency conditions. For syntax of individual dependency conditions, see Dependency Conditions on page 450. ◆ To make dependency expression of multiple conditions, use the following logical operators: ❖ && (AND) ❖ || (OR) ❖ ! (NOT) ◆ Use parentheses to indicate the order of operations, if necessary.
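For example, a compound dependency expression can combine several conditions (the job ID and job names are illustrative):
bsub -w 'done(312) && (started(Job2) || exit("99Job"))' myjob
The job myjob is not dispatched until job 312 finishes successfully, and either the job named Job2 has started or the job named 99Job has terminated abnormally.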
Job Dependencies ◆ started(job_ID | "job_name") done Syntax done(job_ID | "job_name") Description The job state is DONE. ended Syntax ended(job_ID | "job_name") Description The job state is EXIT or DONE. exit Syntax exit(job_ID | "job_name"[,[operator] exit_code]) where operator represents one of the following relational operators: > >= < <= == != Description The job state is EXIT, and the job’s exit code satisfies the comparison test.
Dependency Conditions Job ID or job name Syntax job_ID | "job_name" Description If you specify a job without a dependency condition, the test is for the DONE state (LSF assumes the “done” dependency condition by default). post_done Syntax post_done(job_ID | "job_name") Description The job state is POST_DONE (the post-processing of specified job has completed without errors).
Job Dependencies The submitted job will not start unless the job named 210 is finished. The numeric job name should be doubly quoted, since the UNIX shell treats -w "210" the same as -w 210, which would evaluate the job with the job ID of 210.
C H A P T E R 27 Job Priorities Contents ◆ User-Assigned Job Priority on page 455 ◆ Automatic Job Priority Escalation on page 457 ◆ Absolute Job Priority Scheduling on page 457 User-Assigned Job Priority User-assigned job priority provides controls that allow users to order their jobs in a queue. Job order is the first consideration to determine job eligibility for dispatch. Jobs are still subject to all scheduling policies regardless of job priority.
User-Assigned Job Priority Configure job priority Syntax 1 To configure user-assigned job priority edit lsb.params and define MAX_USER_PRIORITY. This configuration applies to all queues in your cluster. 2 Use bparams -l to display the value of MAX_USER_PRIORITY. MAX_USER_PRIORITY=max_priority Where: max_priority Specifies the maximum priority a user can assign to a job. Valid values are positive integers. Larger values represent higher priority; 1 is the lowest.
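For example (values are illustrative), in lsb.params:
MAX_USER_PRIORITY = 100
Users can then attach a priority to their own jobs at submission time:
bsub -sp 50 myjob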
Job Priorities bjobs -l [job_ID] Displays the current job priority and the job priority at submission time. Job priorities are changed by the job owner, LSF and queue administrators, and automatically when automatic job priority escalation is enabled.
Absolute Job Priority Scheduling When configured in a queue, APS sorts pending jobs for dispatch according to a job priority value calculated based on several configurable job-related factors. Each job priority weighting factor can contain subfactors. Factors and subfactors can be independently assigned a weight. APS provides administrators with detailed yet straightforward control of the job selection process. ◆ APS only sorts the jobs; job scheduling is still based on configured LSF scheduling policies.
Job Priorities Factors and subfactors Factors Subfactors Metric FS (user based fairshare factor) The existing fairshare feature tunes the dynamic user priority The fairshare factor automatically adjusts the APS value based on dynamic user priority. FAIRSHARE must be defined in the queue. The FS factor is ignored for non-fairshare queues. The FS factor is influenced by the following fairshare parameters in lsb.
Absolute Job Priority Scheduling Where LSF gets the job information for each factor Factor or subfactor Gets job information from ... MEM The value for jobs submitted with -R "rusage[mem]" SWAP The value for jobs submitted with -R "rusage[swp]" PROC The value of n for jobs submitted with bsub -n (min, max), or the value of PROCLIMIT in lsb.queues JPRIORITY The dynamic priority of the job, updated every scheduling cycle and escalated by interval defined in JOB_PRIORITY_OVER_TIME defined in lsb.
Job Priorities The system APS value set by bmod -aps is preserved after mbatchd reconfiguration or mbatchd restart. Use the ADMIN factor to adjust the APS value Administrators can use bmod -aps "admin=value" to change the calculated APS value for a pending job. The ADMIN factor is added to the calculated APS value to change the factor value. The absolute priority of the job is recalculated. The value cannot be zero (0).
Absolute Job Priority Scheduling
    Absolute Priority Scheduling factor string changed to : system=10;
Tue Feb 13 15:15:48: Parameters of Job are changed:
    Absolute Priority Scheduling factor string changed to : admin=20;
Tue Feb 13 15:15:58: Parameters of Job are changed:
    Absolute Priority Scheduling factor string deleted;
Summary of time in seconds spent in various states by Tue Feb 13 15:16:02
PEND     PSUSP    RUN      USUSP    SSUSP    UNKWN    TOTAL
36       0        0        0        0        0        36
Configure APS across multiple queues
Use QUEUE_GROUP
Job Priorities
PRIORITY = 30
NICE = 20
FAIRSHARE = USER_SHARES [[user1, 5000] [user2, 5000] [others, 1]]
APS_PRIORITY = WEIGHT [[JPRIORITY, 1] [QPRIORITY, 10] [FS, 100]]
QUEUE_GROUP = short
DESCRIPTION = For normal low priority jobs, running only if hosts are lightly loaded.
End Queue
The APS value is now calculated as:
APS_PRIORITY = 1 * (1 * job_priority + 10 * queue_priority) + 100 * user_priority
Example 3
Extending example 2, you now want to add swap space to the APS value calculation.
Absolute Job Priority Scheduling
22    User1   PEND   Short    HostA   myjob   Dec 21 14:30   (60)
2     User1   PEND   Short    HostA   myjob   Dec 21 11:00   360
12    User2   PEND   normal   HostB   myjob   Dec 21 14:30   355
4     User1   PEND   Short    HostA   myjob   Dec 21 14:00   270
5     User1   PEND   Idle     HostA   myjob   Dec 21 14:01   -
For job 2, APS = 10 * 20 + 1 * (50 + 220 * 5 /10) = 360
For job 12, APS = 10 * 30 + 1 * (50 + 10 * 5/10) = 355
For job 4, APS = 10 * 20 + 1 * (50 + 40 * 5 /10) = 270
View APS configuration for a queue
Job Priorities USERS: all HOSTS: all REQUEUE_EXIT_VALUES: 10 Feature interactions Fairshare The default user-based fairshare can be a factor in APS calculation by adding the FS factor to APS_PRIORITY in the queue. ◆ APS cannot be used together with DISPATCH_ORDER=QUEUE. ◆ APS cannot be used together with cross-queue fairshare (FAIRSHARE_QUEUES). The QUEUE_GROUP parameter replaces FAIRSHARE_QUEUES, which is obsolete in LSF 7.0.
Absolute Job Priority Scheduling ◆ Job priority Backfill scheduling Not affected. Advance reservation Not affected.
C H A P T E R 28 Job Requeue and Job Rerun Contents ◆ About Job Requeue on page 468 ◆ Automatic Job Requeue on page 469 ◆ Job-level automatic requeue on page 471 ◆ Reverse Requeue on page 472 ◆ Exclusive Job Requeue on page 473 ◆ User-Specified Job Requeue on page 474 ◆ Automatic Job Rerun on page 475 Administering Platform LSF 467
About Job Requeue About Job Requeue A networked computing environment is vulnerable to any failure or temporary conditions in network services or processor resources. For example, you might get NFS stale handle errors, disk full errors, process table full errors, or network connectivity problems. Your application can also be subject to external conditions such as software license problems, or an occasional failure due to a bug in your application.
Job Requeue and Job Rerun Automatic Job Requeue You can configure a queue to automatically requeue a job if it exits with a specified exit value. ◆ The job is requeued to the head of the queue from which it was dispatched, unless the LSB_REQUEUE_TO_BOTTOM parameter in lsf.conf is set. ◆ When a job is requeued, LSF does not save the output from the failed run. ◆ When a job is requeued, LSF does not notify the user by sending mail. ◆ A job terminated by a signal is not requeued.
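A minimal queue definition sketch (the queue name and the exit values 99 and 100 are illustrative; use the exit codes your applications return on transient failures):
Begin Queue
QUEUE_NAME = normal
REQUEUE_EXIT_VALUES = 99 100
...
End Queue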
Automatic Job Requeue When MAX_JOB_REQUEUE is set, if a job fails and its exit value falls into REQUEUE_EXIT_VALUES, the number of times the job has been requeued is set to 1 and the job is requeued. When the requeue limit is reached, the job is suspended with PSUSP status. If a job fails and its exit value is not specified in REQUEUE_EXIT_VALUES, the default requeue behavior applies.
Job Requeue and Job Rerun Job-level automatic requeue Use bsub -Q to submit a job that is automatically requeued if it exits with the specified exit values. Use spaces to separate multiple exit codes. The reserved keyword all specifies all exit codes. Exit codes are typically between 0 and 255. Use a tilde (~) to exclude specified exit codes from the list. Job-level requeue exit values override application-level and queue-level configuration of the parameter REQUEUE_EXIT_VALUES, if defined.
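For example (exit codes are illustrative):
bsub -Q "5 10" myjob
requeues the job if it exits with value 5 or 10, while
bsub -Q "all ~1" myjob
requeues the job on any exit value except 1.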
Reverse Requeue Reverse Requeue By default, if you use automatic job requeue, jobs are requeued to the head of a queue. You can have jobs requeued to the bottom of a queue instead. The job priority does not change. Configure reverse requeue You must already use automatic job requeue (REQUEUE_EXIT_VALUES in lsb.queues). To configure reverse requeue: 1 Set LSB_REQUEUE_TO_BOTTOM in lsf.conf to 1.
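A sketch of the two settings together (the exit value is illustrative):
In lsf.conf:
LSB_REQUEUE_TO_BOTTOM = 1
In lsb.queues:
REQUEUE_EXIT_VALUES = 99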
Job Requeue and Job Rerun Exclusive Job Requeue You can configure automatic job requeue so that a failed job is not rerun on the same host. Limitations ◆ If mbatchd is restarted, this feature might not work properly, since LSF forgets which hosts have been excluded. If a job ran on a host and exited with an exclusive exit code before mbatchd was restarted, the job could be dispatched to the same host again after mbatchd is restarted.
User-Specified Job Requeue User-Specified Job Requeue You can use brequeue to kill a job and requeue it. When the job is requeued, it is assigned the PEND status and the job’s new position in the queue is after other jobs of the same priority. Requeue a job 1 To requeue one job, use brequeue. ◆ You can only use brequeue on running (RUN), user-suspended (USUSP), or system-suspended (SSUSP) jobs. ◆ Users can only requeue their own jobs.
Job Requeue and Job Rerun Automatic Job Rerun Job requeue vs. job rerun Automatic job requeue occurs when a job finishes and has a specified exit code (usually indicating some type of failure). Automatic job rerun occurs when the execution host becomes unavailable while a job is running. It does not occur if the job itself fails. About job rerun When a job is rerun or restarted, it is first returned to the queue from which it was dispatched with the same options as the original job.
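Automatic rerun can be enabled at the queue level or per job; a sketch (the queue name is illustrative):
In lsb.queues:
Begin Queue
QUEUE_NAME = rerun_queue
RERUNNABLE = YES
...
End Queue
At the job level:
bsub -r myjob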
Automatic Job Rerun Submit a job as not rerunnable 1 To disable automatic job rerun at the job level, use bsub -rn. Disable post-execution for rerunnable jobs Running of post-execution commands upon restart of a rerunnable job may not always be desirable; for example, if the post-exec removes certain files, or does other cleanup that should only happen if the job finishes successfully. 1 476 Administering Platform LSF Use LSB_DISABLE_RERUN_POST_EXEC=Y in lsf.
C H A P T E R 29 Job Checkpoint, Restart, and Migration Job checkpoint and restart optimizes resource usage by enabling a non-interactive job to restart on a new host from the point at which the job stopped—checkpointed jobs do not have to restart from the beginning. Job migration facilitates load balancing by enabling users to move a job from one host to another while taking advantage of job checkpoint and restart functionality.
Checkpoint and restart options Checkpoint and restart options You can implement job checkpoint and restart at one of the following levels.
Job Checkpoint, Restart, and Migration ◆ If the administrator specifies an initial checkpoint period in an application profile, in minutes, the first checkpoint does not happen until the initial period has elapsed. LSF then creates a checkpoint file every chkpnt_period after the initial checkpoint period, during job execution.
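A sketch of an application profile in lsb.applications, assuming the CHKPNT_DIR, CHKPNT_INITPERIOD, and CHKPNT_PERIOD parameters (the values, in minutes, are illustrative):
Begin Application
NAME = chkpnt_app
CHKPNT_DIR = /share/chkpnt
CHKPNT_INITPERIOD = 240    # first checkpoint after 4 hours
CHKPNT_PERIOD = 120        # then checkpoint every 2 hours
End Application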
Checkpoint and restart executables brestart -q priority mydir 123 Job <456> is submitted to queue LSF assigns a new job ID of 456, submits the job to the queue named "priority," and restarts the job. Once job 456 is running, you can change the checkpoint period using the bchkpnt command: bchkpnt -p 360 456 Job <456> is being checkpointed NOTE: For a detailed description of the commands used with the job checkpoint and restart feature, see the Platform LSF Configuration Reference.
Job Checkpoint, Restart, and Migration Requirements To allow restart of a checkpointed job on a different host than the host on which the job originally ran, both the original and the new hosts must: ◆ Be binary compatible ◆ Run the same dot version of the operating system for predictable results ◆ Have network connectivity and read/execute permissions to the checkpoint and restart executables (in LSF_SERVERDIR by default) ◆ Have network connectivity and read/write permissions to the checkpoint dire
Job migration At the host level, in lsb.hosts: Begin Host HOST_NAME ... hostA ... End Host r1m pg MIG # Keywords 5.0 18 30 For example, in an application profile, in lsb.applications: Begin Application ... MIG=30 # Migration threshold set to 30 mins DESCRIPTION=Migrate suspended jobs after 30 mins ... End Application If you want to requeue migrated jobs instead of restarting or rerunning them, you can define the following parameters in lsf.
C H A P T E R 30 Chunk Job Dispatch Contents ◆ About Job Chunking on page 483 ◆ Configure Chunk Job Dispatch on page 484 ◆ Submitting and Controlling Chunk Jobs on page 486 About Job Chunking LSF supports job chunking, where jobs with similar resource requirements submitted by the same user are grouped together for dispatch. The CHUNK_JOB_SIZE parameter in lsb.queues and lsb.applications specifies the maximum number of jobs allowed to be dispatched together in a chunk job.
Configure Chunk Job Dispatch Configuring a special high-priority queue for short jobs is not desirable because users may be tempted to send all of their jobs to this queue, knowing that it has high priority. Configure Chunk Job Dispatch CHUNK_JOB_SIZE (lsb.queues) By default, CHUNK_JOB_SIZE is not enabled. 1 To configure a queue to dispatch chunk jobs, specify the CHUNK_JOB_SIZE parameter in the queue definition in lsb.queues.
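For example (the queue name, priority, and chunk size are illustrative):
Begin Queue
QUEUE_NAME = chunk
PRIORITY = 50
CHUNK_JOB_SIZE = 4    # dispatch up to 4 short jobs together in one chunk
End Queue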
Chunk Job Dispatch ◆ A queue-level CPU limit or run time limit is specified (CPULIMIT or RUNLIMIT in lsb.queues), and the values of the CPU limit, run time limit, and run time estimate are all less than or equal to the CHUNK_JOB_DURATION. Jobs are not chunked if: ◆ The CPU limit, run time limit, or run time estimate is greater than the value of CHUNK_JOB_DURATION, or ◆ No CPU limit, no run time limit, and no run time estimate are specified. The value of CHUNK_JOB_DURATION is displayed by bparams -l.
Submitting and Controlling Chunk Jobs Submitting and Controlling Chunk Jobs When a job is submitted to a queue or application profile configured with the CHUNK_JOB_SIZE parameter, LSF attempts to place the job in an existing chunk.
Chunk Job Dispatch Action (Command) Resume (bresume) Migrate (bmig) Switch queue (bswitch) Job State Effect on Job (State) WAIT RUN Job finishes (NJOBS-1, PEND -1) Entire chunk is resumed (RUN +1, USUSP -1) Removed from chunk Job is removed from the chunk and switched; all other WAIT jobs are requeued to PEND Only the WAIT job is removed from the chunk and switched, and requeued to PEND Job is checkpointed normally PEND Removed from the chunk to be scheduled later USUSP WAIT RUN WAIT Checkpoint (b
C H A P T E R 31 Job Arrays LSF provides a structure called a job array that allows a sequence of jobs that share the same executable and resource requirements, but have different input files, to be submitted, controlled, and monitored as a single unit. Using the standard LSF commands, you can also control and monitor individual jobs and groups of jobs submitted from a job array. After the job array is submitted, LSF independently schedules and dispatches the individual jobs.
Create a Job Array Syntax The bsub syntax used to create a job array follows: bsub -J "arrayName[indexList, ...]" myJob Where: -J "arrayName[indexList, ...]" Names and creates the job array. The square brackets, [ ], around indexList must be entered exactly as shown and the job array name specification must be enclosed in quotes. Commas (,) are used to separate multiple indexList entries. The maximum length of this specification is 255 characters.
Job Arrays By default, the maximum number of jobs in a job array is 1000, which means the maximum size of a job array can never exceed 1000 jobs. 1 To make a change to the maximum job array value, set MAX_JOB_ARRAY_SIZE in lsb.params to any positive integer between 1 and 2147483646. The maximum number of jobs in a job array cannot exceed the value set by MAX_JOB_ARRAY_SIZE.
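For example, setting the following in lsb.params (the value is illustrative) allows arrays of up to 10000 jobs:
MAX_JOB_ARRAY_SIZE = 10000
A submission such as bsub -J "myArray[1-10000]" myJob is then accepted.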
Passing Arguments on the Command Line Redirect standard input 1 Use the -i option of bsub and the %I variable when your executable reads from standard input. To use %I, all the input files must be named consistently with a variable part that corresponds to the indices of the job array. For example: input.1, input.2, input.3, ..., input.N For example, the following command submits a job array of 1000 jobs whose input files are named input.1, input.2, input.3, ..., input.
Job Arrays input.2, input.3, ..., input.1000 and located in the current working directory. The executable is being passed an argument that specifies the name of the input files: bsub -J "myArray[1-1000]" myJob -f input.\$LSB_JOBINDEX Job Array Dependencies Like all jobs in LSF, a job array can be dependent on the completion or partial completion of a job or another job array. A number of job-array-specific dependency conditions are provided by LSF.
Individual job status Display job array status 1 To display summary information about the currently running jobs submitted from a job array, use the -A option of bjobs. For example, a job array of 10 jobs with job ID 123: bjobs -A 123 JOBID ARRAY_SPEC OWNER NJOBS PEND DONE 123 myArra[1-10] user1 10 3 RUN EXIT SSUSP USUSP PSUSP 3 4 0 0 0 0 Individual job status Display current job status 1 To display the status of the individual jobs submitted from a job array, specify the job array job ID with bjobs.
Job Arrays 456[8] user1 456[9] user1 456[10]user1 *rray[8] *rray[9] *ray[10] 339 356 375 0 0 0 29 26 24 0 0 0 0 0 0 0 0 0 368 382 399 Specific job status Display the current status of a specific job 1 To display the current status of a specific job submitted from a job array, specify in quotes, the job array job ID and an index value with bjobs.
Requeuing a Job Array Control a whole array 1 To control the whole job array, specify the command as you would for a single job using only the job ID. For example, to kill a job array with job ID 123: bkill 123 Control individual jobs 1 To control an individual job submitted from a job array, specify the command using the job ID of the job array and the index value of the corresponding job. The job ID and index value must be enclosed in quotes.
Job Arrays Requeue Jobs in EXIT state 1 To requeue EXIT jobs use the -e option of brequeue. For example, the command brequeue -J "myarray[1-10]" -e 123 requeues jobs with job ID 123 and EXIT status. Requeue all jobs in an array regardless of job state 1 A submitted job array can have jobs that have different job states. To requeue all the jobs in an array regardless of any job’s state, use the -a option of brequeue.
Job Array Job Slot Limit Setting a job array job slot limit Set a job array slot limit at submission 1 Use the bsub command to set a job slot limit at the time of submission. To set a job array job slot limit of 100 jobs for a job array of 1000 jobs: bsub -J "job_array_name[1000]%100" myJob Set a job array slot limit after submission 1 Use the bmod command to set a job slot limit after submission.
C H A P T E R 32 Running Parallel Jobs Contents ◆ How LSF Runs Parallel Jobs on page 499 ◆ Preparing Your Environment to Submit Parallel Jobs to LSF on page 500 ◆ Submitting Parallel Jobs on page 500 ◆ Starting Parallel Tasks with LSF Utilities on page 501 ◆ Job Slot Limits For Parallel Jobs on page 502 ◆ Specifying a Minimum and Maximum Number of Processors on page 502 ◆ Specifying a First Execution Host on page 503 ◆ Controlling Processor Allocation Across Hosts on page 504 ◆ Runn
Preparing Your Environment to Submit Parallel Jobs to LSF Preparing Your Environment to Submit Parallel Jobs to LSF Getting the host list Some applications can take this list of hosts directly as a command line parameter. For other applications, you may need to process the host list. Example The following example shows a /bin/sh script that processes all the hosts in the host list, including identifying the host where the job script is executing.
Running Parallel Jobs submits myjob as a parallel job. The job is started when 4 job slots are available. Starting Parallel Tasks with LSF Utilities For simple parallel jobs you can use LSF utilities to start parts of the job on other hosts. Because LSF utilities handle signals transparently, LSF can suspend and resume all components of your job without additional programming. Running parallel tasks with lsgrun The simplest parallel job runs an identical copy of the executable on every host.
Job Slot Limits For Parallel Jobs ◆ Submit a job with a host list: bsub -n 4 blaunch -z "hostA hostB" myjob ◆ Submit a job with a host file: bsub -n 4 blaunch -u ./hostfile myjob ◆ Submit a job to an application profile bsub -n 4 -app pjob blaunch myjob Job Slot Limits For Parallel Jobs A job slot is the basic unit of processor allocation in LSF. A sequential job uses one job slot. A parallel job that has N components (tasks) uses N job slots, which can span multiple hosts.
Running Parallel Jobs At most, 16 processors can be allocated to this job. If there are less than 16 processors eligible to run the job, this job can still be started as long as the number of eligible processors is greater than or equal to 4. Specifying a First Execution Host In general, the first execution host satisfies certain resource requirements that might not be present on other available hosts.
Controlling Processor Allocation Across Hosts ◆ ❖ Become unavailable to the current job ❖ Remain available to other jobs as either regular or first execution hosts You cannot specify first execution host candidates when you use the brun command. If the first execution host is incorrect at job submission, the job is rejected. If incorrect configurations exist on the queue level, warning messages are logged and displayed when LSF starts, restarts or is reconfigured.
Running Parallel Jobs If PARALLEL_SCHED_BY_SLOT=Y in lsb.params, the span string is used to control the number of job slots instead of processors. Syntax The span string supports the following syntax: span[hosts=1] Indicates that all the processors allocated to this job must be on the same host. span[ptile=value] Indicates the number of processors on each host that should be allocated to the job, where value is one of the following: ◆ Default ptile value, specified by n processors.
Controlling Processor Allocation Across Hosts The following span strings are valid: same[type:model] span[ptile=LINUX:2,SGI:4] LINUX and SGI are both host types and can appear in the same span string. same[type:model] span[ptile=PC233:2,PC1133:4] PC233 and PC1133 are both host models and can appear in the same span string. You cannot mix host model and host type in the same span string.
Running Parallel Jobs Submits myjob to request 4 processors running on 2 hosts of type LINUX (2 processors per host), or a single host of type SGI, or for other host types, the predefined maximum job slot limit in lsb.hosts (MXJ). bsub -n 16 -R "type==any same[type] span[ptile='!',HP:8,SGI:8,LINUX:2]" myjob Submits myjob to request 16 processors on 2 hosts of type HP or SGI (8 processors per hosts), or on 8 hosts of type LINUX (2 processors per host), or the predefined maximum job slot limit in lsb.
Limiting the Number of Processors Allocated
This example reflects a network in which network connections among hosts in the same group are high-speed, and network connections between host groups are low-speed. In order to specify this, you create a custom resource hgconnect in lsf.shared.
Begin Resource
RESOURCENAME   TYPE     INTERVAL   INCREASING   RELEASE   DESCRIPTION
hgconnect      STRING   ()         ()           ()        (OS release)
...
End Resource
In the lsf.cluster.
Running Parallel Jobs 1 <= minimum <= default <= maximum You can specify up to three limits in the PROCLIMIT parameter: If you specify ... Then ... One limit It is the maximum processor limit. The minimum and default limits are set to 1. The first is the minimum processor limit, and the second is the maximum. The default is set equal to the minimum. The minimum must be less than or equal to the maximum.
Limiting the Number of Processors Allocated If you specify -n min_proc,max_proc, but do not specify a queue, the first queue that satisfies the processor requirements of the job is used. If no queue satisfies the processor requirements, the job is rejected. Example For example, queues with the following PROCLIMIT values are defined in lsb.
Running Parallel Jobs Example Description bsub -n 5 myjob The job myjob runs on 5 processors. The job myjob is rejected from the queue because the number of processors requested is less than the minimum number of processors configured for the queue (3). The job myjob runs on 4 or 5 processors. The job myjob runs on 3 to 6 processors. The job myjob runs on 4 to 8 processors. The default number of processors is equal to the minimum number (3). The job myjob runs on 3 processors.
Reserving Memory for Pending Parallel Jobs time starts from the time the first slot is reserved. When the reservation time expires, the job cannot reserve any slots for one scheduling cycle, but then the reservation process can begin again. If you specify first execution host candidates at the job or queue level, LSF tries to reserve a job slot on the first execution host. If LSF cannot reserve a first execution host job slot, it does not reserve slots on any other hosts.
Running Parallel Jobs Configuring memory reservation for pending parallel jobs Use the RESOURCE_RESERVE parameter in lsb.queues to reserve host memory for pending jobs, as described in Memory Reservation for Pending Jobs on page 408. lsb.queues 1 Set the RESOURCE_RESERVE parameter in a queue defined in lsb.queues. The RESOURCE_RESERVE parameter overrides the SLOT_RESERVE parameter.
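A minimal sketch of such a queue (the queue name and reservation time are illustrative):
Begin Queue
QUEUE_NAME = reservation
RESOURCE_RESERVE = MAX_RESERVE_TIME[25]
...
End Queue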
Backfill Scheduling: Allowing Jobs to Use Reserved Job Slots Backfill scheduling allows the reserved job slots to be used by small jobs that can run and finish before the large job starts. This improves the performance of LSF because it increases the utilization of resources. How backfilling works For backfill scheduling, LSF assumes that a job can run until its run limit expires. Backfill scheduling works most efficiently when all the jobs in the cluster have a run limit.
Running Parallel Jobs Limitations ◆ A job does not have an estimated start time immediately after mbatchd is reconfigured. Backfilling and job slot limits A backfill job borrows a job slot that is already taken by another job. The backfill job does not run at the same time as the job that reserved the job slot first. Backfilling can take place even if the job slot limits for a host or processor have been reached. Backfilling cannot take place if the job slot limits for users or queues have been reached.
Backfill Scheduling: Allowing Jobs to Use Reserved Job Slots View information about job start time 1 Use bjobs -l to view the estimated start time of a job. Using backfill on memory If BACKFILL is configured in a queue, and a run limit is specified with -W on bsub or with RUNLIMIT in the queue, backfill jobs can use the accumulated memory reserved by the other jobs, as long as the backfill job can finish before the predicted start time of the jobs with the reservation.
Running Parallel Jobs Job 2: Submitting a second job with the same requirements gets the same result. Job 3: Submitting a third job with the same requirements reserves one job slot and reserves all free memory, if the amount of free memory is between 20 MB and 200 MB (some free memory may be used by the operating system or other software). Job 4: bsub -W 400 -q backfill -R "rusage[mem=50]" myjob4 The job keeps pending, since memory is reserved by job 3 and it runs longer than job 1 and job 2.
Backfill Scheduling: Allowing Jobs to Use Reserved Job Slots Using interruptible backfill Interruptible backfill scheduling can improve cluster utilization by allowing reserved job slots to be used by low priority small jobs that are terminated when the higher priority large jobs are about to start.
Running Parallel Jobs Assumptions and limitations ◆ The interruptible backfill job holds the slot-reserving job start until its calculated start time, in the same way as a regular backfill job. The interruptible backfill job is killed when its run limit expires. ◆ Killing other running jobs prematurely does not affect the calculated run limit of an interruptible backfill job. Slot-reserving jobs do not start sooner.
Backfill Scheduling: Allowing Jobs to Use Reserved Job Slots You should configure REQUEUE_EXIT_VALUES for the queue so that resubmission is automatic. In order to terminate completely, jobs must have specific exit values: ◆ If jobs are checkpointible, use their checkpoint exit value. ◆ If jobs periodically save data on their own, use the SIGTERM exit value.
Running Parallel Jobs Displaying available slots for backfill jobs The bslots command displays slots reserved for parallel jobs and advance reservations. The available slots are not currently used for running jobs, and can be used for backfill jobs. The available slots displayed by bslots are only a snapshot of the slots currently not in use by parallel jobs or advance reservations. They are not guaranteed to be available at job submission.
Parallel Fairshare No reserved slots available. bslots -n 15 -W 30 SLOTS RUNTIME 15 40 minutes 18 30 minutes Display available slots for backfill jobs requiring a host with more than 500 MB of memory: bslots -R "mem>500" SLOTS RUNTIME 7 45 minutes 15 40 minutes Display the host names with available slots for backfill jobs: bslots -l SLOTS: 15 RUNTIME: 40 minutes HOSTS: 1*hostB 1*hostE 3*hostC ... 3*hostZ ... ... SLOTS: 15 RUNTIME: 30 minutes HOSTS: 2*hostA 1*hostB 3*hostC ... 1*hostX ... ...
Running Parallel Jobs
Configure parallel fairshare
To configure parallel fairshare so that the number of CPUs is considered when calculating dynamic priority for queue-level user-based fairshare:
NOTE: LSB_NCPU_ENFORCE does not apply to host-partition user-based fairshare. For host-partition user-based fairshare, the number of CPUs is automatically considered.
1 Set LSB_NCPU_ENFORCE=1 in lsf.conf.
2 Configure fairshare at the queue level as indicated in Fairshare Scheduling on page 295.
Optimized Preemption of Parallel Jobs How optimized preemption works When you run many parallel jobs in your cluster, and parallel jobs preempt other parallel jobs, you can enable a feature to optimize the preemption mechanism among parallel jobs. By default, LSF can over-preempt parallel jobs. When a high-priority parallel job preempts multiple low-priority parallel jobs, sometimes LSF preempts more low-priority jobs than are necessary to release sufficient job slots to start the high-priority job.
C H A P T E R 33 Submitting Jobs Using JSDL Contents ◆ Why Use JSDL? on page 525 ◆ Using JSDL Files with LSF on page 525 ◆ Collecting resource values using elim.jsdl on page 534 Why Use JSDL? The Job Submission Description Language (JSDL) provides a convenient format for describing job requirements. You can save a set of job requirements in a JSDL XML file, and then reuse that file as needed to submit jobs to LSF.
Using JSDL Files with LSF Table 1: Supported JSDL and POSIX extension elements Element bsub Description Option Example JobDefinition N/A Root element of the JSDL document. Contains the mandatory child element JobDescription. ... JobDescription -P High-level container element that holds more specific job description elements. JobName -J String used to name the job.
Submitting Jobs Using JSDL Element bsub Description Option Example ExclusiveExecution -x Boolean that designates whether the job must have exclusive access to the resources it uses. true OperatingSystemName -R A token type that contains the operating system name. LSF uses the external resource "osname." LINUX OperatingSystemVersion -R A token type that contains the operating system version.
Using JSDL Files with LSF Element bsub Description Option Example IndividualNetworkBandwidth -R Range value that specifies the bandwidth requirements of each resource, in bits per second (bps). LSF uses the external resource "bandwidth." 104857600.0 TotalCPUCount -n Range value that specifies the total number of CPUs required for the job. 2.
Submitting Jobs Using JSDL Element bsub Description Option Example URI -f Specifies the location used to stage in (Source) or stage out (Target) a file. For use with LSF, the URI must be a file path only, without a protocol. Target N/A Contains the location of the file or directory on the remote system. In LSF, the file location is specified by the URI element. The file is staged out after the job is executed. //input/myjobs/control.
Using JSDL Files with LSF Element bsub Description Option Environment N/A and value of an environment variable /bin/bash defined for the job in the execution environment. LSF maps the JSDL element definitions to the matching LSF environment variables.
Submitting Jobs Using JSDL Element bsub Description Option Example ProcessCountLimit -p Positive integer that 8 specifies the maximum number of processes the job can spawn. VirtualMemoryLimit -v Positive integer that 134217728 specifies the maximum amount of virtual memory the job can allocate, in bytes.
Using JSDL Files with LSF Element bsub Option Description UserPriority -sp Positive integer that specifies the user-assigned job priority. This allows users to order their own jobs within a queue. ServiceClass -sla String that specifies the service class where the job is to run. Group -G String that associates the job with the specified group for fairshare scheduling. ExternalScheduler -ext [sched] String used to set application-specific external scheduling options for the job.
Submitting Jobs Using JSDL Element bsub Option Description SignalJob -s String that specifies the signal to send when a queue-level run window closes. Use this to override the default signal that suspends jobs running in the queue. WarningAction -wa String that specifies the job action prior to the job control action. Requires that you also specify the job action warning time.
Collecting resource values using elim.jsdl Submit a job using a JSDL file 1 To submit a job using a JSDL file, use one of the following bsub command options: a To submit a job that uses elements included in the LSF extension, use the -jsdl option. b To submit a job that uses only standard JSDL elements and POSIX extensions, use the -jsdl_strict option.
Submitting Jobs Using JSDL 4 ◆ cpuarch ◆ cpuspeed ◆ bandwidth To propagate the changes through the LSF system, run the following commands. a lsadmin reconfig b badmin mbdrestart You have now configured LSF to use the elim.jsdl file to collect JSDL resources.
Collecting resource values using elim.
P A R T V Controlling Job Execution ◆ Runtime Resource Usage Limits on page 539 ◆ Load Thresholds on page 553 ◆ Pre-Execution and Post-Execution Commands on page 559 ◆ Job Starters on page 567 ◆ External Job Submission and Execution Controls on page 573 ◆ Configuring Job Controls on page 583 Administering Platform LSF 537
C H A P T E R 34 Runtime Resource Usage Limits Contents ◆ About Resource Usage Limits on page 539 ◆ Specifying Resource Usage Limits on page 543 ◆ Supported Resource Usage Limits and Syntax on page 545 ◆ CPU Time and Run Time Normalization on page 551 ◆ PAM resource limits on page 552 About Resource Usage Limits Resource usage limits control how much resource can be consumed by running jobs.
About Resource Usage Limits Resource usage limits and resource allocation limits Resource usage limits are not the same as resource allocation limits, which are enforced during job scheduling and before jobs are dispatched. You set resource allocation limits to restrict the amount of a given resource that must be available during job scheduling for different classes of jobs to start, and which resource consumers the limits apply to. See Chapter 22, “Resource Allocation Limits” for more information.
Runtime Resource Usage Limits If ... Then ... The default limit is not correct The default is ignored and the maximum limit is enforced The maximum is ignored and the resource has no maximum limit, only a default limit Both default and maximum limits are specified, and the maximum is not correct Both default and maximum limits are not correct The default and maximum are ignored and no limit is enforced Resource usage limits specified at job submission must be less than the maximum specified in lsb.
About Resource Usage Limits The limit unit specified by LSF_UNIT_FOR_LIMITS also applies to limits modified with bmod, and to the display of resource usage limits in query commands (bacct, bapp, bhist, bhosts, bjobs, bqueues, lsload, and lshosts). IMPORTANT: Before changing the units of your resource usage limits, you should completely drain the cluster of all workload. There should be no running, pending, or finished jobs in the system.
Runtime Resource Usage Limits Specifying Resource Usage Limits Queues can enforce resource usage limits on running jobs. LSF supports most of the limits that the underlying operating system supports. In addition, LSF also supports a few limits that the underlying operating system does not support. Specify queue-level resource usage limits using parameters in lsb.queues. Specifying queue-level resource usage limits Limits configured in lsb.queues apply to all jobs submitted to the queue.
Specifying Resource Usage Limits Host specification with two limits ◆ PROCESSLIMIT ◆ RUNLIMIT ◆ THREADLIMIT If default and maximum limits are specified for CPU time limits or run time limits, only one host specification is permitted.
Runtime Resource Usage Limits Specify job-level resource usage limits 1 To specify resource usage limits at the job level, use one of the following bsub options: ❖ -C core_limit ❖ -c cpu_limit ❖ -D data_limit ❖ -F file_limit ❖ -M mem_limit ❖ -p process_limit ❖ -W run_limit ❖ -S stack_limit ❖ -T thread_limit ❖ -v swap_limit Job-level resource usage limits specified at job submission override the queue definitions.
Supported Resource Usage Limits and Syntax When the job accumulates the specified amount of CPU time, a SIGXCPU signal is sent to all processes belonging to the job. If the job has no signal handler for SIGXCPU, the job is killed immediately. If the SIGXCPU signal is handled, blocked, or ignored by the application, then after the grace period expires, LSF sends SIGINT, SIGTERM, and SIGKILL to the job to kill it.
Runtime Resource Usage Limits On AIX, if the XPG_SUS_ENV=ON environment variable is set in the user's environment before the process is executed and a process attempts to set the limit lower than current usage, the operation fails with errno set to EINVAL. If the XPG_SUS_ENV environment variable is not set, the operation fails with errno set to EFAULT. The default is no soft limit. File size limit Job syntax (bsub) Queue syntax (lsb.
Supported Resource Usage Limits and Syntax OS memory limit enforcement OS enforcement usually allows the process to eventually run to completion. LSF passes mem_limit to the OS, which uses it as a guide for the system scheduler and memory allocator. The system may allocate more memory to a process if there is a surplus. When memory is low, the system takes memory from and lowers the scheduling priority (re-nice) of a process that has exceeded its declared mem_limit.
Runtime Resource Usage Limits Normalized run time The run time limit is normalized according to the CPU factor of the submission host and execution host. The run limit is scaled so that the job has approximately the same run time for a given run limit, even if it is sent to a host with a faster or slower CPU.
Examples Sets a per-process (soft) stack segment size limit for all of the processes belonging to a job. By default, the limit is specified in KB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to specify a larger unit for the limit (MB, GB, TB, PB, or EB). An sbrk() call to extend the stack segment beyond the stack limit causes the process to be terminated. The default is no soft limit. Virtual memory (swap) limit Job syntax (bsub) Queue syntax (lsb.queues)
Runtime Resource Usage Limits Job-level limits bsub -M 5000 myjob Submits myjob with a memory limit of 5000 KB. bsub -W 14 myjob myjob is expected to run for 14 minutes. If the run limit specified with bsub -W exceeds the value for the queue, the job is rejected. bsub -T 4 myjob Submits myjob with a maximum number of concurrent threads of 4.
PAM resource limits 2 DEFAULT_HOST_SPEC is configured in lsb.params 3 If DEFAULT_HOST_SPEC is not configured in lsb.queues or lsb.params, the host with the largest CPU factor is used. CPU time display (bacct, bhist, bqueues) Normalized CPU time is displayed in the output of bqueues. CPU time is not normalized in the output of bacct and bhist. PAM resource limits PAM limits are system resource limits defined in limits.conf. ◆ Windows: Not applicable ◆ Linux: /etc/pam.
C H A P T E R 35 Load Thresholds Contents ◆ Automatic Job Suspension on page 553 ◆ Suspending Conditions on page 554 Automatic Job Suspension Jobs running under LSF can be suspended based on the load conditions on the execution hosts. Each host and each queue can be configured with a set of suspending conditions. If the load conditions on an execution host exceed either the corresponding host or queue suspending conditions, one or more jobs running on that host are suspended to reduce the load.
Suspending Conditions When jobs are running on a host, LSF periodically checks the load levels on that host. If any load index exceeds the corresponding per-host or per-queue suspending threshold for a job, LSF suspends the job. The job remains suspended until the load levels satisfy the scheduling thresholds. At regular intervals, LSF gets the load levels for that host. The period is defined by the SBD_SLEEP_TIME parameter in the lsb.params file.
Load Thresholds The load indices most commonly used for suspending conditions are the CPU run queue lengths (r15s, r1m, and r15m), paging rate (pg), and idle time (it). The (swp) and (tmp) indices are also considered for suspending jobs. To give priority to interactive users, set the suspending threshold on the it (idle time) load index to a non-zero value. Jobs are stopped when any user is active, and resumed when the host has been idle for the time given in the it scheduling condition.
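For example, a queue might pair scheduling and suspending thresholds as loadSched/loadStop values (the numbers are illustrative):
Begin Queue
QUEUE_NAME = normal
r1m = 0.7/2.0    # dispatch below 0.7, suspend above 2.0
pg = 4.0/8.0     # dispatch below 4.0 pages/sec, suspend above 8.0
...
End Queue
Jobs are dispatched only while the load is below the first value of each pair, and running jobs are suspended when the load exceeds the second value.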
Suspending Conditions ◆ Configure load thresholds consistently across queues. If a low priority queue has higher suspension thresholds than a high priority queue, then jobs in the higher priority queue are suspended before jobs in the low priority queue. Configuring load thresholds at host level A shared resource cannot be used as a load threshold in the Hosts section of the lsf.cluster.cluster_name file.
Load Thresholds When LSF automatically resumes a job, it invokes the RESUME action. The default action for RESUME is to send the signal SIGCONT. If there are any suspended jobs on a host, LSF checks the load levels in each dispatch turn. If the load levels are within the scheduling thresholds for the queue and the host, and all the resume conditions for the queue (RESUME_COND in lsb.queues) are satisfied, the job is resumed.
Suspending Conditions 558 Administering Platform LSF
C H A P T E R 36 Pre-Execution and Post-Execution Commands Jobs can be submitted with optional pre- and post-execution commands. A pre- or post-execution command is an arbitrary command to run before the job starts or after the job finishes. Pre- and post-execution commands are executed in a separate environment from the job.
About Pre-Execution and Post-Execution Commands About Pre-Execution and Post-Execution Commands Each batch job can be submitted with optional pre- and post-execution commands. Pre- and post-execution commands can be any executable command lines to be run before a job is started or after a job finishes. Some batch jobs require resources that LSF does not directly support.
Pre-Execution and Post-Execution Commands Post-execution commands If a post-execution command is specified, then the command is run after the job is finished regardless of the exit state of the job. Post-execution commands are typically used to clean up some state left by the pre-execution and the job execution. LSF supports job-level, queue-level, and application-level (lsb.applications) post-execution.
Configuring Pre- and Post-Execution Commands Configuring Pre- and Post-Execution Commands Pre-execution commands can be configured at the job level, in queues, or in application profiles. Post-execution commands can be configured at the job level, in queues or in application profiles. Job-level commands Job-level pre-execution and post-execution commands require no configuration. Use the bsub -E option to specify an arbitrary command to run before the job starts.
Pre-Execution and Post-Execution Commands Example ◆ If both queue and job-level pre-execution commands are specified, the job-level pre-execution is run after the queue-level pre-execution command. ◆ If both application-level and job-level post-execution commands are specified, job level post-execution overrides application-level post-execution commands.
Configuring Pre- and Post-Execution Commands ◆ Example If the pre-execution command exits with a non-zero exit code, it is considered to have failed, and the job is requeued to the head of the queue. Use this feature to implement customized scheduling by having the pre-execution command fail if conditions for dispatching the job are not met.
Pre-Execution and Post-Execution Commands Setting a pre- and post-execution user ID By default, both the pre- and post-execution commands are run as the job submission user. Use the LSB_PRE_POST_EXEC_USER parameter in lsf.sudoers to specify a different user ID for queue-level and application-level pre- and post-execution commands.
Configuring Pre- and Post-Execution Commands Rerunnable jobs may rerun after they have actually finished because the host became unavailable before post-execution processing finished, but the mbatchd considers the job still in RUN state. Job preemption is delayed until post-execution processing is finished. Post-execution on SGI cpusets Post-execution processing on SGI cpusets behaves differently from previous releases. If JOB_INCLUDE_POSTPROC=Y is specified in lsb.applications or cluster wide in lsb.
C H A P T E R 37 Job Starters A job starter is a specified shell script or executable program that sets up the environment for a job and then runs the job. The job starter and the job share the same environment. This chapter discusses two ways of running job starters in LSF and how to set up and use them.
Command-Level Job Starters Queue-level Defined by the LSF administrator, and run batch jobs submitted to a queue defined with the JOB_STARTER parameter set. Use bsub to submit jobs to queues with job-level job starters. A queue-level job starter is configured in the queue definition in lsb.queues. See Queue-Level Job Starters on page 570 for detailed information. Pre-execution commands are not job starters A job starter differs from a pre-execution command.
Job Starters LSF_JOB_STARTER environment variable Use the LSF_JOB_STARTER environment variable to specify a command or script that is the job starter for the interactive job. When the environment variable LSF_JOB_STARTER is defined, RES invokes the job starter rather than running the job itself, and passes the job to the job starter as a command-line argument.
Queue-Level Job Starters Queue-Level Job Starters LSF administrators can define a job starter for an individual queue to create a specific environment for jobs to run in. A queue-level job starter specifies an executable that performs any necessary setup, and then runs the job when the setup is complete. The JOB_STARTER parameter in lsb.queues specifies the command or script that is the job starter for the queue. This section describes how to set up and use a queue-level job starter.
Job Starters %USRCMD string The special string %USRCMD indicates the position of the job starter command in the job command line. By default, the user commands run after the job starter, so the %USRCMD string is not usually required. For example, these two job starters both give the same results: JOB_STARTER = /bin/csh -c JOB_STARTER = /bin/csh -c "%USRCMD" You must enclose the %USRCMD string in quotes. The %USRCMD string can be followed by additional commands.
Controlling Execution Environment Using Job Starters ◆ SHELL ◆ LOGNAME Any additional environment variables that exist in the user’s login environment on the submission host must be added to the job starter source code. Example A user’s .login script on the submission host contains the following setting: if ($TERM != "xterm") then set TERM=`tset - -Q -m 'switch:?vt100' ....
C H A P T E R 38 External Job Submission and Execution Controls This document describes the use of external job submission and execution controls called esub and eexec. These site-specific user-written executables are used to validate, modify, and reject job submissions, pass data to and modify job execution environments.
Using esub Interactive remote execution Interactive remote execution also runs esub and eexec if they are found in LSF_SERVERDIR. For example, lsrun invokes esub, and RES runs eexec before starting the task. esub is invoked at the time of the ls_connect(3) call, and RES invokes eexec each time a remote task is executed. RES runs eexec only at task startup time. DCE credentials and AFS tokens esub and eexec are also used for processing DCE credentials and AFS tokens.
External Job Submission and Execution Controls Option Description LSB_SUB_ADDITIONAL String format parameter containing the value of the -a option to bsub LSB_SUB_BEGIN_TIME LSB_SUB_CHKPNT_DIR LSB_SUB_COMMAND_LINE LSB_SUB_CHKPNT_PERIOD LSB_SUB_DEPEND_COND LSB_SUB_ERR_FILE LSB_SUB_EXCEPTION LSB_SUB_EXCLUSIVE LSB_SUB_EXTSCHED_PARAM LSB_SUB_HOLD LSB_SUB_HOSTS LSB_SUB_HOST_SPEC LSB_SUB_IN_FILE LSB_SUB_INTERACTIVE LSB_SUB_LOGIN_SHELL LSB_SUB_JOB_NAME LSB_SUB_JOB_WARNING_ACTION LSB_SUB_JOB_ACTION_WARNING_TIM
Using esub
Option                  Description
LSB_SUB_PROJECT_NAME    Project name
LSB_SUB_PTY             "Y" specifies an interactive job with PTY support
LSB_SUB_PTY_SHELL       "Y" specifies an interactive job with PTY shell support
LSB_SUB_QUEUE           Submission queue name
LSB_SUB_RERUNNABLE      "Y" specifies a rerunnable job. "N" specifies a nonrerunnable job (specified with bsub -rn); the job is not rerunnable even if it was submitted to a rerunnable queue or application profile. For bmod -rn, the value is SUB_RESET.
External Job Submission and Execution Controls
Option Description
LSB_SUB3_RUNTIME_ESTIMATION: Runtime estimate specified by bsub -We
LSB_SUB3_USER_SHELL_LIMITS: Pass user shell limits to the execution host. Specified by bsub -ul.
Using esub
1 Is the esub exit value LSB_SUB_ABORT_VALUE?
a Yes, go to step 2
b No, go to step 4
2 Reject the job
3 Go to step 5
4 Does LSB_SUB_MODIFY_FILE or LSB_SUB_MODIFY_ENVFILE exist?
❖ Apply changes
5 Done
Rejecting jobs
Depending on your policies, you may choose to reject a job. To do so, have esub exit with LSB_SUB_ABORT_VALUE. If esub rejects the job, it should not write to either LSB_SUB_MODIFY_FILE or LSB_SUB_MODIFY_ENVFILE.
External Job Submission and Execution Controls
if [ "$LSB_SUB_PROJECT_NAME" = "proj1" ]; then
   # Only user1 and user2 can charge to proj1
   if [ "$USER" != "user1" -a "$USER" != "user2" ]; then
      echo "You are not allowed to charge to this project"
      exit $LSB_SUB_ABORT_VALUE
   fi
fi
Modifying job submission parameters
esub can be used to modify submission parameters and the job environment before the job is actually submitted.
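As an illustration only (the project and queue names are hypothetical, and LSB_SUB_PARM_FILE is assumed here to be the file from which esub reads the submission parameters), an esub can redirect a submission by appending new values to LSB_SUB_MODIFY_FILE:

#!/bin/sh
# Read the LSB_SUB_* submission variables
. $LSB_SUB_PARM_FILE
if [ "$LSB_SUB_PROJECT_NAME" = "proj1" ]; then
   # Send all proj1 jobs to a dedicated queue
   echo 'LSB_SUB_QUEUE="proj1_queue"' >> $LSB_SUB_MODIFY_FILE
fi
exit 0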
Using esub
Use multiple esub (mesub)
LSF provides a master esub (LSF_SERVERDIR/mesub) to handle the invocation of individual application-specific esub executables and the job submission requirements of your applications.
1 Use the -a option of bsub to specify the application you are running through LSF. For example, to submit a FLUENT job:
bsub -a fluent bsub_options fluent_command
The method name fluent uses the esub for FLUENT jobs (LSF_SERVERDIR/esub.fluent).
External Job Submission and Execution Controls
Example
In this example:
◆ esub.dce is defined as the only mandatory esub
◆ An executable named esub already exists in LSF_SERVERDIR
◆ Executables named esub.fluent and esub.license exist in LSF_SERVERDIR
◆ bsub -a fluent license submits the job as a FLUENT job, and mesub invokes the following esub executables in LSF_SERVERDIR in this order:
❖ esub.dce
❖ esub
❖ esub.fluent
❖ esub.license
Existing esub The name of the esub program must be a valid file name. It can contain only alphanumeric characters, underscore (_) and hyphen (-). CAUTION: The file name esub.user is reserved for backward compatibility. Do not use the name esub.user for your application-specific esub. Existing esub Your existing esub does not need to follow this convention and does not need to be renamed.
C H A P T E R 39 Configuring Job Controls After a job is started, it can be killed, suspended, or resumed by the system, an LSF user, or LSF administrator. LSF job control actions cause the status of a job to change. This chapter describes how to configure job control actions to override or augment the default job control actions.
Default Job Control Actions ◆ SIGTSTP for parallel or interactive jobs. SIGTSTP is caught by the master process and passed to all the slave processes running on other hosts. ◆ SIGSTOP for sequential jobs. SIGSTOP cannot be caught by user programs. The SIGSTOP signal can be configured with the LSB_SIGSTOP parameter in lsf.conf.
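For example, to have LSF send SIGTSTP instead of SIGSTOP when suspending jobs, a setting along these lines in lsf.conf would apply (a sketch only; choose the signal that suits your applications):

LSB_SIGSTOP=SIGTSTP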
Configuring Job Controls Windows job control actions On Windows, actions equivalent to the UNIX signals have been implemented to do the default job control actions. Job control messages replace the SIGINT and SIGTERM signals, but only customized applications will be able to process them. Termination is implemented by the TerminateProcess() system call. See Platform LSF Programmer’s Guide for more information about LSF signal handling on Windows.
Configuring Job Control Actions
CHKPNT
Checkpoint the job. Only valid for SUSPEND and TERMINATE actions.
◆ If the SUSPEND action is CHKPNT, the job is checkpointed and then stopped by sending the SIGSTOP signal to the job automatically.
◆ If the TERMINATE action is CHKPNT, then the job is checkpointed and killed automatically.
command
A /bin/sh command line.
◆ Do not quote the command line inside an action definition.
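A minimal sketch of how such actions are attached to a queue with the JOB_CONTROLS parameter in lsb.queues (the queue name and the particular actions chosen are illustrative):

Begin Queue
QUEUE_NAME   = chkpnt_q
JOB_CONTROLS = SUSPEND[CHKPNT] RESUME[SIGCONT] TERMINATE[CHKPNT]
End Queue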
Configuring Job Controls TERMINATE job actions Use caution when configuring TERMINATE job actions that do more than just kill a job. For example, resource usage limits that terminate jobs change the job state to SSUSP while LSF waits for the job to end. If the job is not killed by the TERMINATE action, it remains suspended indefinitely. TERMINATE_WHEN parameter (lsb.queues) In certain situations you may want to terminate the job instead of calling the default SUSPEND action.
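For instance, a queue that should kill rather than suspend its jobs when its run window closes might look like this (a sketch; the queue name and window are illustrative):

Begin Queue
QUEUE_NAME     = night
RUN_WINDOW     = 20:00-08:00
TERMINATE_WHEN = WINDOW
End Queue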
Customizing Cross-Platform Signal Conversion Customizing Cross-Platform Signal Conversion LSF supports signal conversion between UNIX and Windows for remote interactive execution through RES. On Windows, the CTRL+C and CTRL+BREAK key combinations are treated as signals for console applications (these signals are also called console control actions). LSF supports these two Windows console signals for remote interactive execution. LSF regenerates these signals for user tasks on the execution host.
P A R T VI Interactive Jobs ◆ Interactive Jobs with bsub on page 591 ◆ Running Interactive and Remote Tasks on page 603 Administering Platform LSF 589
C H A P T E R 40 Interactive Jobs with bsub Contents ◆ About Interactive Jobs on page 591 ◆ Submitting Interactive Jobs on page 592 ◆ Performance Tuning for Interactive Batch Jobs on page 594 ◆ Interactive Batch Job Messaging on page 597 ◆ Running X Applications with bsub on page 598 ◆ Writing Job Scripts on page 598 ◆ Registering utmp File Entries for Interactive Batch Jobs on page 601 About Interactive Jobs It is sometimes desirable from a system management point of view to control a
Submitting Interactive Jobs Interactive queues You can configure a queue to be interactive-only, batch-only, or both interactive and batch with the parameter INTERACTIVE in lsb.queues. See the Platform LSF Configuration Reference for information about configuring interactive queues in the lsb.queues file. Interactive jobs with non-batch utilities Non-batch utilities such as lsrun, lsgrun, etc., use LIM simple placement advice for host selection when running interactive tasks.
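A sketch of an interactive-only queue definition (the queue name and description are illustrative; INTERACTIVE accepts YES, NO, or ONLY):

Begin Queue
QUEUE_NAME  = interactive
INTERACTIVE = ONLY
DESCRIPTION = Accepts only interactive batch jobs
End Queue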
Interactive Jobs with bsub When an interactive job is submitted, a message is displayed while the job is awaiting scheduling. The bsub command stops display of output from the shell until the job completes, and no mail is sent to the user by default. A user can issue a ctrl-c at any time to terminate the job. Interactive jobs cannot be checkpointed. Interactive batch jobs cannot be rerunnable (bsub -r) You can submit interactive batch jobs to rerunnable queues (RERUNNABLE=y in lsb.
Performance Tuning for Interactive Batch Jobs Split stdout and stderr If in your environment there is a wrapper around bsub and LSF commands so that end-users are unaware of LSF and LSF-specific options, you can redirect standard output and standard error of batch interactive jobs to a file with the > operator. By default, both standard error messages and output messages for batch interactive jobs are written to stdout on the submission host.
Interactive Jobs with bsub
At the queue level, suspending conditions are defined by STOP_COND, as described in lsb.queues, or as the suspending load threshold. At the host level, suspending conditions are defined as the stop load threshold, as described in lsb.hosts.
Resuming conditions
These conditions determine when a suspended job can be resumed. When these conditions are met, a RESUME action is performed on a suspended job. At the queue level, resume conditions are defined by RESUME_COND in lsb.queues
Performance Tuning for Interactive Batch Jobs The it index is only non-zero if no interactive users are active. Setting the it threshold to five minutes allows a reasonable amount of think time for interactive users, while making the machine available for load sharing, if the users are logged in but absent. For lower priority batch queues, it is appropriate to set an it suspending threshold of two minutes and scheduling threshold of ten minutes in the lsb.queues file.
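A sketch of what such a lower-priority batch queue might look like in lsb.queues (the queue name and priority are illustrative; for a load index, a value written as x/y is assumed here to give the scheduling threshold followed by the suspending threshold):

Begin Queue
QUEUE_NAME = low_priority
PRIORITY   = 20
# schedule only after 10 idle minutes; suspend when idle time drops below 2 minutes
it = 10/2
End Queue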
Interactive Jobs with bsub Interactive Batch Job Messaging LSF can display messages to stderr or the Windows console when the following changes occur with interactive batch jobs: ◆ Job state ◆ Pending reason ◆ Suspend reason Other job status changes, like switching the job’s queue, are not displayed. Limitations Interactive batch job messaging is not supported in a MultiCluster environment. Windows Interactive batch job messaging is not fully supported on Windows.
Running X Applications with bsub
<< Job terminated by user >>
<< Just started a job recently: 1 host; >>
<< Load information unavailable: 1 host; >>
<< Job's resource requirements not satisfied: 1 host; >>
The following example shows messages displayed when a job in pending state is terminated by the user:
bsub -m hostA -b 13:00 -Is sh
Job <2015> is submitted to default queue .
Job will be scheduled after Fri Nov 19 13:00:00 1999
<
Interactive Jobs with bsub
Writing a job file one line at a time
UNIX example
% bsub -q simulation
bsub> cd /work/data/myhomedir
bsub> myjob arg1 arg2 ......
bsub> rm myjob.log
bsub> ^D
Job <1234> submitted to queue .
In the above example, the 3 command lines run as a Bourne shell (/bin/sh) script. Only valid Bourne shell command lines are acceptable in this case.
Windows example
C:\> bsub -q simulation
bsub> cd \\server\data\myhomedir
bsub> myjob arg1 arg2 ......
bsub> del myjob.log
Writing Job Scripts In this case the command line myscript is spooled, instead of the contents of the myscript file. Later modifications to the myscript file can affect job behavior.
Interactive Jobs with bsub If running jobs under a particular shell is required frequently, you can specify an alternate shell using a command-level job starter and run your jobs interactively. See Controlling Execution Environment Using Job Starters on page 571 for more details. Registering utmp File Entries for Interactive Batch Jobs LSF administrators can configure the cluster to track user and account information for interactive batch jobs submitted with bsub -Ip or bsub -Is.
C H A P T E R 41 Running Interactive and Remote Tasks This chapter provides instructions for running tasks interactively and remotely with non-batch utilities such as lsrun, lsgrun, and lslogin. Contents ◆ Running Remote Tasks on page 603 ◆ Interactive Tasks on page 606 ◆ Load Sharing Interactive Sessions on page 608 ◆ Load Sharing X Applications on page 608 Running Remote Tasks lsrun is a non-batch utility to run tasks on a remote host.
Running Remote Tasks ◆ Run tasks on hosts specified by a file on page 606 Run a task on the best available host 1 To run mytask on the best available host, enter: lsrun mytask LSF automatically selects a host of the same type as the local host, if one is available. By default the host with the lowest CPU and memory load is selected.
Running Interactive and Remote Tasks
Run a task on a specific host
1 If you want to run your task on a particular host, use the lsrun -m option:
lsrun -m hostD mytask
Run a task by using a pseudo-terminal
Submission of interactive jobs using a pseudo-terminal is not supported on Windows for either the lsrun or bsub LSF commands.
Some tasks, such as text editors, require special terminal handling. These tasks must be run using a pseudo-terminal so that special terminal handling can be used over the network.
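For example, an editing session can be started remotely with the lsrun -P option, which runs the task using a pseudo-terminal (the editor and file name are illustrative):

lsrun -P vi /tmp/notes.txt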
Interactive Tasks Run tasks on hosts specified by a file lsgrun -f host_file 1 The lsgrun -f host_file option reads the host_file file to get a list of hosts on which to run the task. Interactive Tasks LSF supports transparent execution of tasks on all server hosts in the cluster. You can run your program on the best available host and interact with it just as if it were running directly on your workstation. Keyboard signals such as CTRL-Z and CTRL-C work as expected.
Running Interactive and Remote Tasks Interactive processing and scheduling policies LSF lets you run interactive tasks on any computer on the network, using your own terminal or workstation. Interactive tasks run immediately and normally require some input through a text-based or graphical user interface. All the input and output is transparently sent between the local host and the job execution host.
Load Sharing Interactive Sessions The result of the above example is for stderr to be redirected to mystderr, and stdout to mystdout. Without LSF_INTERACTIVE_STDERR set, both stderr and stdout will be redirected to mystdout. See the Platform LSF Configuration Reference for more details on LSF_INTERACTIVE_STDERR. Load Sharing Interactive Sessions There are different ways to use LSF to start an interactive session on the best available host.
Running Interactive and Remote Tasks xterm on a PC Each X application makes a separate network connection to the X display on the user's desktop. The application generally gets the information about the display from the DISPLAY environment variable. X-based systems such as eXceed start applications by making a remote shell connection to the UNIX server, setting the DISPLAY environment variable, and then invoking the X application.
Load Sharing X Applications Start an xterm in Exceed To start an xterm: 1 Double-click the Best icon. An xterm starts on the least loaded host in the cluster and is displayed on your screen. Examples Running any application on the least loaded host To run appY on the best machine licensed for it, you could set the command line in Exceed to be the following and set the description to appY: lsrun -R "type==any && appY order[mem:cpu]" sh -c "appY -display your_PC:0.
P A R T VII Monitoring Your Cluster ◆ Achieving Performance and Scalability on page 613 ◆ Reporting on page 627 ◆ Event Generation on page 655 ◆ Tuning the Cluster on page 659 ◆ Authentication and Authorization on page 673 ◆ Job Email and Job File Spooling on page 681 ◆ Non-Shared File Systems on page 687 ◆ Error and Event Logging on page 693 ◆ Troubleshooting and Error Messages on page 703 ◆ Understanding Platform LSF Job Exit Information on page 725 Administering Platform LSF 611
C H A P T E R 42 Achieving Performance and Scalability Contents ◆ Optimizing Performance in Large Sites on page 613 ◆ Tuning UNIX for Large Clusters on page 614 ◆ Tuning LSF for Large Clusters on page 615 ◆ Monitoring Performance Metrics in Real Time on page 623 Optimizing Performance in Large Sites As your site grows, you must tune your LSF cluster to support a large number of hosts and an increased workload.
Tuning UNIX for Large Clusters ◆ Load update intervals are scaled automatically The following graph shows the improvement in LIM startup after the LSF performance enhancements: Y axis: # of hosts x axis: Time in seconds Tuning UNIX for Large Clusters The following hardware and software specifications are requirements for a large cluster that supports 5,000 hosts and 100,000 jobs at any one time.
Achieving Performance and Scalability Increase the file descriptor limit 1 To achieve efficiency of performance in LSF, follow the instructions in your operating system documentation to increase the number of file descriptors on the LSF master host. TIP: To optimize your configuration, set your file descriptor limit to a value at least as high as the number of hosts in your cluster. The following is an example configuration.
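As a sketch only (the limit value is illustrative and the exact mechanism differs by operating system), on a Linux master host the limit might be raised as follows:

# /etc/security/limits.conf on the LSF master host
root    soft    nofile    8192
root    hard    nofile    8192

# or, in the shell that starts the LSF daemons:
ulimit -n 8192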
Tuning LSF for Large Clusters Some operating systems, such as Linux and AIX, let you increase the number of file descriptors that can be allocated on the master host. You do not need to limit the number of file descriptors to 1024 if you want fast job dispatching. To take advantage of the greater number of file descriptors, you must set LSB_MAX_JOB_DISPATCH_PER_SESSION to a value greater than 300. Set LSB_MAX_JOB_DISPATCH_PER_SESSION to one-half the value of MAX_SBD_CONNS.
Achieving Performance and Scalability Enable continuous scheduling 1 To enable the scheduler to run continuously, define the parameter JOB_SCHEDULING_INTERVAL=0 in lsb.params. Limiting the number of batch queries In large clusters, job querying can grow very quickly. If your site sees a lot of high traffic job querying, you can tune LSF to limit the number of job queries that mbatchd can handle. This helps decrease the load on the master host.
Tuning LSF for Large Clusters When you define this parameter, mbatchd periodically obtains the host status from the master LIM, and then verifies the status by polling each sbatchd at an interval defined by the parameters MBD_SLEEP_TIME and LSB_MAX_PROBE_SBD. Managing your user’s ability to move jobs in a queue JOB_POSITION_CONTROL_BY_ADMIN=Y allows an LSF administrator to control whether users can use btop and bbot to move jobs to the top and bottom of queues.
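For example, switching of the lsb.events file as described next is typically driven by two lsb.params settings along these lines (the parameter names MAX_JOB_NUM and MIN_SWITCH_PERIOD are assumed here and are not shown in this excerpt; the values match the description that follows):

# lsb.params (sketch)
MAX_JOB_NUM = 1000
MIN_SWITCH_PERIOD = 7200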
Achieving Performance and Scalability This instructs mbatchd to check if the events file has logged 1000 batch job completions every two hours. The two parameters can control the frequency of the events file switching as follows: ◆ After two hours, mbatchd checks the number of completed batch jobs.
Tuning LSF for Large Clusters
Run badmin reconfig to create and use the subdirectories.
Duplicate event logging
NOTE: If you enabled duplicate event logging, you must run badmin mbdrestart instead of badmin reconfig to restart mbatchd.
Run bparams -l to display the value of the MAX_INFO_DIRS parameter.
Example
MAX_INFO_DIRS=10
mbatchd creates ten subdirectories from LSB_SHAREDIR/cluster_name/logdir/info/0 to LSB_SHAREDIR/cluster_name/logdir/info/9.
Achieving Performance and Scalability
Processor, core, and thread CPU load balancing
By default, the number of CPUs on a host represents the number of physical processors a machine has. For LSF hosts with multiple cores, threads, and processors, ncpus can be defined by the cluster administrator to consider one of the following:
◆ Processors
◆ Processors and cores
◆ Processors, cores, and threads
Globally, this definition is controlled by the parameter EGO_DEFINE_NCPUS in lsf.conf or ego.conf.
Tuning LSF for Large Clusters badmin hrestart restarts a new sbatchd. If a job process has already been bound to a processor, after sbatchd is restarted, processor binding for the job processes are restored. ◆ badmin reconfig If the BIND_JOB parameter is modified in an application profile, badmin reconfig only affects pending jobs. The change does not affect running jobs.
Achieving Performance and Scalability Increase the job ID display length By default, bjobs and bhist display job IDs with a maximum length of 7 characters. Job IDs greater than 9999999 are truncated on the left. Use LSB_JOBID_DISP_LENGTH in lsf.conf to increase the width of the JOBID column in bjobs and bhist display. When LSB_JOBID_DISP_LENGTH=10, the width of the JOBID column in bjobs and bhist increases to 10 characters.
Monitoring Performance Metrics in Real Time
◆ The number of jobs dispatched
◆ The number of jobs completed
◆ The number of jobs sent to a remote cluster
◆ The number of jobs accepted from a remote cluster
badmin perfmon view
Performance monitor start time: Fri Jan 19 15:07:54
End time of last sample period: Fri Jan 19 15:25:55
Sample period : 60 Seconds
------------------------------------------------------------------
Metrics                         Last    Max     Min     Avg     Total
----------------------------------------------
Achieving Performance and Scalability
Last Period: Last sampling value of the metric. It is calculated per sampling period. It is represented as the metric value per period, and normalized by the following formula.
Max: Maximum sampling value of the metric. It is re-evaluated in each sampling period by comparing Max and Last Period. It is represented as the metric value per period.
Min: Minimum sampling value of the metric. It is re-evaluated in each sampling period by comparing Min and Last Period. It is represented as the metric value per period.
Monitoring Performance Metrics in Real Time
Job requeue
Requeued jobs may be dispatched, run, and exit again and again because of certain errors. The job data always exists in memory, so LSF counts only one job submission request and one job submitted, but counts more than one job dispatched. For jobs completed, if a job is requeued with brequeue, LSF counts two jobs completed, since requeuing a job first kills the job and later puts the job back into the pending list.
C H A P T E R 43 Reporting Reporting is a feature of Platform LSF. It allows you to look at the overall statistics of your entire cluster. You can analyze the history of hosts, resources, and workload in your cluster to get an overall picture of your cluster’s performance.
Getting Started with Standard Reports Standard and custom reports Platform has provided a set of standard reports to allow you to immediately analyze your cluster without having to create any new reports. These standard reports provide the most common and useful data to analyze your cluster. You may also create custom reports to perform advanced queries and reports beyond the data produced in the standard reports.
Reporting
Name (Category): Description
Hourly Desktop Job Throughput (LSF Desktop): Number of downloaded and completed jobs for each MED host or the entire cluster. You can only produce this report if you use LSF Desktop.
Desktop Utilization (LSF Desktop): Desktop utilization at each MED host or the entire cluster. You can only produce this report if you use LSF Desktop.
License Usage (LSF License Scheduler): The license usage under License Scheduler. You can only produce this report if you use LSF License Scheduler.
Custom Reports After a short time, the resulting data is displayed graphically. When you close the report window, you lose the contents of the report unless you export it first. Export report data Once you produce a report, exporting is the best way to save the data for future use. You cannot produce the same report at a later date if the data has expired from the database. 1 In the Console, produce and view your report. 2 Click Export Report Data.
Reporting Using reports You produce custom reports and export the data in the same way as standard reports. Data expires from the database periodically, so producing a report at a later date may return different data, or return no output at all. After you produce a report, you can keep your results by exporting the report data as comma-separated values in a CSV file. In this way you can preserve your data outside the system and integrate it with external programs, such as a spreadsheet.
Custom Reports 3 Define the report properties and query string as desired. a In the Report properties section, specify the report name, summary, description, and category. b In the Report query section, input your SQL query string. For further information on the data schema, refer to Platform LSF Reports Data Schema in the Platform LSF Knowledge Center. c To validate your SQL query string and ensure that your report delivers the appropriate results, click Produce Report.
Reporting Export report data Once you produce a report, exporting is the best way to save the data for future use. You cannot produce the same report at a later date if the data has expired from the database. 1 In the Console, produce and view your report. 2 Click Export Report Data. 3 In the browser dialog, specify the output path and name the exported file. In the Save as type field, specify "CSV". Delete a custom report 1 In the Console, navigate to Reports then Custom Reports.
System Description System Description The reporting feature is built on top of the Platform Enterprise Reporting Framework (PERF) architecture. This architecture defines the communication between your EGO cluster, relational database, and data sources via the PERF Loader Controller (PLC). The loader controller is the module that controls multiple loaders for data collection.
Reporting Sample A data sampling loader does not have full control over what data is gathered and needs to send a request to the data sources. The data sources send the requested system status information back to the data loader.
Reports Administration
Table 5: EGO data loaders
◆ Consumer resource (egoconsumerresloader): resource allocation data, gathered every 5 minutes; loads to CONSUMER_DEMAND, CONSUMER_RESOURCE_ALLOCATION, CONSUMER_RESOURCELIST; loader type: polling, sample
◆ Dynamic metric (egodynamicresloader): host-related dynamic metric data, gathered every 5 minutes; loads to RESOURCE_METRICS, RESOURCES_RESOURCE_METRICS; loader type: polling, sample
◆ EGO allocation events (egoeventsloader): resource allocation data, gathered every 5 minutes; loads to ALLOCATIO
Reporting
Directory name (description): Default file path
$PERF_LOGDIR (Log files): LSF_TOP/log/perf
$PERF_WORKDIR (Working directory): LSF_TOP/log/perf
$PERF_DATADIR (Data directory): LSF_TOP/work/cluster_name/perf/data
Table 7: LSF reporting directory environment variables in Windows
%PERF_TOP% (Reports framework directory): LSF_TOP\perf
%PERF_CONFDIR% (Configuration files): LSF_TOP\conf\perf\cluster_name\conf
%PERF_LOGDIR% (Log files): LS
Reports Administration ◆ UNIX: LSF_CONFDIR/ego/cluster_name/eservice/esc/conf/services ◆ Windows: LSF_CONFDIR\ego\cluster_name\eservice\esc\conf\ services Loader controller The loader controller manages the data loaders.
Reporting ◆ Windows: LSF_CONFDIR\ego\cluster_name\eservice\esc\conf\ services Data purger The relational database needs to be kept to a reasonable size to maintain optimal efficiency. The data purger manages the database size by purging old data at regular intervals. By default, the data purger purges records older than 14 days at 12:30am every day. To reschedule the purging of old data, you can change the purger schedule, as described in Change the data purger schedule on page 644.
Reports Administration file path as described in Change the location of the LSF event data files on page 643. Change the disk usage or file path of your EGO allocation event data files as described in Change the disk usage of EGO allocation event data files on page 644. You can manage your event data files by editing the system configuration files. Edit ego.conf for the EGO allocation event data file configuration and lsb.params for the LSF event data file configuration.
Reporting
Stop or restart the derbydb (if you are using the Derby demo database), jobdt, plc, and purger services. If your cluster does not have PERF controlled by EGO, you use the perfadmin command to stop or restart these services.
1 In the command console, stop the service by running perfadmin stop.
perfadmin stop service_name
2 If you want to restart the service, run perfadmin start.
Reports Administration Dynamically change the log level of your loader controller log file Use the loader controller client tool to dynamically change the log level of your plc log file if it does not cover enough detail, or covers too much, to suit your needs. If you restart the plc service, the log level of your plc log file will be set back to the default level. To retain your new log level, change the level of your plc log file as described in Change the log level of your log files on page 642.
Reporting For example, to change the log level of the data purger log files, navigate to the following section, which is set to the default INFO level: # Data purger ("purger") configuration log4j.logger.com.platform.perf.purger=INFO, com.platform.perf.purger 3 Change the log4j.logger.com.platform.perf. variable to the new logging level. In decreasing level of detail, the valid values are ALL (for all messages), TRACE, DEBUG, INFO, WARN, ERROR, FATAL, and OFF (for no messages).
Reports Administration 3 Restart the plc service on the master host to activate this change. Change the disk usage of EGO allocation event data files Prerequisites: Your cluster must be EGO-enabled. If your system logs a large number of events, increase the disk space allocated to the EGO allocation event data files. If your disk space is insufficient, decrease the space allocated to the EGO allocation event data files or move these files to another location. 1 Edit ego.conf.
Reporting 3 Change the -t parameter in the data purger script to the new time (-t new_time). You can change the data purger schedule to a specific daily time, or at regular time intervals, in minutes, from when the purger service first starts up. For example, to change the schedule of the data purger: ◆ To delete old data at 11:15pm every day: ...purger... -t 23:15 ◆ To delete old data every 12 hours from when the purger service first starts up: ...purger...
Reports Administration ◆ To convert job data every fifteen minutes from when the jobdt service first starts up: ...jobdt... -t *:*[15] ◆ To convert job data every two hours from when the jobdt service first starts up: ...jobdt... -t *[2] 4 In the command console, restart EGO on the master host to activate these changes. egosh ego restart 5 master_host_name Restart the jobdt service.
Reporting
The purger configuration files are located in the purger subdirectory of the reports configuration directory:
◆ UNIX: $PERF_CONFDIR/purger
◆ Windows: %PERF_CONFDIR%\purger
2 Navigate to the specific tag with the TableName attribute matching the table that you want to change. For example:
3 Add or edit the Duration attribute with your desired time in days, up to a maximum of 31 days.
tag with the TableName attribute matching the table that you want to change. For example: 3 Add or edit the Duration attribute with your desired time in days, up to a maximum of 31 days.Test the Reporting Feature Disable data collection for individual data loaders To reduce unwanted data from being logged in the database, disable data collection for individual data loaders. 1 Edit the plc configuration files for your data loaders. ❖ For EGO data loaders, edit plc_ego_rawdata.xml. ❖ For LSF data loaders, edit plc_lsf_basic_rawdata.xml.
Reporting c ◆ lsfeventsloader ◆ lsfslaloader ◆ lsfresproploader ◆ sharedresusageloader ◆ EGO data loaders (for EGO-enabled clusters only): ❖ egoconsumerresloader ❖ egodynamicresloader ❖ egoeventsloader ❖ egostaticresloader View the data purger and data loader log files and verify that there are no ERROR messages in these files. You need to view the following log files (PERF_LOGDIR is LSF_LOGDIR/perf): 3 ◆ PERF_LOGDIR/dataloader/bldloader.host_name.
Disable the Reporting Feature Disable the Reporting Feature Prerequisites: You must have root or lsfadmin access in the master host. 1 Disable the LSF events data logging. a Define or edit the ENABLE_EVENT_STREAM parameter in the lsb.params file to disable event streaming. ENABLE_EVENT_STREAM = N b In the command console, reconfigure the master host to activate these changes. badmin reconfig 2 If your cluster is EGO-enabled, disable the EGO allocation events data logging.
Reporting
❖ If you are using a MySQL database, create a database schema as described in Create a MySQL database schema on page 652.
2 Stop the reporting services.
Stop the derbydb (if you are using the Derby demo database), jobdt, plc, and purger services as described in Stop or restart reporting services on page 494.
3 If you are using the Derby demo database, disable automatic startup of the derbydb service as described in Disable automatic startup of the reporting services on page 494.
Move to a Production Database 2 Run the script to create the EGO database schema. sqlplus user_name/password@connect_string @egodata.sql data_tablespace index_tablespace where 3 4 ◆ user_name is the user name on the database server. ◆ password is the password for this user name on the database server. ◆ connect_string is the named SQLNet connection for this database. ◆ data_tablespace is the name of the tablespace where you intend to store the table schema.
Reporting
◆ user_name is the user name on the database server.
◆ password is the password for this user name on the database server.
◆ report_database is the name of the database to store the report data.
3 In the command console, open the LSF database schema directory.
◆ UNIX: cd $PERF_TOP/lsf/version/DBschema/MySQL
◆ Windows: cd %PERF_TOP%\lsf\version\DBschema\MySQL
4 Run the scripts to create the LSF database schema.
Move to a Production Database 5 In the JDBC URL field, enter the URL for your database. This should be similar to the format given in Example URL format. 6 In the Maximum connections field, specify the maximum allowed number of concurrent connections to the database server. This is the maximum number of users who can produce reports at the same time.
C H A P T E R 44 Event Generation Contents ◆ Event Generation on page 655 ◆ Enabling event generation on page 655 ◆ Events list on page 656 ◆ Arguments passed to the LSF event program on page 656 Event Generation LSF detects events occurring during the operation of LSF daemons. LSF provides a program which translates LSF events into SNMP traps. You can also write your own program that runs on the master host to interpret and respond to LSF events in other ways.
Events list Enable event generation for custom programs If you use a custom program to handle the LSF events, take the following steps to enable event generation. 1 Write a custom program to interpret the arguments passed by LSF. See Arguments passed to the LSF event program on page 656 and Events list on page 656 for more information. 2 To enable event generation, define LSF_EVENT_RECEIVER in lsf.conf. You must specify an event receiver even if your program ignores it.
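A minimal lsf.conf sketch (the receiver string and program name are illustrative; LSF_EVENT_PROGRAM is assumed here as the companion parameter naming the handler program, and it is not required if you use the program shipped with LSF):

LSF_EVENT_RECEIVER=MySNMPManagerHost
LSF_EVENT_PROGRAM=my_event_handler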
Event Generation ◆ The event receiver (LSF_EVENT_RECEIVER in lsf.conf) ◆ The cluster name ◆ The LSF event number (LSF events list or LSF_EVENT_XXXX macros in lsf.h) ◆ The event argument (for events that take an argument) Example For example, if the event receiver is the string xxx and LIM goes down on HostA in Cluster1, the function returns: xxx Cluster1 1 HostA The custom LSF event program can interpret or ignore these arguments.
C H A P T E R 45 Tuning the Cluster Contents ◆ Tuning LIM on page 660 ◆ Improving performance of mbatchd query requests on UNIX on page 667 Administering Platform LSF 659
Tuning LIM Tuning LIM LIM provides critical services to all LSF components. In addition to the timely collection of resource information, LIM provides host selection and job placement policies. If you are using Platform MultiCluster, LIM determines how different clusters should exchange load and resource information. You can tune LIM policies and parameters to improve performance. LIM uses load thresholds to determine whether to place remote jobs on a host.
Tuning the Cluster LIM uses load thresholds to determine whether to place remote jobs on a host. If one or more LSF load indices exceeds the corresponding threshold (too many users, not enough swap space, etc.), then the host is regarded as busy and LIM will not recommend jobs to that host. Thresholds can be set for any load index supported internally by the LIM, and for any external load index. If a particular load index is not specified, LIM assumes that there is no threshold for that load index.
Tuning LIM
LOAD_THRESHOLDS:
 r15s   r1m   r15m   ut    pg    io    ls    it    tmp   swp   mem
 -      3.5   -      -     15    -     -     -     -     2M    1M
lsload
HOST_NAME  status  r15s  r1m  r15m  ut    pg     ls  it  tmp   swp   mem
hostD      ok      0.0   0.0  0.0   0%    0.0    6   0   30M   32M   10M
hostA      busy    1.9   2.1  1.9   47%   *69.6  21  0   38M   96M   60M
In this example, the hosts have the following characteristics:
◆ hostD is ok.
◆ hostA is busy — The pg (paging rate) index is 69.6, above the threshold of 15.
Tuning the Cluster In this section ◆ In sites where each host in the cluster cannot share a common configuration directory or exact replica. ◆ Default LIM behavior on page 663 ◆ Changing Default LIM Behavior to Improve Performance on page 662 ◆ Reconfiguration and LSF_MASTER_LIST on page 663 ◆ How LSF works with LSF_MASTER_LIST on page 664 ◆ Considerations on page 665 Default LIM behavior By default, each LIM running in an LSF cluster must read the configuration files lsf.shared and lsf.
Tuning LIM If you make changes that affect load report messages such as load indices, you must restart all the LIMs in the cluster. Use the command lsadmin reconfig. How LSF works with LSF_MASTER_LIST LSF_MASTER_LIST undefined In this example, lsf.shared and lsf.cluster.cluster_name are shared among all LIMs through an NFS file server. The preferred master host is the first available server host in the cluster list in lsf.cluster.cluster_name. Any slave LIM can become the master LIM.
Tuning the Cluster Considerations Generally, the files lsf.cluster.cluster_name and lsf.shared for hosts that are master candidates should be identical. When the cluster is started up or reconfigured, LSF rereads configuration files and compares lsf.cluster.cluster_name and lsf.shared for hosts that are master candidates. In some cases in which identical files are not shared, files may be out of sync. This section describes situations that may arise should lsf.cluster.cluster_name and lsf.
Tuning LIM lsadmin limrestart hostA hostB hostC LSF_MASTER_LIST defined, and master host goes down If LSF_MASTER_LIST is defined and the elected master host goes down, and if the number of load indices in lsf.cluster.cluster_name or lsf.shared for the new elected master is different from the number of load indices in the files of the master that went down, LSF will reject all master candidates that do not have the same number of load indices in their files as the newly elected master.
Tuning the Cluster Improving performance of mbatchd query requests on UNIX You can improve mbatchd query performance on UNIX systems using the following methods: ◆ Multithreading—On UNIX platforms that support thread programming, you can change default mbatchd behavior to use multithreading and increase performance of query requests when you use the bjobs command. Multithreading is beneficial for busy clusters with many jobs and frequent query requests.
Improving performance of mbatchd query requests on UNIX MBD_REFRESH_TIME has the following syntax: MBD_REFRESH_TIME=seconds [min_refresh_time] where min_refresh_time defines the minimum time (in seconds) that the child mbatchd will stay to handle queries. The valid range is 0 - 300. The default is 5 seconds. ◆ If MBD_REFRESH_TIME is < min_refresh_time, the child mbatchd exits at MBD_REFRESH_TIME even if the job changes status or a new job is submitted before MBD_REFRESH_TIME expires.
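For example, the following lsb.params setting (values are illustrative) lets a child mbatchd serve queries for up to 600 seconds, but never for less than 10 seconds:

MBD_REFRESH_TIME = 600 10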
Tuning the Cluster Set a query-dedicated port for mbatchd To change the default mbatchd behavior so that mbatchd forks a child mbatchd that can create threads, specify a port number with LSB_QUERY_PORT in lsf.conf. TIP: This configuration only works on UNIX platforms that support thread programming. 1 Log on to the host as the primary LSF administrator. 2 Edit lsf.conf. 3 Add the LSB_QUERY_PORT parameter and specify a port number that will be dedicated to receiving requests from hosts.
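A sketch of the lsf.conf setting (the port number is arbitrary; choose one that is unused at your site):

LSB_QUERY_PORT=6891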
Improving performance of mbatchd query requests on UNIX When you define this parameter, LSF runs mbatchd child query processes only on the specified CPUs. The operating system can assign other processes to run on the same CPU, however, if utilization of the bound CPU is lower than utilization of the unbound CPUs. 1 Identify the CPUs on the master host that will run mbatchd child query processes.
Tuning the Cluster When NEWJOB_REFRESH=Y the parent mbatchd pushes new job information to a child mbatchd. Job queries with bjobs display new jobs submitted after the child mbatchd was created. 1 Log on to the host as the primary LSF administrator. 2 Edit lsb.params. 3 Add NEWJOB_REFRESH=Y. You should set MBD_REFRESH_TIME in lsb.params to a value greater than 10 seconds. 4 Save the lsb.params file.
C H A P T E R 46 Authentication and Authorization LSF uses authentication and authorization to ensure the security of your cluster. The authentication process verifies the identity of users, hosts, and daemons, depending on the security requirements of your site. The authorization process enforces user account permissions.
Authentication options
Authentication method: External authentication
Configuration: LSF_AUTH=eauth
Description:
◆ A framework that enables you to integrate LSF with any third-party authentication product—such as Kerberos or DCE Security Services—to authenticate users, hosts, and daemons. This feature provides a secure transfer of data within the authentication data stream between LSF clients and servers. Using external authentication, you can customize LSF to meet the security requirements of your site.
Authentication and Authorization the $HOME/.rhosts file. Include the name of the local host in both files. This additional level of authentication works in conjunction with eauth, privileged ports (setuid), or identd authentication. CAUTION: Using the /etc/hosts.equiv and $HOME/.rhosts files grants permission to use the rlogin and rsh commands without requiring a password.
Authorization options All external executables invoked by the LSF daemons, such as esub, eexec, elim, eauth, and pre- and post-execution commands, run under the lsfadmin user account. Windows passwords Windows users must register their Windows user account passwords with LSF by running the command lspasswd. If users change their passwords, they must use this command to update LSF. A Windows job does not run if the password is not registered in LSF. Passwords must be 31 characters or less.
Authentication and Authorization
Specifying a user account
To change the user account for eauth, define the parameter LSF_EAUTH_USER in the file lsf.sudoers.
To change the user account for eexec, define LSF_EEXEC_USER in lsf.sudoers.
To change the user account for pre- and post-execution commands, define LSB_PRE_POST_EXEC_USER in lsf.sudoers.
Controlling user access to LSF resources and functionality
To specify the user accounts with cluster administrator privileges, define ADMINISTRATORS in lsf.cluster.cluster_name.
Authorization options
Authorization failure
Symptom: User receives an email notification that LSF has placed a job in the USUSP state.
Probable cause: The job cannot run because the Windows password for the job is not registered with LSF.
Solution: The user should
◆ Register the Windows password with LSF using the command lspasswd.
◆ Use the bresume command to resume the suspended job.
Authentication and Authorization
Symptom: resControl: operation permission denied, uid =
Probable cause: The user with user ID uid is not allowed to make RES control requests. By default, only the LSF administrator can make RES control requests.
Solution: To allow the root user to make RES control requests, define LSF_ROOT_REX in lsf.conf.
C H A P T E R 47 Job Email and Job File Spooling Contents ◆ Mail Notification When a Job Starts on page 681 ◆ File Spooling for Job Input, Output, and Command Files on page 684 Mail Notification When a Job Starts When a batch job completes or exits, LSF by default sends a job report by electronic mail to the submitting user account.
Mail Notification When a Job Starts If you specify a -o output_file or -oo output_file option and do not specify a -e error_file or -eo error_file option, the standard output and standard error are merged and stored in output_file. You can also specify the standard input file if the job needs to read input from stdin.
Job Email and Job File Spooling LSB_MAILSIZE is not recognized by the LSF default mail program. To prevent large job output files from interfering with your mail system, use LSB_MAILSIZE_LIMIT to explicitly set the maximum size in KB of the email containing the job information. LSB_MAILSIZE values The LSB_MAILSIZE environment variable can take the following values: ◆ A positive integer: if the output is being sent by email, LSB_MAILSIZE is set to the estimated mail size in KB.
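For example, to keep job email to roughly 2 MB or less, a setting along these lines in lsf.conf would apply (the value, in KB, is illustrative):

LSB_MAILSIZE_LIMIT=2000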
File Spooling for Job Input, Output, and Command Files For more information See the Platform LSF Configuration Reference for information about the LSB_MAILSIZE environment variable and the LSB_MAILTO, LSB_MAILSIZE_LIMIT parameters in lsf.conf, and JOB_SPOOL_DIR in lsb.params. File Spooling for Job Input, Output, and Command Files About job file spooling LSF enables spooling of job input, output, and command files by creating directories and files for buffering input and output for a job.
Job Email and Job File Spooling Unless you use -is, you can use the special characters %J and %I in the name of the input file. %J is replaced by the job ID. %I is replaced by the index of the job in the array, if the job is a member of an array, otherwise by 0 (zero). The special characters %J and %I are not valid with the -is option. Specifying a job command file (bsub -Zs) Use the bsub -Zs command to spool a job command file to the directory specified by the JOB_SPOOL_DIR parameter in lsb.params.
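For example (file names are illustrative):

bsub -is my_input.dat -o output.%J myjob     # spool the input file
bsub -Zs my_command_file                     # spool the job command file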
Modifying the job input file ◆ The job command file for bsub -Zs is spooled to LSB_SHAREDIR/cluster_name/lsf_cmddir. If the lsf_cmddir directory does not exist, LSF creates it before spooling the file. LSF removes the spooled file when the job completes. If you want to use job file spooling, but do not specify JOB_SPOOL_DIR, the LSB_SHAREDIR/cluster_name directory must be readable and writable by all the job submission users.
C H A P T E R 48 Non-Shared File Systems Contents ◆ About Directories and Files on page 687 ◆ Using LSF with Non-Shared File Systems on page 688 ◆ Remote File Access on page 688 ◆ File Transfer Mechanism (lsrcp) on page 690 About Directories and Files LSF is designed for networks where all hosts have shared file systems, and files have the same names on all hosts.
Using LSF with Non-Shared File Systems Some networks do not share files between hosts. LSF can still be used on these networks, with reduced fault tolerance. See Using LSF with Non-Shared File Systems on page 688 for information about using LSF in a network without a shared file system.
Non-Shared File Systems If the directory is not available on the execution host, the job is run in /tmp. Any files created by the batch job, including the standard output and error files created by the -o and -e options to bsub, are left on the execution host. LSF provides support for moving user data from the submission host to the execution host before executing a batch job, and from the execution host back to the submitting host after the job completes.
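This copying is requested with the bsub -f option; for example (file names are illustrative; ">" copies the local file to the execution host before the job starts, and "<" copies the remote file back after the job completes):

bsub -f "input.dat > input.dat" -f "results.out < results.out" myjob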
File Transfer Mechanism (lsrcp) bsub -i If the input file specified with bsub -i is not found on the execution host, the file is copied from the submission host using the LSF remote file access facility and is removed from the execution host after the job finishes. bsub -o and bsub -e The output files specified with the -o and -e arguments to bsub are created on the execution host, and are not copied back to the submission host by default.
Non-Shared File Systems rcp on UNIX If lsrcp cannot contact RES on the submission host, it attempts to use rcp to copy the file. You must set up the /etc/hosts.equiv or HOME/.rhosts file in order to use rcp. See the rcp(1) and rsh(1) man pages for more information on using the rcp command. Custom file transfer You can replace lsrcp with your own file transfer mechanism as long as it supports the same syntax as lsrcp.
C H A P T E R 49 Error and Event Logging Contents ◆ System Directories and Log Files on page 693 ◆ Managing Error Logs on page 694 ◆ System Event Log on page 695 ◆ Duplicate Logging of Event Logs on page 696 ◆ LSF Job Termination Reason Logging on page 697 System Directories and Log Files LSF uses directories for temporary work files, log files and transaction files and spooling. LSF keeps track of all jobs in the system by maintaining a transaction log in the work subtree.
Managing Error Logs
MAX_INFO_DIRS is defined in lsb.params) for use at dispatch time or if the job is rerun. The info directory is managed by LSF and should not be modified by anyone.
Log directory permissions and ownership
Ensure that the permissions on the LSF_LOGDIR directory allow it to be written by root. The LSF administrator must own LSF_LOGDIR.
Support for UNICOS accounting
In Cray UNICOS environments, LSF writes to the Network Queuing System (NQS) accounting data file, nqacct, on the execution host.
Error and Event Logging LSF daemons log error messages in different levels so that you can choose to log all messages, or only log messages that are deemed critical. Message logging for LSF daemons (except LIM) is controlled by the parameter LSF_LOG_MASK in lsf.conf. Possible values for this parameter can be any log priority symbol that is defined in /usr/include/sys/syslog.h. The default value for LSF_LOG_MASK is LOG_WARNING. IMPORTANT: LSF_LOG_MASK in lsf.
Duplicate Logging of Event Logs Duplicate Logging of Event Logs To recover from server failures, host reboots, or mbatchd restarts, LSF uses information stored in lsb.events. To improve the reliability of LSF, you can configure LSF to maintain copies of these logs, to use as a backup. If the host that contains the primary copy of the logs fails, LSF will continue to operate using the duplicate logs. When the host recovers, LSF uses the duplicate logs to update the primary copies.
Error and Event Logging This may happen given certain network topologies and failure modes. For example, connectivity is lost between the first master, M1, and both the file server and the secondary master, M2. Both M1 and M2 will run mbatchd service with M1 logging events to LSB_LOCALDIR and M2 logging to LSB_SHAREDIR. When connectivity is restored, the changes made by M2 to LSB_SHAREDIR will be lost when M1 updates LSB_SHAREDIR from its copy in LSB_LOCALDIR.
LSF Job Termination Reason Logging - submitted by all users. - accounted on all projects. - completed normally or exited - executed on all hosts. - submitted to all queues. - accounted on all service classes.
Error and Event Logging Keyword displayed by bacct Termination reason TERM_ADMIN TERM_BUCKET_KILL TERM_CHKPNT TERM_CPULIMIT TERM_CWD_NOTEXIST Integer value logged to JOB_FINISH in lsb.
LSF Job Termination Reason Logging
Example output of bacct and bhist
Example termination cause / Termination reason in bacct –l / Example bhist output
bkill -s KILL or bkill job_ID: Completed ; TERM_OWNER or TERM_ADMIN
bkill –r: Completed ; TERM_FORCE_ADMIN or TERM_FORCE_OWNER when sbatchd is not reachable.
Error and Event Logging For instance, if your application had an explicit exit 129, you would see Exit code 129 in your output. When you send a signal that terminates the job, bhist reports either the signal or the value of signal+128. If the return status is greater than 128 and the job was terminated with a signal, then return_status-128=signal. Example For return status 133, the job was terminated with signal 5 (SIGTRAP on most systems, 133-128=5).
C H A P T E R 50 Troubleshooting and Error Messages Contents ◆ Shared File Access on page 704 ◆ Common LSF Problems on page 705 ◆ Error Messages on page 712 ◆ Setting Daemon Message Log to Debug Level on page 719 ◆ Setting Daemon Timing Levels on page 722 Administering Platform LSF 703
Shared File Access Shared File Access A frequent problem with LSF is non-accessible files due to a non-uniform file space. If a task is run on a remote host where a file it requires cannot be accessed using the same name, an error results. Almost all interactive LSF commands fail if the user’s current working directory cannot be found on the remote host. Shared files on UNIX If you are running NFS, rearranging the NFS mount table may solve the problem.
Troubleshooting and Error Messages Common LSF Problems This section lists some other common problems with the LIM, RES, mbatchd, sbatchd, and interactive applications. Most problems are due to incorrect installation or configuration. Check the error log files; often the log message points directly to the problem. LIM dies quietly 1 Run the following command to check for errors in the LIM configuration files. lsadmin ckconfig -v This displays most configuration errors.
Common LSF Problems host name and IP address to the loopback address. Any client requests will get the master LIM address as 127.0.0.1, and try to connect to it, and in fact will try to access itself. 1 Check the IP configuration of your master LIM in /etc/hosts. The following example incorrectly sets the master LIM IP address to the loopback address: 127.0.0.1 localhost myhostname The following example correctly sets the master LIM IP address: 127.0.0.1 localhost 192.168.123.
Troubleshooting and Error Messages 4 If you are using an identification daemon (defined in the lsf.conf file by LSF_AUTH), inetd must be configured to run the daemon. The identification daemon must not be run directly. 5 If LSF_USE_HOSTEQUIV is defined in the lsf.conf file, check if /etc/hosts.equiv or HOME/.rhosts on the destination host has the client host name in it. Inconsistent host names in a name server with /etc/hosts and /etc/hosts.equiv can also cause this problem.
Common LSF Problems This reports most errors. You should also check if there is any email in the LSF administrator’s mailbox. If the mbatchd is running but the sbatchd dies on some hosts, it may be because mbatchd has not been configured to use those hosts. See Host not used by LSF on page 708. sbatchd starts but mbatchd does not 1 Check whether LIM is running. You can test this by running the lsid command. If LIM is not running properly, follow the suggestions in this chapter to fix the LIM first.
Troubleshooting and Error Messages
UNKNOWN host type or model
Viewing UNKNOWN host type or model
1 Run lshosts. A model or type UNKNOWN indicates the host is down or the LIM on the host is down. You need to take immediate action. For example:
lshosts
HOST_NAME  type     model   cpuf  ncpus  maxmem  maxswp  server  RESOURCES
hostA      UNKNOWN  Ultra2  20.2  2      256M    710M    Yes     ()
Common LSF Problems
incorrect CPU factors. A DEFAULT type may also cause binary incompatibility because a job from a DEFAULT host type can be migrated to another DEFAULT host type.
1 Run lshosts.
If Model or Type are displayed as DEFAULT when you use lshosts and automatic host model and type detection is enabled, you can leave it as is or change it.
Troubleshooting and Error Messages
b In the HostModel section, enter the new host model with architecture and CPU factor. Use the architecture detected with lim -t. Add the host model to the end of the host model list. The limit for host model entries is 127. Lines commented out with # are not counted in the 127-line limit. For example:
Begin HostModel
MODELNAME  CPUFACTOR  ARCHITECTURE  # keyword
Ultra2     20         SUNWUltra2_200_sparcv9
End HostModel
3 Save changes to lsf.shared.
Error Messages Error Messages The following error messages are logged by the LSF daemons, or displayed by the following commands. lsadmin ckconfig badmin ckconfig General errors The messages listed in this section may be generated by any LSF daemon. can’t open file: error The daemon could not open the named file for the reason given by error. This error is usually caused by incorrect file permissions or missing files.
Troubleshooting and Error Messages userok: Forged username suspected from /: / The service request claimed to come from user claimed_user but ident authentication returned that the user was actually actual_user. The request was not serviced. userok: ruserok(,) failed LSF_USE_HOSTEQUIV is defined in the lsf.conf file, but host has not been set up as an equivalent host (see /etc/host.equiv), and user uid has not set up a .rhosts file.
Error Messages file: HostModel section missing or invalid file: Resource section missing or invalid file: HostType section missing or invalid The HostModel, Resource, or HostType section in the lsf.shared file is either missing or contains an unrecoverable error. file(line): Name name reserved or previously defined. Ignoring index The name assigned to an external load index must not be the same as any built-in or previously defined resource or load index.
Troubleshooting and Error Messages LIM messages The following messages are logged by the LIM: main: LIM cannot run without licenses, exiting The LSF software license key is not found or has expired. Check that FLEXlm is set up correctly, or contact your LSF technical support. main: Received request from unlicensed host / LIM refuses to service requests from hosts that do not have licenses.
Error Messages This is a warning message. The ELIM sent a value for one of the built-in index names. LIM uses the value from ELIM in place of the value obtained from the kernel. getusr: Protocol error numIndx not read (cc=num): error getusr: Protocol error on index number (cc=num): error Protocol error between ELIM and LIM. RES messages These messages are logged by the RES.
Troubleshooting and Error Messages logJobInfo_: write xdrpos failed: error logJobInfo_: write xdr buf len failed: error logJobInfo_: close() failed: error rmLogJobInfo: Job : can’t unlink(): error rmLogJobInfo_: Job : can’t stat(): error readLogJobInfo: Job can’t open(): error start_job: Job : readLogJobInfo failed: error readLogJobInfo: Job
LSF command messages

LSF daemon (LIM) not responding ... still trying
During LIM restart, LSF commands fail and display this error message. User programs linked to the LIM API also fail for the same reason. This message is displayed when a LIM on the master host list or server host list is restarted after configuration changes, such as adding new resources, a binary upgrade, and so on. Use LSF_LIM_API_NTRIES in lsf.conf, or as an environment variable, to define how many times LSF commands retry the LIM while it is restarting.
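For example, a sketch of the relevant lsf.conf entry (the value is purely illustrative, not a recommended setting):
# lsf.conf -- number of times LSF commands retry the LIM before giving up
LSF_LIM_API_NTRIES=3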
Setting Daemon Message Log to Debug Level
The message log level for LSF daemons is set in lsf.conf with the parameter LSF_LOG_MASK. To include debugging messages, set LSF_LOG_MASK to one of:
◆ LOG_DEBUG
◆ LOG_DEBUG1
◆ LOG_DEBUG2
◆ LOG_DEBUG3
By default, LSF_LOG_MASK=LOG_WARNING, and these debugging messages are not displayed. The debugging log classes for LSF daemons are also set in lsf.conf, with parameters such as LSB_DEBUG_MBD, LSB_DEBUG_SBD, and LSB_DEBUG_SCH.
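For a persistent setting, as opposed to the temporary lsadmin limdebug examples that follow, a minimal lsf.conf sketch might look like the following; the choice of daemon parameter and log classes is illustrative only (LC_MULTI and LC_PIM are the classes used in the examples below):
# lsf.conf -- sketch: log debug-level messages and narrow mbatchd debugging to two classes
LSF_LOG_MASK=LOG_DEBUG
LSB_DEBUG_MBD="LC_MULTI LC_PIM"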
Examples
lsadmin limdebug -c "LC_MULTI LC_PIM" -f myfile hostA hostB
Log additional messages for the LIM daemon running on hostA and hostB, related to MultiCluster and PIM. Create log files in the LSF_LOGDIR directory with the names myfile.lim.log.hostA and myfile.lim.log.hostB. The debug level is the default value, LOG_DEBUG, as set by the LSF_LOG_MASK parameter.
Troubleshooting and Error Messages LSB_DEBUG_MBD, LSB_DEBUG_SBD, and LSB_DEBUG_SCH. The log file is reset to the LSF system log file in the directory specified by LSF_LOGDIR in the format daemon_name.log.host_name. For timing level examples, see Setting Daemon Timing Levels on page 722.
Setting Daemon Timing Levels
The timing log level for LSF daemons is set in lsf.conf with the parameters LSB_TIME_CMD, LSB_TIME_MBD, LSB_TIME_SBD, LSB_TIME_SCH, LSF_TIME_LIM, and LSF_TIME_RES. The location of log files is specified with the parameter LSF_LOGDIR in lsf.conf. Timing is included in the same log files as messages. To change the timing log level, you need to stop any running daemons, change lsf.conf, and then restart the daemons.
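For example, a sketch of lsf.conf entries that turn on timing for mbatchd and LIM (the numeric level shown is an assumption; check the Platform LSF Configuration Reference for the supported range):
# lsf.conf -- sketch: collect timing data for mbatchd and LIM
LSB_TIME_MBD=1
LSF_TIME_LIM=1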
Troubleshooting and Error Messages For debug level examples, see Setting Daemon Message Log to Debug Level on page 719. For a detailed description of lsadmin and badmin, see the Platform LSF Command Reference.
C H A P T E R 51 Understanding Platform LSF Job Exit Information Contents ◆ Why did my job exit? on page 725 ◆ How LSF translates events into exit codes on page 725 ◆ Application and system exit values on page 726 ◆ LSF job termination reason logging on page 728 ◆ Job termination by LSF exit information on page 731 ◆ LSF RMS integration exit values on page 733 Why did my job exit? LSF collects job information and reports the final status of a job.
Application and system exit values

Error condition      LSF exit code   Operating system   System exit code equivalent   Meaning
LSF internal error   -127, 127       all                N/A                           RES returns -127 or 127 for all internal problems.
Out of memory        N/A             all                N/A                           Exit code depends on the error handling of the application itself.
LSF job states       0               all                N/A                           Exit code 0 is returned for all job states.

Host failure
If an LSF server host fails, jobs running on that host are lost. No other jobs are affected.
It is possible for a job to explicitly exit with an exit code greater than 128, which can be confused with the corresponding UNIX signal. Make sure that applications you write do not use exit codes greater than 128.

System signal exit values
When you send a signal that terminates the job, LSF reports either the signal or signal_value+128. If the return status is greater than 128 and the job was terminated with a signal, then return_status - 128 = signal.
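For example (a sketch; the exit code 137 is an illustration, not taken from this chapter), a job reporting exit code 137 was most likely terminated by signal 137 - 128 = 9 (SIGKILL). On most systems the signal name can be recovered from the shell:
# sketch: translate an LSF-reported exit code back into a signal name
exit_code=137
kill -l $((exit_code - 128))     # prints KILL (SIGKILL) on most platforms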
LSF job termination reason logging
When LSF takes action on a job, it may send multiple signals. In the case of job termination, LSF sends SIGINT, SIGTERM, and SIGKILL in succession until the job has terminated. As a result, the job may exit with any of the corresponding exit values at the system level. Other actions may send "warning" signals, such as SIGUSR2, to applications.
For example, bacct -l output for a job that was killed after reaching its run time limit:
Command
Thu Sep 16 15:22:09: Submitted from host , CWD <$HOME>;
Thu Sep 16 15:22:20: Dispatched to 4 Hosts/Processors <4*hostA>;
Thu Sep 16 15:23:21: Completed ; TERM_RUNLIMIT: job killed after reaching LSF run time limit.
Accounting information about this job:
Share group charged
CPU_T    WAIT    TURNAROUND   STATUS   HOG_FACTOR   MEM   SWAP
0.04     11      72           exit     0.
Keyword displayed by bacct    Termination reason
TERM_ADMIN                    Job killed by root or LSF administrator
TERM_BUCKET_KILL              Job killed with bkill -b
TERM_CHKPNT                   Job killed after checkpointing
TERM_CPULIMIT                 Job killed after reaching LSF CPU usage limit
TERM_CWD_NOTEXIST             Current working directory is not accessible or does not exist on the execution host
TERM_DEADLINE                 Job killed after deadline expires
TERM_EXTERNAL_SIGNAL          Job killed by a signal external to LSF
TERM_FORCE_ADMIN              Job killed by root or LSF administrator without time for cleanup
Example output of bacct and bhist
◆ bkill -s KILL or bkill job_ID: reported as Completed ; TERM_OWNER or TERM_ADMIN
◆ bkill -r: reported as Completed ; TERM_FORCE_ADMIN or TERM_FORCE_OWNER when sbatchd is not reachable
Job termination by LSF exit information
The job exit information in the POST_EXEC environment is defined in two parts:
◆ LSB_JOBEXIT_STAT: the raw wait3() output (converted using the wait macros in /usr/include/sys/wait.h)
◆ LSB_JOBEXIT_INFO: defined only if the job exited for a defined LSF reason
Queue-level POST_EXEC commands should be written by the cluster administrator to perform whatever task is necessary for specific exit situations.
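For instance, a minimal sketch of such a queue-level POST_EXEC script (the log file path is an assumption for illustration; LSB_JOBID is the standard job environment variable):
#!/bin/sh
# Sketch: record how each job finished, using the two variables described above.
LOG=/tmp/post_exec.log                                   # assumed path, illustration only
echo "job $LSB_JOBID: LSB_JOBEXIT_STAT=$LSB_JOBEXIT_STAT" >> "$LOG"
# LSB_JOBEXIT_INFO is set only when the job exited for a defined LSF reason
if [ -n "$LSB_JOBEXIT_INFO" ]; then
    echo "job $LSB_JOBID: LSB_JOBEXIT_INFO=$LSB_JOBEXIT_INFO" >> "$LOG"
fi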
Example termination cause: Job being migrated (bmig -m togni); bhist reports "Job <213> is being migrated"
LSB_JOBEXIT_STAT: 33280
LSB_JOBEXIT_INFO: SIGNAL -1 SIG_CHKPNT
Example bhist output:
Fri Feb 14 15:04:42: Migration requested by user or administrator ; Specified Hosts ;
Fri Feb 14 15:04:44: Job is being requeued;
Fri Feb 14 15:05:01: Job has been requeued;
Fri Feb 14 15:05:01: Pending: Migrating job is waiting for rescheduling;
Undefined
Fri Feb 14 15:1
LSF RMS integration exit values
Upon successful completion, rms_run() returns the global OR of the exit status values of the processes in the parallel program. If one of the processes is killed, rms_run() returns a status value of 128 plus the signal number. It can also return the following codes:

Return Code   RMS Meaning
0             A process exited with the code 127 (GLOBAL EXIT), which indicates success, causing all of the processes to exit.
P A R T VIII
LSF Utilities
Using lstcsh on page 737
C H A P T E R 52 Using lstcsh This chapter describes lstcsh, an extended version of the tcsh command interpreter. The lstcsh interpreter provides transparent load sharing of user jobs. This chapter is not a general description of the tcsh shell. Only load sharing features are described in detail. Interactive tasks, including lstcsh, are not supported on Windows.
About lstcsh
Any change to environment variables is automatically reflected on remote hosts. Note that shell variables, the nice value, and resource usage limits are not automatically propagated to remote hosts. For more details on lstcsh, see the lstcsh(1) man page.

In this section
◆ Task Lists on page 738
◆ Local and Remote Modes on page 738
◆ Automatic Remote Execution on page 739

Task Lists
LSF maintains two task lists for each user, a local list (.lsftask) and a remote list (lsf.task).
Using lstcsh Automatic Remote Execution Every time you enter a command, lstcsh looks in your task lists to determine whether the command can be executed on a remote host and to find the configured resource requirements for the command. See the Platform LSF Configuration Reference for information about task lists and lsf.task file. If the command can be executed on a remote host, lstcsh contacts LIM to find the best available host.
Starting lstcsh Shell variables Shell variables are not propagated across machines. When you set a shell variable locally, then run a command remotely, the remote shell will not see that shell variable. Only environment variables are automatically propagated. fg command The fg command for remote jobs must use @, as shown by examples in Task Control on page 742. tcsh version lstcsh is based on tcsh 6.03 (7 bit mode). It does not support the new features of the latest tcsh.
Using a standard system shell
If you cannot set your login shell using chsh, you can use one of the standard system shells to start lstcsh when you log in. To set up lstcsh to start when you log in:
1 Use chsh to set /bin/sh to be your login shell.
2 Edit the .profile file in your home directory so that it starts lstcsh (see the sketch below).
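A minimal sketch of that edit, assuming a Bourne-shell .profile and the /usr/share/lsf/bin install location used in the script example later in this chapter:
# $HOME/.profile -- start lstcsh if it is available
if [ -x /usr/share/lsf/bin/lstcsh ]; then
    exec /usr/share/lsf/bin/lstcsh
fi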
Similarly, when specifying resource requirements following the @, it is necessary to use / only if the first characters of the requirement are also the first characters of a host name. You do not have to type resource requirements for each command line if you put the task names into your remote task list, together with their resource requirements, by running lsrtasks.

Task Control
Task control in lstcsh is the same as in tcsh except for remote background tasks.
Using lstcsh lsmode Syntax lsmode [on|off] [local|remote] [e|-e] [v|-v] [t|-t] Description The lsmode command reports that LSF is enabled if lstcsh was able to contact LIM when it started up. If LSF is disabled, no load-sharing features are available. The lsmode command takes a number of arguments that control how lstcsh behaves.
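For example, the invocation used in the script walkthrough later in this chapter turns load sharing on and puts the shell into remote mode; lsmode off disables load-sharing features again:
lsmode on remote
lsmode off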
Writing Shell Scripts in lstcsh connect Syntax connect [host_name] Description lstcsh opens a connection to a remote host when the first command is executed remotely on that host. The same connection is used for all future remote executions on that host. The connect command with no argument displays connections that are currently open. The connect host_name command creates a connection to the named host.
Writing Shell Scripts in lstcsh
The following assumes you installed lstcsh in the /usr/share/lsf/bin directory:
#!/usr/share/lsf/bin/lstcsh -L
Alternatively, you can run a script with load sharing from an interactive shell:
1 Start an interactive lstcsh.
2 Enable load sharing, and set to remote mode:
lsmode on remote
3 Use the source command to read the script in.
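Putting this together, a minimal sketch of a load-sharing lstcsh script (the shebang path matches the example above; the compile commands are placeholders):
#!/usr/share/lsf/bin/lstcsh -L
# each command below may be executed transparently on a remote host
setenv CC cc
$CC -c module1.c
$CC -c module2.c
$CC -o prog module1.o module2.o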
Index Symbols ! (NOT) operator, job dependencies 450 %I substitution string in job arrays 492 %J substitution string in job arrays 492 %USRCMD string in job starters 570 && (AND) operator, job dependencies 450 .cshrc file and lstcsh 740 .lsbatch directory 37 .rhosts file disadvantages 675 file transfer with lsrcp 691 host authentication 675 troubleshooting 706 /etc/hosts file example host entries 89 host naming 86 name lookup 87 troubleshooting 706 /etc/hosts.
Index controlling jobs 375 default application profile 373 description 372 job success exit values 373 modifying jobs (bmod -app) 375 pre- and post-execution commands, configuring 563 submitting jobs (bsub -app) 375 viewing detailed information (bapp -l) 377 jobs (bjobs -app) 378 summary information (bacct -app) 378 summary information (bapp) 377 APS. See absolute priority scheduling APS_PRIORITY parameter in lsb.
Index bjobs -l, modifed absolute job priority scheduling values 461 bjobs -x, viewing job exception status 118 bkill -app 375 bkill -g 135 black hole hosts 98, 138 bladmin chkconfig command, checking time-based configuration 274 blimits -c command, checking time-based configuration 274 blimits command 402 blinfo command, checking time-based configuration 274 blstat command, checking time-based configuration 274 bmod, absolute job priority scheduling 460 bmod -app 375 bmod -g 134 bmod -is 686 bmod -Zs 686 Bo
Index lsb.params 484 CLEAN_PERIOD parameter in lsb.
Index time limit, job-level resource limit 545 tuning CPU factors in lsf.shared 96 utilization, ut load index 240 viewing run queue length 97 CPU factor, non-normalized run time limit 549 CPU factor (cpuf) static resource 242 CPU time, idle job exceptions 110, 138 CPU time normalization 546 CPU_TIME_FACTOR parameter in lsb.params, fairshare dynamic user priority 299 cpuf static resource 243 CPULIMIT parameter in lsb.
Index log, permissions and ownership 204, 694 .lsbatch 37 LSF_SERVERDIR, esub and eexec 574 remote access 573, 688 shared 37 user accounts 37 directory for license (demo) 157 directory for license (permanent) 160 disks, I/O rate 241 dispatch order, fairshare 300 dispatch turn, description 34 dispatch windows description 446 hosts 68 queues 106 tuning for LIM 660 dispatch, adaptive.
Index description 574 environment variables 574 job submission parameters 578 mandatory method (LSB_ESUB_METHOD) 580 pass data to execution environments 582 esub method (LSB_ESUB_METHOD) 580 /etc/hosts file example host entries 89 host naming 86 name lookup 87 troubleshooting 706 /etc/hosts.equiv file host authentication 674 troubleshooting 706 using rcp 691 /etc/services file, adding LSF entries to 84 /etc/syslog.
Index policies 296 priority user 325 resource usage measurement 298 static priority 326 user share assignment 297 viewing cross-queue fairshare information 303 FAIRSHARE_QUEUES parameter in bqueues 303 in lsb.queues 304 OBSOLETE 462 fast job dispatching 615 fault tolerance description 38 non-shared file systems 688 FCFS (first-come, first-served) scheduling 34 FEATURE line license.dat file (demo) 152 license.
Index G gethostbyname function (host naming) 87 global fairshare 324 GLOBAL_EXIT_RATE parameter in lsb.params 141 GLOBEtrotter Software 154 goal-oriented scheduling.
Index resource information 417 host-locked software licenses 261 HOSTRESORDER environment variable 707 hosts adding with lsfinstall 71 associating resources with 257 closing 68 connecting to remote 744 controlling 68 copying files across 607 dispatch windows 68 displaying 62 file 87 finding resource 608 for advance reservations 426 logging on the least loaded 608 master candidates 663 multiple network interfaces 88 official name 86 opening 68 preselecting masters for LIM 662 redirecting 741 removing 73 reso
Index interfaces, network 88 Internet addresses, matching with host names 86 Internet Domain Name Service (DNS), host naming 86 inter-queue priority 553 interruptible backfill 518 io load index 241 IPv6 configure hosts 91 supported platforms 90 using IPv6 addresses 90 IRIX Comprehensive System Accounting (CSA), configuring 694 utmp file registration 601 it load index automatic job suspension 554 description 240, 595 suspending conditions 555 J %J substitution string in job arrays 492 JL/P parameter in lsb.
Index viewing with bhosts 66 job file spooling See also command file spooling default directory 685 description 684 JOB_SPOOL_DIR parameter in lsb.params 684 job files 33 job groups add limits 131 automatic deletion 137 controlling jobs 134 default job group 128 description 127 displaying SLA service classes 354 example hierarchy 130 job limits 129 modify limits 136 viewing 132 job idle factor, viewing with bjobs 118 job ladders.
Index pre-execution commands configuring 562 description 561 resource requirements 281 resource reservation 406 run limits 548 job-level suspending conditions, viewing 556 jobs changing execution order 120 checkpointing, chunk jobs 487 CHKPNT 586 controlling, in an application profile 375 dispatch order 35 email notification disabling 682 options 681 enabling rerun 475 enforcing memory usage limits 547 exit codes description 700 job success exit values 373 forcing execution 122 interactive.
Index specifying host or model type 176 removing feature (lmremove) 168 rereading license file (lmreread) 168 shutting down FLEXlm server (lmdown) 168 software counted 261 dedicated queue for 264 floating 262 host locked 261 interactive jobs and 265 LIM (Load Information Manager) preselecting master hosts 662 tuning 662 load indices 661 load thresholds 661 policies 660 run windows 660 LIM, master 154 limdebug command 719 limitations lsrcp command 690 on chunk job queues 485 limits job group 129 See resource
Index LOG_DAEMON facility, LSF error logging 203, 695 logging classes, description 205 logging levels, description 205 logical operators in time expesssions 270 job dependencies 450 login sessions 240 login shell, using lstcsh as 740 logs classes 205 entry formats 204 levels 205 lost_and_found queue 108 ls load index 240 ls_connect API call 574 LS_EXEC_T environment variable 583 LS_JOBPID environment variable 582 ls_postevent() arguments 656 lsadmin command limlock 58 limunlock 58 lsb.
Index POST_EXEC parameter 562 PRE_EXEC parameter 562 QUEUE_NAME parameter 108 REQUEUE_EXIT_VALUES parameter 469 resource usage limits 543 restricting host use by queues 109 time-based configuration 271 user groups 146 USERS parameter 146 using host groups 93 using user groups 146 lsb.queues files, DEFAULT_HOST_SPEC parameter 551 lsb.resources file advance reservation policies 424 if-else constructs 271 parameters 393 time-based configuration 271 viewing limit configuration (blimits) 402 lsb.
Index daemon service ports 84 default UNIX directory 47 duplicate event logging 696 dynamic host startup time 75 limiting the size of job email 682 LSB_CHUNK_RUSAGE parameter 541 LSB_DISABLE_RERUN_POST_EXEC parameter 476, 562 LSB_JOB_CPULIMIT parameter 546 LSB_JOB_MEMLIMIT 547 LSB_MAILSIZE_LIMIT parameter 682 LSB_MAILTO parameter 681 LSB_MAX_JOB_DISPATCH_PER_SESSION parameter 615 LSB_MEMLIMIT_ENFORCE 547 LSB_QUERY_PORT parameter 617, 669 LSB_SIGSTOP parameter 123 LSB_SUB_COMMANDNAME parameter 575 LSF_BINDIR
Index configuring 581 description 580 master host candidates 201 with LSF_MASTER_LIST 663 master host failover, about 202 master hosts 201 in non-shared file systems 688 preselecting 662 specifying 697 viewing current 42 MAX_CONCURRENT_JOB_QUERY parameter in lsb.params 617 MAX_HOST_IDLE_TIME parameter in lsb.serviceclasses 365 MAX_INFO_DIRS parameter in lsb.params 619 MAX_JOB_NUM parameter in lsb.params 695 MAX_JOBS parameter in lsb.users 395 MAX_PEND_JOBS parameter in lsb.params or lsb.
Index host name lookup in LSF 86 ypcat hosts.
Index example host entries 89 host naming 86 name lookup 87 /etc/hosts.
Index default 33 dispatch windows 106 fairshare across queues 302 interactive 592 interruptible backfill 519 lost_and_found 108 overview 32 preemptive and preemptable 277 REQUEUE_EXIT_VALUES parameter 562 restricting host use 109 run windows 107 setting rerun level 475 specifying suspending conditions 556 viewing available 102 default 33 detailed queue information 102 for interactive jobs 592 history 103 job exception status 104 resource allocation limits (blimits) 402 status 102 viewing absolute job priori
Index reports (reporting feature) 650 reports 627 architecture 634 creating 630, 631, 632 custom 628, 630 creating 630, 631, 632 deleting 630, 633 producing 632 data loader plug-ins 638 log files 638, 642 LSF 634 data purger 636, 639 record expiry time 646 schedule 639, 644 database 628, 637, 639 Derby 628, 639 moving 639, 650 MySQL 652 Oracle 226, 227, 651 schema 226, 651, 652, 653 deleting 630, 633 disabling 650 event data files 639 EGO 644 LSF 643 exporting 629, 630, 631, 633 job data transformer 636, 63
Index resource requirements 283, 288 viewing 237 resource usage limits ceiling 543 chunk job enforcement 541 configuring 543 conflicting 540 default 543 for deadline constraints 275 hard 543 maximum 543 priority 540 soft 543 specifying 543 ResourceMap section in lsf.cluster.cluster_name 257 ResourceReservation section in lsb.
Index scheduling 458 schmod_advrsv plugin for advance reservation 424 S-Class LSF license type 151 scripts check_license for counted software licenses 262 lic_starter to manage software licenses 265 redirecting to standard input for interactive jobs 599 writing for interactive jobs 598 writing in lstcsh 744 SDK, defining demand 187 security, LSF authentication 674 selection strings defined keyword 286 description 284 operators 285 server hosts, viewing detailed information 63 SERVER line, license.
Index bacct command 348 bjgroup command 354 bjobs command 348 bsla command 346 configuring 344 configuring EGO-enabled SLA service classes 358 deadline goals 342 default SLA for EGO-enabled SLA scheduling 358 delayed goals 353 description 341 EGO-enabled SLA scheduling 355 job preemption 353 missed goals 353 optimum number of running jobs 342 service classes description 342 examples 345 service level goals 342 submitting jobs 343 throughput goals 342 velocity goals 342 violation period 353 SLA_TIMER paramet
Index SWP absolute job priority scheduling factor 460 swp load index description 241 suspending conditions 555 viewing resource allocation limits (blimits) 402 syslog.
Index formula 299 user share assignments 297 USER_ADVANCE_RESERVATION parameter in lsb.params, obsolete parameter 425 USER_NAME parameter in lsb.users 146 USER_NAME parameter in lsb.users file 146 USER_SHARES parameter in lsb.hosts 146 USER_SHARES parameter in lsb.
Index 774 Administering Platform LSF