Platform LSF® Reference
Version 6.2
March 2006
Comments to: doc@platform.com
Copyright © 1994-2006 Platform Computing Corporation. All rights reserved.

We'd like to hear from you
You can help us make this document better by telling us what you think of the content, organization, and usefulness of the information. If you find an error, or just want to make a suggestion for improving this document, please address your comments to doc@platform.com. Your comments should pertain only to Platform documentation. For product support, contact support@platform.com.
Contents
Welcome

Part I: Commands
bacct, badmin, bbot, bchkpnt, ...
bmod, bparams, bpeek, bpost, ...
lsfsetcluster, lsfshutdown, lsfstartup, lsgrun, lshosts, ...

Part III: Configuration Files
bld.license.acct, cshrc.lsf and profile.lsf, hosts, install.config, lim.acct, ...
Welcome

Contents
◆ "About This Guide"
◆ "Learn About Platform Products"
◆ "Get Technical Support"

About Platform Computing
Platform Computing is the largest independent grid software developer, delivering intelligent, practical enterprise grid software and services that allow organizations to plan, build, run and manage grids by optimizing IT resources.
About This Guide
Last update: March 1, 2006
Latest version: www.platform.com/Support/Documentation.htm

Purpose of this guide
This guide provides reference information for the Platform LSF® software ("LSF"). It covers the following topics:
◆ LSF commands
◆ Environment variables
◆ Configuration files
◆ Troubleshooting

Who should use this guide
This guide accompanies Administering Platform LSF, and is your source for reference information.
Learn About Platform Products

World Wide Web and FTP
The latest information about all supported releases of Platform LSF is available on the Platform Web site at www.platform.com. Look in the Online Support area for current Release Notes, Upgrade Notices, Frequently Asked Questions (FAQs), Troubleshooting, and other helpful information. The Platform FTP site (ftp.platform.com) also provides current Release Notes and Upgrade information for all supported releases of Platform LSF.
Get Technical Support

Contact Platform
Contact Platform Computing or your LSF vendor for technical support. Use one of the following to contact Platform technical support:
Email: support@platform.com
World Wide Web: www.platform.com
Mail: Platform Support, Platform Computing Corporation, 3760 14th Avenue, Markham, Ontario, Canada L3R 3T7
When contacting Platform, please include the full name of your company. See the Platform Web site at www.platform.com.
Part I: Commands
bacct
bacct — displays accounting statistics about finished jobs

SYNOPSIS
bacct [-b | -l] [-d] [-e] [-w] [-C time0,time1] [-D time0,time1] [-f logfile_name] [-Lp ls_project_name ...] [-m host_name ...] [-N host_name | -N host_model | -N CPU_factor] [-P project_name ...] [-q queue_name ...] [-sla service_class_name ...] [-S time0,time1] [-u user_name ... | -u all] [-x] [job_ID ...]
bacct -U reservation_ID ... | -U all [-u user_name ...
You can use the option -C time0,time1 to specify the Start time as time0 and the End time as time1. In this way, you can examine throughput during a specific time period. Jobs involved in the throughput calculation are only those being logged (that is, with a DONE or EXIT status). Jobs that are running, suspended, or that have never been dispatched after submission are not considered, because they are still in the LSF system and not logged in lsb.acct.
bacct -Lp ls_project_name ... Displays accounting statistics for jobs belonging to the specified License Scheduler projects. If a list of projects is specified, project names must be separated by spaces and enclosed in quotation marks (") or (’). -m host_name ... Displays accounting statistics for jobs dispatched to the specified hosts. If a list of hosts is specified, host names must be separated by spaces and enclosed in quotation marks (") or (’).
bacct -u user_name ...|-u all Displays accounting statistics for jobs submitted by the specified users, or by all users if the keyword all is specified. If a list of users is specified, user names must be separated by spaces and enclosed in quotation marks (") or (’). You can specify both user names and user IDs in the list of users. -x Displays jobs that have triggered a job exception (overrun, underrun, idle). Use with the -l option to show the exception status for individual jobs. job_ID ...
bacct The wait time is the elapsed time from job submission to job dispatch. The turnaround time is the elapsed time from job submission to job completion. The hog factor is the amount of CPU time consumed by a job divided by its turnaround time. The throughput is the number of completed jobs divided by the time period to finish these jobs (jobs/hour). For more details, see “DESCRIPTION” on page 13.
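The relationships among these statistics can be sketched in Python (an illustrative sketch only, not LSF code; the function names are hypothetical):

```python
# Illustrative sketch of how bacct's per-job statistics are derived from
# the timestamps logged in lsb.acct. Function names are hypothetical.

def job_stats(submit, dispatch, complete, cpu_time):
    """All times in seconds; returns (wait, turnaround, hog_factor)."""
    wait = dispatch - submit            # wait time: submission to dispatch
    turnaround = complete - submit      # turnaround: submission to completion
    hog_factor = cpu_time / turnaround  # CPU time consumed / turnaround time
    return wait, turnaround, hog_factor

def throughput(finished_jobs, period_hours):
    """Completed (DONE or EXIT) jobs per hour over the given period."""
    return finished_jobs / period_hours

# Values from the long-format example elsewhere in this entry:
# WAIT 65, TURNAROUND 157, CPU_T 0.19 give a hog factor of about 0.0012.
wait, turnaround, hog = job_stats(0, 65, 157, 0.19)
```

With the sample values shown, the computed hog factor rounds to 0.0012, matching the HOG_FACTOR column in the long-format example.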
bacct COMPL_TIME Time when the job exited or completed. HOG_FACTOR Average hog factor, equal to "CPU time" / "turnaround time". MEM Maximum resident memory usage of all processes in a job, in kilobytes. SWAP Maximum virtual memory usage of all processes in a job, in kilobytes. CWD Current working directory of the job. INPUT_FILE File from which the job reads its standard input (see bsub(1)). OUTPUT_FILE File to which the job writes its standard output (see bsub(1)).
USER
User name of the advance reservation user, who submitted the job with bsub -U.
NCPUS
Number of CPUs reserved.
RSV_HOSTS
List of hosts for which processors are reserved, and the number of processors reserved.
TIME_WINDOW
Time window for the reservation.
✧ A one-time reservation displays fields separated by slashes (month/day/hour/minute). For example: 11/12/14/0-11/12/18/0
✧ A recurring reservation displays fields separated by colons (day:hour:minute).
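A small sketch of the two TIME_WINDOW layouts (illustrative only, not LSF code; the recurring example value 5:18:0-5:20:0 is an assumption, since only the one-time form is shown above):

```python
# Illustrative sketch: composing bacct -U TIME_WINDOW strings.

def one_time_window(begin, end):
    """begin/end are (month, day, hour, minute); fields are slash-separated."""
    fmt = lambda t: "/".join(str(f) for f in t)
    return "{}-{}".format(fmt(begin), fmt(end))

def recurring_window(begin, end):
    """begin/end are (day, hour, minute); fields are colon-separated."""
    fmt = lambda t: ":".join(str(f) for f in t)
    return "{}-{}".format(fmt(begin), fmt(end))
```

For example, one_time_window((11, 12, 14, 0), (11, 12, 18, 0)) reproduces the one-time example above, 11/12/14/0-11/12/18/0.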
See lsbatch.h for the mapping between the integer value logged to lsb.acct and the termination reason keyword.

EXAMPLES

Default format
% bacct

Accounting information about jobs that are:
- submitted by users user1.
- accounted on all projects.
- completed normally or exited.
- executed on all hosts.
- submitted to all queues.
- accounted on all service classes.
bacct EXCEPTION STATUS: underrun Accounting information about this job: CPU_T WAIT TURNAROUND STATUS HOG_FACTOR MEM SWAP 0.19 65 157 done 0.0012 4M 5M -----------------------------------------------------------------------------Job <1948>, User , Project , Status , Queue , Command Tue Aug 12 14:15:03: Submitted from host , CWD <$HOME/jobs>, Output File ; Tue Aug 12 14:15:15: Dispatched to ; Tue Aug 12 14:25:08: Completed .
Total number of done jobs: 45
Total number of exited jobs: 56
Total CPU time consumed: 1009.1
Average CPU time consumed: 10.0
Maximum CPU time of a job: 991.4
Minimum CPU time of a job: 0.1
Total wait time in queues: 116864.0
Average wait time in queue: 1157.1
Maximum wait time in queue: 7069.0
Minimum wait time in queue: 7.0
Average turnaround time: 1317 (seconds/job)
Maximum turnaround time: 7070
Minimum turnaround time: 10
Average hog factor of a job: 0.
bacct Thu Sep 16 15:23:21: Completed ; TERM_RUNLIMIT: job killed after reaching LSF run time limit. Accounting information about this job: Share group charged CPU_T WAIT TURNAROUND STATUS HOG_FACTOR MEM SWAP 0.04 11 72 exit 0.0006 0K 0K -----------------------------------------------------------------------------SUMMARY: ( time unit: second ) Total number of done jobs: 0 Total number of exited jobs: Total CPU time consumed: 0.0 Average CPU time consumed: Maximum CPU time of a job: 0.
badmin
badmin — administrative tool for LSF

SYNOPSIS
badmin subcommand
badmin [-h | -V]

SUBCOMMAND LIST
ckconfig [-v]
diagnose [job_ID ... | "job_ID[index]" ...]
reconfig [-v] [-f]
mbdrestart [-C comment] [-v] [-f]
qopen [-C comment] [queue_name ... | all]
qclose [-C comment] [queue_name ... | all]
qact [-C comment] [queue_name ... | all]
qinact [-C comment] [queue_name ... | all]
qhist [-t time0,time1] [-f logfile_name] [queue_name ...]
hopen [-C comment] [host_name ... | host_group ...
The badmin commands consist of a set of privileged commands and a set of non-privileged commands. Privileged commands can only be invoked by root or LSF administrators as defined in the configuration file (see lsf.cluster.cluster(5) for ClusterAdmin). Privileged commands are:
reconfig
mbdrestart
qopen
qclose
qact
qinact
hopen
hclose
hrestart
hshutdown
hstartup
diagnose
The configuration file lsf.sudoers(5) has to be set in order to use the privileged command hstartup by a non-root user.
badmin By default, badmin ckconfig displays only the result of the configuration file check. If warning errors are found, badmin prompts you to display detailed messages. -v Verbose mode. Displays detailed messages about configuration file checking to stderr. diagnose [job_ID ... | "job_ID[index]" ...] Displays full pending reason list if CONDENSE_PENDING_REASONS=Y is set in lsb.params. For example: % badmin diagnose 1057 reconfig [-v] [-f] Dynamically reconfigures LSF without restarting mbatchd.
badmin If warning errors are found, badmin prompts you to display detailed messages. If fatal errors are found, mbatchd and mbschd restart is not performed, and badmin exits. If lsb.events is large, or many jobs are running, restarting mbatchd can take several minutes. If you only need to reload the configuration files, use badmin reconfig. -C comment Logs the text of comment as an administrator comment record to lsb.events. The maximum length of the comment string is 512 characters. -v Verbose mode.
badmin -C comment Logs the text of comment as an administrator comment record to lsb.events. The maximum length of the comment string is 512 characters. qhist [-t time0,time1] [-f logfile_name] [queue_name ...] Displays historical events for specified queues, or for all queues if no queue is specified. Queue events are queue opening, closing, activating and inactivating. -t time0,time1 Displays only those events that occurred during the period from time0 to time1. See bhist(1) for the time format.
badmin -f Disables interaction and does not ask for confirmation for restarting sbatchd. hshutdown [-f] [host_name ... | all] Shuts down sbatchd on the specified hosts, or on all batch server hosts if the reserved word all is specified. If no host is specified, the local host is assumed. sbatchd will exit upon receiving the request. -f Disables interaction and does not ask for confirmation for shutting down sbatchd. hstartup [-f] [host_name ...
badmin This command also fails if you try to add dynamic hosts to condensed host groups. To enable dynamic host configuration, define LSF_MASTER_LIST and LSF_DYNAMIC_HOST_WAIT_TIME in lsf.conf and LSF_HOST_ADDR_RANGE in lsf.cluster.cluster_name. -C comment Logs the text of comment as an administrator comment record to lsb.events. The maximum length of the comment string is 512 characters. hghostdel [-f] [-C comment] host_group host_name [host_name ...
badmin sbddebug [-c class_name ...] [-l debug_level] [-f logfile_name] [-o] [host_name ...] Sets the message log level for sbatchd to include additional information in log files. You must be root or the LSF administrator to use this command. In MultiCluster, debug levels can only be set for hosts within the same cluster. For example, you could not set debug or timing levels from a host in clusterA for a host in clusterB.
LC_SIGNAL - Log messages pertaining to signals
LC_SYS - Log system call messages
LC_TRACE - Log significant program walk steps
LC_XDR - Log everything transferred by XDR
Default: 0 (no additional classes are logged)
-l debug_level
Specifies the level of detail in debug messages. The higher the number, the more detail that is logged. Higher levels include all lower levels. Possible values:
0 - LOG_DEBUG level in parameter LSF_LOG_MASK in lsf.conf.
1 - LOG_DEBUG1 level for extended logging.
badmin Default: local host (host from which command was submitted) sbdtime [-l timing_level] [-f logfile_name] [-o] [host_name ...] Sets the timing level for sbatchd to include additional timing information in log files. You must be root or the LSF administrator to use this command. In MultiCluster, timing levels can only be set for hosts within the same cluster. For example, you could not set debug or timing levels from a host in clusterA for a host in clusterB.
badmin host_name ... Sets the timing level on the specified host or hosts. Lists of hosts must be separated by spaces and enclosed in quotation marks. Default: local host (host from which command was submitted) schddebug [-c class_name ...] [-l debug_level] [-f logfile_name] [-o] Sets message log level for mbschd to include additional information in log files. You must be root or the LSF administrator to use this command. See sbddebug for an explanation of options.
bbot
bbot — moves a pending job relative to the last job in the queue

SYNOPSIS
bbot job_ID | "job_ID[index_list]" [position]
bbot [-h | -V]

DESCRIPTION
Changes the queue position of a pending job, or a pending job array element, to affect the order in which jobs are considered for dispatch. By default, LSF dispatches jobs in a queue in the order of arrival (that is, first-come, first-served), subject to availability of suitable server hosts.
bbot position Optional. The position argument can be specified to indicate where in the queue the job is to be placed. position is a positive number that indicates the target position of the job from the end of the queue. The positions are relative to only the applicable jobs in the queue, depending on whether the invoker is a regular user or the LSF administrator. The default value of 1 means the position is after all other jobs with the same priority. -h Prints command usage to stderr and exits.
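The position rule can be illustrated with a small sketch (not LSF code; it models only the list of applicable pending jobs visible to the invoker):

```python
# Illustrative sketch of bbot's position semantics: move job_id so that
# it sits at `position` counted from the end of the applicable jobs.

def bbot_move(pending, job_id, position=1):
    """position=1 (the default) places the job after all other jobs."""
    others = [j for j in pending if j != job_id]
    insert_at = len(others) - (position - 1)
    return others[:insert_at] + [job_id] + others[insert_at:]
```

For example, bbot_move([101, 102, 103, 104], 102) yields [101, 103, 104, 102] (job 102 moves to the end of the queue), while position=2 would leave it second from the end.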
bchkpnt
bchkpnt — checkpoints one or more checkpointable jobs

SYNOPSIS
bchkpnt [-f] [-k] [-p minutes | -p 0] [job_ID | "job_ID[index_list]"] ...
bchkpnt [-f] [-k] [-p minutes | -p 0] [-J job_name] [-m host_name | -m host_group] [-q queue_name] [-u "user_name" | -u all] [0]
bchkpnt [-h | -V]

DESCRIPTION
Checkpoints your running (RUN) or suspended (SSUSP, USUSP, and PSUSP) checkpointable jobs. LSF administrators and root can checkpoint jobs submitted by other users.
bchkpnt -m host_name | -m host_group Only checkpoints jobs dispatched to the specified hosts. -q queue_name Only checkpoints jobs dispatched from the specified queue. -u "user_name" | -u all Only checkpoints jobs submitted by the specified users. The keyword all specifies all users. Ignored if a job ID other than 0 (zero) is specified. job_ID | "job_ID[index_list]" Checkpoints only the specified jobs. -h Prints command usage to stderr and exits. -V Prints LSF release version to stderr and exits.
bclusters bclusters displays status of MultiCluster connections SYNOPSIS bclusters [-h | -V] DESCRIPTION Displays a list of MultiCluster queues together with their relationship with queues in remote clusters. OPTIONS -h Prints command usage to stderr and exits. -V Prints LSF release version to stderr and exits. OUTPUT Job Forwarding Model Information related to the job forwarding model is displayed under the heading Remote Batch Information.
bclusters ok The two clusters can exchange information and the system is properly configured. disc Communication between the two clusters has not been established. This could occur because there are no jobs waiting to be dispatched, or because the remote master cannot be located. Resource Leasing Model Information related to the resource leasing model is displayed under the heading Resource Lease Information. REMOTE_CLUSTER For borrowed resources, name of the remote cluster that is the provider.
bgadd bgadd creates job groups SYNOPSIS bgadd job_group_name bgadd [-h | -V] DESCRIPTION Creates a job group with the job group name specified by job_group_name. You must provide full group path name for the new job group. The last component of the path is the name of the new group to be created. You do not need to create the parent job group before you create a sub-group under it. If no groups in the job group hierarchy exist, all groups are created with the specified hierarchy.
bgdel bgdel deletes job groups SYNOPSIS bgdel job_group_name ... bgdel [-h | -V] DESCRIPTION Deletes a job group with the job group name specified by job_group_name and all its subgroups. You must provide full group path name for the job group to be deleted. The job group cannot contain any jobs. OPTIONS job_group_name Full path of the job group name. -h Prints command usage to stderr and exits. -V Prints LSF release version to stderr and exits.
bhist
bhist — displays historical information about jobs

SYNOPSIS
bhist [-a | -d | -p | -r | -s] [-b | -w] [-l] [-t] [-C start_time,end_time] [-D start_time,end_time] [-S start_time,end_time] [-T start_time,end_time] [-f logfile_name | -n number_logfiles | -n 0] [-J job_name] [-Lp ls_project_name] [-m host_name] [-N host_name | -N host_model | -N CPU_factor] [-P project_name] [-q queue_name] [-u user_name | -u all]
bhist [-J job_name] [-N host_name | -N host_model | -N CPU_factor] [job_ID ...
bhist -p Only displays information about pending jobs. -r Only displays information about running jobs. -s Only displays information about suspended jobs. -t Displays job events chronologically. -w Wide format. Displays the information in a wide format. -C start_time,end_time Only displays jobs that completed or exited during the specified time interval. Specify the span of time for which you want to display the history.
-T start_time,end_time
Used together with -t. Only displays information about job events within the specified time interval. Specify the span of time for which you want to display the history. If you do not specify a start time, the start time is assumed to be the time of the first occurrence. If you do not specify an end time, the end time is assumed to be now. Specify the times in the format "yyyy/mm/dd/HH:MM".
bhist -q queue_name Only displays information about jobs submitted to the specified queue. -u user_name | -u all Displays information about jobs submitted by the specified user, or by all users if the keyword all is specified. job_ID | "job_ID[index]" Searches all event log files and only displays information about the specified jobs. If you specify a job array, displays all elements chronologically. This option overrides all other options except -J, -N, -h, and -V.
bhist Command The job command. Detailed history includes job group modification, the date and time the job was forwarded and the name of the cluster to which the job was forwarded. FILES Reads lsb.events. SEE ALSO lsb.events(5), bgadd(1), bgdel(1), bjgroup(1), bsub(1), bjobs(1), lsinfo(1) TIME INTERVAL FORMAT You use the time interval to define a start and end time for collecting the data to be retrieved and displayed.
ABSOLUTE TIME EXAMPLES
Assume the current time is May 9 17:06 2006:
1,8 = May 1 00:00 2006 to May 8 23:59 2006
,4 = the time of the first occurrence to May 4 23:59 2006
6 = May 6 00:00 2006 to May 6 23:59 2006
2/ = Feb 1 00:00 2006 to Feb 28 23:59 2006
/12: = May 9 12:00 2006 to May 9 12:59 2006
2/1 = Feb 1 00:00 2006 to Feb 1 23:59 2006
2/1, = Feb 1 00:00 to the current time
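The defaulting rules for omitted endpoints can be sketched as follows (illustrative only, not LSF code; parsing of the date fields themselves is omitted):

```python
# Illustrative sketch: splitting a "start,end" time interval and applying
# the documented defaults for omitted endpoints.

def split_interval(spec, first_occurrence, now):
    """Return (start, end); an omitted side falls back to its default.
    A single time with no comma covers that whole day (start == end)."""
    if "," not in spec:
        return spec, spec
    start, end = spec.split(",", 1)
    return (start or first_occurrence), (end or now)
```

For example, split_interval(",4", "FIRST", "NOW") gives ("FIRST", "4"), matching the ",4" entry above: the time of the first occurrence to May 4 23:59.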
bhosts
bhosts — displays hosts and their static and dynamic resources

SYNOPSIS
bhosts [-e | -l | -w] [-x] [-X] [-R "res_req"] [host_name | host_group] ...
bhosts [-e | -l | -w] [-X] [-R "res_req"] [cluster_name]
bhosts [-e] -s [shared_resource_name ...]
bhosts [-h | -V]

DESCRIPTION
By default, returns the following information about all hosts: host name, host status, job state statistics, and job slot limits. bhosts displays output for condensed host groups.
bhosts ◆ num_ok, num_unavail, num_unreach, and num_busy are the number of hosts that are ok, unavail, unreach, and busy, respectively.
bhosts OUTPUT Host-Based Default Displays the following fields: HOST_NAME The name of the host. If a host has batch jobs running and the host is removed from the configuration, the host name will be displayed as lost_and_found. For condensed host groups, this is the name of host group. STATUS With MultiCluster, not shown for fully exported hosts. The current status of the host and the sbatchd daemon. Batch jobs can only be dispatched to hosts with an ok status.
bhosts MAX The maximum number of job slots available. If a dash (-) is displayed, there is no limit. For condensed host groups, this is the total maximum number of job slots available in all hosts in the host group. These job slots are used by running jobs, as well as by suspended or pending jobs that have slots reserved for them. If preemptive scheduling is used, suspended jobs are not counted (see the description of PREEMPTIVE in lsb.queues(5) and MXJ in lsb.hosts(5)).
bhosts loadSched, loadStop The scheduling and suspending thresholds for the host. If a threshold is not defined, the threshold from the queue definition applies. If both the host and the queue define a threshold for a load index, the most restrictive threshold is used. The migration threshold is the time that a job dispatched to this host can remain suspended by the system before LSF attempts to migrate the job to another host.
bhosts close, jobs are not suspended. Jobs already running continue to run, but no new jobs are started until the windows reopen. The default for the dispatch window is no restriction or always open (that is, twenty-four hours a day and seven days a week). For the dispatch window specification, see the description for the DISPATCH_WINDOWS keyword under the -l option in bqueues(1). CURRENT LOAD Displays the total and reserved host load.
bhosts RESOURCE The name of the resource. TOTAL The value of the shared resource used for scheduling. This is the sum of the current and the reserved load for the shared resource. RESERVED The amount reserved by jobs. You specify the reserved resource using bsub -R (see lsfintro(1)). LOCATION The hosts that are associated with the shared resource. FILES Reads lsb.hosts. SEE ALSO lsb.
bhpart bhpart displays information about host partitions SYNOPSIS bhpart [-r] [host_partition_name ...] bhpart [-h | -V] DESCRIPTION By default, displays information about all host partitions. Host partitions are used to configure host-partition fairshare scheduling. OPTIONS -r Displays the entire information tree associated with the host partition recursively. host_partition_name ... Displays information about the specified host partitions only. -h Prints command usage to stderr and exits.
bhpart In general, users or user groups with larger SHARES, fewer STARTED and RESERVED, and a lower CPU_TIME and RUN_TIME will have higher PRIORITY. STARTED Number of job slots used by running or suspended jobs owned by users or user groups in the host partition. RESERVED Number of job slots reserved by the jobs owned by users or user groups in the host partition. CPU_TIME Cumulative CPU time used by jobs of users or user groups executed in the host partition. Measured in seconds, to one decimal place.
bjgroup bjgroup displays information about job groups SYNOPSIS bjgroup [-s] bjgroup [-h | -V] DESCRIPTION Displays all job groups. OPTIONS -s Sorts job groups by hierarchy.
bjgroup RUN The number of job slots used by running jobs in the specified job group. SSUSP The number of job slots used by the system-suspended jobs in the specified job group. USUSP The number of job slots used by user-suspended jobs in the specified job group. FINISH The number of jobs in the specified job group in EXITED or DONE state.
bjobs
bjobs — displays information about LSF jobs

SYNOPSIS
bjobs [-a] [-A] [-w | -l] [-X] [-g job_group_name | -sla service_class_name] [-J job_name] [-Lp ls_project_name] [-m host_name | -m host_group | -m cluster_name] [-N host_name | -N host_model | -N CPU_factor] [-P project_name] [-q queue_name] [-u user_name | -u user_group | -u all] [-x] job_ID | "job_ID[index_list]" ...
bjobs -d Displays information about jobs that finished recently, within an interval specified by CLEAN_PERIOD in lsb.params (the default period is 1 hour). -l Long format. Displays detailed information for each job in a multiline format.
bjobs The suspending reason may not remain the same while the job stays suspended. For example, a job may have been suspended due to the paging rate, but after the paging rate dropped another load index could prevent the job from being resumed. The suspending reason will be updated according to the load index. The reasons could be as old as the time interval specified by SBD_SLEEP_TIME in lsb.params. So the reasons shown may not reflect the current load situation. -w Wide format.
bjobs With MultiCluster, displays jobs in the specified cluster. If a remote cluster name is specified, you will see the remote job ID, even if the execution host belongs to the local cluster. To determine the available clusters, use bclusters. -N host_name | -N host_model | -N CPU_factor Displays the normalized CPU time consumed by the job. Normalizes using the CPU factor specified, or the CPU factor of the host or host model specified.
bjobs submitted but this order can be changed by using the commands btop or bbot. If more than one job is dispatched to a host, the jobs on that host are listed in the order in which they will be considered for scheduling on this host by their queue priorities and dispatch times. Finished jobs are displayed in the order in which they were completed. Default Display A listing of jobs is displayed with the following fields: JOBID The job ID that LSF assigned to the job. USER The user who submitted the job.
bjobs SUBMIT_TIME The submission time of the job. -l output The -l option displays a long format listing with the following additional fields: Project The project the job was submitted from. Command The job command. CWD The current working directory on the submission host. PENDING REASONS The reason the job is in the PEND or PSUSP state. The names of the hosts associated with each reason will be displayed when both -p and -l options are specified.
bjobs EXIT The job has terminated with a non-zero status – it may have been aborted due to an error in its execution, or killed by its owner or the LSF administrator. For example, exit code 131 means that the job exceeded a configured resource usage limit and LSF killed the job. UNKWN mbatchd has lost contact with the sbatchd on the host on which the job runs. WAIT For jobs submitted to a chunk job queue, members of a chunk job that are waiting to run.
bjobs PIDs Currently active processes in a job. RESOURCE LIMITS The hard resource usage limits that are imposed on the jobs in the queue (see getrlimit(2) and lsb.queues(5)). These limits are imposed on a per-job and a per-process basis.
bjobs ARRAY_SPEC Array specification in the format of name[index]. The array specification may be truncated, use -w option together with -A to show the full array specification. OWNER Owner of the job array. NJOBS Number of jobs in the job array. PEND Number of pending jobs of the job array. RUN Number of running jobs of the job array. DONE Number of successfully completed jobs of the job array. EXIT Number of unsuccessfully completed jobs of the job array.
bjobs SEE ALSO bsub(1), bkill(1), bhosts(1), bmgroup(1), bclusters(1), bqueues(1), bhist(1), bresume(1), bsla(1), bstop(1), lsb.params(5), lsb.
bkill
bkill — sends signals to kill, suspend, or resume unfinished jobs

SYNOPSIS
bkill [-l] [-g job_group_name | -sla service_class_name] [-J job_name] [-m host_name | -m host_group] [-q queue_name] [-r | -s (signal_value | signal_name)] [-u user_name | -u user_group | -u all] [job_ID ... | 0 | "job_ID[index]" ...]
bkill [-l] [-b] [-g job_group_name | -sla service_class_name] [-J job_name] [-m host_name | -m host_group] [-q queue_name] [-u user_name | -u user_group | -u all] [job_ID ...
bkill Using bkill on a repetitive job kills the current run, if the job has been started, and requeues the job. See bcadd(1) and bsub(1) for information on setting up a job to run repetitively. If the job cannot be killed, use bkill -r to remove the job from the LSF system without waiting for the job to terminate, and free the resources of the job. OPTIONS 0 Kills all the jobs that satisfy other options (-g, -m, -q, -u, and -J). -b Kills large numbers of jobs as soon as possible.
% bsub -g /risk_group myjob
Job <115> is submitted to default queue .
% bsub -g /risk_group/consolidate myjob2
Job <116> is submitted to default queue .
The following bkill command only kills jobs in /risk_group, not the subgroup /risk_group/consolidate:
% bkill -g /risk_group 0
Job <115> is being terminated
% bkill -g /risk_group/consolidate 0
Job <116> is being terminated
-J job_name
Operates only on jobs with the specified job_name.
bkill You cannot use -g with -sla. A job can either be attached to a job group or a service class, but not both. The -sla option is ignored if a job ID other than 0 is specified in the job_ID option. Use bsla to display the properties of service classes configured in LSB_CONFDIR/cluster_name/configdir/lsb.serviceclasses (see lsb.serviceclasses(5)) and dynamic information about the state of each configured service class.
bkill SEE ALSO bsub(1), bjobs(1), bqueues(1), bhosts(1), bresume(1), bsla(1), bstop(1), bgadd(1), bgdel(1), bjgroup(1), bparams(5), lsb.
bladmin bladmin reconfigures the Platform LSF License Scheduler daemon (bld). SYNOPSIS bladmin reconfig | shutdown bladmin [-h | -V] DESCRIPTION Use this command to reconfigure the License Scheduler daemon (bld). You must be a License Scheduler administrator to use this command. OPTIONS reconfig [host_name ... | all] Reconfigures License Scheduler. shutdown [host_name ... | all] Shuts down License Scheduler. -h Prints command usage to stderr and exits. -V Prints release version to stderr and exits.
blcollect blcollect license information collection daemon SYNOPSIS blcollect [-c collector_name ] [-m host_name ...] [-p license_scheduler_port] blcollect [-h | -V | -i lmstat_interval | -D lmstat_path] DESCRIPTION Periodically collects license usage information from Macrovision® FLEXnet™. It queries FLEXnet for license usage information from the FLEXnet lmstat command, and passes the information to the License Scheduler daemon (bld).
blcollect -V Prints release version to stderr and exits. SEE ALSO lsf.
blhosts blhosts displays the names of all the hosts running the License Scheduler daemon (bld). SYNOPSIS blhosts [-h | -V] DESCRIPTION Displays a list of hosts running the License Scheduler daemon. This includes the License Scheduler master host and all the candidate License Scheduler hosts running bld. OPTIONS -h Prints command usage to stderr and exits. -V Prints release version to stderr and exits. OUTPUT Prints out the names of all the hosts running the License Scheduler daemon (bld).
blimits blimits displays information about resource allocation limits of running jobs SYNOPSIS blimits [-n limit_name ...] [-m host_name | -m host_group | -m cluster_name ...] [-P project_name ...] [-q queue_name ...] [-u user_name | -u user_group ...] blimits -c blimits -h | -V DESCRIPTION Displays current usage of resource allocation limits configured in Limit sections in lsb.
blimits OPTIONS -c Displays all resource configurations in lsb.resources. This is the same as bresources with no options. -n limit_name ... Displays resource allocation limits for the specified named Limit sections. If a list of limit sections is specified, Limit section names must be separated by spaces and enclosed in quotation marks (") or ('). -m host_name | -m host_group | -m cluster_name ... Displays resource allocation limits for the specified hosts. Do not use quotes when specifying multiple hosts.
blimits -V Prints LSF release version to stderr and exits. OUTPUT Configured limits and resource usage for builtin resources (slots, mem, tmp, and swp load indices) are displayed as INTERNAL RESOURCE LIMITS separately from custom external resources, which are shown as EXTERNAL RESOURCE LIMITS. Resource Consumers blimits displays the following fields for resource consumers: NAME The name of the limit policy as specified by the Limit section NAME parameter.
blimits SLOTS Number of slots currently used and maximum number of slots configured for the limit policy, as specified by the Limit section SLOTS parameter. MEM Amount of memory currently used and maximum configured for the limit policy, as specified by the Limit section MEM parameter. TMP Amount of tmp space currently used and maximum amount of tmp space configured for the limit policy, as specified by the Limit section TMP parameter.
blinfo blinfo displays static License Scheduler configuration information. SYNOPSIS blinfo [ -a | -Lp | -p | -D | -G ] blinfo [ -h | -V ] DESCRIPTION Displays different license configuration information, depending on the option selected. By default, displays information about the distribution of licenses managed by License Scheduler. OPTIONS -a Prints out all information, including information about non-shared licenses (NON_SHARED_DISTRIBUTION) and workload distribution (WORKLOAD_DISTRIBUTION).
blinfo FEATURE The license name. This becomes the license token name. SERVICE_DOMAIN The name of the service domain that provided the license. TOTAL The total number of licenses managed by FLEXnet. This number comes from FLEXnet. DISTRIBUTION The distribution of the licenses among license projects in the format [project_name, percentage[/number_licenses_owned]]. This determines how many licenses a project is entitled to use when there is competition for licenses.
blinfo Hierarchical Output (-G) The following fields describe the values of their corresponding configuration fields in the ProjectGroup section of lsf.licensescheduler. GROUP The project names in the hierarchical grouping and their relationships. Each entry specifies the name of the hierarchical group and its members. The entry is enclosed in parentheses as shown: (group (member ...)) SHARES The shares assigned to the hierarchical group member projects.
blinfo
% blinfo -a
FEATURE   SERVICE_DOMAIN   TOTAL   DISTRIBUTION
g1        LS               0       [p1, 50.0%] [p2, 50.0%]
          WORKLOAD_DISTRIBUTION [LSF 66.7%, NON_LSF 33.3%]
g2        LS               0       [p1, 50.0%] [p2, 50.0%]
g33       WS               0       [p1, 50.0%] [p2, 50.0%]
blinfo -a does not display WORKLOAD_DISTRIBUTION if WORKLOAD_DISTRIBUTION is not defined:
% blinfo -a
FEATURE   SERVICE_DOMAIN   TOTAL   DISTRIBUTION
g1        LS               3       [p1, 50.0%] [p2, 50.0% / 2]
          NON_SHARED_DISTRIBUTION [p2, 2]
FILES Reads lsf.
blkill blkill terminates an interactive License Scheduler task SYNOPSIS blkill [-t seconds] task_ID blkill [-h | -V] DESCRIPTION Terminates a running or waiting interactive task in License Scheduler. Users can kill their own tasks. You must be a License Scheduler administrator to terminate another user’s task. By default, blkill notifies the user and waits 30 seconds before killing the task. OPTIONS task_ID Task ID of the task you want to kill.
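For illustration, a typical invocation might look like this (the task ID is hypothetical; use bltasks to list task IDs):

```shell
# Kill interactive task 101, giving its owner 60 seconds' notice
# instead of the default 30 seconds.
blkill -t 60 101
```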
blstat blstat displays dynamic license information SYNOPSIS blstat [-G] [-s] [-S] [-D service_domain_name | "service_domain_name ..."] [-Lp ls_project_name | "ls_project_name ..."] [-t token_name | "token_name ..."] blstat [-h | -V] DESCRIPTION Displays license usage statistics. By default, shows information about all licenses and all clusters. OPTIONS -D service_domain_name | "service_domain_name ..." Only shows information about specified service domains.
blstat OUTPUT Information is organized first by license feature, then by service domain. For each combination of license and service domain, License Scheduler displays a line of summary information followed by rows of license project information (one row for each license project configured to use the license). In each group of statistics, numbers and percentages refer only to licenses of the specified license feature that can be checked out from FLEXnet license server hosts in the specified service domain.
blstat NON_LSF_DESERVE The total number of licenses assigned to projects in the non-LSF workload. NON_LSF_FREE The total number of free licenses available to projects in the non-LSF workload. Project output For each project that is configured to use the license, blstat displays the following information. PROJECT The License Scheduler project name. SHARE The percentage of licenses assigned to the license project by the License Scheduler administrator.
blstat PROJECT/GROUP The members of the hierarchical group, listed by its group or project name.
bltasks bltasks displays task information SYNOPSIS bltasks [-l] [task_ID] bltasks [-l] [-p | -r | -w] [-Lp "ls_project_name ..."] [-m "host_name ..."] [-t "terminal_name ..."] [-u "user_name ..."] bltasks [-h | -V] DESCRIPTION Displays current information about interactive tasks managed by License Scheduler (submitted using taskman). By default, displays information about all tasks. OPTIONS task_ID Only displays information about the specified task. -l Long format.
bltasks -V Prints License Scheduler release version to stderr and exits. OUTPUT Default Output Displays the short format with the following information: TID Task ID that License Scheduler assigned to the task. USER The user who submitted the task. STAT The current status of the task. - RUN: Task is running. - WAIT: Task has not yet started. - PREEMPT: Task has been preempted and currently has no license token. HOST The name of the host from which the task was submitted.
bltasks Keyboard idle since Time at which the task became idle. RES_REQ The resource requirement of the task. Command line The command the License Scheduler task manager is executing.
blusers blusers displays license usage information SYNOPSIS blusers [-J | -l | -P -j job_ID -u user_name -m host_name | -P -c cluster_name -j job_ID -u user_name -m host_name | -h | -V] DESCRIPTION By default, displays summarized information about usage of licenses. OPTIONS -J Displays detailed license usage information about each job. -l Long format. Displays additional license usage information. See “OUTPUT” for a description of information that is displayed.
blusers HOST The name of the host where jobs have started. NLICS The number of licenses checked out from FLEXnet. NTASKS The number of running tasks using these licenses. -J Output Displays the following summary information for each job: JOBID The job ID assigned by LSF. USER The name of the user who submitted the job. HOST The name of the host where the job has been started. PROJECT The name of the license project that the job is associated with.
blusers EXAMPLES
% blusers -l
FEATURE  SERVICE_DOMAIN  USER   HOST   NLICS  NTASKS  OTHERS  DISPLAYS    PIDS
feat1    LanServer       user1  hostA  1      1       0       (/dev/tty)  (16326)
% blusers -J
JOBID  USER   HOST   PROJECT  CLUSTER   START_TIME
553    user1  hostA  p3       cluster1  Oct 5 15:47:14
RESOURCE  RUSAGE  SERVICE_DOMAIN
p1_f1     1       app_1
SEE ALSO blhosts(1), blinfo(1), blstat(1)
bmgroup bmgroup displays information about host groups SYNOPSIS bmgroup [-r] [-l] [-w] [host_group ...] bmgroup [-h | -V] DESCRIPTION Displays host groups and host names for each group. By default, displays information about all host groups. A host partition is also considered a host group. OPTIONS -r Expands host groups recursively. The expanded list contains only host names; it does not contain the names of subgroups. Duplicate names are listed only once.
bmig bmig migrates checkpointable or rerunnable jobs SYNOPSIS bmig [-f] [job_ID | "job_ID[index_list]"] ... bmig [-f] [-J job_name] [-m "host_name ..." | -m "host_group ..."] [-u user_name | -u user_group | -u all] [0] bmig [-h | -V] DESCRIPTION Migrates one or more of your checkpointable and rerunnable jobs. LSF administrators and root can migrate jobs submitted by other users.
bmig OPTIONS -f Forces a checkpointable job to be checkpointed even if non-checkpointable conditions exist (these conditions are OS-specific). job_ID | "job_ID[index_list]" | 0 Specifies the job ID of the jobs to be migrated. If you specify a job ID, the -J and -u options are ignored. If you specify a job ID of 0 (zero), all other job IDs are ignored, and all jobs that satisfy the -J and -u options are migrated. If you do not specify a job ID, the most recently submitted job that satisfies the -J and -u options is migrated.
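The job ID rules above can be illustrated as follows (job IDs, job name, and host name are hypothetical):

```shell
# Migrate a single job by ID; -J and -u would be ignored here.
bmig 1024

# Migrate all of your migratable jobs to hostB; job ID 0 makes
# the -J and -u filters (if any) take effect instead of a job ID.
bmig -m "hostB" 0

# Migrate all of user1's jobs named myjob (run as an administrator).
bmig -J myjob -u user1 0
```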
bmod bmod modifies job submission options of a job SYNOPSIS bmod [bsub_options] [job_ID | "job_ID[index]"] bmod -g job_group_name | -gn [job_ID] bmod [-sla service_class_name | -slan] [job_ID] bmod [-h | -V] OPTION LIST [-B | -Bn] [-N | -Nn] [-r | -rn] [-x | -xn] [-a esub_parameters | -an] [-b begin_time | -bn] [-C core_limit | -Cn] [-c [hour:]minute[/host_name | /host_model] | -cn] [-D data_limit | -Dn] [-e err_file | -en] [-E "pre_exec_command [argument ...
bmod [-t term_time | -tn] [-U reservation_ID | -Un] [-u mail_user | -un] [-w 'dependency_expression' | -wn] [-wa '[signal | command | CHKPNT]' | -wan] [-wt 'job_warning_time' | -wtn] [-W run_limit[/host_name | /host_model] | -Wn] [-Z "new_command" | -Zs "new_command" | -Zsn] [job_ID | "job_ID[index]"] DESCRIPTION Modifies the options of a previously submitted job. See bsub(1) for complete descriptions of job submission options you can modify with bmod.
bmod - Overwrite standard output (stdout) file name (-oo output_file) - Overwrite standard error (stderr) file name (-eo error_file) Modified resource usage limits cannot exceed limits defined in the queue. To modify the CPU limit or the memory limit of running jobs, the parameters LSB_JOB_CPULIMIT=Y and LSB_JOB_MEMLIMIT=Y must be defined in lsf.conf. If you want to specify array dependency by array name, set JOB_DEP_LAST_SUB in lsb.params.
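For illustration, modifying the limits of a running job might look like this (the job ID is hypothetical, and lsf.conf must contain LSB_JOB_CPULIMIT=Y and LSB_JOB_MEMLIMIT=Y as described above):

```shell
# Set the CPU limit of running job 1234 to 30 minutes.
bmod -c 30 1234

# Reset the begin time of job 1234 to its default
# (the option-n form resets an option, as in the option list).
bmod -bn 1234
```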
bmod At 15:15, while job 122 in user1#17 is running, you modify its termination time as follows: % bmod -t 15:40 122 This termination time overlaps the reservation window of user1#18, in which job 245 started at 15:35. After modifying the termination time of job 122, the following events occur: At this time ... These events occur ...
bmod You cannot: ❖ Use -sla with other bmod options ❖ Move job array elements from one service class to another, only entire job arrays ❖ Modify the service class of a job already attached to a job group Use bsla to display the properties of service classes configured in LSB_CONFDIR/cluster_name/configdir/lsb.serviceclasses (see lsb.serviceclasses(5)) and dynamic information about the state of each configured service class. OPTIONS job_ID | "job_ID[index]" Modifies jobs with the specified job ID.
bparams bparams displays information about configurable system parameters in lsb.params SYNOPSIS bparams [-l] bparams [-h | -V] DESCRIPTION By default, displays only the most commonly used parameters. OPTIONS -l Long format. Displays detailed information about all the configurable parameters in lsb.params. -h Prints command usage to stderr and exits. -V Prints LSF release version to stderr and exits. SEE ALSO lsb.
bpeek bpeek displays the stdout and stderr output of an unfinished job SYNOPSIS bpeek [-f] [-q queue_name | -m host_name | -J job_name | job_ID | "job_ID [index_list ]"] bpeek [-h | -V] DESCRIPTION Displays the standard output and standard error output that have been produced by one of your unfinished jobs, up to the time that this command is invoked. By default, displays the output using the command cat. This command is useful for monitoring the progress of a job and identifying errors.
bpost bpost sends external status messages and attaches data files to a job SYNOPSIS bpost [-i message_index] [-d "description"] [-a data_file] job_ID | "job_ID[index]" | -J job_name bpost [-h | -V] DESCRIPTION Provides external status information or sends data to a job in the system. Done or exited jobs cannot accept messages. By default, operates on the message index 0. By default, posts the message "no description". If you specify a job ID: You can only send messages and data to your own jobs.
bpost OPTIONS -i message_index Operates on the specified message index. Default: 0 Use the MAX_JOB_MSG_NUM parameter in lsb.params to set a maximum number of messages for a job. With MultiCluster, to avoid conflicts, MAX_JOB_MSG_NUM should be the same in all clusters. -d "description" Places your own status text as a message to the job. The message description has a maximum length of 512 characters.
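For illustration, posting a message with an attachment might look like this (the job ID, description, and file name are hypothetical):

```shell
# Post message index 1 to job 1234 with a status description
# and attach a data file for later retrieval with bread.
bpost -i 1 -d "step 1 done" -a results.dat 1234
```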
bqueues bqueues displays information about queues SYNOPSIS bqueues [-w | -l | -r] [-m host_name | -m host_group | -m cluster_name | -m all] [-u user_name | -u user_group | -u all] [queue_name ...] bqueues [-h | -V] DESCRIPTION Displays information about queues. By default, returns the following information about all queues: queue name, queue priority, queue status, job slot statistics, and job state statistics. In MultiCluster, returns the information about all queues in the local cluster.
bqueues -u user_name | -u user_group | -u all Displays the queues that can accept jobs from the specified user. If the keyword all is specified, displays the queues that can accept jobs from all users. If a user group is specified, displays the queues that include that group in their configuration. For a list of user groups see bugroup(1). queue_name ... Displays information about the specified queues. -h Prints command usage to stderr and exits. -V Prints LSF release version to stderr and exits.
bqueues At any moment, each queue is either Open or Closed, and is either Active or Inactive. The queue can be opened, closed, inactivated and re-activated by the LSF administrator using badmin (see badmin(8)). Jobs submitted to a queue that is later closed are still dispatched as long as the queue is active. The queue can also become inactive when either its dispatch window is closed or its run window is closed (see DISPATCH_WINDOWS in the “Output for the -l Option” section).
bqueues RUN The number of job slots used by running jobs in the queue. SUSP The number of job slots used by suspended jobs in the queue. Long Output (-l) In addition to the above fields, the -l option displays the following: Description A description of the typical use of the queue. Default queue indication Indicates that this is the default queue. PARAMETERS/STATISTICS NICE The nice value at which jobs in the queue will be run. This is the UNIX nice value for reducing the process priority (see nice(1)).
bqueues Interval for a host to accept two jobs The length of time in seconds to wait after dispatching a job to a host before dispatching a second job to the same host. If the job accept interval is zero, a host may accept more than one job in each dispatching interval. See the JOB_ACCEPT_INTERVAL parameter in lsb.queues and lsb.params. RESOURCE LIMITS The hard resource usage limits that are imposed on the jobs in the queue (see getrlimit(2) and lsb.queues(5)).
bqueues THREADLIMIT The maximum number of concurrent threads allocated to a job. If THREADLIMIT is reached, the system sends the following signals in sequence to all processes belonging to the job: SIGINT, SIGTERM, and SIGKILL. The possible UNIX per-process resource limits are: RUNLIMIT The maximum wall clock time a process can use, in minutes. RUNLIMIT is scaled by the CPU factor of the execution host.
bqueues r1m The 1-minute exponentially averaged effective CPU run queue length. r15m The 15-minute exponentially averaged effective CPU run queue length. ut The CPU utilization exponentially averaged over the last minute, expressed as a fraction between 0 and 1. pg The memory paging rate exponentially averaged over the last minute, in pages per second. io The disk I/O rate exponentially averaged over the last minute, in kilobytes per second. ls The number of current login users.
bqueues overrun Configured threshold in minutes for overrun jobs, and the number of jobs in the queue that have triggered an overrun job exception by running longer than the overrun threshold underrun Configured threshold in minutes for underrun jobs, and the number of jobs in the queue that have triggered an underrun job exception by finishing sooner than the underrun threshold idle Configured threshold (CPU time/runtime) for idle jobs, and the number of jobs in the queue that have triggered an idle job exception by having a job idle factor less than the threshold
bqueues FAIRSHARE_QUEUES Lists queues participating in cross-queue fairshare. The first queue listed is the master queue—the queue in which fairshare is configured; all other queues listed inherit the fairshare policy from the master queue. Fairshare information applies to all the jobs running in all the queues in the master-slave set. DISPATCH_ORDER DISPATCH_ORDER=QUEUE is set in the master queue.
bqueues batch jobs. Interactive jobs scheduled by LIM are controlled by another set of dispatch windows (see lshosts(1)). Similar dispatch windows may be configured for individual hosts (see bhosts(1)). A window is displayed in the format begin_time–end_time. Time is specified in the format [day:]hour[:minute], where all fields are numbers in their respective legal ranges: 0(Sunday)-6 for day, 0-23 for hour, and 0-59 for minute. The default value for minute is 0 (on the hour).
bqueues POST_EXEC The queue’s post-execution command. The post-execution command is run on the execution host when a job terminates. See lsb.queues(5) for more information. REQUEUE_EXIT_VALUES Jobs that exit with these values are automatically requeued. See lsb.queues(5) for more information. RES_REQ Resource requirements of the queue. Only the hosts that satisfy these resource requirements can be used by the queue.
bqueues PREEMPTABLE The queue is preemptable. Running jobs in a preemptable queue may be preempted by jobs in higher-priority queues, even if the higher-priority queues are not specified as preemptive. RERUNNABLE If the RERUNNABLE field displays yes, jobs in the queue are rerunnable. That is, jobs in the queue are automatically restarted or rerun if the execution host becomes unavailable. However, a job in the queue will not be restarted if you have removed the rerunnable option from the job. See lsb.
bqueues Recursive Share Tree Output (-r) In addition to the fields displayed for the -l option, the -r option displays the following: SCHEDULING POLICIES FAIRSHARE The -r option causes bqueues to recursively display the entire share information tree associated with the queue. SEE ALSO bugroup(1), nice(1), getrlimit(2), lsb.
bread bread reads messages and attached data files from a job SYNOPSIS bread [-i message_index] [-a file_name] job_ID | "job_ID [index ]" | -J job_name bread [-h | -V] DESCRIPTION Reads messages and data posted to an unfinished job with bpost. By default, displays the message description text of the job. By default, operates on the message with index 0. You can read messages and data from a job until it is cleaned from the system. You cannot read messages and data from done or exited jobs.
bread If you do not specify a message index, copies the attachment of message index 0 to the file. The job must have an attachment, and you must specify a name for the file you are copying the attachment to. If the file already exists, -a overwrites it with the new file. By default, -a gets the attachment file from the directory specified by the JOB_ATTA_DIR parameter. If JOB_ATTA_DIR is not specified, job message attachments are saved in LSB_SHAREDIR/info/.
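For illustration, retrieving a message and its attachment might look like this (the job ID and file name are hypothetical, and assume the message was posted earlier with bpost):

```shell
# Read message index 1 of job 1234 and copy its attached data file
# to a local file named attachment.out.
bread -i 1 -a attachment.out 1234
```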
brequeue brequeue kills and requeues a job SYNOPSIS brequeue [-J job_name | -J "job_name[index_list]"] [-u user_name | -u all] [job_ID | "job_ID[index_list]"] [-d] [-e] [-r] [-a] [-H] brequeue [-h | -V] DESCRIPTION You can only use brequeue on a job you own, unless you are root or the LSF administrator. Kills a running (RUN), user-suspended (USUSP), or system-suspended (SSUSP) job and returns it to the queue.
brequeue -e Requeues jobs that have terminated abnormally with EXIT job status. -r Requeues jobs that are running. -a Requeues all jobs including running jobs, suspended jobs, and jobs with EXIT or DONE status. -H Requeues jobs to PSUSP job status. -h Prints command usage to stderr and exits. -V Prints LSF release version to stderr and exits. LIMITATIONS brequeue cannot be used on interactive batch jobs; brequeue only kills interactive batch jobs, it does not restart them.
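The options above combine as in the following sketch (job IDs and the job name are hypothetical):

```shell
# Kill job 1234 and return it to the queue as a pending job.
brequeue 1234

# Requeue only the abnormally exited elements of the array named myarray.
brequeue -e -J myarray

# Requeue job 1234 and hold it in PSUSP until it is resumed.
brequeue -H 1234
```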
bresources bresources displays information about resource reservation and resource limits configuration. SYNOPSIS bresources [-s] [resource_name ...] bresources [-h | -V] DESCRIPTION By default, bresources displays all resource configurations in lsb.resources. This is the same as blimits -c. OPTIONS -s Displays per-resource reservation configurations from the ReservationUsage section of lsb.resources.
brestart brestart restarts checkpointed jobs SYNOPSIS brestart [bsub_options] [-f] checkpoint_dir [job_ID | "job_ID[index]"] brestart [-h | -V] OPTION LIST -B -f -N -x -b begin_time -C core_limit -c [hour:]minute[/host_name | /host_model] -D data_limit -E "pre_exec_command [argument ...]" -F file_limit -m "host_name[+[pref_level]] | host_group[+[pref_level]] ..." -G user_group -M mem_limit -q "queue_name ...
brestart Like bsub, brestart also calls mesub and any existing esub executables. brestart cannot make changes to the job environment through esub. Environment changes only occur when esub is called by the original job submission with bsub. OPTIONS Only the bsub options listed in the option list above can be used for brestart. Except for the following option, see bsub(1) for a description of brestart options.
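For illustration, a restart might look like this (the checkpoint directory, job ID, and queue name are hypothetical):

```shell
# Restart checkpointed job 1234 from its checkpoint directory,
# submitting the restarted job to the priority queue.
brestart -q priority /scratch/ckpt_dir 1234
```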
bresume bresume resumes one or more suspended jobs SYNOPSIS bresume [-g job_group_name] [-J job_name] [-m host_name ] [-q queue_name] [-u user_name | -u user_group | -u all ] [0] bresume [job_ID | "job_ID [index_list ]"] ... bresume [-h | -V] DESCRIPTION Sends the SIGCONT signal to resume one or more of your suspended jobs. Only root and LSF administrators can operate on jobs submitted by other users. You cannot resume a job that is not suspended.
bresume -u user_name | -u user_group | -u all Resumes only jobs owned by the specified user or group, or all users if the reserved user name all is specified. job_ID ... | "job_ID[index_list]" ... Resumes only the specified jobs. Jobs submitted by any user can be specified here without using the -u option. -h Prints command usage to stderr and exits. -V Prints LSF release version to stderr and exits. EXAMPLES % bresume -q night 0 Resumes all of the user’s suspended jobs that are in the night queue.
brlainfo brlainfo displays host topology information SYNOPSIS brlainfo [-l] [host_name ...] brlainfo [-h | -V] DESCRIPTION brlainfo contacts the Platform LSF HPC topology adapter (RLA) on the specified host and presents topology information to the user. By default, displays information about all hosts running RLA. OPTIONS -l Long format. Displays additional host topology information. See “OUTPUT” for a description of information that is displayed. host_name ...
brlainfo Long output (-l) The -l option displays a long format listing with the following additional fields: FREE CPU LIST List of free CPUs in the cpuset For example: 0-2 NFREECPUS ON EACH NODE Number of free CPUs on each node For example: 2/0,1/1 STATIC CPUSETS List of static cpuset names For example: NO STATIC CPUSETS CPU_RADIUS Available CPUs with a given radius. CPU radius is determined by the processor topology of the system and is expressed in terms of the number of router hops between CPUs.
brsvadd brsvadd adds an advance reservation SYNOPSIS brsvadd [-o] -n processors | -s [-n processors] -m "host_name | host_group ..." [-R "res_req "] [-u user_name | -g group_name] -b begin_time -e end_time brsvadd [-o] -n processors | -s [-n processors] -m "host_name | host_group ...
brsvadd The time value for -b must use the same syntax as the time value for -e. It must be earlier than the time value for -e, and cannot be earlier than the current time. -e end_time End time for a one-time reservation. The end time is in the form [[[year:]month:]day:]hour:minute with the following ranges: ◆ year: any year after 1900 (YYYY) ◆ month: 1-12 (MM) ◆ day of the month: 1-31 (dd) ◆ hour: 0-23 (hh) ◆ minute: 0-59 (mm) You must specify at least hour:minute. Year, month, and day are optional.
brsvadd -R "res_req" Selects hosts for the reservation according to the specified resource requirements. Only hosts that satisfy the resource requirement expression are reserved. -R accepts any valid resource requirement string, but only the select string takes effect. If you also specify a host list with the -m option, -R is optional. For more information about resource requirements, see lsfintro(1). The size of the resource requirement string is limited to 512 bytes.
brsvadd -V Prints LSF release version and exits. EXAMPLES ◆ The following command creates a one-time advance reservation for 1024 processors on host hostA for user user1 between 6:00 a.m. and 8:00 a.m. today: % brsvadd -n 1024 -m hostA -u user1 -b 6:0 -e 8:0 Reservation "user1#0" is created The hosts specified by -m can be local to the cluster or hosts leased from remote clusters.
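A reservation can also select hosts by resource requirement rather than by name, as described under -R (the user name and select string below are hypothetical):

```shell
# Reserve 8 processors today from 1:00 p.m. to 3:00 p.m. for user1,
# on hosts chosen by the select string rather than an explicit -m list.
brsvadd -n 8 -R "select[type==LINUX86]" -u user1 -b 13:0 -e 15:0
```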
brsvdel brsvdel deletes an advance reservation SYNOPSIS brsvdel reservation_ID ... brsvdel [-h | -V] DESCRIPTION By default, this command can only be used by LSF administrators or root. Deletes advance reservations for the specified reservation IDs.
brsvs brsvs displays advance reservations SYNOPSIS brsvs [-l] [-p all | "host_name ..."] [-w] brsvs [-l] [-c all | "policy_name "] [-w] brsvs [-h | -V] DESCRIPTION By default, displays the current advance reservations for all hosts, users, and groups.
brsvs EXAMPLE % brsvs -c reservation1 Policy Name: reservation1 Users: ugroup1 ~user1 Hosts: hostA hostB Time Window: 8:00-13:00 SEE ALSO brsvadd(8), brsvdel(8), lsb.
brun brun forces a job to run immediately SYNOPSIS brun [-b] [-c] [-f] -m "host_name[#num_cpus] ..." job_ID brun [-b] [-c] [-f] -m "host_name[#num_cpus] ..." "job_ID[index_list]" brun [-h | -V] DESCRIPTION This command can only be used by LSF administrators. Forces a pending job to run immediately on specified hosts. A job that has been forced to run is counted as a running job; this may violate the user, queue, or host job limits, and fairshare priorities.
brun on hostA are used, the job will remain pending. With -c, LSF takes into consideration that hostA has 2 slots in use and hostB is completely free, so LSF is able to dispatch the job using the 2 free slots on hostA and all 4 slots on hostB. -f Allows the job to run without being suspended due to run windows or suspending conditions. -m "host_name[#num_cpus] ... " Required. Specify one or more hosts on which to run the job.
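The hostA/hostB scenario above corresponds to an invocation like this (the job ID is hypothetical):

```shell
# Force pending job 123 to run on 2 CPUs on hostA and 4 CPUs on hostB;
# -c makes LSF count the slots already in use on hostA.
brun -c -m "hostA#2 hostB#4" 123
```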
bsla bsla displays information about service class configuration for goal-oriented service-level agreement (SLA) scheduling SYNOPSIS bsla [service_class_name] bsla [-h | -V] DESCRIPTION bsla displays the properties of service classes configured in LSB_CONFDIR/cluster_name/configdir/lsb.serviceclasses (see lsb.serviceclasses(5)) and dynamic information about the state of each configured service class. OPTIONS service_class_name The name of a service class configured in lsb.serviceclasses.
bsla STATUS Current status of the service class goal: ❖ Active:On time. The goal is active and meeting its target. ❖ Active:Delayed. The goal is active but is missing its target. ❖ Inactive. The goal is not active; its time window is closed. Jobs are scheduled as if no service class is defined. LSF does not enforce any service-level goal for an inactive SLA. THROUGHPUT For throughput goals, the configured job throughput (finished jobs per hour) for the service class.
bsla Begin ServiceClass NAME = Kyuquot PRIORITY = 23 USER_GROUP = user1 user2 GOALS = [VELOCITY 8 timeWindow (9:00-17:30)] \ [DEADLINE timeWindow (17:30-9:00)] DESCRIPTION = Daytime/Nighttime SLA End ServiceClass bsla shows the following properties and current status: % bsla Kyuquot SERVICE CLASS NAME: Kyuquot -- Daytime/Nighttime SLA PRIORITY: 23 USER_GROUP: user1 user2 GOAL: VELOCITY 8 ACTIVE WINDOW: (9:00-17:30) STATUS: Active:On time SLA THROUGHPUT: 0.
bstatus bstatus gets current external job status or sets new job status SYNOPSIS bstatus [-d "description "] job_ID | "job_ID [index ]" | -J job_name bstatus [-h | -V] DESCRIPTION Gets and displays the message description text of a job, or changes the contents of the message description text with the -d option. Always operates on the message with index 0. You can set the external status of a job until it completes. You cannot change the status of done or exited jobs.
bstatus % bstatus -d "step 2" 2500 Changes the message description text of message index 0 of job 2500 to step 2.
bstop bstop suspends unfinished jobs SYNOPSIS bstop [-a] [-g job_group_name | -sla service_class_name] [-J job_name] [-m host_name | -m host_group] [-q queue_name] [-u user_name | -u user_group | -u all] [0] [job_ID ... | "job_ID[index]"] ... bstop [-h | -V] DESCRIPTION Suspends unfinished jobs. Sends the SIGSTOP signal to sequential jobs and the SIGTSTP signal to parallel jobs to suspend them. You must specify a job ID or -g, -J, -m, -u, or -q. You cannot suspend a job that is already suspended.
bstop -sla service_class_name Suspends jobs belonging to the specified service class. You cannot use -g with -sla. A job can either be attached to a job group or a service class, but not both. Use bsla to display the properties of service classes configured in LSB_CONFDIR/cluster_name/configdir/lsb.serviceclasses (see lsb.serviceclasses(5)) and dynamic information about the state of each configured service class.
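For illustration, suspending by service class might look like this (Kyuquot is the hypothetical service class name used in the bsla examples):

```shell
# Suspend all of your jobs attached to the Kyuquot service class;
# job ID 0 makes bstop act on every job matching -sla.
bstop -sla Kyuquot 0
```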
bsub bsub submits a batch job to LSF SYNOPSIS bsub [options] command [arguments] bsub [-h | -V] OPTION LIST -B -H -I | -Ip | -Is -K -N -r -x -a esub_parameters -b [[month:]day:]hour:minute -c [hour:]minute[/host_name | /host_model] -C core_limit -D data_limit -e err_file -eo err_file -ext[sched] "external_scheduler_options" -E "pre_exec_command [arguments ...]" -f "local_file operator [remote_file]" ...
bsub -T thread_limit -U reservation_ID -u mail_user -v swap_limit -w 'dependency_expression' -W [hour:]minute[/host_name | /host_model] -wa '[signal | command | CHKPNT]' -wt '[hour:]minute' -Zs -h -V DESCRIPTION Submits a job for batch execution and assigns it a unique numerical job ID. Runs the job on a host that satisfies all requirements of the job, when all conditions on the job, host, queue, and cluster are satisfied.
bsub To submit jobs from UNIX to display GUIs through Microsoft Terminal Services on Windows, submit the job with bsub and define the environment variables LSF_LOGON_DESKTOP=1 and LSB_TSJOB=1 on the UNIX host. Use tssub to submit a Terminal Services job from Windows hosts. See Using Platform LSF on Windows for more details. Use bmod to modify jobs submitted with bsub. bmod takes similar options to bsub. If the parameter LSB_STDOUT_DIRECT in lsf.
bsub DEFAULT_PROJECT parameter in lsb.params(5)). If DEFAULT_PROJECT is not defined, then LSF uses default as the default project name. OPTIONS -B Sends mail to you when the job is dispatched and begins execution. -H Holds the job in the PSUSP state when the job is submitted. The job will not be scheduled until you tell the system to resume the job (see bresume(1)). -I | -Ip | -Is -I Submits a batch interactive job. A new job cannot be submitted until the interactive job is completed or terminated.
bsub successfully. bsub will exit with the same exit code as the job so that job scripts can take appropriate actions based on the exit codes. bsub exits with value 126 if the job was terminated while pending. You cannot use the -K option with the -I, -Ip, or -Is options. -N Sends the job report to you by mail when the job finishes. When used without any other options, behaves the same as the default.
bsub -a esub_parameters String format parameter containing the name of an application-specific esub program to be passed to the master esub. The master esub program (LSF_SERVERDIR/mesub) handles job submission requirements of the applications. Application-specific esub programs can specify their own job submission requirements. The value of -a is set in the LSB_SUB_ADDITIONAL option in the LSB_SUB_PARM file used by esub. Use the -a option to specify which application-specific esub is invoked by mesub.
bsub The CPU limit is in the form of [hour:]minute. The minutes can be specified as a number greater than 59. For example, three and a half hours can be specified either as 3:30 or as 210. The CPU time you specify is the normalized CPU time. This is done so that the job does approximately the same amount of processing for a given CPU limit, even if it is sent to a host with a faster or slower CPU.
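The equivalent forms described above can be sketched as follows; my_program and the host name hostA are illustrative:

```shell
% bsub -c 3:30 my_program        # 3 hours 30 minutes of normalized CPU time
% bsub -c 210 my_program         # the same limit expressed in minutes
% bsub -c 210/hostA my_program   # normalize the limit against the CPU factor of hostA
```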
bsub -eo err_file Specify a file path. Overwrites the standard error output of the job to the specified file. If the parameter LSB_STDOUT_DIRECT in lsf.conf is set to Y or y, the standard error output of a job is written to the file you specify as the job runs. Because the job was submitted with the overwrite option, the file is overwritten every time the job runs, even if the job is requeued manually or by the system.
bsub If the pre-exec command exits with 0 (zero), then the real job is started on the selected host. Otherwise, the job (including the pre-exec command) goes back to PEND status and is rescheduled. If your job goes back into PEND status, LSF will keep on trying to run the pre-exec command and the real job when conditions permit. For this reason, be sure that your pre-exec command can be run many times without having side effects.
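A hedged sketch of a pre-execution command; the directory test and the job name are assumptions for illustration, not part of the original text:

```shell
# Start my_job only if /scratch/mydata exists on the selected host;
# a non-zero exit from the test sends the job back to PEND for rescheduling.
% bsub -E "test -d /scratch/mydata" my_job
```

Note that, per the text above, the pre-exec command may run many times, so it should have no side effects; the `test` command here is safe to repeat.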
bsub If the local and remote hosts have different file name spaces, you must always specify relative path names. If the local and remote hosts do not share the same file system, you must make sure that the directory containing the remote file exists. It is recommended that only the file name be given for the remote file when running in heterogeneous file systems. This places the file in the job’s current working directory.
bsub in the directory specified by the JOB_SPOOL_DIR parameter, or your $HOME/.lsbatch directory on the execution host. LSF removes this file when the job completes. By default, the input file is spooled to LSB_SHAREDIR/cluster_name/lsf_indir. If the lsf_indir directory does not exist, LSF creates it before spooling the file. LSF removes the spooled file when the job completes. Use the -is option if you need to modify or remove the input file before the job completes.
bsub When a job is checkpointed, the checkpoint information is stored in checkpoint_dir/job_ID/file_name. Multiple jobs can checkpoint into the same directory. The system can create multiple files. The checkpoint directory is used for restarting the job (see brestart(1)). Optionally, specifies a checkpoint period in minutes. Specify a positive integer. The running job is checkpointed automatically every checkpoint period. The checkpoint period can be changed using bchkpnt(1).
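A sketch of submitting a checkpointable job; the shared directory /share/chkpnt and the job ID 1234 are illustrative:

```shell
# Checkpoint automatically every 30 minutes into /share/chkpnt/job_ID/
% bsub -k "/share/chkpnt 30" my_job
# Restart later from the saved checkpoint (see brestart(1))
% brestart /share/chkpnt 1234
```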
bsub The keyword others can be specified with or without a preference level to refer to other hosts not otherwise listed. The keyword others must be specified with at least one host name or host group; it cannot be specified by itself. For example, -m "hostA+ others" means that hostA is preferred over all other hosts. If you also use -q, the specified queue must be configured to include all the hosts in your host list. Otherwise, the job is not submitted.
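The preference syntax can also carry numeric preference levels; hostB and the levels below are illustrative additions to the hostA example in the text:

```shell
% bsub -m "hostA+ others" my_job          # prefer hostA over all other hosts
% bsub -m "hostA+2 hostB+1 others" my_job # prefer hostA, then hostB, then the rest
```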
bsub See the PROCLIMIT parameter in lsb.queues(5) for more information. In a MultiCluster environment, if a queue exports jobs to remote clusters (see the SNDJOBS_TO parameter in lsb.queues(5)), then the process limit is not imposed on jobs submitted to this queue. Once the required number of processors is available, the job is dispatched to the first host selected. The list of selected host names for the job is specified in the environment variables LSB_HOSTS and LSB_MCPU_HOSTS.
bsub If you use -oo without -e or -eo, the standard error of the job is stored in the output file. If you use -oo without -N, the job report is stored in the output file as the file header. If you use both -oo and -N, the output is stored in the output file and the job report is sent by mail. The job report itself does not contain the output, but the report will advise you where to find your output.
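A sketch combining the overwrite options discussed above; the file locations are illustrative, and %J expands to the job ID:

```shell
# Overwrite the output and error files on every run, even if the job is requeued;
# with -N, the job report is mailed instead of stored as the output file header.
% bsub -oo /home/user1/out.%J -eo /home/user1/err.%J -N my_job
```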
bsub
◆ A job spanning section (span). The job spanning section indicates if a parallel batch job should span across multiple hosts.
◆ A same resource section (same). The same section indicates that all processes of a parallel job must run on the same type of host.
If no section name is given, then the entire string is treated as a selection string. The select keyword may be omitted if the selection string is the first string in the resource requirement.
bsub You are running an application version 1.5 as a resource called app_lic_v15 and the same application version 2.0.1 as a resource called app_lic_v201. The license key for version 2.0.1 is backward compatible with version 1.5, but the license key for version 1.5 will not work with 2.0.1. Job-level resource requirement specifications that use the || operator take precedence over any queue-level resource requirement specifications.
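The versioned-license scenario above might be expressed as follows; the job names are illustrative. Because the 2.0.1 license key also runs version 1.5, a version 1.5 job can accept either resource, while a version 2.0.1 job must have the newer license:

```shell
% bsub -R "app_lic_v15 || app_lic_v201" my_job15   # v1.5 job runs under either license
% bsub -R "app_lic_v201" my_job201                 # v2.0.1 job needs the 2.0.1 license
```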
bsub -sp priority Specifies a user-assigned job priority, which allows users to order their jobs in a queue. Valid values for priority are integers between 1 and MAX_USER_PRIORITY (displayed by bparams -l). Job priorities that are not valid are rejected. LSF and queue administrators can specify priorities beyond MAX_USER_PRIORITY. The job owner can change the priority of their own jobs. LSF and queue administrators can change the priority of all jobs in a queue.
bsub The job can only use hosts reserved by the reservation user1#0. LSF only selects hosts in the reservation. You can use the -m option to specify particular hosts within the list of hosts reserved by the reservation, but you cannot specify other hosts not included in the original reservation.
bsub Enclose the dependency expression in single quotes (') to prevent the shell from interpreting special characters (space, any logic operator, or parentheses). If you use single quotes for the dependency expression, use double quotes for quoted items within it, such as job names. In dependency conditions, job names specify only your own jobs, unless you are the LSF administrator.
bsub If you specify an exit code with no operator, the test is for equality (== is assumed). If you specify only the job, any exit code satisfies the test. external(job_ID | "job_name", "status_text") The job has the specified job status. If you specify the first word of the message description (no spaces), the text of the job’s status begins with the specified word. Only the first word is evaluated.
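A sketch of the quoting and exit-code rules above; the job names and ID are illustrative:

```shell
# Single quotes protect the whole expression from the shell;
# double quotes protect job names inside it.
# exit("convert", > 1) is true if job "convert" exited with a code greater than 1.
% bsub -w 'done(1234) && exit("convert", > 1)' cleanup_job
```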
bsub
- RUN, DONE, or EXIT
- PEND or PSUSP, and the job has a pre-execution command (bsub -E) that is running.
-W [hour:]minute[/host_name | /host_model] Sets the run time limit of the batch job. If a UNIX job runs longer than the specified run limit, the job is sent a SIGUSR2 signal, and is killed if it does not terminate within ten minutes. If a Windows job runs longer than the specified run limit, it is killed immediately. (For a detailed description of how these jobs are killed, see bkill.
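For example (my_job and hostA are illustrative):

```shell
% bsub -W 2:30 my_job        # 2 hour 30 minute run limit
% bsub -W 150/hostA my_job   # the same limit, normalized to the CPU factor of hostA
```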
bsub If -wa is specified, LSF sends the warning action to the job before the actual control action is taken. This allows the job time to save its result before being terminated by the job control action. You can specify actions similar to the JOB_CONTROLS queue level parameter: send a signal, invoke a command, or checkpoint the job. The warning action specified by -wa option overrides JOB_WARNING_ACTION in the queue. JOB_WARNING_ACTION is used as the default when no command line option is specified.
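A sketch combining a run limit with a warning action; the signal choice and the 2-minute lead time are assumptions for illustration:

```shell
# Send SIGURG to the job 2 minutes before the 60-minute run limit is enforced,
# giving it time to save results before the control action terminates it.
% bsub -W 60 -wa 'URG' -wt '2' my_job
```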
bsub -V Prints LSF release version to stderr and exits. command [argument] The job can be specified by a command line argument command, or through the standard input if the command is not present on the command line. The command can be anything that is provided to a UNIX Bourne shell (see sh(1)). command is assumed to begin with the first word that is not part of a bsub option. All arguments that follow command are provided as the arguments to the command.
bsub Submit a batch interactive job that starts csh as an interactive shell. % bsub -b 20:00 -J my_job_name my_program Submit my_program to run after 8 p.m. and assign it the job name my_job_name. % bsub my_script Submit my_script as a batch job. Since my_script is specified as a command line argument, the my_script file is not spooled. Later changes to the my_script file before the job completes may affect this job. % bsub < default_shell_script where default_shell_script contains: sim1.exe sim2.
bsub Submit the UNIX command sleep together with its argument 100 as a batch job to the service class named Kyuquot. LIMITATIONS When using account mapping, the command bpeek(1) does not work. File transfer via the -f option to bsub(1) requires rcp(1) to be working between the submission and execution hosts. Use the -N option to request mail, and/or the -o and -e options to specify an output file and error file, respectively.
bswitch bswitch switches unfinished jobs from one queue to another SYNOPSIS bswitch [-J job_name] [-m host_name | -m host_group] [-q queue_name] [-u user_name | -u user_group | -u all] destination_queue [0] bswitch destination_queue [job_ID | "job_ID [index_list ]"] ... bswitch [-h | -V] DESCRIPTION Switches one or more of your unfinished jobs to the specified queue. LSF administrators and root can switch jobs submitted by other users.
bswitch -q queue_name Only switches jobs in the specified queue. -u user_name | -u user_group | -u all Only switches jobs submitted by the specified user, or all users if you specify the keyword all. If you specify a user group, switches jobs submitted by all users in the group. destination_queue Required. Specify the queue to which the job is to be moved. job_ID ... |"job_ID[index_list]" ... Switches only the specified jobs. -h Prints command usage to stderr and exits.
btop btop moves a pending job relative to the first job in the queue SYNOPSIS btop job_ID | "job_ID [index_list ]" [position] btop [-h | -V] DESCRIPTION Changes the queue position of a pending job or a pending job array element, to affect the order in which jobs are considered for dispatch. By default, LSF dispatches jobs in a queue in the order of their arrival (that is, first-come-first-served), subject to availability of suitable server hosts.
btop position Optional. The position argument can be specified to indicate where in the queue the job is to be placed. position is a positive number that indicates the target position of the job from the beginning of the queue. The positions are relative to only the applicable jobs in the queue, depending on whether the invoker is a regular user or the LSF administrator. The default value of 1 means the position is before all the other jobs in the queue that have the same priority.
bugroup bugroup displays information about user groups SYNOPSIS bugroup [-l] [-r] [-w] [user_group ...] bugroup [-h | -V] DESCRIPTION Displays user groups and user names for each group. The default is to display information about all user groups. OPTIONS -l Displays information in a long multi-line format. Also displays share distribution if shares are configured. -r Expands the user groups recursively. The expanded list contains only user names; it does not contain the names of subgroups.
busers busers displays information about users and user groups SYNOPSIS busers [-w] [user_name ... | user_group ... | all] busers [-h | -V] DESCRIPTION Displays information about users and user groups. By default, displays information about the user who runs the command. OPTIONS user_name ... | user_group ... | all Displays information about the specified users or user groups, or about all users if you specify all. -h Prints command usage to stderr and exits.
busers PREEMPTIVE in lsb.queues(5)). If the character ‘–’ is displayed, there is no limit. MAX is defined by the MAX_JOBS parameter in the configuration file lsb.users(5). NJOBS The current number of job slots used by specified users’ jobs. A parallel job that is pending is counted as n job slots, because it will use n job slots in the queue when it is dispatched. PEND The number of pending job slots used by jobs of the specified users. RUN The number of job slots used by running jobs of the specified users.
ch ch changes the host on which subsequent commands are to be executed SYNOPSIS ch [-S] [-t] [host_name] ch [-h | -V] DESCRIPTION Changes the host on which subsequent commands are to be executed. By default, if no arguments are specified, changes the current host to the home host, the host from which the ch command was issued. By default, executes commands on the home host. By default, shell mode support is not enabled. By default, does not display execution time of tasks.
ch OPTIONS -S Starts remote tasks with shell mode support. Shell mode support is required for running interactive shells or applications which redefine the CTRL-C and CTRL-Z keys (for example, jove). -t Turns on the timing option. The amount of time each subsequent command takes to execute is displayed. host_name Executes subsequent commands on the specified host. -h Prints command usage to stderr and exits. -V Prints LSF release version to stderr and exits.
ch LIMITATIONS Currently, the ch command does not support script, history, or alias. The ch prompt is always the current working host:current working directory followed by a > (right angle bracket) character. If the ch session is invoked by a shell that supports job control (such as tcsh or ksh), CTRL-Z suspends the whole ch session. The exit status of a command line is printed to stderr if the status is non-zero.
lsacct lsacct displays accounting statistics on finished RES tasks in the LSF system SYNOPSIS lsacct [-l] [-C time0,time1] [-S time0,time1] [-f logfile_name] [-m host_name] [-u user_name ... | -u all] [pid ...] lsacct [-h | -V] DESCRIPTION Displays statistics on finished tasks run through RES. When a remote task completes, RES logs task statistics in the task log file. By default, displays accounting statistics for only tasks owned by the user who invoked the lsacct command.
lsacct -m host_name ... Displays accounting statistics for only tasks executed on the specified hosts. If a list of hosts is specified, host names must be separated by spaces and enclosed in quotation marks (") or ('). -u user_name ... | -u all Displays accounting statistics for only tasks owned by the specified users, or by all users if the keyword all is specified. If a list of users is specified, user names must be separated by spaces and enclosed in quotation marks (") or (').
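For example (the host names are illustrative):

```shell
% lsacct -m "hostA hostB" -u all   # statistics for all users' tasks run on hostA or hostB
```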
lsacct Blocks in Number of input blocks. Blocks out Number of output blocks. Messages sent Number of System V IPC messages sent. Messages rcvd Number of System V IPC messages received. Voluntary cont sw Number of voluntary context switches. Involuntary con sw Number of involuntary context switches. Turnaround Elapsed time from task execution to task completion.
lsacctmrg lsacctmrg merges task log files SYNOPSIS lsacctmrg [-f] logfile_name ... target_logfile_name lsacctmrg [-h | -V] DESCRIPTION Merges specified task log files into the specified target file in chronological order according to completion time. All files must be in the format specified in lsf.acct (see lsf.acct(5)). OPTIONS -f Overwrites the target file without prompting for confirmation. logfile_name ... Specify log files to be merged into the target file, separated by spaces.
lsadmin lsadmin administrative tool for LSF SYNOPSIS lsadmin subcommand lsadmin [-h | -V] SUBCOMMAND LIST ckconfig [-v] reconfig [-f] [-v] limstartup [-f] [host_name ... |all] limshutdown [-f] [host_name ... | all] limrestart [-v] [-f] [host_name ... | all] limlock [-l time_seconds] limunlock resstartup [-f] [host_name ... | all] resshutdown [-f] [host_name ... | all] resrestart [-f] [host_name ... | all] reslogon [host_name ... | all] [-c cpu_time] reslogoff [host_name ...
lsadmin OPTIONS subcommand Executes the specified subcommand. See Usage section. -h Prints command usage to stderr and exits. -V Prints LSF release version to stderr and exits. USAGE ckconfig [-v] Checks LSF configuration files. -v Displays detailed messages about configuration file checking. reconfig [-f] [-v] Restarts LIMs on all hosts in the cluster. You should use reconfig after changing configuration files. The configuration files are checked before all LIMs in the cluster are restarted.
lsadmin limshutdown [-f] [host_name ... | all] Shuts down LIM on the local host if no arguments are supplied. Shuts down LIMs on the specified hosts or on all hosts in the cluster if the word all is specified. You are prompted to confirm LIM shutdown. -f Disables interaction and does not ask for confirmation for shutting down LIMs. limrestart [-v] [-f] [host_name ... | all] Restarts LIM on the local host if no arguments are supplied.
lsadmin The shell command specified by LSF_RSH in lsf.conf is used before rsh is tried. -f Disables interaction and does not ask for confirmation for starting RESs. resshutdown [-f] [host_name ... | all] Shuts down RES on the local host if no arguments are specified. Shuts down RESs on the specified hosts or on all hosts in the cluster if the word all is specified. You are prompted to confirm RES shutdown. If RES is running, it will keep running until all remote tasks exit.
lsadmin debug_level=0 (LOG_DEBUG level in parameter LSF_LOG_MASK) logfile_name=current LSF system log file in the LSF system log file directory, in the format daemon_name.log.host_name host_name=local host (host from which command was submitted) In MultiCluster, debug levels can only be set for hosts within the same cluster. For example, you could not set debug or timing levels from a host in clusterA for a host in clusterB.
lsadmin 2 - LOG_DEBUG2 level for extended logging. A higher level includes lower logging levels. For example, LOG_DEBUG3 includes LOG_DEBUG2, LOG_DEBUG1, and LOG_DEBUG levels. 3 - LOG_DEBUG3 level for extended logging. A higher level includes lower logging levels. For example, LOG_DEBUG3 includes LOG_DEBUG2, LOG_DEBUG1, and LOG_DEBUG levels. Default: 0 (LOG_DEBUG level in parameter LSF_LOG_MASK) -f logfile_name Specify the name of the file into which debugging messages are to be logged.
lsadmin In MultiCluster, timing levels can only be set for hosts within the same cluster. For example, you could not set debug or timing levels from a host in clusterA for a host in clusterB. You need to be on a host in clusterB to set up debug or timing levels for clusterB hosts. -l timing_level Specifies detail of timing information that is included in log files. Timing messages indicate the execution time of functions in the software and are logged in milliseconds.
lsadmin help [subcommand ...] | ? [subcommand ...] Displays the syntax and functionality of the specified commands. The commands must be explicit to lsadmin. From the command prompt, you may use help or ?. quit Exits the lsadmin session. SEE ALSO ls_limcontrol(3), ls_rescontrol(3), ls_readconfenv(3), ls_gethostinfo(3), ls_connect(3), ls_initrex(3), lsf.conf(5), lsf.sudoers(5), lsf.
lsclusters lsclusters displays configuration information about LSF clusters SYNOPSIS lsclusters [-l] [cluster_name ...] lsclusters [-h | -V] DESCRIPTION Displays configuration information about LSF clusters. By default, returns information about the local cluster and all other clusters of which the local cluster is aware (all clusters defined in the RemoteClusters section of lsf.cluster.cluster_name if that section exists, otherwise all clusters defined in lsf.shared). OPTIONS -l Long format.
lsclusters ADMIN The user account name of the cluster’s primary LSF administrator. HOSTS Number of LSF hosts in the cluster. SERVERS Number of LSF server hosts in the cluster. Long Format (-l) If this option is specified, the command will also list available resource names, host types, host models and cluster administrator’s login names, and whether the local cluster accepts or sends interactive jobs to this cluster. SEE ALSO lsfintro(1), ls_info(3), ls_policy(3), ls_clusterinfo(3), lsf.
lseligible lseligible displays whether a task is eligible for remote execution SYNOPSIS lseligible [-r] [-q] [-s] task lseligible [-h | -V] DESCRIPTION Displays whether the specified task is eligible for remote execution. By default, only tasks in the remote task list are considered eligible for remote execution. OPTIONS -r Remote mode. Considers eligible for remote execution any task not included in the local task list. -q Quiet mode.
lseligible SEE ALSO ls_eligible(3), lsrtasks(1), lsf.
lsfinstall lsfinstall runs lsfinstall, the Platform LSF installation and configuration script SYNOPSIS lsfinstall -f install.config lsfinstall -s -f slave.config lsfinstall -h DESCRIPTION lsfinstall runs the LSF installation scripts and configuration utilities to install a new Platform LSF cluster or upgrade LSF from a previous release. To install a fully operational LSF cluster that all users can access, you should install as root.
lsfinstall Where lsfinstall is located lsfinstall is included in the LSF installation script tar file lsf6.2_lsfinstall.tar.Z and is located in the lsf6.2_lsfinstall directory created when you uncompress and extract installation script tar file. After installation, lsfinstall is located in LSF_TOP/6.2/install/. Before running lsfinstall 1 Plan your installation by choosing: ❖ LSF installation directory on file server (e.g.
lsfinstall If you do not specify a license file with LSF_LICENSE, or lsfinstall cannot find a license file in the default location, lsfinstall exits. ❖ Make sure the installation file system containing LSF_TOP is writable by the user account that is running lsfinstall. Running lsfinstall
1 Log on as root to the installation file server.
2 Edit lsf6.2_lsfinstall/install.config or lsf6.2_lsfinstall/slave.config.
lsfinstall After installing Platform LSF 1 Optional. Run hostsetup to configure host-based resources and set up automatic LSF startup on your server hosts. For Platform LSF HPC hosts, running hostsetup is optional on AIX and Linux. You must run hostsetup on SGI IRIX, TRIX, and Altix hosts, and on HP-UX hosts. a Log on to each server host as root. Start with the master host. If you are not root, you can continue with host setup, but by default, only root can start the LSF daemons.
lsfinstall
◆ LSF_RSHCMD—the remote shell command (e.g., rsh or ssh) accessing the remote host
◆ LSF_HOSTS—list of hosts to run hostsetup on
◆ LSF_TOPDIR—sets the hostsetup --top option. Specify the full path to the top-level installation directory. rhostsetup tries to detect this from lsf.conf if it is not defined here.
◆ LSF_BOOT—sets the hostsetup --boot option. Default is no (n).
◆ LSF_QUIET—sets the hostsetup --quiet option. Default is no (n).
lsfinstall If LSF_LOCAL_RESOURCES are already defined in a local lsf.conf on the slave host, lsfinstall does not add resources you define in LSF_LOCAL_RESOURCES in slave.config. lsfinstall creates a local lsf.conf for the slave host, which sets the following parameters:
◆ LSF_CONFDIR="/path"
◆ LSF_GET_CONF=lim
◆ LSF_LIM_PORT=port_number
◆ LSF_LOCAL_RESOURCES="resource ..."
◆ LSF_SERVER_HOSTS="host_name [host_name ...]"
◆ LSF_VERSION=6.2
-h Prints command usage and exits. SEE ALSO lsf.
lsfmon lsfmon installs or uninstalls LSF Monitor SYNOPSIS lsfmon -install lsfmon -remove DESCRIPTION Installs or uninstalls LSF Monitor in an existing cluster. LSF Monitor runs on Microsoft Windows and allows you to use Windows Performance Monitor to chart information about the LSF cluster. The LSF Monitor service runs under the account of an LSF cluster administrator. OPTIONS -install Installs LSF Monitor on the host. -remove Removes LSF Monitor from the host.
lsfrestart lsfrestart restarts LIM, RES, sbatchd and mbatchd on all hosts in the cluster SYNOPSIS lsfrestart [-f | -h | -V] DESCRIPTION This command can only be used by root or users listed in lsf.sudoers. Restarts LIM, RES, sbatchd and mbatchd, in that order, on all hosts in the local cluster. By default, prompts for confirmation of the next operation if an error is encountered. In order to be able to control all daemons in the cluster:
◆ The file /etc/lsf.sudoers has to be set up properly.
lsfsetcluster lsfsetcluster specifies a default LSF cluster for the host SYNOPSIS lsfsetcluster cluster_name lsfsetcluster [-h | -V] DESCRIPTION You must be a Windows local administrator of this host. This command specifies the LSF cluster that users of the host interact with by default, and modifies LSF_BINDIR and LSF_ENVDIR system environment variables on the host. Users of the host must set a different environment to interact with a different cluster.
lsfshutdown lsfshutdown shuts down LIM, RES, sbatchd and mbatchd on all hosts in the cluster SYNOPSIS lsfshutdown [-f | -h | -V] DESCRIPTION This command can only be used by root or users listed in lsf.sudoers. Shuts down sbatchd, RES, LIM, and mbatchd, in that order, on all hosts. By default, prompts for confirmation of the next operation if an error is encountered. In order to be able to control all daemons in the cluster:
◆ The file /etc/lsf.sudoers has to be set up properly.
lsfstartup lsfstartup starts LIM, RES, sbatchd, and mbatchd on all hosts in the cluster SYNOPSIS lsfstartup [-f ] lsfstartup [-h | -V] DESCRIPTION This command can only be used by root or users listed in lsf.sudoers. Starts LIM, RES, sbatchd, and mbatchd, in that order, on all hosts. By default, prompts for confirmation of the next operation if an error is encountered.
lsgrun lsgrun executes a task on a set of hosts SYNOPSIS lsgrun [-i] [-p | -P | -S] [-v] -f host_file | -m host_name ... | -n num_hosts [-R "res_req"] [command [argument ...]] lsgrun [-h | -V] DESCRIPTION Executes a task on the specified hosts. lsgrun is useful for fast global operations such as starting daemons, replicating files to or from local disks, looking for processes running on all hosts, checking who is logged in on each host, and so on.
lsgrun -P Creates a pseudo-terminal on UNIX hosts. This is necessary to run programs requiring a pseudo-terminal (for example, vi). This option is not supported on Windows. -S Creates a pseudo-terminal with shell mode support on UNIX hosts. Shell mode support is required for running interactive shells or applications which redefine the CTRL-C and CTRL-Z keys (such as jove). This option is not supported on Windows. -v Verbose mode. Displays the name of the host or hosts running the task.
lsgrun Exclusive resources need to be explicitly specified within the resource requirement string. For example, you defined a resource called bigmem in lsf.shared and defined it as an exclusive resource for hostE in lsf.cluster.mycluster. Use the following command to submit a task to run on hostE: % lsgrun -R "bigmem" myjob or % lsgrun -R "defined(bigmem)" myjob If the -m option is specified with a single host name, the -R option is ignored. command [argument ...] Specify the command to execute.
lshosts lshosts displays hosts and their static resource information SYNOPSIS lshosts [-w | -l] [-R "res_req "] [host_name | cluster_name] ... lshosts -s [shared_resource_name ...] lshosts [-h | -V] DESCRIPTION Displays static resource information about hosts. By default, returns the following information: host name, host type, host model, CPU factor, number of CPUs, total memory, total swap space, whether or not the host is a server host, and static resources.
lshosts host_name...| cluster_name... Only displays information about the specified hosts. Do not use quotes when specifying multiple hosts. For MultiCluster, displays information about hosts in the specified clusters. The names of the hosts belonging to the cluster are displayed instead of the name of the cluster. Do not use quotes when specifying multiple clusters. -s [shared_resource_name ...] Displays information about the specified resources. The resources must be static shared resources.
lshosts maxswp The total available swap space. server Indicates whether the host is a server or client host. “Yes” is displayed for LSF servers. “No” is displayed for LSF clients. “Dyn” is displayed for dynamic hosts. RESOURCES The Boolean resources defined for this host, denoted by resource names, and the values of external numeric and string static resources. See lsf.cluster(5), and lsf.shared(5) on how to configure external static resources.
lshosts RESOURCE The name of the resource. VALUE The value of the static shared resource. LOCATION The hosts that are associated with the static shared resource. FILES Reads lsf.cluster.cluster_name. SEE ALSO lsfintro(1), ls_info(3), ls_policy(3), ls_gethostinfo(3), lsf.cluster(5), lsf.
lsid lsid displays the current LSF version number, the cluster name, and the master host name SYNOPSIS lsid [-h | -V] DESCRIPTION Displays the current LSF version number, the cluster name, and the master host name. The master host is dynamically selected from all hosts in the cluster. OPTIONS -h Prints command usage to stderr and exits. -V Prints LSF release version to stderr and exits. FILES The host names and cluster names are defined in lsf.cluster.cluster_name and lsf.shared, respectively.
lsinfo lsinfo displays load sharing configuration information SYNOPSIS lsinfo [-l] [-m | -M] [-r] [-t] [resource_name ...] lsinfo [-h | -V] DESCRIPTION By default, displays all load sharing configuration information including resource names and their meanings, host types and models, and associated CPU factors known to the system. By default, displays information about all resources. Resource information includes resource name, resource type, description, and the default sort order for the resource.
lsinfo OUTPUT -l option The -l option displays all information available about load indices. TYPE Indicates whether the resource is numeric, string, or Boolean. ORDER ❖ Inc—If the numeric value of the load index increases as the load it measures increases, such as CPU utilization (ut). ❖ Dec—If the numeric value decreases as the load increases. ❖ N/A—If the resource is not numeric. INTERVAL The number of seconds between updates of that index. Load indices are updated every INTERVAL seconds.
lsload lsload displays load information for hosts SYNOPSIS lsload [-l] [-N | -E] [-I load_index[:load_index] ...] [-n num_hosts] [-R res_req] [host_name ... | cluster_name ...] lsload -s [resource_name ...] lsload [-h | -V] DESCRIPTION Displays load information for hosts. Load information can be displayed on a per-host basis, or on a per-resource basis. By default, displays load information for all hosts in the local cluster, per host.
lsload -R res_req Displays only load information for hosts that satisfy the specified resource requirements. See lsinfo(1) for a list of built-in resource names. Load information for the hosts is sorted according to load on the specified resources. If res_req contains special resource names, only load information for hosts that provide these resources is displayed (see lshosts(1) to find out what resources are available on each host).
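A sketch using built-in load indices; the thresholds are illustrative:

```shell
# Hosts with more than 20 MB of available memory and CPU utilization below 50%,
# sorted by CPU utilization
% lsload -R "select[mem>20 && ut<0.5] order[ut]"
```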
lsload busy The host is overloaded because some load indices exceed configured thresholds. Load index values that caused the host to be busy are preceded by an asterisk (*). lockW The host is locked by its run window. Run windows for a host are specified in the configuration file (see lsf.conf(5)) and can be displayed by lshosts. A locked host will not accept load shared jobs from other hosts. lockU The host is locked by the LSF administrator or root.
If -l is specified, shows the disk I/O rate exponentially averaged over the last minute, in KB per second. external_index By default, external load indices are not shown. If -l is specified, shows indices for all dynamic custom resources available on the host, including shared, string and Boolean resources. If -I load_index is specified, only shows indices for specified non-shared (host-based) dynamic numeric custom resources.
Exit status is -1 if a bad parameter is specified; otherwise lsload returns 0. SEE ALSO lsfintro(1), lim(8), lsf.
lsloadadj lsloadadj adjusts load indices on hosts SYNOPSIS lsloadadj [-R res_req] [host_name [:num_task] ...] lsloadadj [-h | -V] DESCRIPTION Adjusts load indices on hosts. This is useful if a task placement decision is made outside LIM by another application. By default, assumes tasks are CPU-intensive and memory-intensive. This means the CPU and memory load indices are adjusted to a higher number than other load indices.
-V Prints LSF release version to stderr and exits. EXAMPLES % lsloadadj -R "rusage[swp=20:mem=10]" Adjusts the load indices swp and mem on the host from which the command was submitted. DIAGNOSTICS Returns -1 if a bad parameter is specified; otherwise returns 0.
lslogin lslogin remotely logs in to a lightly loaded host SYNOPSIS lslogin [-v] [-m "host_name ... " | -m "cluster_name ... "] [-R "res_req "] [rlogin_options] lslogin [-h | -V] DESCRIPTION Remotely logs in to a lightly loaded host. By default, lslogin selects the least loaded host, with few users logged in, and remotely logs in to that host using the UNIX rlogin command. In a MultiCluster environment, the default is to select the least loaded host in the local cluster.
EXAMPLE % lslogin -R "select[it>1 && bsd]" Remotely logs in to a host that has been idle for at least 1 minute, that runs BSD UNIX, and is lightly loaded both in CPU resources and the number of users logged in. DIAGNOSTICS Because lslogin passes all unrecognized arguments to rlogin, incorrect options usually cause the rlogin usage message to be displayed rather than the lslogin usage message.
lsltasks lsltasks displays or updates a user’s local task list SYNOPSIS lsltasks [+ task_name ... | – task_name ...] lsltasks [-h | -V] DESCRIPTION Displays or updates a user’s local task list in $HOME/.lsftask. When no options are specified, displays tasks listed in the system task file lsf.task and the user’s task file .lsftask. If there is a conflict between the system task file lsf.task and the user’s task file .lsftask, the user’s task file overrides the system task file.
FILES Reads the system task file lsf.task, and the user task file .lsftask in the user’s home directory. See lsf.task(5) for more details. The system and user task files contain two sections, one for the remote task list, the other for the local task list. The local tasks section starts with Begin LocalTasks and ends with End LocalTasks. Each line in the section is an entry consisting of a task name. A plus sign (+) or a minus sign (–) can optionally precede each entry.
lsmake lsmake runs make tasks in parallel SYNOPSIS lsmake [-c num_tasks] [-F res_req] [-m "host_name ..."] [-E] [-G] [-M] [-V] [makeoption ...] [target ...] lsmake [-c num_tasks] [-F res_req] [-T] [-j max_processors] [-P minutes] [-R res_req] [-E] [-G] [-M] [-V] [makeoption ...] [target ...] DESCRIPTION Runs make tasks in parallel on LSF hosts. Sets the environment variables on the remote hosts when lsmake first starts.
-F res_req Temporarily reduces the number of tasks running when the load on the network file server exceeds the specified resource requirements. This might also reduce the number of processors used. The number of tasks is increased again when the load on the network file server is below the specified resource requirements. The network file server is considered to be the host mounting the current working directory on the local host. If this machine is not in the local cluster, -F is ignored.
lsmon lsmon displays load information for LSF hosts and periodically updates the display SYNOPSIS lsmon [-N | -E] [-n num_hosts] [-R res_req] [-I index_list] [-i interval] [-L file_name] [host_name ...] lsmon [-h | -V] DESCRIPTION lsmon is a full-screen LSF monitoring utility that displays and updates load information for hosts in a cluster. By default, displays load information for all hosts in the cluster, up to the number of lines that will fit on-screen. By default, displays raw load indices.
-i interval Sets how often load information is updated on-screen, in seconds. -L file_name Saves load information in the specified file while it is displayed on-screen. If you do not want load information to be displayed on your screen at the same time, use lsmon -L file_name < /dev/null. The format of the file is described in lim.acct(5). host_name ... Displays only load information for the specified hosts. -h Prints command usage to stderr and exits.
OUTPUT The following fields are displayed by default. HOST_NAME Name of specified hosts for which load information is displayed, or if resource requirements were specified, name of hosts that satisfied the specified resource requirement and for which load information is displayed. status Status of the host. A minus sign (-) may precede the status, indicating that the Remote Execution Server (RES) on the host is not running.
ls The number of current login users. it On UNIX, the idle time of the host (keyboard not touched on all logged in sessions), in minutes. On Windows, the it index is based on the time a screen saver has been active on a particular host. tmp The amount of free space in /tmp, in megabytes. swp The amount of currently available swap space, in megabytes. mem The amount of currently available memory, in megabytes.
lspasswd lspasswd registers user passwords in LSF on Windows SYNOPSIS lspasswd [-u user_name] DESCRIPTION Registers user passwords in LSF on Windows. Passwords must be 3 characters or longer. By default, if no options are specified, the password applies to the user who issued the command. Only the LSF administrator can enter passwords for other users. Users must update the password maintained by LSF if they change their Windows user account password.
lsplace lsplace displays hosts available to execute tasks SYNOPSIS lsplace [-L] [-n minimum | -n 0] [-R res_req] [-w maximum | -w 0] [host_name ...] lsplace [-h | -V] DESCRIPTION Displays hosts available for the execution of tasks, and temporarily increases the load on these hosts (to avoid sending too many jobs to the same host in quick succession). The inflated load will decay slowly over time before the real load produced by the dispatched task is reflected in the LIM’s load information.
EXAMPLES lsplace is mostly used in backquotes to pick out a host name which is then passed to other commands. The following example issues a command to display a lightly loaded HPPA-RISC host for your program to run on: % lsrun -m `lsplace -R hppa` myprogram In order for a job to land on a host with an exclusive resource, you need to explicitly specify that resource in the resource requirements.
lsrcp lsrcp remotely copies files using LSF SYNOPSIS lsrcp [-a] source_file target_file lsrcp [-h | -V] DESCRIPTION Remotely copies files using LSF. lsrcp is an LSF-enabled remote copy program that transfers a single file between hosts in an LSF cluster. lsrcp uses RES on an LSF host to transfer files. If LSF is not installed on a host or if RES is not running then lsrcp uses rcp to copy the file. To use lsrcp, you must have read access to the file being copied.
Always use "/" to transfer files from a UNIX host to a Windows host, or from a Windows host to a UNIX host. This is because the operating system interprets "\" as an escape character, and lsrcp will open the wrong files. For example, to transfer a file from UNIX to a Windows host: % lsrcp file1 hostA:/c:/temp/file2 For example, to transfer a file from Windows to a UNIX host: c:\share>lsrcp file1 hostD:/home/usr2/test/file2 file_name Name of source file. File name expansion is not supported.
◆ rcp on UNIX. If lsrcp cannot contact RES on the submission host, it attempts to use rcp to copy the file. You must set up the /etc/hosts.equiv or $HOME/.rhosts file in order to use rcp. See the rcp(1), rsh(1), ssh(1) manual pages for more information on using the rcp, rsh, and ssh commands. You can replace lsrcp with your own file transfer mechanism as long as it supports the same syntax as lsrcp.
lsrtasks lsrtasks displays or updates a user’s remote task list SYNOPSIS lsrtasks [+ task_name[/res_req] ... | – task_name[/res_req] ...] lsrtasks [-h | -V] DESCRIPTION Displays or updates a user’s remote task list in $HOME/.lsftask. When no options are specified, displays tasks listed in the system task file lsf.task and the user’s task file .lsftask. If there is a conflict between the system task file lsf.task and the user’s task file .lsftask, the user’s task file overrides the system task file.
-V Prints LSF release version to stderr and exits. EXAMPLES % lsrtasks + task1 task2/"select[cpu && mem]" - task3 or in restricted form: % lsrtasks + task1 task2/cpu:mem - task3 Adds the command task1 to the remote task list with no resource requirements, adds task2 with the resource requirement cpu:mem, and removes task3 from the remote task list. % lsrtasks + myjob/"swap>=100 && cpu" Adds myjob to the remote tasks list with its resource requirements.
lsrun lsrun runs an interactive task through LSF SYNOPSIS lsrun [-l] [-L] [-P] [-S] [-v] [-m "host_name ..." | -m "cluster_name ..."] [-R "res_req "] command [argument ...] lsrun [-h | -V] DESCRIPTION Submits a task to LSF for execution. With MultiCluster job forwarding model, the default is to run the task on a host in the local cluster. By default, lsrun first tries to obtain resource requirement information from the remote task list to find an eligible host. (See lseligible(1) and ls_task(3).
-m "host_name ..." | -m "cluster_name ..." The execution host must be one of the specified hosts. If a single host is specified, all resource requirements are ignored. If multiple hosts are specified and you do not use the -R option, the execution host must satisfy the resource requirements in the remote task list (see lsrtasks(1)). If none of the specified hosts satisfy the resource requirements, the task will not run.
DIAGNOSTICS lsrun exits with status -10 and prints an error message to stderr if a problem is detected in LSF and the task is not run. The exit status is -1 and an error message is printed to stderr if a system call fails or incorrect arguments are specified. Otherwise, the exit status is the exit status of the task. SEE ALSO rsh(1), lsfintro(1), ls_rexecv(3), lsplace(1), lseligible(1), lsload(1), lshosts(1), lsrtasks(1), lsf.
lstcsh lstcsh load sharing tcsh for LSF SYNOPSIS lstcsh [tcsh_options] [-L] [argument ...] DESCRIPTION lstcsh is an enhanced version of tcsh. lstcsh behaves exactly like tcsh, except that it includes a load sharing capability with transparent remote job execution for LSF. By default, a lstcsh script is executed as a normal tcsh script with load sharing disabled.
OPTIONS tcsh_options lstcsh accepts all the options used by tcsh. See tcsh(1) for the meaning of specific options. -L Executes a script with load sharing enabled. There are three ways to run an lstcsh script with load sharing enabled: ❖ Execute the script with the -L option ❖ Use the built-in command source to execute the script ❖ Insert "#!/local/bin/lstcsh -L" as the first line of the script (assuming you install lstcsh in /local/bin).
remote Remote operation mode. In this mode, a command line is considered eligible for remote execution only if none of the specified tasks are present in the local task list in the user’s tasks file $HOME/.lsftask. Tasks in the remote list can be executed remotely. The remote mode of operation is aggressive, and promotes extensive use of LSF.
v | -v Turns task placement verbose mode on (v) or off (-v). If verbose mode is on, lstcsh displays the name of the host on which the command is run if the command is not run on the local host. The default is on. t | -t Turns wall clock timing on (t) or off (-t). If timing is on, the actual response time of the command is displayed. This is the total elapsed time in seconds from the time you submit the command to the time the prompt comes back. This time includes all remote execution overhead.
FILES There are three optional configuration files for lstcsh: .shrc .hostrc .lsftask The .shrc and .hostrc files are used by lstcsh alone, whereas .lsftask is used by LSF to determine general task eligibility. ~/.shrc Use this file when you want an execution environment on remote hosts that is different from that on the local host. This file is sourced automatically on a remote host when a connection is established.
pam Parallel Application Manager – job starter for MPI applications SYNOPSIS HP-UX vendor MPI syntax: bsub pam -mpi mpirun [mpirun_options] mpi_app [argument ...] SGI vendor MPI syntax: bsub pam [-n num_tasks] -mpi -auto_place mpi_app [argument ...] Generic PJL framework syntax: bsub pam [-t] [-v] [-n num_tasks] -g [num_args] pjl_wrapper [pjl_options] mpi_app [argument ...] pam [-h] [-V] DESCRIPTION The Parallel Application Manager (PAM) is the point of control for Platform LSF HPC.
TASK STARTUP FOR LSF HPC GENERIC PJL JOBS For parallel jobs submitted with bsub: ❖ PAM invokes the PJL, which in turn invokes the TaskStarter (TS). ❖ TS starts the tasks on each execution host, reports the process ID to PAM, and waits for the task to finish. OPTIONS OPTIONS FOR VENDOR MPI JOBS -auto_place The -auto_place option on the pam command line tells the SGI IRIX mpirun library to launch the MPI application according to the resources allocated by LSF.
-V Prints LSF release version to stderr and exits. OPTIONS FOR LSF HPC GENERIC PJL JOBS -t This option tells pam not to print out the MPI job tasks summary report to the standard output. By default, the summary report prints out the task ID, the host on which it was executed, the command that was executed, the exit status, and the termination time. -v Verbose mode. Displays the name of the execution host or hosts.
-V Prints LSF release version to stderr and exits. EXIT STATUS pam exits with the exit status of mpirun or the PJL wrapper.
taskman taskman checks out a license token and manages interactive UNIX applications SYNOPSIS taskman -Lp project -R "rusage[token=number[:duration=minutes | hours h][:token=number[:duration=minutes | hours h]]...]" [-N n_retries] [-v] command taskman [-h | -V] DESCRIPTION Runs the interactive UNIX application on behalf of the user. When it starts, the task manager connects to License Scheduler to request the application license tokens.
wgpasswd wgpasswd changes a user’s password for an entire Microsoft Windows workgroup SYNOPSIS wgpasswd [user_name] wgpasswd [-h] DESCRIPTION You must run this command on a host in a Windows workgroup. You must have administrative privileges to change another user’s password. Prompts for old and new passwords, then changes the password on every host in the workgroup. By default, modifies your own user account. OPTIONS user_name Specifies the account to modify.
wguser wguser modifies user accounts for an entire Microsoft Windows workgroup SYNOPSIS wguser [-r] user_name ... wguser [-h] DESCRIPTION You must run this command on a host in a Microsoft Windows workgroup. You should have administrative privileges on every host in the workgroup. Modifies accounts on every host in the workgroup that you have administrative privileges on.
Part II: Environment Variables
Contents ◆ “Environment Variables Set for Job Execution” on page 268 ◆ “Environment Variable Reference” on page 269
Environment Variables Set for Job Execution LSF transfers most environment variables between submission and execution hosts. In addition to environment variables inherited from the user environment, LSF also sets several other environment variables for batch jobs: ◆ LSB_ERRORFILE: Name of the error file specified with a bsub -e ◆ LSB_JOBID: Batch job ID assigned by LSF.
Environment Variable Reference BSUB_BLOCK BSUB_QUIET BSUB_QUIET2 BSUB_STDERR CLEARCASE_DRIVE CLEARCASE_MOUNTDIR CLEARCASE_ROOT LM_LICENSE_FILE LS_EXEC_T LS_JOBPID LS_LICENSE_SERVER_feature LS_SUBCWD LSB_CHKPNT_DIR LSB_DEBUG LSB_DEBUG_CMD LSB_DEBUG_MBD LSB_DEBUG_NQS LSB_DEBUG_SBD LSB_DEBUG_SCH LSB_DEFAULTPROJECT LSB_DEFAULTQUEUE LSB_ECHKPNT_KEEP_OUTPUT LSB_ECHKPNT_METHOD LSB_ECHKPNT_METHOD_DIR LSB_ERESTART_USRCMD LSB_EXEC_RUSAGE LSB_EXECHOSTS LSB_EXIT_PRE_ABORT
Default Undefined Where defined From the command line Example BSUB_QUIET=1 BSUB_QUIET2 Syntax BSUB_QUIET2=any_value Description Suppresses the printing of information about job completion when a job is submitted with the bsub -K option. If set, bsub will not print information about job completion to stdout. For example, when this variable is set, the message <> will not be written to stdout.
Example CLEARCASE_DRIVE=F: CLEARCASE_DRIVE=f: See also CLEARCASE_MOUNTDIR, CLEARCASE_ROOT. CLEARCASE_MOUNTDIR Syntax CLEARCASE_MOUNTDIR=path Description Optional. Defines the Rational ClearCase mounting directory. Default /vobs Notes: CLEARCASE_MOUNTDIR is used if any of the following conditions apply: ◆ A job is submitted from a UNIX environment but run in a Windows host.
See Also See “lsf.conf” under “LSF_LICENSE_FILE” on page 553 LS_EXEC_T Syntax LS_EXEC_T=START | END | CHKPNT | JOB_CONTROLS Description Indicates execution type for a job. LS_EXEC_T is set to: ◆ START or END for a job when the job begins executing or when it completes execution ◆ CHKPNT when the job is checkpointed ◆ JOB_CONTROLS when a control action is initiated Where defined Set by sbatchd during job execution LS_JOBPID Description The process ID of the job.
Description The directory containing files related to the submitted checkpointable job. Valid values The value of checkpoint_dir is the directory you specified through the -k option of bsub when submitting the checkpointable job. The value of job_ID is the job ID of the checkpointable job. Where defined Set by LSF, based on the directory you specified when submitting a checkpointable job with the -k option of bsub.
If you submit a job with the -P option of bsub, the job belongs to the project specified through the -P option. Where defined From the command line, or through the -P option of bsub Example LSB_DEFAULTPROJECT=engineering See also See “lsb.params” under “DEFAULT_PROJECT” on page 378, the -P option of bsub. LSB_DEFAULTQUEUE Syntax LSB_DEFAULTQUEUE=queue_name Description Defines the default LSF queue. Default mbatchd decides which is the default queue. Where defined From the command line
See also LSB_ECHKPNT_METHOD, erestart, echkpnt LSB_EXEC_RUSAGE Syntax LSB_EXEC_RUSAGE="resource_name1 resource_value1 resource_name2 resource_value2..." Description Indicates which rusage string is satisfied to permit the job to run. This environment variable is necessary because the OR (||) operator specifies alternative rusage strings for running jobs. Valid values resource_value1, resource_value2,... refer to the resource values on resource_name1, resource_name2,... respectively.
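The space-separated name/value pairs in LSB_EXEC_RUSAGE can be split in a job or post-execution script. A minimal Bourne shell sketch, assuming only the format described above (the value is hard-coded here for illustration; LSF sets it for a real job):

```shell
# Split LSB_EXEC_RUSAGE into name=value pairs.
# Hard-coded example value; in a real job LSF sets this variable.
LSB_EXEC_RUSAGE="mem 10 swp 20"
set -- $LSB_EXEC_RUSAGE
pairs=""
while [ "$#" -ge 2 ]; do
    pairs="$pairs $1=$2"
    shift 2
done
echo "granted:$pairs"
```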
LSB_FRAMES Syntax LSB_FRAMES=start_number,end_number,step Description Determines the number of frames to be processed by a frame job. Valid values The values of start_number, end_number, and step are positive integers. Use commas to separate the values. Default Undefined Notes When the job is running, LSB_FRAMES will be set to the relative frames with the format LSB_FRAMES=start_number,end_number,step.
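A frame job can read LSB_FRAMES to decide which frames to process. A hedged Bourne shell sketch of that loop (the value is hard-coded here; LSF sets it when the frame job runs):

```shell
# Parse LSB_FRAMES=start_number,end_number,step and loop over the frames.
# Hard-coded example value; LSF sets this for a real frame job.
LSB_FRAMES="5,20,5"
start=${LSB_FRAMES%%,*}
rest=${LSB_FRAMES#*,}
end=${rest%%,*}
step=${rest#*,}
frames=""
i=$start
while [ "$i" -le "$end" ]; do
    frames="$frames $i"       # a real job would process frame $i here
    i=$((i + step))
done
echo "frames:$frames"
```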
If you want to run an interactive job that requires some preliminary setup, LSF provides a job starter function at the command level. A command-level job starter allows you to specify an executable file that will run prior to the actual job, doing any necessary setup and running the job when the setup is complete. If the environment variable LSB_JOB_STARTER is properly defined, sbatchd will invoke the job starter (rather than the job itself), supplying your commands as arguments.
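Since the job starter receives the job's command line as its arguments, it can be a small wrapper script. A minimal sketch of that pattern (the setup step and function name are placeholders, not part of LSF; a real starter script would typically end with exec "$@"):

```shell
# Minimal job-starter sketch: do site-specific setup, then run the job
# command that is passed in as arguments.
run_with_setup() {
    MY_SETUP_DONE=1     # placeholder for real setup (sourcing env files, etc.)
    "$@"                # run the user's command in the prepared environment
}
out=$(run_with_setup echo "job output")
echo "$out"
```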
When the post-execution command is run, the environment variable LSB_JOBEXIT_INFO is set if the job is signalled internally. If the job ends successfully, or the job is killed or signalled externally, LSB_JOBEXIT_INFO is not set.
LSB_JOBINDEX Syntax LSB_JOBINDEX=index Description Contains the job array index. Valid values Any integer greater than zero but less than the maximum job array size. Notes LSB_JOBINDEX is set when each job array element is dispatched. Its value corresponds to the job array index. LSB_JOBINDEX is set for all jobs. For non-array jobs, LSB_JOBINDEX is set to zero (0). Where defined Set during job execution based on bsub options.
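A common pattern is for a job array script to use LSB_JOBINDEX to select its per-element input. A hedged sketch (the index and file-name scheme are illustrative; LSF sets the variable when the element is dispatched):

```shell
# Pick an input file based on the array index.
# Hard-coded here for illustration; LSF sets LSB_JOBINDEX per array element.
LSB_JOBINDEX=3
if [ "$LSB_JOBINDEX" -eq 0 ]; then
    input="data.single"        # non-array jobs see LSB_JOBINDEX=0
else
    input="data.$LSB_JOBINDEX"
fi
echo "input=$input"
```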
Notes The name of a job can be specified explicitly when you submit a job. The name does not have to be unique. If you do not specify a job name, the job name defaults to the actual batch command as specified on the bsub command line.
If the output fails or cannot be read, LSB_MAILSIZE is set to -1 and the output is sent by email using LSB_MAILPROG if specified in lsf.conf. ◆ Undefined If you use the -o or -e options of bsub, the output is redirected to an output file. Because the output is not sent by email in this case, LSB_MAILSIZE is not used and LSB_MAILPROG is not called. If the -N option is used with the -o option of bsub, LSB_MAILSIZE is not set.
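A custom mail program named by LSB_MAILPROG could branch on LSB_MAILSIZE. A sketch assuming only the behavior described above (-1 means the output size is unknown); the threshold and action names are arbitrary illustrations, not LSF constants:

```shell
# Decide how to handle job output based on LSB_MAILSIZE.
# Value hard-coded for illustration; LSF sets it for the mail program.
LSB_MAILSIZE=-1
LIMIT=1024    # arbitrary site-chosen threshold, not an LSF constant
if [ "$LSB_MAILSIZE" -eq -1 ]; then
    action="send-unconditionally"   # output size unknown or unreadable
elif [ "$LSB_MAILSIZE" -gt "$LIMIT" ]; then
    action="truncate-and-send"
else
    action="send"
fi
echo "$action"
```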
LSB_NQS_PORT This parameter can be defined in lsf.conf or in the services database such as /etc/services. See “lsf.conf ” under “LSB_NQS_PORT” on page 528 for more details. LSB_NTRIES Syntax LSB_NTRIES=integer Description The number of times that LSF libraries attempt to contact mbatchd or perform a concurrent jobs query.
LSB_REMOTEINDEX Syntax LSB_REMOTEINDEX=index Description The job array index of a remote MultiCluster job. LSB_REMOTEINDEX is set only if the job is an element of a job array. Valid values Any integer greater than zero, but less than the maximum job array size Where defined Set by sbatchd See also LSB_JOBINDEX, “MAX_JOB_ARRAY_SIZE” on page 385 in “lsb.params” LSB_REMOTEJID Syntax LSB_REMOTEJID=job_ID Description The job ID of a remote MultiCluster job.
Notes When a checkpointed job is restarted, the operating system assigns a new process ID to the job. Batch sets LSB_RESTART_PID to the new process ID. Where defined Defined by Batch during restart of a checkpointed job See also LSB_RESTART_PGID, LSB_RESTART LSB_SUB_CLUSTER Description Name of submission cluster (MultiCluster only) Where defined Set on the submission environment and passed to the execution cluster environment.
Description An integer representing suspend reasons. Suspend reasons are defined in lsbatch.h. This parameter is set when a job goes to system-suspended (SSUSP) or user-suspended status (USUSP). It indicates the exact reason why the job was suspended. To determine the exact reason, you can test the value of LSB_SUSP_REASONS against the symbols defined in lsbatch.h.
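Testing the value against the lsbatch.h symbols is normally done in C; the bit test itself can be sketched in shell. Both values below are placeholders for illustration, not real SUSP_* constants:

```shell
# Check whether a given suspend-reason bit is set in LSB_SUSP_REASONS.
# Both values are hard-coded placeholders; real code would use the SUSP_*
# symbols from lsbatch.h and the value LSF sets.
LSB_SUSP_REASONS=6     # example value
EXAMPLE_BIT=2          # placeholder for a SUSP_* bit mask
if [ $(( LSB_SUSP_REASONS & EXAMPLE_BIT )) -ne 0 ]; then
    matched=yes
else
    matched=no
fi
echo "$matched"
```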
LSF_DEBUG_CMD This parameter can be set from the command line or from lsf.conf. See “lsf.conf ” under “LSB_DEBUG_MBD” on page 513. LSF_DEBUG_LIM This parameter can be set from the command line or from lsf.conf. See “lsf.conf ” under “LSF_DEBUG_LIM” on page 539. LSF_DEBUG_RES This parameter can be set from the command line or from lsf.conf. See “lsf.conf ” under “LSF_DEBUG_RES” on page 540.
Description A string that specifies the daemon or user that is calling eauth -c. Notes Sets the context for the call to eauth, and allows the eauth writer to perform daemon authentication. Where defined Set internally by the LSF libraries, or by the daemon calling eauth -c. See also LSF_EAUTH_SERVER LSF_EAUTH_SERVER Syntax ◆ SUN HPC LSF_EAUTH_SERVER=mbatchd | sbatchd | pam | res ◆ LSF3.
Default Undefined Notes Interactive Jobs If you want to run an interactive job that requires some preliminary setup, LSF provides a job starter function at the command level. A command-level job starter allows you to specify an executable file that will run prior to the actual job, doing any necessary setup and running the job when the setup is complete.
LSF_LOGDIR This parameter can be set from the command line or from lsf.conf. See “lsf.conf ” under “LSF_LOGDIR” on page 558. LSF_MASTER Description Specifies whether ELIM has been started on the master host. Notes LIM communicates with ELIM through two environment variables: LSF_MASTER and LSF_RESOURCES. LSF_MASTER is set to Y when LIM starts ELIM on the master host. It is set to N or is undefined otherwise.
If this parameter is defined, and an interactive batch job is pending for longer than the specified time, the interactive batch job is terminated. Valid values Any integer greater than zero Default Undefined LSF_RESOURCES Syntax LSF_RESOURCES=dynamic_shared_resource_name... Description Space-separated list of customized dynamic shared resources that the ELIM is responsible for collecting.
Description Set during LSF installation or setup. If you modify this parameter in an existing cluster, you probably have to modify passwords and configuration files also. Windows or mixed UNIX-Windows clusters only. Enables default user mapping, and specifies the LSF user domain. The period (.) specifies local accounts, not domain accounts.
Part III: Configuration Files ◆ “bld.license.acct” on page 295 ◆ “cshrc.lsf and profile.lsf” on page 297 ◆ “hosts” on page 303 ◆ “install.config” on page 307 ◆ “lim.acct” on page 313 ◆ “lsb.acct” on page 315 ◆ “lsb.events” on page 323 ◆ “lsb.hosts” on page 353 ◆ “lsb.modules” on page 367 ◆ “lsb.params” on page 373 ◆ “lsb.queues” on page 399 ◆ “lsb.resources” on page 433 ◆ “lsb.serviceclasses” on page 457 ◆ “lsb.users” on page 465 ◆ “lsf.acct” on page 475 ◆ “lsf.
bld.license.acct The bld.license.acct file is the license and accounting file for LSF License Scheduler. Contents ◆ “bld.license.
bld.license.acct Structure The license accounting log file is an ASCII file with one record per line. The fields of a record are separated by blanks. LSF License Scheduler adds a new record to the file every hour. File properties Location The default location of this file is LSF_SHAREDIR/db. Use LSF_LICENSE_ACCT_PATH in lsf.conf to specify another location. Owner The primary LSF License Scheduler admin is the owner of this file.
cshrc.lsf and profile.lsf Contents ◆ “About cshrc.lsf and profile.lsf” on page 298 ◆ “LSF Environment Variables Set by cshrc.lsf and profile.
About cshrc.lsf and profile.lsf The user environment shell files cshrc.lsf and profile.lsf set the LSF operating environment on a Platform LSF host. They define machine-dependent paths to LSF commands and libraries as environment variables: ◆ cshrc.lsf sets the C shell (csh or tcsh) user environment for LSF commands and libraries ◆ profile.
◆ For example, in csh or tcsh: % source /usr/share/lsf/lsf_62/conf/cshrc.lsf ◆ For example, in sh, ksh, or bash: $ . /usr/share/lsf/lsf_62/conf/profile.lsf Making your cluster available to users with cshrc.lsf and profile.lsf To set up the LSF user environment, run one of the following two shell files: ◆ LSF_CONFDIR/cshrc.lsf (for csh, tcsh) ◆ LSF_CONFDIR/profile.
$ set ... LD_LIBRARY_PATH=/usr/share/lsf/lsf_62/6.2/sparc-sol7-32/lib LSF_BINDIR=/usr/share/lsf/lsf_62/6.2/sparc-sol7-32/bin LSF_ENVDIR=/usr/share/lsf/lsf_62/conf LSF_LIBDIR=/usr/share/lsf/lsf_62/6.2/sparc-sol7-32/lib LSF_SERVERDIR=/usr/share/lsf/lsf_62/6.2/sparc-sol7-32/etc MANPATH=/usr/share/lsf/lsf_62/6.
LSF Environment Variables Set by cshrc.lsf and profile.lsf LSF_BINDIR Syntax LSF_BINDIR=dir Description Directory where LSF user commands are installed. Examples ◆ Set in csh and tcsh by cshrc.lsf: setenv LSF_BINDIR /usr/share/lsf/lsf_62/6.2/sparc-sol7-32/bin ◆ Set and exported in sh, ksh, or bash by profile.lsf: LSF_BINDIR=/usr/share/lsf/lsf_62/6.2/sparc-sol7-32/bin Values ◆ In cshrc.
◆ Set and exported in sh, ksh, or bash by profile.lsf: LSF_LIBDIR=/usr/share/lsf/lsf_62/6.2/sparc-sol7-32/lib Values ◆ In cshrc.lsf for csh and tcsh: setenv LSF_LIBDIR $LSF_TOP/$LSF_VERSION/$BINARY_TYPE/lib ◆ Set and exported in profile.lsf for sh, ksh, or bash: LSF_LIBDIR=$LSF_TOP/$LSF_VERSION/$BINARY_TYPE/lib LSF_SERVERDIR Syntax LSF_SERVERDIR=dir Description Directory where LSF server binaries and shell scripts are installed.
hosts For hosts with multiple IP addresses and different official host names configured at the system level, this file associates the host names and IP addresses in LSF.
IP Address Written using the conventional dotted decimal notation (nnn.nnn.nnn.nnn) and interpreted using the inet_addr routine from the Internet address manipulation library, inet(3N). Official Host Name The official host name. Single character names are not allowed. Specify -GATEWAY or -GW as part of the host name if the host serves as a GATEWAY. Specify -TAC as the last part of the host name if the host is a TAC and is a DoD host.
For example, some systems map internal compute nodes to single LSF host names. A host file might contain 64 lines, each specifying an LSF host name and 32 node names that correspond to each LSF host: ... 177.16.1.1 atlasD0 atlas0 atlas1 atlas2 atlas3 atlas4 ... atlas31 177.16.1.2 atlasD1 atlas32 atlas33 atlas34 atlas35 atlas36 ... atlas63 ...
install.config Contents ◆ ◆ “About install.
About install.config The install.config file contains options for Platform LSF installation and configuration. Use lsfinstall -f install.config to install LSF using the options specified in install.config. Template location A template install.config is included in the installation script tar file lsf6.2_lsfinstall.tar.Z and is located in the lsf6.2_lsfinstall directory created when you uncompress and extract the installation script tar file.
Parameters ◆ “LSF_ADD_SERVERS” ◆ “LSF_ADD_CLIENTS” ◆ “LSF_ADMINS” ◆ “LSF_CLUSTER_NAME” ◆ “LSF_DYNAMIC_HOST_WAIT_TIME” ◆ “LSF_LICENSE” ◆ “LSF_MASTER_LIST” ◆ “LSF_QUIET_INST” ◆ “LSF_TARDIR” ◆ “LSF_TOP” ◆ “ENABLE_HPC_INST” LSF_ADD_SERVERS Syntax LSF_ADD_SERVERS="host_name [ host_name...]" Description Lists the hosts in the cluster to be set up as server hosts. The first host in the list becomes the master host in lsf.cluster.cluster_name.
The LSF administrator accounts must exist on all hosts in the cluster before installing LSF. The primary LSF administrator account is typically named lsfadmin. It owns the LSF configuration files and log files for job events. It also has permission to reconfigure LSF and to control batch jobs submitted by other users. It typically does not have authority to start LSF daemons. Unless an lsf.sudoers file exists to grant LSF administrators permission, only root has permission to start LSF daemons.
install.config Default INFINIT_INT (the host never sends a request to the master LIM) LSF_LICENSE Syntax LSF_LICENSE="/path/license_file" Description Full path to the LSF license file. You must have a valid license file to install LSF. If you do not specify LSF_LICENSE, or lsfinstall cannot find a valid license file in the default location, lsfinstall exits. Recommended Value /path/license.dat Example LSF_LICENSE="/usr/share/lsf_distrib/license.dat"
Parameters Description Do not display lsfinstall messages. Example LSF_QUIET_INST="y" Default Display all messages. (LSF_QUIET_INST="n") LSF_TARDIR Syntax LSF_TARDIR="/path" Description Full path to the directory containing the LSF distribution tar files. Example LSF_TARDIR="/usr/share/lsf_distrib" Default The parent directory of the current working directory where lsfinstall is running (../current_directory) LSF_TOP Syntax LSF_TOP="/path" Description Top-level LSF installation directory.
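Putting the parameters above together, a minimal install.config might look like the following. Every host name, path, and the cluster name here is a hypothetical placeholder, not a value taken from this document.

```
# Hypothetical install.config fragment -- adjust every value for your site.
LSF_TOP="/usr/share/lsf"                         # top-level installation directory
LSF_ADMINS="lsfadmin"                            # primary LSF administrator account
LSF_CLUSTER_NAME="cluster1"                      # recorded in lsf.cluster.cluster1
LSF_MASTER_LIST="hosta hostb"                    # candidate master hosts
LSF_ADD_SERVERS="hosta hostb hostc hostd"        # first host becomes the master host
LSF_LICENSE="/usr/share/lsf_distrib/license.dat" # full path to a valid license file
LSF_TARDIR="/usr/share/lsf_distrib"              # directory with distribution tar files
LSF_QUIET_INST="n"                               # display all lsfinstall messages
```

Running lsfinstall -f install.config then installs LSF with these options.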
lim.acct The lim.acct file is the log file for the Load Information Manager (LIM). Produced by lsmon, lim.acct contains host load information collected and distributed by LIM. Contents ◆ "lim.acct Structure"
lim.acct Structure lim.acct Structure The first line of lim.acct contains a list of load index names separated by spaces. This list of load index names can be specified in the lsmon command line. The default list is "r15s r1m r15m ut pg ls it swp mem tmp". Subsequent lines in the file contain the host’s load information at the time the information was recorded.
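As a sketch of the structure just described, the following builds a tiny lim.acct-style file and pulls the load index names out of its first line. The sample data is invented for illustration; real lim.acct files are produced by lsmon.

```shell
# Build a two-line sample in lim.acct format (hypothetical data).
sample=$(mktemp)
printf '%s\n' \
  'r15s r1m r15m ut pg ls it swp mem tmp' \
  '1017040800 hostA 0.1 0.2 0.3 0.5 1.2 2 0 100 200 300' > "$sample"

# The first line lists the load index names, separated by spaces.
head -n 1 "$sample"

# Subsequent lines hold a host's load values at the time they were recorded.
tail -n +2 "$sample"
```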
lsb.acct The lsb.acct file is the batch job log file of LSF. The master batch daemon (see mbatchd(8)) generates a record for each job completion or failure. The record is appended to the job log file lsb.acct. The file is located in LSB_SHAREDIR/cluster_name/logdir, where LSB_SHAREDIR must be defined in lsf.conf(5) and cluster_name is the name of the LSF cluster, as returned by lsid(1). See mbatchd(8) for the description of LSB_SHAREDIR. The bacct command uses the current lsb.acct file for its output.
lsb.acct Structure The job log file is an ASCII file with one record per line. The fields of a record are separated by blanks. If the value of some field is unavailable, a pair of double quotation marks ("") is logged for a character string, 0 for time and number, and -1 for resource usage. Configuring automatic archiving The following parameters in lsb.params affect how records are logged to lsb.acct.
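Because each record is one line of blank-separated fields, standard text tools can summarize the file. The sketch below tallies records by their first field; the sample records are invented and heavily abbreviated (real records carry many more fields than shown).

```shell
acct=$(mktemp)
# Hypothetical, heavily abbreviated lsb.acct-style records.
printf '%s\n' \
  '"JOB_FINISH" "6.2" 1017040800 100 "" 0 -1' \
  '"JOB_FINISH" "6.2" 1017040900 101 "" 0 -1' \
  '"EVENT_ADRSV_FINISH" "6.2" 1017041000' > "$acct"

# Count records per event type (the first blank-separated field).
awk '{count[$1]++} END {for (t in count) print t, count[t]}' "$acct"
```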
lsb.acct Structure outFile (%s) Output file name errFile (%s) Error output file name jobFile (%s) Job script file name numAskedHosts (%d) Number of host names to which job dispatching will be limited askedHosts (%s) List of host names to which job dispatching will be limited (%s for each); nothing is logged to the record for this value if the last field value is 0.
lsb.acct warningAction (%s) Job warning action warningTimePeriod (%d) Job warning time period in seconds chargedSAAP (%s) SAAP charged to a job licenseProject (%s) LSF License Scheduler project name EVENT_ADRSV_FINISH An advance reservation has expired.
lsb.events The LSF batch event log file lsb.events is used to display LSF batch event history and for mbatchd failure recovery. Whenever a host, job, or queue changes status, a record is appended to the event log file. The file is located in LSB_SHAREDIR/cluster_name/logdir, where LSB_SHAREDIR must be defined in lsf.conf(5) and cluster_name is the name of the LSF cluster, as returned by lsid(1). See mbatchd(8) for the description of LSB_SHAREDIR. The bhist command searches the most current lsb.
lsb.events Structure The event log file is an ASCII file with one record per line. For the lsb.events file, the first line has the format "# ", which indicates the file position of the first history event after log switch. For the lsb.events.# file, the first line has the format "# ", which gives the timestamp of the most recent event in the file. Limiting the size of lsb.events Use MAX_JOB_NUM in lsb.params.
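A minimal sketch of the lsb.params fragment implied above, assuming the usual Begin/End section syntax; the value 2000 is an arbitrary illustration, not a default from this document.

```
# Hypothetical lsb.params fragment: switch (archive) lsb.events once the
# number of finished jobs logged in it passes this threshold.
Begin Parameters
MAX_JOB_NUM = 2000
End Parameters
```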
lsb.events
◆ PRE_EXEC_START
◆ JOB_FORCE
JOB_NEW A new job has been submitted.
lsb.events If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, older daemons and commands (pre-LSF Version 6.0) cannot recognize the lsb.events file format. The fields in order of occurrence are: Version number (%s) The version number Event time (%d) The time of the event jobId (%d) Job ID numReserHosts (%d) Number of reserved hosts in the remote cluster. If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is the number of .
lsb.events Structure idx (%d) Job array index JOB_START A job has been dispatched. If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, older daemons and commands (pre-LSF Version 6.0) cannot recognize the lsb.events file format.
lsb.events jFlags (%d) Job processing flags userGroup (%s) User group name idx (%d) Job array index additionalInfo (%s) Placement information of HPC jobs JOB_START_ACCEPT A job has started on the execution host(s). The fields in order of occurrence are: Version number (%s) The version number Event time (%d) The time of the event jobId (%d) Job ID jobPid (%d) Job process ID jobPGid (%d) Job process group ID idx (%d) Job array index JOB_STATUS The status of a job changed after dispatch.
lsb.events Structure subreasons (%d) Pending or suspended subreason code, see cpuTime (%f) CPU time consumed so far endTime (%d) Job completion time ru (%d) Resource usage flag lsfRusage (%s) Resource usage statistics, see exitStatus (%d) Exit status of the job, see idx (%d) Job array index exitInfo (%d) Job termination reason, see JOB_SWITCH A job switched from one queue to another (bswitch).
lsb.events Version number (%s) The version number Event time (%d) The time of the event userId (%d) UNIX user ID of the user invoking the command jobId (%d) Job ID position (%d) Position number base (%d) Operation code, (TO_TOP or TO_BOTTOM), see idx (%d) Job array index userName (%s) Name of the job submitter QUEUE_CTRL A job queue has been altered.
lsb.events Structure Version number (%s) The version number Event time (%d) The time of the event opCode (%d) Operation code, see host (%s) Host name userId (%d) UNIX user ID of the user invoking the command userName (%s) Name of the user ctrlComments (%s) Administrator comment text from the -C option of badmin host control commands hclose and hopen MBD_START The mbatchd has started.
lsb.events master (%s) Master host name numRemoveJobs (%d) Number of finished jobs that have been removed from the system and logged in the current event file exitCode (%d) Exit code from mbatchd ctrlComments (%s) Administrator comment text from the -C option of badmin mbdrestart UNFULFILL Actions that were not taken because the mbatchd was unable to contact the sbatchd on the job execution host.
lsb.events Structure Version number (%s) The version number Event time (%d) The time of the event nIdx (%d) Number of index names name (%s) List of index names JOB_SIGACT An action on a job has been taken.
lsb.events MIG A job has been migrated (bmig).
lsb.events Structure userName (%s) User name submitTime (%d) Job submission time umask (%d) File creation mask for this job numProcessors (%d) Number of processors requested for execution. The value 2147483646 means the number of processors is undefined.
lsb.events Structure chkpntDir (%s) Checkpoint directory nxf (%d) Number of files to transfer xf (%s) List of file transfer specifications jobFile (%s) Job file name fromHost (%s) Submission host name cwd (%s) Current working directory preExecCmd (%s) Job pre-execution command mailUser (%s) Mail user name projectName (%s) Project name niosPort (%d) Callback port if batch interactive job maxNumProcessors (%d) Maximum number of processors.
lsb.events jobGroup (%s) The job group to which the job is attached sla (%s) SLA service class name that the job is to be attached to extsched (%s) External scheduling options warningAction (%s) Job warning action warningTimePeriod (%d) Job warning time period in seconds licenseProject (%s) LSF License Scheduler project name JOB_SIGNAL This is created when a job is signaled with bkill or deleted with bdel.
lsb.events Version number (%s) The version number Event time (%d) The time of the event jobId (%d) Job ID idx (%d) Job array index JOB_EXCEPTION This is created when an exception condition is detected for a job.
lsb.events Structure idx (%d) Job array index JOB_EXT_MSG An external message has been sent to a job.
lsb.events idx (%d) Job array index msgIdx (%d) Index in the list dataSize (%ld) Size of the data if it has any; otherwise 0 dataStatus (%d) Status of the attached data fileName (%s) File name of the attached data JOB_CHUNK This is created when a job is inserted into a chunk. If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, older daemons and commands (pre-LSF Version 6.0) cannot recognize the lsb.events file format.
lsb.events Structure Version number (%s) The version number Event time (%d) The time of the event jobId (%d) Job ID actPid (%d) Acting processing ID jobPid (%d) Job process ID jobPGid (%d) Job process group ID newStatus (%d) New status of the job reason (%d) Pending or suspending reason code, see suspreason (%d) Pending or suspending subreason code, see lsfRusage The following fields contain resource usage information for the job (see getrusage(2)).
lsb.events Structure 1: Action started 2: One action preempted other actions 3: Action succeeded 4: Action Failed sigValue (%d) Signal value seq (%d) Sequence status of the job idx (%d) Job array index jRusage The following fields contain resource usage information for the job. If the value of some field is unavailable (due to job exit or the difference among the operating systems), -1 will be logged. Times are measured in seconds, and sizes are measured in KB.
lsb.events exitInfo (%d) Job termination reason, see PRE_EXEC_START A pre-execution command has been started.
lsb.events Structure JOB_FORCE A job has been forced to run with brun. Version number (%s) The version number Event time (%d) The time of the event jobId (%d) Job ID userId (%d) UNIX user ID of the user invoking the command idx (%d) Job array index options (%d) Bit flags for job processing numExecHosts (%ld) Number of execution hosts If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is the number of .hosts listed in the execHosts field.
lsb.events SEE ALSO Related Topics: lsid(1), getrlimit(2), lsb_geteventrec(3), lsb.acct(5), lsb.queues(5), lsb.hosts(5), lsb.users(5), lsb.params(5), lsf.conf(5), lsf.cluster(5), badmin(8), bhist(1), mbatchd(8) Files: LSB_SHAREDIR/cluster_name/logdir/lsb.events[.
lsb.hosts The lsb.hosts file contains host-related configuration information for the server hosts in the cluster. It is also used to define host groups and host partitions. This file is optional. All sections are optional. By default, this file is installed in LSB_CONFDIR/cluster_name/configdir. Changing lsb.hosts configuration After making any changes to lsb.hosts, run badmin reconfig to reconfigure mbatchd.
Host Section Host Section Description Optional. Defines the hosts, host types, and host models used as server hosts, and contains per-host configuration information. If this section is not configured, LSF uses all hosts in the cluster (the hosts listed in lsf.cluster.cluster_name) as server hosts.
lsb.hosts DISPATCH_WINDOW Description The time windows in which jobs from this host, host model, or host type are dispatched. Once dispatched, jobs are no longer affected by the dispatch window. Default Undefined (always open). EXIT_RATE Description Specifies a threshold in minutes for exited jobs. If the job exit rate is exceeded for 10 minutes or the period specified by JOB_EXIT_RATE_DURATION, LSF invokes LSF_SERVERDIR/eadmin to trigger a host exception.
Host Section MXJ Description The number of job slots on the host. With MultiCluster resource leasing model, this is the number of job slots on the host that are available to the local cluster. Use “!” to make the number of job slots equal to the number of CPUs on a host. For the reserved host name default, “!” makes the number of job slots equal to the number of CPUs on all hosts in the cluster not otherwise referenced in the section.
lsb.hosts SUNSOL is a host type defined in lsf.shared. This example Host section configures one host and one host type explicitly and configures default values for all other loadsharing hosts. HostA runs one batch job at a time. A job will only be started on hostA if the r1m index is below 0.6 and the pg index is below 10; the running job is stopped if the r1m index goes above 1.6 or the pg index goes above 20.
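The hostA portion of that example can be written as the Host section below. The hostA values (one job slot, r1m thresholds 0.6/1.6, pg thresholds 10/20) come from the description above; the MXJ values for SUNSOL and default are hypothetical fillers, and "()" leaves a column undefined.

```
# Sketch of a Host section matching the example described above.
Begin Host
HOST_NAME   MXJ   r1m        pg       # load thresholds: loadSched/loadStop
hostA       1     0.6/1.6    10/20    # one batch job at a time on hostA
SUNSOL      2     ()         ()       # host type from lsf.shared (values hypothetical)
default     2     ()         ()       # all other load-sharing hosts (values hypothetical)
End Host
```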
HostGroup Section HostGroup Section Description Optional. Defines host groups. The name of the host group can then be used in other host group, host partition, and queue definitions, as well as on the command line. Specifying the name of a host group has exactly the same effect as listing the names of all the hosts in the group. Structure Host groups are specified in the same format as user groups in lsb.users.
lsb.hosts When a leased-in host joins the cluster, the host name is in the form of host@cluster. For these hosts, only the host part of the host name is subject to pattern definitions. You can use the following special characters to specify host group members:
◆ Use a tilde (~) to exclude specified hosts or host groups from the list.
◆ Use an asterisk (*) as a wildcard character to represent any number of characters.
Restrictions
HostGroup Section Example 2 Begin HostGroup GROUP_NAME GROUP_MEMBER groupA (all) groupB (groupA ~hostA ~hostB) groupC (hostX hostY hostZ) groupD (groupC ~hostX) groupE (all ~groupC ~hostB) groupF (hostF groupC hostK) End HostGroup This example defines the following host groups: ◆ groupA contains all hosts in the cluster. ◆ groupB contains all the hosts in the cluster except for hostA and hostB. ◆ groupC contains only hostX, hostY, and hostZ. ◆ groupD contains the hosts in groupC except for hostX.
lsb.hosts ◆ groupE shows uncondensed output, and contains all hosts from hostC51 to hostC100 and all hosts from hostC151 to hostC200. ◆ groupF shows condensed output, and contains hostD1, hostD3, and all hosts from hostD5 to hostD10. groupG shows uncondensed output, and contains all hosts from hostD11 to hostD50 except for hostD15, hostD20, and hostD25. groupG also includes hostD2.
HostPartition Section HostPartition Section Description Optional; used with host partition user-based fairshare scheduling. Defines a host partition, which defines a user-based fairshare policy at the host level. Configure multiple sections to define multiple partitions. The members of a host partition form a host group with the same name as the host partition. Limitations on Queue Configuration ◆ ◆ If you configure a host partition, you cannot configure fairshare at the queue level.
lsb.hosts Hosts that are not included in any host partition are controlled by the FCFS scheduling policy instead of the fairshare scheduling policy. Optionally, use the reserved host name all to configure a single partition that applies to all hosts in a cluster. Optionally, use the not operator (~) to exclude hosts or host groups from the list of hosts in the host partition. Examples HOSTS=all ~hostK ~hostM The partition includes all the hosts in the cluster, except for hostK and hostM.
HostPartition Section Example of a HostPartition Section Begin HostPartition HPART_NAME = Partition1 HOSTS = hostA hostB USER_SHARES = [groupA@, 3] [groupB, 7] [default, 1] End HostPartition 364 Platform LSF Reference
lsb.hosts Automatic Time-based Configuration Variable configuration is used to automatically change LSF configuration based on time windows. You define automatic configuration changes in lsb.hosts by using if-else constructs and time expressions. After you change the files, reconfigure the cluster with the badmin reconfig command. The expressions are evaluated by LSF every 10 minutes based on mbatchd start time.
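A sketch of such an if-else construct; the host name, slot counts, and time window below are hypothetical illustrations, and the window follows LSF's hour:minute time-expression form.

```
# Hypothetical lsb.hosts fragment: allow more jobs on hostA overnight.
Begin Host
HOST_NAME   MXJ
#if time(20:00-8:00)
hostA       4
#else
hostA       2
#endif
End Host
```

After editing the file, run badmin reconfig; LSF then re-evaluates the time expression on its regular 10-minute cycle.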
lsb.modules The lsb.modules file contains configuration information for LSF scheduler and resource broker modules. The file contains only one section, named PluginModule. This file is optional. If no scheduler or resource broker modules are configured, LSF uses the default scheduler plugin modules named schmod_default and schmod_fcfs. The lsb.modules file is stored in the directory LSB_CONFDIR/cluster_name/configdir, where LSB_CONFDIR is defined in lsf.conf. Changing lsb.modules configuration
PluginModule Section PluginModule Section Description Defines the plugin modules for the LSF scheduler and LSF resource broker. If this section is not configured, LSF uses the default scheduler plugin modules named schmod_default and schmod_fcfs, which enable the LSF default scheduling features.
lsb.modules schmod_fcfs Enables the first-come, first-served (FCFS) scheduler features. schmod_fcfs can appear anywhere in the SCH_PLUGIN list. By default, if schmod_fcfs is not configured in lsb.modules, it is loaded automatically along with schmod_default. Source code (sch.mod.fcfs.c) for the schmod_fcfs scheduler plugin module is installed in the directory LSF_TOP/6.
PluginModule Section contains sample plugin code. See Using the Platform LSF SDK for more detailed information about writing, building, and configuring your own custom scheduler plugins. schmod_jobweight An optional scheduler plugin module to enable Cross-Queue Job Weight scheduling policies. The schmod_jobweight plugin must be listed before schmod_cpuset and schmod_rms, and after all other scheduler plugin modules.
lsb.modules In the order phase, the scheduler applies policies such as FCFS, fairshare, and host partition, and considers job priorities within user groups and share groups. By default, job priority within a pool of jobs from the same user is based on how long the job has been pending. For resource-intensive jobs (jobs requiring many CPUs or a large amount of memory), resource reservation is performed so that these jobs are not starved.
SEE ALSO lsf.cluster(5), lsf.
lsb.params The lsb.params file defines general parameters used by the LSF system. This file contains only one section, named Parameters. mbatchd uses lsb.params for initialization. The file is optional. If not present, the LSF-defined defaults are assumed. Some of the parameters that can be defined in lsb.params control timing within the system. The default settings provide good throughput for long-running batch jobs while adding a minimum of processing overhead in the batch daemons.
Parameters Section Parameters Section This section and all the keywords in this section are optional. If keywords are not present, the default values are assumed.
Parameters Section See also
◆ ACCT_ARCHIVE_SIZE also enables automatic archiving.
◆ ACCT_ARCHIVE_TIME also enables automatic archiving.
◆ MAX_ACCT_ARCHIVE_FILE enables automatic deletion of the archives.
Default Undefined (no limit to the age of lsb.acct). ACCT_ARCHIVE_SIZE Syntax ACCT_ARCHIVE_SIZE=kilobytes Description Enables automatic archiving of LSF accounting log files, and specifies the archive threshold. LSF archives the current log file if its size exceeds the specified number of kilobytes.
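Taken together with MAX_ACCT_ARCHIVE_FILE (described later in this section), the archiving parameters can be sketched as the following lsb.params fragment; both values are arbitrary illustrations, not defaults.

```
# Hypothetical lsb.params fragment: archive lsb.acct once it grows past
# 1024 KB, and keep at most 8 archived copies before deleting the oldest.
Begin Parameters
ACCT_ARCHIVE_SIZE = 1024
MAX_ACCT_ARCHIVE_FILE = 8
End Parameters
```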
Parameters Section COMMITTED_RUN_TIME_FACTOR Syntax COMMITTED_RUN_TIME_FACTOR=number Description Used only with fairshare scheduling. Committed run time weighting factor. In the calculation of a user’s dynamic priority, this factor determines the relative importance of the committed run time in the calculation. If the -W option of bsub is not specified at job submission and a RUNLIMIT has not been set for the queue, the committed run time is not considered. Valid Values Any positive number between 0.
lsb.params DETECT_IDLE_JOB_AFTER Syntax DETECT_IDLE_JOB_AFTER=time_minutes Description The minimum job run time before mbatchd reports that the job is idle. Default 20 (mbatchd checks if the job is idle after 20 minutes of run time) DISABLE_UACCT_MAP Syntax DISABLE_UACCT_MAP=y | Y Description Specify y or Y to disable user-level account mapping. Default Undefined EADMIN_TRIGGER_DURATION Description Defines how often LSF_SERVERDIR/eadmin is invoked once a job exception is detected.
Parameters Section The directories are always synchronized when data is logged to the files, or when mbatchd is started on the first LSF master host. Use this parameter if NFS traffic is too high and you want to reduce network traffic. Valid Values 1 to INFINIT_INT INFINIT_INT is defined in lsf.h Default Undefined See also See “lsf.conf ” under “LSB_LOCALDIR” on page 523. HIST_HOURS Syntax HIST_HOURS=hours Description Used only with fairshare scheduling.
lsb.params Use JOB_ATTA_DIR if you use bpost(1) and bread(1) to transfer large data files between jobs and want to avoid using space in LSB_SHAREDIR. By default, the bread(1) command reads attachment data from the JOB_ATTA_DIR directory. JOB_ATTA_DIR should be shared by all hosts in the cluster, so that any potential LSF master host can reach it. Like LSB_SHAREDIR, the directory should be owned and writable by the primary LSF administrator. The directory must have at least 1 MB of free space.
Parameters Section Description Allows LSF administrators to control whether users can use btop and bbot to move jobs to the top and bottom of queues. When JOB_POSITION_CONTROL_BY_ADMIN=Y, only the LSF administrator (including any queue administrators) can use bbot and btop to move jobs within a queue.
lsb.params If JOB_SPOOL_DIR is not accessible or does not exist, files are spooled to the default job output directory $HOME/.lsbatch. For bsub -is and bsub -Zs, JOB_SPOOL_DIR must be readable and writable by the job submission user, and it must be shared by the master host and the submission host.
Parameters Section MAX_ACCT_ARCHIVE_FILE Syntax MAX_ACCT_ARCHIVE_FILE=integer Description Enables automatic deletion of archived LSF accounting log files and specifies the archive limit. Compatibility ACCT_ARCHIVE_SIZE or ACCT_ARCHIVE_AGE should also be defined. Example MAX_ACCT_ARCHIVE_FILE=10 LSF maintains the current lsb.acct and up to 10 archives. Every time the old lsb.acct.9 becomes lsb.acct.10, the old lsb.acct.10 gets deleted. See also
◆ ACCT_ARCHIVE_AGE also enables automatic archiving.
lsb.params IMPORTANT If you are using local duplicate event logging, you must run badmin mbdrestart after changing MAX_INFO_DIRS for the changes to take effect. Valid values 1-1024 Default Not defined (no subdirectories under the info directory; mbatchd writes all jobfiles to the info directory) Example MAX_INFO_DIRS=10 mbatchd creates ten subdirectories from LSB_SHAREDIR/cluster_name/logdir/info/0 to LSB_SHAREDIR/cluster_name/logdir/info/9.
Parameters Section LSF assigns job IDs in sequence. When the job ID limit is reached, the count rolls over, so the next job submitted gets job ID "1". If the original job 1 remains in the system, LSF skips that number and assigns job ID "2", or the next available job ID. If you have so many jobs in the system that the low job IDs are still in use when the maximum job ID is assigned, jobs with sequential numbers could have totally different submission times.
lsb.params MAX_PEND_JOBS Syntax MAX_PEND_JOBS=integer Description The maximum number of pending jobs in the system. This is the hard system-wide pending job threshold. No user or user group can exceed this limit unless the job is forwarded from a remote cluster. If the user or user group submitting the job has reached the pending job threshold as specified by MAX_PEND_JOBS, LSF will reject any further job submission requests sent by that user or user group.
Parameters Section Description The maximum number of retries for reaching a non-responding slave batch daemon, sbatchd. The interval between retries is defined by MBD_SLEEP_TIME. If mbatchd fails to reach a host and has retried MAX_SBD_FAIL times, the host is considered unreachable. When a host becomes unreachable, mbatchd assumes that all jobs running on that host have exited and that all rerunnable jobs (jobs submitted with the bsub -r option) are scheduled to be rerun on another host.
lsb.params If MBD_REFRESH_TIME is < 10 seconds, the child mbatchd exits at MBD_REFRESH_TIME if the job changes status or a new job is submitted before MBD_REFRESH_TIME expires ◆ If MBD_REFRESH_TIME > 10 seconds, the child mbatchd exits at 10 seconds if the job changes status or a new job is submitted before the 10 seconds ◆ If MBD_REFRESH_TIME > 10 seconds and no job changes status or no new job is submitted, the child mbatchd exits at MBD_REFRESH_TIME The value of this parameter must be between 5 and 300.
Parameters Section Description MultiCluster job forwarding model only. Pending reason update package size, in KB. Defines the maximum amount of pending reason data this cluster will send to submission clusters in one cycle. Specify the keyword 0 (zero) to disable the limit and allow any amount of data in one package. Default 512 MC_PENDING_REASON_UPDATE_INTERVAL Syntax MC_PENDING_REASON_UPDATE_INTERVAL=seconds | 0 Description MultiCluster job forwarding model only.
lsb.params NO_PREEMPT_FINISH_TIME Syntax NO_PREEMPT_FINISH_TIME=finish_time Description If set, jobs that will finish within the specified number of minutes will not be preempted. Run time is wall-clock time, not normalized run time. You must define a run limit for the job, either at job level by bsub -W option or in the queue by configuring RUNLIMIT in lsb.queues. NQS_QUEUES_FLAGS Syntax NQS_QUEUES_FLAGS=integer Description For Cray NQS compatibility only. Used by LSF to get the NQS queue information.
Parameters Section ◆ See SLOTS and SLOTS_PER_PROCESSOR under “lsb.resources” on page 433 PEND_REASON_UPDATE_INTERVAL Syntax PEND_REASON_UPDATE_INTERVAL=seconds Description Time interval that defines how often pending reasons are calculated by the scheduling daemon mbschd. Default 30 seconds PEND_REASON_MAX_JOBS Syntax PEND_REASON_MAX_JOBS=integer Description Number of jobs for each user per queue for which pending reasons are calculated by the scheduling daemon mbschd.
lsb.params Description If preemptive scheduling is enabled, this parameter can change the behavior of job slot limits and can also enable the optimized preemption mechanism for parallel jobs. Specify a space-separated list of the following keywords: GROUP_MAX—LSF does not count suspended jobs against the total job slot limit for user groups, specified at the user level (MAX_JOBS in lsb.
Parameters Section Some parallel jobs need to reserve resources based on job slots, rather than by host. In this example, if per-slot reservation is enabled by RESOURCE_RESERVE_PER_SLOT, the job my_job must reserve 500 MB of memory for each job slot (4*500=2 GB) on the host in order to run.
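The submission command behind that example is not reproduced in the text; assuming standard bsub resource-requirement syntax, it would look something like the following (the job name my_job comes from the description above; the exact option spelling is an assumption).

```
# Hypothetical submission: 4 job slots, reserving 500 MB of memory per slot
# (4 * 500 = 2 GB on the host) when RESOURCE_RESERVE_PER_SLOT is enabled.
bsub -n 4 -R "rusage[mem=500]" my_job
```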
lsb.params See also “MAX_PEND_JOBS” on page 387 SYSTEM_MAPPING_ACCOUNT Syntax SYSTEM_MAPPING_ACCOUNT=user_account Description LSF Windows Workgroup installations only. User account to which all Windows workgroup user accounts are mapped.
Automatic Time-based Configuration Automatic Time-based Configuration Variable configuration is used to automatically change LSF configuration based on time windows. You define automatic configuration changes in lsb.params by using if-else constructs and time expressions. After you change the files, reconfigure the cluster with the badmin reconfig command. The expressions are evaluated by LSF every 10 minutes based on mbatchd start time.
lsb.params SEE ALSO lsf.conf(5), lsb.params(5), lsb.hosts(5), lsb.
lsb.queues The lsb.queues file defines batch queues. Numerous controls are available at the queue level to allow cluster administrators to customize site policies. This file is optional; if no queues are configured, LSF creates a queue named default, with all parameters set to default values. This file is installed by default in LSB_CONFDIR/cluster_name/configdir. Changing lsb.queues configuration After making any changes to lsb.queues, run badmin reconfig to reconfigure mbatchd. Contents ◆ ◆ “lsb.
lsb.queues Structure lsb.queues Structure Each queue definition begins with the line Begin Queue and ends with the line End Queue. The queue name must be specified; all other parameters are optional. ADMINISTRATORS Syntax ADMINISTRATORS=user_name | user_group ... Description List of queue administrators. Queue administrators can perform operations on any user’s job in the queue, as well as on the queue itself.
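A minimal queue definition following that structure; every value below is a hypothetical illustration, not a shipped default.

```
# Hypothetical lsb.queues fragment.
Begin Queue
QUEUE_NAME     = normal
PRIORITY       = 30
ADMINISTRATORS = lsfadmin        # may operate on any user's job in this queue
DESCRIPTION    = Example queue for ordinary batch jobs
End Queue
```

Only QUEUE_NAME is required; after editing, run badmin reconfig to reconfigure mbatchd.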
lsb.queues Job chunking can have the following advantages:
◆ Reduces communication between sbatchd and mbatchd and reduces scheduling overhead in mbschd.
◆ Increases job throughput in mbatchd and CPU utilization on the execution hosts.
However, throughput can deteriorate if the chunk job size is too big. Performance may decrease on queues with CHUNK_JOB_SIZE greater than 30. You should evaluate the chunk job size on your own systems for best performance.
lsb.queues Structure When the total CPU time for the whole job has reached the limit, a SIGXCPU signal is sent to all processes belonging to the job. If the job has no signal handler for SIGXCPU, the job is killed immediately. If the SIGXCPU signal is handled, blocked, or ignored by the application, then after the grace period expires, LSF sends SIGINT, SIGTERM, and SIGKILL to the job to kill it.
lsb.queues DEFAULT_EXTSCHED Syntax DEFAULT_EXTSCHED=external_scheduler_options Description Specifies default external scheduling options for the queue. -extsched options on the bsub command are merged with DEFAULT_EXTSCHED options, and -extsched options override any conflicting queue-level options set by DEFAULT_EXTSCHED. Default Undefined DEFAULT_HOST_SPEC Syntax DEFAULT_HOST_SPEC=host_name | host_model Description The default CPU time normalization host for the queue.
lsb.queues Structure Default Undefined DISPATCH_WINDOW Syntax DISPATCH_WINDOW=time_window ... Description The time windows in which jobs from this queue are dispatched. Once dispatched, jobs are no longer affected by the dispatch window. Default Undefined (always open) EXCLUSIVE Syntax EXCLUSIVE=Y | N Description If Y, specifies an exclusive queue. Jobs submitted to an exclusive queue with bsub -x will only be dispatched to a host that has no other LSF jobs running.
lsb.queues The number of shares assigned to each user is only meaningful when you compare it to the shares assigned to other users or to the total number of shares. The total number of shares is just the sum of all the shares assigned in each share assignment. Compatibility Do not configure hosts in a cluster to use fairshare at both queue and host levels. However, you can configure user-based fairshare and queue-based fairshare together.
lsb.queues Structure FILELIMIT Syntax FILELIMIT=integer Description The per-process (hard) file size limit (in KB) for all of the processes belonging to a job from this queue (see getrlimit(2)). Default Unlimited HJOB_LIMIT Syntax HJOB_LIMIT=integer Description Per-host job slot limit. Maximum number of job slots that this queue can use on any host. This limit is configured per host, regardless of the number of processors it may have.
lsb.queues If host groups and host partitions are included in the list, the job can run on any host in the group or partition. All the members of the host list should either belong to a single host partition or not belong to any host partition. Otherwise, job scheduling may be affected. Some items can be followed by a plus sign (+) and a positive number to indicate the preference for dispatching a job to that host. A higher number indicates a higher preference.
lsb.queues Structure Run jobs on all hosts in the cluster, except for hostA. With MultiCluster resource leasing model, this queue does not use borrowed hosts. Example 4 HOSTS=Group1 ~hostA hostB hostC Run jobs on hostB, hostC, and all hosts in Group1 except for hostA. With MultiCluster resource leasing model, this queue will use borrowed hosts if Group1 uses the keyword allremote.
lsb.queues Specify the minimum number of seconds for the job to be considered for backfilling. This minimal time slice depends on the specific job properties; it must be longer than at least one useful iteration of the job. Multiple queues may be created if a site has jobs of distinctly different classes.
lsb.queues Structure Default Undefined (the queue uses JOB_ACCEPT_INTERVAL defined in lsb.params, which has a default value of 1) JOB_ACTION_WARNING_TIME Syntax JOB_ACTION_WARNING_TIME=[hour:]minute Description Specifies the amount of time before a job control action occurs that a job warning action is to be taken. For example, 2 minutes before the job reaches its run time limit or termination deadline, or the queue's run window is closed, an URG signal is sent to the job.
lsb.queues
◆ The standard input, output, and error of the command are redirected to the NULL device, so you cannot tell directly whether the command runs correctly. The default null device on UNIX is /dev/null.
◆ The command is run as the user of the job.
◆ All environment variables set for the job are also set for the command action.
lsb.queues Structure Example JOB_IDLE=0.10 A job idle exception is triggered for jobs with an idle value (CPU time/runtime) less than 0.10. Default Undefined. No job idle exceptions are detected. JOB_OVERRUN Syntax JOB_OVERRUN=run_time Description Specifies a threshold for job overrun exception handling. If a job runs longer than the specified run time, LSF invokes LSF_SERVERDIR/eadmin to trigger the action for a job overrun exception.
lsb.queues JOB_WARNING_ACTION Syntax JOB_WARNING_ACTION=signal Description Specifies the job action to be taken before a job control action occurs. For example, 2 minutes before the job reaches run time limit or termination deadline, or the queue's run window is closed, an URG signal is sent to the job. A job warning action must be specified with a job action warning time in order for job warning to take effect.
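A minimal sketch pairing the two parameters (the signal choice and warning time are illustrative):

```
JOB_WARNING_ACTION      = URG
JOB_ACTION_WARNING_TIME = 2
```

With this configuration, an URG signal is sent to the job 2 minutes before a job control action occurs.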
lsb.queues Structure Example MEM=100/10 SWAP=200/30 These two lines translate into a loadSched condition of mem>=100 && swap>=200 and a loadStop condition of mem < 10 || swap < 30 Default Undefined MANDATORY_EXTSCHED Syntax MANDATORY_EXTSCHED=external_scheduler_options Description Specifies mandatory external scheduling options for the queue.
lsb.queues
LSF has two methods of enforcing memory usage:
◆ OS memory limit enforcement
◆ LSF memory limit enforcement
OS memory limit enforcement is the default MEMLIMIT behavior and does not require further configuration. OS enforcement usually allows the process to eventually run to completion. LSF passes MEMLIMIT to the OS, which uses it as a guide for the system scheduler and memory allocator. The system may allocate more memory to a process if there is a surplus.
lsb.queues Structure Members of a chunk job can be migrated. Chunk jobs in WAIT state are removed from the job chunk and put into PEND state. Default Undefined (no automatic job migration) NEW_JOB_SCHED_DELAY Syntax NEW_JOB_SCHED_DELAY=seconds Description The number of seconds that a new job waits before being scheduled. A value of zero (0) means the job is scheduled without any delay.
lsb.queues Since many features of LSF are not supported by NQS, the following queue configuration parameters are ignored for NQS forward queues: PJOB_LIMIT, POLICIES, RUN_WINDOW, DISPATCH_WINDOW, RUNLIMIT, HOSTS, MIG. In addition, scheduling load threshold parameters are ignored because NQS does not provide load information about hosts. Default Undefined PJOB_LIMIT Syntax PJOB_LIMIT=float Description Per-processor job slot limit for the queue.
lsb.queues Structure
◆ When the post-execution command is run, the environment variable LSB_JOBEXIT_STAT is set to the exit status of the job. Refer to the manual page for wait(2) for the format of this exit status.
◆ The post-execution command is also run if a job is requeued because the job’s execution environment fails to be set up or if the job exits with one of the queue’s REQUEUE_EXIT_VALUES. The environment variable LSB_JOBPEND is set if the job is requeued.
lsb.queues Description Enables preemptive scheduling and defines a preemption policy for the queue. You can specify PREEMPTIVE or PREEMPTABLE or both. When you specify a list of queues, you must enclose the list in one set of square brackets. ◆ PREEMPTIVE defines a preemptive queue. Jobs in this queue preempt jobs from the specified lower-priority queues or from all lower-priority queues by default (if the parameter is specified with no queue names).
lsb.queues Structure Default Unlimited PROCLIMIT Syntax PROCLIMIT=[minimum_limit [default_limit]] maximum_limit Description Maximum number of slots that can be allocated to a job. For parallel jobs, the maximum number of processors that can be allocated to the job. Optionally specifies the minimum and default number of job slots.
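For instance, an illustrative three-value PROCLIMIT setting (the values are arbitrary assumptions):

```
PROCLIMIT = 2 2 8
```

This would give jobs in the queue a minimum of 2 slots, a default of 2 slots when none are requested, and a maximum of 8 slots.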
lsb.queues Default You must specify this parameter to define a queue. The default queue automatically created by LSF is named default. RCVJOBS_FROM Syntax RCVJOBS_FROM=cluster_name ... | allclusters Description MultiCluster only. Defines a MultiCluster receive-jobs queue. Specify cluster names, separated by a space. The administrator of each remote cluster determines which queues in that cluster will forward jobs to the local cluster. Use the keyword allclusters to specify any remote cluster.
lsb.queues Structure RESOURCE_RESERVE Syntax RESOURCE_RESERVE=MAX_RESERVE_TIME[integer ] Description Enables processor reservation and memory reservation for pending jobs for the queue. Specifies the number of dispatch turns (MAX_RESERVE_TIME) over which a job can reserve job slots and memory. Overrides the SLOT_RESERVE parameter. If both RESOURCE_RESERVE and SLOT_RESERVE are defined in the same queue, an error is displayed when the cluster is reconfigured, and SLOT_RESERVE is ignored.
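A sketch of the syntax (the number of dispatch turns is an arbitrary example):

```
RESOURCE_RESERVE = MAX_RESERVE_TIME[5]
```

With this setting, a pending job can hold its reserved job slots and memory for up to 5 dispatch turns before the reservation is released.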
lsb.queues ◆ Given a RES_REQ definition in a queue: RES_REQ=rusage[mem=200:lic=1] ... and job submission: bsub -R'rusage[mem=100]' ... The resulting requirement for the job is rusage[mem=100:lic=1] where mem=100 specified by the job overrides mem=200 specified by the queue. However, lic=1 from the queue is kept, since the job does not specify it. ◆ For the following queue-level RES_REQ (decay and duration defined): RES_REQ=rusage[mem=200:duration=20:decay=1] ...
lsb.queues Structure Description Time periods during which jobs in the queue are allowed to run. When the window closes, LSF suspends jobs running in the queue and stops dispatching jobs from the queue. When the window reopens, LSF resumes the suspended jobs and begins dispatching additional jobs.
lsb.queues If no host or host model is given, LSF uses the default run time normalization host defined at the queue level (DEFAULT_HOST_SPEC in lsb.queues) if it has been configured; otherwise, LSF uses the default CPU time normalization host defined at the cluster level (DEFAULT_HOST_SPEC in lsb.params) if it has been configured; otherwise, the host with the largest CPU factor (the fastest host in the cluster).
lsb.queues Structure If BACKFILL is configured in a queue, and a run limit is specified with -W on bsub or with RUNLIMIT in the queue, backfill parallel jobs can use job slots reserved by the other jobs, as long as the backfill job can finish before the predicted start time of the jobs with the reservation. Unlike memory reservation, which applies both to sequential and parallel jobs, slot reservation applies only to parallel jobs.
lsb.queues
◆ LSF will not suspend the only job running on the host if the machine is interactively idle (it > 0).
◆ LSF will not suspend a forced job (brun -f).
◆ LSF will not suspend a job because of paging rate if the machine is interactively idle.
If STOP_COND is specified in the queue and there are no load thresholds, the suspending reasons for each individual load index will not be displayed by bjobs.
lsb.queues Structure THREADLIMIT Syntax THREADLIMIT=[default_limit] maximum_limit Description Limits the number of concurrent threads that can be part of a job. Exceeding the limit causes the job to terminate. The system sends the following signals in sequence to all processes belonging to the job: SIGINT, SIGTERM, and SIGKILL. By default, if a default thread limit is specified, jobs submitted to the queue without a job-level thread limit are killed when the default thread limit is reached.
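A hedged example of the two-value form (the values themselves are arbitrary):

```
THREADLIMIT = 2 4
```

Jobs submitted without a job-level thread limit get a default limit of 2 threads; no job in the queue may exceed 4 concurrent threads.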
lsb.queues Use the not operator (~) to exclude users from the all specification or from user groups. This is useful if you have a large number of users but only want to exclude a few users or groups from the queue definition. The not operator can only be used with the all keyword or to exclude users from user groups. The not operator does not exclude LSF administrators from the queue definition.
Automatic Time-based Configuration
Variable configuration is used to automatically change LSF configuration based on time windows. You define automatic configuration changes in lsb.queues by using if-else constructs and time expressions. After you change the files, reconfigure the cluster with the badmin reconfig command. The expressions are evaluated by LSF every 10 minutes based on mbatchd start time.
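A sketch of an if-else construct in lsb.queues (the queue name and priority values are hypothetical):

```
Begin Queue
QUEUE_NAME = normal
#if time(8:30-18:30)
PRIORITY = 50
#else
PRIORITY = 30
#endif
End Queue
```

During the 8:30-18:30 time window the queue runs at priority 50; outside that window it drops to priority 30.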
lsb.queues SEE ALSO lsf.cluster(5), lsf.conf(5), lsb.params(5), lsb.hosts(5), lsb.users(5), lsf.
lsb.resources The lsb.resources file contains configuration information for resource allocation limits, exports, and resource usage limits. This file is optional. The lsb.resources file is stored in the directory LSB_CONFDIR/cluster_name/configdir, where LSB_CONFDIR is defined in lsf.conf. Changing lsb.resources configuration After making any changes to lsb.resources, run badmin reconfig to reconfigure mbatchd.
Limit Section
Sets limits on the maximum amount of the specified resources that must be available for different classes of jobs to start, and specifies the resource consumers to which the limits apply. Limits are enforced during job resource allocation. For limits to be enforced, jobs must specify rusage resource requirements (bsub -R or RES_REQ in lsb.queues). The blimits command displays the current usage of resource allocation limits configured in Limit sections in lsb.resources.
lsb.resources Horizontal format Use the horizontal format to give a name for your limits and to configure more complicated combinations of consumers and resource limits. The first line of the Limit section gives the name of the limit configuration.
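A sketch of a horizontal-format Limit section (the limit name, consumers, and values are all hypothetical):

```
Begin Limit
NAME   = dev_limits
USERS  = user1 user2
QUEUES = normal
SLOTS  = 10
MEM    = 200
End Limit
```

This would cap the named users at 10 job slots and 200 MB of memory in the normal queue.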
Limit Section Compatibility with lsb.queues, lsb.users, and lsb.hosts The Limit section of lsb.resources does not support the keywords or format used in lsb.users, lsb.hosts, and lsb.queues. However, your existing job slot limit configuration in these files will continue to apply. Job slot limits are the only type of limit you can configure in lsb.users, lsb.hosts, and lsb.queues. You cannot configure limits for user groups, host groups, and projects in lsb.users, lsb.hosts, and lsb.queues.
lsb.resources Example 3 HOSTS (all ~hostK ~hostM) SWP 10 Enforces a 10 MB swap limit on all hosts in the cluster, except for hostK and hostM. LICENSE Syntax LICENSE=[license_name ,integer ] [[license_name ,integer ] ...] LICENSE ( [license_name ,integer ] [[license_name ,integer ] ...] ) Description Maximum number of specified software licenses available to resource consumers. The value must be an integer greater than or equal to zero.
Limit Section If only QUEUES are configured in the Limit section, MEM must be an integer value. MEM is the maximum amount of memory available to the listed queues for any hosts, users, or projects. If only USERS are configured in the Limit section, MEM must be an integer value. MEM is the maximum amount of memory that the users or user groups can use on any hosts, queues, or projects. If only HOSTS are configured in the Limit section, MEM must be an integer value. It cannot be a percentage.
lsb.resources Use the keyword all to configure limits that apply to each host in a cluster. If host groups are configured, the limit applies to each member of the host group, not the group as a whole. Use the not operator (~) to exclude hosts or host groups from the all specification in the limit. This is useful if you have a large cluster but only want to exclude a few hosts from the limit definition. In vertical tabular format, multiple host names must be enclosed in parentheses.
Limit Section Use the not operator (~) to exclude queues from the all specification in the limit. This is useful if you have a large number of queues but only want to exclude a few queues from the limit definition. In vertical tabular format, multiple queue names must be enclosed in parentheses. In vertical tabular format, use empty parentheses () or a dash (-) to indicate each queue. Fields cannot be left blank. Default None.
lsb.resources Description A space-separated list of project names on which limits are enforced. Limits are enforced on all projects listed. To specify a per-project limit, use the PER_PROJECT keyword. Do not configure PROJECTS and PER_PROJECT limits in the same Limit section. In horizontal format, use only one PROJECTS line per Limit section. Use the keyword all to configure limits that apply to all projects in a cluster.
Limit Section The RESOURCE keyword is a synonym for the LICENSE keyword. You can use RESOURCE to configure software licenses. You cannot specify RESOURCE and LICENSE in the same Limit section. In horizontal format, use only one RESOURCE line per Limit section. In vertical tabular format, resource names must be enclosed in parentheses. In vertical tabular format, use empty parentheses () or a dash (-) to indicate all queues. Fields cannot be left blank.
lsb.resources In horizontal format, use only one SLOTS line per Limit section. In vertical format, use empty parentheses () or a dash (-) to indicate the default value (no limit). Fields cannot be left blank. Default No limit Example SLOTS=20 SLOTS_PER_PROCESSOR Syntax SLOTS_PER_PROCESSOR=number SLOTS_PER_PROCESSOR - | number Description Per processor job slot limit, based on the number of processors on each host affected by the limit.
Limit Section If only PER_HOST is configured in the Limit section, SLOTS_PER_PROCESSOR is the maximum amount of job slots per processor available to the listed hosts for any users, queues, or projects. If only PROJECTS and PER_HOST are configured in the Limit section, SLOTS_PER_PROCESSOR is the maximum amount of job slots per processor available to the listed projects for any users, queues, or hosts.
lsb.resources Default No limit Example SWP=60 TMP Syntax TMP=integer[%] TMP - | integer[%] Description Maximum amount of tmp space available to resource consumers. Specify a value in MB or a percentage (%) as an integer greater than or equal to 0. If you specify a percentage, you must also specify PER_HOST and list the hosts that the limit is to be enforced on.
Limit Section If a group contains a subgroup, the limit also applies to each member in the subgroup recursively. User names must be valid login names. User group names can be LSF user groups or UNIX and Windows user groups. To specify a per-user limit, use the PER_USER keyword. Do not configure USERS and PER_USER limits in the same Limit section. In horizontal format, use only one USERS line per Limit section. Use the keyword all to configure limits that apply to all users or user groups in a cluster.
lsb.resources HostExport Section Defines an export policy for a host or a group of related hosts. Defines how much of each host’s resources are exported, and how the resources are distributed among the consumers. Each export policy is defined in a separate HostExport section, so it is normal to have multiple HostExport sections in lsb.resources.
HostExport Section Description Required. Specifies how the exported resources are distributed among consumer clusters. The syntax for the distribution list is a series of share assignments. The syntax of each share assignment is the cluster name, a comma, and the number of shares, all enclosed in square brackets, as shown. Use a space to separate multiple share assignments. Enclose the full distribution list in a set of round brackets.
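A sketch of a HostExport section using a distribution list (the host and cluster names, and the share counts, are assumptions):

```
Begin HostExport
PER_HOST     = hostA hostB
SLOTS        = 4
DISTRIBUTION = ([cluster1, 1] [cluster2, 3])
End HostExport
```

Here the exported slots on hostA and hostB would be shared between the two consumer clusters in a 1:3 ratio.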
SharedResourceExport Section SharedResourceExport Section Optional. Requires HostExport section. Defines an export policy for a shared resource. Defines how much of the shared resource is exported, and the distribution among the consumers. The shared resource must be available on hosts defined in the HostExport sections.
lsb.resources
ResourceReservation Section
By default, only LSF administrators or root can add or delete advance reservations. The ResourceReservation section defines an advance reservation policy. It specifies:
◆ Users or user groups that can create reservations
◆ Hosts that can be used for the reservation
◆ Time window when reservations can be created
Each advance reservation policy is defined in a separate ResourceReservation section, so it is normal to have multiple ResourceReservation sections in lsb.resources.
ResourceReservation Section Use all@cluster_name to specify the group of all hosts borrowed from one remote cluster. You cannot specify a host group or partition that includes remote resources. With MultiCluster resource leasing model, the not operator (~) can be used to exclude local hosts or host groups. You cannot use the not operator (~) with remote hosts.
Examples
◆ HOSTS=hgroup1 ~hostA hostB hostC
Advance reservations can be created on hostB, hostC, and all hosts in hgroup1 except for hostA.
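A sketch of a complete ResourceReservation section (the policy name, users, hosts, and time window are all hypothetical):

```
Begin ResourceReservation
NAME        = dayPolicy
USERS       = user1 ugroup1
HOSTS       = hostA hostB
TIME_WINDOW = 8:00-18:00
End ResourceReservation
```

Under this policy, only user1 and members of ugroup1 could create advance reservations, only on hostA and hostB, and only between 8:00 a.m. and 6:00 p.m.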
lsb.resources You can specify multiple time windows, but they cannot overlap. For example: TIME_WINDOW=8:00-14:00 18:00-22:00 is correct, but TIME_WINDOW=8:00-14:00 11:00-15:00 is not valid. Example TIME_WINDOW=8:00-14:00 Users can create advance reservations with begin time (brsvadd -b), end time (brsvadd -e), or time window (brsvadd -t) on any day between 8:00 a.m. and 2:00 p.m. Default Undefined (any time) USERS Syntax USERS=[~]user_name | [~]user_group ...
ReservationUsage Section
To enable greater flexibility in how numeric resources are reserved by jobs, configure the ReservationUsage section in lsb.resources to specify, per resource, whether resources such as license tokens are reserved PER_JOB, PER_SLOT, or PER_HOST. For example:
Example ReservationUsage section
Begin ReservationUsage
RESOURCE     METHOD
licenseX     PER_JOB
licenseY     PER_HOST
licenseZ     PER_SLOT
End ReservationUsage
RESOURCE The name of the resource to be reserved.
lsb.resources Automatic Time-based Configuration Variable configuration is used to automatically change LSF configuration based on time windows. You define automatic configuration changes in lsb.resources by using if-else constructs and time expressions. After you change the files, reconfigure the cluster with the badmin reconfig command. The expressions are evaluated by LSF every 10 minutes based on mbatchd start time.
SEE ALSO SEE ALSO badmin(8), blimits(1), brsvadd(1), bsub(1), lsb.hosts(5), lsb.queues(5), lsb.
lsb.serviceclasses The lsb.serviceclasses file defines the service-level agreements (SLAs) in an LSF cluster as service classes, which define the properties of the SLA. This file is optional. You can configure as many service class sections as you need. Use bsla to display the properties of service classes configured in lsb.serviceclasses and dynamic information about the state of each configured service class. By default, lsb.serviceclasses is installed in LSB_CONFDIR/cluster_name/configdir. Changing lsb.serviceclasses configuration After making any changes to lsb.serviceclasses, run badmin reconfig to reconfigure mbatchd.
lsb.serviceclasses structure
Each service class definition begins with the line Begin ServiceClass and ends with the line End ServiceClass.
Syntax
Begin ServiceClass
NAME = string
PRIORITY = integer
GOALS = [throughput | velocity | deadline] [\
[throughput | velocity | deadline] ...]
CONTROL_ACTION = VIOLATION_PERIOD[minutes] CMD [action]
USER_GROUP = all | [user_name] [user_group] ...
lsb.serviceclasses Default None GOALS Syntax GOALS=[throughput | velocity | deadline ] [\ [throughput | velocity | deadline ] ...] Description Required. Defines the service-level goals for the service class. A service class can have more than one goal, each active at different times of the day and days of the week. Outside of the time window, the SLA is inactive and jobs are scheduled as if no service class is defined. LSF does not enforce any service-level goal for an inactive SLA.
lsb.serviceclasses structure You must specify at least the hour. Day of the week and minute are optional. Both the start time and end time values must use the same syntax. If you do not specify a minute, LSF assumes the first minute of the hour (:00). If you do not specify a day, LSF assumes every day of the week. If you do specify the day, you must also specify the minute. You can specify multiple time windows, but they cannot overlap.
lsb.serviceclasses USER_GROUP Syntax USER_GROUP=all | [user_name] [user_group] ... Description Optional. A space-separated list of user names or user groups who can submit jobs to the service class. Administrators, root, and all users or groups listed can use the service class. Use the reserved word all to specify all LSF users.
lsb.serviceclasses structure ◆ The service class Tofino defines two velocity goals in a 24 hour period. The first goal is to have a maximum of 10 concurrently running jobs during business hours (9:00 a.m. to 5:00 p.m.). The second goal is a maximum of 30 concurrently running jobs during off-hours (5:30 p.m. to 8:30 a.m.).
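The Tofino example described above might be expressed as the following sketch (the PRIORITY value and DESCRIPTION text are assumptions):

```
Begin ServiceClass
NAME        = Tofino
PRIORITY    = 20
GOALS       = [VELOCITY 10 timeWindow (9:00-17:00)] \
              [VELOCITY 30 timeWindow (17:30-8:30)]
DESCRIPTION = "day and night velocity goals"
End ServiceClass
```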
lsb.serviceclasses SEE ALSO bacct(1), bhist(1), bjobs(1), bkill(1), bmod(1), bsla(1), bsub(1), lsb.
lsb.users The lsb.users file is used to configure user groups, hierarchical fairshare for users and user groups, and job slot limits for users and user groups. It is also used to configure account mappings in a MultiCluster environment. This file is optional. The lsb.users file is stored in the directory LSB_CONFDIR/cluster_name/configdir, where LSB_CONFDIR is defined in lsf.conf. Changing lsb.users configuration After making any changes to lsb.users, run badmin reconfig to reconfigure mbatchd.
UserGroup Section UserGroup Section Optional. Defines user groups. The name of the user group can be used in other user group and queue definitions, as well as on the command line. Specifying the name of a user group has exactly the same effect as listing the names of all users in the group. The total number of user groups cannot be more than MAX_GROUPS in lsbatch.h. Structure The first line consists of two mandatory keywords, GROUP_NAME and GROUP_MEMBER. The USER_SHARES keyword is optional.
lsb.users
User group names can be LSF user groups defined previously in this section, or UNIX and Windows user groups.
◆ all
The reserved name all specifies all users in the cluster.
◆ !
The exclamation mark ! specifies that the group membership should be retrieved via egroup.
USER_SHARES Optional. Enables hierarchical fairshare and defines a share tree for users and user groups.
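A sketch of a UserGroup section with shares (the group and user names, and the share values, are hypothetical):

```
Begin UserGroup
GROUP_NAME   GROUP_MEMBER            USER_SHARES
groupA       (user1 user2 user3)     ()
groupB       (groupA user4)          ([user4, 4] [default, 1])
End UserGroup
```

In groupB, user4 would receive 4 shares and every other member 1 share by default.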
User Section User Section Optional. If this section is not defined, all users and user groups can run an unlimited number of jobs in the cluster. This section defines the maximum number of jobs a user or user group can run concurrently in the cluster. This is to avoid situations in which a user occupies all or most of the system resources while other users’ jobs are waiting. Structure Three fields are mandatory: USER_NAME, MAX_JOBS, JL/P. MAX_PEND_JOBS is optional.
lsb.users Total number of job slots that each user or user group can use per processor. This job slot limit is configured per processor so that multiprocessor hosts will automatically run more jobs. This number can be a fraction such as 0.5, so that it can also serve as a per-host limit. This number is rounded up to the nearest integer equal to or greater than the total job slot limits for a host. For example, if JL/P is 0.
UserMap Section UserMap Section Optional. Used only in a MultiCluster environment. Defines system-level account mapping for users and user groups. To support the execution of batch jobs across non-uniform user name spaces between clusters, LSF allows user account mapping. For a job submitted by one user account in one cluster to run under a different user account in a remote cluster, both the local and remote clusters must have the account mapping properly configured.
lsb.users
◆ The export keyword configures local users/groups to run jobs as remote users/groups.
◆ The import keyword configures remote users/groups to run jobs as local users/groups.
Both directions must be configured for a mapping to work. The mapping must be configured in both the local and remote clusters.
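A sketch of a UserMap section (the user and cluster names are assumptions; the matching import/export lines must also exist in the remote cluster's lsb.users):

```
Begin UserMap
LOCAL    REMOTE            DIRECTION
user1    user2@cluster2    export
user3    user6@cluster2    import
End UserMap
```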
Automatic Time-based Configuration
Variable configuration is used to automatically change LSF configuration based on time windows. You define automatic configuration changes in lsb.users by using if-else constructs and time expressions. After you change the files, reconfigure the cluster with the badmin reconfig command. The expressions are evaluated by LSF every 10 minutes based on mbatchd start time.
lsb.users SEE ALSO lsf.cluster(5), lsf.conf(5), lsb.params(5), lsb.hosts(5), lsb.
lsf.acct The lsf.acct file is the LSF task log file. The LSF Remote Execution Server, RES (see res(8)), generates a record for each task completion or failure. If RES task logging is turned on (see lsadmin(8)), it appends the record to the task log file lsf.acct.
lsf.acct Structure lsf.acct Structure The task log file is an ASCII file with one task record per line. The fields of each record are separated by blanks. The location of the file is determined by the LSF_RES_ACCTDIR variable defined in lsf.conf. If this variable is not defined, or the RES cannot access the log directory, the log file is created in /tmp instead.
SEE ALSO SEE ALSO Related Topics: lsadmin(8), res(8), lsf.conf(5), getrusage(2) Files: $LSF_RES_ACCTDIR/lsf.acct.
lsf.cluster
About lsf.cluster
This is the cluster configuration file. There is one for each cluster, called lsf.cluster.cluster_name. The cluster_name suffix is the name of the cluster defined in the Cluster section of lsf.shared. All LSF hosts are listed in this file, along with the list of LSF administrators and the installed LSF features. The lsf.cluster.cluster_name file contains two types of configuration information:
◆ Cluster definition information—affects all LSF applications.
lsf.cluster Parameters Section (Optional) This section contains miscellaneous parameters for the LIM. ADJUST_DURATION Syntax ADJUST_DURATION=integer Description Integer reflecting a multiple of EXINTERVAL that controls the time period during which load adjustment is in effect. The lsplace(1) and lsloadadj(1) commands artificially raise the load on a selected host. This increase in load decays linearly to 0 over time.
Parameters Section When the master LIM starts, up to number_of_floating_client_licenses will be checked out for use as floating client licenses. If fewer licenses are available than specified by number_of_floating_client_licenses, only the available licenses will be checked out and used. If FLOAT_CLIENTS is not specified in lsf.cluster.cluster_name or there is an error in either license.dat or in lsf.cluster.cluster_name , the floating LSF client license feature is disabled.
lsf.cluster Notes After you configure FLOAT_CLIENTS_ADDR_RANGE, check the lim.log.host_name file to make sure this parameter is correctly set. If this parameter is not set or is wrong, this will be indicated in the log file. Examples ◆ FLOAT_CLIENTS_ADDR_RANGE=100 All client hosts with a domain address starting with 100 will be allowed access. ◆ FLOAT_CLIENTS_ADDR_RANGE=100-110.34.1-10.
Parameters Section Default 5 LSF_ELIM_BLOCKTIME Syntax LSF_ELIM_BLOCKTIME=seconds Description UNIX only Maximum amount of time LIM waits for a load update string from the ELIM or MELIM if it is not immediately available. Use this parameter to add fault-tolerance to LIM when using ELIMs. If there is an error in the ELIM or some situation arises that the ELIM cannot send the entire load update string to the LIM, LIM will not wait indefinitely for load information from ELIM.
lsf.cluster For example, LIM is expecting 3 name-value-pairs, such as: 3 tmp2 47.5 nio 344.0 licenses 5 However, LIM only receives the following from ELIM: 3 tmp2 47.5 LIM waits 2 seconds after the last value is received and if no more information is received, LIM restarts the ELIM. If LSF_ELIM_BLOCKTIME is defined, the LIM waits for the specified amount of time before restarting the ELIM instead of the 2 seconds.
Parameters Section Specify an IP address or range of addresses, in dotted quad notation (nnn.nnn.nnn.nnn). Multiple ranges can be defined, separated by spaces. If there is an error in the configuration of this variable (for example, an address range is not in the correct format), no host will be allowed to join the cluster dynamically and an error message will be logged in the LIM log. Address ranges are validated at configuration time, so they must conform to the required format.
lsf.cluster Default *.*.*.* No security is enabled. Any host in any domain can join the LSF cluster dynamically if you enabled dynamic host configuration. MASTER_INACTIVITY_LIMIT Syntax MASTER_INACTIVITY_LIMIT=integer Description An integer reflecting a multiple of EXINTERVAL. A slave will attempt to become master if it does not hear from the previous master after (HOST_INACTIVITY_LIMIT +host_number*MASTER_INACTIVITY_LIMIT)*EXINTERVAL seconds, where host_number is the position of the host in lsf.
Parameters Section If the master does not hear from a slave for HOST_INACTIVITY_LIMIT exchange intervals, it will actively poll the slave for RETRY_LIMIT exchange intervals before it will declare the slave as unavailable. If a slave does not hear from the master for HOST_INACTIVITY_LIMIT exchange intervals, it will actively poll the master for RETRY_LIMIT intervals before assuming that the master is down.
lsf.cluster ClusterAdmins Section (Optional) The ClusterAdmins section defines the LSF administrators for the cluster. The only keyword is ADMINISTRATORS. If the ClusterAdmins section is not present, the default LSF administrator is root. Using root as the primary LSF administrator is not recommended. ADMINISTRATORS Syntax ADMINISTRATORS=administrator_name ... Description Specify UNIX user names. You can also specify UNIX user group names, Windows user names, and Windows user group names.
ClusterAdmins Section Begin ClusterAdmins ADMINISTRATORS = user2 user7 End ClusterAdmins Default lsfadmin 490 Platform LSF Reference
lsf.cluster Host Section The Host section is the last section in lsf.cluster.cluster_name and is the only required section. It lists all the hosts in the cluster and gives configuration information for each host. The order in which the hosts are listed in this section is important, because the first host listed becomes the LSF master host. Since the master LIM makes all placement decisions for the cluster, it should be on a fast machine.
Host Section model Description Host model. The name must be defined in the HostModel section of lsf.shared. This determines the CPU speed scaling factor applied in load and placement calculations. Optionally, the ! keyword for the model or type column indicates that the host model or type is to be automatically detected by the LIM running on the host. nd Description Number of local disks. This corresponds to the ndisks static resource.
lsf.cluster The range is from -20 to 20. REXPRI corresponds to the BSD-style nice value used for remote jobs. For hosts with System V-style nice values with the range 0 - 39, a REXPRI of -20 corresponds to a nice value of 0, and +20 corresponds to 39. Higher values of REXPRI correspond to lower execution priority; -20 gives the highest priority, 0 is the default priority for login sessions, and +20 is the lowest priority. Default 0 RUNWINDOW Description Dispatch window for interactive tasks.
Host Section Threshold fields The LIM uses these thresholds in determining whether to place remote jobs on a host. If one or more LSF load indices exceeds the corresponding threshold (too many users, not enough swap space, etc.), then the host is regarded as busy, and LIM will not recommend jobs to that host. The CPU run queue length threshold values (r15s, r1m, and r15m) are taken as effective queue lengths as reported by lsload -E.
lsf.cluster ResourceMap Section The ResourceMap section defines shared resources in your cluster. This section specifies the mapping between shared resources and their sharing hosts. When you define resources in the Resources section of lsf.shared, there is no distinction between a shared and non-shared resource. By default, all resources are not shared and are local to each host.
ResourceMap Section ◆ others —Indicates that the rest of the server hosts not explicitly listed in the LOCATION field comprise one instance of the resource For example: 2@[host1] 4@[others] indicates that there are 2 units of the resource on host1 and 4 units of the resource shared by all other hosts. ◆ default —Indicates an instance of a resource on each host in the cluster This specifies a special case where the resource is in effect not shared and is local to every host. default means one instance on each host.
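Combining the LOCATION keywords above, a ResourceMap section might be sketched like this (the resource names and host are illustrative assumptions, not taken from this manual):

```
Begin ResourceMap
RESOURCENAME   LOCATION
verilog        (2@[host1] 4@[others])
scratch        ([default])
End ResourceMap
```

With this mapping, verilog has 2 units on host1 and 4 units shared by all remaining server hosts, while scratch is in effect a local resource with one instance on every host.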
lsf.cluster RemoteClusters Section Optional. This section is used only in a MultiCluster environment. By default, the local cluster can obtain information about all other clusters specified in lsf.shared. The RemoteClusters section limits the clusters that the local cluster can obtain information about. The RemoteClusters section is required if you want to configure cluster equivalency, cache interval, daemon authentication across clusters, or if you want to run parallel jobs across clusters.
RemoteClusters Section RECV_FROM Description Specifies whether the local cluster accepts parallel jobs that originate in a remote cluster RECV_FROM does not affect regular or interactive batch jobs. Specify ‘Y’ if you want to run parallel jobs across clusters. Otherwise, specify ‘N’. Default Y AUTH Description Defines the preferred authentication method for LSF daemons communicating across clusters. Specify the same method name that is used to identify the corresponding eauth program (eauth.
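A sketch of a RemoteClusters section exercising the keywords this section describes (the cluster names are hypothetical, and EQUIV and CACHE_INTERVAL stand in for the cluster equivalency and cache interval settings mentioned above):

```
Begin RemoteClusters
CLUSTERNAME   EQUIV   CACHE_INTERVAL   RECV_FROM   AUTH
cluster2      Y       60               Y           -
cluster3      N       60               N           -
End RemoteClusters
```

With RECV_FROM=N for cluster3, parallel jobs originating there are not accepted; regular and interactive batch jobs are unaffected.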
lsf.cluster_name.license.acct This is the license accounting file. There is one for each cluster, called lsf.cluster_name.license.acct. The cluster_name variable is the name of the cluster defined in the Cluster section of lsf.shared. The lsf.cluster_name.license.acct file contains the following types of configuration information:
◆ LSF license information
◆ MultiCluster license information
◆ Dual-core CPU license information
Contents ◆ “lsf.cluster_name.license.
lsf.cluster_name.license.acct Structure lsf.cluster_name.license.acct Structure The license audit log file is an ASCII file with one record per line. The fields of a record are separated by blanks. File properties Location The default location of this file is defined by LSF_LOGDIR in lsf.conf, but you can override this by defining LSF_LICENSE_ACCT_PATH in lsf.conf. Owner The primary LSF admin is the owner of this file.
lsf.cluster_name.license.acct LSF_DUALCORE mc_peak mc_max_avail Where ❖ ❖ mc_peak is the peak usage value (in number of CPUs) of the LSF dual-core CPU license mc_max_avail is the maximum availability and usage (in number of CPUs) of the LSF dual-core CPU license. This is determined by the license that you purchased. status (%s) The results of the license usage check.
lsf.conf Contents ◆ “About lsf.conf” ◆ “Parameters”
About lsf.conf About lsf.conf The lsf.conf file controls the operation of LSF. The lsf.conf file is created during installation by the LSF setup program, and records all the settings chosen when LSF was installed. The lsf.conf file dictates the location of the specific configuration files and operation of individual servers and applications. The lsf.conf file is used by LSF and applications built on top of it. For example, information in lsf.
Parameters ◆ LSB_MIG2PEND ◆ LSB_MOD_ALL_JOBS ◆ LSB_NCPU_ENFORCE ◆ LSB_NQS_PORT ◆ LSB_PSET_BIND_DEFAULT ◆ LSB_QUERY_PORT ◆ LSB_REQUEUE_TO_BOTTOM ◆ LSB_RLA_HOST_LIST ◆ LSB_RLA_PORT ◆ LSB_RLA_UPDATE ◆ LSB_RLA_WORKDIR ◆ LSB_RMSACCT_DELAY ◆ LSB_RMS_MAXNUMNODES ◆ LSB_RMS_MAXNUMRAILS ◆ LSB_RMS_MAXPTILE ◆ LSB_SLURM_BESTFIT ◆ LSB_SBD_PORT ◆ LSB_SET_TMPDIR ◆ LSB_SHAREDIR ◆ LSB_SHORT_HOSTLIST ◆ LSB_SIGSTOP ◆ LSB_SUB_COMMANDNAME ◆ LSB_STDOUT_DIRECT ◆ LSB_
Parameters ◆ LSF_MANDIR ◆ LSF_MASTER_LIST ◆ LSF_MC_NON_PRIVILEGED_PORTS ◆ LSF_MISC ◆ LSF_NON_PRIVILEGED_PORTS ◆ LSF_NIOS_DEBUG ◆ LSF_NIOS_JOBSTATUS_INTERVAL ◆ LSF_NIOS_RES_HEARTBEAT ◆ LSF_PAM_HOSTLIST_USE ◆ LSF_PAM_PLUGINDIR ◆ LSF_PAM_USE_ASH ◆ LSF_POE_TIMEOUT_BIND ◆ LSF_POE_TIMEOUT_SELECT ◆ LSF_PIM_INFODIR ◆ LSF_PIM_SLEEPTIME ◆ LSF_PIM_SLEEPTIME_UPDATE ◆ LSF_RES_ACCT ◆ LSF_RES_ACCTDIR ◆ LSF_RES_ACTIVE_TIME ◆ LSF_RES_CONNECT_RETRY ◆ LSF_
lsf.conf ◆ LSF_TOPD_WORKDIR ◆ LSF_ULDB_DOMAIN ◆ LSF_USE_HOSTEQUIV ◆ LSF_USER_DOMAIN ◆ LSF_VPLUGIN ◆ MC_PLUGIN_REMOTE_RESOURCE ◆ XLSF_APPDIR ◆ XLSF_UIDDIR LSB_API_CONNTIMEOUT Syntax LSB_API_CONNTIMEOUT=time_seconds Description The timeout in seconds when connecting to LSF. Valid Values Any positive integer or zero Default 10 See also LSB_API_RECVTIMEOUT LSB_API_RECVTIMEOUT Syntax LSB_API_RECVTIMEOUT=time_seconds Description Timeout in seconds when waiting for a reply from LSF.
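As a sketch, both API timeouts can be raised together in lsf.conf when client commands time out against a busy cluster (the values shown are illustrative, not defaults):

```
# lsf.conf
LSB_API_CONNTIMEOUT=20    # seconds to wait when connecting to LSF (default 10)
LSB_API_RECVTIMEOUT=30    # seconds to wait for a reply from LSF
```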
Parameters LSB_CHUNK_RUSAGE Syntax LSB_CHUNK_RUSAGE=y Description Applies only to chunk jobs. When set, sbatchd contacts PIM to retrieve resource usage information to enforce resource usage limits on chunk jobs. By default, resource usage limits are not enforced for chunk jobs because chunk jobs are typically too short to allow LSF to collect resource usage. If LSB_CHUNK_RUSAGE=Y is defined, limits may not be enforced for chunk jobs that take less than a minute to run. Default Undefined.
lsf.conf ◆ LOG_DEBUG3 Default LOG_WARNING See also LSB_CMD_LOGDIR, LSB_DEBUG, LSB_DEBUG_CMD, LSB_TIME_CMD, LSF_CMD_LOGDIR, LSF_CMD_LOG_MASK, LSF_LOG_MASK, LSF_LOGDIR, LSF_TIME_CMD LSB_CMD_LOGDIR Syntax LSB_CMD_LOGDIR=path Description Specifies the path to the LSF command log files.
Parameters If undefined, the system uses /bin. LSB_DEBUG Syntax LSB_DEBUG=1 | 2 Description Sets the LSF batch system to debug mode. If defined, LSF runs in single-user mode: ◆ No security checking is performed ◆ Daemons do not run as root When LSB_DEBUG is defined, LSF does not look in the system services database for port numbers. Instead, it uses the port numbers defined by the parameters LSB_MBD_PORT/LSB_SBD_PORT in lsf.conf.
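Because the services database is bypassed in debug mode, the ports must be supplied explicitly; a minimal sketch (the port numbers are illustrative assumptions):

```
# lsf.conf -- single-user debug mode
LSB_DEBUG=1          # valid values are 1 or 2
LSB_MBD_PORT=40000   # mbatchd port, normally looked up in the services database
LSB_SBD_PORT=40001   # sbatchd port
```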
lsf.
Parameters To specify multiple log classes, use a space-separated list enclosed in quotation marks. For example: LSB_DEBUG_MBD="LC_TRACE LC_EXEC" You need to restart the daemons after setting LSB_DEBUG_MBD for your changes to take effect. If you use the command badmin mbddebug to temporarily change this parameter without changing lsf.conf, you will not need to restart the daemons.
lsf.conf LSB_DEBUG_SBD sets the log class and is used in combination with LSF_LOG_MASK, which sets the log level. For example: LSF_LOG_MASK=LOG_DEBUG LSB_DEBUG_SBD="LC_TRACE LC_EXEC" To specify multiple log classes, use a space-separated list enclosed in quotation marks. For example: LSB_DEBUG_SBD="LC_TRACE LC_EXEC" You need to restart the daemons after setting LSB_DEBUG_SBD for your changes to take effect. If you use the command badmin sbddebug to temporarily change this parameter without changing lsf.
Parameters LSB_DEFAULT_PJLTYPE Syntax LSB_DEFAULT_PJLTYPE="pjl_type [pjl_type] ..." Description Contains the list of all PJL types as Boolean resources that are intended to be autodetected. The order of MPI types in the list defines the preference of one type over another within the same host group, but not the host group order itself. For heterogeneous HPC environments, bsub -a auto recognizes the actual PJL type at execution time. esub.auto checks the value of LSB_DEFAULT_PJLTYPE in lsf.conf. esub.
lsf.conf Default Undefined (no default PJL type) LSB_DISABLE_RERUN_POST_EXEC Syntax LSB_DISABLE_RERUN_POST_EXEC=y | Y Description If set, and the job is rerunnable, the POST_EXEC configured in the queue is not executed if the job is rerun. Running of post-execution commands upon restart of a rerunnable job may not always be desirable.
Parameters When this parameter is undefined in lsf.conf or as an environment variable and no custom method is specified at job submission through bsub -k, LSF uses echkpnt.default and erestart.default to checkpoint and restart jobs. When this parameter is defined, LSF uses the custom checkpoint and restart methods specified. Limitations The method name and directory (LSB_ECHKPNT_METHOD_DIR) combination must be unique in the cluster.
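Following the echkpnt.default/erestart.default naming convention described above, a custom method named myapp (a hypothetical name) would be configured like this, with executables echkpnt.myapp and erestart.myapp placed in the method directory (the path is also illustrative):

```
# lsf.conf -- custom checkpoint/restart method
LSB_ECHKPNT_METHOD=myapp
LSB_ECHKPNT_METHOD_DIR=/usr/share/lsf/checkpoint
```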
lsf.conf ◆ poe —for POE job submission; esub calls esub.poe ◆ ls_dyna —for LS-Dyna job submission; esub calls esub.ls_dyna ◆ fluent —for FLUENT job submission; esub calls esub.fluent ◆ afs or dce —for AFS or DCE security; esub calls esub.afs or esub.dce ◆ lammpi or mpich_gm —for LAM/MPI or MPI-GM job submission; esub calls esub.lammpi or esub.
Parameters Default Undefined. If LSB_INTERACT_MSG_INTVAL is set to an incorrect value, the default update interval is 60 seconds. See also LSB_INTERACT_MSG_ENH LSB_IRIX_NODESIZE (OBSOLETE) LSB_IRIX_NODESIZE is obsolete. It is ignored if set. LSB_KEEP_SYSDEF_RLIMIT Syntax LSB_KEEP_SYSDEF_RLIMIT=y | n Description If resource limits are configured for a user in the SGI IRIX User Limits Database (ULDB) domain specified in LSF_ULDB_DOMAIN, and there is no domain default, the system default is honored.
lsf.conf ◆ OS-enforced per process limit—When one process in the job exceeds the CPU limit, the limit is enforced by the operating system. For more details, refer to your operating system documentation for setrlimit(). Default Undefined Notes To make LSB_JOB_CPULIMIT take effect, use the command badmin hrestart all to restart all sbatchds in the cluster. Changing the default Terminate job control action—You can define a different terminate action in lsb.
Parameters ◆ LSF-enforced per-job limit—When the total memory allocated to all processes in the job exceeds the memory limit, LSF sends the following signals to kill the job: SIGINT, SIGTERM, then SIGKILL. The interval between signals is 10 seconds by default. On UNIX, the time interval between SIGINT, SIGTERM, and SIGKILL can be configured with the parameter JOB_TERMINATE_INTERVAL in lsb.params.
lsf.conf LSB_JOB_MEMLIMIT=n or undefined), the job will be allowed to run without limits because the per-process limit was previously disabled. See also LSB_MEMLIMIT_ENFORCE, LSB_MOD_ALL_JOBS, lsb.queues(5), bsub(1), JOB_TERMINATE_INTERVAL in “lsb.params” LSB_LOCALDIR Syntax LSB_LOCALDIR=path Description Enables duplicate logging. Specify the path to a local directory that exists only on the first LSF master host (the first host configured in lsf.cluster.cluster_name).
Parameters LSB_MAILPROG=/serverA/tools/lsf/bin/unixhost.exe Default /usr/lib/sendmail (UNIX) blank (Windows) See also LSB_MAILSERVER, LSB_MAILTO LSB_MAILSERVER Syntax LSB_MAILSERVER=mail_protocol:mail_server Description Part of mail configuration on Windows. This parameter only applies when lsmail is used as the mail program (LSB_MAILPROG=lsmail.exe). Otherwise, it is ignored. Both mail_protocol and mail_server must be indicated.
lsf.conf See also LSB_MAILPROG, LSB_MAILTO LSB_MAILTO Syntax LSB_MAILTO=mail_account Description LSF sends electronic mail to users when their jobs complete or have errors, and to the LSF administrator in the case of critical errors in the LSF system. The default is to send mail to the user who submitted the job, on the host on which the daemon is running; this assumes that your electronic mail system forwards messages to a central mailbox.
Parameters See also MAX_SBD_CONNS in “lsb.params” LSB_MAX_PROBE_SBD Syntax LSB_MAX_PROBE_SBD=integer Description Specifies the maximum number of sbatchd instances that can be polled by mbatchd in the interval MBD_SLEEP_TIME/10. Use this parameter in large clusters to reduce the time it takes for mbatchd to probe all sbatchds. The value of LSB_MAX_PROBE_SBD cannot be greater than the number of hosts in the cluster. If it is, mbatchd adjusts the value of LSB_MAX_PROBE_SBD to be the same as the number of hosts.
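For example, assuming MBD_SLEEP_TIME in lsb.params is at its usual 60-second default, mbatchd probes one batch of sbatchds every 6 seconds; capping the batch size might be sketched as (the value is illustrative):

```
# lsf.conf
LSB_MAX_PROBE_SBD=64   # poll at most 64 sbatchds per MBD_SLEEP_TIME/10 interval
```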
lsf.conf The submission cluster does not need to forward the job again. The execution cluster reports the job’s new pending status back to the submission cluster, and the job is dispatched to the same host to restart from the beginning. Default n LSB_MC_INITFAIL_MAIL Syntax LSB_MC_INITFAIL_MAIL=y | n Description MultiCluster job forwarding model only. Specify y to make LSF email the job owner when a job is suspended after reaching the retry threshold.
Parameters If you do not want migrating jobs to be run or restarted immediately, set LSB_MIG2PEND so that migrating jobs are considered as pending jobs and inserted in the pending jobs queue. If you want migrating jobs to be considered as pending jobs but you want them to be placed at the bottom of the queue without considering submission time, define both LSB_MIG2PEND and LSB_REQUEUE_TO_BOTTOM. Also considers job priority when requeuing jobs. Does not work with MultiCluster.
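Taken together, the two behaviors described above can be sketched as follows (both parameters are shown set to 1, an assumed value; check each parameter's Syntax entry for the accepted forms):

```
# lsf.conf -- treat migrating jobs as pending, requeued to the bottom
LSB_MIG2PEND=1
LSB_REQUEUE_TO_BOTTOM=1
```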
lsf.conf Example LSB_NQS_PORT=607 Default Undefined LSB_PSET_BIND_DEFAULT Syntax LSB_PSET_BIND_DEFAULT=y | Y Description If set, Platform LSF HPC binds a job that is not explicitly associated with an HP-UX pset to the default pset 0. If LSB_PSET_BIND_DEFAULT is not set, LSF HPC must still attach the job to a pset, and so binds the job to the same pset used by the LSF HPC daemons.
Parameters lsb.params has passed (see “MBD_REFRESH_TIME” on page 388 for more details). When any of these happens, the parent mbatchd sends a message to the child mbatchd to exit. Operating system support See the Online Support area of the Platform Computing Web site at www.platform.com for the latest information about operating systems that support multithreaded mbatchd. Default Undefined See also MBD_REFRESH_TIME in “lsb.params”.
lsf.conf Default 600 seconds LSB_RLA_WORKDIR Syntax LSB_RLA_WORKDIR=directory Description Directory to store the LSF HPC topology adapter (RLA) status file. Allows RLA to recover its original state when it restarts. When RLA first starts, it creates the directory defined by LSB_RLA_WORKDIR if it does not exist, then creates subdirectories for each host. You should avoid using /tmp or any other directory that is automatically cleaned up by the system.
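Given the caution about /tmp above, a sketch pointing RLA state at a persistent location (the path is an illustrative assumption):

```
# lsf.conf
LSB_RLA_WORKDIR=/usr/share/lsf/work/rla   # persistent; not cleaned up by the system
```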
Parameters ◆ -extsched option of bsub ◆ DEFAULT_EXTSCHED and MANDATORY_EXTSCHED in lsb.queues Default 32 LSB_SLURM_BESTFIT Syntax LSB_SLURM_BESTFIT=y | Y Description Enables best-fit node allocation for HP XC SLURM jobs. By default, LSF applies a first-fit allocation policy to select from the nodes available for the job. The allocations are made left to right for all parallel jobs, and right to left for all serial jobs (all other job requirements being equal).
lsf.conf Description Displays an abbreviated list of hosts in bjobs and bhist for a parallel job where multiple processes of a job are running on a host. Multiple processes are displayed in the following format: processes*hostA For example, if a parallel job is running 5 processes on hostA, the information is displayed in the following manner: 5*hostA Setting this parameter may improve mbatchd restart performance and accelerate event replay.
Parameters
#!/bin/sh
. $LSB_SUB_PARM_FILE
exec 1>&2
if [ "$LSB_SUB_COMMAND_LINE" = "netscape" ]; then
    echo "netscape is not allowed to run in batch mode"
    exit $LSB_SUB_ABORT_VALUE
fi
LSB_SUB_COMMAND_LINE is defined in $LSB_SUB_PARM_FILE as: LSB_SUB_COMMAND_LINE=netscape A job submitted with: bsub netscape ...
lsf.conf See also LSB_TIME_CMD, LSB_TIME_SBD, LSF_TIME_LIM, LSF_TIME_RES LSB_TIME_RESERVE_NUMJOBS Syntax LSB_TIME_RESERVE_NUMJOBS=maximum_reservation_jobs Description Enables time-based slot reservation. The value must be a positive integer. LSB_TIME_RESERVE_NUMJOBS controls the maximum number of jobs using time-based slot reservation. For example, if LSB_TIME_RESERVE_NUMJOBS=4, only the top 4 jobs will get their future allocation information.
Parameters Because interactive batch jobs submitted with bsub -I are not associated with a pseudo-terminal, utmp file registration is not supported for these jobs. Default Undefined LSF_AFS_CELLNAME Syntax LSF_AFS_CELLNAME=AFS_cell_name Description Must be set to the AFS cell name if the AFS file system is in use. Example: LSF_AFS_CELLNAME=cern.
lsf.conf LSF_AUTH Syntax LSF_AUTH=eauth | ident Description Optional. Determines the type of authentication used by LSF. External user authentication is configured automatically during installation (LSF_AUTH=eauth). If LSF_AUTH is not defined, privileged ports (setuid) authentication is used. This is the mechanism most UNIX remote utilities use. External authentication is the only way to provide security for clusters that contain Windows hosts.
Parameters Default LSF_MACHDEP/bin LSF_CMD_LOGDIR Syntax LSF_CMD_LOGDIR=path Description The path to the log files used for debugging LSF commands. This parameter can also be set from the command line. Default /tmp See also LSB_CMD_LOG_MASK, LSB_CMD_LOGDIR, LSB_DEBUG, LSB_DEBUG_CMD, LSB_TIME_CMD, LSF_CMD_LOG_MASK, LSF_LOG_MASK, LSF_LOGDIR, LSF_TIME_CMD LSF_CMD_LOG_MASK Syntax LSF_CMD_LOG_MASK=log_level Description Specifies the logging level of error messages from LSF commands.
lsf.conf See also LSB_CMD_LOG_MASK, LSB_CMD_LOGDIR, LSB_DEBUG, LSB_DEBUG_CMD, LSB_TIME_CMD, LSF_CMD_LOGDIR, LSF_LOG_MASK, LSF_LOGDIR, LSF_TIME_CMD LSF_CONF_RETRY_INT Syntax LSF_CONF_RETRY_INT=time_seconds Description The number of seconds to wait between unsuccessful attempts at opening a configuration file (only valid for LIM). This allows LIM to tolerate temporary access failures.
Parameters Specifies the log class filtering that will be applied to LIM. Only messages belonging to the specified log class are recorded. LSF_DEBUG_LIM sets the log class and is used in combination with LSF_LOG_MASK, which sets the log level. For example: LSF_LOG_MASK=LOG_DEBUG LSF_DEBUG_LIM=LC_TRACE You need to restart the daemons after setting LSF_DEBUG_LIM for your changes to take effect. If you use the command lsadmin limdebug to temporarily change this parameter without changing lsf.
lsf.conf LSF_LOG_MASK=LOG_DEBUG LSF_DEBUG_RES=LC_TRACE To specify multiple log classes, use a space-separated list enclosed in quotation marks. For example: LSF_DEBUG_RES="LC_TRACE LC_EXEC" You need to restart the daemons after setting LSF_DEBUG_RES for your changes to take effect. If you use the command lsadmin resdebug to temporarily change this parameter without changing lsf.conf, you will not need to restart the daemons.
Parameters Example
# clients managed by LSF
# Roma # Verona # Genova # Pisa # Venezia # Bologna
15/3 19:4:50  0 0 0 0 0 0
15/3 19:5:51  8 5 2 5 2 0
15/3 19:6:51  8 5 2 5 5 1
15/3 19:7:53  8 5 2 5 5 5
15/3 19:8:54  8 5 2 5 5 0
15/3 19:9:55  8 5 0 5 4 2
The queue names are in the header line of the file. The columns correspond to the allocations for each queue.
lsf.conf See the IRIX 6.5.9 resource administration documentation for information about CSA. Setting up IRIX CSA
1 Define the LSF_ENABLE_CSA parameter in lsf.conf:
...
LSF_ENABLE_CSA=Y
...
2 Set the following parameters in /etc/csa.conf to on:
❖ CSA_START
❖ WKMG_START
3 Run the csaswitch command to turn on the configuration changes in /etc/csa.conf. See the IRIX 6.5.9 resource administration documentation for information about the csaswitch command.
Parameters LSF_ENABLE_EXTSCHEDULER Syntax LSF_ENABLE_EXTSCHEDULER=y | Y Description If set, enables mbatchd external scheduling for LSF HPC. Default Undefined LSF_ENVDIR Syntax LSF_ENVDIR=dir Description Directory containing the lsf.conf file. By default, lsf.conf is installed by creating a shared copy in LSF_CONFDIR and adding a symbolic link from /etc/lsf.conf to the shared copy. If LSF_ENVDIR is set, the symbolic link is installed in LSF_ENVDIR/lsf.conf. The lsf.
lsf.conf ◆ ◆ ◆ CUMULATIVE_RUSAGE—when a parallel job script runs multiple pam commands, resource usage is collected for jobs in the job script, rather than being overwritten when each pam command is executed. DISP_RES_USAGE_LIMITS— bjobs displays resource usage limits configured in the queue as well as job-level limits. LSB_HCLOSE_BY_RES— If res is down, host is closed with a message Host is closed because RES is not available. The status of the closed host is closed_Adm.
Parameters ◆ SHORT_PIDLIST—shortens the output from bjobs to omit all but the first process ID (PID) for a job. bjobs displays only the first ID and a count of the process group IDs (PGIDs) and process IDs for the job. Without SHORT_PIDLIST, bjobs -l displays all the PGIDs and PIDs for the job. With SHORT_PIDLIST set, bjobs -l displays a count of the PGIDS and PIDs. ◆ TASK_MEMLIMIT—enables enforcement of a memory limit (bsub -M, bmod -M, or MEMLIMIT in lsb.queues) for individual tasks in a parallel job.
lsf.conf "JOB_FINISH" "6.2" 1058990001 710 33054 33816578 64 1058989880 0 0 1058989891 "user1" "normal" "span[ptile=32]" "" "" "hostA" "/scratch/user1/work" "" "" "" "1058989880.
Parameters % bjobs -l Job <109>, User , Project , Status , Queue , Inte ractive mode, Command <./myjob.sh> Mon Jul 21 20:54:44: Submitted from host , CWD <$HOME/LSF/jobs; RUNLIMIT 10.0 min of hostA STACKLIMIT CORELIMIT MEMLIMIT 5256 K 10000 K 5000 K Mon Jul 21 20:54:51: Started on ; Mon Jul 21 20:55:03: Resource usage collected.
lsf.conf LSF_HPC_PJL_LOADENV_TIMEOUT Syntax LSF_HPC_PJL_LOADENV_TIMEOUT=seconds Description Timeout value in seconds for PJL to load or unload the environment. For example, set LSF_HPC_PJL_LOADENV_TIMEOUT to the number of seconds needed for IBM POE to load or unload adapter windows. At job startup, the PJL times out if the first task fails to register with PAM within the specified timeout value.
Parameters LSF_INTERACTIVE_STDERR Syntax LSF_INTERACTIVE_STDERR=y | n Description Separates stderr from stdout for interactive tasks and interactive batch jobs. This is useful to redirect output to a file with regular operators instead of the bsub -e err_file and -o out_file options. This parameter can also be enabled or disabled as an environment variable. WARNING If you enable this parameter globally in lsf.conf, check any custom scripts that manipulate stderr and stdout.
lsf.conf ◆ synchronized. This can be emphasized with parallel jobs. This situation is similar to that of rsh. NIOS standard and debug messages—NIOS standard messages, and debug messages (when LSF_NIOS_DEBUG=1 in lsf.conf or as an environment variable) are written to stderr. NIOS standard messages are in the format <>, which makes it easier to remove them if you wish. To redirect NIOS debug messages to a file, define LSF_CMD_LOGDIR in lsf.conf or as an environment variable.
Parameters Default N See Also LSF_LIC_SCHED_PREEMPT_SLOT_RELEASE, LSF_LIC_SCHED_PREEMPT_STOP LSF_LIC_SCHED_PREEMPT_SLOT_RELEASE Syntax LSF_LIC_SCHED_PREEMPT_SLOT_RELEASE=y | n Description Set this parameter to release the slot of a job that is suspended when its license is preempted by LSF License Scheduler. If you set LSF_LIC_SCHED_PREEMPT_SLOT_RELEASE, do not set LSF_LIC_SCHED_PREEMPT_REQUEUE. If both these parameters are set, LSF_LIC_SCHED_PREEMPT_SLOT_RELEASE is ignored.
lsf.conf LSF_LICENSE_FILE Syntax LSF_LICENSE_FILE="file_name ... | port_number@host_name" Description Specifies one or more demo or FLEXnet-based permanent license files used by LSF. The value for LSF_LICENSE_FILE can be either of the following: ◆ The full path name to the license file. UNIX example: LSF_LICENSE_FILE=/usr/share/lsf/cluster1/conf/license.dat Windows example: LSF_LICENSE_FILE= C:\licenses\license.dat or LSF_LICENSE_FILE=\\HostA\licenses\license.
Parameters Description Specifies how often notification email is sent to the primary cluster administrator about overuse of LSF Family product licenses and LSF License Scheduler tokens. Recommended value To avoid getting the same audit information more than once, set LSF_LICENSE_NOTIFICATION_INTERVAL to greater than 24 hours. Example Subject: LSF license overuse notification email LSF Administrator: Your cluster has experienced license overuse.
lsf.conf Description Configure LSF_LIM_IGNORE_CHECKSUM=Y to ignore warning messages logged to lim log files on non-master hosts. When LSF_MASTER_LIST is set, lsadmin reconfig only restarts master candidate hosts (for example, after adding or removing hosts from the cluster). This can cause superfluous warning messages like the following to be logged in the lim log files for non-master hosts, because lim on these hosts is not restarted after a configuration change: Aug 26 13:47:35 2005 9746 4 6.
Parameters Default Path to LSF_LIBDIR See also LSF_RES_SOL27_PLUGINDIR LSF_LOCAL_RESOURCES Syntax LSF_LOCAL_RESOURCES="resource ..." Description Defines instances of local resources residing on the slave host. ◆ For numeric resources, define name-value pairs: "[resourcemap value*resource_name]" ◆ For Boolean resources, the value is the resource name in the form: "[resource resource_name]" When the slave host calls the master host to add itself, it also reports its local resources.
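Combining the numeric and Boolean forms above for one slave host (the resource names and count are illustrative assumptions):

```
# lsf.conf on the slave host
LSF_LOCAL_RESOURCES="[resourcemap 2*gpus] [resource bigmem]"
```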
lsf.conf ◆ LOG_CRIT ◆ LOG_ERR ◆ LOG_WARNING ◆ LOG_NOTICE ◆ LOG_INFO ◆ LOG_DEBUG ◆ LOG_DEBUG1 ◆ LOG_DEBUG2 ◆ LOG_DEBUG3 The most important LSF log messages are at the LOG_ERR or LOG_WARNING level. Messages at the LOG_INFO and LOG_DEBUG level are only useful for debugging. Although message log level implements similar functionality to UNIX syslog, there is no dependency on UNIX syslog. It works even if messages are being logged to files instead of syslog.
Parameters ◆ lim.log.host_name ◆ res.log.host_name ◆ sbatchd.log.host_name ◆ mbatchd.log.host_name ◆ pim.log.host_name The log levels you can specify for this parameter, in order from highest to lowest, are: ◆ ◆ ◆ ◆ LOG_ERR LOG_WARNING LOG_INFO LOG_NONE (LSF does not log Windows events) Default LOG_ERR See also LSF_LOG_MASK LSF_LOGDIR Syntax LSF_LOGDIR=dir Description Defines the LSF system log file directory. Error messages from all servers are logged into files in this directory.
lsf.conf See also LSB_CMD_LOG_MASK, LSB_CMD_LOGDIR, LSB_DEBUG, LSB_DEBUG_CMD, LSB_TIME_CMD, LSF_CMD_LOGDIR, LSF_CMD_LOG_MASK, LSF_LOG_MASK, LSF_LOGDIR_USE_WIN_REG, LSF_TIME_CMD Files ◆ lim.log.host_name ◆ res.log.host_name ◆ sbatchd.log.host_name ◆ sbatchdc.log.host_name (Windows only) ◆ mbatchd.log.host_name ◆ eeventd.log.host_name ◆ pim.log.host_name LSF_LOGDIR_USE_WIN_REG Syntax LSF_LOGDIR_USE_WIN_REG=n | N Description Windows only.
Parameters See also LSF_INDEP LSF_MANDIR Syntax LSF_MANDIR=dir Description Directory under which all man pages are installed. The man pages are placed in the man1, man3, man5, and man8 subdirectories of the LSF_MANDIR directory. This is created by the LSF installation process, and you should not need to modify this parameter. Man pages are installed in a format suitable for BSD-style man commands. For most versions of UNIX, you should add the directory LSF_MANDIR to your MANPATH environment variable.
lsf.conf Specify Y to make LSF daemons use non-privileged ports for communication across clusters. Compatibility This disables privileged port daemon authentication, which is a security feature. If security is a concern, you should use eauth for LSF daemon authentication (see LSF_AUTH_DAEMONS in lsf.conf).
Parameters LSF_NIOS_JOBSTATUS_INTERVAL Syntax LSF_NIOS_JOBSTATUS_INTERVAL=time_minutes Description Applies only to interactive batch jobs. Time interval at which NIOS polls mbatchd to check if a job is still running. Used to retrieve a job’s exit status in the case of an abnormal exit of NIOS, due to a network failure for example. Use this parameter if you run interactive jobs and you have scripts that depend on an exit code being returned.
lsf.conf Notes The time you set this parameter to depends how long you want to allow NIOS to wait before exiting. Typically, it can be a number of hours or days. Too low a number may add load to the system. LSF_PAM_HOSTLIST_USE Syntax LSF_PAM_HOSTLIST_USE=unique Description Used to start applications that use both OpenMP and MPI. Valid values unique Default Undefined Notes At job submission, LSF reserves the correct number of processors and PAM will start only 1 process per host.
Parameters LSF_POE_TIMEOUT_SELECT Syntax LSF_POE_TIMEOUT_SELECT=seconds Description Specifies the time in seconds for the poe_w wrapper to wait for connections from the pmd_w wrapper. pmd_w is the wrapper for pmd (IBM PE Partition Manager Daemon). LSF_POE_TIMEOUT_SELECT can also be set as an environment variable for poe_w to read. Default 160 seconds LSF_PIM_INFODIR Syntax LSF_PIM_INFODIR=path Description The path to where PIM writes the pim.info.host_name file.
lsf.conf ◆ It may take longer to view resource usage with bjobs -l. Default Undefined LSF_RES_ACCT Syntax LSF_RES_ACCT=time_milliseconds | 0 Description If this parameter is defined, RES will log information for completed and failed tasks by default (see lsf.acct(5)). The value for LSF_RES_ACCT is specified in terms of consumed CPU time (milliseconds). Only tasks that have consumed more than the specified CPU time will be logged.
Parameters See Also LSF_NIOS_RES_HEARTBEAT LSF_RES_DEBUG Syntax LSF_RES_DEBUG=1 | 2 Description Sets RES to debug mode. If LSF_RES_DEBUG is defined, the Remote Execution Server (RES) will operate in single user mode. No security checking is performed, so RES should not run as root. RES will not look in the services database for the RES service port number. Instead, it uses port number 36002 unless LSF_RES_PORT has been defined. Specify 1 for this parameter unless you are testing RES.
lsf.conf LSF_RES_SOL27_PLUGINDIR Syntax LSF_RES_SOL27_PLUGINDIR=path Description The path to libresvcl.so. Used only with Solaris 2.7. If you want to link a 64-bit object with RES, set LSF_RES_SOL27_PLUGINDIR. Default Path to LSF_LIBDIR LSF_RES_TIMEOUT Syntax LSF_RES_TIMEOUT=time_seconds Description Timeout when communicating with RES. Default 15 LSF_ROOT_REX Syntax LSF_ROOT_REX=local Description UNIX only.
Parameters Default Undefined Example To use an ssh command before trying rsh for LSF commands, specify: LSF_RSH="ssh -o 'PasswordAuthentication no' -o 'StrictHostKeyChecking no'" ssh options such as PasswordAuthentication and StrictHostKeyChecking can also be configured in the global SSH_ETC/ssh_config file or $HOME/.ssh/config. See also ssh(1) ssh_config(5) LSF_SECUREDIR Syntax LSF_SECUREDIR=path Description Windows only; mandatory if using lsf.sudoers. Path to the directory that contains the file lsf.
lsf.conf Description Applies to lstcsh only. Specifies users who are allowed to use @ for host redirection. Users not specified with this parameter cannot use host redirection in lstcsh. If this parameter is undefined, all users are allowed to use @ for host redirection in lstcsh.
Parameters If you enable this parameter, you must enable it in the entire cluster, as it affects all communications within LSF. If it is used in a MultiCluster environment, it must be enabled in all clusters, or none. Ensure that all binaries and libraries are upgraded to LSF Version 6.2, including LSF_BINDIR, LSF_SERVERDIR and LSF_LIBDIR directories, if you enable this parameter.
lsf.conf See also LSB_TIME_MBD, LSB_TIME_SBD, LSB_TIME_CMD, LSF_TIME_LIM, LSF_TIME_RES LSF_TIME_LIM Syntax LSF_TIME_LIM=timing_level Description The timing level for checking how long LIM routines run. Time usage is logged in milliseconds; specify a positive integer. Default Undefined See also LSB_TIME_CMD, LSB_TIME_MBD, LSB_TIME_SBD, LSF_TIME_RES LSF_TIME_RES Syntax LSF_TIME_RES=timing_level Description The timing level for checking how long RES routines run.
Parameters LSF_TMPDIR=\\HostA\temp\lsf_tmp or LSF_TMPDIR=D:\temp\lsf_tmp Default By default, LSF_TMPDIR is not enabled. If LSF_TMPDIR is not specified either in the environment or in lsf.conf, this parameter is defined as follows: ◆ On UNIX: $TMPDIR or /tmp ◆ On Windows: %TMP%, %TEMP%, or %SystemRoot% LSF_TOPD_PORT Syntax LSF_TOPD_PORT=port_number Description UDP port used for communication between the LSF cpuset topology daemon (topd) and the cpuset ELIM. Used with SGI IRIX cpuset support.
lsf.conf Next, LSF resource usage limits are enforced for the IRIX job under which the LSF job is running. LSF limits override the corresponding IRIX job limits. The ULDB limits are used for any LSF limits that are not defined. If the job reaches the IRIX job limits, the action defined in the IRIX system is used. IRIX job limits in the ULDB apply only to batch jobs. See the IRIX 6.5.8 resource administration documentation for information about configuring ULDB domains in the jlimit.in file.
Parameters
jlimit_vmem_max=256M       # JLIMIT_VMEM
jlimit_data_cur=unlimited
jlimit_data_max=unlimited  # JLIMIT_DATA
jlimit_cpu_cur=80
jlimit_cpu_max=160         # JLIMIT_CPU
}
3 Configure the user limit directive for user1 in the jlimit.in file.
lsf.conf ◆ In a mixed cluster, this parameter defines a 2-way, 1:1 user map between UNIX user accounts and Windows user accounts belonging to the specified domain, as long as the accounts have the same user name. This means jobs submitted by the Windows user account can run on a UNIX host, and jobs submitted by the UNIX account can run on any Windows host that is available to the Windows user account. If this parameter is undefined, the default user mapping is not enabled.
Parameters XLSF_UIDDIR Syntax XLSF_UIDDIR=dir Description (UNIX only) Directory in which Motif User Interface Definition files are stored. These files are platform-specific.
lsf.licensescheduler The lsf.licensescheduler file contains Platform LSF License Scheduler configuration information. All sections except ProjectGroup are required. The command blinfo displays configuration information from this file. Changing lsf.licensescheduler configuration After making any changes to lsf.licensescheduler, run bladmin reconfig to reconfigure the License Scheduler daemon (bld).
Parameters Section Description Required. Defines License Scheduler configuration parameters. Parameters section structure The first and last lines are Begin Parameters and End Parameters. Each line in between describes one configuration parameter. All parameters are mandatory. For example:
Begin Parameters
ADMIN=lsadmin
HOSTS=hostA hostB hostC
LMSTAT_PATH=/etc/flexlm/bin
LMSTAT_INTERVAL=30
PORT=9581
End Parameters
lsf.licensescheduler ◆ reporting_command Specify the keyword CMD with the directory path and command that License Scheduler runs when reporting a violation. Description Optional. Defines how License Scheduler handles distribution policy violations. Distribution policy violations are caused by non-LSF workloads because LSF License Scheduler explicitly follows its distribution policies.
Parameters Section It must be the same as the port number specified in EXTERNAL_FILTER_SERVER in the vendor daemon option file. Use a number close to the defined value for the PORT parameter. For example, if PORT=9581, define EXT_FILTER_PORT=9582. FLX_LICENSE_FILE Syntax FLX_LICENSE_FILE=path Description Specifies a path to the file that contains the license keys FLEXnet.Ext.Filter and FLEXnet.Usage.Snapshot to enable the FLEXnet APIs.
lsf.licensescheduler LS_MAX_TASKMAN_SESSIONS Syntax LS_MAX_TASKMAN_SESSIONS=integer Description Defines the maximum number of taskman jobs that run simultaneously. This prevents system-wide performance issues that occur if there are a large number of taskman jobs running in License Scheduler. PORT Syntax PORT=integer Description Defines the TCP listening port used by License Scheduler hosts, including candidate License Scheduler hosts. Specify any non-privileged port number.
Clusters Section Description Required. Lists the clusters that can use License Scheduler. When configuring clusters for a WAN, the Clusters Section of the master cluster must define its slave clusters. Clusters section structure The Clusters section begins and ends with the lines Begin Clusters and End Clusters. The second line is the column heading, CLUSTERS. Subsequent lines list participating clusters, one name per line:
Begin Clusters
CLUSTERS
cluster1
cluster2
...
End Clusters
lsf.licensescheduler ServiceDomain Section Description Required. Defines License Scheduler service domains as groups of physical license server hosts that serve a specific network. ServiceDomain section structure Define a section for each License Scheduler service domain.
ServiceDomain Section Description Optional. Defines a name for the license collector daemon (blcollect) to use in each service domain. blcollect collects license usage information from FLEXnet and passes it to the License Scheduler daemon (bld). It improves performance by allowing you to distribute license information queries on multiple hosts. You can only specify one collector per service domain, but you can specify one collector to serve multiple service domains.
lsf.licensescheduler Feature Section Description Required. Defines license distribution policies. Feature section structure Define a section for each feature managed by License Scheduler.
Feature Section FLEX_NAME allows the NAME parameter to be an alias of the FLEXnet feature name. For feature names that start with a number or contain a dash (-), you must set both NAME and FLEX_NAME, where FLEX_NAME is the actual FLEXnet Licensing feature name, and NAME is an arbitrary license token name you choose.
lsf.licensescheduler GROUP_DISTRIBUTION and DISTRIBUTION are mutually exclusive. If they are both defined in the same feature, the License Scheduler daemon returns an error and ignores this feature. Examples DISTRIBUTION=wanserver (Lp1 1 Lp2 1 Lp3 1 Lp4 1) In this example, the service domain named wanserver shares licenses equally among four License Scheduler projects. If all projects are competing for a total of eight licenses, each project is entitled to two licenses at all times.
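The proportional-share arithmetic in the DISTRIBUTION example above can be sketched in a few lines. This is a hypothetical helper for illustration only; the function name, the dict-based interface, and the integer floor division used for remainders are assumptions, not LSF code:

```python
def entitlements(shares, total_licenses):
    """Divide total_licenses among projects in proportion to their shares.

    Mirrors the DISTRIBUTION example above: with equal shares, each
    project is entitled to an equal slice. Integer floor division is a
    simplifying assumption about remainder handling.
    """
    total_shares = sum(shares.values())
    return {project: total_licenses * share // total_shares
            for project, share in shares.items()}

# Four projects with one share each competing for eight licenses:
# each project is entitled to two licenses, as in the example above.
print(entitlements({"Lp1": 1, "Lp2": 1, "Lp3": 1, "Lp4": 1}, 8))
```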
Feature Section Default Undefined. If ENABLE_INTERACTIVE is not set, each cluster receives one share, and interactive tasks receive no shares. Each example contains two clusters and 12 licenses of a specific feature. Example 1 ALLOCATION is not configured. The ENABLE_INTERACTIVE parameter is not set.
Begin Parameters
...
ENABLE_INTERACTIVE=n
...
End Parameters
Begin Feature
NAME=ApplicationX
DISTRIBUTION=LicenseServer1 (Lp1 1)
End Feature
Six licenses are allocated to each cluster.
lsf.licensescheduler
Begin Parameters
...
ENABLE_INTERACTIVE=n
...
End Parameters
Begin Feature
NAME=ApplicationZ
DISTRIBUTION=LicenseServer1 (Lp1 1)
ALLOCATION=Lp1(cluster1 0 cluster2 1 interactive 2)
End Feature
The ENABLE_INTERACTIVE setting is ignored. Four licenses of ApplicationZ are allocated to cluster2. Eight licenses are allocated to interactive tasks. GROUP Syntax GROUP=[group_name (project_name... )] ... ◆ group_name Specify a name for a group of projects.
Feature Section
Begin Feature
NAME = myjob2
GROUP_DISTRIBUTION = groups
SERVICE_DOMAINS = LanServer wanServer
End Feature
NON_SHARED_DISTRIBUTION Syntax NON_SHARED_DISTRIBUTION=service_domain_name ([project_name number_non_shared_licenses] ... ) ... ◆ service_domain_name Specify a License Scheduler service domain (described in the “ServiceDomain Section” on page 583) that distributes the licenses.
lsf.licensescheduler Description Optional. With the Macrovision FLEXnet plugin integration installed, enables on-demand preemption of LSF jobs for important non-managed workload. This guarantees that important non-managed jobs do not fail because of a lack of licenses. Default LSF workload is not preemptable PREEMPT_RESERVE Syntax PREEMPT_RESERVE=Y Description Optional. Enables License Scheduler to preempt either licenses that are reserved or already in use by other projects.
Feature Section Example 1 Begin Feature NAME=ApplicationX DISTRIBUTION=LicenseServer1(Lp1 1 Lp2 2) WORKLOAD_DISTRIBUTION=LicenseServer1(LSF 8 NON_LSF 2) End Feature On the LicenseServer1 domain, the available licenses are dedicated in a ratio of 8:2 for LSF and non-LSF workloads. This means that 80% of the available licenses are dedicated to the LSF workload, and 20% of the available licenses are dedicated to the non-LSF workload.
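The 8:2 ratio in Example 1 can be checked with a small sketch. This is a hypothetical helper; the rounding of fractional licenses is an assumption for illustration, not documented LSF behavior:

```python
def workload_split(lsf_share, non_lsf_share, available):
    """Split available licenses between LSF and non-LSF workloads.

    For WORKLOAD_DISTRIBUTION=LicenseServer1(LSF 8 NON_LSF 2), the 8:2
    ratio dedicates 80% of available licenses to the LSF workload; the
    remainder goes to the non-LSF workload.
    """
    total = lsf_share + non_lsf_share
    lsf_licenses = available * lsf_share // total
    return lsf_licenses, available - lsf_licenses

print(workload_split(8, 2, 10))  # (8, 2): 80% LSF, 20% non-LSF
```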
lsf.licensescheduler ProjectGroup Section Description Optional. Defines the hierarchical relationships of projects. Hierarchical groups can have multiple levels of grouping. You can configure a tree-like scheduling policy, with the leaves being the license projects that jobs can belong to. Each project group in the tree has a set of values, including shares, limits, ownership and non-shared, or exclusive, licenses. Use blstat -G to view the hierarchical dynamic license information.
ProjectGroup Section SHARES Required. Defines the shares assigned to the hierarchical group member projects. Specify the share for each member, separated by spaces, in the same order as listed in the GROUP column. OWNERSHIP Defines the level of ownership of the hierarchical group member projects. Specify the ownership for each member, separated by spaces, in the same order as listed in the GROUP column. You can only define OWNERSHIP for hierarchical group member projects, not hierarchical groups.
lsf.licensescheduler
GROUP            SHARES  OWNERSHIP  LIMITS  NON_SHARED
(final (G2 G1))  (1 1)   ()         ()      (2 0)
(G1 (AP2 AP1))   (1 1)   ()         ()      (1 1)
Valid values Any positive integer up to the LIMITS value defined for the specified hierarchical group. If defined as greater than LIMITS, NON_SHARED is set to LIMITS.
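The clamping rule just stated (a NON_SHARED value greater than LIMITS is set to LIMITS) amounts to a simple minimum. The helper below is a hypothetical illustration, not LSF code, and treating an undefined LIMITS as "no cap" is an assumption of this sketch:

```python
def effective_non_shared(non_shared, limit=None):
    """Clamp a NON_SHARED value to the group's LIMITS value.

    limit=None models an undefined LIMITS column (assumed to mean no
    cap for the purposes of this sketch).
    """
    if limit is None:
        return non_shared
    return min(non_shared, limit)

print(effective_non_shared(5, 3))  # 3: greater than LIMITS, so set to LIMITS
```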
Projects Section Description Required. Lists the License Scheduler projects. Projects section structure The Projects section begins and ends with the lines Begin Projects and End Projects. The second line consists of the required column heading PROJECTS and the optional column heading PRIORITY. Subsequent lines list participating projects, one name per line. Examples The following example lists the projects without defining the priority:
Begin Projects
PROJECTS
Lp1
Lp2
Lp3
Lp4
...
End Projects
lsf.licensescheduler When 2 projects have the same priority number configured, the first listed project has higher priority, like LSF queues. Priority of default project If not explicitly configured, the default project has a priority of 0. You can override this value by explicitly configuring the default project in the Projects section with the chosen priority value.
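The ordering rules above (higher PRIORITY first, ties broken by listing order, unconfigured default project at priority 0) can be modeled as a stable sort. This is an illustrative sketch only; the function and its None convention are assumptions, not LSF internals:

```python
def scheduling_order(projects):
    """Order projects by PRIORITY, highest first.

    projects is a list of (name, priority) pairs in configuration-file
    order; a priority of None models an unconfigured value, treated as
    0 here. Python's sort is stable, so among equal priorities the
    first-listed project keeps the higher position, as documented.
    """
    return [name for name, priority in
            sorted(projects, key=lambda p: -(p[1] if p[1] is not None else 0))]

order = scheduling_order([("Lp1", 3), ("Lp2", 4), ("Lp3", 3), ("default", None)])
print(order)  # ['Lp2', 'Lp1', 'Lp3', 'default']
```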
SEE ALSO SEE ALSO blcollect(1), bladmin(8), lsf.
lsf.shared The lsf.shared file contains common definitions that are shared by all load sharing clusters defined by lsf.cluster.cluster_name files. This includes lists of cluster names, host types, host models, the special resources available, and external load indices. This file is installed by default in the directory defined by LSF_CONFDIR. Changing lsf.shared After making any changes to lsf.shared, run the following commands to apply the changes: lsadmin reconfig and badmin mbdrestart.
Cluster Section Cluster Section (Required) Lists the cluster names recognized by the LSF system Cluster section structure The first line must contain the mandatory keyword ClusterName. The other keyword is optional. The first line must contain the mandatory keyword ClusterName and the keyword Servers in a MultiCluster environment. Each subsequent line defines one cluster.
lsf.shared HostType Section (Required) Lists the valid host types in the cluster. All hosts that can run the same binary executable are in the same host type. HostType section structure The first line consists of the mandatory keyword TYPENAME. Subsequent lines name valid host types. Example HostType section
Begin HostType
TYPENAME
SUN41
SOLSPARC
ALPHA
HPPA
NTX86
End HostType
TYPENAME Host type names are usually based on a combination of the hardware name and operating system.
HostModel Section HostModel Section (Required) Lists models of machines and gives the relative CPU scaling factor for each model. All hosts of the same relative speed are assigned the same host model. LSF uses the relative CPU scaling factor to normalize the CPU load indices so that jobs are more likely to be sent to faster hosts. The CPU factor affects the calculation of job execution time limits and accounting.
lsf.shared Automatic detection of host model and type is useful because you no longer need to make changes in the configuration files when you upgrade the operating system or hardware of a host and reconfigure the cluster. LSF automatically detects the change. Mapping to CPU factors Automatically detected models are mapped to the short model names in lsf.shared in the ARCHITECTURE column. Model strings in the ARCHITECTURE column are only used for mapping to the short model names. Example lsf.
Resource Section Resource Section Optional. Defines resources (must be done by the LSF administrator). Resource section structure The first line consists of the keywords. RESOURCENAME and DESCRIPTION are mandatory. The other keywords are optional. Subsequent lines define resources.
lsf.shared The information defined here will be returned by the ls_info() API call or printed out by the lsinfo command as an explanation of the meaning of the resource. INCREASING Description Applies to numeric resources only. If a larger value means greater load, INCREASING should be defined as Y. If a smaller value means greater load, INCREASING should be defined as N. INTERVAL Optional. Applies to dynamic resources only.
lsf.sudoers Contents ◆ ◆ ◆ ◆ ◆ ◆ “About lsf.sudoers” on page 608 “lsf.sudoers on UNIX” on page 609 “lsf.sudoers on Windows” on page 610 “File Format” on page 611 “Creating and Modifying lsf.
About lsf.sudoers About lsf.sudoers The lsf.sudoers file is an optional file to configure security mechanisms. It is not installed by default. You use lsf.sudoers to set the parameter LSF_EAUTH_KEY to configure a key for eauth to encrypt and decrypt user authentication data. On UNIX, you also use lsf.sudoers to grant permission to users other than root to perform certain operations as root in LSF, or as a specified user.
lsf.sudoers lsf.sudoers on UNIX In LSF, certain operations such as daemon startup can only be performed by root. The lsf.sudoers file grants root privileges to specific users or user groups to perform these operations. Location lsf.sudoers must be located in /etc on each host. Permissions lsf.sudoers must have permission 600 and be readable and writable only by root.
lsf.sudoers on Windows lsf.sudoers on Windows Location The lsf.sudoers file is shared over an NTFS network, not duplicated on every Windows host. By default, LSF installs lsf.sudoers in the %SYSTEMROOT% directory. The location of lsf.sudoers on Windows must be specified by LSF_SECUREDIR in lsf.conf. You must configure the LSF_SECUREDIR parameter in lsf.conf if using lsf.sudoers on Windows. Permissions The permissions on lsf.
lsf.sudoers File Format The format of lsf.sudoers is very similar to that of lsf.conf. Each entry can have one of the following forms: ◆ NAME=VALUE ◆ NAME= ◆ NAME="STRING1 STRING2 ..." The equal sign = must follow each NAME even if no value follows, and there should be no space beside the equal sign. NAME describes an authorized operation. VALUE is a single string or multiple strings separated by spaces and enclosed in quotation marks. Lines starting with a pound sign (#) are comments and are ignored.
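A minimal parser for this NAME=VALUE format might look like the sketch below. This is a hypothetical reader, not the parser LSF ships; quote and error handling are simplified:

```python
def parse_sudoers(text):
    """Parse lsf.sudoers-style NAME=VALUE entries into a dict.

    Lines starting with '#' are comments and ignored. A value enclosed
    in quotation marks is split into a list of strings; anything else
    is kept as a single string.
    """
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            entries[name] = value[1:-1].split()
        else:
            entries[name] = value
    return entries

conf = parse_sudoers('# comment\nLSF_EAUTH_USER=user1\nLSF_STARTUP_USERS="lsfadmin user1"')
print(conf["LSF_STARTUP_USERS"])  # ['lsfadmin', 'user1']
```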
Creating and Modifying lsf.sudoers Creating and Modifying lsf.sudoers You can create and modify lsf.sudoers with a text editor such as vi. On Windows, you can use the graphical tool xlsadmin to create or modify lsf.sudoers, by selecting Configure | Security Parameters. You must invoke xlsadmin as a domain administrator for a Windows domain. For a Windows workgroup, you must invoke xlsadmin as a local user with the necessary administrative privileges. After you modify lsf.
lsf.sudoers Parameters ◆ “LSB_PRE_POST_EXEC_USER” ◆ “LSF_EAUTH_KEY” ◆ “LSF_EAUTH_USER” ◆ “LSF_EEXEC_USER” ◆ “LSF_LOAD_PLUGINS” ◆ “LSF_STARTUP_USERS” ◆ “LSF_STARTUP_PATH” LSB_PRE_POST_EXEC_USER Syntax LSB_PRE_POST_EXEC_USER=user_name Description UNIX only. Specifies the authorized user for running queue level pre-execution and post-execution commands. When this parameter is defined, the queue level pre-execution and post-execution commands will be run as the specified user.
Parameters Description UNIX only. Specifies the user account under which to run the external authentication executable eauth. Default Undefined. eauth is run as the primary LSF administrator. LSF_EEXEC_USER Syntax LSF_EEXEC_USER=user_name Description UNIX only. Defines the user name to run the external execution command eexec. Default Undefined. eexec is run as the user who submitted the job. LSF_LOAD_PLUGINS Syntax LSF_LOAD_PLUGINS=y | Y Description If defined, LSF loads plugins from LSB_LSBDIR.
lsf.sudoers LSF_STARTUP_PATH Syntax LSF_STARTUP_PATH=path Description UNIX only. Absolute path name of the directory in which the server binaries (LIM, RES, sbatchd, mbatchd, etc.) are installed. This is normally LSF_SERVERDIR as defined in cshrc.lsf, profile.lsf or lsf.conf. LSF will allow the specified administrators (see “LSF_STARTUP_USERS” on page 614) to start the daemons installed in the LSF_STARTUP_PATH directory. Both LSF_STARTUP_USERS and LSF_STARTUP_PATH must be defined for this feature to work.
SEE ALSO SEE ALSO lsadmin(8), badmin(8), lsf.conf(5), lsfstartup(3), lsf.cluster(5), eexec(8), eauth(8) .
lsf.task Users should not have to specify a resource requirement each time they submit a job. LSF supports the concept of a task list. This chapter describes the files used to configure task lists: ◆ lsf.task ◆ lsf.task.cluster_name ◆ $HOME/.lsftask
About Task Lists About Task Lists A task list is a list in LSF that keeps track of the default resource requirements for different applications and task eligibility for remote execution. The term task refers to an application name. With a task list defined, LSF automatically supplies the resource requirement of the job whenever users submit a job unless one is explicitly specified at job submission.
lsf.task Task Files There are 3 task list files that can affect a job: ◆ lsf.task — system-wide defaults apply to all LSF users, even across multiple clusters if MultiCluster is installed ◆ lsf.task.cluster_name — cluster-wide defaults apply to all users in the cluster ◆ $HOME/.lsftask — user-level defaults apply to a single user. This file lists applications to be added to or removed from the default system lists for your jobs. Resource requirements specified in this file override those in the system lists.
Format of Task Files Each file consists of two sections, LocalTasks and RemoteTasks. For example:
Begin LocalTasks
ps
hostname
uname
crontab
End LocalTasks
Begin RemoteTasks
+ "newjob/mem>25"
+ "verilog/select[type==any && swp>100]"
make/cpu
nroff/
End RemoteTasks
Tasks are listed one per line. Each line in a section consists of a task name, and, for the RemoteTasks section, an optional resource requirement string separated by a slash (/).
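The task/resource-requirement split described above can be sketched with a small parser. This is hypothetical and simplified; the real handling of the '+'/'-' add/remove markers and of quoting is richer than shown:

```python
def parse_task_lines(lines):
    """Split task-list lines into (task, resource_requirement) pairs.

    Each line holds a task name and, optionally, a resource requirement
    string separated from the name by a slash (/). A missing slash
    yields None for the requirement.
    """
    tasks = []
    for line in lines:
        line = line.strip().strip('"')
        if not line:
            continue
        task, sep, res_req = line.partition("/")
        tasks.append((task, res_req if sep else None))
    return tasks

print(parse_task_lines(["make/cpu", "hostname"]))
# [('make', 'cpu'), ('hostname', None)]
```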
lsf.task SEE ALSO lsfintro(1), lsrtasks(1), lsltasks(1), ls_task(3), lsf.
setup.config Contents ◆ ◆ “About setup.
About setup.config About setup.config The setup.config file contains options for Platform LSF License Scheduler installation and configuration for systems without Platform LSF. You only need to edit this file if you are installing License Scheduler as a standalone product without LSF. Template location A template setup.config is included in the License Scheduler installation script tar file and is located in the directory created when you uncompress and extract the installation script tar file.
setup.config Parameters ◆ “LS_ADMIN” ◆ “LS_HOSTS” ◆ “LS_LICENSE_FILE” ◆ “LS_LMSTAT_PATH” ◆ “LS_TOP” LS_ADMIN Syntax LS_ADMIN="user_name [user_name ... ]" Description Lists the License Scheduler administrators. The first user account name in the list is the primary License Scheduler administrator. The primary License Scheduler administrator account is typically named lsadmin. CAUTION You should not configure the root account as the primary License Scheduler administrator.
Parameters Description Defines the full path to the lmstat program. License Scheduler uses lmstat to gather the FLEXnet license information for scheduling. This path does not include the name of the lmstat program itself. Example LS_LMSTAT_PATH="/usr/bin" Default The installation script attempts to find a working copy of lmstat on the current system. If it is unsuccessful, the path is set as blank ("").
slave.config Contents ◆ ◆ “About slave.
About slave.config About slave.config Dynamically added LSF hosts that will not be master candidates are slave hosts. Each dynamic slave host has its own LSF binaries and local lsf.conf and shell environment scripts (cshrc.lsf and profile.lsf). You must install LSF on each slave host. The slave.config file contains options for installing and configuring a slave host that can be dynamically added or removed. Use lsfinstall -s -f slave.config to install LSF using the options specified in slave.config.
slave.config Parameters ◆ “LSF_ADMINS” ◆ “LSF_LIM_PORT” ◆ “LSF_SERVER_HOSTS” ◆ “LSF_TARDIR” ◆ “LSF_LOCAL_RESOURCES” ◆ “LSF_TOP” LSF_ADMINS Syntax LSF_ADMINS="user_name [ user_name ... ]" Description Required. Lists the LSF administrators. The first user account name in the list is the primary LSF administrator in lsf.cluster.cluster_name. The LSF administrator accounts must exist on all hosts in the LSF cluster before installing LSF. The primary LSF administrator account is typically named lsfadmin.
Parameters ◆ Name of a file containing a list of host names, one host per line. Valid Values Any valid LSF host name Examples ◆ List of host names: LSF_SERVER_HOSTS="hosta hostb hostc hostd" ◆ Host list file: LSF_SERVER_HOSTS=:lsf_server_hosts The file lsf_server_hosts contains a list of hosts: hosta hostb hostc hostd Default The local host where lsfinstall is running LSF_TARDIR Syntax LSF_TARDIR="/path " Description Optional. Full path to the directory containing the LSF distribution tar files.
slave.config Default None—optional variable LSF_TOP Syntax LSF_TOP="/path" Description Required. Full path to the top-level LSF installation directory. Valid value Must be an absolute path to a local directory on the slave host. Cannot be the root directory (/). Recommended value The file system containing LSF_TOP must have enough disk space for all host types (approximately 300 MB per host type).
SEE ALSO SEE ALSO lsfinstall(8), install.config(5), lsf.cluster(5), lsf.
win_install.config Contents ◆ ◆ “About win_install.
About win_install.config The win_install.config file contains options for Platform LSF for Windows installation and configuration using the silent install option. Use setup /I:win_install.config to install LSF using the options specified in win_install.config. Template location A template win_install.config is included in the LSF for Windows installation file lsf6.2_win.exe. To edit win_install.config, extract it from the lsf6.2_win.exe installation file and edit the extracted copy.
win_install.config Parameters Required parameters The following parameters are required and must be defined: ◆ “INSTALL_OPTION” ◆ “LSF_TOP” ◆ “LSF_CLUSTER_NAME” ◆ “LOCAL_DIR” ◆ “SERVICE_ACCT” Optional parameters The following parameters are optional: ◆ “LIM_PORT” ◆ “LSF_CLIENTS” ◆ “LSF_DYNAMIC_SERVERS” ◆ “LSF_SERVERS” ◆ “SERVER_HOST” If LSF_SERVERS, LSF_DYNAMIC_SERVERS, and LSF_CLIENTS are not defined, the local host is installed as an LSF server host.
Parameters LOCAL_DIR Syntax LOCAL_DIR=path Description Required—sets the local directory for the root of the path to the machine-dependent LSF files. Must be an absolute path to a local (non-shared) directory. Cannot be the root directory (\\server_name). Example LOCAL_DIR="C:\lsf_6.2_cluster" Default None—required parameter LSF_CLIENTS Syntax LSF_CLIENTS="host_name| :host_list_file [host_name| :host_list_file ...]" Description Optional—lists the hosts in the cluster to be set up as LSF client hosts.
win_install.config LSF_SERVERS Syntax LSF_SERVERS="host_name| :host_list_file [host_name| :host_list_file ...]" Description Optional—lists the hosts in the cluster to be set up as LSF server hosts. The first host in the list becomes the LSF master host in lsf.cluster.cluster_name. Valid Values Any valid LSF host name, or any file containing a list of valid LSF host names. The file containing the list cannot have any white spaces and must list one host per line.
Parameters SERVER_HOST Syntax SERVER_HOST="server_domain " Description Optional—defines the non-shared hosts to add to the cluster. You must also set LIM_PORT. Without LIM_PORT set, SERVER_HOST is ignored. Valid Values Must be a valid domain server name. Example SERVER_HOST="hosta.example.com" Default None See also LIM_PORT SERVICE_ACCT Syntax SERVICE_ACCT="[domain \]account_name " Description Required—defines the user account that the LSF daemons run from.
Part IV: Troubleshooting
Troubleshooting and Error Messages Contents ◆ “Shared File Access” on page 642 ◆ “Common LSF Problems” on page 643 ◆ “Error Messages” on page 646
Shared File Access Shared File Access A frequent problem is non-accessible files due to a non-uniform file space. If a task is run on a remote host where a file it requires cannot be accessed using the same name, an error results. Almost all interactive LSF commands fail if the user’s current working directory cannot be found on the remote host. Shared files on UNIX If you are running NFS, rearranging the NFS mount table may solve the problem.
Troubleshooting and Error Messages Common LSF Problems This section lists some common problems with LSF jobs. Most problems are due to incorrect installation or configuration. Check the mbatchd and sbatchd error log files; often the log message points directly to the problem. The section also includes some common problems with the LIM, the RES and interactive applications. LIM dies quietly Run the following command to check for errors in the LIM configuration files.
Common LSF Problems RES does not start Check the RES error log. UNIX If the RES is unable to read the lsf.conf file and does not know where to write error messages, it logs errors into syslog(3). Windows If the RES is unable to read the lsf.conf file and does not know where to write error messages, it logs errors into C:\temp. User permission denied If remote execution fails with the following error message, the remote host could not securely determine the user ID of the user requesting remote execution.
Troubleshooting and Error Messages LSF can resolve most, but not all, problems using automount. The automount maps must be managed through NIS. Follow the instructions in your Release Notes for obtaining technical support if you are running automount and LSF is not able to locate directories on remote hosts. Batch daemons die quietly First, check the sbatchd and mbatchd error logs. Try running the following command to check the configuration. % badmin ckconfig This reports most errors.
Error Messages Error Messages The following error messages are logged by the LSF daemons, or displayed by the following commands. lsadmin ckconfig badmin ckconfig General errors The messages listed in this section may be generated by any LSF daemon. can’t open file: error The daemon could not open the named file for the reason given by error. This error is usually caused by incorrect file permissions or missing files.
Troubleshooting and Error Messages userok: Forged username suspected from host/port: claimed_user/actual_user The service request claimed to come from user claimed_user but ident authentication returned that the user was actually actual_user. The request was not serviced. userok: ruserok(host,uid) failed LSF_USE_HOSTEQUIV=Y is defined in the lsf.conf file, but host has not been set up as an equivalent host (see /etc/host.equiv), and user uid has not set up a .rhosts file.
Error Messages The HostModel, Resource, or HostType section in the lsf.shared file is either missing or contains an unrecoverable error. file(line): Name name reserved or previously defined. Ignoring index The name assigned to an external load index must not be the same as any built-in or previously defined resource or load index. file(line): Duplicate clustername name in section cluster. Ignoring current line A cluster name is defined twice in the same lsf.shared file. The second definition is ignored.
Troubleshooting and Error Messages function: Received request from non-LSF host host/port The daemon does not recognize host as a Platform LSF host. The request is not serviced. These messages can occur if host was added to the configuration files, but not all the daemons have been reconfigured to read the new information. If the problem still occurs after reconfiguring all the daemons, check whether the host is a multi-addressed host.
Error Messages authRequest: root job submission rejected Root tried to execute or submit a job but LSF_ROOT_REX is not defined in the lsf.conf file. resControl: operation permission denied, uid = uid The user with user ID uid is not allowed to make RES control requests. Only the LSF administrator, or root if LSF_ROOT_REX is defined in lsf.conf, can make RES control requests.
Troubleshooting and Error Messages If logdir is on AFS, check that the instructions in Administering Platform LSF have been followed. Use the fs ls command to verify that the LSF administrator owns logdir and that the directory has the correct ACL.
Index Symbols .lsftask file 617 .rhosts file 644 /etc/hosts file 644 /etc/hosts.equiv file 644 A ABS_RUNLIMIT, lsb.params file 375 absolute path, lsfinstall options 202 account mapping in MultiCluster 470 ACCT_ARCHIVE_AGE, lsb.params file 375 ACCT_ARCHIVE_SIZE, lsb.params file 376 ACCT_ARCHIVE_TIME, lsb.params file 376 Active status, bqueues 111 ACTIVE WINDOW, bsla 143 Active:Missed status, bsla 144 Active:Ontime status, bsla 144 ADJUST_DURATION, lsf.
Index BSUB_QUIET variable 269 BSUB_QUIET2 variable 270 BSUB_STDERR variable 270 bswitch 176 btop 178 bugroup 180 bulk jobs, killing 71 busers 181 C CACHE_INTERVAL, lsf.cluster file 497 ch 183 CHECKPOINT, bqueues -l 121 CHKPNT lsb.hosts file 354 lsb.queues file 400 CHKPNTDIR, bqueues -l 121 CHKPNTPERIOD, bqueues -l 121 chunk jobs bmig 99 bsub restrictions 151 bswitch 176 CHKPNT parameter in lsb.queues 400 MIG parameter in lsb.queues 416 rerunnable 421 CHUNK_JOB_DURATION, lsb.
Index DISABLE_UACCT_MAP, lsb.params file 379 disk space for installation 203 DISP_RES_USAGE_LIMITS, LSF HPC extensions parameter 545 DISPAT_TIME, bacct -l 17 DISPATCH_ORDER, lsb.queues file 403 DISPATCH_WINDOW lsb.hosts file 355 lsb.queues file 404 DISPATCH_WINDOWS bhosts -l 53 bqueues -l 118 DISPLAYS, blusers output 96 DISTRIBUTION blinfo output 84 lsb.resources file HostExport section 447 lsb.resources file SharedResourceExport section 450 lsf.
Index LSF_MASTER 289 LSF_NIOS_DEBUG 289 LSF_NIOS_DIE_CMD 289 LSF_NIOS_IGNORE_SIGWINDOW 289 LSF_NIOS_PEND_TIMEOUT 289 LSF_RESOURCES 290 LSF_USE_HOSTEQUIV 290 LSF_USER_DOMAIN 290 environmental variables LSB_EXEC_RUSAGE 275 LSB_NTRIES 282 EQUIV, lsf.cluster file 497 ERR_FILE, bacct -l 18 ESTIMATED FINISH TIME, bsla 144 /etc/hosts file 644 EVENT_ADRSV_FINISH record, lsb.acct 321 EVENT_UPDATE_INTERVAL, lsb.
Index lsclusters 199 lsf.licensescheduler file Parameters section, description 580 hosts exclusive resource 492 lost_and_found 51, 64 lsfinstall command 203 hosts file 303 hostsetup command, example 205 hostsetup script, lsfinstall command 205 HPART_NAME, lsb.hosts file 362 I idle job exception bacct -l -x 18 bjobs -l 67 bqueues -l 117 IDLE_FACTOR, bjobs -l 66 IGNORE_DEADLINE bqueues -l 117 lsb.queues file 408 IMPT_JOBBKLG, lsb.
Index JOBID bacct -l 17 bjobs 64 bjobs -A 67 blusers output 96 L LIB_RECVTIMEOUT, lsf.licensescheduler file Parameters section 580 LIC_COLLECT, lsf.licensescheduler file 76 LIC_COLLECTOR, lsf.licensescheduler file ServiceDomain section, description 583 LIC_FLEX_API_ENABLE, lsf.licensescheduler file ServiceDomain section, description 584 LIC_SERVERS blinfo output 84 lsf.licensescheduler file ServiceDomain section, description 583 LICENSE, lsb.
    variable 273
LSB_DEBUG_NQS
    lsf.conf file 514
    variable 273
LSB_DEBUG_SBD
    lsf.conf file 514
    variable 273
LSB_DEBUG_SCH
    lsf.conf file 515
    variable 273
LSB_DEFAULTPROJECT variable 273
LSB_DEFAULTQUEUE variable 274
LSB_DISABLE_RERUN_POST_EXEC, lsf.conf file 517
LSB_ECHKPNT_KEEP_OUTPUT, lsf.conf file 517
LSB_ECHKPNT_METHOD, lsf.conf file 517
LSB_ECHKPNT_METHOD_DIR, lsf.conf file 518
LSB_ERESTART_USRCMD variable 274
LSB_ESUB_METHOD, lsf.
lsf.conf file 504
lsf.licensescheduler file, reference 577
lsf.shared file 599
lsf.sudoers file 608
lsf.task file 617
lsf.task.cluster file 617
LSF_ADD_CLIENTS, install.config file 309
LSF_ADD_SERVERS, install.config file 309
LSF_ADMINS
    install.config file 309
    slave.config file 629
LSF_AFS_CELLNAME, lsf.conf file 536
LSF_AM_OPTIONS, lsf.conf file 536
LSF_API_CONNTIMEOUT, lsf.conf file 536
LSF_API_RECVTIMEOUT, lsf.conf file 536
LSF_AUTH, lsf.conf file 537
LSF_AUTH_DAEMONS, lsf.
LSF_MASTER_LIST
    install.config file 311
    lsf.conf file 560
LSF_MC_NON_PRIVILEGED_PORTS, lsf.conf file 560
LSF_MISC, lsf.conf file 561
LSF_MULTICLUSTER product name, lsf.cluster_name.license.acct file 500
LSF_NIOS_DEBUG
    lsf.conf file 561
    variable 289
LSF_NIOS_DIE_CMD variable 289
LSF_NIOS_IGNORE_SIGWINDOW variable 289
LSF_NIOS_JOBSTATUS_INTERVAL, lsf.conf file 562
LSF_NIOS_PEND_TIMEOUT variable 289
LSF_NIOS_RES_HEARTBEAT, lsf.conf file 562
LSF_NON_PRIVILEGED_PORTS, lsf.
MAX_CONCURRENT_JOB_QUERY, lsb.params file 384
MAX_GROUPS, lsbatch.h file 466
MAX_INFO_DIRS, lsb.params file 384
MAX_JOB_ARRAY_SIZE, lsb.params file 385
MAX_JOB_ATTA_SIZE, lsb.params file 385
MAX_JOB_MSG_NUM, lsb.params file 386
MAX_JOB_NUM, lsb.params file 386
MAX_JOBID, lsb.params file 385
MAX_JOBINFO_QUERY_PERIOD, lsb.params file 386
MAX_JOBS, lsb.users file 468
MAX_PEND_JOBS
    lsb.params file 387
    lsb.users file 469
MAX_PREEXEC_RETRY, lsb.params file 387
MAX_RSCHED_TIME, lsb.
NQS_QUEUES, lsb.queues file 416
NQS_QUEUES_FLAGS, lsb.params file 391
NQS_REQUESTS_FLAGS, lsb.params file 391
NSTATIC_CPUSETS, brlainfo 132
NTASKS, blusers output 96
NTHREAD, bjobs -l 66

O
obsolete commands
    bqc. See badmin qclose
    breboot. See badmin reconfig
    breconfig. See badmin reconfig
    lslockhost. See lsadmin limlock
    lsreconfig. See lsadmin reconfig
    lsunlockhost. See lsadmin limunlock
OK license usage status
    bld.license.acct file 296
    lsf.cluster_name.license.
    bhist -l 46
    bjobs -l 65
PROJECT/GROUP, blstat output 91
PROJECT_NAME, bacct -l 17
ProjectGroup section, lsf.licensescheduler file, description 593
PROJECTS
    blimits 81
    lsb.resources file Limit section 440
    lsf.licensescheduler file Projects section 596
Projects section, lsf.licensescheduler file, description 596
PSUSP
    bhist 46
    bjobs -A 68
    bjobs -l 65

Q
QJOB_LIMIT, lsb.queues file 420
QUEUE
    bacct -b 17
    bjobs 64
QUEUE_CTRL record, lsb.events 333
QUEUE_NAME
    bqueues 111
    lsb.
    busers 182
RSV_HOSTS, bacct -U 19
RSVID, bacct -U 18
RUN
    bhist 46
    bhosts 52
    bjgroup 59
    bjobs -A 68
    bjobs -l 65
    bqueues 113
    bsla 144
    busers 182
RUN_JOB_FACTOR, lsb.params file 394
RUN_TIME, bhpart 57
RUN_TIME_FACTOR, lsb.params file 394
RUN_WINDOW, lsb.queues file 423
RUN_WINDOWS
    bqueues -l 118
    lshosts -l 218
RUNLIMIT
    bqueues -l 115
    lsb.queues file 424
RUNWINDOW, lsf.cluster file 493
RUSAGE, blusers output 96

S
SBD_SLEEP_TIME, lsb.params file 394
SBD_UNREPORTED_STATUS record, lsb.
    bsla 144
    lsclusters 198
status
    lsload 224
    lsmon 238
STOP_COND
    bqueues -l 120
    lsb.queues file 426
SUB_TRY_INTERVAL, lsb.params file 394
SUB_TRY_INTERVAL parameter in lsb.params 387
SUBMIT_TIME
    bacct -b 17
    bjobs 65
SUSP, bqueues 113
SUSPENDING REASONS, bjobs -l 65
SWAP
    bacct -l 18
    bjobs -l 66
    lsb.resources file HostExport section 448
SWAPLIMIT
    bqueues -l 114
    lsb.queues file 427
    per parallel task 546
Swaps, lsacct 187
SWP
    blimits 82
    lsb.resources file Limit section 444
swp
    bqueues -l 116
    lsb.
User section, lsb.users file 468
USER/GROUP
    bhpart 56
    busers 181
USER_NAME, lsb.users file 468
USER_SHARES
    bqueues -l 118
    lsb.hosts file 363
    lsb.users file 467
UserGroup section, lsb.users file 466
UserMap section, lsb.users file 470
USERS
    blimits 81
    bqueues -l 119
    lsb.queues file 428
    lsb.resources file Limit section 445
    lsb.resources file ResourceReservation section 453
    lsb.