Platform LSF Configuration Reference Platform LSF Version 7 Update 3 Release date: May 2008 Last modified: May 6 2008
Copyright © 1994-2008 Platform Computing Inc. Although the information in this document has been carefully reviewed, Platform Computing Corporation (“Platform”) does not warrant it to be free of errors or omissions. Platform reserves the right to make corrections, updates, revisions or changes to the information in this document.
Contents

Part I: Features .............................................................................................................. 5
Feature: Between-host user account mapping ............................................................... 7
Feature: Cross-cluster user account mapping .............................................................. 12
Feature: External authentication ................................................................................... 17
Feature: LSF daemon startup control
lsf.shared
lsf.sudoers
lsf.task
setup.config
Part I: Features
Feature: Between-host user account mapping

The between-host user account mapping feature enables job submission and execution within a cluster that has different user accounts assigned to different hosts. Using this feature, you can map a local user account to a different user account on a remote host.
Figure 2: With local user account mapping enabled

Figure 3: With Windows workgroup account mapping enabled

Scope

Operating system:
• UNIX hosts
• Windows hosts
• A mix of UNIX and Windows hosts within a single cluster

Not required for:
• A cluster with a uniform user name space
• A mixed UNIX/Windows cluster in which user accounts have the same user name on both operating systems

Dependencies:
• UNIX and Windows user accounts must be valid on all hosts in the cluster and must have the correct permissions to successfully run jobs.
• For clusters that include both UNIX and Windows hosts, you must also enable the UNIX/Windows user account mapping feature.

Configuration file: lsb.params
Parameter and syntax: SYSTEM_MAPPING_ACCOUNT=account
Default behavior:
• Enables Windows workgroup account mapping
• Windows local user accounts run LSF jobs using the system account name and permissions

Between-host user account mapping behavior

Local user account mapping example

The following example describes how local user account mapping works when configured in the file .lsfhosts in the user's home directory.
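As a sketch of what such a configuration might look like (the host and user names are hypothetical, and the send/recv entry format reflects the usual .lsfhosts layout), user1 on hostA could run jobs as user2 on hostB with entries such as:

```
On hostA, in user1's $HOME/.lsfhosts:
    hostB user2 send

On hostB, in user2's $HOME/.lsfhosts:
    hostA user1 recv
```

The mapping takes effect only when both sides agree: the sending side names the account it wants to run as, and the receiving side explicitly accepts jobs from that remote account.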
Between-host user account mapping commands

Commands for submission

bsub
• Submits the job with the user name and password of the user who entered the command.
• The job runs on the execution host with the submission user name and password, unless you have configured between-host user account mapping.
Feature: Cross-cluster user account mapping

The cross-cluster user account mapping feature enables cross-cluster job submission and execution for a MultiCluster environment that has different user accounts assigned to different hosts. Using this feature, you can map user accounts in a local cluster to user accounts in one or more remote clusters.
Scope

Operating system:
• UNIX hosts
• Windows hosts
• A mix of UNIX and Windows hosts within one or more clusters

Not required for:
• Multiple clusters with a uniform user name space

Dependencies:
• UNIX and Windows user accounts must be valid on all hosts in the cluster and must have the correct permissions to successfully run jobs.

Configuration file: lsb.users
Level: System
Required fields:
• LOCAL
• REMOTE
• DIRECTION

Figure 6: System-level mappings for both clusters

Only mappings configured in lsb.users on both clusters work. In this example, the common user account mappings are:
• user1@cluster1 to user2@cluster2
• user3@cluster1 to user6@cluster2

User-level configuration examples

The following examples describe how user account mapping works when configured at the user level in the file .lsfhosts in the user's home directory. Only mappings configured in .lsfhosts on both clusters work.
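A system-level mapping such as the one above might be expressed in a UserMap section of lsb.users along the following lines; the export direction is an assumption about how cluster1 shares its jobs, and the matching cluster2 side would declare the same pairs with an import direction:

```
Begin UserMap
LOCAL    REMOTE            DIRECTION
user1    user2@cluster2    export
user3    user6@cluster2    export
End UserMap
```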
Cross-cluster user account mapping commands

Commands for submission

bsub
• Submits the job with the user name and password of the user who entered the command.
• The job runs on the execution host with the submission user name and password, unless you have configured cross-cluster user account mapping.
Feature: External authentication

The external authentication feature provides a framework that enables you to integrate LSF with any third-party authentication product—such as Kerberos or DCE Security Services—to authenticate users, hosts, and daemons. This feature provides a secure transfer of data within the authentication data stream between LSF clients and servers. Using external authentication, you can customize LSF to meet the security requirements of your site.
Figure 7: Default behavior (eauth executable provided with LSF)

The eauth executable uses corresponding processes eauth -c host_name (client) and eauth -s (server) to provide a secure data exchange between LSF daemons on client and server hosts. The variable host_name refers to the host on which eauth -s runs; that is, the host called by the command. For bsub, for example, the host_name is NULL, which means the authentication data works for any host in the cluster.

Figure 8: How eauth works

One eauth -s process can handle multiple authentication requests. If eauth -s terminates, the LSF daemon invokes another instance of eauth -s to handle new authentication requests.
aux_data_file — Location of the temporary file that stores encrypted authentication data.

aux_data_status — File in which eauth -s stores authentication status. When used with Kerberos authentication, eauth -s writes the source of authentication to this file if authentication fails. For example, if mbatchd to mbatchd authentication fails, eauth -s writes "mbatchd" to the file defined by aux_data_status.
Scope

Operating system:
• UNIX
• Windows (except for Kerberos authentication)

Allows for:
• Authentication of LSF users, hosts, and daemons
• Authentication of any number of LSF users

Not required for:
• Authorization of users based on account permissions

Dependencies:
• UNIX and Windows user accounts must be valid on all hosts in the cluster, or the correct type of account mapping must be enabled:
  • For a mixed UNIX/Windows cluster, UNIX/Windows user account mapping must be enabled
Configuration file: lsf.conf

LSF_AUTH=eauth
• Enables external authentication

LSF_AUTH_DAEMONS=y | Y
• Enables daemon authentication when external authentication is enabled

Note: By default, daemon authentication is not enabled. If you enable daemon authentication and want to turn it off later, you must comment out or delete the parameter LSF_AUTH_DAEMONS.
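Put together, a minimal lsf.conf fragment that enables both user and daemon authentication might look like this sketch:

```
# lsf.conf
LSF_AUTH=eauth          # use the external eauth executable
LSF_AUTH_DAEMONS=y      # also authenticate LSF daemons
```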
Figure 9: Example of external authentication

Authentication failure

When external authentication is enabled, the message User permission denied indicates that the eauth executable failed to authenticate the user’s credentials.

Security

External authentication—and any other LSF authentication method—depends on the security of the root account on all hosts within the cluster. Limit access to the root account to prevent unauthorized use of your cluster.
• Increasing security through the use of an external encryption key (recommended)
• Specifying a trusted user account under which the eauth executable runs (UNIX and Linux only)

You can also choose Kerberos authentication to provide a secure data exchange during LSF user and daemon authentication and to forward credentials to a remote host for use during job execution.

Configuration to modify security

File: lsf.

• Alpha 4.x
• IRIX 6.5
• Linux 2.x
• Solaris 2.x

Configuration file: lsf.conf

LSF_AUTH=eauth
• Enables external authentication

LSF_AUTH_DAEMONS=y | Y
• Enables daemon authentication when external authentication is enabled
• Required for Kerberos authentication
• mbatchd, sbatchd, and RES run the executable LSF_SERVERDIR/daemons.
Commands to control

Not applicable: There are no commands to control the behavior of this feature.

Commands to display configuration

badmin showconf
• Displays all configured parameters and their values set in lsf.conf or ego.conf that affect mbatchd and sbatchd. Use a text editor to view other parameters in the lsf.conf or ego.conf configuration files.
Feature: LSF daemon startup control

The LSF daemon startup control feature allows you to specify a list of user accounts other than root that can start LSF daemons on UNIX hosts. This feature also enables UNIX and Windows users to bypass the additional login required to start res and sbatchd when the EGO Service Controller (EGOSC) is configured to control LSF daemons; bypassing the EGO administrator login enables the use of scripts to automate system startup.
Figure 11: With LSF daemon startup control enabled

EGO administrator login bypass

If the EGO Service Controller (EGOSC) is configured to control LSF daemons, EGO will automatically restart the res and sbatchd daemons unless a user has manually shut them down. When manually starting a res or sbatchd daemon that EGO has not yet started, the user who invokes lsadmin or badmin is prompted to enter EGO administrator credentials.
Figure 12: EGO administrator login bypass not enabled
Figure 13: With EGO administrator login bypass enabled

Scope

Operating system:
• UNIX hosts only within a UNIX-only or mixed UNIX/Windows cluster: startup of LSF daemons by users other than root.
• UNIX and Windows: EGO administrator login bypass.

Dependencies:
• For startup of LSF daemons by users other than root:
  • You must define both a list of users and the absolute path of the directory that contains the LSF daemon binary files.

Configuration to enable LSF daemon startup control

Startup by users other than root (UNIX-only)

The LSF daemon startup control feature is enabled for UNIX hosts by defining the LSF_STARTUP_USERS and LSF_STARTUP_PATH parameters in the lsf.sudoers file. Permissions for lsf.sudoers must be set to 600. For Windows hosts, this feature is already enabled at installation when the Platform services admin group is defined.
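For UNIX hosts, a hypothetical lsf.sudoers fragment might read as follows; the user names and the install path are illustrative, and the file is conventionally owned by root with mode 600:

```
# lsf.sudoers (root-owned, permissions 600)
LSF_STARTUP_USERS="lsfadmin user1"
LSF_STARTUP_PATH=/usr/local/lsf/7.0/linux2.6-glibc2.3-x86_64/etc
```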
Configuration file: lsf.sudoers

LSF_STARTUP_PATH=path
• Enables LSF daemon startup by users other than root when LSF_STARTUP_USERS is also defined.
• Specifies the directory that contains the LSF daemon binary files.
• LSF daemons are usually installed in the path specified by the LSF_SERVERDIR parameter defined in the cshrc.lsf, profile.lsf, or lsf.conf files.

Configuration file: lsf.sudoers

LSF_EGO_ADMIN_USER=Admin
• Enables a user or script to bypass the EGO administrator login prompt when LSF_EGO_ADMIN_PASSWD is also defined.
• Applies only to startup of res or sbatchd.
• Specify the Admin EGO cluster administrator account.

LSF_EGO_ADMIN_PASSWD=password
• Enables a user or script to bypass the EGO administrator login prompt when LSF_EGO_ADMIN_USER is also defined.
• Applies only to startup of res or sbatchd.
• Specify the password for the Admin EGO cluster administrator account.
Figure 14: Example of LSF daemon startup control

Configuration to modify LSF daemon startup control

Not applicable: There are no parameters that modify the behavior of this feature.

LSF daemon startup control commands

Commands for submission

N/A
• This feature does not directly relate to job submission.

Commands to monitor

bhosts
• Displays the host status of all hosts, specific hosts, or specific host groups.

Commands to control

badmin hstartup
• Starts the sbatchd daemon on specific hosts or all hosts. Only root and users listed in the LSF_STARTUP_USERS parameter can successfully run this command.

lsadmin limstartup
• Starts the lim daemon on specific hosts or all hosts in the cluster. Only root and users listed in the LSF_STARTUP_USERS parameter can successfully run this command.
Feature: Pre-execution and post-execution processing

The pre- and post-execution processing feature provides a way to run commands on the execution host prior to and after completion of LSF jobs. Use pre-execution commands to set up an execution host with the required directories, files, software licenses, environment, and user permissions. Use post-execution commands to define post-job processing such as cleaning up job files or transferring job output.
With pre- and post-execution processing enabled at the queue or application level

The following example illustrates how pre- and post-execution processing works for setting the environment prior to job execution and for transferring resulting files after the job runs. Any executable command line can serve as a pre-execution or post-execution command.
or application level so that users do not have to enter a pre-execution command every time they submit a job.

Configuration file: lsb.queues

PRE_EXEC=command
• Enables pre-execution processing at the queue level.
• The pre-execution command runs on the execution host before the job starts.

POST_EXEC=command
• …

Configuration file: lsb.applications

PRE_EXEC=command
• Enables pre-execution processing at the application level.
• The pre-execution command runs on the execution host before the job starts.
• If the PRE_EXEC command exits with a non-zero exit code, LSF requeues the job to the front of the queue.
• The PRE_EXEC command uses the same environment variable values as the job.

POST_EXEC=command
• …
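As an illustrative sketch (the queue name and script paths are hypothetical), a queue-level configuration in lsb.queues could look like:

```
Begin Queue
QUEUE_NAME = normal
PRE_EXEC   = /shared/scripts/setup_env.sh     # runs before each job starts
POST_EXEC  = /shared/scripts/cleanup_job.sh   # runs after each job finishes
End Queue
```

Defining the commands here, rather than on each bsub command line, applies them uniformly to every job the queue dispatches.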
If the pre-execution or post-execution command is not in your usual execution path, you must specify the full path name of the command.

Order of command execution

Pre-execution commands run in the following order:
1. The queue-level command
2. The application-level or job-level command. If you specify a command at both the application and job levels, the job-level command overrides the application-level command; the application-level command is ignored.
Post-execution command behavior

A post-execution command runs after the job finishes, regardless of the exit state of the job. Once a post-execution command is associated with a job, that command runs even if the job fails. You cannot configure the post-execution command to run only under certain conditions.
At the application level:

File: lsb.

File: lsf.
The number of times that pre-execution is retried includes queue-level, application-level, and job-level pre-execution command specifications.
bacct
• Displays accounting statistics for finished jobs.
• The CPU and run times shown do not include resource usage for post-execution processing, unless the parameter JOB_INCLUDE_POSTPROC is defined in lsb.applications.

Commands to control

bmod -E command
• Changes the pre-execution command at the job level.
Feature: Preemptive scheduling

The preemptive scheduling feature allows a pending high-priority job to preempt a running job of lower priority. The lower-priority job is suspended and is resumed as soon as possible. Use preemptive scheduling if you have long-running, low-priority jobs causing high-priority jobs to wait an unacceptably long time.
Default behavior (preemptive scheduling not enabled)

With preemptive scheduling enabled (preemptive queue)

With preemptive scheduling enabled (preemptable queue)
Configuration to enable preemptive scheduling

The preemptive scheduling feature is enabled by defining at least one queue as preemptive or preemptable, using the PREEMPTION parameter in the lsb.queues file. Preemption does not actually occur until at least one queue is assigned a higher relative priority than another queue, using the PRIORITY parameter, which is also set in the lsb.queues file.
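As a sketch (the queue names and priority values are hypothetical), a pair of lsb.queues definitions that enables preemption between two queues could look like:

```
Begin Queue
QUEUE_NAME = high
PRIORITY   = 70
PREEMPTION = PREEMPTIVE[low]   # jobs here may preempt jobs in the low queue
End Queue

Begin Queue
QUEUE_NAME = low
PRIORITY   = 40
PREEMPTION = PREEMPTABLE
End Queue
```

Both conditions described above are met: one queue is preemptive, one is preemptable, and their PRIORITY values differ.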
When a queue is defined as preemptable, and one or more queues are specified that can preempt it:
• Jobs from this queue can be preempted only by jobs from the specified queues

When a queue is defined as preemptive, but no specific queues are listed that it can preempt:
• Jobs from this queue preempt jobs from all queues with a lower value for priority
• Jobs are preempted from the least-loaded host

When a queue is defined as preemptive, and one or more speci
• A cannot preempt C, even though A has a higher priority than C, because A is not preemptive, nor is C preemptable

Calculation of job slots in use

The number of job slots in use determines whether preemptive jobs can start. The method by which the number of job slots in use is calculated can be configured to ensure that a preemptive job can start. When a job is preempted, it is suspended.

Preemption of backfill jobs

With preemption of backfill jobs enabled (PREEMPT_JOBTYPE=BACKFILL in lsb.params), LSF maintains the priority of jobs with resource or slot reservations by preventing lower-priority jobs that preempt backfill jobs from "stealing" resources from jobs with reservations. Only jobs from queues with a higher priority than queues that define resource or slot reservations can preempt backfill jobs.
Configuration to modify selection of queue to preempt

File: lsb.

Configuration to modify selection of job to preempt

File: lsb.params
Parameter: PREEMPT_FOR
Syntax: PREEMPT_FOR=LEAST_RUN_TIME

File: lsb.

Configuration to modify preemption of backfill and exclusive jobs

File: lsb.params
Parameter: PREEMPT_JOBTYPE
Syntax and description: PREEMPT_JOBTYPE=BACKFILL
• Enables preemption of backfill jobs.
• Requires the line PREEMPTION=PREEMPTABLE in the queue definition.
• Only jobs from queues with a higher priority than queues that define resource or slot reservations can preempt jobs from backfill queues.

Configuration to modify how job slot usage is calculated

File: lsb.params
Parameter: PREEMPT_FOR
Syntax and description: PREEMPT_FOR=GROUP_JLP
• Counts only running jobs when evaluating if a user group is approaching its per-processor job slot limit (SLOTS_PER_PROCESSOR, USERS, and PER_HOST=all in the lsb.
Configuration to control how many times a job can be preempted

By default, if preemption is enabled, there is no guarantee that a job will ever complete. A lower-priority job could be preempted again and again, and could ultimately be killed when it reaches a run limit. Limiting the number of times a job can be preempted is configured cluster-wide (lsb.params), at the queue level (lsb.queues), and at the application level (lsb.applications).
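A minimal cluster-wide sketch in lsb.params follows; the parameter name MAX_JOB_PREEMPT and the value shown are assumptions based on LSF's preemption-limit support, since this section does not name the parameter:

```
# lsb.params
MAX_JOB_PREEMPT = 2    # a job can be preempted at most twice
```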
Commands to control

brun
• Forces a pending job to run immediately on specified hosts.
• For an exclusive job, when LSB_DISABLE_LIMLOCK_EXCL=y, LSF allows other jobs already running on the host to finish but does not dispatch any additional jobs to that host until the exclusive job finishes.
Feature: UNIX/Windows user account mapping

The UNIX/Windows user account mapping feature enables cross-platform job submission and execution in a mixed UNIX/Windows environment. Using this feature, you can map Windows user accounts, which include a domain name, to UNIX user accounts, which do not include a domain name, for user accounts with the same user name on both operating systems.
Figure 15: Default behavior (feature not enabled)
Figure 16: With UNIX/Windows user account mapping enabled

For mixed UNIX/Windows clusters, UNIX/Windows user account mapping allows you to do the following:
• Submit a job from a Windows host and run the job on a UNIX host
• Submit a job from a UNIX host and run the job on a Windows host
• Specify the domain\user combination used to run a job on a Windows host
• Schedule and track jobs submitted with either a Windows or UNIX account as though the jobs belong to a single user

Scope

Operating system:
• UNIX and Windows hosts within a single cluster

Not required for:
• Windows-only clusters
• UNIX-only clusters

Dependencies:
• UNIX and Windows user accounts must be valid on all hosts in the cluster and must have the correct permissions to successfully run jobs.

Limitations:
• This feature works with a uniform user name space.

Configuration file: lsf.conf

LSF_USER_DOMAIN=domain_name
LSF_USER_DOMAIN=domain_name:domain_name …
LSF_USER_DOMAIN=.
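For instance, a single-domain setup might use a fragment like the following in lsf.conf; the domain name is illustrative:

```
# lsf.conf
LSF_USER_DOMAIN=BUSINESS    # strip/add this domain when mapping accounts
```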
When (in lsf.conf) … | And the job is submitted by … | The job …

LSF_USER_DOMAIN=BUSINESS | BUSINESS\user1 on a Windows host | …
LSF_USER_DOMAIN=BUSINESS | user1 on a UNIX host | …
LSF_USER_DOMAIN=SUPPORT:ENGINEERING | SUPPORT\user1 on a Windows host | …
LSF_USER_DOMAIN=SUPPORT:ENGINEERING | BUSINESS\user1 on a Windows host | …
LSF_USER_DOMAIN=SUPPORT:ENGINEERING, with LSF_EXECUTE_DOMAIN=ENGINEERING in .profile and .cshrc | user1 on a UNIX host | Runs on a Windows host as ENGINEERING\user1; if the job cannot run with those credentials, runs as SUPPORT\user1. Runs on a UNIX host as user1.

These additional examples are based on the following conditions:

• In lsf.
Commands to control

lspasswd
• Registers a password for a Windows user account. Windows users must register a password for each domain\user account using this command.

Commands to display configuration

bugroup -w
• …

busers
• Displays information about specific users and user groups.
• If UNIX/Windows user account mapping is enabled, the command busers displays user names without domains.
Feature: External job submission and execution controls

The job submission and execution controls feature enables you to use external, site-specific executables to validate, modify, and reject jobs, transfer data, and modify the job execution environment.
Use of esub not enabled

With esub enabled

An esub executable is typically used to enforce site-specific job submission policies and command-line syntax by validating or pre-parsing the command line. The file indicated by the environment variable LSB_SUB_PARM_FILE stores the values submitted by the user.

option values or rejects the job. Because an esub runs before job submission, using an esub to reject incorrect job submissions improves overall system performance by reducing the load on the master batch daemon (mbatchd).

External execution (eexec)

An eexec is an executable that you write to control the job environment on the execution host.
With eexec enabled

The following are some of the things that you can use an eexec to do:
• Set up the user environment variables on the execution host
• Monitor job state or resource usage
• Receive data from stdout of esub
• Run a shell script to create and populate environment variables needed by jobs
• Monitor the number of tasks running on a host and raise a flag when this number exceeds a pre-determined limit
• Pass DCE credentials and AFS tokens
Scope

Operating system:
• UNIX and Linux
• Windows

Security:
• UNIX and Windows user accounts must be valid on all hosts in the cluster, or the correct type of account mapping must be enabled.

Job types:
• Batch jobs submitted with bsub or modified by bmod.
• Batch jobs restarted with brestart.
• Interactive tasks submitted with lsrun and lsgrun (eexec only).

Executable file naming conventions:

esub
• UNIX: LSF_SERVERDIR/esub.application
• Windows: LSF_SERVERDIR\esub.application.exe or LSF_SERVERDIR\esub.application.bat

eexec
• UNIX: LSF_SERVERDIR/eexec
• Windows: LSF_SERVERDIR\eexec.exe or LSF_SERVERDIR\eexec.bat

The name of your esub should indicate the application with which it runs. For example: esub.fluent.

Restriction: The name esub.user is reserved. Do not use the name esub.user.
changes the values, or rejects the job. Job submission options are stored as name-value pairs on separate lines with the format option_name=value.
Option (bsub or bmod option) — Description

LSB_SUB_HOSTS (-m) — List of requested execution host names
LSB_SUB_IN_FILE (-i, -io) — Standard input file name
LSB_SUB_INTERACTIVE (-I) — Interactive job, specified by "Y"
LSB_SUB_LOGIN_SHELL (-L) — Login shell
LSB_SUB_JOB_NAME (-J) — Job name
LSB_SUB_JOB_WARNING_ACTION (-wa) — Job warning action
LSB_SUB_JOB_ACTION_WARNING_TIME (-wt) — Job warning time period
LSB_SUB_MAIL_USER (-u) — Email address to which LSF sends mail
LSB_SUB_PRE_EXEC (-E) — Pre-execution command
LSB_SUB_PROJECT_NAME (-P) — Project name
LSB_SUB_PTY (-Ip) — An interactive job with PTY support, specified by "Y"
LSB_SUB_PTY_SHELL (-Is) — An interactive job with PTY shell support, specified by "Y"
LSB_SUB_QUEUE (-q) — Submission queue name
LSB_SUB_RERUNNABLE (-r) — "Y" specifies a rerunnable job; "N" specifies a non-rerunnable job (specified with bsub -rn)
LSB_SUB_RLIMIT_RUN (-W) — Wall-clock run limit
LSB_SUB_RLIMIT_STACK (-S) — Stack size limit
LSB_SUB_RLIMIT_THREAD (-T) — Thread limit
LSB_SUB_TERM_TIME (-t) — Termination time, in seconds, since 00:00:00 GMT, Jan. 1, 1970
LSB_SUB3_POST_EXEC (-Ep) — Run the specified post-execution command on the execution host after the job finishes
LSB_SUB3_RUNTIME_ESTIMATION (-We) — Runtime estimate
LSB_SUB3_USER_SHELL_LIMITS (-ul) — Pass user shell limits to execution host
LSB_SUB_MODIFY_FILE — Points to the file that esub uses to modify the bsub job option values stored in the LSB_SUB_PARM_FILE
If multiple esubs are specified and one of the esubs exits with a value of LSB_SUB_ABORT_VALUE, LSF rejects the job without running the remaining esubs and returns a value of LSB_SUB_ABORT_VALUE.

LSB_INVOKE_CMD

Specifies the name of the LSF command that most recently invoked an external executable.
Changing job submission parameters using esub

The following example shows an esub that modifies job submission options and environment variables based on the user name that submits a job. This esub writes the changes to LSB_SUB_MODIFY_FILE for userA and to LSB_SUB_MODIFY_ENVFILE for userB. LSF rejects all jobs submitted by userC without writing to either file:

#!/bin/sh .
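A self-contained sketch of an esub with the behavior just described follows. The queue name and environment variable are illustrative, and the stub section at the top stands in for LSF: in a real cluster, LSF sets LSB_SUB_PARM_FILE, LSB_SUB_MODIFY_FILE, LSB_SUB_MODIFY_ENVFILE, and LSB_SUB_ABORT_VALUE itself, and the script would determine the submitter with `whoami`.

```shell
#!/bin/sh
# --- Stub the variables that LSF would normally set before invoking esub,
# --- so this sketch can run standalone.
demo_dir=${TMPDIR:-/tmp}/esub_demo.$$
mkdir -p "$demo_dir"
LSB_SUB_PARM_FILE=$demo_dir/parms
LSB_SUB_MODIFY_FILE=$demo_dir/modify
LSB_SUB_MODIFY_ENVFILE=$demo_dir/envmod
LSB_SUB_ABORT_VALUE=97
echo 'LSB_SUB_QUEUE="normal"' > "$LSB_SUB_PARM_FILE"

SUBMIT_USER=userA        # a real esub would use: SUBMIT_USER=`whoami`

# Load the submission options as shell variables (one option=value per line).
. "$LSB_SUB_PARM_FILE"

case $SUBMIT_USER in
userA)
    # Redirect userA's jobs to a different queue (queue name is illustrative).
    echo 'LSB_SUB_QUEUE="priority"' >> "$LSB_SUB_MODIFY_FILE" ;;
userB)
    # Adjust userB's job environment (variable name is illustrative).
    echo 'MY_LICENSE_SERVER="1700@hostD"' >> "$LSB_SUB_MODIFY_ENVFILE" ;;
userC)
    # Reject every job that userC submits.
    echo "userC is not allowed to submit jobs" 1>&2
    exit "$LSB_SUB_ABORT_VALUE" ;;
esac

RESULT=`cat "$LSB_SUB_MODIFY_FILE"`
echo "$RESULT"
rm -rf "$demo_dir"
```

With SUBMIT_USER set to userA, the sketch writes the queue override into the modify file, which is how esub hands changed options back to LSF.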
# corresponding update interval
    else
    )&
    wait
    done)&
fi
rm $taskFile >/dev/null 2>&1
exit 0

Passing data between esub and eexec

A combination of esub and eexec executables can be used to pass AFS/DCE tokens from the submission host to the execution host. LSF passes data from the standard output of esub to the standard input of eexec. A daemon wrapper script can be used to renew the tokens.
Job submission and execution controls commands

Commands for submission

bsub -a esub_application [esub_application] …
• Specifies one or more esub executables to run at job submission
• For example, to specify the esub named esub.

brestart, lsrun, lsgrun
• …

bmod -an
• Dissociates from a job all esub executables that were previously associated with the job
• LSF runs any esub executables defined by LSB_ESUB_METHOD, followed by the executable named "esub" if it exists in LSF_SERVERDIR
• LSF runs eexec if an executable file with that name exists in LSF_SERVERDIR

Commands to display configuration

badmin showconf
• Displays all configured parameters and their values set in lsf.conf or ego.conf that affect mbatchd and sbatchd.
Feature: Job migration

The job migration feature enables you to move checkpointable and rerunnable jobs from one host to another. Job migration makes use of job checkpoint and restart so that a migrated checkpointable job restarts on the new host from the point at which the job stopped on the original host.
With automatic job migration enabled

Scope

Operating system:
• UNIX
• Linux
• Windows

Job types:
• Non-interactive batch jobs submitted with bsub or bmod, including chunk jobs

Dependencies:
• UNIX and Windows user accounts must be valid on all hosts in the cluster, or the correct type of account mapping must be enabled:
  • For a mixed UNIX/Windows cluster, UNIX/Windows user account mapping must be enabled
  • For a cluster with a non-uniform user name space, between-host account mapping must be enabled
  • For a MultiCluster environment with a non-uniform user name space, cross-cluster user account mapping must be enabled
• Both the original
Feature: Job migration Configuration file Parameter and syntax Behavior lsb.queues CHKPNT=chkpnt_dir [chkpnt_period] • All jobs submitted to the queue are checkpointable. The specified checkpoint directory must already exist. LSF will not create the checkpoint directory. • The user account that submits the job must have read and write permissions for the checkpoint directory.
Configuration file | Parameter and syntax | Behavior
CHKPNT_PERIOD=chkpnt_period
CHKPNT_METHOD=chkpnt_method

Configuration to enable automatic job migration

Automatic job migration assumes that if a job is system-suspended (SSUSP) for an extended period of time, the execution host is probably heavily loaded. Configuring a queue-level or host-level migration threshold lets the job resume on another, less loaded host, and reduces the load on the original host.
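A queue-level migration threshold is set with the MIG parameter in lsb.queues (lsb.hosts takes the same parameter for a host-level threshold). The sketch below is illustrative only: the queue name and the 30-minute value are examples, not required settings.

```
Begin Queue
QUEUE_NAME = normal
MIG        = 30      # migrate a job suspended (SSUSP) longer than 30 minutes
End Queue
```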
Feature: Job migration Configuration to modify job migration You can configure LSF to requeue a migrating job rather than restart or rerun the job. Configuration file Parameter and syntax Behavior lsf.
Feature: Job migration Commands to control Command Description bmig • Migrates one or more running jobs from one host to another.
Feature: Job checkpoint and restart

The job checkpoint and restart feature enables you to stop jobs and then restart them from the point at which they stopped, which optimizes resource usage. LSF can periodically capture the state of a running job and the data required to restart it. This feature provides fault tolerance and allows LSF administrators and users to migrate jobs from one host to another to achieve load balancing.
Feature: Job checkpoint and restart Default behavior (job checkpoint and restart not enabled) With job checkpoint and restart enabled Kernel-level checkpoint and restart The operating system provides checkpoint and restart functionality that is transparent to your applications and enabled by default. To implement job checkpoint and restart at the kernel level, the LSF echkpnt and erestart executables invoke operating system-specific calls.
Feature: Job checkpoint and restart LSF uses the default executables echkpnt.default and erestart.default for kernel-level checkpoint and restart. User-level checkpoint and restart For systems that do not support kernel-level checkpoint and restart, LSF provides a job checkpoint and restart implementation that is transparent to your applications and does not require you to rewrite code.
Dependencies:
• UNIX and Windows user accounts must be valid on all hosts in the cluster, or the correct type of account mapping must be enabled.
• For a mixed UNIX/Windows cluster, UNIX/Windows user account mapping must be enabled.
• For a cluster with a non-uniform user name space, between-host account mapping must be enabled.
• For a MultiCluster environment with a non-uniform user name space, cross-cluster user account mapping must be enabled.
Feature: Job checkpoint and restart Configuration file Parameter and syntax Behavior lsb.queues CHKPNT=chkpnt_dir [chkpnt_period] • All jobs submitted to the queue are checkpointable. LSF writes the checkpoint files, which contain job state information, to the checkpoint directory. The checkpoint directory can contain checkpoint files for multiple jobs. The specified checkpoint directory must already exist. LSF will not create the checkpoint directory.
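For example, a queue definition along these lines makes every job submitted to the queue checkpointable. The queue name, directory, and period are illustrative values, not defaults.

```
Begin Queue
QUEUE_NAME = chkpnt_queue
CHKPNT     = /share/checkpoints 240   # directory must already exist and be
                                      # readable/writable by the submitting
                                      # user; period is in minutes
End Queue
```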
Important: The erestart.application executable must:
• Have access to the command line used to submit or modify the job
• Exit with a return value without running an application; the erestart interface runs the application to restart the job

Executable file: echkpnt
UNIX naming convention: LSF_SERVERDIR/echkpnt.application
Windows naming convention: LSF_SERVERDIR\echkpnt.application.exe or LSF_SERVERDIR\echkpnt.application.bat

Executable file: erestart
UNIX naming convention: LSF_SERVERDIR/erestart.
A non-zero value indicates that erestart.application failed to write to the .restart_cmd file.
• A return value of 0 indicates that erestart.application successfully wrote to the .restart_cmd file, or that the executable intentionally did not write to the file.
Your executables must recognize the common syntax that the echkpnt and erestart interfaces use to communicate with them.
• echkpnt.
• Migrates the job to a new host using bmig
All checkpoint and restart executables run under the user account of the user who submits the job.
Note: By default, LSF redirects standard error and standard output to /dev/null and discards the data.

Checkpoint directory and files

LSF identifies checkpoint files by the checkpoint directory and job ID.
If the command line is: bsub -k "my_dir"
And: in lsb.applications, CHKPNT_PERIOD=360
Then: LSF saves the checkpoint file to my_dir/job_ID every 360 minutes

If the command line is: bsub -k "240"
And: in lsb.applications, CHKPNT_DIR=app_dir CHKPNT_PERIOD=360
Then: LSF saves the checkpoint file to app_dir/job_ID every 240 minutes

In lsb.
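The precedence rules above can be seen in submission commands such as the following. The job name, directory, and period are hypothetical examples.

```
bsub -k "my_dir" my_job       # directory from -k; period from CHKPNT_PERIOD
bsub -k "240" my_job          # period from -k; directory from CHKPNT_DIR
bsub -k "my_dir 240" my_job   # both directory and period set at job level
```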
Feature: Job checkpoint and restart Configuration to specify the directory for application-level executables By default, LSF looks for application-level checkpoint and restart executables in LSF_SERVERDIR. You can modify this behavior by specifying a different directory as an environment variable or in lsf.conf. Configuration file Parameter and syntax Behavior lsf.conf LSB_ECHKPNT_METHOD_DIR= path • • Specifies the absolute path to the directory that contains the echkpnt.application and erestart.
Feature: Job checkpoint and restart Configuration file Parameter and syntax Behavior lsb.
Feature: Job checkpoint and restart Commands to monitor Command Description bacct -l • • Displays accounting statistics for finished jobs, including termination reasons. TERM_CHKPNT indicates that a job was checkpointed and killed. If JOB_CONTROL is defined for a queue, LSF does not display the result of the action. bhist -l • Displays the actions that LSF took on a completed job, including job checkpoint, restart, and migration to another host.
Feature: Job checkpoint and restart Commands to display configuration Command Description bqueues -l • Displays information about queues configured in lsb.queues, including the values defined for checkpoint directory and checkpoint period. Note: The bqueues command displays the checkpoint period in seconds; the lsb.queues CHKPNT parameter defines the checkpoint period in minutes. badmin showconf • Displays all configured parameters and their values set in lsf.conf or ego.
Feature: External load indices

External load indices report the values of dynamic external resources. A dynamic external resource is a customer-defined resource with a numeric value that changes over time, such as the space available in a directory. Use the external load indices feature to make the values of dynamic external resources available to LSF, or to override the values reported for an LSF built-in load index.
Default behavior (feature not enabled)
With external load indices enabled
Scope

Operating system: UNIX, Windows, a mix of UNIX and Windows hosts
Dependencies:
• UNIX and Windows user accounts must be valid on all hosts in the cluster and must have the correct permissions to successfully run jobs.
• All elim executables run under the same user account as the load information manager (LIM); by default, the LSF administrator (lsfadmin) account.
• External dynamic resources (host-based or shared) must be defined in lsf.
Configuration file: lsf.shared

Parameter: RESOURCENAME resource_name
• Specifies the name of the external resource.

Parameter: TYPE Numeric
• Specifies the type of external resource: Numeric resources have numeric values. Specify Numeric for all dynamic resources.

Parameter: INTERVAL seconds
• Specifies the interval for data collection by an elim.
Map an external resource

Once external resources are defined in lsf.shared, they must be mapped to hosts in the ResourceMap section of lsf.cluster.cluster_name.

Configuration file: lsf.cluster.cluster_name
Parameter: RESOURCENAME resource_name
• Specifies the name of the external resource as defined in the Resource section of lsf.shared.
Operating system: Windows
Naming convention: LSF_SERVERDIR\elim.application.exe or LSF_SERVERDIR\elim.application.bat

Restriction: The name elim.user is reserved for backward compatibility. Do not use the name elim.user for your application-specific elim.
Note: LSF invokes any elim that follows this naming convention; move backup copies out of LSF_SERVERDIR or choose a name that does not follow the convention. For example, use elim_backup instead of elim.backup.
Feature: External load indices is programmed to report values for the resources expected on the host. For detailed information about using a checking header, see the section How environment variables determine elim hosts. Overriding built-in load indices An elim executable can be used to override the value of a built-in load index. For example, if your site stores temporary files in the /usr/tmp directory, you might want to monitor the amount of space available in that directory.
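An elim that overrides a built-in index reports to stdout in the standard elim format, number_indices followed by name/value pairs. The sketch below is illustrative: it samples /tmp (substitute /usr/tmp on systems that have it), the df parsing is an assumption about your platform's output, and a production elim would loop forever, sleeping between reports according to the configured INTERVAL.

```shell
#!/bin/sh
# Sketch of an elim that overrides the built-in "tmp" load index with the
# free space (in MB) of a monitored directory.
# Output format: number_indices index_name index_value
tmp_dir=/tmp                                  # /usr/tmp where applicable
free_mb=$(df -Pm "$tmp_dir" | awk 'NR==2 {print $4}')
echo "1 tmp $free_mb"
```

A real elim would wrap the last two lines in `while true; do ...; sleep $interval; done`.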
Feature: External load indices External load indices behavior How LSF manages multiple elim executables The LSF administrator can write one elim executable to collect multiple external load indices, or the LSF administrator can divide external load index collection among multiple elim executables. On each host, the load information manager (LIM) starts a master elim (MELIM), which manages all elim executables on the host and reports the external load index values to the LIM.
If the specified LOCATION is: ([all]) | ([all ~host_name …])
Then the elim executables start on: the master host, because all hosts in the cluster (except those identified by the not operator [~]) share a single instance of the external resource.

If the specified LOCATION is: [default]
Then the elim executables start on: every host in the cluster, because the default setting identifies the external resource as host-based.
Feature: External load indices LSF_RESOURCES. The MELIM does not restart an elim that exits with ELIM_ABORT_VALUE. The following sample code shows how to use a header to verify that an elim is programmed to collect load indices for the resources expected on the host. If the elim is not programmed to report on the requested resources, the elim does not need to run on the host.
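Such a checking header can be sketched as follows. The resource names in MY_RESOURCES are illustrative, LSF_RESOURCES is supplied in the elim's environment by the MELIM, and the fallback exit value 97 is only a placeholder used when ELIM_ABORT_VALUE is not set.

```shell
#!/bin/sh
# Checking-header sketch: exit with ELIM_ABORT_VALUE if none of the
# resources the MELIM requests (LSF_RESOURCES) is one this elim reports,
# so the MELIM does not restart it. Names are illustrative.
MY_RESOURCES="scratch tmp2"      # indices this elim knows how to report

elim_should_run() {
  # $1: space-separated list of requested resource names
  for r in $1; do
    case " $MY_RESOURCES " in
      *" $r "*) return 0 ;;
    esac
  done
  return 1
}

if [ -n "${LSF_RESOURCES:-}" ] && ! elim_should_run "$LSF_RESOURCES"; then
  exit "${ELIM_ABORT_VALUE:-97}"   # fallback value is illustrative
fi
# ... the normal elim reporting loop would follow here ...
```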
External load indices commands

Commands to submit workload

Command: bsub -R "res_req" [-R "res_req"] …
• Runs the job on a host that meets the specified resource requirements.
• If you specify a value for a dynamic external resource in the resource requirements string, LSF uses the most recent values provided by your elim executables for host selection.
Feature: External load indices Command Description bhosts -s shared_resource_name … • Displays configuration information for the specified resources.
Feature: External host and user groups

Use the external host and user groups feature to maintain group definitions for your site in a location external to LSF, and to import the group definitions on demand.

About external host and user groups

LSF provides you with the option to configure host groups, user groups, or both.
With external host and user groups enabled

Scope

Operating system: UNIX, Windows, a mix of UNIX and Windows hosts
Dependencies:
• UNIX and Windows user accounts must be valid on all hosts in the cluster and must have the correct permissions to successfully run jobs.
• You must reconfigure the cluster using badmin reconfig each time you want to run the egroup executable to retrieve host or user group members.
Configuration to enable external host and user groups

To enable the use of external host and user groups, you must:
• Define the host group in lsb.hosts, or the user group in lsb.users, and put an exclamation mark (!) in the GROUP_MEMBER column.
• Create an egroup executable in the directory specified by the parameter LSF_SERVERDIR in lsf.conf. LSF does not include a default egroup; you should write your own executable to meet the requirements of your site.
Operating system: UNIX
Naming convention: LSF_SERVERDIR/egroup

Operating system: Windows
Naming convention: LSF_SERVERDIR\egroup.exe or LSF_SERVERDIR\egroup.bat

The egroup executable must:
• Run when invoked by the commands egroup -m hostgroup_name and egroup -u usergroup_name. When mbatchd finds an exclamation mark (!) in the GROUP_MEMBER column of lsb.hosts or lsb.users, mbatchd runs the egroup command to invoke your egroup executable.
• Output a space-delimited list of group members (hosts, users, or both) to stdout.
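A sketch of an egroup that satisfies these requirements by reading membership from a flat file follows. The file path and its one-group-per-line format are invented for illustration; a real egroup can fetch members from any external source (LDAP, a database, and so on) as long as it prints a space-delimited member list to stdout.

```shell
#!/bin/sh
# egroup sketch: invoked as "egroup -m hostgroup_name" or
# "egroup -u usergroup_name"; prints space-delimited members to stdout.
# Hypothetical membership file format: group_name member1 member2 ...
egroup_members() {
  # $1 = -m or -u, $2 = group name, $3 = membership file (illustrative)
  case "$1" in
    -m|-u) ;;
    *) return 1 ;;
  esac
  awk -v g="$2" '$1 == g { sub(/^[^ \t]+[ \t]+/, ""); print }' "$3"
}
```

A real egroup would end with something like `egroup_members "$1" "$2" /path/to/groups.db`.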
Feature: External host and user groups External host and user groups commands Commands to submit workload Command Description bsub -m host_group • Submits a job to run on any host that belongs to the specified host group. bsub -G user_group • For fairshare scheduling only. Associates the job with the specified group. Specify any group that you belong to that does not contain subgroups.
P A R T II Configuration Files
bld.license.acct

The bld.license.acct file is the license and accounting file for LSF License Scheduler.

bld.license.acct structure

The license accounting log file is an ASCII file with one record per line. The fields of a record are separated by blanks. LSF License Scheduler adds a new record to the file every hour.

File properties

Location
The default location of this file is LSF_SHAREDIR/db. Use LSF_LICENSE_ACCT_PATH in lsf.conf to specify another location.
Example record format
1107961731 LICENSE_SCHEDULER 7.0 0 OK 335a33c2bd9c9428140a61e57bd06da02b623a42
1107961792 LICENSE_SCHEDULER 7.0 2 OK 58e45b891f371811edfcceb6f5270059a74ee31a
1126639979 LICENSE_SCHEDULER 7.0 0 5 OK b3efd43ee28346f2d125b445fd16aa96875da35
1126640028 LICENSE_SCHEDULER 7.0 6 5 OVERUSE 2865775920372225fa7f8ed4b9a8eb2b15

See also
• LSF_LOGDIR in lsf.conf
• LSF_LICENSE_ACCT_PATH in lsf.conf
• lsf.cluster_name.license.
cshrc.lsf and profile.lsf

About cshrc.lsf and profile.lsf

The user environment shell files cshrc.lsf and profile.lsf set the LSF operating environment on an LSF host. They define machine-dependent paths to LSF commands and libraries as environment variables:
• cshrc.lsf sets the C shell (csh or tcsh) user environment for LSF commands and libraries
• profile.
• EGO_BINDIR
• EGO_CONFDIR
• EGO_ESRVDIR
• EGO_LIBDIR
• EGO_LOCAL_CONFDIR
• EGO_SERVERDIR
• EGO_TOP
See the Platform EGO Reference for more information about these variables.

Setting the LSF environment with cshrc.lsf and profile.lsf

Before using LSF, you must set the LSF execution environment. After logging on to an LSF host, use one of the following shell environment files to set your LSF environment. For example, in csh or tcsh:
source /usr/share/lsf/lsf_7/conf/cshrc.
These variable settings are an example only. Your system may set additional variables.

For sh, ksh, or bash

Add profile.lsf to the end of the .profile file for all users:
• Copy the profile.lsf file into .profile, or
• Add a line similar to the following to the end of .profile:
. /usr/share/lsf/lsf_7/conf/profile.lsf
After running profile.lsf, use the set command to see the environment variable settings. For example:
set
...
LD_LIBRARY_PATH=/usr/share/lsf/lsf_7/7.0/linux2.6-glibc2.
cshrc.lsf and profile.lsf Values • In cshrc.lsf for csh and tcsh: setenv LSF_BINDIR $LSF_TOP/$LSF_VERSION/$BINARY_TYPE/bin • Set and exported in profile.lsf for sh, ksh, or bash: LSF_BINDIR=$LSF_TOP/$LSF_VERSION/$BINARY_TYPE/bin LSF_ENVDIR Syntax LSF_ENVDIR=dir Description Directory containing the lsf.conf file. By default, lsf.conf is installed by creating a shared copy in LSF_CONFDIR and adding a symbolic link from /etc/lsf.conf to the shared copy.
cshrc.lsf and profile.lsf Values • In cshrc.lsf for csh and tcsh: setenv LSF_LIBDIR $LSF_TOP/$LSF_VERSION/$BINARY_TYPE/lib • Set and exported in profile.lsf for sh, ksh, or bash: LSF_LIBDIR=$LSF_TOP/$LSF_VERSION/$BINARY_TYPE/lib LSF_SERVERDIR Syntax LSF_SERVERDIR=dir Description Directory where LSF server binaries and shell scripts are installed. These include lim, res, nios, sbatchd, mbatchd, and mbschd. If you use elim, eauth, eexec, esub, etc, they are also installed in this directory.
cshrc.lsf and profile.lsf Values • In cshrc.lsf for csh and tcsh: setenv XLSF_UIDDIR $LSF_TOP/$LSF_VERSION/$BINARY_TYPE/lib/uid • Set and exported in profile.lsf for sh, ksh, or bash: XLSF_UIDDIR=$LSF_TOP/$LSF_VERSION/$BINARY_TYPE/lib/uid Platform EGO environment variables set by cshrc.lsf and profile.lsf See the Platform EGO Reference for more information about these variables. EGO_BINDIR Syntax EGO_BINDIR=dir Description Directory where Platform EGO user commands are installed.
Values
• In cshrc.lsf for csh and tcsh:
setenv EGO_CONFDIR /usr/share/lsf/lsf_7/conf/ego/lsf1.2.3/kernel
• Set and exported in profile.lsf for sh, ksh, or bash:
EGO_CONFDIR=/usr/share/lsf/lsf_7/conf/ego/lsf1.2.3/kernel

EGO_ESRVDIR

Syntax
EGO_ESRVDIR=dir

Description
Directory where the EGO service controller configuration files are stored.

Examples
• Set in csh and tcsh by cshrc.
cshrc.lsf and profile.lsf • Set and exported in profile.lsf for sh, ksh, or bash: EGO_LIBDIR=$LSF_LIBDIR EGO_LOCAL_CONFDIR Syntax EGO_LOCAL_CONFDIR=dir Description The local EGO configuration directory containing the ego.conf file. Examples • Set in csh and tcsh by cshrc.lsf: setenv EGO_LOCAL_CONFDIR /usr/share/lsf/lsf_7/conf/ego/lsf1.2.3/kernel • Set and exported in sh, ksh, or bash by profile.lsf: EGO_LOCAL_CONFDIR=/usr/share/lsf/lsf_7/conf/ego/lsf1.2.3/kernel Values • In cshrc.
EGO_TOP

Syntax
EGO_TOP=dir

Description
The top-level installation directory. The path to EGO_TOP must be shared and accessible to all hosts in the cluster. Equivalent to LSF_TOP.

Examples
• Set in csh and tcsh by cshrc.lsf:
setenv EGO_TOP /usr/share/lsf/lsf_7
• Set and exported in sh, ksh, or bash by profile.lsf:
EGO_TOP=/usr/share/lsf/lsf_7

Values
• In cshrc.lsf for csh and tcsh:
setenv EGO_TOP /usr/share/lsf/lsf_7
• Set and exported in profile.
hosts

For hosts with multiple IP addresses and different official host names configured at the system level, this file associates the host names and IP addresses in LSF.
hosts Specify -TAC as the last part of the host name if the host is a TAC and is a DoD host. Specify the host name in the format defined in Internet RFC 952, which states: A “name” (Net, Host, Gateway, or Domain name) is a text string up to 24 characters drawn from the alphabet (A-Z), digits (0-9), minus sign (-), and period (.). Periods are only allowed when they serve to delimit components of “domain style names”. (See RFC 921, “Domain Name System Implementation Schedule”, for background).
hosts You can use either an IPv4 or an IPv6 format for the IP address (if you define the parameter LSF_ENABLE_SUPPORT_IPV6 in lsf.conf). Example hosts file IPv4 example 192.168.1.1 hostA hostB 192.168.2.2 hostA hostC host-C In this example, hostA has 2 IP addresses and 3 aliases. The alias hostB specifies the first address, and the aliases hostC and host-C specify the second address. LSF uses the official host name, hostA, to identify that both IP addresses belong to the same host.
install.config

About install.config

The install.config file contains options for LSF installation and configuration. Use lsfinstall -f install.config to install LSF using the options specified in install.config.

Template location

A template install.config is included in the installation script tar file lsf7Update3_lsfinstall.tar.Z and is located in the lsf7Update3_lsfinstall directory created when you uncompress and extract the installation script tar file.
install.config • • • • • • • LSF_QUIET_INST LSF_TARDIR LSF_TOP PATCH_BACKUP_DIR PATCH_HISTORY_DIR PERF_HOST PMC_HOST DERBY_DB_HOST Syntax DERBY_DB_HOST="host_name" Description Reporting database host. This parameter is used when you install the Platform Management Console package for the first time, and is ignored for all other cases. Specify the name of a reliable host where the Derby database for Reporting data collection will be installed. You must specify a host from LSF_MASTER_LIST.
install.config EGO_PERF_CONTROL Syntax EGO_PERF_CONTROL="Y" | "N" Description Enables EGO Service Controller to control PERF daemons. Set the value to "N" if you want to control PERF daemons manually. If you do this, you must define PERF_HOST in this file. Note: If you specify EGO_ENABLE="N", this parameter is ignored. Note: This parameter only takes effect when you install the Platform Management Console package for the first time.
ENABLE_DYNAMIC_HOSTS

Syntax
ENABLE_DYNAMIC_HOSTS="Y" | "N"

Description
Enables dynamically adding and removing hosts. Set the value to "Y" if you want to allow dynamically added hosts. If you enable dynamic hosts, any host can connect to the cluster. To enable security, configure LSF_HOST_ADDR_RANGE in lsf.cluster.cluster_name after installation and restrict the hosts that can connect to your cluster.
install.config Description Set the value to "Y" to add LSF HPC configuration parameters to the cluster. Default N (Platform LSF HPC is disabled.) EP_BACKUP Syntax EP_BACKUP="Y" | "N" Description Enables backup and rollback for enhancement packs. Set the value to "N" to disable backups when installing enhancement packs (you will not be able to roll back to the previous patch level after installing an EP, but you will still be able to roll back any fixes installed on the new EP).
install.config The file lsf_server_hosts contains a list of hosts: hosta hostb hostc hostd Default Only hosts in LSF_MASTER_LIST are LSF servers. LSF_ADD_CLIENTS Syntax LSF_ADD_CLIENTS="host_name [ host_name...]" Description List of LSF client-only hosts. Tip: After installation, you must manually edit lsf.cluster.cluster_name to include the host model and type of each client listed in LSF_ADD_CLIENTS. Valid Values Any valid LSF host name.
install.config The first user account name in the list is the primary LSF administrator. It cannot be the root user account. Typically this account is named lsfadmin. It owns the LSF configuration files and log files for job events. It also has permission to reconfigure LSF and to control batch jobs submitted by other users. It typically does not have authority to start LSF daemons. Usually, only root has permission to start LSF daemons.
LSF_DYNAMIC_HOST_WAIT_TIME

Syntax
LSF_DYNAMIC_HOST_WAIT_TIME=seconds

Description
Time in seconds the slave LIM waits after startup before calling the master LIM to add the slave host dynamically. This parameter only takes effect if you set ENABLE_DYNAMIC_HOSTS="Y" in this file. If the slave LIM receives the master announcement while it is waiting, it does not call the master LIM to add itself.

Recommended value
Up to 60 seconds for every 1000 hosts in the cluster, for a maximum of 15 minutes.
install.config Default The parent directory of the current working directory. For example, if lsfinstall is running under usr/share/lsf_distrib/lsf_lsfinstall the LSF_LICENSE default value is usr/share/lsf_distrib/license.dat. LSF_MASTER_LIST Syntax LSF_MASTER_LIST="host_name [ host_name ...]" Description Required for a first-time installation. List of LSF server hosts to be master or master candidates in the cluster. You must specify at least one valid server host to start the cluster.
install.config LSF_TARDIR Syntax LSF_TARDIR="/path" Description Full path to the directory containing the LSF distribution tar files. Example LSF_TARDIR="/usr/share/lsf_distrib" Default The parent directory of the current working directory. For example, if lsfinstall is running under usr/share/lsf_distrib/lsf_lsfinstall the LSF_TARDIR default value is usr/share/lsf_distrib. LSF_TOP Syntax LSF_TOP="/path" Description Required. Full path to the top-level LSF installation directory.
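Putting the required and most commonly set parameters together, a minimal install.config might look like the following. All host names, paths, and the cluster name are examples only; adjust them for your site.

```
# Minimal hypothetical install.config
LSF_TOP="/usr/share/lsf"
LSF_ADMINS="lsfadmin"
LSF_CLUSTER_NAME="cluster1"
LSF_MASTER_LIST="hosta hostb"
LSF_TARDIR="/usr/share/lsf_distrib"
LSF_LICENSE="/usr/share/lsf_distrib/license.dat"
```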
install.config The file system containing the patch backup directory must have sufficient disk space to back up your files (approximately 400 MB per binary type if you want to be able to install and roll back one enhancement pack and a few additional fixes). It cannot be the root directory (/). If the directory already exists, it must be writable by the cluster administrator (lsfadmin). If you need to change the directory after installation, edit PATCH_BACKUP_DIR in LSF_TOP/patch.
install.config Note: This parameter only takes effect when you install the Platform Management Console package for the first time. Example PERF_HOST="hostp" Default Undefined. PMC_HOST Syntax PMC_HOST="host_name" Description Dedicated host for Platform Management Console. Required if EGO_PMC_CONTROL="N". To allow failover, we recommend that you leave this parameter undefined when EGO control is enabled for the Platform Management Console.
lim.acct

The lim.acct file is the log file for the Load Information Manager (LIM). Produced by lsmon, lim.acct contains host load information collected and distributed by LIM.

lim.acct structure

The first line of lim.acct contains a list of load index names separated by spaces. This list of load index names can be specified in the lsmon command line. The default list is "r15s r1m r15m ut pg ls it swp mem tmp".
lsb.acct

The lsb.acct file is the batch job log file of LSF. The master batch daemon (see mbatchd(8)) generates a record for each job completion or failure. The record is appended to the job log file lsb.acct. The file is located in LSB_SHAREDIR/cluster_name/logdir, where LSB_SHAREDIR must be defined in lsf.conf(5) and cluster_name is the name of the LSF cluster, as returned by lsid(1). See mbatchd(8) for the description of LSB_SHAREDIR. The bacct command uses the current lsb.
lsb.acct JOB_FINISH A job has finished. If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, older daemons and commands (pre-LSF Version 6.0) cannot recognize the lsb.acct file format.
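Because the file is plain ASCII, simple text tools can post-process it. The sketch below assumes the usual layout, in which each record is a single line whose first field is the quoted event type (for example, "JOB_FINISH" followed by the version and the fields described below); the function name is invented for illustration.

```shell
#!/bin/sh
# Count JOB_FINISH records in an lsb.acct file. Assumes one record per
# line with the quoted event type as the first field.
count_job_finish() {
  grep -c '^"JOB_FINISH"' "$1"
}
```

Usage would be `count_job_finish /path/to/lsb.acct`; the same pattern extends to other event types such as EVENT_ADRSV_FINISH.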
lsb.
lsb.acct hostFactor (%f) CPU factor of the first execution host jobName (%s) Job name (up to 4094 characters for UNIX or 255 characters for Windows) command (%s) Complete batch job command specified by the user (up to 4094 characters for UNIX or 255 characters for Windows) lsfRusage (%f) The following fields contain resource usage information for the job (see getrusage (2)). If the value of some field is unavailable (due to job exit or the difference among the operating systems), -1 will be logged.
lsb.
lsb.
lsb.acct runtimeEstimation (%d) Estimated run time for the job jobGroupName (%s) Job group name EVENT_ADRSV_FINISH An advance reservation has expired.
lsb.
lsb.applications

The lsb.applications file defines application profiles. Use application profiles to define common parameters for the same type of jobs, including the execution requirements of the applications, the resources they require, and how they should be run and managed. This file is optional. Use the DEFAULT_APPLICATION parameter in lsb.params to specify a default application profile for all jobs. LSF does not automatically assign a default application profile.
lsb.
Default
Not defined. Run limit and runtime estimate are normalized.

BIND_JOB

Syntax
BIND_JOB=Y | N

Description
When BIND_JOB=Y, LSF job process processor binding is enabled for all sequential jobs submitted to the application profile. On Linux execution hosts that support this feature, job processes are hard bound to selected processors. If the processor binding feature is not configured with the BIND_JOB parameter in an application profile in lsb.applications, the lsf.
lsb.applications If checkpoint-related configuration is specified in the queue, application profile, and at job level: • • Application-level and job-level parameters are merged. If the same parameter is defined at both job-level and in the application profile, the job-level value overrides the application profile value. The merged result of job-level and application profile settings override queue-level configuration.
lsb.applications Description Specifies the checkpoint period for the application in minutes. CHKPNT_DIR must be set in the application profile for this parameter to take effect. The running job is checkpointed automatically every checkpoint period. Specify a positive integer. Job-level command line values override the application profile and queue level configurations. Application profile level configuration overrides the queue level configuration.
lsb.applications • • Reduces communication between sbatchd and mbatchd and reduces scheduling overhead in mbschd. Increases job throughput in mbatchd and CPU utilization on the execution hosts. However, throughput can deteriorate if the chunk job size is too big. Performance may decrease on profiles with CHUNK_JOB_SIZE greater than 30. You should evaluate the chunk job size on your own systems for best performance.
lsb.applications Description Normalized CPU time allowed for all processes of a job running in the application profile. The name of a host or host model specifies the CPU time normalization host to use. Limits the total CPU time the job can use. This parameter is useful for preventing runaway jobs or jobs that use up too many resources. When the total CPU time for the whole job has reached the limit, a SIGXCPU signal is sent to all processes belonging to the job.
lsb.applications Default Unlimited DESCRIPTION Syntax DESCRIPTION=text Description Description of the application profile. The description is displayed by bapp -l. The description should clearly describe the service features of the application profile to help users select the proper profile for each job. The text can include any characters, including white space. The text can be extended to multiple lines by ending the preceding line with a backslash (\).
lsb.applications DJOB_ENV_SCRIPT Syntax DJOB_ENV_SCRIPT=script_name Description Defines the name of a user-defined script for setting and cleaning up the parallel or distributed job environment. The specified script must support a setup argument and a cleanup argument. The script is executed by LSF with the setup argument before launching a parallel or distributed job, and with argument cleanup after the job is finished. The script runs as the user, and is part of the job.
lsb.applications DJOB_RU_INTERVAL Syntax DJOB_RU_INTERVAL=seconds Description Value in seconds used to calculate the resource usage update interval for the tasks of a parallel or distributed job. This parameter only applies to the blaunch distributed application framework. When DJOB_RU_INTERVAL is specified, the interval is scaled according to the number of tasks in the job: max(DJOB_RU_INTERVAL, 10) + host_factor where host_factor = 0.01 * number of hosts allocated for the job Default Not defined.
JOB_POSTPROC_TIMEOUT

Syntax
JOB_POSTPROC_TIMEOUT=minutes

Description
Specifies a timeout in minutes for job post-execution processing. The specified timeout must be greater than zero. If post-execution processing takes longer than the timeout, sbatchd reports that post-execution has failed (POST_ERR status), and kills the process group of the job’s post-execution processes. Only the parent process of the post-execution command is killed when the timeout expires.
lsb.applications By default, the user commands run after the job starter. A special string, %USRCMD, can be used to represent the position of the user’s job in the job starter command line. The %USRCMD string and any additional commands must be enclosed in quotation marks (" "). Example JOB_STARTER=csh -c "%USRCMD;sleep 10" In this case, if a user submits a job with bsub myjob arguments, the command that actually runs is: csh -c "myjob arguments;sleep 10" Default Not defined.
lsb.applications MAX_PREEXEC_RETRY Syntax MAX_PREEXEC_RETRY=integer Description The maximum number of times to attempt the pre-execution command of a job. Valid values 0 < MAX_PREEXEC_RETRY < INFINIT_INT INFINIT_INT is defined in lsf.h. Default 5 MEMLIMIT Syntax MEMLIMIT=integer Description The per-process (soft) process resident set size limit for all of the processes belonging to a job running in the application profile.
lsb.applications • • Sun Solaris 2.x Windows LSF memory limit enforcement To enable LSF memory limit enforcement, set LSB_MEMLIMIT_ENFORCE in lsf.conf to y. LSF memory limit enforcement explicitly sends a signal to kill a running process once it has allocated memory past MEMLIMIT. You can also enable LSF memory limit enforcement by setting LSB_JOB_MEMLIMIT in lsf.conf to y.
lsb.applications • • • • • JOB: Applies a memory limit identified in a job and enforced by LSF. When the sum of the memory allocated to all processes of the job exceeds the memory limit, LSF kills the job. PROCESS TASK: Enables both process-level memory limit enforced by OS and task-level memory limit enforced by LSF. PROCESS JOB: Enables both process-level memory limit enforced by OS and job-level memory limit enforced by LSF.
lsb.applications Description Required. Unique name for the application profile. Specify any ASCII string up to 60 characters long. You can use letters, digits, underscores (_), dashes (-), periods (.) or spaces in the name. The application profile name must be unique within the cluster. Note: If you want to specify the ApplicationVersion in a JSDL file, include the version when you define the application profile name.
lsb.applications For example, if the job run limit is 60 minutes and NO_PREEMPT_RUN_TIME=50%, the job cannot be preempted after it has been running for 30 minutes or longer. Requires that a run time (bsub -We or RUNTIME in lsb.applications) or a run limit (bsub -W, RUNLIMIT in lsb.queues, or RUNLIMIT in lsb.applications) be specified for the job. POST_EXEC Syntax POST_EXEC=command Description Enables post-execution processing at the application level.
lsb.applications $USER_POSTEXEC when LSB_PRE_POST_EXEC_USER=root. For Windows: • • • The pre- and post-execution commands run under cmd.exe /c The standard input, standard output, and standard error are set to NULL The PATH is determined by the setup of the LSF Service Note: For post-execution commands that execute on a Windows Server 2003, x64 Edition platform, users must have read and execute privileges for cmd.exe. Default Not defined.
lsb.applications • • • The pre- and post-execution commands run under cmd.exe /c The standard input, standard output, and standard error are set to NULL The PATH is determined by the setup of the LSF Service Note: For pre-execution commands that execute on a Windows Server 2003, x64 Edition platform, users must have read and execute privileges for cmd.exe. Default Not defined. No pre-execution commands are associated with the application profile.
lsb.applications • • Two limits—The first is the minimum processor limit, and the second is the maximum. The default is set equal to the minimum. The minimum must be less than or equal to the maximum. Three limits—The first is the minimum processor limit, the second is the default processor limit, and the third is the maximum. The minimum must be less than or equal to the default, and the default must be less than or equal to the maximum.
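A three-value processor limit might be written as follows in an application profile (the PROCLIMIT parameter name and the values shown are illustrative of the form described above):

```
# lsb.applications
PROCLIMIT=4 8 16   # minimum 4, default 8, maximum 16 processors
```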
lsb.applications Default Not defined. Jobs in the application profile are not requeued. RERUNNABLE Syntax RERUNNABLE=yes | no Description If yes, enables automatic job rerun (restart) for any job associated with the application profile. Rerun is disabled when RERUNNABLE is set to no. The yes and no arguments are not case-sensitive. Members of a chunk job can be rerunnable.
lsb.applications When LSF_STRICT_RESREQ=Y in lsf.conf, LSF rejects resource requirement strings where an rusage section contains a non-consumable resource. select section The select section defined at the application, queue, and job level must all be satisfied. rusage section The rusage section can specify additional requests. To do this, use the OR (||) operator to separate additional rusage strings. The job-level rusage section takes precedence.
lsb.applications Define span[hosts=-1] in the application profile or in the bsub -R resource requirement string to disable the span section setting in the queue. Default select[type==local] order[r15s:pg] If this parameter is defined and a host model or Boolean resource is specified, the default type is any. RESUME_CONTROL Syntax RESUME_CONTROL=signal | command Remember: Unlike the JOB_CONTROLS parameter in lsb.queues, the RESUME_CONTROL parameter does not require square brackets ([ ]) around the action.
lsb.applications RTASK_GONE_ACTION Syntax RTASK_GONE_ACTION="[KILLJOB_TASKDONE | KILLJOB_TASKEXIT] [IGNORE_TASKCRASH]" Description Defines the actions LSF should take if it detects that a remote task of a parallel or distributed job is gone. This parameter only applies to the blaunch distributed application framework. IGNORE_TASKCRASH A remote task crashes. LSF does nothing. The job continues to launch the next task. KILLJOB_TASKDONE A remote task exits with zero value.
lsb.applications maximum run limit are rejected. Application-level limits override any default limit specified in the queue. Note: If you want to provide an estimated run time for scheduling purposes without killing jobs that exceed the estimate, define the RUNTIME parameter in the application profile, or submit the job with -We instead of a run limit. The run limit is in the form of [hour:]minute. The minutes can be specified as a number greater than 59.
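The two equivalent forms of the run limit described above can be sketched as (values illustrative):

```
# lsb.applications
RUNLIMIT=3:30    # 3 hours 30 minutes
RUNLIMIT=210     # equivalent, since minutes may be greater than 59
```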
lsb.applications The job-level runtime estimate specified by bsub -We overrides the RUNTIME setting in an application profile. The following LSF features use the RUNTIME value to schedule jobs: • • • • • Job chunking Advanced reservation SLA Slot reservation Backfill Default Not defined STACKLIMIT Syntax STACKLIMIT=integer Description The per-process (soft) stack segment size limit for all of the processes belonging to a job from this queue (see getrlimit(2)).
lsb.applications SUSPEND_CONTROL Syntax SUSPEND_CONTROL=signal | command | CHKPNT Remember: Unlike the JOB_CONTROLS parameter in lsb.queues, the SUSPEND_CONTROL parameter does not require square brackets ([ ]) around the action. • • signal is a UNIX signal name (for example, SIGTSTP). The specified signal is sent to the job. The same set of signals is not supported on all UNIX systems.
lsb.applications after being suspended by LSF. For example, SUSPEND_CONTROL=bkill $LSB_JOBPIDS; command Default • • On UNIX, by default, SUSPEND sends SIGTSTP for parallel or interactive jobs and SIGSTOP for other jobs. On Windows, actions equivalent to the UNIX signals have been implemented to do the default job control actions. Job control messages replace the SIGINT and SIGTERM signals, but only customized applications are able to process them.
lsb.applications Do not specify a signal followed by an action that triggers the same signal. For example, do not specify TERMINATE_CONTROL=bkill. This causes a deadlock between the signal and the action. CHKPNT is a special action, which causes the system to checkpoint the job. The job is checkpointed and killed automatically. • • Description Changes the behavior of the TERMINATE action in LSF.
lsb.applications USE_PAM_CREDS Syntax USE_PAM_CREDS=y | n Description If USE_PAM_CREDS=y, applies PAM limits to an application when its job is dispatched to a Linux host using PAM. PAM limits are system resource limits defined in limits.conf. When USE_PAM_CREDS is enabled, PAM limits override others. If the execution host does not have PAM configured and this parameter is enabled, the job fails. For parallel jobs, only takes effect on the first execution host. Overrides MEMLIMIT_TYPE=Process.
lsb.events lsb.events The LSF batch event log file lsb.events is used to display LSF batch event history and for mbatchd failure recovery. Whenever a host, job, or queue changes status, a record is appended to the event log file. The file is located in LSB_SHAREDIR/cluster_name/logdir, where LSB_SHAREDIR must be defined in lsf.conf(5) and cluster_name is the name of the LSF cluster, as returned by lsid. See mbatchd(8) for the description of LSB_SHAREDIR. The bhist command searches the most current lsb.events file.
lsb.events • • • • • • • • • • • • JOB_REQUEUE JOB_CLEAN JOB_EXCEPTION JOB_EXT_MSG JOB_ATTA_DATA JOB_CHUNK SBD_UNREPORTED_STATUS PRE_EXEC_START JOB_FORCE GRP_ADD GRP_MOD LOG_SWITCH JOB_NEW A new job has been submitted.
lsb.events The time of the event jobId (%d) Job ID numReserHosts (%d) Number of reserved hosts in the remote cluster If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is the number of hosts listed in the reserHosts field. cluster (%s) Remote cluster name reserHosts (%s) List of names of the reserved hosts in the remote cluster If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is logged in a shortened format.
lsb.events The fields in order of occurrence are: Version number (%s) The version number Event time (%d) The time of the event jobId (%d) Job ID jStatus (%d) Job status (4, indicating the RUN status of the job) jobPid (%d) Job process ID jobPGid (%d) Job process group ID hostFactor (%f) CPU factor of the first execution host numExHosts (%d) Number of processors used for execution If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is the number of hosts listed in the execHosts field.
lsb.events Placement information of HPC jobs JOB_START_ACCEPT A job has started on the execution host(s). The fields in order of occurrence are: Version number (%s) The version number Event time (%d) The time of the event jobId (%d) Job ID jobPid (%d) Job process ID jobPGid (%d) Job process group ID idx (%d) Job array index JOB_STATUS The status of a job changed after dispatch.
lsb.events Job completion time ru (%d) Resource usage flag lsfRusage (%s) Resource usage statistics, see exitStatus (%d) Exit status of the job, see idx (%d) Job array index exitInfo (%d) Job termination reason, see duration4PreemptBackfill How long a backfilled job can run; used for preemption backfill jobs JOB_SWITCH A job switched from one queue to another (bswitch).
lsb.events The version number Event time (%d) The time of the event userId (%d) UNIX user ID of the user invoking the command jobId (%d) Job ID position (%d) Position number base (%d) Operation code, (TO_TOP or TO_BOTTOM), see idx (%d) Job array index userName (%s) Name of the job submitter QUEUE_CTRL A job queue has been altered.
lsb.events HOST_CTRL A batch server host changed status. The fields in order of occurrence are: Version number (%s) The version number Event time (%d) The time of the event opCode (%d) Operation code, see host (%s) Host name userId (%d) UNIX user ID of the user invoking the command userName (%s) Name of the user ctrlComments (%s) Administrator comment text from the -C option of badmin host control commands hclose and hopen MBD_START The mbatchd has started.
lsb.events Version number (%s) The version number Event time (%d) The time of the event master (%s) Master host name numRemoveJobs (%d) Number of finished jobs that have been removed from the system and logged in the current event file exitCode (%d) Exit code from mbatchd ctrlComments (%s) Administrator comment text from the -C option of badmin mbdrestart UNFULFILL Actions that were not taken because the mbatchd was unable to contact the sbatchd on the job execution host.
lsb.events If set to true, then parameters for the job cannot be modified. idx (%d) Job array index LOAD_INDEX mbatchd restarted with these load index names (see lsf.cluster(5)). The fields in order of occurrence are: Version number (%s) The version number Event time (%d) The time of the event nIdx (%d) Number of index names name (%s) List of index names JOB_SIGACT An action on a job has been taken.
lsb.events Action status: 1: Action started 2: One action preempted other actions 3: Action succeeded 4: Action Failed signalSymbol (%s) Action name, accompanied by actFlags idx (%d) Job array index MIG A job has been migrated (bmig).
lsb.events The time of the event jobIdStr (%s) Job ID options (%d) Bit flags for job modification options processing options2 (%d) Bit flags for job modification options processing delOptions (%d) Delete options for the options field delOptions2 (%d) Delete options for the options2 field userId (%d) UNIX user ID of the submitter userName (%s) User name submitTime (%d) Job submission time umask (%d) File creation mask for this job numProcessors (%d) Number of processors requested for execution.
lsb.events Name of job queue to which the job was submitted numAskedHosts (%d) Number of candidate host names askedHosts (%s) List of names of candidate hosts for job dispatching; blank if the last field value is 0.
lsb.events Job pre-execution command mailUser (%s) Mail user name projectName (%s) Project name niosPort (%d) Callback port if batch interactive job maxNumProcessors (%d) Maximum number of processors. The value 2147483646 means the maximum number of processors is undefined.
lsb.events Name of the job submitter JOB_EXECUTE This is created when a job is actually running on an execution host.
lsb.events duration4PreemptBackfill How long a backfilled job can run; used for preemption backfill jobs JOB_REQUEUE This is created when a job ends and is requeued by mbatchd. The fields in order of occurrence are: Version number (%s) The version number Event time (%d) The time of the event jobId (%d) Job ID idx (%d) Job array index JOB_CLEAN This is created when a job is removed from the mbatchd memory.
lsb.events exceptMask (%d) Exception Id 0x01: missched 0x02: overrun 0x04: underrun 0x08: abend 0x10: cantrun 0x20: hostfail 0x40: startfail actMask (%d) Action Id 0x01: kill 0x02: alarm 0x04: rerun 0x08: setexcept timeEvent (%d) Time Event, for missched exception specifies when time event ended. exceptInfo (%d) Except Info, pending reason for missched or cantrun exception, the exit code of the job for the abend exception, otherwise 0.
lsb.events Index in the list userId (%d) Unique user ID of the user invoking the command dataSize (%ld) Size of the data if it has any, otherwise 0 postTime (%ld) Message sending time dataStatus (%d) Status of the attached data desc (%s) Text description of the message userName (%s) Name of the author of the message JOB_ATTA_DATA An update on the data status of a message for a job has been sent.
lsb.events JOB_CHUNK This is created when a job is inserted into a chunk. If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, older daemons and commands (pre-LSF Version 6.0) cannot recognize the lsb.events file format.
lsb.events jobPGid (%d) Job process group ID newStatus (%d) New status of the job reason (%d) Pending or suspending reason code, see suspreason (%d) Pending or suspending subreason code, see lsfRusage The following fields contain resource usage information for the job (see getrusage (2)). If the value of some field is unavailable (due to job exit or the difference among the operating systems), -1 will be logged.
lsb.events 4: Action Failed sigValue (%d) Signal value seq (%d) Sequence status of the job idx (%d) Job array index jRusage The following fields contain resource usage information for the job. If the value of some field is unavailable (due to job exit or the difference among the operating systems), -1 will be logged. Times are measured in seconds, and sizes are measured in KB.
lsb.events exitInfo (%d) Job termination reason, see PRE_EXEC_START A pre-execution command has been started.
lsb.events additionalInfo (%s) Placement information of HPC jobs JOB_FORCE A job has been forced to run with brun. Version number (%s) The version number Event time (%d) The time of the event jobId (%d) Job ID userId (%d) UNIX user ID of the user invoking the command idx (%d) Job array index options (%d) Bit flags for job processing numExecHosts (%ld) Number of execution hosts If LSF_HPC_EXTENSIONS="SHORT_EVENTFILE" is specified in lsf.conf, the value of this field is the number of hosts listed in the execHosts field.
lsb.hosts lsb.hosts The lsb.hosts file contains host-related configuration information for the server hosts in the cluster. It is also used to define host groups and host partitions. This file is optional. All sections are optional. By default, this file is installed in LSB_CONFDIR/cluster_name/configdir. Changing lsb.hosts configuration After making any changes to lsb.hosts, run badmin reconfig to reconfigure mbatchd. Host section Description Optional.
lsb.hosts host type A host type defined in lsf.shared. default The reserved host name default indicates all hosts in the cluster not otherwise referenced in the section (by name or by listing its model or type). CHKPNT Description If C, checkpoint copy is enabled. With checkpoint copy, all opened files are automatically copied to the checkpoint directory by the operating system when a process is checkpointed.
lsb.hosts Default Not defined JL/U Description Per-user job slot limit for the host. Maximum number of job slots that each user can use on this host. Example
HOST_NAME  JL/U
hostA      2
Default Unlimited MIG Syntax MIG=minutes Description Enables automatic job migration and specifies the migration threshold for checkpointable or rerunnable jobs, in minutes. LSF automatically migrates jobs that have been in the SSUSP state for more than the specified number of minutes.
lsb.hosts By default, the number of running and suspended jobs on a host cannot exceed the number of job slots. If preemptive scheduling is used, the suspended jobs are not counted as using a job slot. On multiprocessor hosts, to fully use the CPU resource, make the number of job slots equal to or greater than the number of processors.
lsb.hosts 1.6 or the pg index goes above 20. HostA only accepts batch jobs from 19:00 on Friday evening until 8:30 Monday morning and overnight from 20:00 to 8:30 on all other days. For hosts of type SUNSOL, the pg index does not have host-specific thresholds and such hosts are only available overnight from 23:00 to 8:00. The entry with host name default applies to each of the other hosts in the cluster. Each host can run up to two jobs at the same time, with at most one job from each user.
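A Host section along the lines of the example described above might look like the following sketch; the column values not stated in the text (such as MXJ for hostA) are illustrative:

```
Begin Host
HOST_NAME  MXJ  JL/U  r1m      pg     DISPATCH_WINDOW
hostA      1    -     0.6/1.6  10/20  (5:19:00-1:8:30 20:00-8:30)
SUNSOL     1    -     -        -      (23:00-8:00)
default    2    1     -        -      ()
End Host
```

The window 5:19:00-1:8:30 runs from Friday 19:00 to Monday 8:30 (days are numbered from Sunday = 0).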
lsb.hosts Default N (the specified host group is not condensed) GROUP_MEMBER Description A space-delimited list of host names or previously defined host group names, enclosed in one pair of parentheses. You cannot use more than one pair of parentheses to define the list. The names of hosts and host groups can appear on multiple lines because hosts can belong to multiple groups. The reserved name all specifies all hosts in the cluster.
lsb.hosts • • • groupA includes hostA and hostD. groupB includes hostF and hostK, along with all hosts in groupA. The group membership of groupC is defined externally and retrieved by the egroup executable.
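The membership described above can be expressed in a HostGroup section; the (!) notation marks a group whose members are retrieved by the external egroup executable:

```
Begin HostGroup
GROUP_NAME   GROUP_MEMBER
groupA       (hostA hostD)
groupB       (hostF groupA hostK)
groupC       (!)
End HostGroup
```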
lsb.hosts HostPartition section Description Optional. Used with host partition user-based fairshare scheduling. Defines a host partition, which defines a user-based fairshare policy at the host level. Configure multiple sections to define multiple partitions. The members of a host partition form a host group with the same name as the host partition. Limitations on queue configuration • • If you configure a host partition, you cannot configure fairshare at the queue level.
lsb.hosts Description Specifies the hosts in the partition, in a space-separated list. A host cannot belong to multiple partitions. A host group cannot be empty. Hosts that are not included in any host partition are controlled by the FCFS scheduling policy instead of the fairshare scheduling policy. Optionally, use the reserved host name all to configure a single partition that applies to all hosts in a cluster.
lsb.hosts • number_shares • • Specify a positive integer representing the number of shares of the cluster resources assigned to the user. The number of shares assigned to each user is only meaningful when you compare it to the shares assigned to other users or to the total number of shares. The total number of shares is just the sum of all the shares assigned in each share assignment.
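Put together, a HostPartition section with share assignments might look like this sketch (the partition name, host names, and share counts are illustrative):

```
Begin HostPartition
HPART_NAME = Partition1
HOSTS = hostA hostB
USER_SHARES = [groupA@, 3] [groupB, 7] [default, 1]
End HostPartition
```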
lsb.modules lsb.modules The lsb.modules file contains configuration information for LSF scheduler and resource broker modules. The file contains only one section, named PluginModule. This file is optional. If no scheduler or resource broker modules are configured, LSF uses the default scheduler plugin modules named schmod_default and schmod_fcfs. The lsb.modules file is stored in the directory LSB_CONFDIR/cluster_name/configdir, where LSB_CONFDIR is defined in lsf.conf. Changing lsb.modules configuration After making any changes to lsb.modules, run badmin reconfig to reconfigure mbatchd.
lsb.modules SCH_PLUGIN Description Required. The SCH_PLUGIN column specifies the shared module name for the LSF scheduler plugin. Each plugin requires a corresponding license. Scheduler plugins are called in the order they are listed in the PluginModule section. By default, all shared modules for scheduler plugins are located in LSF_LIBDIR. On UNIX, you can also specify a full path to the name of the scheduler plugin.
lsb.modules schmod_preemption Enables the LSF preemption scheduler features. schmod_advrsv Handles jobs that use advance reservations (brsvadd, brsvs, brsvdel, bsub -U) schmod_cpuset Handles jobs that use IRIX cpusets (bsub -extsched "CPUSET[cpuset_options]") The schmod_cpuset plugin name must be configured after the standard LSF plugin names in the PluginModule list.
lsb.modules configuring your modules under SCH_PLUGIN in the PluginModules section of lsb.modules. The directory LSF_TOP/7.0/misc/examples/external_plugin/ contains sample plugin code. See Platform LSF Programmer’s Guide for more detailed information about writing, building, and configuring your own custom scheduler plugins. RB_PLUGIN Description RB_PLUGIN specifies the shared module name for resource broker plugins.
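A minimal PluginModule section loading the default plugins might look like the following; the SCH_DISABLE_PHASES column shown here follows the layout of the shipped default file:

```
Begin PluginModule
SCH_PLUGIN       RB_PLUGIN  SCH_DISABLE_PHASES
schmod_default   ()         ()
schmod_fcfs      ()         ()
End PluginModule
```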
lsb.modules 1. In the order phase, the scheduler applies policies such as FCFS, fairshare, and host partition scheduling, and considers job priorities within user groups and share groups. By default, job priority within a pool of jobs from the same user is based on how long the job has been pending. 2. For resource-intensive jobs (jobs requiring a lot of CPUs or a large amount of memory), resource reservation is performed so that these jobs are not starved. 3.
lsb.params lsb.params The lsb.params file defines general parameters used by the LSF system. This file contains only one section, named Parameters. mbatchd uses lsb.params for initialization. The file is optional. If not present, the LSF-defined defaults are assumed. Some of the parameters that can be defined in lsb.params control timing within the system. The default settings provide good throughput for long-running batch jobs while adding a minimum of processing overhead in the batch daemons.
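A minimal Parameters section might look like the following sketch (the values shown are illustrative, not recommendations):

```
Begin Parameters
DEFAULT_QUEUE=normal
MBD_SLEEP_TIME=20
CLEAN_PERIOD=1200
End Parameters
```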
lsb.params • • • RUNLIMIT queue-level parameter in lsb.queues RUNLIMIT application-level parameter in lsb.applications RUNTIME parameter in lsb.applications The run time estimates and limits are not normalized by the host CPU factor. Default Not defined (run limit and run time estimate are normalized) ACCT_ARCHIVE_AGE Syntax ACCT_ARCHIVE_AGE=days Description Enables automatic archiving of LSF accounting log files, and specifies the archive interval.
lsb.params ACCT_ARCHIVE_TIME Syntax ACCT_ARCHIVE_TIME=hh:mm Description Enables automatic archiving of LSF accounting log file lsb.acct, and specifies the time of day to archive the current log file. See also • • • ACCT_ARCHIVE_AGE also enables automatic archiving ACCT_ARCHIVE_SIZE also enables automatic archiving MAX_ACCT_ARCHIVE_FILE enables automatic deletion of the archives Default Not defined (no time set for archiving lsb.acct)
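For example, the archiving parameters could be combined as follows (values illustrative):

```
Begin Parameters
ACCT_ARCHIVE_TIME=23:30    # archive lsb.acct daily at 23:30
ACCT_ARCHIVE_AGE=7         # and also whenever the log is 7 days old
MAX_ACCT_ARCHIVE_FILE=10   # keep at most 10 archived copies
End Parameters
```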
lsb.params • • Jobs with a CPU limit, run limit, or run time estimate less than or equal to 90 are chunked Jobs with a CPU limit, run limit, or run time estimate greater than 90 are not chunked Default Not defined CLEAN_PERIOD Syntax CLEAN_PERIOD=seconds Description For non-repetitive jobs, the amount of time that job records for jobs that have finished or have been killed are kept in mbatchd core memory after they have finished.
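The 90-minute example above corresponds to a setting such as:

```
# lsb.params
CHUNK_JOB_DURATION=90   # jobs with a CPU limit, run limit, or run time
                        # estimate of 90 minutes or less may be chunked
```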
lsb.params Description If enabled, condenses all host-based pending reasons into one generic pending reason. When this parameter is enabled, you can still request a full pending reason list by running the following command: badmin diagnose jobId Tip: You must be an LSF administrator or a queue administrator to run this command.
lsb.params Default Not defined. When a user submits a job without explicitly specifying an application profile, and no default application profile is defined by this parameter, LSF does not associate the job with any application profile. DEFAULT_HOST_SPEC Syntax DEFAULT_HOST_SPEC=host_name | host_model Description The default CPU time normalization host for the cluster.
lsb.params • • • • • Job group names cannot contain more than one slash character (/) in a row. For example, job group names like DEFAULT_JOBGROUP=/A//B or DEFAULT_JOBGROUP=A////B are not correct. Job group names cannot contain spaces. For example, DEFAULT_JOBGROUP=/A/B C/D is not correct. Project names and user names used for macro substitution with %p and %u cannot start or end with slash character (/).
lsb.params Default Not defined. When a user submits a job to LSF without explicitly specifying a queue, and there are no candidate default queues defined (by this parameter or by the user’s environment variable LSB_DEFAULTQUEUE), LSF automatically creates a new queue named default, using the default configuration, and submits the job to that queue. This parameter is set at installation to DEFAULT_QUEUE=normal.
lsb.params EADMIN_TRIGGER_DURATION Syntax EADMIN_TRIGGER_DURATION=minutes Description Defines how often LSF_SERVERDIR/eadmin is invoked once a job exception is detected. Used in conjunction with job exception handling parameters JOB_IDLE, JOB_OVERRUN, and JOB_UNDERRUN in lsb.queues. Tip: Tune EADMIN_TRIGGER_DURATION carefully. Shorter values may raise false alarms, longer values may not trigger exceptions frequently enough.
lsb.params ENABLE_EVENT_STREAM Syntax ENABLE_EVENT_STREAM=Y | N Description Used only with event streaming for system performance analysis tools, such as the Platform LSF reporting feature. Default For new and upgrade installations, the event streaming feature is enabled (ENABLE_EVENT_STREAM=Y). If ENABLE_EVENT_STREAM is not defined, event streaming is not enabled (ENABLE_EVENT_STREAM=N).
lsb.params Description Defines job resume permissions. When this parameter is defined: • If the value is Y, users can resume their own jobs that have been suspended by the administrator. • If the value is N, jobs that are suspended by the administrator can only be resumed by the administrator or root; users do not have permission to resume a job suspended by another user or the administrator. Administrators can resume jobs suspended by users or administrators.
lsb.params Note: Avoid setting the value to exactly 30 seconds, because this will trigger the default behavior and cause mbatchd to synchronize the data every time an event is logged. Default Not defined See also LSB_LOCALDIR in lsf.conf EXIT_RATE_TYPE Syntax EXIT_RATE_TYPE=[JOBEXIT | JOBEXIT_NONLSF] [JOBINIT] [HPCINIT] Description When host exception handling is configured (EXIT_RATE in lsb.hosts or GLOBAL_EXIT_RATE in lsb.params), specifies the type of job exit to be handled.
lsb.params Description Specifies a cluster-wide threshold for exited jobs. If EXIT_RATE is not specified for the host in lsb.hosts, GLOBAL_EXIT_RATE defines a default exit rate for all hosts in the cluster. Host-level EXIT_RATE overrides the GLOBAL_EXIT_RATE value. If the global job exit rate is exceeded for 5 minutes or the period specified by JOB_EXIT_RATE_DURATION, LSF invokes LSF_SERVERDIR/eadmin to trigger a host exception. Example GLOBAL_EXIT_RATE=10 defines a job exit rate of 10 jobs for all hosts.
lsb.params be dispatched to a host all at once. This can overload your system to the point that it will be unable to create any more processes. It is not recommended to set this parameter to 0. JOB_ACCEPT_INTERVAL set at the queue level (lsb.queues) overrides JOB_ACCEPT_INTERVAL set at the cluster level (lsb.params). Default 1 JOB_ATTA_DIR Syntax JOB_ATTA_DIR=directory Description The shared directory in which mbatchd saves the attached data of messages posted with the bpost command.
lsb.params JOB_DEP_LAST_SUB Description Used only with job dependency scheduling. If set to 1, whenever dependency conditions use a job name that belongs to multiple jobs, LSF evaluates only the most recently submitted job. Otherwise, all the jobs with the specified name must satisfy the dependency condition. Default Not defined JOB_EXIT_RATE_DURATION Description Defines how long LSF waits before checking the job exit rate for a host. Used in conjunction with EXIT_RATE in lsb.
lsb.params JOB_INCLUDE_POSTPROC Syntax JOB_INCLUDE_POSTPROC=Y | N Description Specifies whether LSF includes the post-execution processing of the job as part of the job.
lsb.params JOB_POSTPROC_TIMEOUT Syntax JOB_POSTPROC_TIMEOUT=minutes Description Specifies a timeout in minutes for job post-execution processing. The specified timeout must be greater than zero. If post-execution processing takes longer than the timeout, sbatchd reports that post-execution has failed (POST_ERR status), and kills the entire process group of the job’s post-execution processes on UNIX and Linux.
lsb.params Default Not defined Example JOB_PRIORITY_OVER_TIME=3/20 Specifies that the job priority of pending jobs increases by 3 every 20 minutes. See also MAX_USER_PRIORITY JOB_RUNLIMIT_RATIO Syntax JOB_RUNLIMIT_RATIO=integer | 0 Description Specifies a ratio between a job run limit and the runtime estimate specified by bsub -We or bmod -We. The ratio does not apply to the RUNTIME parameter in lsb.applications.
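JOB_PRIORITY_OVER_TIME is only meaningful together with MAX_USER_PRIORITY; a combined sketch (the MAX_USER_PRIORITY value is illustrative):

```
Begin Parameters
MAX_USER_PRIORITY=100
JOB_PRIORITY_OVER_TIME=3/20
End Parameters

# A pending job's priority rises by 3 every 20 minutes:
# priority 50 at submission becomes 53 after 20 minutes, 56 after 40.
```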
lsb.params If JOB_SPOOL_DIR is not accessible or does not exist, files are spooled to the default job output directory $HOME/.lsbatch. For bsub -is and bsub -Zs, JOB_SPOOL_DIR must be readable and writable by the job submission user, and it must be shared by the master host and the submission host.
lsb.params Description UNIX only. Specifies the time interval in seconds between sending SIGINT, SIGTERM, and SIGKILL when terminating a job. When a job is terminated, the job is sent SIGINT, SIGTERM, and SIGKILL in sequence with a sleep time of JOB_TERMINATE_INTERVAL between sending the signals. This allows the job to clean up if necessary.
lsb.params When you define this parameter, mbatchd periodically obtains the host status from the master LIM, and then verifies the status by polling each sbatchd at an interval defined by the parameters MBD_SLEEP_TIME and LSB_MAX_PROBE_SBD. Default Not defined. mbatchd obtains and reports host status, without contacting the master LIM, by polling each sbatchd at an interval defined by the parameters MBD_SLEEP_TIME and LSB_MAX_PROBE_SBD. See also MBD_SLEEP_TIME LSB_MAX_PROBE_SBD in lsf.
lsb.params • • If mbatchd is using multithreading, a dedicated query port is defined by the parameter LSB_QUERY_PORT in lsf.conf. When mbatchd has a dedicated query port, the value of MAX_CONCURRENT_JOB_QUERY sets the maximum number of job queries for each child mbatchd that is forked by mbatchd. This means that the total number of job queries can be more than the number specified by MAX_CONCURRENT_JOB_QUERY.
lsb.params Description The number of subdirectories under the LSB_SHAREDIR/cluster_name/logdir/info directory. When MAX_INFO_DIRS is enabled, mbatchd creates the specified number of subdirectories in the info directory. Each subdirectory is given an integer as its name, starting with 0 for the first subdirectory.
lsb.params MAX_JOB_ATTA_SIZE Syntax MAX_JOB_ATTA_SIZE=integer | 0 Specify any number less than 20000. Description Maximum size, in KB, of data that can be attached to a job with the bpost command. Useful if you use bpost and bread to transfer large data files between jobs and you want to limit the usage in the current working directory. 0 indicates that jobs cannot accept attached data files. Default Not defined.
lsb.params MAX_JOB_PREEMPT Syntax MAX_JOB_PREEMPT=integer Description The maximum number of times a job can be preempted. Applies to queue-level jobs only. Valid values 0 < MAX_JOB_PREEMPT < INFINIT_INT INFINIT_INT is defined in lsf.h. Default Not defined. The number of preemption times is unlimited. MAX_JOB_REQUEUE Syntax MAX_JOB_REQUEUE=integer Description The maximum number of times to requeue a job automatically. Valid values 0 < MAX_JOB_REQUEUE < INFINIT_INT INFINIT_INT is defined in lsf.h.
lsb.params You cannot lower the job ID limit, but you can raise it to 10 digits. This allows longer term job accounting and analysis, and means you can have more jobs in the system, and the job ID numbers will roll over less often. LSF assigns job IDs in sequence. When the job ID limit is reached, the count rolls over, so the next job submitted gets job ID "1". If the original job 1 remains in the system, LSF skips that number and assigns job ID "2", or the next available job ID.
lsb.params This is the hard system-wide pending job threshold. No user or user group can exceed this limit unless the job is forwarded from a remote cluster. If the user or user group submitting the job has reached the pending job threshold as specified by MAX_PEND_JOBS, LSF will reject any further job submission requests sent by that user or user group. The system will continue to send the job submission requests with the interval specified by SUB_TRY_INTERVAL in lsb.
lsb.params all file descriptors to sbatchd connection. This could cause mbatchd to run out of descriptors, which results in an mbatchd fatal error, such as failure to open lsb.events. Use together with LSB_MAX_JOB_DISPATCH_PER_SESSION in lsf.conf. Example A reasonable setting is: MAX_SBD_CONNS=768 For a large cluster, specify a value equal to the number of hosts in your cluster plus a buffer.
lsb.params Description Enables user-assigned job priority and specifies the maximum job priority a user can assign to a job. LSF administrators can assign a job priority higher than the specified value. Compatibility User-assigned job priority changes the behavior of btop and bbot. Example MAX_USER_PRIORITY=100 Specifies that 100 is the maximum job priority that can be specified by a user.
lsb.params MBD_EGO_TIME2LIVE Syntax MBD_EGO_TIME2LIVE=minutes Description For EGO-enabled SLA scheduling, specifies how long EGO should keep information about host allocations in case mbatchd restarts. Default 1440 minutes (24 hours) MBD_QUERY_CPUS Syntax MBD_QUERY_CPUS=cpu_list cpu_list defines the list of master host CPUs on which the mbatchd child query processes can run. Format the list as a white-space delimited list of CPU numbers.
lsb.params mbschd daemon processes to specific CPUs so that higher priority daemon processes can run more efficiently. For best performance, each of the four daemons should be assigned its own CPU. For example, on a 4-CPU SMP host, the following configuration gives the best performance: EGO_DAEMONS_CPUS=0 LSF_DAEMONS_CPUS=1:2 MBD_QUERY_CPUS=3 Default Not defined See also LSF_DAEMONS_CPUS in lsf.
lsb.params • Sun Solaris, 2500 threads per process • AIX, 512 threads per process • Digital, 256 threads per process • HP-UX, 64 threads per process Default 5 seconds if min_refresh_time is not defined or if the defined MBD_REFRESH_TIME value is less than 5; 300 seconds if the defined value is more than 300 min_refresh_time default is 10 seconds See also LSB_QUERY_PORT in lsf.
lsb.params If you set MBD_USE_EGO_MXJ=Y, you can configure only one service class, including the default SLA. Default N (mbatchd uses the LSF MXJ) MC_PENDING_REASON_PKG_SIZE Syntax MC_PENDING_REASON_PKG_SIZE=kilobytes | 0 Description MultiCluster job forwarding model only. Pending reason update package size, in KB. Defines the maximum amount of pending reason data this cluster will send to submission clusters in one cycle.
lsb.params Default 10 MC_RUSAGE_UPDATE_INTERVAL Syntax MC_RUSAGE_UPDATE_INTERVAL=seconds Description MultiCluster only. Enables resource use updating for MultiCluster jobs running on hosts in the cluster and specifies how often to send updated information to the submission or consumer cluster. Default 300 MIN_SWITCH_PERIOD Syntax MIN_SWITCH_PERIOD=seconds Description The minimum period in seconds between event log switches.
lsb.params Description Enables a child mbatchd to get up-to-date information about new jobs from the parent mbatchd. When set to Y, job queries with bjobs display new jobs submitted after the child mbatchd was created. If you have enabled multithreaded mbatchd support, the bjobs command may not display up-to-date information if two consecutive query commands are issued before a child mbatchd expires because child mbatchd job information is not updated.
lsb.params For example, if the job run limit is 60 minutes and NO_PREEMPT_FINISH_TIME=10%, the job cannot be preempted after it has been running for 54 minutes or longer. Requires a run time (bsub -We or RUNTIME in lsb.applications), or run limit to be specified for the job (bsub -W, or RUNLIMIT in lsb.queues, or RUNLIMIT in lsb.
lsb.params Description For Cray NQS compatibility only. If the NQS version on a Cray is NQS 80.42 or NQS 71.3, this parameter does not need to be defined. If the version is NQS 1.1 on a Cray, set this parameter to 251918848. This is the qstat flag that LSF uses to retrieve requests on Cray in long format. For other versions of NQS on a Cray, run the NQS qstat command. The value of Npk_int [1] in the output is the value you need for this parameter.
lsb.params PEND_REASON_UPDATE_INTERVAL Syntax PEND_REASON_UPDATE_INTERVAL=seconds Description Time interval that defines how often pending reasons are calculated by the scheduling daemon mbschd. Default 30 seconds PG_SUSP_IT Syntax PG_SUSP_IT=seconds Description The time interval that a host should be interactively idle (it > 0) before jobs suspended because of a threshold on the pg load index can be resumed.
lsb.params Counts only running jobs when evaluating if a user group is approaching its total job slot limit (SLOTS, PER_USER=all, and HOSTS in the lsb.resources file). Suspended jobs are ignored when this keyword is used. When preemptive scheduling is enabled, suspended jobs never count against the total job slot limit for individual users. HOST_JLU Counts only running jobs when evaluating if a user or user group is approaching its per-host job slot limit (SLOTS and USERS in the lsb.resources file).
lsb.params Enables preemption of and preemption by exclusive jobs. LSB_DISABLE_LIMLOCK_EXCL=Y in lsf.conf must also be defined. BACKFILL Enables preemption of backfill jobs. Jobs from higher priority queues can preempt jobs from backfill queues that are either backfilling reserved job slots or running as normal jobs. Default Not defined. Exclusive and backfill jobs are only preempted if the exclusive low priority job is running on a different host than the one used by the preemptive high priority job.
lsb.params RESOURCE_RESERVE_PER_SLOT Syntax RESOURCE_RESERVE_PER_SLOT=y | Y Description If Y, mbatchd reserves resources based on job slots instead of per-host. By default, mbatchd only reserves static resources for parallel jobs on a per-host basis. For example, by default, the command: bsub -n 4 -R "rusage[mem=500]" -q reservation my_job requires the job to reserve 500 MB on each host where the job runs. Some parallel jobs need to reserve resources based on job slots, rather than by host.
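Continuing the example above, the sketch below enables per-slot reservation; the same bsub -n 4 -R "rusage[mem=500]" command would then reserve 500 MB per job slot rather than 500 MB per host:

```
# lsb.params fragment: reserve resources per job slot instead of per host
Begin Parameters
RESOURCE_RESERVE_PER_SLOT = Y
End Parameters
```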
lsb.params In the calculation of a user’s dynamic share priority, this factor determines the relative importance of the total run time of a user’s running jobs. Default 0.7 SBD_SLEEP_TIME Syntax SBD_SLEEP_TIME=seconds Description The interval at which LSF checks the load conditions of each host, to decide whether jobs on the host must be suspended or resumed. The job-level resource usage information is updated at a maximum frequency of every SBD_SLEEP_TIME seconds.
lsb.params Description Set a default performance metric sampling period in seconds. Cannot be less than 60 seconds. Use badmin perfmon setperiod to dynamically change performance metric sampling period. Default 60 seconds SLA_TIMER Syntax SLA_TIMER=seconds Description For EGO-enabled SLA scheduling. Controls how often each service class is evaluated and a network message is sent to EGO communicating host demand.
lsb.params Description Enables Windows workgroup account mapping, which allows LSF administrators to map all Windows workgroup users to a single Windows system account, eliminating the need to create multiple users and passwords in LSF. Users can submit and run jobs using their local user names and passwords, and LSF runs the jobs using the mapped system account name and password. With Windows workgroup account mapping, all users have the same permissions because all users map to the same system account.
lsb.queues The lsb.queues file defines batch queues. Numerous controls are available at the queue level to allow cluster administrators to customize site policies. This file is optional; if no queues are configured, LSF creates a queue named default, with all parameters set to default values. This file is installed by default in LSB_CONFDIR/cluster_name/configdir. Changing lsb.queues configuration After making any changes to lsb.queues, run badmin reconfig to reconfigure mbatchd.
lsb.queues ADMINISTRATORS Syntax ADMINISTRATORS=user_name | user_group ... Description List of queue administrators. To specify a Windows user account or user group, include the domain name in uppercase letters (DOMAIN_NAME\user_name or DOMAIN_NAME \user_group). Queue administrators can perform operations on any user’s job in the queue, as well as on the queue itself. Default Not defined. You must be a cluster administrator to operate on this queue.
lsb.queues Default Not defined BACKFILL Syntax BACKFILL=Y | N Description If Y, enables backfill scheduling for the queue. A possible conflict exists if BACKFILL and PREEMPTION are specified together. A backfill queue cannot be preemptable. Therefore, if BACKFILL is enabled, do not also specify PREEMPTION=PREEMPTABLE. BACKFILL is required for interruptible backfill queues (INTERRUPTIBLE_BACKFILL=seconds). Default Not defined. No backfilling.
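A minimal backfill queue might look like the following sketch (the queue name, priority, and run limit are illustrative; the run limit gives backfill scheduling a predictable job duration to work with):

```
# lsb.queues fragment: a backfill queue for short jobs
Begin Queue
QUEUE_NAME = short
PRIORITY   = 20
BACKFILL   = Y
RUNLIMIT   = 30      # maximum run limit, in minutes
End Queue
```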
lsb.queues lsb.applications) of both submission cluster and execution cluster. LSF uses the directory specified in the execution cluster. To make a MultiCluster job checkpointable, both submission and execution queues must enable checkpointing, and the application profile or queue setting on the execution cluster determines the checkpoint directory. Checkpointing is not supported if a job runs on a leased host.
lsb.queues CHUNK_JOB_SIZE = 4 End Queue Default Not defined CORELIMIT Syntax CORELIMIT=integer Description The per-process (hard) core file size limit (in KB) for all of the processes belonging to a job from this queue (see getrlimit(2)).
lsb.queues The number of minutes may be greater than 59. Therefore, three and a half hours can be specified either as 3:30 or 210. If no host or host model is given with the CPU time, LSF uses the default CPU time normalization host defined at the queue level (DEFAULT_HOST_SPEC in lsb.queues) if it has been configured, otherwise uses the default CPU time normalization host defined at the cluster level (DEFAULT_HOST_SPEC in lsb.
lsb.queues Default Not defined DEFAULT_HOST_SPEC Syntax DEFAULT_HOST_SPEC=host_name | host_model Description The default CPU time normalization host for the queue. The CPU factor of the specified host or host model is used to normalize the CPU time limit of all jobs in the queue, unless the CPU time normalization host is specified at the job level. Default Not defined. The queue uses the DEFAULT_HOST_SPEC defined in lsb.params.
lsb.queues This avoids having users with higher fairshare priority getting jobs dispatched from low-priority queues. Jobs in queues having the same priority are dispatched according to user priority. Queues that are not part of the cross-queue fairshare can have any priority; they are not limited to fall outside of the priority range of cross-queue fairshare queues. Default Not defined DISPATCH_WINDOW Syntax DISPATCH_WINDOW=time_window ...
lsb.queues • A single user (specify user_name). To specify a Windows user account, include the domain name in uppercase letters (DOMAIN_NAME\user_name). • Users in a group, individually (specify group_name@) or collectively (specify group_name). To specify a Windows user group, include the domain name in uppercase letters (DOMAIN_NAME\group_name).
lsb.queues Notes • • • • By default, the PRIORITY range defined for queues in cross-queue fairshare cannot be used with any other queues. For example, you have 4 queues: queue1, queue2, queue3, queue4. You configure cross-queue fairshare for queue1, queue2, queue3 and assign priorities of 30, 40, 50 respectively. By default, the priority of queue4 (which is not part of the cross-queue fairshare) cannot fall between the priority range of the cross-queue fairshare queues (30-50).
lsb.queues HJOB_LIMIT Syntax HJOB_LIMIT=integer Description Per-host job slot limit. Maximum number of job slots that this queue can use on any host. This limit is configured per host, regardless of the number of processors it may have. This may be useful if the queue dispatches jobs that require a node-locked license. If there is only one node-locked license per host then the system should not dispatch more than one job to the host even if it is a multiprocessor host.
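The node-locked license scenario described above could be configured as in this sketch (the queue name is hypothetical):

```
# lsb.queues fragment: dispatch at most one job per host,
# for example when each host has a single node-locked license
Begin Queue
QUEUE_NAME = licensed
HJOB_LIMIT = 1
End Queue
```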
lsb.queues If host groups and host partitions are included in the list, the job can run on any host in the group or partition. All the members of the host list should either belong to a single host partition or not belong to any host partition. Otherwise, job scheduling may be affected. Some items can be followed by a plus sign (+) and a positive number to indicate the preference for dispatching a job to that host. A higher number indicates a higher preference.
lsb.queues Hosts that participate in queue-based fairshare cannot be in a host partition. Compatibility Host preferences specified by bsub -m override the queue specification. Example 1 HOSTS=hostA+1 hostB hostC+1 hostD+3 This example defines three levels of preferences: run jobs on hostD as much as possible, otherwise run on either hostA or hostC if possible, otherwise run on hostB. Jobs should not run on hostB unless all other hosts are too busy to accept more jobs.
lsb.queues Description If Y, disables deadline constraint scheduling (starts all jobs regardless of deadline constraints). IMPT_JOBBKLG Syntax IMPT_JOBBKLG=integer | infinit Description MultiCluster job forwarding model only. Specifies the MultiCluster pending job limit for a receive-jobs queue. This represents the maximum number of MultiCluster jobs that can be pending in the queue; once the limit has been reached, the queue stops accepting jobs from remote clusters.
lsb.queues Specify the minimum number of seconds for the job to be considered for backfilling. This minimal time slice depends on the specific job properties; it must be longer than at least one useful iteration of the job. Multiple queues may be created if a site has jobs of distinctively different classes.
lsb.queues JOB_ACCEPT_INTERVAL set at the queue level (lsb.queues) overrides JOB_ACCEPT_INTERVAL set at the cluster level (lsb.params). Default Not defined. The queue uses JOB_ACCEPT_INTERVAL defined in lsb.params, which has a default value of 1. JOB_ACTION_WARNING_TIME Syntax JOB_ACTION_WARNING_TIME=[hour:]minute Description Specifies the amount of time before a job control action occurs that a job warning action is to be taken.
lsb.queues JOB_CONTROLS=TERMINATE[brequeue]. This causes a deadlock between the signal and the action. • CHKPNT is a special action, which causes the system to checkpoint the job. Only valid for SUSPEND and TERMINATE actions: • If the SUSPEND action is CHKPNT, the job is checkpointed and then stopped by sending the SIGSTOP signal to the job automatically. • If the TERMINATE action is CHKPNT, then the job is checkpointed and killed automatically.
lsb.queues JOB_IDLE Syntax JOB_IDLE=number Description Specifies a threshold for idle job exception handling. The value should be a number between 0.0 and 1.0 representing CPU time/runtime. If the job idle factor is less than the specified threshold, LSF invokes LSF_SERVERDIR/eadmin to trigger the action for a job idle exception. The minimum job run time before mbatchd reports that the job is idle is defined as DETECT_IDLE_JOB_AFTER in lsb.params. Valid Values Any positive number between 0.0 and 1.0
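For example, to treat jobs whose CPU time is below 10% of their run time as idle (the threshold is illustrative):

```
# lsb.queues fragment: trigger the idle-job exception action
# when CPU time / runtime falls below 0.10
Begin Queue
QUEUE_NAME = normal
JOB_IDLE   = 0.10
End Queue
```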
lsb.queues Description Creates a specific environment for submitted jobs prior to execution. starter is any executable that can be used to start the job (i.e., can accept the job as an input argument). Optionally, additional strings can be specified. By default, the user commands run after the job starter. A special string, %USRCMD, can be used to represent the position of the user’s job in the job starter command line.
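As a sketch, a job starter that runs every user command under csh could be configured as follows, using %USRCMD to mark where the user's command is placed:

```
# lsb.queues fragment: run each user job under csh
Begin Queue
QUEUE_NAME  = cshq
JOB_STARTER = csh -c "%USRCMD"
End Queue
```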
lsb.queues A job warning action must be specified with a job action warning time in order for job warning to take effect. If JOB_WARNING_ACTION is specified, LSF sends the warning action to the job before the actual control action is taken. This allows the job time to save its result before being terminated by the job control action. The warning action specified by the bsub -wa option overrides JOB_WARNING_ACTION in the queue.
lsb.queues These two lines translate into a loadSched condition of mem>=100 && swap>=200 and a loadStop condition of mem < 10 || swap < 30 Default Not defined MANDATORY_EXTSCHED Syntax MANDATORY_EXTSCHED=external_scheduler_options Description Specifies mandatory external scheduling options for the queue. -extsched options on the bsub command are merged with MANDATORY_EXTSCHED options, and MANDATORY_EXTSCHED options override any conflicting job-level options set by -extsched.
lsb.queues Valid values 0 < MAX_JOB_REQUEUE < INFINIT_INT INFINIT_INT is defined in lsf.h. Default Not defined. The number of requeue times is unlimited MAX_PREEXEC_RETRY Syntax MAX_PREEXEC_RETRY=integer Description The maximum number of times to attempt the pre-execution command of a job. Valid values 0 < MAX_PREEXEC_RETRY < INFINIT_INT INFINIT_INT is defined in lsf.h. Default 5 MAX_RSCHED_TIME Syntax MAX_RSCHED_TIME=integer | infinit Description MultiCluster job forwarding model only.
lsb.queues MEMLIMIT Syntax MEMLIMIT=[default_limit] maximum_limit Description The per-process (hard) process resident set size limit (in KB) for all of the processes belonging to a job from this queue (see getrlimit(2)). Sets the maximum amount of physical memory (resident set size, RSS) that may be allocated to a process. By default, if a default memory limit is specified, jobs submitted to the queue without a job-level memory limit are killed when the default memory limit is reached.
lsb.queues Example The following configuration defines a queue with a memory limit of 5000 KB: Begin Queue QUEUE_NAME = default DESCRIPTION = Queue with memory limit of 5000 kbytes MEMLIMIT = 5000 End Queue Default Unlimited MIG Syntax MIG=minutes Description Enables automatic job migration and specifies the migration threshold for checkpointable or rerunnable jobs, in minutes. LSF automatically migrates jobs that have been in the SSUSP state for more than the specified number of minutes.
lsb.queues NICE Syntax NICE=integer Description Adjusts the UNIX scheduling priority at which jobs from this queue execute. The default value of 0 (zero) maintains the default scheduling priority for UNIX interactive jobs. This value adjusts the run-time priorities for batch jobs on a queue-by-queue basis, to control their effect on other batch or interactive jobs. See the nice(1) manual page for more details.
lsb.queues PJOB_LIMIT Syntax PJOB_LIMIT=float Description Per-processor job slot limit for the queue. Maximum number of job slots that this queue can use on any processor. This limit is configured per processor, so that multiprocessor hosts automatically run more jobs. Default Unlimited POST_EXEC Syntax POST_EXEC=command Description Enables post-execution processing at the queue level. The POST_EXEC command runs on the execution host after the job finishes.
lsb.queues Note: The path name for the post-execution command must be an absolute path. Do not define POST_EXEC= $USER_POSTEXEC when LSB_PRE_POST_EXEC_USER=root. For Windows: • • • The pre- and post-execution commands run under cmd.
lsb.queues • The stdin, stdout, and stderr are set to /dev/null For Windows: • The pre- and post-execution commands run under cmd.exe /c • The standard input, standard output, and standard error are set to NULL • The PATH is determined by the setup of the LSF Service Note: For pre-execution commands that execute on a Windows Server 2003, x64 Edition platform, users must have read and execute privileges for cmd.exe. Default Not defined. No pre-execution commands are associated with the queue.
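The pre- and post-execution processing described above could be combined as in this sketch (the script paths are hypothetical; note that the post-execution path must be absolute):

```
# lsb.queues fragment: stage data in before each job and clean up after it
Begin Queue
QUEUE_NAME = staging
PRE_EXEC   = /usr/local/lsf/scripts/stage_in.sh
POST_EXEC  = /usr/local/lsf/scripts/stage_out.sh
End Queue
```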
lsb.queues with higher relative preference levels are preempted before queues with lower relative preference levels set. hi_queue_name Specifies the names of higher-priority queues that can preempt jobs in this queue. To specify multiple queues, separate the queue names with a space and enclose the list in a single set of square brackets.
lsb.queues PRIORITY Syntax PRIORITY=integer Description Specifies the relative queue priority for dispatching jobs. A higher value indicates a higher job-dispatching priority, relative to other queues. LSF schedules jobs from one queue at a time, starting with the highest-priority queue. If multiple queues have the same priority, LSF schedules all the jobs from these queues in first-come, first-served order. LSF queue priority is independent of the UNIX scheduler priority system for time-sharing processes.
lsb.queues Job-level processor limits (bsub -n) override queue-level PROCLIMIT. Job-level limits must fall within the maximum and minimum limits of the application profile and the queue. Application-level PROCLIMIT in lsb.applications overrides queue-level specification. Optionally specifies the minimum and default number of job slots.
lsb.queues QUEUE_NAME Syntax QUEUE_NAME=string Description Required. Name of the queue. Specify any ASCII string up to 59 characters long. You can use letters, digits, underscores (_) or dashes (-). You cannot use blank spaces. You cannot specify the reserved name default. Default You must specify this parameter to define a queue. The default queue automatically created by LSF is named default. RCVJOBS_FROM Syntax RCVJOBS_FROM=cluster_name ... | allclusters Description MultiCluster only.
lsb.queues Jobs are requeued to the head of the queue. The output from the failed run is not saved, and the user is not notified by LSF. Define an exit code as EXCLUDE(exit_code) to enable exclusive job requeue. Exclusive job requeue does not work for parallel jobs. For MultiCluster jobs forwarded to a remote execution cluster, the exit values specified in the submission cluster with the EXCLUSIVE keyword are treated as if they were non-exclusive.
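For example (the exit values are illustrative), the following requeues jobs that exit with 99 or 100, and requeues jobs that exit with 101 exclusively, so they are not redispatched to the host where they failed:

```
# lsb.queues fragment: automatic requeue on specific exit codes
Begin Queue
QUEUE_NAME = rerun
REQUEUE_EXIT_VALUES = 99 100 EXCLUDE(101)
End Queue
```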
lsb.queues Default no RESOURCE_RESERVE Syntax RESOURCE_RESERVE=MAX_RESERVE_TIME[integer] Description Enables processor reservation and memory reservation for pending jobs for the queue. Specifies the number of dispatch turns (MAX_RESERVE_TIME) over which a job can reserve job slots and memory. Overrides the SLOT_RESERVE parameter. If both RESOURCE_RESERVE and SLOT_RESERVE are defined in the same queue, an error is displayed when the cluster is reconfigured, and SLOT_RESERVE is ignored.
lsb.queues Description Resource requirements used to determine eligible hosts. Specify a resource requirement string as usual. The resource requirement string lets you specify conditions in a more flexible manner than using the load thresholds. The select section defined at the queue level must be satisfied in addition to any job-level requirements or load thresholds. The rusage section can specify additional requests. To do this, use the OR (||) operator to separate additional rusage strings.
lsb.queues Resource requirements determined by the queue no longer apply to a running job after running badmin reconfig. For example, if you change the RES_REQ parameter in a queue and reconfigure the cluster, the previous queue-level resource requirements for running jobs are lost. Default select[type==local] order[r15s:pg]. If this parameter is defined and a host model or Boolean resource is specified, the default type is any.
lsb.queues Description The maximum run limit and optionally the default run limit. The name of a host or host model specifies the runtime normalization host to use. By default, jobs that are in the RUN state for longer than the specified maximum run limit are killed by LSF. You can optionally provide your own termination job action to override this default. Jobs submitted with a job-level run limit (bsub -W) that is less than the maximum run limit are killed when their job-level run limit is reached.
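A sketch combining a default and a maximum run limit with a runtime normalization host (the values and host name are illustrative):

```
# lsb.queues fragment: default run limit 10 minutes, maximum 60 minutes,
# normalized to the CPU factor of hostA
Begin Queue
QUEUE_NAME = normal
RUNLIMIT   = 10 60/hostA
End Queue
```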
lsb.queues RUNLIMIT is required for queues configured with INTERRUPTIBLE_BACKFILL. Default Unlimited SLOT_POOL Syntax SLOT_POOL=pool_name Description Name of the pool of job slots the queue belongs to for queue-based fairshare. A queue can only belong to one pool. All queues in the pool must share the same set of hosts. Valid value Specify any ASCII string up to 60 characters long. You can use letters, digits, underscores (_) or dashes (-). You cannot use blank spaces. Default Not defined.
lsb.queues lsb.queues), or if an estimated run time is specified at the application level (RUNTIME in lsb.applications), backfill parallel jobs can use job slots reserved by the other jobs, as long as the backfill job can finish before the predicted start time of the jobs with the reservation. Unlike memory reservation, which applies both to sequential and parallel jobs, slot reservation applies only to parallel jobs.
lsb.queues STACKLIMIT Syntax STACKLIMIT=integer Description The per-process (hard) stack segment size limit (in KB) for all of the processes belonging to a job from this queue (see getrlimit(2)). Default Unlimited STOP_COND Syntax STOP_COND=res_req Use the select section of the resource requirement string to specify load thresholds. All other sections are ignored. Description LSF automatically suspends a running job in this queue if the load on the host satisfies the specified conditions.
lsb.queues The action taken when a job exceeds its SWAPLIMIT or PROCESSLIMIT is to send SIGQUIT, SIGINT, SIGTERM, and SIGKILL in sequence. For CPULIMIT, SIGXCPU is sent before SIGINT, SIGTERM, and SIGKILL. Default Unlimited TERMINATE_WHEN Syntax TERMINATE_WHEN=[LOAD] [PREEMPT] [WINDOW] Description Configures the queue to invoke the TERMINATE action instead of the SUSPEND action in the specified circumstance. • • • LOAD — kills jobs when the load exceeds the suspending thresholds.
lsb.queues Both the default and the maximum limits must be positive integers. The default limit must be less than the maximum limit. The default limit is ignored if it is greater than the maximum limit. Examples THREADLIMIT=6 No default thread limit is specified. The value 6 is the default and maximum thread limit. THREADLIMIT=6 8 The first value (6) is the default thread limit. The second value (8) is the maximum thread limit.
lsb.queues Default n USERS Syntax USERS=all [~user_name ...] [~user_group ...] | [user_name ...] [user_group [~user_group ...] ...] Description A space-separated list of user names or user groups that can submit jobs to the queue. Use the reserved word all to specify all LSF users. LSF cluster administrators are automatically included in the list of users, so LSF cluster administrators can submit jobs to this queue, or switch any user’s jobs into this queue, even if they are not listed.
lsb.queues constructs and time expressions. After you change the files, reconfigure the cluster with the badmin reconfig command. The expressions are evaluated by LSF every 10 minutes based on mbatchd start time. When an expression evaluates true, LSF dynamically changes the configuration based on the associated configuration statements. Reconfiguration is done in real time without restarting mbatchd, providing continuous system availability. Example Begin Queue ...
lsb.resources lsb.resources The lsb.resources file contains configuration information for resource allocation limits, exports, and resource usage limits. This file is optional. The lsb.resources file is stored in the directory LSB_CONFDIR/cluster_name/configdir, where LSB_CONFDIR is defined in lsf.conf. Changing lsb.resources configuration After making any changes to lsb.resources, run badmin reconfig to reconfigure mbatchd.
lsb.resources • USERS or PER_USER • QUEUES or PER_QUEUE • HOSTS or PER_HOST • PROJECTS or PER_PROJECT Each subsequent row describes the configuration information for resource consumers and the limits that apply to them. Each line must contain an entry for each keyword. Use empty parentheses () or a dash (-) to specify the default value for an entry. Fields cannot be left blank. For resources, the default is no limit; for consumers, the default is all consumers.
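To illustrate the vertical tabular format described above (the names and values are hypothetical), the first row lists the keywords and each subsequent row defines one limit:

```
# lsb.resources fragment: vertical tabular Limit section
Begin Limit
USERS           QUEUES    SLOTS
user1           normal    10
(user2 user3)   short     5
End Limit
```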
lsb.resources Jobs that do not match these limits (that is, all users except user1 and user3 running jobs on hostA, and all users except user2 submitting jobs to queue normal) have no limits.
lsb.resources • USERS HOSTS Syntax HOSTS=all [~]host_name ... | all [~]host_group ... HOSTS ( [-] | all [~]host_name ... | all [~]host_group ... ) Description A space-separated list of hosts or host groups defined in lsb.hosts on which limits are enforced. Limits are enforced on all hosts or host groups listed. If a group contains a subgroup, the limit also applies to each member in the subgroup recursively. To specify a per-host limit, use the PER_HOST keyword.
lsb.resources JOBS Syntax JOBS=integer JOBS - | integer Description Maximum number of running or suspended (RUN, SSUSP, USUSP) jobs available to resource consumers. Specify an integer greater than or equal to 0. Job limits can be defined in both vertical and horizontal limit formats. With the MultiCluster resource lease model, this limit applies only to local hosts being used by the local cluster.
lsb.resources LICENSE Syntax LICENSE=[license_name,integer] [[license_name,integer] ...] LICENSE ( [license_name,integer] [[license_name,integer] ...] ) Description Maximum number of specified software licenses available to resource consumers. The value must be an integer greater than or equal to zero. Software licenses must be defined as decreasing numeric shared resources in lsf.shared. The RESOURCE keyword is a synonym for the LICENSE keyword.
lsb.resources In vertical tabular format, use empty parentheses () or a dash (-) to indicate the default value (no limit). Fields cannot be left blank. If only QUEUES are configured in the Limit section, MEM must be an integer value. MEM is the maximum amount of memory available to the listed queues for any hosts, users, or projects. If only USERS are configured in the Limit section, MEM must be an integer value.
PER_HOST
Syntax
PER_HOST=all [~]host_name ... | all [~]host_group ...
PER_HOST
( [-] | all [~]host_name ... | all [~]host_group ... )
Description
A space-separated list of hosts or host groups defined in lsb.hosts on which limits are enforced. Limits are enforced on each host, or individually on each host of the host groups listed. If a group contains a subgroup, the limit also applies to each member in the subgroup recursively.
lsb.resources Do not configure PER_PROJECT and PROJECTS limits in the same Limit section. In horizontal format, use only one PER_PROJECT line per Limit section. Use the keyword all to configure limits that apply to each project in a cluster. Use the not operator (~) to exclude projects from the all specification in the limit. In vertical tabular format, multiple project names must be enclosed in parentheses. In vertical tabular format, use empty parentheses () or a dash (-) to indicate each project.
PER_USER
Syntax
PER_USER=all [~]user_name ... | all [~]user_group ...
PER_USER
( [-] | all [~]user_name ... | all [~]user_group ... )
Description
A space-separated list of user names or user groups on which limits are enforced. Limits are enforced on each user, or individually on each user in the user groups listed. If a user group contains a subgroup, the limit also applies to each member in the subgroup recursively. User names must be valid login names.
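For instance, to enforce a slot limit individually on every user except a hypothetical administrator account (the name and value are illustrative):

```
Begin Limit
NAME     = user_slot_limit
PER_USER = all ~lsfadmin
SLOTS    = 10
End Limit
```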
lsb.resources To specify a per-project limit, use the PER_PROJECT keyword. Do not configure PROJECTS and PER_PROJECT limits in the same Limit section. In horizontal format, use only one PROJECTS line per Limit section. Use the keyword all to configure limits that apply to all projects in a cluster. Use the not operator (~) to exclude projects from the all specification in the limit. This is useful if you have a large number of projects but only want to exclude a few projects from the limit definition.
Example
QUEUES=normal night
RESOURCE
Syntax
RESOURCE=[shared_resource,integer] [[shared_resource,integer] ...]
RESOURCE
( [shared_resource,integer] [[shared_resource,integer] ...] )
Description
Maximum amount of any user-defined shared resource available to consumers. The RESOURCE keyword is a synonym for the LICENSE keyword. You can use RESOURCE to configure software licenses. You cannot specify RESOURCE and LICENSE in the same Limit section.
lsb.resources If JOBS are configured in the Limit section, the most restrictive limit is applied. If HOSTS are configured in the Limit section, SLOTS is the number of running and suspended jobs on a host. If preemptive scheduling is used, the suspended jobs are not counted as using a job slot. To fully use the CPU resource on multiprocessor hosts, make the number of job slots equal to or greater than the number of processors.
lsb.resources Description Per processor job slot limit, based on the number of processors on each host affected by the limit. Maximum number of job slots that each resource consumer can use per processor. This job slot limit is configured per processor so that multiprocessor hosts will automatically run more jobs. You must also specify PER_HOST and list the hosts that the limit is to be enforced on.
Example
SLOTS_PER_PROCESSOR=2
SWP
Syntax
SWP=integer[%]
SWP
- | integer[%]
Description
Maximum amount of swap space available to resource consumers. Specify a value in MB or a percentage (%) as a positive integer greater than or equal to 0. If you specify a percentage, you must also specify PER_HOST and list the hosts that the limit is to be enforced on.
TMP
Syntax
TMP=integer[%]
TMP
- | integer[%]
Description
Maximum amount of tmp space available to resource consumers. Specify a value in MB or a percentage (%) as a positive integer greater than or equal to 0. If you specify a percentage, you must also specify PER_HOST and list the hosts that the limit is to be enforced on. The Limit section is ignored if TMP is specified as a percentage:
• without PER_HOST, or
• with HOSTS
In horizontal format, use only one TMP line per Limit section.
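Because the percentage form requires PER_HOST, a sketch that caps jobs at 30% of each host's tmp space (the limit name and value are illustrative):

```
Begin Limit
NAME     = tmp_per_host
PER_HOST = all
TMP      = 30%
End Limit
```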
USERS
Syntax
USERS=all [~]user_name ... | all [~]user_group ...
USERS
( [-] | all [~]user_name ... | all [~]user_group ... )
Description
A space-separated list of user names or user groups on which limits are enforced. Limits are enforced on all users or groups listed. Limits apply to a group as a whole. If a group contains a subgroup, the limit also applies to each member in the subgroup recursively. User names must be valid login names. User group names can be LSF user groups or UNIX and Windows user groups.
• DISTRIBUTION
• MEM
• SLOTS
• SWAP
• TYPE
PER_HOST
Syntax
PER_HOST=host_name ...
Description
Required when exporting special hosts. Determines which hosts to export. Specify one or more LSF hosts by name. Separate names by space.
RES_SELECT
Syntax
RES_SELECT=res_req
Description
Required when exporting workstations. Determines which hosts to export.
lsb.resources Description Required. Specifies how the exported resources are distributed among consumer clusters. The syntax for the distribution list is a series of share assignments. The syntax of each share assignment is the cluster name, a comma, and the number of shares, all enclosed in square brackets, as shown. Use a space to separate multiple share assignments. Enclose the full distribution list in a set of round brackets.
lsb.resources Description Used when exporting special hosts. Specify the amount of swap space to export on each host, in MB. Default - (provider and consumer clusters compete for available swap space) TYPE Syntax TYPE=shared Description Changes the lease type from exclusive to shared. If you export special hosts with a shared lease (using PER_HOST), you cannot specify multiple consumer clusters in the distribution policy.
lsb.resources Description Shared resource to export. This resource must be available on the hosts that are exported to the specified clusters; you cannot export resources without hosts. NINSTANCES Syntax NINSTANCES=integer Description Maximum quantity of shared resource to export. If the total number available is less than the requested amount, LSF exports all that are available. DISTRIBUTION Syntax DISTRIBUTION=([cluster_name, number_shares]...
Example ResourceReservation section
Only user1 and user2 can make advance reservations on hostA and hostB. The reservation time window is between 8:00 a.m. and 6:00 p.m. every day:
Begin ResourceReservation
NAME        = dayPolicy
USERS       = user1 user2    # optional
HOSTS       = hostA hostB    # optional
TIME_WINDOW = 8:00-18:00     # weekly recurring reservation
End ResourceReservation
user1 can add the following reservation for user user2 to use on hostA every Friday between 9:00 a.m. and 11:00 a.m.
lsb.resources Examples HOSTS=hgroup1 ~hostA hostB hostC Advance reservations can be created on hostB, hostC, and all hosts in hgroup1 except for hostA. HOSTS=all ~group2 ~hostA Advance reservations can be created on all hosts in the cluster, except for hostA and the hosts in group2. Default all allremote (users can create reservations on all server hosts in the local cluster, and all leased hosts in a remote cluster). NAME Syntax NAME=text Description Required.
where all fields are numbers with the following ranges:
• day of the week: 0-6 (0 is Sunday)
• hour: 0-23
• minute: 0-59
Specify a time window in one of the following ways:
• hour-hour
• hour:minute-hour:minute
• day:hour:minute-day:hour:minute
The default value for minute is 0 (on the hour); the default value for day is every day of the week. You must specify at least the hour. Day of the week and minute are optional. Both the start time and end time values must use the same syntax.
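Applying the three forms above (day 5 is Friday, since 0 is Sunday; the values are illustrative):

```
TIME_WINDOW = 8-18              # hour-hour: 8:00 a.m. to 6:00 p.m. daily
TIME_WINDOW = 8:30-18:00        # hour:minute-hour:minute
TIME_WINDOW = 5:9:00-5:11:00    # day:hour:minute: Friday 9-11 a.m.
```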
Use the not operator (~) to exclude users or user groups from the list of users who can create reservations.
Caution: The not operator does not exclude LSF administrators from the policy.
Example
USERS=user1 user2
Default
all (all users in the cluster can create reservations)
ReservationUsage section
To enable greater flexibility in how numeric resources are reserved by jobs, configure the ReservationUsage section in lsb.resources.
lsb.resources applications are running provided those applications are running on the same host under the same user. Assumptions and limitations • • Per-resource configuration defines resource usage for individual resources, but it does not change any existing resource limit behavior (PER_JOB, PER_SLOT). In a MultiCluster environment, you should configure resource usage in the scheduling cluster (submission cluster in lease model or receiving cluster in job forward model).
lsb.serviceclasses
The lsb.serviceclasses file defines the service-level agreements (SLAs) in an LSF cluster as service classes, which define the properties of the SLA. This file is optional. You can configure as many service class sections as you need. Use bsla to display the properties of service classes configured in lsb.serviceclasses and dynamic information about the state of each configured service class. By default, lsb.serviceclasses is installed in LSB_CONFDIR/cluster_name/configdir.
• DESCRIPTION
• EGO_RES_REQ
• GOALS
• MAX_HOST_IDLE_TIME
• NAME
• PRIORITY
• USER_GROUP
CONSUMER
Syntax
CONSUMER=ego_consumer_name
Description
For EGO-enabled SLA service classes, the name of the EGO consumer from which hosts are allocated to the SLA. This parameter is not mandatory, but must be configured for the SLA to receive hosts from EGO.
Important: CONSUMER must specify the name of a valid consumer in EGO. If a default SLA is configured with ENABLE_DEFAULT_EGO_SLA in lsb.params
lsb.serviceclasses DESCRIPTION Syntax DESCRIPTION=text Description Optional. Description of the service class. Use bsla to display the description text. This description should clearly describe the features of the service class to help users select the proper service class for their jobs. The text can include any characters, including white space. The text can be extended to multiple lines by ending the preceding line with a backslash (\).
The time windows of multiple service-level goals can overlap. In this case, the largest number of jobs is run. An active SLA can have a status of On time if it is meeting the goal, and a status of Delayed if it is missing its goals. A service-level goal defines: throughput, expressed as finished jobs per hour, and an optional time window when the goal is active.
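A throughput goal of this kind might be written as follows in a service class definition (the name, priority, target, and time window are illustrative; the GOALS line is sketched from the throughput description above):

```
Begin ServiceClass
NAME        = Kyuquot
PRIORITY    = 23
GOALS       = [THROUGHPUT 10 timeWindow (8:00-18:00)]
DESCRIPTION = Throughput goal of 10 finished jobs per hour during working hours
End ServiceClass
```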
lsb.serviceclasses the first minute of the hour (:00). If you do not specify a day, LSF assumes every day of the week. If you do specify the day, you must also specify the minute. You can specify multiple time windows, but they cannot overlap. For example: timeWindow(8:00-14:00 18:00-22:00) is correct, but timeWindow(8:00-14:00 11:00-15:00) is not valid. Tip: To configure a time window that is always open, use the timeWindow keyword with empty parentheses.
lsb.serviceclasses Example NAME=Tofino Default None. You must provide a unique name for the service class. PRIORITY Syntax PRIORITY=integer Description Required. The service class priority. A higher value indicates a higher priority, relative to other service classes. Similar to queue priority, service classes access the cluster resources in priority order. LSF schedules jobs from one service class at a time, starting with the highest-priority service class.
lsb.serviceclasses Example USER_GROUP=user1 user2 ugroup1 Default all (all users in the cluster can submit jobs to the service class) Examples • The service class Uclulet defines one deadline goal that is active during working hours between 8:30 AM and 4:00 PM. All jobs in the service class should complete by the end of the specified time window.
lsb.users lsb.users The lsb.users file is used to configure user groups, hierarchical fairshare for users and user groups, and job slot limits for users and user groups. It is also used to configure account mappings in a MultiCluster environment. This file is optional. The lsb.users file is stored in the directory LSB_CONFDIR/cluster_name/configdir, where LSB_CONFDIR is defined in lsf.conf. Changing lsb.users configuration After making any changes to lsb.users, run badmin reconfig to reconfigure mbatchd.
lsb.users GROUP_MEMBER A list of user names or user group names that belong to the group, enclosed in parentheses and separated by spaces. Group names must not conflict with user names. User and user group names can appear on multiple lines, because users can belong to multiple groups. User groups may be defined recursively but must not create a loop. Syntax (user_name | user_group ...
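A sketch of a UserGroup section using GROUP_MEMBER as described (the names are illustrative; groupB includes groupA, so the membership is resolved recursively):

```
Begin UserGroup
GROUP_NAME   GROUP_MEMBER
groupA       (user1 user2 user3)
groupB       (groupA user4)
End UserGroup
```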
lsb.users • Users not included in any other share assignment, individually (specify the keyword default@) or collectively (specify the keyword default). Note: By default, when resources are assigned collectively to a group, the group members compete for the resources on a firstcome, first-served (FCFS) basis. You can use hierarchical fairshare to further divide the shares among the group members. When resources are assigned to members of a group individually, the share assignment is recursive.
lsb.users User group names can be the LSF user groups defined previously, and/or UNIX and Windows user groups. To specify a Windows user account or user group, include the domain name in uppercase letters (DOMAIN_NAME\user_name or DOMAIN_NAME\user_group). Job slot limits apply to a group as a whole. Append the at sign (@) to a group name to make the job slot limits apply individually to each user in the group.
lsb.users \user_name or DOMAIN_NAME\user_group). Separate multiple user names by a space and enclose the list in parentheses ( ): (user4 user6) REMOTE A list of remote users or user groups in the form user_name@cluster_name or user_group@cluster_name. To specify a Windows user account or user group, include the domain name in uppercase letters (DOMAIN_NAME\user_name@cluster_name or DOMAIN_NAME\user_group@cluster_name).
Example
From 12 - 1 p.m. daily, user smith has 10 job slots, but during other hours, the user has only 5 job slots.
lsf.acct
The lsf.acct file is the LSF task log file. The LSF Remote Execution Server, RES (see res(8)), generates a record for each task completion or failure. If RES task logging is turned on (see lsadmin(8)), it appends the record to the task log file lsf.acct.
lsf.acct structure
The task log file is an ASCII file with one task record per line. The fields of each record are separated by blanks.
lsf.cluster lsf.cluster Contents • • • • • • About lsf.cluster Parameters section ClusterAdmins section Host section ResourceMap section RemoteClusters section About lsf.cluster This is the cluster configuration file. There is one for each cluster, called lsf.cluster.cluster_name. The cluster_name suffix is the name of the cluster defined in the Cluster section of lsf.shared. All LSF hosts are listed in this file, along with the list of LSF administrators and the installed LSF features. The lsf.cluster.
Description
Specifies command-line arguments required by an elim executable on startup. Used only when the external load indices feature is enabled.
Default
Undefined
EXINTERVAL
Syntax
EXINTERVAL=time_in_seconds
Description
Time interval, in seconds, at which the LIM daemons exchange load information. On extremely busy hosts or networks, or in clusters with a large number of hosts, load may interfere with the periodic communication between LIM daemons.
lsf.cluster LSF Floating Client Although an LSF Floating Client requires a license, LSF_Float_Client does not need to be added to the PRODUCTS line. LSF_Float_Client also cannot be added as a resource for specific hosts already defined in lsf.cluster.cluster_name. Should these lines be present, they are ignored by LSF. Default Undefined FLOAT_CLIENTS_ADDR_RANGE Syntax FLOAT_CLIENTS_ADDR_RANGE=IP_address ... Description Optional.
lsf.cluster If a range is specified with fewer fields than an IP address such as 10.161, it is considered as 10.161.*.*. Address ranges are validated at configuration time so they must conform to the required format. If any address range is not in the correct format, no hosts will be accepted as LSF floating clients, and an error message will be logged in the LIM log. This parameter is limited to 2048 characters. For IPv6 addresses, the double colon symbol (::) indicates multiple groups of 16-bits of zeros.
Although one correct address range is specified, because *43 is not in the correct format, the entire line is considered not valid. An error will be inserted in the LIM log and no hosts will be accepted to become LSF floating clients. No IPv6 hosts are allowed.
FLOAT_CLIENTS_ADDR_RANGE = 3ffe
All client IPv6 hosts with a domain address starting with 3ffe will be allowed access. No IPv4 hosts are allowed.
FLOAT_CLIENTS_ADDR_RANGE = 3ffe:fffe::88bb:*
Expands to 3ffe:fffe:0:0:0:0:88bb:*.
lsf.cluster LSF_ELIM_BLOCKTIME Syntax LSF_ELIM_BLOCKTIME=seconds Description UNIX only; used when the external load indices feature is enabled. Maximum amount of time the master external load information manager (MELIM) waits for a complete load update string from an elim executable. After the time period specified by LSF_ELIM_BLOCKTIME, the MELIM writes the last string sent by an elim in the LIM log file (lim.log.host_name) and restarts the elim.
Default
Undefined; external load information sent by an elim to the MELIM is not logged.
See also
LSF_ELIM_BLOCKTIME to configure how long LIM waits before restarting the ELIM. LSF_ELIM_RESTARTS to limit how many times the ELIM can be restarted.
LSF_ELIM_RESTARTS
Syntax
LSF_ELIM_RESTARTS=integer
Description
UNIX only; used when the external load indices feature is enabled. Maximum number of times the master external load information manager (MELIM) can restart elim executables on a host.
lsf.cluster To enable dynamically added hosts after installation, you must define LSF_HOST_ADDR_RANGE in lsf.cluster.cluster_name, and LSF_DYNAMIC_HOST_WAIT_TIME in lsf.conf. If you enable dynamic hosts during installation, you must define an IP address range after installation to enable security. If a value is defined, security for dynamically adding and removing hosts is enabled, and only hosts with IP addresses within the specified range can be added to or removed from a cluster dynamically.
lsf.cluster Notes After you configure LSF_HOST_ADDR_RANGE, check the lim.log.host_name file to make sure this parameter is correctly set. If this parameter is not set or is wrong, this will be indicated in the log file. Examples LSF_HOST_ADDR_RANGE=100 All IPv4 and IPv6 hosts with a domain address starting with 100 will be allowed access. • • To specify only IPv4 hosts, set the value to 100.* To specify only IPv6 hosts, set the value to 100:* LSF_HOST_ADDR_RANGE=100-110.34.1-10.
lsf.cluster All IPv6 client hosts belonging to domains starting with 3ffe up to ffff and ending with 0 up to ff are allowed. No IPv4 hosts are allowed. Default Undefined (dynamic host feature disabled). If you enable dynamic hosts during installation, no security is enabled and all hosts can join the cluster. See also LSF_ENABLE_SUPPORT_IPV6 MASTER_INACTIVITY_LIMIT Syntax MASTER_INACTIVITY_LIMIT=integer Description An integer reflecting a multiple of EXINTERVAL.
lsf.cluster Description Specifies the LSF products and features that the cluster will run (you must also have a license for every product you want to run). The list of items is separated by space. The PRODUCTS parameter is set automatically during installation to include core features.
lsf.cluster The first administrator of the expanded list is considered the primary LSF administrator. The primary administrator is the owner of the LSF configuration files, as well as the working files under LSB_SHAREDIR/cluster_name. If the primary administrator is changed, make sure the owner of the configuration files and the files under LSB_SHAREDIR/cluster_name are changed as well.
lsf.cluster The LIM on the first host listed becomes the master LIM if this host is up; otherwise, that on the second becomes the master if its host is up, and so on. Also, to avoid the delays involved in switching masters if the first machine goes down, the master should be on a reliable machine. It is desirable to arrange the list such that the first few hosts in the list are always in the same subnet.
nd
Description
Number of local disks. This corresponds to the ndisks static resource. On most host types, LSF automatically determines the number of disks, and the nd parameter is ignored. nd should only count local disks with file systems on them. Do not count either disks used only for swapping or disks mounted with NFS.
lsf.cluster REXPRI Description UNIX only Default execution priority for interactive remote jobs run under the RES The range is from -20 to 20. REXPRI corresponds to the BSD-style nice value used for remote jobs. For hosts with System V-style nice values with the range 0 - 39, a REXPRI of -20 corresponds to a nice value of 0, and +20 corresponds to 39.
lsf.cluster type Description Host type as defined in the HostType section of lsf.shared The strings used for host types are determined by the system administrator: for example, SUNSOL, DEC, or HPPA. The host type is used to identify binary-compatible hosts. The host type is used as the default resource requirement. That is, if no resource requirement is specified in a placement request, the task is run on a host of the same type as the sending host. Often one host type can be used for many machine models.
Example ResourceMap section
Begin ResourceMap
RESOURCENAME  LOCATION
verilog       (5@[all])
local         ([host1 host2] [others])
End ResourceMap
The resource verilog must already be defined in the RESOURCE section of the lsf.shared file. It is a static numeric resource shared by all hosts. The value for verilog is 5. The resource local is a numeric shared resource that contains two instances in the cluster. The first instance is shared by two machines, host1 and host2.
RESOURCENAME
Description
Name of the resource. This resource name must be defined in the Resource section of lsf.shared. You must specify at least a name and description for the resource, using the keywords RESOURCENAME and DESCRIPTION.
• A resource name cannot begin with a number.
• A resource name cannot contain any of the following characters:
: . ( ) [ + - * / ! & | < > @ =
lsf.cluster EQUIV Description Specify ‘Y’ to make the remote cluster equivalent to the local cluster. Otherwise, specify ‘N’. The master LIM considers all equivalent clusters when servicing requests from clients for load, host, or placement information.
lsf.cluster_name.license.acct
This is the license accounting file. There is one for each cluster, called lsf.cluster_name.license.acct. The cluster_name variable is the name of the cluster defined in the Cluster section of lsf.shared. The lsf.cluster_name.license.acct file contains the following types of configuration information:
• LSF license information
• MultiCluster license information
lsf.cluster_name.license.
lsf.cluster_name.license.acct e_peak, s_peak, and b_peak are the peak usage values (in number of CPUs) of the E, S, and B class licenses, respectively. e_max_avail, s_max_avail, and b_max_avail are the maximum availability and usage values (in number of CPUs) of the E, S, and B class licenses, respectively. This is determined by the license that you purchased.
lsf.conf lsf.conf The lsf.conf file controls the operation of LSF. About lsf.conf lsf.conf is created during installation and records all the settings chosen when LSF was installed. The lsf.conf file dictates the location of the specific configuration files and operation of individual servers and applications. The lsf.conf file is used by LSF and applications built on top of it. For example, information in lsf.
Format
Each entry in lsf.conf has one of the following forms:
NAME=VALUE
NAME=
NAME="STRING1 STRING2 ..."
The equal sign = must follow each NAME, even if no value follows, and there should be no space beside the equal sign. A value that contains multiple strings separated by spaces must be enclosed in quotation marks. Lines starting with a pound sign (#) are comments and are ignored. Do not use #if, as this is reserved syntax for time-based configuration.
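Putting the three forms together (the parameter values are illustrative):

```
# comment lines start with a pound sign
LSF_LOGDIR=/var/log/lsf
LSB_MAILTO=
LSB_DEBUG_CMD="LC_TRACE LC_EXEC"
```

Note the quoted value on the last line: because it contains multiple space-separated strings, the quotation marks are required.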
lsf.conf LSB_API_CONNTIMEOUT Syntax LSB_API_CONNTIMEOUT=time_seconds Description The timeout in seconds when connecting to LSF. Valid values Any positive integer or zero Default 10 See also LSB_API_RECVTIMEOUT LSB_API_RECVTIMEOUT Syntax LSB_API_RECVTIMEOUT=time_seconds Description Timeout in seconds when waiting for a reply from LSF.
When LSB_API_VERBOSE=N, LSF batch commands will not display a retry error message when LIM is not available.
Default
Y. Retry message is displayed to stderr.
LSB_BJOBS_CONSISTENT_EXIT_CODE
Syntax
LSB_BJOBS_CONSISTENT_EXIT_CODE=Y | N
Description
When LSB_BJOBS_CONSISTENT_EXIT_CODE=Y, the bjobs command exits with 0 only when unfinished jobs are found, and 255 when no jobs are found, or a non-existent job ID is entered.
lsf.conf Description Specifies to bpeek how to get output of a remote running job. Valid values Specify "rsh" or "lsrun" or both, in the order you want to invoke the bpeek method. Default "rsh lsrun" LSB_BPEEK_WAIT_TIME Syntax LSB_BPEEK_WAIT_TIME=seconds Description Defines how long the bpeek process waits to get the output of a remote running job. Valid values Any positive integer Default 80 seconds LSB_CHUNK_RUSAGE Syntax LSB_CHUNK_RUSAGE=y Description Applies only to chunk jobs.
lsf.conf Description Specifies the logging level of error messages from LSF batch commands. To specify the logging level of error messages for LSF commands, use LSF_CMD_LOG_MASK. To specify the logging level of error messages for LSF daemons, use LSF_LOG_MASK. LSB_CMD_LOG_MASK sets the log level and is used in combination with LSB_DEBUG_CMD, which sets the log class for LSF batch commands.
lsf.conf Description Specifies the path to the LSF command log files. Default /tmp See also LSB_CMD_LOGDIR, LSB_DEBUG, LSB_DEBUG_CMD, LSB_TIME_CMD, LSF_CMD_LOGDIR, LSF_CMD_LOG_MASK, LSF_LOG_MASK, LSF_LOGDIR, LSF_TIME_CMD LSB_CPUSET_BESTCPUS Syntax LSB_CPUSET_BESTCPUS=y | Y Description If set, enables the best-fit algorithm for SGI cpusets Default Y (best-fit) LSB_CONFDIR Syntax LSB_CONFDIR=path Description Specifies the path to the directory containing the LSF configuration files.
lsf.conf See also LSF_CONFDIR LSB_CRDIR Syntax LSB_CRDIR=path Description Specifies the path and directory to the checkpointing executables on systems that support kernel-level checkpointing. LSB_CRDIR specifies the directory containing the chkpnt and restart utility programs that sbatchd uses to checkpoint or restart a job.
lsf.conf Default Not defined See also LSB_DEBUG, LSB_DEBUG_CMD, LSB_DEBUG_MBD, LSB_DEBUG_NQS, LSB_DEBUG_SBD, LSB_DEBUG_SCH, LSF_DEBUG_LIM, LSF_DEBUG_RES, LSF_LIM_PORT, LSF_RES_PORT, LSB_MBD_PORT, LSB_SBD_PORT, LSF_LOGDIR, LSF_LIM_DEBUG, LSF_RES_DEBUG LSB_DEBUG_CMD Syntax LSB_DEBUG_CMD=log_class Description Sets the debugging log class for commands and APIs. Specifies the log class filtering to be applied to LSF batch commands or the API. Only messages belonging to the specified log class are recorded.
lsf.conf Valid Values Valid log classes are the same as for LSB_DEBUG_CMD except for the log class LC_ELIM, which cannot be used with LSB_DEBUG_MBD. See LSB_DEBUG_CMD.
lsf.conf LSB_DEBUG_SBD Syntax LSB_DEBUG_SBD=log_class Description Sets the debugging log class for sbatchd. Specifies the log class filtering to be applied to sbatchd. Only messages belonging to the specified log class are recorded. LSB_DEBUG_SBD sets the log class and is used in combination with LSF_LOG_MASK, which sets the log level. For example: LSF_LOG_MASK=LOG_DEBUG LSB_DEBUG_SBD="LC_TRACE LC_EXEC" To specify multiple log classes, use a space-separated list enclosed in quotation marks.
lsf.conf To specify multiple log classes, use a space-separated list enclosed in quotation marks. For example: LSB_DEBUG_SCH="LC_SCHED LC_TRACE LC_EXEC" You need to restart the daemons after setting LSB_DEBUG_SCH for your changes to take effect. Valid Values Valid log classes are the same as for LSB_DEBUG_CMD except for the log class LC_ELIM, which cannot be used with LSB_DEBUG_SCH, and LC_HPC and LC_SCHED, which are only valid for LSB_DEBUG_SCH. See LSB_DEBUG_CMD.
lsf.conf Description If set, and the job is rerunnable, the POST_EXEC configured in the queue is not executed if the job is rerun. Running of post-execution commands upon restart of a rerunnable job may not always be desirable. For example, if the post-exec removes certain files, or does other cleanup that should only happen if the job finishes successfully, use LSB_DISABLE_RERUN_POST_EXEC to prevent the post-exec from running and allow the successful continuation of the job when it reruns.
echkpnt.method_name and erestart.method_name must be in LSF_SERVERDIR or in the directory specified by LSB_ECHKPNT_METHOD_DIR. Do not define LSB_ECHKPNT_METHOD=default, as default is a reserved keyword indicating that the default echkpnt and erestart methods of LSF are used. You can, however, specify bsub -k "my_dir method=default" my_job to indicate that you want to use the default checkpoint and restart methods. When this parameter is not defined in lsf.
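For example, a custom method name (hypothetical) set in lsf.conf, which makes LSF look for correspondingly named checkpoint and restart programs:

```
LSB_ECHKPNT_METHOD=myapp
# LSF then invokes echkpnt.myapp and erestart.myapp, found in
# LSF_SERVERDIR or in the directory set by LSB_ECHKPNT_METHOD_DIR
```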
lsf.conf LSB_ESUB_METHOD Syntax LSB_ESUB_METHOD="esub_application [esub_application] ..." Description Specifies a mandatory esub that applies to all job submissions. LSB_ESUB_METHOD lists the names of the application-specific esub executables used in addition to any executables specified by the bsub -a option. For example, LSB_ESUB_METHOD="dce fluent" runs LSF_SERVERDIR/esub.dce and LSF_SERVERDIR/esub.fluent for all jobs submitted to the cluster.
lsf.conf See also LSB_INTERACT_MSG_INTVAL LSB_INTERACT_MSG_INTVAL Syntax LSB_INTERACT_MSG_INTVAL=time_seconds Description Specifies the update interval in seconds for interactive batch job messages. LSB_INTERACT_MSG_INTVAL is ignored if LSB_INTERACT_MSG_ENH is not set. Job information that LSF uses to get the pending or suspension reason is updated according to the value of PEND_REASON_UPDATE_INTERVAL in lsb.params. Default Not defined.
lsf.conf Description Determines whether the CPU limit is a per-process limit enforced by the OS or whether it is a per-job limit enforced by LSF: • • The per-process limit is enforced by the OS when the CPU time of one process of the job exceeds the CPU limit. The per-job limit is enforced by LSF when the total CPU time of all processes of the job exceed the CPU limit. This parameter applies to CPU limits set when a job is submitted with bsub -c, and to CPU limits set for queues by CPULIMIT in lsb.
• sent to the job either when an individual process exceeds the CPU limit or the sum of the CPU time of all processes of the job exceeds the limit. A job that is running may be killed by the OS or by LSF. If the parameter is changed from per-job limit enforced by LSF to per-process limit enforced by the OS (LSB_JOB_CPULIMIT=y changed to LSB_JOB_CPULIMIT=n), the job is allowed to run without limits because the per-process limit was previously disabled. See also
lsb.
lsf.conf The following operating systems do not support the memory limit at the OS level and the job is allowed to run without a memory limit: • • Windows Sun Solaris 2.x Default Not defined. Per-process memory limit enforced by the OS; per-job memory limit enforced by LSF disabled Notes To make LSB_JOB_MEMLIMIT take effect, use the command badmin hrestart all to restart all sbatchds in the cluster. If LSB_JOB_MEMLIMIT is set, it overrides the setting of the parameter LSB_MEMLIMIT_ENFORCE.
lsf.conf Description If resource limits are configured for a user in the SGI IRIX User Limits Database (ULDB) domain specified in LSF_ULDB_DOMAIN, and there is no domain default, the system default is honored. If LSB_KEEP_SYSDEF_RLIMIT=n, and no resource limits are configured in the domain for the user and there is no domain default, LSF overrides the system default and sets system limits to unlimited. Default Not defined.
lsf.conf Example LSB_LOCALDIR=/usr/share/lsbatch/loginfo Default Not defined See also LSB_SHAREDIR, EVENT_UPDATE_INTERVAL in lsb.params LSB_MAILPROG Syntax LSB_MAILPROG=file_name Description Path and file name of the mail program used by LSF to send email. This is the electronic mail program that LSF uses to send system messages to the user. When LSF needs to send email to users it invokes the program defined by LSB_MAILPROG in lsf.conf.
Default /usr/lib/sendmail (UNIX) blank (Windows) See also LSB_MAILSERVER, LSB_MAILTO LSB_MAILSERVER Syntax LSB_MAILSERVER=mail_protocol:mail_server Description Part of mail configuration on Windows. This parameter only applies when lsmail is used as the mail program (LSB_MAILPROG=lsmail.exe). Otherwise, it is ignored. Both mail_protocol and mail_server must be specified.
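A hypothetical Windows mail configuration using lsmail might look like the following (the server name is an assumption for illustration, not a value from this document):

```
# lsf.conf: route LSF email through an SMTP server when lsmail
# is the mail program (server name is a placeholder)
LSB_MAILPROG=lsmail.exe
LSB_MAILSERVER=SMTP:mail.example.com
```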
lsf.conf output. To prevent large job output files from interfering with your mail system, use LSB_MAILSIZE_LIMIT to set the maximum size in KB of the email containing the job information. Specify a positive integer. If the size of the job output email exceeds LSB_MAILSIZE_LIMIT, the output is saved to a file under JOB_SPOOL_DIR or to the default job output directory if JOB_SPOOL_DIR is not defined. The email informs users of where the job output is located.
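For example, to cap job output email at 2 MB (the 2048 KB figure is an illustrative choice):

```
# lsf.conf: job output email larger than 2048 KB is saved to a file
# under JOB_SPOOL_DIR (or the default job output directory) instead,
# and the email tells the user where the output is located
LSB_MAILSIZE_LIMIT=2048
```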
lsf.conf See also LSB_MAILPROG, LSB_MAILSIZE_LIMIT LSB_MAX_JOB_DISPATCH_PER_SESSION Syntax LSB_MAX_JOB_DISPATCH_PER_SESSION=integer Description Defines the maximum number of jobs that mbatchd can dispatch during one job scheduling session. Both mbatchd and sbatchd must be restarted when you change the value of this parameter. If set to a value greater than 300, the file descriptor limit is increased on operating systems that support a file descriptor limit greater than 1024.
lsf.conf After modifying LSB_MAX_PROBE_SBD, use badmin mbdrestart to restart mbatchd and let the modified value take effect. If LSB_MAX_PROBE_SBD is defined, the value of MAX_SBD_FAIL in lsb.params can be less than 3. Valid Values Any positive integer between 0 and 64 Default 20 See also MAX_SBD_FAIL in lsb.params LSB_MAX_NQS_QUEUES Syntax LSB_MAX_NQS_QUEUES=nqs_queues Description The maximum number of NQS queues allowed in the LSF cluster. Required for LSF to work with NQS.
lsf.conf Valid Values String, either non-empty or empty. Default Not defined. By default, LSF displays the message "LSF is processing your request. Please wait..." Batch commands retry the connection to mbatchd at the intervals specified by the parameters LSB_API_CONNTIMEOUT and LSB_API_RECVTIMEOUT. LSB_MBD_CONNECT_FAIL_MSG Syntax LSB_MBD_CONNECT_FAIL_MSG="message_string" Description Specifies the message displayed when internal system connections to mbatchd fail.
lsf.conf Batch commands retry the connection to mbatchd at the intervals specified by the parameters LSB_API_CONNTIMEOUT and LSB_API_RECVTIMEOUT. LSB_MBD_MAX_SIG_COUNT Syntax LSB_MBD_MAX_SIG_COUNT=integer Description When a host enters an unknown state, the mbatchd attempts to retry any pending jobs. This parameter specifies the maximum number of pending signals that the mbatchd deals with concurrently in order not to overload it.
lsf.conf Description MultiCluster job forwarding model only. Specify y to make LSF email the job owner when a job is suspended after reaching the retry threshold. Default n LSB_MC_INITFAIL_RETRY Syntax LSB_MC_INITFAIL_RETRY=integer Description MultiCluster job forwarding model only. Defines the retry threshold and causes LSF to suspend a job that repeatedly fails to start. For example, specify 2 retry attempts to make LSF attempt to start a job 3 times before suspending it.
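A sketch of the retry threshold described above. The name LSB_MC_INITFAIL_MAIL for the email option is assumed from context, since the parameter heading for the first entry on this page is not visible here:

```
# lsf.conf: attempt to start a forwarded job 3 times
# (1 start + 2 retries) before suspending it, and email
# the job owner when that happens (mail parameter name assumed)
LSB_MC_INITFAIL_RETRY=2
LSB_MC_INITFAIL_MAIL=y
```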
lsf.conf LSB_MIG2PEND Syntax LSB_MIG2PEND=0 | 1 Description Applies only to migrating checkpointable or rerunnable jobs. When defined with a value of 1, requeues migrating jobs instead of restarting or rerunning them on the first available host. Requeues the jobs in the PEND state in order of the original submission time and with the original job priority.
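As a sketch of the migration behavior described above:

```
# lsf.conf: requeue migrating checkpointable or rerunnable jobs
# in the PEND state (ordered by original submission time and
# original priority) instead of restarting or rerunning them
# on the first available host
LSB_MIG2PEND=1
```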
lsf.conf Default Not defined See also LSB_JOB_CPULIMIT, LSB_JOB_MEMLIMIT LSB_NCPU_ENFORCE Description When set to 1, enables parallel fairshare and considers the number of CPUs when calculating dynamic priority for queue-level user-based fairshare. LSB_NCPU_ENFORCE does not apply to host-partition user-based fairshare. For host-partition user-based fairshare, the number of CPUs is automatically considered.
If your cluster runs a large number of blocking mode (bsub -K) and interactive (bsub -I) jobs, response to batch queries can become very slow. If you run a large number of bsub -I or bsub -K jobs, you can set the number of threads to the number of processors on the master host. Default Not defined LSB_PSET_BIND_DEFAULT Syntax LSB_PSET_BIND_DEFAULT=y | Y Description If set, Platform LSF HPC binds a job that is not explicitly associated with an HP-UX pset to the default pset 0.
lsf.conf • mbatchd responds to requests by forking one child mbatchd. As soon as mbatchd has forked a child mbatchd, the child mbatchd takes over and listens on the port to process more query requests. For each request, the child mbatchd creates a thread to process it. The interval used by mbatchd for forking new child mbatchds is specified by the parameter MBD_REFRESH_TIME in lsb.params.
lsf.conf LSB_RLA_HOST_LIST Syntax LSB_RLA_HOST_LIST="host_name ..." Description By default, the LSF scheduler can contact the LSF HPC topology adapter (RLA) running on any host for Linux/QsNet RMS allocation requests. LSB_RLA_HOST_LIST defines a list of hosts to restrict which RLAs the LSF scheduler contacts. If LSB_RLA_HOST_LIST is configured, you must list at least one host per RMS partition for the RMS partition to be considered for job scheduling. Listed hosts must be defined in lsf.cluster.
lsf.conf LSB_RLA_WORKDIR Syntax LSB_RLA_WORKDIR=directory Description Directory to store the LSF HPC topology adapter (RLA) status file. Allows RLA to recover its original state when it restarts. When RLA first starts, it creates the directory defined by LSB_RLA_WORKDIR if it does not exist, then creates subdirectories for each host. You should avoid using /tmp or any other directory that is automatically cleaned up by the system.
lsf.conf LSB_RMS_MAXNUMRAILS Syntax LSB_RMS_MAXNUMRAILS=integer Description Maximum number of rails in a system. Specifies a maximum value for the rails argument to the topology scheduler options specified in: • • -extsched option of bsub DEFAULT_EXTSCHED and MANDATORY_EXTSCHED in lsb.queues Default 32 LSB_RMS_MAXPTILE Syntax LSB_RMS_MAXPTILE=integer Description Maximum number of CPUs per node in a system.
Default Not defined LSB_SBD_PORT See LSF_LIM_PORT, LSF_RES_PORT, LSB_MBD_PORT, LSB_SBD_PORT. LSB_SET_TMPDIR Syntax LSB_SET_TMPDIR=y | n Description If y, LSF sets the TMPDIR environment variable, overwriting the current value with /tmp/job_ID.tmpdir. Default n LSB_SHAREDIR Syntax LSB_SHAREDIR=directory Description Directory in which the job history and accounting logs are kept for each cluster. These files are necessary for correct operation of the system.
lsf.conf Description Displays an abbreviated list of hosts in bjobs and bhist for a parallel job where multiple processes of a job are running on a host. Multiple processes are displayed in the following format: processes*hostA For example, if a parallel job is running 5 processes on hostA, the information is displayed in the following manner: 5*hostA Setting this parameter may improve mbatchd restart performance and accelerate event replay.
lsf.conf Description When set, and used with the -o or -e options of bsub, redirects standard output or standard error from the job directly to a file as the job runs. If LSB_STDOUT_DIRECT is not set and you use the bsub -o option, the standard output of a job is written to a temporary file and copied to the file you specify after the job finishes. LSB_STDOUT_DIRECT is not supported on Windows.
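A hedged example of the direct-output behavior described above (the output path is illustrative):

```
# lsf.conf: with this set, bsub -o and bsub -e write standard
# output and standard error directly to the target file as the
# job runs, instead of spooling to a temporary file and copying
# it after the job finishes (not supported on Windows)
LSB_STDOUT_DIRECT=y
```

With this setting, a command such as bsub -o /home/user/out.log myjob appends output to /home/user/out.log while the job is still running.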
Causes esub to echo the message: netscape is not allowed to run in batch mode Default Not defined See also LSB_SUB_COMMAND_LINE and LSB_SUB_PARM_FILE environment variables LSB_TIME_CMD Syntax LSB_TIME_CMD=timing_level Description The timing level for checking how long batch commands run. Time usage is logged in milliseconds; specify a positive integer.
LSB_TIME_RESERVE_NUMJOBS Syntax LSB_TIME_RESERVE_NUMJOBS=maximum_reservation_jobs Description Enables time-based slot reservation. The value must be a positive integer. LSB_TIME_RESERVE_NUMJOBS controls the maximum number of jobs that use time-based slot reservation. For example, if LSB_TIME_RESERVE_NUMJOBS=4, only the top 4 jobs get their future allocation information. Use LSB_TIME_RESERVE_NUMJOBS=1 to allow only the highest-priority job to get an accurate start time prediction.
lsf.conf Time usage is logged in milliseconds; specify a positive integer. Example: LSB_TIME_SCH=1 Default Not defined LSB_UTMP Syntax LSB_UTMP=y | Y Description If set, enables registration of user and account information for interactive batch jobs submitted with bsub -Ip or bsub -Is. To disable utmp file registration, set LSB_UTMP to any value other than y or Y; for example, LSB_UTMP=N.
lsf.conf LSF_AM_OPTIONS Syntax LSF_AM_OPTIONS=AMFIRST | AMNEVER Description Determines the order of file path resolution when setting the user’s home directory. This variable is rarely used but sometimes LSF does not properly change the directory to the user’s home directory when the user’s home directory is automounted. Setting LSF_AM_OPTIONS forces LSF to change directory to $HOME before attempting to automount the user’s home.
lsf.conf Description Timeout when receiving a reply from LIM. EGO parameter EGO_LIM_RECVTIMEOUT Default 20 See also LSF_API_CONNTIMEOUT LSF_AUTH Syntax LSF_AUTH=eauth | ident Description Enables either external authentication or authentication by means of identification daemons. This parameter is required for any cluster that contains Windows hosts, and is optional for UNIX-only clusters.
lsf.conf example of how the eauth protocol works. You should write your own eauth executable to meet the security requirements of your cluster. LSF_AUTH_DAEMONS Syntax LSF_AUTH_DAEMONS=y | Y Description Enables LSF daemon authentication when external authentication is enabled (LSF_AUTH=eauth in the file lsf.conf). Daemons invoke eauth to authenticate each other as specified by the eauth executable. Default Not defined.
lsf.conf Default N. Processor binding is disabled. LSF_CMD_LOGDIR Syntax LSF_CMD_LOGDIR=path Description The path to the log files used for debugging LSF commands. This parameter can also be set from the command line. Default /tmp See also LSB_CMD_LOG_MASK, LSB_CMD_LOGDIR, LSB_DEBUG, LSB_DEBUG_CMD, LSB_TIME_CMD, LSF_CMD_LOG_MASK, LSF_LOG_MASK, LSF_LOGDIR, LSF_TIME_CMD LSF_CMD_LOG_MASK Syntax LSF_CMD_LOG_MASK=log_level Description Specifies the logging level of error messages from LSF commands.
lsf.conf • • • • • • • • • • • LOG_EMERG LOG_ALERT LOG_CRIT LOG_ERR LOG_WARNING LOG_NOTICE LOG_INFO LOG_DEBUG LOG_DEBUG1 LOG_DEBUG2 LOG_DEBUG3 Default LOG_WARNING See also LSB_CMD_LOG_MASK, LSB_CMD_LOGDIR, LSB_DEBUG, LSB_DEBUG_CMD, LSB_TIME_CMD, LSB_CMD_LOGDIR, LSF_LOG_MASK, LSF_LOGDIR, LSF_TIME_CMD LSF_CONF_RETRY_INT Syntax LSF_CONF_RETRY_INT=time_seconds Description The number of seconds to wait between unsuccessful attempts at opening a configuration file (only valid for LIM).
lsf.conf EGO parameter EGO_CONF_RETRY_MAX Default 0 See also LSF_CONF_RETRY_INT LSF_CONFDIR Syntax LSF_CONFDIR=directory Description Directory in which all LSF configuration files are installed. These files are shared throughout the system and should be readable from any host. This directory can contain configuration files for more than one cluster. The files in the LSF_CONFDIR directory must be owned by the primary LSF administrator, and readable by all LSF server hosts.
The operating system can assign other processes to run on the same CPU; however, it does so only if utilization of the bound CPU is lower than utilization of the unbound CPUs. Related parameters To improve scheduling and dispatch performance of all LSF daemons, you should use LSF_DAEMONS_CPUS together with EGO_DAEMONS_CPUS (in ego.conf or lsf.
When this parameter is set to y or Y, mbatchd, sbatchd, and RES run the executable daemons.wrap located in LSF_SERVERDIR. Default Not defined. LSF does not run the daemons.wrap executable. LSF_DEBUG_CMD Syntax LSF_DEBUG_CMD=log_class Description Sets the debugging log class for LSF commands and APIs. Specifies the log class filtering to be applied to LSF commands or the API. Only messages belonging to the specified log class are recorded.
lsf.conf To specify multiple log classes, use a space-separated list enclosed in quotation marks. For example: LSF_DEBUG_LIM="LC_TRACE LC_EXEC" This parameter can also be defined from the command line.
lsf.conf Specifies the log class filtering to be applied to RES. Only messages belonging to the specified log class are recorded. LSF_DEBUG_RES sets the log class and is used in combination with LSF_LOG_MASK, which sets the log level. For example: LSF_LOG_MASK=LOG_DEBUG LSF_DEBUG_RES=LC_TRACE To specify multiple log classes, use a space-separated list enclosed in quotation marks.
lsf.conf See also LSF_DYNAMIC_HOST_WAIT_TIME LSF_DISABLE_LSRUN Syntax LSF_DISABLE_LSRUN=y | Y Description When defined, RES refuses remote connections from lsrun and lsgrun unless the user is either an LSF administrator or root. For remote execution by root, LSF_ROOT_REX must be defined. Other remote execution commands, such as ch and lsmake are not affected.
lsf.conf LSF_DUALSTACK_PREFER_IPV6 Syntax LSF_DUALSTACK_PREFER_IPV6=Y | y Description Define this parameter when you want to ensure that clients and servers on dual-stack hosts use IPv6 addresses only. Setting this parameter configures LSF to sort the dynamically created address lookup list in order of AF_INET6 (IPv6) elements first, followed by AF_INET (IPv4) elements, and then others. Restriction: IPv4-only and IPv6-only hosts cannot belong to the same cluster.
Description Enables automatic removal of dynamic hosts from the cluster and specifies the timeout value (minimum 10 minutes). To improve performance in very large clusters, you should disable this feature and remove unwanted hosts from the hostcache file manually. Specifies the length of time a dynamic host is unavailable before the master host removes it from the cluster. Each time LSF removes a dynamic host, mbatchd automatically reconfigures itself.
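For illustration only: the parameter heading for this entry is not visible on this page, so the name LSF_DYNAMIC_HOST_TIMEOUT and the minutes-suffix form shown are assumptions:

```
# lsf.conf: remove a dynamic host after it has been unavailable
# for 60 minutes (parameter name and "m" minutes suffix assumed;
# the documented minimum timeout is 10 minutes)
LSF_DYNAMIC_HOST_TIMEOUT=60m
```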
lsf.conf Recommended value An integer greater than zero, up to 60 seconds for every 1000 hosts in the cluster, for a maximum of 15 minutes. Selecting a smaller value results in a quicker response time for new hosts at the expense of an increased load on the master LIM. Example LSF_DYNAMIC_HOST_WAIT_TIME=60 A host waits 60 seconds from startup to send a request for the master LIM to add it to the cluster.
lsf.conf Default N (res and sbatchd are started manually or through operating system rc facility) LSF_EGO_ENVDIR Syntax LSF_EGO_ENVDIR=directory Description Directory where all Platform EGO configuration files are installed. These files are shared throughout the system and should be readable from any host. If LSF_ENABLE_EGO="N", this parameter is ignored and ego.conf is not loaded. Default LSF_CONFDIR/ego/cluster_name/kernel. If not defined, or commented out, /etc is assumed.
lsf.conf Note: See the IRIX resource administration documentation for information about the csaswitch command.
lsf.conf LSF_ENABLE_EGO Syntax LSF_ENABLE_EGO="Y" | "N" Description Enables Platform EGO functionality in the LSF cluster. If you set LSF_ENABLE_EGO="Y", you must set or uncomment LSF_EGO_ENVDIR in lsf.conf. If you set LSF_ENABLE_EGO="N" you must remove or comment out LSF_EGO_ENVDIR in lsf.conf.
lsf.conf Default Not defined See also LSF_DUALSTACK_PREFER_IPV6 LSF_ENVDIR Syntax LSF_ENVDIR=directory Description Directory containing the lsf.conf file. By default, lsf.conf is installed by creating a shared copy in LSF_CONFDIR and adding a symbolic link from /etc/lsf.conf to the shared copy. If LSF_ENVDIR is set, the symbolic link is installed in LSF_ENVDIR/lsf.conf. The lsf.conf file is a global environment configuration file for all LSF services and applications.
lsf.conf Description Specifies the LSF event receiver and enables event generation. Any string may be used as the LSF event receiver; this information is not used by LSF to enable the feature but is only passed as an argument to the event program. If LSF_EVENT_PROGRAM specifies a program that does not exist, event generation does not work. Default Not defined.
lsf.conf See also LSF_HOST_CACHE_PTTL LSF_HOST_CACHE_PTTL Syntax LSF_HOST_CACHE_PTTL=time_seconds Description Positive-time-to-live value in seconds. Specifies the length of time the system caches a successful DNS lookup result. If you set this value to zero (0), LSF does not cache the result. Note: Setting this parameter does not affect the negative-time-to-live value set by the parameter LSF_HOST_CACHE_NTTL. Valid values Positive integer.
lsf.conf The status of the closed host is closed_Adm. No new jobs are dispatched to this host, but currently running jobs are not suspended. RESERVE_BY_STARTTIME : LSF selects the reservation that gives the job the earliest predicted start time. By default, if multiple host groups are available for reservation, LSF chooses the largest possible reservation based on number of slots. SHORT_EVENTFILE : Compresses long host name lists when event records are written to lsb.events and lsb.
lsf.conf TASK_SWAPLIMIT: Enables enforcement of a virtual memory (swap) limit (bsub -v, bmod -v, or SWAPLIMIT in lsb.queues) for individual tasks in a parallel job. If any parallel task exceeds the swap limit, LSF terminates the entire job. Example JOB_START events in lsb.events: For a job submitted with bsub -n 64 -R "span[ptile=32]" sleep 100 Without SHORT_EVENTFILE, a JOB_START event like the following would be logged in lsb.events: "JOB_START" "7.0" 1058989891 710 4 0 0 10.
PGID: 257325; PIDs: 257325 257500 257482 257501 257523 257525 257531 SCHEDULING PARAMETERS: r15s r1m r15m ut pg io ls it tmp swp mem loadSched loadStop cpuspeed bandwidth loadSched loadStop << Job <109> is done successfully. >> Example bjobs -l output with SHORT_PIDLIST: bjobs -l displays a count of the PGIDs and PIDs: bjobs -l Job <109>, User , Project , Status , Queue , Interactive mode, Command <./myjob.
lsf.conf Default 0 LSF_HPC_NCPU_INCR_CYCLES Syntax LSF_HPC_NCPU_INCR_CYCLES=increment_cycles Description Minimum number of consecutive cycles where the number of CPUs changed does not exceed LSF_HPC_NCPU_INCREMENT. LSF checks total usable CPUs every 2 minutes. Default 1 LSF_HPC_NCPU_THRESHOLD Syntax LSF_HPC_NCPU_THRESHOLD=threshold Description The percentage of total usable CPUs in the LSF partition of a SLURM cluster.
lsf.conf LSF_ID_PORT Syntax LSF_ID_PORT=port_number Description The network port number used to communicate with the authentication daemon when LSF_AUTH is set to ident. Default Not defined LSF_INCLUDEDIR Syntax LSF_INCLUDEDIR=directory Description Directory under which the LSF API header files lsf.h and lsbatch.h are installed.
lsf.conf • XLSF_APPDIR=$LSF_INDEP/misc Default /usr/share/lsf/mnt See also LSF_MACHDEP, LSB_SHAREDIR, LSF_CONFDIR, LSF_INCLUDEDIR, LSF_MANDIR, XLSF_APPDIR LSF_INTERACTIVE_STDERR Syntax LSF_INTERACTIVE_STDERR=y | n Description Separates stderr from stdout for interactive tasks and interactive batch jobs. This is useful to redirect output to a file with regular operators instead of the bsub -e err_file and -o out_file options. This parameter can also be enabled or disabled as an environment variable.
lsf.conf Notes When this parameter is set, the change affects interactive tasks and interactive batch jobs run with the following commands: • • • • • • • bsub -I bsub -Ip bsub -Is lsrun lsgrun lsmake (Platform LSF Make) bsub pam (Platform LSF HPC) Limitations • • • Pseudo-terminal: Do not use this parameter if your application depends on stderr as a terminal. This is because LSF must use a non-pseudo-terminal connection to separate stderr from stdout.
lsf.conf LSF_LIBDIR Syntax LSF_LIBDIR=directory Description Specifies the directory in which the LSF libraries are installed. Library files are shared by all hosts of the same type. Default LSF_MACHDEP/lib LSF_LIC_SCHED_HOSTS Syntax LSF_LIC_SCHED_HOSTS="candidate_host_list" candidate_host_list is a space-separated list of hosts that are candidate LSF License Scheduler hosts.
lsf.conf LSF_LIC_SCHED_PREEMPT_SLOT_RELEASE Syntax LSF_LIC_SCHED_PREEMPT_SLOT_RELEASE=y | n Description Set this parameter to release the slot of a job that is suspended when its license is preempted by LSF License Scheduler. If you set LSF_LIC_SCHED_PREEMPT_SLOT_RELEASE, do not set LSF_LIC_SCHED_PREEMPT_REQUEUE. If both these parameters are set, LSF_LIC_SCHED_PREEMPT_SLOT_RELEASE is ignored.
lsf.conf Description Specifies the location for the license accounting files. These include the license accounting files for LSF Family products. Use this parameter to define the location of all the license accounting files. By defining this parameter, you can store the license accounting files for the LSF Family of products in the same directory for convenience. Default Not defined. The license accounting files are stored in the default log directory for the particular product.
lsf.conf listed, so it checks the second server when there are no more licenses available from the first server. If this parameter is not defined, LSF assumes the default location. Default If you installed LSF with a default installation, the license file is installed in the LSF configuration directory (LSF_CONFDIR/license.dat). If you installed LSF with a custom installation, you specify the license installation directory.
LSF_LIM_API_NTRIES Syntax LSF_LIM_API_NTRIES=integer Description Defines the number of times LSF commands retry communication with the LIM API when LIM is not available. LSF_LIM_API_NTRIES is ignored by LSF and EGO daemons and EGO commands. The LSF_LIM_API_NTRIES environment variable overrides the value of LSF_LIM_API_NTRIES in lsf.conf. Valid values 1 to 65535 Default Not defined. The LIM API exits without retrying.
lsf.conf See also LSF_RES_DEBUG, LSF_CMD_LOGDIR, LSF_CMD_LOG_MASK, LSF_LOG_MASK, LSF_LOGDIR LSF_LIM_IGNORE_CHECKSUM Syntax LSF_LIM_IGNORE_CHECKSUM=y | Y Description Configure LSF_LIM_IGNORE_CHECKSUM=Y to ignore warning messages logged to lim log files on non-master hosts. When LSF_MASTER_LIST is set, lsadmin reconfig only restarts master candidate hosts (for example, after adding or removing hosts from the cluster).
lsf.conf • • • • LSF_LIM_PORT=7869 LSF_RES_PORT=6878 LSB_MBD_PORT=6881 LSB_SBD_PORT=6882 LSF_LOAD_USER_PROFILE Syntax LSF_LOAD_USER_PROFILE=local | roaming Description When running jobs on Windows hosts, you can specify whether a user profile should be loaded. Use this parameter if you have jobs that need to access user-specific resources associated with a user profile. Local and roaming user profiles are Windows features. For more information about them, check Microsoft documentation.
lsf.conf LSF_LOCAL_RESOURCES is usually set in the slave.config file during installation. If LSF_LOCAL_RESOURCES are already defined in a local lsf.conf on the slave host, lsfinstall does not add resources you define in LSF_LOCAL_RESOURCES in slave.config. You should not have duplicate LSF_LOCAL_RESOURCES entries in lsf.conf. If local resources are defined more than once, only the last definition is valid. Important: Resources must already be mapped to hosts in the ResourceMap section of lsf.cluster.
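A sketch of a slave-host definition, using the bracketed keyword syntax; the resource and type names here are hypothetical:

```
# local lsf.conf on a slave host: advertise a Boolean resource
# and a host type to the master LIM at startup (names are
# placeholders; resources must already be mapped to hosts in
# the ResourceMap section of lsf.cluster.cluster_name)
LSF_LOCAL_RESOURCES="[resource bigmem] [type LINUX86]"
```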
lsf.conf • • • • • • • • • • • LOG_EMERG LOG_ALERT LOG_CRIT LOG_ERR LOG_WARNING LOG_NOTICE LOG_INFO LOG_DEBUG LOG_DEBUG1 LOG_DEBUG2 LOG_DEBUG3 The most important LSF log messages are at the LOG_ERR or LOG_WARNING level. Messages at the LOG_INFO and LOG_DEBUG level are only useful for debugging. Although message log level implements similar functionality to UNIX syslog, there is no dependency on UNIX syslog. It works even if messages are being logged to files instead of syslog.
lsf.conf Description Allows you to reduce the information logged to the LSF Windows event log files. Messages of lower severity than the specified level are discarded. For all LSF files, the types of messages saved depends on LSF_LOG_MASK, so the threshold for the Windows event logs is either LSF_LOG_MASK or LSF_LOG_MASK_WIN, whichever is higher. LSF_LOG_MASK_WIN is ignored if LSF_LOG_MASK is set to a higher level. The LSF event log files for Windows are: • • • • • lim.log.host_name res.log.
lsf.conf • On Windows XP x64 and Windows 2003 x64: HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Platform Computing Corporation\LSF \cluster_name\LSF_LOGDIR If a server is unable to write in the LSF system log file directory, LSF attempts to write to the following directories in the following order: • • • • LSF_TMPDIR if defined %TMP% if defined %TEMP% if defined System directory, for example, c:\winnt UNIX If a server is unable to write in this directory, the error logs are created in /tmp on UNIX.
lsf.conf Use this parameter to enable LSF to save log files in a different location from the default local directory specified in the Windows registry.
lsf.conf Description Directory under which all man pages are installed. The man pages are placed in the man1, man3, man5, and man8 subdirectories of the LSF_MANDIR directory. This is created by the LSF installation process, and you should not need to modify this parameter. Man pages are installed in a format suitable for BSD-style man commands. For most versions of UNIX and Linux, you should add the directory LSF_MANDIR to your MANPATH environment variable.
lsf.conf If you have a large number of non-master hosts, you should configure LSF_LIM_IGNORE_CHECKSUM=Y to ignore warning messages like the following logged to lim log files on non-master hosts. Aug 26 13:47:35 2006 9746 4 7.0 xdr_loadvector: Sender <10.225.36.46:9999> has a different configuration Interaction with LSF_SERVER_HOSTS You can use the same list of hosts, or a subset of the master host list defined in LSF_MASTER_LIST, in LSF_SERVER_HOSTS.
LSF_MAX_TRY_ADD_HOST Syntax LSF_MAX_TRY_ADD_HOST=integer Description When a slave LIM on a dynamically added host sends an add host request to the master LIM, but the master LIM cannot add the host for some reason, the slave LIM tries again. LSF_MAX_TRY_ADD_HOST specifies how many times the slave LIM retries the add host request before giving up. Default 20 LSF_MC_NON_PRIVILEGED_PORTS Syntax LSF_MC_NON_PRIVILEGED_PORTS=y | Y Description MultiCluster only.
LSF_NON_PRIVILEGED_PORTS Syntax LSF_NON_PRIVILEGED_PORTS=y | Y Description Disables the use of privileged ports. By default, LSF daemons and clients running under the root account use privileged ports to communicate with each other. If LSF_NON_PRIVILEGED_PORTS is not defined and LSF_AUTH is not defined in lsf.conf, LSF daemons check the privileged port of a request message to perform authentication.
lsf.conf Description Applies only to interactive batch jobs. Time interval at which NIOS polls mbatchd to check if a job is still running. Used to retrieve a job’s exit status in the case of an abnormal exit of NIOS, due to a network failure for example. Use this parameter if you run interactive jobs and you have scripts that depend on an exit code being returned.
lsf.conf Description Applies only to interactive non-parallel batch jobs. Defines how long NIOS waits before sending a message to RES to determine if the connection is still open. Use this parameter to ensure NIOS exits when a network failure occurs instead of waiting indefinitely for notification that a job has been completed. When a network connection is lost, RES cannot communicate with NIOS and as a result, NIOS does not exit.
lsf.conf Description Used to start applications that use both OpenMP and MPI. Valid values unique Default Not defined Notes At job submission, LSF reserves the correct number of processors and PAM starts only 1 process per host. For example, to reserve 32 processors and run on 4 processes per host, resulting in the use of 8 hosts: bsub -n 32 -R "span[ptile=4]" pam yourOpenMPJob Where defined This parameter can alternatively be set as an environment variable.
lsf.conf LSF_PIM_INFODIR Syntax LSF_PIM_INFODIR=path Description The path to where PIM writes the pim.info.host_name file. Specifies the path to where the process information is stored. The process information resides in the file pim.info.host_name. The PIM also reads this file when it starts so that it can accumulate the resource usage of dead processes for existing process groups. EGO parameter EGO_PIM_INFODIR Default Not defined. The system uses /tmp.
lsf.conf Use this parameter to improve job throughput and reduce a job’s start time if there are many jobs running simultaneously on a host. This parameter reduces communication traffic between sbatchd and PIM on the same host. When this parameter is not defined or set to n, sbatchd queries PIM as needed for job process information. When this parameter is defined, sbatchd does not query PIM immediately as it needs information; sbatchd only queries PIM every LSF_PIM_SLEEPTIME seconds.
lsf.conf Default 160 seconds LSF_RES_ACCT Syntax LSF_RES_ACCT=time_milliseconds | 0 Description If this parameter is defined, RES logs information for completed and failed tasks by default (see lsf.acct). The value for LSF_RES_ACCT is specified in terms of consumed CPU time (milliseconds). Only tasks that have consumed more than the specified CPU time are logged. If this parameter is defined as LSF_RES_ACCT=0, then all tasks are logged.
lsf.conf LSF_RES_ACTIVE_TIME Syntax LSF_RES_ACTIVE_TIME=time_seconds Description Time in seconds before LIM reports that RES is down. Minimum value 10 seconds Default 90 seconds LSF_RES_CONNECT_RETRY Syntax LSF_RES_CONNECT_RETRY=integer | 0 Description The number of attempts by RES to reconnect to NIOS. If LSF_RES_CONNECT_RETRY is not defined, the default value is used. Default 0 See also LSF_NIOS_RES_HEARTBEAT LSF_RES_DEBUG Syntax LSF_RES_DEBUG=1 | 2 Description Sets RES to debug mode.
lsf.conf Valid values LSF_RES_DEBUG=1 RES runs in the background with no associated control terminal. LSF_RES_DEBUG=2 RES runs in the foreground and prints error messages to tty. Default Not defined See also LSF_LIM_DEBUG, LSF_CMD_LOGDIR, LSF_CMD_LOG_MASK, LSF_LOG_MASK, LSF_LOGDIR LSF_RES_PORT See LSF_LIM_PORT, LSF_RES_PORT, LSB_MBD_PORT, LSB_SBD_PORT.
lsf.conf Default 15 LSF_ROOT_REX Syntax LSF_ROOT_REX=local Description UNIX only. Allows root remote execution privileges (subject to identification checking) on remote hosts, for both interactive and batch jobs. Causes RES to accept requests from the superuser (root) on remote hosts, subject to identification checking. If LSF_ROOT_REX is not defined, remote execution requests from user root are refused.
lsf.conf EGO parameter EGO_RSH Default Not defined Example To use an ssh command before trying rsh for LSF commands, specify: LSF_RSH="ssh -o 'PasswordAuthentication no' -o 'StrictHostKeyChecking no'" ssh options such as PasswordAuthentication and StrictHostKeyChecking can also be configured in the global SSH_ETC/ssh_config file or $HOME/.ssh/config. See also ssh, ssh_config LSF_SECUREDIR Syntax LSF_SECUREDIR=path Description Windows only; mandatory if using lsf.sudoers.
lsf.conf The host names in LSF_SERVER_HOSTS must be enclosed in quotes and separated by white space. For example: LSF_SERVER_HOSTS="hostA hostD hostB" The parameter string can include up to 4094 characters for UNIX or 255 characters for Windows. Interaction with LSF_MASTER_LIST Starting in LSF 7, LSF_MASTER_LIST must be defined in lsf.conf. You can use the same list of hosts, or a subset of the master host list, in LSF_SERVER_HOSTS.
lsf.conf Description Applies to lstcsh only. Specifies users who are allowed to use @ for host redirection. Users not specified with this parameter cannot use host redirection in lstcsh. To specify a Windows user account, include the domain name in uppercase letters (DOMAIN_NAME\user_name). If this parameter is not defined, all users are allowed to use @ for host redirection in lstcsh.
lsf.conf LSF_SLURM_TMPDIR Syntax LSF_SLURM_TMPDIR=path Description Specifies the LSF HPC tmp directory for SLURM clusters. The default LSF_TMPDIR /tmp cannot be shared across nodes, so LSF_SLURM_TMPDIR must specify a path that is accessible on all SLURM nodes.
lsf.conf LSF_STRIP_DOMAIN Syntax LSF_STRIP_DOMAIN=domain_suffix[:domain_suffix ...] Description (Optional) If all of the hosts in your cluster can be reached using short host names, you can configure LSF to use the short host names by specifying the portion of the domain name to remove. If your hosts are in more than one domain or have more than one domain name, you can specify more than one domain suffix to remove, separated by a colon (:).
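As a sketch (the domain suffixes here are hypothetical), a cluster whose hosts sit in two domains could strip both suffixes so that all hosts are known by their short names:

```conf
# Hypothetical lsf.conf entry: hosts in either domain are addressed by short name
LSF_STRIP_DOMAIN=.example.com:.eng.example.com
```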
lsf.conf LSF_TIME_LIM Syntax LSF_TIME_LIM=timing_level Description The timing level for checking how long LIM routines run. Time usage is logged in milliseconds. Specify a positive integer. EGO parameter EGO_TIME_LIM Default Not defined See also LSB_TIME_CMD, LSB_TIME_MBD, LSB_TIME_SBD, LSF_TIME_RES LSF_TIME_RES Syntax LSF_TIME_RES=timing_level Description The timing level for checking how long RES routines run. Time usage is logged in milliseconds. Specify a positive integer.
lsf.conf When LSF_TMPDIR is defined in lsf.conf, LSF creates a temporary directory under the directory specified by LSF_TMPDIR on the execution host when a job is started and sets the temporary directory environment variable (TMPDIR) for the job. The name of the temporary directory has the following format: $LSF_TMPDIR/job_ID.tmpdir On UNIX, the directory has the permission 0700 and is owned by the execution user. After adding LSF_TMPDIR to lsf.conf, use badmin hrestart all to reconfigure your cluster.
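A minimal sketch of the behavior described above (the path is hypothetical): with this entry in lsf.conf, a job with job ID 1234 would get TMPDIR set to /usr/tmp/1234.tmpdir on the execution host.

```conf
# Hypothetical lsf.conf entry; LSF creates $LSF_TMPDIR/job_ID.tmpdir per job
LSF_TMPDIR=/usr/tmp
```

After adding the entry, run badmin hrestart all as described above.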
lsf.conf • • If the TMPDIR directory was created by the job RES, LSF will delete the temporary directory and its contents when the job is done If the TMPDIR directory is on a shared file system, it is assumed to be shared by all the hosts allocated to the blaunch job, so LSF does not remove TMPDIR directories created by the job RES or task RES Default By default, LSF_TMPDIR is not enabled. If LSF_TMPDIR is not specified in lsf.conf, this parameter is defined as follows: • • On UNIX: $TMPDIR/job_ID.

lsf.conf Valid values unit indicates the unit for the resource usage limit, one of: • KB (kilobytes) • MB (megabytes) • GB (gigabytes) • TB (terabytes) • PB (petabytes) • EB (exabytes) Default KB LSF_USE_HOSTEQUIV Syntax LSF_USE_HOSTEQUIV=y | Y Description (UNIX only; optional) If LSF_USE_HOSTEQUIV is defined, RES and mbatchd call the ruserok() function to decide if a user is allowed to run remote jobs. The ruserok() function checks in the /etc/hosts.equiv file and the user’s $HOME/.
lsf.conf Important: Configure LSF_USER_DOMAIN immediately after you install LSF; changing this parameter in an existing cluster requires that you verify and possibly reconfigure service accounts, user group memberships, and user passwords. Specify one or more Windows domains, separated by a colon (:). You can enter an unlimited number of Windows domains. A period (.) specifies a local account, not a domain.
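For example (the domain name is hypothetical), the following entry makes LSF try the SALES domain first and fall back to a local account, using the period syntax described above:

```conf
# Hypothetical lsf.conf entry: SALES domain first, then a local account (.)
LSF_USER_DOMAIN=SALES:.
```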
lsf.conf Description MultiCluster job forwarding model only. By default, the submission cluster does not consider remote resources. Define MC_PLUGIN_REMOTE_RESOURCE=y in the submission cluster to allow consideration of remote resources. Note: When MC_PLUGIN_REMOTE_RESOURCE is defined, only the following resource requirements are supported: -R "type==type_name", -R "same[type]" and -R "defined (resource_name)" Default Not defined. The submission cluster does not consider remote resources.
lsf.licensescheduler lsf.licensescheduler The lsf.licensescheduler file contains Platform LSF License Scheduler configuration information. All sections except ProjectGroup are required. The command blparams displays configuration information from this file. Changing lsf.licensescheduler configuration After making any changes to lsf.
lsf.licensescheduler • LIB_RECVTIMEOUT • LM_REMOVE_INTERVAL • LM_STAT_INTERVAL • LMSTAT_PATH • LS_DEBUG_BLD • LS_LOG_MASK • LS_MAX_TASKMAN_SESSIONS • LS_PREEMPT_PEER • PORT • BLC_HEARTBEAT_FACTOR ADMIN Syntax ADMIN=user_name ... Description Defines the License Scheduler administrator using a valid UNIX user account. You can specify multiple accounts. AUTH Syntax AUTH=Y Description Enables License Scheduler user authentication for projects for taskman jobs.
lsf.licensescheduler License Scheduler reports a distribution policy violation when the total number of licenses given to the LSF workload, both free and in use, is less than the LSF workload distribution specified in WORKLOAD_DISTRIBUTION. If License Scheduler finds a distribution policy violation, it creates or overwrites the LSF_LOGDIR/ bld.violation.service_domain_name.log file and runs the user command specified by the CMD keyword.
lsf.licensescheduler It must be the same as the port number specified in EXTERNAL_FILTER_SERVER in the vendor daemon option file. Use a number close to the defined value for the PORT parameter. For example, if PORT=9581, define EXT_FILTER_PORT=9582. FLX_LICENSE_FILE Syntax FLX_LICENSE_FILE=path Description Specifies a path to the file that contains the license keys FLEXnet.Ext.Filter and FLEXnet.Usage.Snapshot to enable the flex grid interface APIs.
lsf.licensescheduler Description Specifies the minimum time a job must have a license checked out before lmremove can remove the license. lmremove causes lmgrd and vendor daemons to close the TCP connection with the application. They then retry the license checkout. Default 180 seconds LM_STAT_INTERVAL Syntax LM_STAT_INTERVAL=seconds Description Defines a time interval between calls that License Scheduler makes to collect license usage information from FLEXlm license management.
lsf.licensescheduler You need to restart the bld daemon after setting LS_DEBUG_BLD for your changes to take effect. If you use the command bladmin blddebug to temporarily change this parameter without changing lsf.licensescheduler, you do not need to restart the daemons.
lsf.licensescheduler License Scheduler logs error messages at different levels so that you can choose to log all messages, or only log messages that are deemed critical. The level specified by LS_LOG_MASK determines which messages are recorded and which are discarded. All messages logged at the specified level or higher are recorded; lower-level messages are discarded. For debugging purposes, the level LOG_DEBUG contains the fewest debugging messages and is used for basic debugging.
lsf.licensescheduler BLC_HEARTBEAT_FACTOR Syntax BLC_HEARTBEAT_FACTOR=integer Description Enables bld to detect blcollect failure. Defines the number of times that bld receives no response from a license collector daemon (blcollect) before bld resets the values for that collector to zero. Each license usage reported to bld by the collector is treated as a heartbeat. Default 3 Clusters section Description Required. Lists the clusters that can use License Scheduler.
lsf.licensescheduler Parameters • NAME • LIC_SERVERS • LIC_COLLECTOR • LIC_FLEX_API_ENABLE • LM_STAT_INTERVAL NAME Defines the name of the service domain. LIC_SERVERS Syntax LIC_SERVERS=([(host_name | port_number@host_name | (port_number@host_name port_number@host_name port_number@host_name))] ...) Description Defines the FLEXlm license server hosts that make up the License Scheduler service domain.
lsf.licensescheduler Default Undefined. The License Scheduler daemon uses one license collector daemon for the entire cluster. LIC_FLEX_API_ENABLE Syntax LIC_FLEX_API_ENABLE=y | n Description Enables the flex grid interface APIs to replace the default behavior of scheduling based on lmstat data. You must also configure License Scheduler and your vendor daemons to work with the flex grid interface package.
lsf.licensescheduler Parameters • NAME • FLEX_NAME • DISTRIBUTION • ALLOCATION • GROUP • GROUP_DISTRIBUTION • LOCAL_TO • LS_FEATURE_PERCENTAGE • NON_SHARED_DISTRIBUTION • PREEMPT_RESERVE • SERVICE_DOMAINS • WORKLOAD_DISTRIBUTION • ENABLE_DYNAMIC_RUSAGE • DYNAMIC • LM_REMOVE_INTERVAL • ENABLE_MINJOB_PREEMPTION NAME Required. Defines the token name—the name used by License Scheduler and LSF to identify the license feature.
lsf.licensescheduler Specify a License Scheduler service domain (described in the ServiceDomain section) that distributes the licenses. project_name Specify a License Scheduler project (described in the Projects section) that is allowed to use the licenses. number_shares Specify a positive integer representing the number of shares assigned to the project.
lsf.licensescheduler six licenses in total, Lp2 is entitled to all of them, and Lp1 can only use licenses when Lp2 does not need them. ALLOCATION Syntax ALLOCATION=[project_name (cluster_name [number_shares] ...)] ... cluster_name Specify the LSF cluster names to which licenses are allocated. project_name Specify a License Scheduler project (described in the Projects section) that is allowed to use the licenses.
lsf.licensescheduler ... End Parameters Begin Feature NAME=ApplicationX DISTRIBUTION=LicenseServer1 (Lp1 1) End Feature Six licenses are allocated to each cluster. No licenses are allocated to interactive tasks. Example 2 ALLOCATION is not configured. The ENABLE_INTERACTIVE parameter is set. Begin Parameters ... ENABLE_INTERACTIVE=y ... End Parameters Begin Feature NAME=ApplicationX DISTRIBUTION=LicenseServer1 (Lp1 1) End Feature Four licenses are allocated to each cluster.
lsf.licensescheduler GROUP Syntax GROUP=[group_name(project_name... )] ... group_name Specify a name for a group of projects. project_name Specify a License Scheduler project (described in the PROJECTS section) that is allowed to use the licenses. The project must appear in the DISTRIBUTION. A project should only belong to one group. Description Optional. Defines groups of projects and specifies the name of each group.
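As a sketch (all feature, project, and service domain names here are hypothetical), a Feature section that groups two of its three projects might look like this; per the rule above, Lp1 and Lp2 also appear in the DISTRIBUTION:

```conf
Begin Feature
NAME = AppB
DISTRIBUTION = LanServer1 (Lp1 1 Lp2 1 Lp3 2)
GROUP = GroupA (Lp1 Lp2)
End Feature
```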
lsf.licensescheduler LOCAL_TO Syntax LOCAL_TO=cluster_name | location_name(cluster_name [cluster_name ...]) Description Configures token locality for the license feature. You must configure different Feature sections for the same feature based on their locality. If LOCAL_TO is not defined, the feature is available to all clients and is not restricted by geographical location.
lsf.licensescheduler SERVICE_DOMAINS = SD1 LOCAL_TO = siteUS(clusterA clusterB) End Feature Begin Feature NAME = hspice GROUP_DISTRIBUTION = group1 SERVICE_DOMAINS = SD2 LOCAL_TO = clusterA End Feature Begin Feature NAME = hspice GROUP_DISTRIBUTION = group1 SERVICE_DOMAINS = SD3 SD4 End Feature Default Not defined. The feature is available to all clusters and interactive jobs, and is not restricted by cluster.
lsf.licensescheduler NON_SHARED_DISTRIBUTION Syntax NON_SHARED_DISTRIBUTION=service_domain_name ([project_name number_non_shared_licenses] ... ) ... service_domain_name Specify a License Scheduler service domain (described in the ServiceDomain section) that distributes the licenses. project_name Specify a License Scheduler project (described in the Projects section) that is allowed to use the licenses.
lsf.licensescheduler Description Optional. With the flex grid interface integration installed, enables on-demand preemption of LSF jobs for important non-managed workload. This guarantees that important non-managed jobs do not fail because of a lack of licenses. Default LSF workload is not preemptable PREEMPT_RESERVE Syntax PREEMPT_RESERVE=Y Description Optional. Enables License Scheduler to preempt licenses that are either reserved or already in use by other projects.
lsf.licensescheduler Optional. Specify a slash (/) and a positive integer representing the enforced number of licenses. non_lsf_distribution Specify the share of licenses dedicated to non-LSF workloads. The share of licenses dedicated to non-LSF workloads is a ratio of non_lsf_distribution:lsf_distribution. Description Optional. Defines the distribution given to each LSF and non-LSF workload within the specified service domain. Use blinfo -a to display WORKLOAD_DISTRIBUTION configuration.
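As a sketch (the feature, project, and service domain names are hypothetical), the following Feature section dedicates licenses to LSF and non-LSF workloads in an 8:2 ratio on one service domain:

```conf
Begin Feature
NAME = AppA
DISTRIBUTION = LanServer1 (Lp1 1)
WORKLOAD_DISTRIBUTION = LanServer1 (LSF 8 NON_LSF 2)
End Feature
```

With ten licenses on LanServer1, eight would be considered the LSF share and two the non-LSF share; blinfo -a displays the configured distribution.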
lsf.licensescheduler DYNAMIC Syntax DYNAMIC=Y Description If you specify DYNAMIC=Y, you must specify a duration in an rusage resource requirement for the feature. This enables License Scheduler to treat the license as a dynamic resource and prevents License Scheduler from scheduling tokens for the feature when they are not available, or reserving license tokens when they should actually be free.
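A minimal sketch of a dynamic feature (the feature, project, and service domain names are hypothetical):

```conf
Begin Feature
NAME = f1
DISTRIBUTION = LanServer1 (Lp1 1)
DYNAMIC = Y
End Feature
```

Jobs using the feature must then supply a duration in their rusage resource requirement, for example bsub -R "rusage[f1=1:duration=10]" (the duration value here is illustrative).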
lsf.licensescheduler FeatureGroup section structure The FeatureGroup section begins and ends with the lines Begin FeatureGroup and End FeatureGroup. A feature group definition consists of a unique name and a list of features contained in the feature group. Example Begin FeatureGroup NAME = Synopsys FEATURE_LIST = ASTRO VCS_Runtime_Net Hsim Hspice End FeatureGroup Begin FeatureGroup NAME = Cadence FEATURE_LIST = Encounter NCSim NCVerilog End FeatureGroup Parameters • NAME • FEATURE_LIST NAME Required.
lsf.licensescheduler (C (P6 P7 P8)) (1 1 1) () () (8 3 0)
(D (P2 P3)) (1 1) () () (2 1)
End ProjectGroup
Parameters • GROUP • SHARES • OWNERSHIP • LIMITS • NON_SHARED • PRIORITY • DESCRIPTION GROUP Defines the project names in the hierarchical grouping and their relationships. Each entry specifies the name of a hierarchical group and its members. For better readability, specify the projects in order from the root to the leaves, as in the example.
lsf.licensescheduler A dash (-) is equivalent to INFINIT_INT, which means there is no maximum limit and the project group can use as many licenses as possible. You can leave the parentheses empty () if desired. NON_SHARED Defines the number of licenses that the hierarchical group member projects use exclusively. Specify the number of licenses for each group or project, separated by spaces, in the same order as listed in the GROUP column.
lsf.licensescheduler DESCRIPTION Optional. Description of the project group. The text can include any characters, including white space. The text can be extended to multiple lines by ending the preceding line with a backslash (\). The maximum length for the text is 64 characters. Use blinfo -G to view hierarchical project group description. Projects section Description Required. Lists the License Scheduler projects.
lsf.licensescheduler Priority of default project If not explicitly configured, the default project has the priority of 0. You can override this value by explicitly configuring the default project in Projects section with the chosen priority value. DESCRIPTION Optional. Description of the project. The text can include any characters, including white space. The text can be extended to multiple lines by ending the preceding line with a backslash (\). The maximum length for the text is 64 characters.
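As a sketch (the project names are hypothetical), a Projects section that explicitly configures the default project with a chosen priority, as described above, might look like this:

```conf
Begin Projects
PROJECTS  PRIORITY
Lp1       3
Lp2       2
default   1
End Projects
```

Here the default project's implicit priority of 0 is overridden to 1.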
lsf.shared lsf.shared The lsf.shared file contains common definitions that are shared by all load sharing clusters defined by lsf.cluster.cluster_name files. This includes lists of cluster names, host types, host models, the special resources available, and external load indices, including indices required to submit jobs using JSDL files. This file is installed by default in the directory defined by LSF_CONFDIR. Changing lsf.shared configuration After making any changes to lsf.
lsf.shared Servers MultiCluster only. List of hosts in this cluster that LIMs in remote clusters can connect to and obtain information from. For other clusters to work with this cluster, one of these hosts must be running mbatchd. HostType section (Required) Lists the valid host types in the cluster. All hosts that can run the same binary executable are in the same host type. Caution: If you remove NTX86, NTX64, or NTIA64 from the HostType section, the functionality of lspasswd.exe is affected.
lsf.shared Example HostModel section Begin HostModel MODELNAME CPUFACTOR ARCHITECTURE PC400 13.0 (i86pc_400 i686_400) PC450 13.2 (i86pc_450 i686_450) Sparc5F 3.0 (SUNWSPARCstation5_170_sparc) Sparc20 4.7 (SUNWSPARCstation20_151_sparc) Ultra5S 10.3 (SUNWUltra5_270_sparcv9 SUNWUltra510_270_sparcv9) End HostModel ARCHITECTURE (Reserved for system use only) Indicates automatically detected host models that correspond to the model names.
lsf.shared • Boolean—Resources that have a value of 1 on hosts that have the resource and 0 otherwise. • Numeric—Resources that take numerical values, such as all the load indices, number of processors on a host, or host CPU factor. • String—Resources that take string values, such as host type, host model, or host status. Default If TYPE is not given, the default type is Boolean. INTERVAL Optional. Applies to dynamic resources only.
lsf.shared RELEASE Applies to numeric shared resources only, such as floating licenses. Controls whether LSF releases the resource when a job using the resource is suspended. When a job using a shared resource is suspended, the resource is held or released by the job depending on the configuration of this parameter. Specify N to hold the resource, or specify Y to release the resource.
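As a sketch of a Resource section using the keywords described above (the resource names and column layout are illustrative, not taken from this document): a numeric shared resource for floating licenses that is released on suspension (RELEASE Y), and a dynamic numeric resource refreshed every 60 seconds by a site elim.

```conf
Begin Resource
RESOURCENAME   TYPE     INTERVAL  INCREASING  RELEASE  DESCRIPTION
verilog_lic    Numeric  ()        N           Y        (floating simulator licenses)
local_scratch  Numeric  60        N           ()       (local scratch MB, from an elim)
End Resource
```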
lsf.sudoers Contents • About lsf.sudoers • lsf.sudoers on UNIX • lsf.sudoers on Windows • File format • Creating and modifying lsf.sudoers • Parameters About lsf.sudoers The lsf.sudoers file is an optional file to configure security mechanisms. It is not installed by default. You use lsf.sudoers to set the parameter LSF_EAUTH_KEY to configure a key for eauth to encrypt and decrypt user authentication data. On UNIX, you also use lsf.
lsf.sudoers lsf.sudoers on Windows Location The lsf.sudoers file is shared over an NTFS network, not duplicated on every Windows host. By default, LSF installs lsf.sudoers in the %SYSTEMROOT% directory. The location of lsf.sudoers on Windows must be specified by LSF_SECUREDIR in lsf.conf; you must configure the LSF_SECUREDIR parameter in lsf.conf if you use lsf.sudoers on Windows. Permissions Restriction: The owner of lsf.sudoers on Windows must be Administrators. If not, eauth may not work.
lsf.sudoers Creating and modifying lsf.sudoers You can create and modify lsf.sudoers with a text editor. After you modify lsf.sudoers, you must run badmin hrestart all to restart all sbatchds in the cluster with the updated configuration.
lsf.sudoers Specifies the key that eauth uses to encrypt and decrypt user authentication data. Defining this parameter enables increased security at your site. The key must contain at least six characters and must use only printable characters. For UNIX, you must edit the lsf.sudoers file on all hosts within the cluster and specify the same encryption key. For Windows, you must edit the shared lsf.sudoers file. Default Not defined.
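A minimal sketch of the entry (the key value is hypothetical; per the rules above it must be at least six printable characters and identical on every UNIX host):

```conf
# Hypothetical lsf.sudoers entry: shared eauth encryption key
LSF_EAUTH_KEY=Zk81q2xV
```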
lsf.sudoers Specify the Admin EGO cluster administrator password as clear text. You must also define the LSF_EGO_ADMIN_USER parameter. Default Not defined. With EGOSC daemon control enabled, the lsadmin and badmin startup subcommands invoke the egosh user logon command to prompt for the Admin EGO cluster administrator credentials.
lsf.sudoers Specifies the absolute path name of the directory in which the LSF daemon binary files (lim, res, sbatchd, and mbatchd) are installed. LSF daemons are usually installed in the path specified by LSF_SERVERDIR defined in the cshrc.lsf, profile.lsf or lsf.conf files. Important: For security reasons, you should move the LSF daemon binary files to a directory other than LSF_SERVERDIR or LSF_BINDIR. The user accounts specified by LSF_STARTUP_USERS can start any binary in the LSF_STARTUP_PATH.
lsf.task lsf.task Users should not have to specify a resource requirement each time they submit a job. LSF supports the concept of a task list. This chapter describes the files used to configure task lists: lsf.task, lsf.task.cluster_name, and .lsftask.
lsf.task LSF commands to find out resource names available in your system, and tell LSF about the needs of your applications. LSF stores the resource requirements for you from then on. You can specify resource requirements when tasks are added to the user's remote task list. If the task to be added is already in the list, its resource requirements are replaced. lsrtasks + "myjob/swap>=100 && cpu" This adds myjob to the remote tasks list with its resource requirements.
lsf.task uname crontab End LocalTasks Begin RemoteTasks + "newjob/mem>25" + "verilog/select[type==any && swp>100]" make/cpu nroff/- End RemoteTasks Tasks are listed one per line. Each line in a section consists of a task name and, for the RemoteTasks section, an optional resource requirement string separated by a slash (/). A plus sign (+) or a minus sign (-) can optionally precede each entry. If no + or - is specified, + is assumed.
setup.config setup.config About setup.config The setup.config file contains options for Platform LSF License Scheduler installation and configuration for systems without Platform LSF. You only need to edit this file if you are installing License Scheduler as a standalone product without LSF. Template location A template setup.config is included in the License Scheduler installation script tar file and is located in the directory created when you uncompress and extract the installation script tar file.
setup.config Caution: You should not configure the root account as the primary License Scheduler administrator. Valid Values User accounts for License Scheduler administrators must exist on all hosts using License Scheduler prior to installation. Example LS_ADMINS="lsadmin user1 user2" Default The user running the License Scheduler installation script. LS_HOSTS Syntax LS_HOSTS="host_name [host_name ... ]" Description Defines a list of hosts that are candidates to become License Scheduler master hosts.
setup.config Default $LS_TOP/conf/license.dat LS_LMSTAT_PATH Syntax LS_LMSTAT_PATH="/path" Description Defines the full path to the lmstat program. License Scheduler uses lmstat to gather the FLEXlm license information for scheduling. This path does not include the name of the lmstat program itself. Example LS_LMSTAT_PATH="/usr/bin" Default The installation script attempts to find a working copy of lmstat on the current system. If it is unsuccessful, the path is set as blank ("").
slave.config slave.config About slave.config Dynamically added LSF hosts that will not be master candidates are slave hosts. Each dynamic slave host has its own LSF binaries and local lsf.conf and shell environment scripts (cshrc.lsf and profile.lsf). You must install LSF on each slave host. The slave.config file contains options for installing and configuring a slave host that can be dynamically added or removed. Use lsfinstall -s -f slave.config to install LSF using the options specified in slave.config.
slave.config Description Enables Platform EGO to control LSF res and sbatchd. Set the value to "Y" if you want EGO Service Controller to start res and sbatchd, and to restart them if they fail. All hosts in the cluster must use the same value for this parameter (this means the value of EGO_DAEMON_CONTROL in this file must be the same as the specification for EGO_DAEMON_CONTROL in install.config). To avoid conflicts, leave this parameter undefined if you use a script to start up LSF daemons.
slave.config Description Enables backup and rollback for enhancement packs. Set the value to "N" to disable backups when installing enhancement packs (you will not be able to roll back to the previous patch level after installing an EP, but you will still be able to roll back any fixes installed on the new EP). You may disable backups to speed up install time, to save disk space, or because you have your own methods to back up the cluster.
slave.config Default 7869 LSF_SERVER_HOSTS Syntax LSF_SERVER_HOSTS="host_name [ host_name ...]" Description Required for non-shared slave host installation. This parameter defines a list of hosts that can provide host and load information to client hosts. If you do not define this parameter, clients will contact the master LIM for host and load information. List of LSF server hosts in the cluster to be contacted. Recommended for large clusters to decrease the load on the master LIM.
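As a sketch (the host names are hypothetical), a non-shared slave host installation could list its server hosts like this:

```conf
# Hypothetical slave.config entry: clients contact these hosts for host and
# load information instead of the master LIM
LSF_SERVER_HOSTS="hosta hostb hostc"
```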
slave.config Description Full path to the directory containing the LSF distribution tar files. Example LSF_TARDIR="/usr/local/lsf_distrib" Default The parent directory of the current working directory. For example, if lsfinstall is running under /usr/share/lsf_distrib/lsf_lsfinstall, the LSF_TARDIR default value is /usr/share/lsf_distrib. LSF_LOCAL_RESOURCES Syntax LSF_LOCAL_RESOURCES="resource ..." Description Defines instances of local resources residing on the slave host.
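As a sketch (the resource names are hypothetical and would need matching definitions in lsf.shared), a slave host might declare one Boolean resource and one shared resource instance:

```conf
# Hypothetical slave.config entry: Boolean resource plus a resourcemap instance
LSF_LOCAL_RESOURCES="[resource linux] [resourcemap 1*verilog_lic]"
```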
slave.config Default None LSF_TOP Syntax LSF_TOP="/path" Description Required. Full path to the top-level LSF installation directory. Important: You must use the same path for every slave host you install. Valid value The path to LSF_TOP cannot be the root directory (/).
P A R T III Environment Variables Platform LSF Configuration Reference 553
Environment Variables 554 Platform LSF Configuration Reference
Environment variables Contents • Environment Variables Set for Job Execution on page 585 • Environment Variable Reference on page 586 Environment variables set for job execution LSF transfers most environment variables between submission and execution hosts. Environment variables related to file names and job spooling directories support paths that contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows.
Environment variables LS_EXEC_T LS_JOBPID LS_LICENSE_SERVER_feature LS_SUBCWD LSB_CHKPNT_DIR LSB_DEBUG LSB_DEBUG_CMD LSB_DEBUG_MBD LSB_DEBUG_NQS LSB_DEBUG_SBD LSB_DEBUG_SCH LSB_DEFAULT_JOBGROUP LSB_DEFAULTPROJECT LSB_DEFAULTQUEUE LSB_DJOB_COMMFAIL_ACTION LSB_DJOB_ENV_SCRIPT LSB_ECHKPNT_METHOD LSB_ECHKPNT_METHOD_DIR LSB_ECHKPNT_KEEP_OUTPUT LSB_ERESTART_USRCMD LSB_EXEC_RUSAGE LSB_EXECHOSTS LSB_EXIT_IF_CWD_NOTEXIST LSB_EXIT_PRE_ABORT LSB_EXIT_REQUEUE LSB_FRAMES LSB_HOSTS LSB_INTE
Environment variables LSF_NIOS_DIE_CMD LSF_NIOS_IGNORE_SIGWINDOW LSF_NIOS_PEND_TIMEOUT LSF_NIOS_PORT_RANGE LSF_RESOURCES LSF_TS_LOGON_TIME LSF_USE_HOSTEQUIV LSF_USER_DOMAIN BSUB_BLOCK Description If set, tells NIOS that it is running in batch mode. Default Not defined Notes If you submit a job with the -K option of bsub, which is synchronous execution, then BSUB_BLOCK is set. Synchronous execution means you have to wait for the job to finish before you can continue.
Environment variables BSUB_QUIET2 Syntax BSUB_QUIET2=any_value Description Suppresses the printing of information about job completion when a job is submitted with the bsub -K option. If set, bsub will not print information about job completion to stdout. For example, when this variable is set, the message <> will not be written to stdout. If BSUB_QUIET and BSUB_QUIET2 are both set, no job messages will be printed to stdout.
Environment variables CLEARCASE_DRIVE Syntax CLEARCASE_DRIVE=drive_letter: Description Optional, Windows only. Defines the virtual drive letter for a Rational ClearCase view to the drive. This is useful if you wish to map a Rational ClearCase view to a virtual drive as an alias. If this letter is unavailable, Windows attempts to map to another drive. Therefore, CLEARCASE_DRIVE only defines the default drive letter to which the Rational ClearCase view is mapped, not the final selected drive letter.
Environment variables Where defined From the command line Example CLEARCASE_MOUNTDIR=/myvobs See also CLEARCASE_DRIVE, CLEARCASE_ROOT CLEARCASE_ROOT Syntax CLEARCASE_ROOT=path Description The path to the Rational ClearCase view. In Windows, this path must define an absolute path starting with the default ClearCase drive and ending with the view name without an ending backslash (\). Notes CLEARCASE_ROOT must be defined if you want to submit a batch job from a ClearCase view.
Environment variables When the MELIM finds an elim that exited with ELIM_ABORT_VALUE, the MELIM marks the elim and does not restart it on that host. Where defined Set by the master elim (MELIM) on the host when the MELIM invokes the elim executable LM_LICENSE_FILE Syntax LM_LICENSE_FILE=file_name Description The path to where the license file is found. The file name is the name of the license file. Default /usr/share/flexlm/licenses/license.dat Notes A FLEXlm variable read by the lmgrd daemon.
Environment variables Where defined During job execution, sbatchd sets LS_JOBPID to be the same as the process ID assigned by the operating system. LS_LICENSE_SERVER_feature Syntax LS_LICENSE_SERVER_feature="domain:server:num_available ..." server is of the format port@host Description The license server information provided to the job. The purpose of this environment variable is to provide license server information to the job.
Environment variables Valid values The value of checkpoint_dir is the directory you specified through the -k option of bsub when submitting the checkpointable job. The value of job_ID is the job ID of the checkpointable job. Where defined Set by LSF, based on the directory you specified when submitting a checkpointable job with the -k option of bsub. LSB_DEBUG This parameter can be set from the command line or from lsf.conf. See LSB_DEBUG in lsf.conf.
Environment variables DEFAULT_JOBGROUP in lsb.params. The bsub -g job_group_name option overrides both LSB_DEFAULT_JOBGROUP and DEFAULT_JOBGROUP. If you submit a job without the -g option of bsub, but you defined LSB_DEFAULT_JOBGROUP, then the job belongs to the job group specified in LSB_DEFAULT_JOBGROUP. Job group names must follow this format: • Job group names must start with a slash character (/).
Environment variables Notes If the LSF administrator defines a default project in the lsb.params configuration file, the system uses this as the default project. You can change the default project by setting LSB_DEFAULTPROJECT or by specifying a project name with the -P option of bsub. If you submit a job without the -P option of bsub, but you defined LSB_DEFAULTPROJECT, then the job belongs to the project specified in LSB_DEFAULTPROJECT.
Environment variables LSB_ECHKPNT_METHOD_DIR This parameter can be set as an environment variable and/or in lsf.conf. See LSB_ECHKPNT_METHOD_DIR in lsf.conf. LSB_ECHKPNT_KEEP_OUTPUT This parameter can be set as an environment variable and/or in lsf.conf. See LSB_ECHKPNT_KEEP_OUTPUT in lsf.conf. LSB_ERESTART_USRCMD Syntax LSB_ERESTART_USRCMD=command Description Original command used to start the job.
Environment variables Default Not defined Where defined Set by LSF after reserving a resource for the job. LSB_EXECHOSTS Description A list of hosts on which a batch job will run. Where defined Set by sbatchd Product MultiCluster LSB_EXIT_IF_CWD_NOTEXIST Syntax LSB_EXIT_IF_CWD_NOTEXIST=Y | y | N | n Description Indicates that the job will exit if the current working directory specified by bsub -cwd or bmod -cwd is not accessible on the execution host.
Environment variables LSB_EXIT_REQUEUE Syntax LSB_EXIT_REQUEUE="exit_value1 exit_value2..." Description Contains a list of exit values found in the queue’s REQUEUE_EXIT_VALUES parameter defined in lsb.queues. Valid values Any positive integers Default Not defined Notes If LSB_EXIT_REQUEUE is defined, a job will be requeued if it exits with one of the specified values. LSB_EXIT_REQUEUE is not defined if the parameter REQUEUE_EXIT_VALUES is not defined.
Environment variables Notes When the job is running, LSB_FRAMES will be set to the relative frames with the format LSB_FRAMES=start_number,end_number,step. From the start_number, end_number, and step, the frame job can know how many frames it will process. Where defined Set by sbatchd Example LSB_FRAMES=10,20,1 LSB_HOSTS Syntax LSB_HOSTS="host_name..." Description A list of hosts selected by LSF to run the job.
Environment variables Where defined Set by sbatchd LSB_JOB_INCLUDE_POSTPROC Syntax LSB_JOB_INCLUDE_POSTPROC=Y | y | N | n Description Enables the post-execution processing of the job to be included as part of the job. LSB_JOB_INCLUDE_POSTPROC in the user environment overrides the value of JOB_INCLUDE_POSTPROC in lsb.params and lsb.applications.
Environment variables LSB_JOBEXIT_STAT Syntax LSB_JOBEXIT_STAT=exit_status Description Indicates a job’s exit status. Applies to post-execution commands. Post-execution commands are set with POST_EXEC in lsb.queues. When the post-execution command is run, the environment variable LSB_JOBEXIT_STAT is set to the exit status of the job. Refer to the man page for the wait(2) command for the format of this exit status.
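A POST_EXEC script can decode LSB_JOBEXIT_STAT using the wait(2) layout: the exit code sits in the high byte, and a non-zero value in the low 7 bits indicates the job was terminated by a signal. The value 256 below is an assumed example (a job that ran `exit 1`):

```shell
#!/bin/sh
# Sketch of a post-execution decoder; LSF normally sets LSB_JOBEXIT_STAT,
# here it is assigned an assumed example value (exit code 1, no signal).
LSB_JOBEXIT_STAT=256

# wait(2) format: high byte = exit code, low 7 bits = terminating signal.
exit_code=$((LSB_JOBEXIT_STAT / 256))
signal=$((LSB_JOBEXIT_STAT % 128))
echo "exit_code=$exit_code signal=$signal"
```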
Environment variables Where defined Set during job execution based on bsub options or the default job group defined in DEFAULT_JOBGROUP in lsb.params and the LSB_DEFAULT_JOBGROUP environment variable. Default Not defined LSB_JOBID Syntax LSB_JOBID=job_ID Description The job ID assigned by sbatchd. This is the ID of the job assigned by LSF, as shown by bjobs.
Environment variables Example You can use LSB_JOBINDEX in a shell script to select the job command to be performed based on the job array index. For example: if [ $LSB_JOBINDEX -eq 1 ]; then cmd1 fi if [ $LSB_JOBINDEX -eq 2 ]; then cmd2 fi See also LSB_JOBINDEX_STEP, LSB_REMOTEINDEX LSB_JOBINDEX_STEP Syntax LSB_JOBINDEX_STEP=step Description Step at which single elements of the job array are defined.
Environment variables Description The name of the job defined by the user at submission time. Default The job’s command line Notes The name of a job can be specified explicitly when you submit a job. The name does not have to be unique. If you do not specify a job name, the job name defaults to the actual batch command as specified on the bsub command line. The job name can be up to 4094 characters long for UNIX and Linux or up to 255 characters for Windows.
Environment variables LSB_JOBPIDS Description A list of the current process IDs of the job. Where defined The process IDs are assigned by the operating system, and LSB_JOBPIDS is set by sbatchd. See also LSB_JOBPGIDS LSB_MAILSIZE Syntax LSB_MAILSIZE=value Description Gives an estimate of the size of the batch job output when the output is sent by email. It is not necessary to configure LSB_MAILSIZE_LIMIT.
Environment variables LSB_MCPU_HOSTS Syntax LSB_MCPU_HOSTS="host_nameA num_processors1 host_nameB num_processors2..." Description Contains a list of the hosts and the number of CPUs used to run a job. Valid values num_processors1, num_processors2,... refer to the number of CPUs used on host_nameA, host_nameB,..., respectively Default Not defined Notes The environment variables LSB_HOSTS and LSB_MCPU_HOSTS both contain the same information, but the information is presented in different formats.
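The relationship between the two formats can be shown by expanding a LSB_MCPU_HOSTS value ("host count ...") into the repeated-host form that LSB_HOSTS uses. A minimal POSIX sh sketch, with an assumed example value:

```shell
#!/bin/sh
# Sketch: expand LSB_MCPU_HOSTS ("hostA 2 hostB 1") into the format
# LSB_HOSTS uses ("hostA hostA hostB"). The value below is an assumed
# example, not one set by a real cluster.
LSB_MCPU_HOSTS="hostA 2 hostB 1"

expanded=""
set -- $LSB_MCPU_HOSTS          # split the list into host/count pairs
while [ $# -ge 2 ]; do
    host=$1; n=$2; shift 2
    i=0
    while [ "$i" -lt "$n" ]; do  # repeat the host once per CPU
        expanded="$expanded $host"
        i=$((i + 1))
    done
done
echo $expanded
```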
Environment variables LSB_NTRIES Syntax LSB_NTRIES=integer Description The number of times that LSF libraries attempt to contact mbatchd or perform a concurrent jobs query. For example, if this parameter is not defined, when you type bjobs, LSF keeps displaying "batch system not responding" if mbatchd cannot be contacted or if the number of pending jobs exceeds MAX_PEND_JOBS specified in lsb.params or lsb.users.
Environment variables Description Indicates that LSF cannot access the output file specified for a job submitted with the bsub -o option. Valid values Set to Y if the output file cannot be accessed; otherwise, it is not defined. Where defined Set by sbatchd during job execution LSB_DJOB_COMMFAIL_ACTION Syntax LSB_DJOB_COMMFAIL_ACTION="KILL_TASKS" Description Defines the action LSF should take if it detects a communication failure with one or more remote parallel or distributed tasks.
Environment variables If a full path is specified, LSF will use the path name for the execution. Otherwise, LSF will look for the executable from $LSF_BINDIR. Where defined Set by the system to the value of the parameter DJOB_ENV_SCRIPT in lsb.applications when running bsub -app for the specified application See also DJOB_ENV_SCRIPT in lsb.applications LSB_QUEUE Syntax LSB_QUEUE=queue_name Description The name of the queue from which the job is dispatched.
Environment variables Description The job ID of a remote MultiCluster job. Where defined Set by sbatchd, defined by mbatchd See also LSB_JOBID LSB_RESTART Syntax LSB_RESTART=Y Description Indicates that a job has been restarted or migrated. Valid values Set to Y if the job has been restarted or migrated; otherwise, it is not defined. Notes If a batch job is submitted with the -r option of bsub, and is restarted because of host failure, then LSB_RESTART is set to Y.
Environment variables Where defined Set during restart of a checkpointed job. See also LSB_RESTART_PID, LSB_RESTART LSB_RESTART_PID Syntax LSB_RESTART_PID=pid Description The process ID of the checkpointed job when the job is restarted. Notes When a checkpointed job is restarted, the operating system assigns a new process ID to the job. LSF sets LSB_RESTART_PID to the new process ID.
Environment variables Where defined Set by the system based on the value of the parameter RTASK_GONE_ACTION in lsb.applications when running bsub -app for the specified application See also RTASK_GONE_ACTION in lsb.applications LSB_SUB_APP_NAME Description Application profile name specified by bsub -app. Where defined Set by esub before a job is dispatched.
Environment variables LSB_SUB_JOB_ACTION_WARNING_TIME Description Value of job warning time period specified by bsub -wt. Where defined Set by esub before a job is submitted. LSB_SUB_JOB_WARNING_ACTION Description Value of job warning action specified by bsub -wa. Where defined Set by esub before a job is submitted. LSB_SUB_PARM_FILE Syntax LSB_SUB_PARM_FILE=file_name Description Points to a temporary file that LSF uses to store the bsub options entered in the command line.
Environment variables LSB_SUSP_REASONS Syntax LSB_SUSP_REASONS=integer Description An integer representing suspend reasons. Suspend reasons are defined in lsbatch.h. This parameter is set when a job goes to system-suspended (SSUSP) or user-suspended status (USUSP). It indicates the exact reason why the job was suspended. To determine the exact reason, you can test the value of LSB_SUSP_REASONS against the symbols defined in lsbatch.h.
Environment variables Load Index Value IT 7 TMP 8 SWP 9 MEM 10 Default Not defined Where defined Set during job execution See also LSB_SUSP_REASONS LSB_UNIXGROUP Description Specifies the UNIX user group of the submitting user. Notes This variable is useful if you want pre- or post-execution processing to use the user group of the user who submitted the job, and not sys(1). Where defined Set during job execution LSF_CMD_LOGDIR This parameter can be set from the command line or from lsf.conf.
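The load index table above can be used to turn the subreason number into a readable index name, for example in a notification script run at suspension time. A minimal POSIX sh sketch; the value 10 is an assumed example (LSF normally sets LSB_SUSP_SUBREASONS):

```shell
#!/bin/sh
# Sketch: map a suspend subreason (load index number) to its name,
# following the Load Index / Value table above. Assumed example value.
LSB_SUSP_SUBREASONS=10

case "$LSB_SUSP_SUBREASONS" in
    7)  index=IT ;;
    8)  index=TMP ;;
    9)  index=SWP ;;
    10) index=MEM ;;
    *)  index=unknown ;;
esac
echo "suspended on load index: $index"
```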
Environment variables See LSF_DEBUG_RES in lsf.conf. LSF_EAUTH_AUX_DATA Syntax LSF_EAUTH_AUX_DATA=path/file_name Description Used in conjunction with LSF daemon authentication, specifies the full path to the temporary file on the local file system that stores auxiliary authentication information (such as credentials required by a remote host for use during job execution). Provides a way for eauth -c, mbatchd, and sbatchd to communicate the location of auxiliary authentication data.
Environment variables LSF_EAUTH_UID Syntax LSF_EAUTH_UID=user_ID Description Specifies the user account under which eauth -s runs. Where defined Set by the LSF daemon that executes eauth. LSF_EXECUTE_DOMAIN Syntax LSF_EXECUTE_DOMAIN=domain_name or setenv LSF_EXECUTE_DOMAIN domain_name Description If UNIX/Windows user account mapping is enabled, specifies the preferred Windows execution domain for a job submitted by a UNIX user.
Environment variables Where defined Set internally by LSF LSF_JOB_STARTER Syntax LSF_JOB_STARTER=binary Description Specifies an executable program that has the actual job as an argument. Default Not defined Notes Interactive Jobs If you want to run an interactive job that requires some preliminary setup, LSF provides a job starter function at the command level.
Environment variables Windows RES runs the job starter, passing it your commands as arguments: LSF_JOB_STARTER command [argument...] If you define LSF_JOB_STARTER as follows: set LSF_JOB_STARTER=C:\cmd.exe /C and run a simple DOS shell job: C:\> lsrun dir /p then the following will be invoked to correctly start the job: C:\cmd.exe /C dir /p See also JOB_STARTER in lsb.queues LSF_LD_LIBRARY_PATH Description When LSF_LD_SECURITY=Y in lsf.conf
Environment variables LSF_LIM_API_NTRIES Syntax LSF_LIM_API_NTRIES=integer Description Defines the number of times LSF commands will retry to communicate with the LIM API when LIM is not available. LSF_LIM_API_NTRIES is ignored by LSF and EGO daemons and EGO commands. The LSF_LIM_API_NTRIES environment variable overrides the value of LSF_LIM_API_NTRIES in lsf.conf. Valid values 1 to 65535 Where defined From the command line or from lsf.conf Default Not defined. If not defined in lsf.conf.
Environment variables LSF_NIOS_DEBUG This parameter can be set from the command line or from lsf.conf. See LSF_NIOS_DEBUG in lsf.conf. LSF_NIOS_DIE_CMD Syntax LSF_NIOS_DIE_CMD=command Description If set, the command defined by LSF_NIOS_DIE_CMD is executed before NIOS exits. Default Not defined Where defined From the command line LSF_NIOS_IGNORE_SIGWINDOW Syntax LSF_NIOS_IGNORE_SIGWINDOW=any_value Description If defined, the NIOS will ignore the SIGWINDOW signal.
Environment variables Maximum amount of time that an interactive batch job can remain pending. If this parameter is defined, and an interactive batch job is pending for longer than the specified time, the interactive batch job is terminated. Valid values Any integer greater than zero Default Not defined LSF_NIOS_PORT_RANGE Syntax LSF_NIOS_PORT_RANGE=min_port_number-max_port_number Description Defines a range of listening ports for NIOS to use. Example LSF_NIOS_PORT_RANGE=5000-6000 Default Not defined.
Environment variables LSF_TS_LOGON_TIME Syntax LSF_TS_LOGON_TIME=milliseconds Description Specifies the time to create a Windows Terminal Service session. Configure LSF_TS_LOGON_TIME according to the load on your network environment. The default, 30000 milliseconds, is suitable for most environments. If you set LSF_TS_LOGON_TIME too small, LSF tries multiple times before it succeeds in making a TS session with the TS server, which can cause the job to wait a long time before it runs.
Environment variables LSF_USER_DOMAIN Syntax LSF_USER_DOMAIN=domain_name | . Description Set during LSF installation or setup. If you modify this parameter in an existing cluster, you probably have to modify passwords and configuration files also. Windows or mixed UNIX-Windows clusters only. Enables default user mapping, and specifies the LSF user domain. The period (.) specifies local accounts, not domain accounts.
P A R T IV Troubleshooting Platform LSF Configuration Reference 595
Troubleshooting and error messages Troubleshooting and error messages Shared file access A frequent problem is non-accessible files due to a non-uniform file space. If a task is run on a remote host where a file it requires cannot be accessed using the same name, an error results. Almost all interactive LSF commands fail if the user’s current working directory cannot be found on the remote host. Shared files on UNIX If you are running NFS, rearranging the NFS mount table may solve the problem.
Troubleshooting and error messages If the LIM has just been started, this is normal, because the LIM needs time to get initialized by reading configuration files and contacting other LIMs. If the LIM does not become available within one or two minutes, check the LIM error log for the host you are working on. To prevent communication timeouts when starting or restarting the local LIM, define the parameter LSF_SERVER_HOSTS in the lsf.conf file.
Troubleshooting and error messages User permission denied If remote execution fails with the following error message, the remote host could not securely determine the user ID of the user requesting remote execution. User permission denied. Check the RES error log on the remote host; this usually contains a more detailed error message. If you are not using an identification daemon (LSF_AUTH is not defined in the lsf.conf file)
Troubleshooting and error messages support if you are running automount and LSF is not able to locate directories on remote hosts. Batch daemons die quietly First, check the sbatchd and mbatchd error logs. Try running the following command to check the configuration. badmin ckconfig This reports most errors. You should also check if there is any email from LSF in the LSF administrator’s mailbox.
Troubleshooting and error messages Error messages The following error messages are logged by the LSF daemons, or displayed by the following commands. lsadmin ckconfig badmin ckconfig General errors The messages listed in this section may be generated by any LSF daemon. can’t open file: error The daemon could not open the named file for the reason given by error. This error is usually caused by incorrect file permissions or missing files.
Troubleshooting and error messages The service request claimed to come from user claimed_user but ident authentication returned that the user was actually actual_user. The request was not serviced. userok: ruserok(host,uid) failed LSF_USE_HOSTEQUIV=Y is defined in the lsf.conf file, but host has not been set up as an equivalent host (see /etc/host.equiv), and user uid has not set up a .rhosts file.
Troubleshooting and error messages The HostModel, Resource, or HostType section in the lsf.shared file is either missing or contains an unrecoverable error. file(line): Name name reserved or previously defined. Ignoring index The name assigned to an external load index must not be the same as any built-in or previously defined resource or load index. file(line): Duplicate clustername name in section cluster. Ignoring current line A cluster name is defined twice in the same lsf.shared file.
Troubleshooting and error messages function: Gethostbyaddr_(host/port) failed: error main: Request from unknown host host/port: error function: Received request from non-LSF host host/port The daemon does not recognize host as a Platform LSF host. The request is not serviced. These messages can occur if host was added to the configuration files, but not all the daemons have been reconfigured to read the new information.
Troubleshooting and error messages RES assumes that a user has the same userID and username on all the LSF hosts. These messages occur if this assumption is violated. If the user is allowed to use LSF for interactive remote execution, make sure the user’s account has the same user ID and user name on all LSF hosts. doacceptconn: root remote execution permission denied authRequest: root job submission rejected Root tried to execute or submit a job but LSF_ROOT_REX is not defined in the lsf.conf file.
Troubleshooting and error messages releaseElogLock: unlink() failed: error touchElogLock: Failed to open lock file name: error touchElogLock: close failed: error mbatchd failed to create, remove, read, or write the log directory or a file in the log directory, for the reason given in error. Check that the LSF administrator has read, write, and execute permissions on the logdir directory.
Troubleshooting and error messages Batch command client messages LSF displays error messages when a batch command cannot communicate with mbatchd. The following table provides a list of possible error reasons and the associated error message output. Point of failure: establishing a connection with mbatchd. Possible reason: mbatchd is too busy to accept new connections, and the connect() system call times out. Error message output: LSF is processing your request. Please wait…
Troubleshooting and error messages the name of the fields being read when parsing failed. Examples Errors like the following are logged to the mbatchd log file when mbatchd fails to read lsb.events: Dec 28 14:25:30 2007 9861 3 7.02 init_log: Reading event file
Understanding Platform LSF job exit information Understanding Platform LSF job exit information Contents • • • • • • Why did my job exit? How LSF translates events into exit codes Application and system exit values LSF job termination reason logging Job termination by LSF exit information LSF RMS integration exit values Why did my job exit? LSF collects job information and reports the final status of a job.
Understanding Platform LSF job exit information that were running when the systems went down are assumed to have exited, and email is sent to the submitting user. Pending jobs remain in their queues, and are scheduled as hosts become available. Exited jobs A job might terminate abnormally for various reasons. Job termination can happen from any state. An abnormally terminated job goes into EXIT state.
Understanding Platform LSF job exit information Termination signals are operating system dependent, so signal 5 may not be SIGTRAP and 11 may not be SIGSEGV on all UNIX and Linux systems. You need to pay attention to the execution host type in order to correctly translate the exit value if the job has been signaled. bhist and bjobs output In most cases, bjobs and bhist show the application exit value (128 + signal). In some cases, bjobs and bhist show the actual signal value.
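The 128 + signal convention described above can be decoded mechanically. A minimal POSIX sh sketch; the exit value 139 is an assumed example (128 + 11, typically SIGSEGV on Linux, though as noted the mapping is host dependent):

```shell
#!/bin/sh
# Sketch: recover the signal number from an application exit value of
# the form 128 + signal. The value 139 is an assumed example.
exit_value=139

if [ "$exit_value" -gt 128 ]; then
    msg="terminated by signal $((exit_value - 128))"
else
    msg="exited normally with code $exit_value"
fi
echo "$msg"
```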
Understanding Platform LSF job exit information 20 0 151 0 0 0 171 Here we see that LSF itself sent the signal to terminate the job, and the job exits 130 (130-128 = 2 = SIGINT). When a job finishes, LSF reports the last job termination action it took against the job and logs it into lsb.acct. If a running job exits because of node failure, LSF sets the correct exit information in lsb.acct, lsb.events, and the job output file. View logged job exit information (bacct -l) 1.
Understanding Platform LSF job exit information Keyword displayed by bacct Termination reason Integer value logged to JOB_FINISH in lsb.acct
Understanding Platform LSF job exit information • • LSF cannot be guaranteed to catch any external signals sent directly to the job. In MultiCluster, a brequeue request sent from the submission cluster is translated to TERM_OWNER or TERM_ADMIN in the remote execution cluster. The termination reason in the email notification sent from the execution cluster as well as that in the lsb.acct is set to TERM_OWNER or TERM_ADMIN.
Understanding Platform LSF job exit information Example termination cause: bchkpnt -k. Termination reason in bacct -l: TERM_CHKPNT. Example bhist output: On the first run: Wed Apr 16 16:00:48: Checkpoint succeeded (actpid 931249); Wed Apr 16 16:01:03: Exited with exit code 137. The CPU time used is 0.0 seconds; Completed; Example termination cause: kill -9 on the job. Termination reason in bacct -l: TERM_EXTERNAL_SIGNAL. Example bhist output: Thu Mar 13 17:30:43: Exited by signal 15. The CPU time used is 0.
Understanding Platform LSF job exit information Example termination cause: automatic migration when MIG is defined at queue level. LSB_JOBEXIT_STAT: 33280. LSB_JOBEXIT_INFO: SIGNAL -1 SIG_CHKPNT. Example bhist output: Fri Feb 14 17:32:17: Job has been requeued; Fri Feb 14 17:32:17: Pending: Migrating job is waiting for rescheduling; Example termination cause: bsub -I "hostname;exit 130". LSB_JOBEXIT_STAT: 33280. LSB_JOBEXIT_INFO: Undefined. Example bhist output: Fri Feb 14 14:41:51: Exited with exit code 130. The CPU time used is 0.
Understanding Platform LSF job exit information Example termination cause: job killed when it reaches the MEMLIMIT (bsub -M 5 "/home/iayaz/script/memwrite -m 10 -r 2"). LSB_JOBEXIT_STAT: 2. LSB_JOBEXIT_INFO: SIGNAL -25 SIG_TERM_MEMLIMIT. Example bhist output: Fri Feb 21 10:50:50: Exited by signal 2. The CPU time used is 0.1 seconds; Example termination cause: job killed when termination time approaches (bsub -t 21:11:10 sleep 500;date). LSB_JOBEXIT_STAT: 37120. LSB_JOBEXIT_INFO: Undefined. Example bhist output: Exited with exit code 145. The CPU time used is 0.
Index .lsftask file 541 .rhosts file 599 /etc/hosts file 599 /etc/hosts.equiv file 599 /tmp_mnt directory 597 A abnormal job termination 610 ABS_RUNLIMIT lsb.params file 159, 235 ACCT_ARCHIVE_AGE lsb.params file 236 ACCT_ARCHIVE_SIZE lsb.params file 236 ACCT_ARCHIVE_TIME lsb.params file 237 ADJUST_DURATION lsf.cluster file 369 ADMIN lsf.licensescheduler file Parameters section 503 ADMINISTRATORS lsb.queues file 282 lsf.cluster file 379 ALLOCATION lsf.
description 90 CHKPNT lsb.hosts file 221 lsb.queues file 283 CHKPNT_DIR lsb.applications file 160 CHKPNT_INITPERIOD lsb.applications file 161 CHKPNT_METHOD lsb.applications file 162 CHKPNT_PERIOD lsb.applications file 161 chunk jobs CHKPNT parameter in lsb.queues 283 MIG parameter in lsb.queues 172, 304 rerunnable 178, 313 CHUNK_JOB_DURATION lsb.params file 237 CHUNK_JOB_SIZE lsb.applications file 162 lsb.queues file 284 CLEAN_PERIOD lsb.
DEFAULT_EXTSCHED lsb.queues file 286 DEFAULT_HOST_SPEC lsb.params file 240 lsb.queues file 287 DEFAULT_JOBGROUP lsb.params file 240 DEFAULT_PROJECT lsb.params file 241 DEFAULT_QUEUE lsb.params file 241 DEFAULT_SLA_VELOCITY lsb.params file 242 DESCRIPTION lsb.applications file 165 lsb.queues file 287 lsb.serviceclasses file 353 lsf.licensescheduler file Project section 527 lsf.licensescheduler file ProjectGroup section 526 lsf.shared file 532 DETECT_IDLE_JOB_AFTER lsb.params file 242 DIRECTION lsb.
EGO_LOCAL_CONFDIR cshrc.lsf and profile.lsf files 132 EGO_LOCAL_RESOURCES parameter in ego.conf 472 EGO_LOG_MASK parameter in ego.conf 445, 473 EGO_MASTER_LIST parameter in ego.conf 478 EGO_PERF_CONTROL install.config file 139 EGO_PIM_INFODIR parameter in ego.conf 484 EGO_PIM_SLEEPTIME parameter in ego.conf 484 EGO_PIM_SLEEPTIME_UPDATE parameter in ego.conf 485 EGO_PMC_CONTROL install.config file 139 EGO_RES_REQ bsla 353 lsb.serviceclasses file 353 EGO_RSH parameter in ego.conf 490 EGO_SERVERDIR cshrc.
LSB_ERESTART_USRCMD 566 LSB_EXEC_RUSAGE 566 LSB_EXECHOSTS 567 LSB_EXIT_IF_CWD_NOTEXIST 567 LSB_EXIT_PRE_ABORT 567 LSB_EXIT_REQUEUE 568 LSB_FRAMES 568 LSB_HOSTS 569 LSB_INTERACTIVE 569 LSB_JOB_INCLUDE_POSTPROC 570 LSB_JOBEXIT_INFO 570 LSB_JOBEXIT_STAT 571 LSB_JOBFILENAME 571 LSB_JOBGROUP 571 LSB_JOBID 572 LSB_JOBINDEX 572 LSB_JOBINDEX_STEP 573 LSB_JOBNAME 573 LSB_JOBPEND 574 LSB_JOBPGIDS 574 LSB_JOBPIDS 575 LSB_MAILSIZE 575 LSB_MCPU_HOSTS 576 LSB_NQS_PORT 576 LSB_NTRIES 577 LSB_OLD_JOBID 577 LSB_OUTPUT_TARGE
event recort format errors 607 EVENT_ADRSV_FINISH record lsb.acct 156 EVENT_STREAM_FILE lsb.params file 245 EVENT_UPDATE_INTERVAL lsb.params file 245 EXCLUSIVE lsb.queues file 288 exclusive resource 382 EXINTERVAL lsf.cluster file 370 EXIT job state abnormal job termination 610 EXIT_RATE lsb.hosts file 221 EXIT_RATE_TYPE lsb.params file 246 EXT_FILTER_PORT lsf.
lsb.queues file 289 Feature section lsf.licensescheduler file description 511 FeatureGroup lsf.licensescheduler 522 FILELIMIT lsb.applications file 168 lsb.queues file 290 files adding default system lists 542 removing default system lists 542 viewing task lists 542 FLEX_NAME lsf.licensescheduler file Feature section 512 FLOAT_CLIENTS lsf.cluster file 370 FLOAT_CLIENTS_ADDR_RANGE lsf.cluster file 371 FLX_LICENSE_FILE lsf.licensescheduler file Parameters section 505 G GLOBAL_EXIT_RATE lsb.
lsf.shared file 532 install.config file description 137 INTERACTIVE lsb.queues file 294 INTERRRUPTIBLE_BACKFILL lsb.queues file 294 INTERVAL lsf.shared file 532 io lsb.hosts file 223 lsb.queues file 300 IPv6 dual-stack hosts 449 enable 454 example 134, 136 in FLOAT_CLIENTS_ADDR_RANGE 372 in LSF_HOST_ADDR_RANGE 376 loopback address 598 IRIX ULDB (User Limits Database) description 497 jlimit.in file 497 it lsb.hosts file 223 lsb.queues file 300 J JL/P lsb.users file 362 JL/U lsb.hosts file 222 jlimit.
configuring 71, 80 description 66 enabling 71, 80 scope 71 JOB_ACCEPT record lsb.events 194 JOB_ACCEPT_INTERVAL lsb.params file 247 lsb.queues file 295 JOB_ACTION_WARNING_TIME lsb.queues file 296 JOB_ATTA_DATA record lsb.events 211 JOB_ATTA_DIR lsb.params file 248 JOB_CHUNK record lsb.events 212 JOB_CLEAN record lsb.events 209 JOB_CONTROLS lsb.queues file 296 JOB_DEP_LAST_SUB lsb.params file 249 JOB_EXECUTE record lsb.events 208 JOB_EXIT_RATE_DURATION lsb.params file 249 JOB_EXT_MSG record lsb.
K Kerberos authentication configuration 24 configuration of 25 description 20 eauth user name configuration of 25 enabling 25 Kerberos daemon authentication enabling 25 non-Solaris 25 Solaris 25 kernel-level job checkpoint and restart description 91 L LIB_RECVTIMEOUT lsf.licensescheduler file Parameters section 505 LIC_COLLECTOR lsf.licensescheduler file ServiceDomain section 510 LIC_FLEX_API_ENABLE lsf.licensescheduler file ServiceDomain section 511 LIC_SERVERS lsf.
LSB_CHKPNT_DIR variable 562 LSB_CHUNK_RUSAGE lsf.conf file 399 LSB_CMD_LOG_MASK lsf.conf file 399 LSB_CMD_LOGDIR lsf.conf file 400 LSB_CONFDIR lsf.conf file 401 LSB_CPUSET_BESTCPUS lsf.conf file 401 LSB_CRDIR lsf.conf file 402 LSB_DEBUG lsf.conf file 402 variable 563 LSB_DEBUG_CMD lsf.conf file 403 variable 563 LSB_DEBUG_MBD lsf.conf file 404 variable 563 LSB_DEBUG_NQS lsf.conf file 405 variable 563 LSB_DEBUG_SBD lsf.conf file 406 variable 563 LSB_DEBUG_SCH lsf.
LSB_MAILSIZE_LIMIT lsf.conf file 417 LSB_MAILTO lsf.conf file 418 LSB_MAX_JOB_DISPATCH_PER_SESSION lsf.conf file 419 LSB_MAX_NQS_QUEUES lsf.conf file 420 LSB_MAX_PROBE_SBD lsf.conf file 419 LSB_MBD_BUSY_MSG lsf.conf file 420 LSB_MBD_CONNECT_FAIL_MSG lsf.conf file 421 LSB_MBD_DOWN_MSG lsf.conf file 421 LSB_MBD_MAX_SIG_COUNT 422 LSB_MBD_PORT lsf.conf file 422, 470 LSB_MC_CHKPNT_RERUN lsf.conf file 422 LSB_MC_INITFAIL_MAIL lsf.conf file 422 LSB_MC_INITFAIL_RETRY lsf.
LSB_SUB_EXTSCHED_PARAM variable 582 LSB_SUB_JOB_ACTION_WARNING_TIME variable 583 LSB_SUB_JOB_WARNING_ACTION variable 583 LSB_SUB_PARM_FILE variable 583 LSB_SUCCESS_EXIT_VALUES variable 583 LSB_SUSP_REASONS variable 584 LSB_SUSP_SUBREASONS variable 584 LSB_SYNC_HOST_STAT_LIM lsb.params 255 LSB_TIME_CMD lsf.conf file 434 LSB_TIME_MBD lsf.conf file 434 LSB_TIME_RESERVE_NUMJOBS lsf.conf file 435 LSB_TIME_SBD lsf.conf file 435 LSB_TIME_SCH lsf.conf file 435 LSB_UNIXGROUP variable 585 LSB_UTMP lsf.
cshrc.lsf and profile.lsf files 127 lsf.conf file 439 LSF_CLUSTER_NAME install.config file 143 LSF_CMD_LOG_MASK lsf.conf file 440 LSF_CMD_LOGDIR lsf.conf file 440 variable 585 LSF_CONF_RETRY_INT lsf.conf file 441 LSF_CONF_RETRY_MAX lsf.conf file 441 LSF_CONFDIR lsf.conf file 442 LSF_DAEMON_WRAP lsf.conf file 443 LSF_DAEMONS_CPUS lsb.params file 442 LSF_DAEMONS_CPUS parameter in ego.conf 443 LSF_DEBUG_CMD lsf.conf file 444 LSF_DEBUG_CMD variable 585 LSF_DEBUG_LIM lsf.
lsf.conf file 461 LSF_HPC_NCPU_INCREMENT lsf.conf file 460 LSF_HPC_NCPU_THRESHOLD lsf.conf file 461 LSF_HPC_PJL_LOADENV_TIMEOUT lsf.conf file 461 LSF_ID_PORT lsf.conf file 462 LSF_INCLUDEDIR lsf.conf file 462 LSF_INDEP lsf.conf file 462 LSF_INTERACTIVE_STDERR lsf.conf file 463 variable 587 LSF_INVOKE_CMD variable 587 LSF_JOB_STARTER variable 588 LSF_LD_LIBRARY_PATH variable 589 LSF_LD_SECURITY lsf.conf 464 LSF_LIBDIR cshrc.lsf and profile.lsf files 128 lsf.conf file 465 LSF_LIC_SCHED_HOSTS lsf.
lsf.conf file 482 LSF_PAM_HOSTLIST_USE lsf.conf file 482 LSF_PAM_PLUGINDIR lsf.conf file 483 LSF_PAM_USE_ASH lsf.conf file 483 LSF_PIM_INFODIR lsf.conf file 484 LSF_PIM_SLEEPTIME lsf.conf file 484 LSF_PIM_SLEEPTIME_UPDATE lsf.conf file 484 LSF_POE_TIMEOUT_BIND lsf.conf file 485 LSF_POE_TIMEOUT_SELECT lsf.conf file 485 LSF_QUIET_INST install.config file 145 LSF_RES_ACCT lsf.conf file 486 LSF_RES_ACCTDIR lsf.conf file 486 LSF_RES_ACTIVE_TIME lsf.conf file 487 LSF_RES_CONNECT_RETRY lsf.
EGO_DEFINE_NCPUS 395 LSB_LIMLOCK_EXCLUSIVE parameter 407 LSB_MBD_MAX_SIG_COUNT 422 LSF_AUTH parameter 438 LSF_USER_DOMAIN parameter 499 lsf.conf file 390 corresponding ego.conf parameters 390 lsf.licensescheduler file time-based configuration 527 lsf.shared file 528 lsf.sudoers file 534 lsf.task file 541 lsf.task.cluster file 541 M mail configuring on UNIX 416 MANDATORY_EXTSCHED lsb.queues file 301 MASTER_INACTIVITY_LIMIT lsf.cluster file 378 MAX_ACCT_ARCHIVE_FILE lsb.
MC_PENDING_REASON_UPDATE_INTERVAL lsb.params file 268 MC_PLUGIN_REMOTE_RESOURCE lsf.conf file 500 MC_RECLAIM_DELAY lsb.params file 268 MC_RUSAGE_UPDATE_INTERVAL lsb.params file 269 mem lsb.hosts file 223 lsb.queues file 300 MEM lsb.resources file HostExport section 343 lsb.resources file Limit section 330 MEMLIMIT lsb.applications file 170, 171 lsb.queues file 303 per parallel task 458 per-job limit 413 mesub definition 68 METHOD lsb.resources file ReservationUsage section 349 MIG lsb.hosts file 222 lsb.
Parameters section lsf.licensescheduler file description 502 PATCH_BACKUP_DIR install.config file 146 PATCH_HISTORY_DIR install.config file 147 PEND_REASON_MAX_JOBS 272 PEND_REASON_UPDATE_INTERVAL 273 PER_HOST lsb.resources file HostExport section 342 lsb.resources file Limit section 332 PER_PROJECT lsb.resources file Limit section 332 PER_QUEUE lsb.resources file Limit section 333 PER_USER lsb.resources file Limit section 334 PERF_HOST install.config file 147 pg lsb.hosts file 223 lsb.
preemptable queues definition 46 PREEMPTABLE_RESOURCES lsb.params file 275 preempted jobs control action 55 limit preemption retry 56 preemption jobs by run time 274 of parallel jobs 274 PREEMPTION lsb.queues file 308 PREEMPTION_WAIT_TIME lsb.params file 275 preemption.
R r15m lsb.hosts file 223 lsb.queues file 300 r15s lsb.hosts file 223 lsb.queues file 300 r1m lsb.hosts file 223 lsb.queues file 300 RB_PLUGIN lsb.modules file 233 RCVJOBS_FROM lsb.queues file 312 RECV_FROM lsf.cluster file 387 RELEASE lsf.shared file 533 REMOTE lsb.users file 363 remote task list 541 remote tasks in task files 543 REQUEUE_EXIT_VALUES lsb.applications file 177 lsb.queues file 312 RERUNNABLE lsb.applications file 178 lsb.queues file 313 RES_REQ lsb.applications file 178 lsb.
schmod_mc scheduler plugin 232 schmod_parallel scheduler plugin 231 schmod_preemption scheduler plugin 232 schmod_ps scheduler plugin 232 schmod_pset scheduler plugin 232 schmod_reserve scheduler plugin 231 security daemons increasing 493 sendmail program 416 server lsf.cluster file 383 Servers lsf.shared file 529 service class examples 357 SERVICE_DOMAINS lsf.licensescheduler file Feature section 520 ServiceDomain section lsf.licensescheduler file description 509 setuid permissions 599 setup.
threads setting cluster to 395 time windows syntax 347 TIME_WINDOW lsb.resources file ResourceReservation section 347 time-based configuration lsb.hosts 229 lsb.params 279 lsb.queues 323 lsb.resources 350 lsb.users 363 lsf.licensescheduler 527 tmp lsb.hosts file 223 lsb.queues file 300 TMP lsb.resources file Limit section 340 troubleshooting cluster performance 422 type lsf.cluster file 384 TYPE lsb.resources file HostExport section 344 lsf.shared file 531 TYPENAME lsf.shared file 529 U UJOB_LIMIT lsb.
lsb.resources file Limit section 340 lsb.resources file ResourceReservation section 348 lsb.serviceclasses file 356 ut lsb.hosts file 223 lsb.queues file 300 V variables. See environment variables W windows 642 Platform LSF Configuration Reference time 347 Windows workgroup account mapping 7 WORKLOAD_DISTRIBUTION lsf.licensescheduler file Feature section 520 X XLSF_APPDIR lsf.conf file 501 XLSF_UIDDIR cshrc.lsf and profile.lsf files 129 lsf.