Platform LSF Administrator's Primer Version 6.2
Chapter 6
Troubleshooting LSF Problems
If none of these applies to your situation, contact support@platform.com.
8 Application runs fine under UNIX or with lsrun, but fails or hangs when submitted through bsub
On some UNIX systems, certain applications run only with specific limit values.
Different limit values or no limits can cause problems for these applications. lsrun,
lsgrun, and other interactive LSF commands copy the submission host environment,
including any limits, to the execution host, where res sets up that submission environment.
LSF Batch works differently. Jobs run in a queue and are subject to queue limits, not
submission host limits. By default, LSF Batch sets all limits to unlimited and applies
only the limits explicitly set in the queue. To see the limit settings for a queue, use
the command:
% bqueues -l queue_name
To troubleshoot this type of problem:
1 Run the application under UNIX to confirm that it works without LSF.
2 Create a small script like this:
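#!/bin/sh
# Display limits from the command line.
# Check the man pages for more limits that can be displayed.
ulimit -Hc    # hard limit, core file size
ulimit -Hd    # hard limit, data segment size
ulimit -Sc    # soft limit, core file size
ulimit -Sd    # soft limit, data segment size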
3 Run the script under UNIX on the submission host to determine its limits, and
record the limit values.
4 Create a new version of the script that sets the limit values reported by the
original script and then runs the command for your application. For example:
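#!/bin/sh
# Set the limits recorded on the submission host, then run the application.
# Check the man pages for more limits that can be set.
# Each soft limit is set before its hard limit, so that the soft value
# never exceeds the hard value.
ulimit -Sc 24335
ulimit -Hc 45333
ulimit -Sd 256
ulimit -Hd 256
<your application>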
You can also set these limits in a queue in LSB_CONFDIR/cluster_name/configdir/lsb.queues.
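As a rough illustration only, a queue definition carrying the limits from the example
script might look like the fragment below. The queue name limits_test and its description
are hypothetical. CORELIMIT and DATALIMIT are the lsb.queues parameters for core file size
and data segment size. The values are copied from the example script without unit
conversion: ulimit units vary by shell and by limit, so check the units that lsb.queues
expects in the reference documentation before using real values.

```
# Hypothetical queue; the name and the values are for illustration only.
Begin Queue
QUEUE_NAME  = limits_test
CORELIMIT   = 45333
DATALIMIT   = 256
DESCRIPTION = Queue with limits matching the submission host
End Queue
```

After editing lsb.queues, the batch system normally has to be reconfigured (for example
with badmin reconfig) before the change takes effect.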
Once the limit values are set correctly, your application should run fine under LSF Batch.
If it still does not work, contact support@platform.com.
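To see which limits actually differ between the two environments, the limit-display
script from the steps above can be run both directly and as a batch job, and the results
compared. The following is a minimal sketch of that idea. The function name show_limits
and the file names mentioned afterward are illustrative and are not part of LSF. It
labels each value so that two runs are easy to compare line by line.

```shell
#!/bin/sh
# Print soft and hard values for the core file size (-c) and
# data segment size (-d) limits, one labeled value per line.
# show_limits is an illustrative name, not an LSF command.
show_limits() {
    for flag in c d; do
        printf 'soft -%s %s\n' "$flag" "$(ulimit -S"$flag")"
        printf 'hard -%s %s\n' "$flag" "$(ulimit -H"$flag")"
    done
}

show_limits
```

Assuming the script is saved as show_limits.sh, one way to use it is to run it on the
submission host with its output redirected to a file, submit the same script with
bsub -o to capture the batch output in a second file, and then compare the labeled lines
in the two files. The bsub output file also contains the LSF job report, so only the
lines beginning with "soft" or "hard" are relevant.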
9 LSF Batch job runs in /tmp or cannot find home directory
The problem could be caused by:
◆ Different home directories existing for the same user account on the submission
and execution hosts
◆ NFS automount problems that cause LSF to look for incorrect mount points