Platform LSF Administrator's Primer Version 6.2
Common LSF Problems
Contents
◆ Finding LSF error logs
◆ For most LSF problems
◆ Top 10 LSF problems
Finding LSF error logs
When something goes wrong, LSF server daemons log error messages in the LSF log directory (LSF_LOGDIR).

Make sure that the primary LSF administrator owns LSF_LOGDIR, and that root can write to this directory. If an LSF server is unable to write to LSF_LOGDIR, the error logs are created in /tmp.
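The ownership and write-permission requirement above can be spot-checked from a shell. This is a minimal POSIX sh sketch, not an LSF tool: `dir_owner` and `check_logdir` are hypothetical helper names, the directory and administrator name are passed in by hand rather than read from lsf.conf, and the /tmp fallback is modeled in the simplified way the paragraph describes. Run it as root, because the `-w` test reports only what the current user can do.

```shell
#!/bin/sh
# Hypothetical helpers (not part of LSF) for checking LSF_LOGDIR.

# dir_owner DIR: print the owning user of DIR (portable: parses `ls -ld`).
dir_owner() {
    ls -ld "$1" | awk '{ print $3 }'
}

# check_logdir DIR ADMIN
# Warn on stderr if DIR is missing, not owned by ADMIN, or not writable by
# the current user. Print DIR if logs can go there, otherwise /tmp
# (a simplified model of the fallback described in the text).
check_logdir() {
    dir=$1
    admin=$2
    if [ ! -d "$dir" ]; then
        echo "WARNING: $dir does not exist" >&2
        echo /tmp
        return 1
    fi
    owner=$(dir_owner "$dir")
    if [ "$owner" != "$admin" ]; then
        echo "WARNING: $dir is owned by $owner, not $admin" >&2
    fi
    if [ ! -w "$dir" ]; then
        echo "WARNING: $(id -un) cannot write to $dir" >&2
        echo /tmp
        return 1
    fi
    echo "$dir"
}
```

For example, `check_logdir /usr/share/lsf/lsf_62/log lsfadmin` (both arguments hypothetical; substitute your LSF_LOGDIR and primary LSF administrator) prints the directory if all checks pass, or /tmp plus warnings on standard error if not.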
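LSF logs to the following files:
◆ lim.log.host_name
◆ res.log.host_name
◆ pim.log.host_name
◆ mbatchd.log.master_host
◆ mbschd.log.master_host
◆ sbatchd.log.host_name
where host_name is the name of the host running the daemon and master_host is the name of the LSF master host, the only host that runs mbatchd and mbschd.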
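To see which of these logs exist on a given host, a short loop over the daemon names is enough. This is a hypothetical POSIX sh helper, `list_lsf_logs`, which assumes you pass LSF_LOGDIR and the host name exactly as it appears in the log file suffix; mbatchd and mbschd logs will only be found on the master host.

```shell
#!/bin/sh
# Hypothetical helper (not part of LSF).
# list_lsf_logs LOGDIR HOST: print the path of each per-daemon log
# named DAEMON.log.HOST that exists in LOGDIR.
list_lsf_logs() {
    logdir=$1
    host=$2
    for daemon in lim res pim mbatchd mbschd sbatchd; do
        f="$logdir/$daemon.log.$host"
        # Print only files that exist; mbatchd and mbschd logs
        # appear only on the master host.
        if [ -f "$f" ]; then
            echo "$f"
        fi
    done
}
```

For example, `list_lsf_logs /usr/share/lsf/lsf_62/log hostA` (hypothetical path and host) prints one path per existing log. You can then read each file with `tail` or search it with `grep -i error`, which is a rough heuristic rather than an LSF-defined filter.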
If there are any error messages in the log files that you do not understand, contact your Platform system engineer or support@platform.com.
For most LSF problems
The general troubleshooting steps for most LSF problems are:
1. Run lsadmin ckconfig -v and note any errors shown in the command output. Look for the error in “Top 10 LSF problems” on page 55. If none of these applies to your situation, contact support@platform.com.
2. Use the following commands to restart the LSF cluster:
# lsadmin limrestart all
# lsadmin resrestart all
# badmin hrestart all
3. Run ps -ef to see whether the LSF daemons are running. Look for processes similar to the following:
root 17426 1 0 13:30:40 ? 0:00 /usr/share/lsf/lsf_62/6.2/sparc-sol2/etc/lim
root 17436 1 0 13:31:11 ? 0:00 /usr/share/lsf/lsf_62/6.2/sparc-sol2/etc/sbatchd
root 17429 1 0 13:30:56 ? 0:00 /usr/share/lsf/lsf_62/6.2/sparc-sol2/etc/res
4. Check the LSF error logs on the first few hosts listed in the Host section of LSF_CONFDIR/lsf.cluster.cluster_name; these are the candidates for the LSF master host. If LSF_MASTER_LIST is defined in LSF_CONFDIR/lsf.conf, check the error logs on the hosts listed in this parameter instead.
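Steps 3 and 4 can be partially scripted. The following is a minimal POSIX sh sketch with hypothetical helper names: `missing_daemons` filters `ps -ef` output for the three daemons shown in step 3, and `hosts_to_check` picks the hosts whose logs step 4 says to read. It assumes LSF_MASTER_LIST appears on a single line of lsf.conf (for example `LSF_MASTER_LIST="hostA hostB"`) and that each row of the Host section begins with the host name. The default of 3 hosts is an arbitrary stand-in for "the first few", which the text does not quantify.

```shell
#!/bin/sh
# Hypothetical helpers (not part of LSF) for troubleshooting steps 3 and 4.

# missing_daemons: read `ps -ef` output on stdin and print each of
# lim, res, and sbatchd whose path (".../lim" etc.) does not appear
# followed by a space or the end of the line.
missing_daemons() {
    ps_out=$(cat)
    for daemon in lim res sbatchd; do
        if ! printf '%s\n' "$ps_out" | grep -Eq "/${daemon}( |\$)"; then
            echo "$daemon"
        fi
    done
}

# hosts_to_check LSF_CONF CLUSTER_FILE [N]
# If LSF_MASTER_LIST is set in LSF_CONF, print its hosts one per line.
# Otherwise print the first N hosts (default 3) from the Host section
# of CLUSTER_FILE.
hosts_to_check() {
    conf=$1
    cluster=$2
    n=${3:-3}
    # Take the last uncommented LSF_MASTER_LIST=... line and strip quotes.
    list=$(sed -n 's/^[[:space:]]*LSF_MASTER_LIST[[:space:]]*=[[:space:]]*//p' "$conf" \
        | tail -n 1 | tr -d "\"'")
    if [ -n "$list" ]; then
        # Unquoted expansion splits the list on whitespace.
        for h in $list; do
            echo "$h"
        done
        return 0
    fi
    awk -v n="$n" '
        /^[[:space:]]*#/ || NF == 0 { next }               # skip comments and blank lines
        toupper($1) == "BEGIN" && toupper($2) == "HOST" { inhost = 1; next }
        toupper($1) == "END" && toupper($2) == "HOST" { inhost = 0; next }
        inhost && toupper($1) == "HOSTNAME" { next }       # skip the column header
        inhost && count < n + 0 { print $1; count++ }
    ' "$cluster"
}
```

For example, `ps -ef | missing_daemons` prints nothing when all three daemons are running on the local host, and `hosts_to_check "$CONFDIR/lsf.conf" "$CONFDIR/lsf.cluster.mycluster"` (where CONFDIR and mycluster are placeholders for your LSF_CONFDIR and cluster name) prints the hosts whose error logs to inspect. Because `missing_daemons` only sees local processes, run it on each host you are checking.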