Platform LSF Reference Version 6.2
Troubleshooting and Error Messages
Common LSF Problems
This section lists some common problems with LSF jobs. Most problems are due to
incorrect installation or configuration. Check the mbatchd and sbatchd error log
files; often the log message points directly to the problem.
The section also includes some common problems with the LIM, the RES and
interactive applications.
LIM dies quietly
Run the following command to check for errors in the LIM configuration files.
% lsadmin ckconfig -v
This displays most configuration errors. If the command reports no errors, check the
LIM error log.
LIM unavailable
Sometimes the LIM is up, but executing the lsload command prints the following
error message:
Communication time out.
If the LIM has just been started, this is normal: the LIM needs time to initialize,
because it must read its configuration files and contact other LIMs.
If the LIM does not become available within one or two minutes, check the LIM error
log for the host you are working on.
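The wait-then-check advice above can be sketched as a small polling loop. This is a minimal sketch, not part of LSF: the function names, the injected `run` callable, and the default of 12 attempts 10 seconds apart (roughly two minutes) are assumptions made for illustration. It only looks for the time-out message quoted above in the `lsload` output and does not interpret any other error.

```python
import subprocess
import time

# Message text taken from the lsload error quoted in this section.
TIMEOUT_MESSAGE = "Communication time out"


def run_lsload():
    """Run lsload with no arguments and return its combined output as text."""
    result = subprocess.run(
        ["lsload"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


def wait_for_lim(run=run_lsload, attempts=12, delay=10.0, sleep=time.sleep):
    """Poll until the lsload output stops reporting a time-out.

    Returns True as soon as the output no longer contains the time-out
    message, or False if every attempt timed out. The attempt count and
    delay are illustrative defaults, not LSF settings.
    """
    for attempt in range(attempts):
        if TIMEOUT_MESSAGE not in run():
            return True
        if attempt < attempts - 1:
            sleep(delay)  # give the LIM more time to finish initializing
    return False
```

If `wait_for_lim()` returns False, the LIM has not become available within the polling window, which is the point at which the LIM error log for the local host is worth reading.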
When the local LIM is running but there is no master LIM in the cluster, LSF
applications display the following message:
Cannot locate master LIM now, try later.
Check the LIM error logs on the first few hosts listed in the Host section of the
lsf.cluster.cluster_name file. If LSF_MASTER_LIST is defined in lsf.conf,
check the LIM error logs on the hosts listed in this parameter instead.
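The rule above for choosing which hosts to inspect can be sketched as follows. This is an illustrative helper, not an LSF tool: the function name `master_candidates` and the `first_n=3` reading of "first few hosts" are assumptions. It also assumes that lsf.conf uses `NAME=value` lines with an optionally quoted, space-separated host list, and that the Host section is delimited by `Begin Host` and `End Host` with a header row starting with `HOSTNAME` and the host name in the first column.

```python
def master_candidates(lsf_conf_text, cluster_file_text, first_n=3):
    """Return the host names whose LIM error logs should be checked first."""
    # Prefer LSF_MASTER_LIST from lsf.conf when it is defined and non-empty.
    for line in lsf_conf_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() == "LSF_MASTER_LIST":
            hosts = value.strip().strip("\"'").split()
            if hosts:
                return hosts

    # Otherwise fall back to the first few hosts in the Host section of
    # the lsf.cluster.cluster_name file.
    hosts = []
    in_host_section = False
    for line in cluster_file_text.splitlines():
        words = line.split("#", 1)[0].split()  # drop comments, split columns
        if not words:
            continue
        lowered = [w.lower() for w in words]
        if lowered[:2] == ["begin", "host"]:
            in_host_section = True
        elif lowered[:2] == ["end", "host"]:
            break
        elif in_host_section and words[0].upper() != "HOSTNAME":
            hosts.append(words[0])  # first column is assumed to be the host name
    return hosts[:first_n]
```

Pass the contents of lsf.conf and of the cluster file as strings; the returned list gives the order in which to look at the LIM error logs.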
Master LIM is down
Sometimes the master LIM is up, but executing the lsload or lshosts command
prints the following error message:
Master LIM is down; try later
If the /etc/hosts file on the host where the master LIM is running assigns the host
name to the loopback IP address (127.0.0.1), LSF client LIMs cannot contact the
master LIM. When the master LIM starts up, it sets its official host name and IP
address to the loopback address. Any client that requests the master LIM address
receives 127.0.0.1, so it tries to connect to itself instead of the master host.
Check the IP configuration of your master LIM in /etc/hosts. The following
example incorrectly sets the master LIM IP address to the loopback address:
127.0.0.1 localhost myhostname
The following example correctly sets the master LIM IP address:
127.0.0.1 localhost
192.168.123.123 myhostname
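The /etc/hosts check described above can be automated with a short script. This is a minimal sketch under stated assumptions: the function name `loopback_conflicts` is hypothetical, and the parser assumes the usual hosts-file layout of an IP address followed by one or more names, with `#` starting a comment.

```python
import ipaddress


def loopback_conflicts(hosts_text, hostname):
    """Return the loopback addresses that hosts_text assigns to hostname."""
    conflicts = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split columns
        if len(fields) < 2:
            continue  # blank line, or an address with no names
        address, names = fields[0], fields[1:]
        try:
            is_loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            continue  # first field is not a valid IP address; skip the line
        if is_loopback and hostname in names:
            conflicts.append(address)
    return conflicts
```

On the master host, the function could be called with the contents of /etc/hosts and the name returned by `socket.gethostname()`. An empty list means the host name is not mapped to a loopback address; any entry in the list points to a line that should be corrected as in the second example above.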