Platform LSF Reference Version 6.2

Troubleshooting and Error Messages
LSF can resolve most, but not all, problems using automount. The automount maps
must be managed through NIS. Follow the instructions in your Release Notes for
obtaining technical support if you are running automount and LSF is not able to locate
directories on remote hosts.
Batch daemons die quietly
First, check the sbatchd and mbatchd error logs. Try running the following command
to check the configuration.
% badmin ckconfig
This reports most errors. You should also check if there is any email from LSF in the
LSF administrator’s mailbox. If the mbatchd is running but the sbatchd dies on some
hosts, it may be because mbatchd has not been configured to use those hosts.
See “Host not used by LSF” on page 645.
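The configuration check above can be wrapped in a small shell function, shown here as a minimal sketch. The name check_batch_config is hypothetical and is not an LSF command. The sketch also assumes that badmin ckconfig returns a nonzero exit status when it finds errors, which you should confirm on your installation.

```shell
# Hypothetical helper (not part of LSF): run the configuration check and
# remind the administrator where to look next if it fails.
# Assumption: badmin ckconfig exits nonzero when it reports errors.
check_batch_config() {
    if badmin ckconfig; then
        echo "badmin ckconfig reported no errors"
    else
        status=$?
        echo "badmin ckconfig failed (exit status $status); check the sbatchd and mbatchd error logs" >&2
        return "$status"
    fi
}
```

Calling check_batch_config from a login shell on the master host prints one summary line and passes the exit status of badmin back to the caller.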
sbatchd starts but mbatchd does not
Check whether LIM is running. You can test this by running the lsid command. If LIM
is not running properly, follow the suggestions in this chapter to fix the LIM first. You
should make sure that all hosts use the same lsf.conf file. Note that it is possible that
mbatchd is temporarily unavailable because the master LIM is temporarily unknown,
causing the following error message.
sbatchd: unknown service
Check whether services are registered properly. See Administering Platform LSF for
information about registering LSF services.
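The order of checks in this section (LIM first, then mbatchd) might be scripted roughly as follows. This is a sketch only: lim_responds and diagnose_mbatchd are hypothetical helper names, and the code assumes that lsid exits with a nonzero status when it cannot reach LIM.

```shell
# Hypothetical helper: succeeds when lsid gets an answer from LIM.
# Assumption: lsid exits nonzero when LIM cannot be contacted.
lim_responds() {
    lsid >/dev/null 2>&1
}

# Hypothetical helper: decide which daemon to troubleshoot first.
diagnose_mbatchd() {
    if lim_responds; then
        echo "LIM is responding; check the mbatchd error log next"
    else
        echo "LIM is not responding; fix LIM before troubleshooting mbatchd"
        return 1
    fi
}
```

Because the master LIM can be temporarily unknown, a failure here may clear up without any change; running diagnose_mbatchd again after a short wait is a reasonable first step.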
Host not used by LSF
If you configure a list of server hosts in the Host section of the lsb.hosts file,
mbatchd allows sbatchd to run only on the hosts listed. If you try to configure an
unknown host in the HOSTS definition for a queue in the lsb.queues file, mbatchd
logs the following message.
mbatchd on host: LSB_CONFDIR/cluster/configdir/file(line #):
Host hostname is not used by lsbatch; ignored
If you try to configure an unknown host in the HostGroup or HostPartition
sections of the lsb.hosts file, you also see this message.
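Before adding a host to a queue, host group, or host partition, it can help to confirm that the host is listed in the Host section. The function below is a rough sketch with a hypothetical name, host_in_host_section. It assumes the section is delimited by Begin Host and End Host lines, that the column header line starts with HOST_NAME, that comment lines start with #, and that the host name is the first field of each entry. It does not interpret any special entries your file may use.

```shell
# Hypothetical helper (not part of LSF).
# Usage: host_in_host_section /path/to/lsb.hosts hostname
# Assumed layout: "Begin Host" ... "End Host", a header line starting with
# HOST_NAME, "#" comments, and the host name in the first column.
host_in_host_section() {
    awk -v want="$2" '
        $1 ~ /^#/ { next }                                   # skip comments
        tolower($1) == "begin" && tolower($2) == "host" { in_host = 1; next }
        tolower($1) == "end"   && tolower($2) == "host" { in_host = 0; next }
        in_host && $1 == "HOST_NAME" { next }                # skip header row
        in_host && $1 == want { found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$1"
}
```

The function returns success only when the name appears inside the Host section, so a host that is mentioned only in a HostGroup section is reported as not listed.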
If you start sbatchd on a host that is not known by mbatchd, mbatchd rejects the
sbatchd. The sbatchd logs the following message and exits.
This host is not used by lsbatch system.
Both of these errors are most often caused by not running the following commands, in
order, after adding a host to the configuration.
lsadmin reconfig
badmin reconfig
You must run both of these before starting the daemons on the new host.
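Because the order matters, the two commands can be wrapped so that the second never runs if the first fails. This is a minimal sketch; reconfig_after_adding_host is a hypothetical name, and the sketch assumes both commands return a nonzero exit status on failure.

```shell
# Hypothetical helper (not part of LSF): run the two reconfiguration
# commands in the required order and stop at the first failure.
# Assumption: both commands exit nonzero when they fail.
reconfig_after_adding_host() {
    # Reconfigure LIM first.
    lsadmin reconfig || {
        echo "lsadmin reconfig failed; not running badmin reconfig" >&2
        return 1
    }
    # Then reconfigure the batch system.
    badmin reconfig || {
        echo "badmin reconfig failed" >&2
        return 1
    }
    echo "Reconfiguration finished; the daemons on the new host can now be started"
}
```

Run reconfig_after_adding_host as the LSF administrator after editing the configuration files, and start the daemons on the new host only after it prints the final line.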