LSF Version 7.3 - Administering Platform LSF
Error and Event Logging
LSF daemons log error messages at different levels, so you can choose to log all
messages or only those deemed critical. Message logging for LSF daemons (except
LIM) is controlled by the LSF_LOG_MASK parameter in lsf.conf. The value can be
any log priority symbol defined in /usr/include/sys/syslog.h. The default value
for LSF_LOG_MASK is LOG_WARNING.
IMPORTANT: In LSF Version 7, LSF_LOG_MASK in lsf.conf no longer specifies the LIM logging level.
To control message logging for LIM, use EGO_LOG_MASK in ego.conf. The default value for
EGO_LOG_MASK is LOG_WARNING.
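As a rough illustration of how a log mask setting could be checked, the following Python sketch reads LSF_LOG_MASK (or EGO_LOG_MASK) from the text of a configuration file and falls back to LOG_WARNING when the parameter is absent. The helper name read_log_mask, the simple NAME=value parsing, and the list of priority symbols (the standard syslog.h priorities) are assumptions made for illustration. This is not how LSF itself parses its configuration, and the list may need extending if your installation accepts additional levels.

```python
# Standard syslog priority symbols, from most to least severe
# (as commonly defined in <sys/syslog.h>).
SYSLOG_PRIORITIES = [
    "LOG_EMERG", "LOG_ALERT", "LOG_CRIT", "LOG_ERR",
    "LOG_WARNING", "LOG_NOTICE", "LOG_INFO", "LOG_DEBUG",
]

DEFAULT_MASK = "LOG_WARNING"  # default value stated in this section


def read_log_mask(conf_text, param="LSF_LOG_MASK"):
    """Return the value of `param` from lsf.conf-style text.

    Hypothetical helper: assumes plain NAME=value lines and '#' comments.
    """
    value = DEFAULT_MASK
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        name, _, val = line.partition("=")
        if name.strip() == param:
            value = val.strip().strip('"')  # a later definition wins
    if value not in SYSLOG_PRIORITIES:
        raise ValueError(f"{param}={value!r} is not a syslog priority symbol")
    return value
```

To check the LIM setting instead, pass the contents of ego.conf with param="EGO_LOG_MASK".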
Error logging
If the optional LSF_LOGDIR parameter is defined in lsf.conf, error messages
from LSF servers are logged to files in this directory.
If LSF_LOGDIR is defined, but the daemons cannot write to files there, the error
log files are created in /tmp.
If LSF_LOGDIR is not defined, errors are logged to the system error logs (syslog)
using the LOG_DAEMON facility. syslog messages are highly configurable, and the
default configuration varies widely from system to system. Start by looking for
the file /etc/syslog.conf, and read the man pages for syslog(3) and syslogd(1).
If the error log is managed by syslog, it is probably already being automatically
cleared.
If LSF daemons cannot find lsf.conf when they start, they will not find the
definition of LSF_LOGDIR. In this case, error messages go to syslog. If you cannot
find any error messages in the log files, they are likely in the syslog.
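The destination rules described in this section can be summarized in a short Python sketch. It only models the decision (LSF_LOGDIR, /tmp, or syslog) and is not LSF code. The function name, its arguments, and the writability check are assumptions made for illustration.

```python
import os


def _dir_is_writable(path):
    """Default check: the path is an existing directory we can write to."""
    return os.path.isdir(path) and os.access(path, os.W_OK)


def resolve_error_log_destination(lsf_conf_found, logdir,
                                  is_writable=_dir_is_writable):
    """Return where error messages would go under the rules in this section.

    lsf_conf_found: whether the daemons located lsf.conf at startup.
    logdir: the value of LSF_LOGDIR, or None if it is not defined.
    is_writable: hypothetical hook so the check can be replaced in tests.
    """
    # Without lsf.conf the daemons never see LSF_LOGDIR, so they use syslog.
    if not lsf_conf_found or not logdir:
        return "syslog"
    # LSF_LOGDIR is defined and usable: log files go there.
    if is_writable(logdir):
        return logdir
    # LSF_LOGDIR is defined but not writable: log files are created in /tmp.
    return "/tmp"
```

When error messages seem to be missing, walking through these cases in order (syslog, then LSF_LOGDIR, then /tmp) is one way to narrow down where to look.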
System Event Log
The LSF daemons keep an event log in the lsb.events file. The mbatchd daemon
uses this information to recover from server failures, host reboots, and mbatchd
restarts. The lsb.events file is also used by the bhist command to display detailed
information about the execution history of batch jobs, and by the badmin command
to display the operational history of hosts, queues, and daemons.
By default, mbatchd automatically backs up and rewrites the lsb.events file after
every 1000 batch job completions. This value is controlled by the MAX_JOB_NUM
parameter in the lsb.params file. The old lsb.events file is moved to
lsb.events.1, and each old lsb.events.n file is moved to lsb.events.n+1. LSF
never deletes these files. If disk storage is a concern, the LSF administrator should
arrange to archive or remove old lsb.events.n files periodically.
CAUTION: Do not remove or modify the current lsb.events file. Removing or modifying the
lsb.events file could cause batch jobs to be lost.
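For periodic cleanup, a small helper can select which rotated event logs are old enough to archive. The Python sketch below is one possible approach, not an LSF tool. The function name archive_candidates and the keep threshold are hypothetical. It matches only files with a numeric suffix, so the current lsb.events file is never selected.

```python
import re

# Matches rotated event logs such as lsb.events.1 or lsb.events.12.
# The current lsb.events file has no numeric suffix and never matches.
_ROTATED = re.compile(r"lsb\.events\.(\d+)")


def archive_candidates(filenames, keep=5):
    """Return rotated lsb.events.n names with n > keep, oldest first.

    A higher n means an older file, because each rotation moves
    lsb.events.n to lsb.events.n+1.
    """
    numbered = []
    for name in filenames:
        match = _ROTATED.fullmatch(name)
        if match and int(match.group(1)) > keep:
            numbered.append((int(match.group(1)), name))
    # Sort by suffix, highest (oldest) first.
    return [name for _, name in sorted(numbered, reverse=True)]
```

A typical use would pass the listing of the directory that holds lsb.events (for example, from os.listdir) and then compress or move the returned files with whatever archiving procedure the site already uses.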