LSF Version 7.3 - Administering Platform LSF
Troubleshooting and Error Messages
logJobInfo_: write <logdir/info/jobfile> xdrpos <pos> failed: error
logJobInfo_: write <logdir/info/jobfile> xdr buf len <len> failed: error
logJobInfo_: close(<logdir/info/jobfile>) failed: error
rmLogJobInfo: Job <jobId>: can’t unlink(<logdir/info/jobfile>): error
rmLogJobInfo_: Job <jobId>: can’t stat(<logdir/info/jobfile>): error
readLogJobInfo: Job <jobId> can’t open(<logdir/info/jobfile>): error
start_job: Job <jobId>: readLogJobInfo failed: error
readLogJobInfo: Job <jobId>: can’t read(<logdir/info/jobfile>) size size: error
initLog: mkdir(<logdir/info>) failed: error
<fname>: fopen(<logdir/file> failed: error
getElogLock: Can’t open existing lock file <logdir/file>: error
getElogLock: Error in opening lock file <logdir/file>: error
releaseElogLock: unlink(<logdir/lockfile>) failed: error
touchElogLock: Failed to open lock file <logdir/file>: error
touchElogLock: close <logdir/file> failed: error
mbatchd failed to create, remove, read, or write the log directory or a file in
the log directory, for the reason given in error. Check that the LSF
administrator has read, write, and execute permissions on the logdir directory.
If logdir is on AFS, check that the instructions in the document “Installing LSF
on AFS” on the Platform Web site have been followed. Use the fs ls command to
verify that the LSF administrator owns logdir and that the directory has the
correct ACL.
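The permission check described above can be sketched in Python. This is a minimal illustration, not part of LSF: the function name missing_logdir_permissions is hypothetical, and the script is assumed to run under the LSF administrator account, because os.access tests the permissions of the calling user. On AFS the directory ACL governs access, so a clean result here does not replace the ACL check.

```python
import os


def missing_logdir_permissions(logdir):
    """Return the permissions the calling user lacks on logdir.

    An empty list means read, write, and execute are all available.
    """
    if not os.path.isdir(logdir):
        return ["directory does not exist"]
    checks = [("read", os.R_OK), ("write", os.W_OK), ("execute", os.X_OK)]
    # os.access evaluates each permission for the calling user.
    return [name for name, mode in checks if not os.access(logdir, mode)]
```

Calling missing_logdir_permissions with the path of your log directory returns a list such as ["write"] when only write access is missing.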
replay_newjob: File <logfile> at line <line>: Queue <queue> not found, saving to queue <lost_and_found>
replay_switchjob: File <logfile> at line <line>: Destination queue <queue> not found, switching to queue <lost_and_found>
When mbatchd was reconfigured, the event log contained jobs in queue, but that
queue is no longer in the configuration. The affected jobs are placed in the
lost_and_found queue.
replay_startjob: JobId <jobId>: exec host <host> not found, saving to host <lost_and_found>
When mbatchd was reconfigured, the event log contained jobs dispatched to host,
but that host is no longer configured to be used by LSF.
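To see which queues and hosts are missing from the configuration, the replay messages above can be tallied from a daemon log. The following is a rough sketch only: it assumes each message appears on a single log line and that the actual values are written inside angle brackets exactly where the templates above show placeholders. The function name summarize_missing is hypothetical.

```python
import re
from collections import Counter

# Assumption: values are enclosed in angle brackets, as in the templates above.
MISSING_QUEUE = re.compile(
    r"replay_(?:newjob|switchjob): File <[^>]*> at line <[^>]*>: "
    r"(?:Queue|Destination queue) <([^>]*)> not found"
)
MISSING_HOST = re.compile(
    r"replay_startjob: JobId <[^>]*>: exec host <([^>]*)> not found"
)


def summarize_missing(lines):
    """Count replay messages per missing queue and per missing host."""
    queues, hosts = Counter(), Counter()
    for line in lines:
        match = MISSING_QUEUE.search(line)
        if match:
            queues[match.group(1)] += 1
            continue
        match = MISSING_HOST.search(line)
        if match:
            hosts[match.group(1)] += 1
    return queues, hosts
```

Passing an open log file to summarize_missing yields two counters. Names with high counts are the queues or hosts to restore in the configuration, or to confirm as intentionally removed.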
do_restartReq: Failed to get hData of host <host_name>/<host_addr>
mbatchd received a request from sbatchd on host host_name, but that host is not
known to mbatchd. Either the configuration file has been changed but mbatchd has
not been reconfigured to pick up the new configuration, or host_name is a client
host but the sbatchd daemon is running on that host. Either run the following
command to reconfigure mbatchd, or kill the sbatchd daemon on host_name:
badmin reconfig