Platform LSF Administration Guide Version 6.2
Handling Job Exceptions
Administering Platform LSF
JOB_OVERRUN
Specifies a threshold for job overrun, in minutes. If a job runs longer than the specified run time, LSF invokes eadmin to trigger the action for a job overrun exception.
JOB_UNDERRUN
Specifies a threshold for job underrun. If a job exits before the specified number of minutes, LSF invokes eadmin to trigger the action for a job underrun exception.
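The two definitions above reduce to simple comparisons against a job's run time. The following Python sketch models them under stated assumptions: `runtime_exceptions` is a hypothetical helper (not part of LSF or eadmin), run times are expressed in minutes, and an underrun is reported only for a job that has already exited. It illustrates the threshold semantics, not how LSF implements them.

```python
from typing import Optional


def runtime_exceptions(
    runtime_min: float,
    finished: bool,
    overrun_min: Optional[float] = None,
    underrun_min: Optional[float] = None,
) -> list[str]:
    """Return the runtime-based exceptions a job would raise.

    Hypothetical helper modeling the JOB_OVERRUN / JOB_UNDERRUN
    semantics; it is not LSF's internal implementation.
    """
    exceptions = []
    # Overrun: the job has run longer than the JOB_OVERRUN threshold.
    if overrun_min is not None and runtime_min > overrun_min:
        exceptions.append("overrun")
    # Underrun: the job has exited, and did so before JOB_UNDERRUN minutes.
    if underrun_min is not None and finished and runtime_min < underrun_min:
        exceptions.append("underrun")
    return exceptions
```

With JOB_OVERRUN = 5 and JOB_UNDERRUN = 2, a job still running at 7 minutes reports an overrun, a job that exits after 1.5 minutes reports an underrun, and a job that exits after 3 minutes reports neither.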
Example
The following queue defines thresholds for all job exceptions:
Begin Queue
...
JOB_UNDERRUN = 2
JOB_OVERRUN = 5
JOB_IDLE = 0.10
...
End Queue
For this queue:
◆ A job underrun exception is triggered for jobs running less than 2 minutes
◆ A job overrun exception is triggered for jobs running longer than 5 minutes
◆ A job idle exception is triggered for jobs with an idle factor (CPU time/runtime) less than 0.10
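The idle factor in the last item is a ratio of CPU time to run time, so both quantities must be in the same unit. The Python sketch below computes it and compares it with the JOB_IDLE value of 0.10 from the example queue. The function names `idle_factor` and `is_idle` are hypothetical and do not correspond to any LSF command or API.

```python
def idle_factor(cpu_time_s: float, runtime_s: float) -> float:
    """Idle factor: CPU time divided by run time (both in seconds here)."""
    if runtime_s <= 0:
        raise ValueError("runtime must be positive")
    return cpu_time_s / runtime_s


def is_idle(cpu_time_s: float, runtime_s: float, job_idle: float = 0.10) -> bool:
    """True when the idle factor falls below the JOB_IDLE threshold."""
    return idle_factor(cpu_time_s, runtime_s) < job_idle
```

For example, a job that has used 30 seconds of CPU time over a 10-minute (600-second) run has an idle factor of 0.05, which is below 0.10, so it would qualify as idle; a job that used 300 seconds of CPU time over the same run (idle factor 0.5) would not.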
Configuring thresholds for job exception handling
EADMIN_TRIGGER_DURATION (lsb.params)
By default, LSF checks for job exceptions every 5 minutes. Use EADMIN_TRIGGER_DURATION in lsb.params to change how frequently LSF checks for overrun, underrun, and idle jobs.
Tune EADMIN_TRIGGER_DURATION carefully. Shorter values may raise false alarms; longer values may not check for exceptions frequently enough, so exceptions can be detected late.
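To see why a long check interval can delay detection, consider an overrun threshold of 5 minutes. The Python sketch below uses a deliberately simplified, hypothetical model in which checks happen at every multiple of the interval measured from the job's start; real LSF checks are not necessarily aligned to job start. It covers only the "longer values" half of the trade-off and says nothing about false alarms. `first_detection_min` is an illustrative name, not an LSF function.

```python
import math


def first_detection_min(threshold_min: float, check_interval_min: float) -> float:
    """Earliest check time (minutes after job start) at which the run time
    is strictly longer than threshold_min.

    Hypothetical model: checks occur at k * check_interval_min, k = 1, 2, ...
    """
    if check_interval_min <= 0:
        raise ValueError("check interval must be positive")
    # Smallest k such that k * interval > threshold.
    k = math.floor(threshold_min / check_interval_min) + 1
    return k * check_interval_min
```

Under this model, with the default 5-minute interval a job that exceeds a 5-minute overrun threshold is first flagged at the 10-minute check, while a 1-minute interval flags it at 6 minutes.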