Platform LSF Administration Guide Version 6.2


Chapter 24

Job Requeue and Job Rerun

Administering Platform LSF


Exclusive Job Requeue

About exclusive job requeue

You can configure automatic job requeue so that a failed job is not rerun on the same host.

Limitations

◆ If mbatchd is restarted, this feature might not work properly, since LSF forgets which hosts have been excluded. If a job ran on a host and exited with an exclusive exit code before mbatchd was restarted, the job could be dispatched to the same host again after mbatchd is restarted.

◆ Exclusive job requeue does not work for MultiCluster jobs or parallel jobs.

◆ A job terminated by a signal is not requeued.

Configuring exclusive job requeue

Set REQUEUE_EXIT_VALUES in the queue definition (lsb.queues) and define the exit code using parentheses and the keyword EXCLUDE, as shown:

EXCLUDE(exit_code...)

When a job exits with any of the specified exit codes, it will be requeued, but it will not be dispatched to the same host again.
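To make the split between ordinary and exclusive exit codes concrete, the following is a minimal Python sketch of how a REQUEUE_EXIT_VALUES string could be separated into the two groups. It is not LSF code: the function name parse_requeue_exit_values is hypothetical, and the sketch assumes the value contains only whitespace-separated numeric exit codes and EXCLUDE(...) groups.

```python
import re


def parse_requeue_exit_values(value):
    """Split a REQUEUE_EXIT_VALUES string into (normal, exclusive) exit-code sets.

    Hypothetical helper for illustration only. Assumes numeric codes
    separated by whitespace, optionally grouped inside EXCLUDE(...).
    """
    exclusive = set()
    # Collect the codes listed inside every EXCLUDE(...) group.
    for group in re.findall(r"EXCLUDE\(([^)]*)\)", value):
        exclusive.update(int(code) for code in group.split())
    # Anything left outside the EXCLUDE(...) groups is an ordinary requeue code.
    remainder = re.sub(r"EXCLUDE\([^)]*\)", " ", value)
    normal = {int(code) for code in remainder.split()}
    return normal, exclusive


normal, exclusive = parse_requeue_exit_values("30 EXCLUDE(20)")
print(sorted(normal), sorted(exclusive))  # prints: [30] [20]
```

Here exit code 30 triggers an ordinary requeue, while exit code 20 triggers a requeue that also excludes the host the job just ran on.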

Example

Begin Queue
...
REQUEUE_EXIT_VALUES=30 EXCLUDE(20)
HOSTS=hostA hostB hostC
...
End Queue

A job in this queue can be dispatched to hostA, hostB or hostC.

If a job running on hostA exits with value 30 and is requeued, it can be dispatched to hostA, hostB, or hostC. However, if a job running on hostA exits with value 20 and is requeued, it can only be dispatched to hostB or hostC.

If the job runs on hostB and exits with a value of 20 again, it can only be dispatched to hostC. Finally, if the job runs on hostC and exits with a value of 20, it cannot be dispatched to any of the hosts, so it will remain pending indefinitely.
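The walk-through above can be traced with a small Python simulation. This is only a sketch of the described behavior, not the scheduler's implementation: the function simulate_exclusive_requeue and its arguments are hypothetical, and it assumes every run in the list ends in a requeue.

```python
def simulate_exclusive_requeue(queue_hosts, exclusive_codes, runs):
    """Return the hosts still eligible for dispatch after each run.

    Hypothetical model for illustration only.
    queue_hosts:     hosts listed in the queue's HOSTS parameter
    exclusive_codes: exit codes listed inside EXCLUDE(...)
    runs:            (host, exit_code) pairs in the order the job ran
    """
    excluded = set()  # held in memory only, like the state mbatchd loses on restart
    history = []
    for host, exit_code in runs:
        # An exclusive exit code bars the job from the host it just ran on.
        if exit_code in exclusive_codes:
            excluded.add(host)
        history.append([h for h in queue_hosts if h not in excluded])
    return history


hosts = ["hostA", "hostB", "hostC"]
runs = [("hostA", 30), ("hostA", 20), ("hostB", 20), ("hostC", 20)]
for (host, code), eligible in zip(runs, simulate_exclusive_requeue(hosts, {20}, runs)):
    print(f"ran on {host}, exit {code} -> eligible: {eligible}")
```

The last run leaves an empty list of eligible hosts, which corresponds to the job that can no longer be dispatched. Resetting the excluded set to empty partway through would model the mbatchd restart limitation, after which the job could land on a previously excluded host again.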