LSF Version 7.3 - Administering Platform LSF

Understanding Resources
Automatic Detection of Hardware Reconfiguration
Some UNIX operating systems support dynamic hardware reconfiguration: system
boards can be attached or detached in a live system without rebooting the host.
Supported platforms
LSF can recognize changes in ncpus, maxmem, maxswp, and maxtmp on the following
platforms:
Sun Solaris 2.5+
HP-UX 10.10+
IBM AIX 4.0+
SGI IRIX 6.2+
Dynamic changes in ncpus
LSF is able to automatically detect a change in the number of processors in systems
that support dynamic hardware reconfiguration.
The local LIM checks if there is a change in the number of processors at an internal
interval of 2 minutes. If it detects a change in the number of processors, the local
LIM also checks maxmem, maxswp, and maxtmp. The local LIM then sends this new
information to the master LIM.
Dynamic changes in maxmem, maxswp, maxtmp
If you dynamically change maxmem, maxswp, or maxtmp without changing the
number of processors, you need to restart the local LIM with the command
lsadmin limrestart so that it can recognize the changes.
If you dynamically change the number of processors and any of maxmem, maxswp, or
maxtmp, the change will be automatically recognized by LSF. When it detects a
change in the number of processors, the local LIM also checks maxmem, maxswp,
and maxtmp.
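The two cases above reduce to a single rule, sketched here in Python. The
function name needs_lim_restart and the set TRACKED are hypothetical and are
not part of LSF; the sketch only encodes the rule as this section states it.

```python
# Hypothetical helper: illustrates the rule described above, not an LSF API.
TRACKED = {"ncpus", "maxmem", "maxswp", "maxtmp"}


def needs_lim_restart(changed):
    """Return True if the local LIM must be restarted to see the changes.

    `changed` is an iterable of the resource names that were reconfigured.
    """
    changed = set(changed)
    unknown = changed - TRACKED
    if unknown:
        raise ValueError(f"not a tracked resource: {sorted(unknown)}")
    if not changed:
        return False
    # A change in ncpus makes the local LIM recheck maxmem, maxswp, and
    # maxtmp as well, so no restart is needed in that case.
    return "ncpus" not in changed


if __name__ == "__main__":
    print(needs_lim_restart({"maxmem"}))           # memory only
    print(needs_lim_restart({"ncpus", "maxmem"}))  # processors and memory
```

When the function returns True, the manual step from this section applies:
run lsadmin limrestart on the affected host.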
Viewing dynamic hardware changes
lsxxx Commands
There may be a 2 minute delay before the changes are recognized by lsxxx
commands (for example, before lshosts displays the changes).
bxxx Commands
There may be at most a 2 + 10 minute delay before the changes are recognized by
bxxx commands (for example, before bhosts -l displays the changes).
This is because mbatchd contacts the master LIM at an internal interval of 10
minutes.
Platform MultiCluster
Configuration changes from a local cluster are communicated from the master LIM
to the remote cluster at an interval of 2 * CACHE_INTERVAL. The parameter
CACHE_INTERVAL is configured in lsf.cluster.cluster_name and is by default
60 seconds.
This means that for changes to be recognized in a remote cluster there is a
maximum delay of 2 minutes + 2*CACHE_INTERVAL.
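The delays in this section can be added up as in the following Python sketch.
The function and constant names are hypothetical. The 2 minute and 10 minute
values are the internal intervals quoted above, and 60 seconds is the stated
default for CACHE_INTERVAL.

```python
# Hypothetical calculator for the maximum delays described in this section.
LIM_CHECK_SEC = 2 * 60              # local LIM checks ncpus every 2 minutes
MBATCHD_POLL_SEC = 10 * 60          # mbatchd contacts the master LIM every 10 minutes
DEFAULT_CACHE_INTERVAL_SEC = 60     # default CACHE_INTERVAL


def max_delay_seconds(view, cache_interval_sec=DEFAULT_CACHE_INTERVAL_SEC):
    """Return the maximum delay, in seconds, before a change is visible.

    view: "lsxxx" for lsxxx commands, "bxxx" for bxxx commands, or
          "remote" for a remote cluster in Platform MultiCluster.
    """
    if view == "lsxxx":
        return LIM_CHECK_SEC
    if view == "bxxx":
        return LIM_CHECK_SEC + MBATCHD_POLL_SEC
    if view == "remote":
        return LIM_CHECK_SEC + 2 * cache_interval_sec
    raise ValueError(f"unknown view: {view!r}")


if __name__ == "__main__":
    for view in ("lsxxx", "bxxx", "remote"):
        print(view, max_delay_seconds(view) // 60, "minutes")
```

With the default CACHE_INTERVAL of 60 seconds, the remote-cluster case works
out to 2 minutes + 2 * 60 seconds, or 4 minutes.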