Platform LSF Administration Guide Version 6.2
Chapter 9
Understanding Resources
Administering Platform LSF
Automatic Detection of Hardware Reconfiguration
Some UNIX operating systems support dynamic hardware reconfiguration, that is,
attaching or detaching system boards in a live system without rebooting the host.
Supported platforms
LSF can recognize changes in ncpus, maxmem, maxswp, and maxtmp on the following
platforms:
◆ Sun Solaris 2.5+
◆ HP-UX 10.10+
◆ Compaq Alpha 5.0+
◆ IBM AIX 4.0+
◆ SGI IRIX 6.2+
Dynamic changes in ncpus
LSF is able to automatically detect a change in the number of processors in systems that
support dynamic hardware reconfiguration.
The local LIM checks for a change in the number of processors every 2 minutes (an
internal interval). If it detects a change, the local LIM also checks maxmem, maxswp,
and maxtmp, and then sends the new information to the master LIM.
Dynamic changes in maxmem, maxswp, maxtmp
If you dynamically change maxmem, maxswp, or maxtmp without changing the number
of processors, you must restart the local LIM with the command lsadmin limrestart
so that it can recognize the changes.
If you dynamically change the number of processors together with any of maxmem,
maxswp, or maxtmp, LSF recognizes the changes automatically: when the local LIM
detects a change in the number of processors, it also checks maxmem, maxswp, and
maxtmp.
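The two cases above reduce to a single rule, sketched here in Python. The function name needs_lim_restart and its parameters are hypothetical and exist only for illustration; they are not part of LSF.

```python
def needs_lim_restart(ncpus_changed: bool, other_changed: bool) -> bool:
    """Return True if the local LIM must be restarted manually.

    ncpus_changed: the number of processors changed.
    other_changed: maxmem, maxswp, or maxtmp changed.
    (Hypothetical helper, not an LSF API.)
    """
    # A processor-count change makes the local LIM recheck maxmem, maxswp,
    # and maxtmp as well, so a manual "lsadmin limrestart" is needed only
    # when one of those values changed and ncpus did not.
    return other_changed and not ncpus_changed
```

For example, needs_lim_restart(False, True) returns True (memory changed alone), while needs_lim_restart(True, True) returns False.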
Viewing dynamic hardware changes
lsxxx Commands
There may be a delay of up to 2 minutes before the changes are recognized by lsxxx
commands (for example, before lshosts displays the changes).
bxxx Commands
There may be a delay of at most 2 + 10 = 12 minutes before the changes are recognized
by bxxx commands (for example, before bhosts -l displays the changes).
This is because mbatchd contacts the master LIM at an internal interval of 10 minutes.
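The worst-case delays for the two command families can be worked out as follows. This is a minimal sketch; the constant and function names are hypothetical, and only the 2-minute and 10-minute intervals come from the text above.

```python
LIM_CHECK_MINUTES = 2      # local LIM hardware check interval (from the text)
MBATCHD_POLL_MINUTES = 10  # mbatchd-to-master-LIM interval (from the text)


def max_delay_minutes(command_family: str) -> int:
    """Worst-case delay before a command family shows a hardware change."""
    if command_family == "lsxxx":
        # lsxxx commands see the change once the LIM has detected it.
        return LIM_CHECK_MINUTES
    if command_family == "bxxx":
        # bxxx commands must additionally wait for mbatchd to contact
        # the master LIM.
        return LIM_CHECK_MINUTES + MBATCHD_POLL_MINUTES
    raise ValueError(f"unknown command family: {command_family!r}")
```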
Platform MultiCluster
Configuration changes in a local cluster are communicated from the master LIM to
the remote cluster at an interval of 2 * CACHE_INTERVAL. The CACHE_INTERVAL
parameter is configured in lsf.cluster.cluster_name and defaults to 60 seconds.
This means that changes can take up to 2 minutes + 2 * CACHE_INTERVAL to be
recognized in a remote cluster (4 minutes with the default setting).
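As a worked example of the formula above, the following sketch computes the maximum delay for a remote cluster. The function name is hypothetical; the 2-minute interval and the 60-second default for CACHE_INTERVAL are taken from the text.

```python
def max_remote_delay_seconds(cache_interval_seconds: int = 60) -> int:
    """Maximum delay before a remote cluster recognizes a hardware change.

    Implements: 2 minutes + 2 * CACHE_INTERVAL, with all values in seconds.
    """
    lim_check_seconds = 2 * 60  # local LIM check interval of 2 minutes
    return lim_check_seconds + 2 * cache_interval_seconds
```

With the default CACHE_INTERVAL of 60 seconds the result is 240 seconds; raising CACHE_INTERVAL to 300 seconds gives 720 seconds.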