LSF Version 7.3 - Administering Platform LSF
Administrative Basics for PMC and CLI
Master host failover
During master host failover, the system is unavailable for a few minutes while hosts
are waiting to be contacted by the new master.
The master candidate list defines which hosts are master candidates. By default, the
list includes just one host, the master host, and there is no failover. If you configure
additional candidates to enable failover, the master host is first in the list. If the
master host becomes unavailable, the next host becomes the master. If that host is
also unavailable, the next host in the list is considered, and so on down
the list. A short list with two or three hosts is sufficient for practical purposes.
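The list-order rule described above can be sketched in a few lines of Python. This is only an illustration of the selection order, not how LSF implements failover: the function name select_master, the is_available callback, and the host names are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def select_master(candidates: Sequence[str],
                  is_available: Callable[[str], bool]) -> Optional[str]:
    """Return the first available host in list order, or None if none is up.

    Hypothetical helper: models only the ordering rule, not LSF itself.
    """
    for host in candidates:
        if is_available(host):
            return host
    return None


# Hypothetical three-host candidate list; hostA is the configured master.
candidates = ["hostA", "hostB", "hostC"]
down = {"hostA", "hostB"}  # assume the first two hosts are unavailable

new_master = select_master(candidates, lambda h: h not in down)
print(new_master)  # hostC
```

With the default single-host list, the same function returns None as soon as the master is down, which corresponds to the "no failover" case.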
For failover to work properly, the master candidates must share a file system and the
shared directory must always be available.
IMPORTANT: The shared directory should not reside on a master host or any of the master
candidates. If the shared directory resides on the master host and the master host fails, the next
candidate cannot access the necessary files.
Management host
Management hosts belong to the ManagementHosts resource group. These hosts
are not expected to execute workload units for users. Management hosts are
expected to run services such as the web server and web services gateway. The
master host and all master candidates must be management hosts.
A slot is the basic unit of resource allocation, analogous to a "virtual CPU".
Management hosts share configuration files, so a shared file system is needed
among all management hosts.
A management host is configured when you run egoconfig mghost on the host. The
tag mg is assigned to the management host, in order to differentiate it from a
compute host.
Compute host
Compute hosts are distributed to cluster consumers to execute workload units. By
default, compute hosts belong to the ComputeHosts resource group.
The ComputeHosts group excludes hosts with the mg tag, which is assigned to
management hosts when you run egoconfig mghost. If you create your own
resource groups to replace ComputeHosts, make sure they also exclude hosts with
the mg tag.
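As a rough illustration of the exclusion rule, the following Python sketch filters a host inventory by tag. It assumes a hypothetical mapping of host names to tag sets and does not reflect the actual resource group definition syntax used by EGO; the function name compute_hosts and all host names are made up for the example.

```python
from typing import Dict, List, Set


def compute_hosts(host_tags: Dict[str, Set[str]]) -> List[str]:
    """Return the hosts that do not carry the mg tag, sorted by name.

    Hypothetical helper: models the exclusion rule only.
    """
    return sorted(h for h, tags in host_tags.items() if "mg" not in tags)


# Hypothetical inventory: two management hosts and two compute hosts.
hosts = {
    "hostA": {"mg"},
    "hostB": {"mg"},
    "node01": set(),
    "node02": {"linux"},
}

print(compute_hosts(hosts))  # ['node01', 'node02']
```

A custom resource group that replaces ComputeHosts would need an equivalent condition so that management hosts are never handed out for workload.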
By default, the number of slots on a compute host is equal to the number of CPUs.
Web server host or PMC host
The web server host is the host that runs the Platform Management Console (PMC);
when you configure it, you may see it called the PMC host. Only one host at a time
acts as the web server host. If EGO controls the PMC, the web server host does not
need to be a dedicated host; by default, any management host in the cluster can be
the web server (the host is chosen when the cluster starts up, and the role fails
over if the original host fails). However, if EGO does not control the PMC, you
must configure the PMC host manually. If you specify the PMC host, there can be
no failover of the PMC.