LSF Version 7.3 - Administering Platform LSF

Managing LSF on Platform EGO
Responding to service message Error
Normally, Platform EGO attempts to start a service multiple times, up to the
maximum threshold set in the service profile XML file (containing the service
definition). If the service cannot start, you will receive a service error message.
1 Try stopping and then restarting the service.
2 Review the appropriate service instance log file to discover the cause of the
error.
Platform EGO service log files include those for the service director
(ServiceDirector), web service gateway (WebServiceGateway), and the
Platform Management Console (WEBGUI). If you have defined your own
non-EGO services, you may have other log files you will need to review,
depending on the service which is triggering the error.
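Scanning a service instance log for likely causes can be partly automated. The following is a minimal Python sketch, not a Platform EGO tool. The severity markers (ERROR, FATAL), the log line layout, and the sample entries are assumptions made for illustration; adjust them to match what your log files actually contain.

```python
from typing import Iterable, List, Tuple

# Assumed severity markers; real log files may use different wording.
ERROR_MARKERS = ("ERROR", "FATAL")


def find_error_lines(
    lines: Iterable[str],
    markers: Tuple[str, ...] = ERROR_MARKERS,
) -> List[Tuple[int, str]]:
    """Return (line_number, text) pairs for lines containing any marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in markers):
            hits.append((number, line.rstrip("\n")))
    return hits


# Hypothetical log excerpt, for illustration only.
sample_log = [
    "2008-01-15 10:02:11 INFO service starting\n",
    "2008-01-15 10:02:12 ERROR cannot bind to port\n",
    "2008-01-15 10:02:13 INFO retrying\n",
    "2008-01-15 10:02:14 FATAL giving up\n",
]

for number, text in find_error_lines(sample_log):
    print(f"{number}: {text}")
```

To scan a real file, pass an open file object instead of the sample list, for example `find_error_lines(open(path))`, where `path` is the location of the log file for the service that is triggering the error.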
Responding to service message Allocating
Allocating is a transitional service state before the service starts running. If
your service remains in this state for some time without transitioning to
Started, or cycles between Defining and Allocating, you will want to discover
the cause of the delay.
1 If you are the cluster administrator, review the allocation policy.
a Open the service profile XML file (containing the service definition).
b Find the consumer for which the service is expected to run.
c Ensure that a proper resource plan is set for that consumer.
During a service's "allocation" period, Platform EGO attempts to find an
appropriate resource on which to run the service. If it cannot find the required
resource, the service will not start.
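The two symptoms described in this section (a service that stays in Allocating for too long, or one that cycles between Defining and Allocating) can be detected from a record of observed service states. The following Python sketch is illustrative only. The function name, the history format, and the thresholds (`max_wait`, `max_cycles`) are assumptions; Platform EGO does not supply this function, and you would need to collect the state observations yourself.

```python
from typing import Sequence, Tuple


def diagnose_allocation(
    history: Sequence[Tuple[float, str]],
    now: float,
    max_wait: float = 300.0,   # assumed: seconds tolerated in Allocating
    max_cycles: int = 3,       # assumed: Allocating -> Defining drops tolerated
) -> str:
    """Classify a chronological list of (timestamp_seconds, state) observations.

    Returns "no-data", "cycling", "stuck", or "ok".
    """
    if not history:
        return "no-data"

    states = [state for _, state in history]

    # Count how often the service fell back from Allocating to Defining.
    cycles = sum(
        1
        for prev, cur in zip(states, states[1:])
        if prev == "Allocating" and cur == "Defining"
    )
    if cycles >= max_cycles:
        return "cycling"

    if states[-1] == "Allocating":
        # Walk backwards to find when the current Allocating run began.
        start = history[-1][0]
        for timestamp, state in reversed(history):
            if state != "Allocating":
                break
            start = timestamp
        if now - start > max_wait:
            return "stuck"

    return "ok"


# Hypothetical observations: the service entered Allocating at t=10.
observed = [(0.0, "Defining"), (10.0, "Allocating"), (20.0, "Allocating")]
print(diagnose_allocation(observed, now=400.0))
```

A result of "stuck" or "cycling" suggests reviewing the allocation policy and the resource plan for the consumer, as described in the steps above.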
Manage hosts
Important host roles
Hosts in the cluster may be described as the master host, master candidates,
management hosts, compute hosts, or the web server host.
Master host
A cluster requires a master host. This is the first host installed. The master host
controls the rest of the hosts in the grid.
Master candidates
There is only one master host at a time. However, if the master host ever fails,
another host automatically takes over the master host role, allowing work to
continue. This process is called failover. When the master host recovers, the role
switches back again.
Hosts that can act as the master are called master candidates. This includes the
original master host and all hosts that can take over the role in a failover scenario.
All master candidates must be management hosts.
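The failover behavior described above can be modeled in a few lines. The following Python sketch is a simplified illustration, not how Platform EGO implements failover. It assumes the master candidates are kept in a fixed order with the original master host first, and that the role always goes to the first candidate that is currently available; the host names are hypothetical.

```python
from typing import Optional, Sequence, Set


def current_master(candidates: Sequence[str], available: Set[str]) -> Optional[str]:
    """Return the first available host in the ordered candidate list."""
    for host in candidates:
        if host in available:
            return host
    return None  # no master candidate is available


# Hypothetical candidate list; the original master host comes first.
candidates = ["hostA", "hostB", "hostC"]

# Normal operation: the original master host holds the role.
print(current_master(candidates, {"hostA", "hostB", "hostC"}))

# hostA fails: the next candidate takes over (failover).
print(current_master(candidates, {"hostB", "hostC"}))

# hostA recovers: the role switches back again.
print(current_master(candidates, {"hostA", "hostB", "hostC"}))
```

Because the selection always prefers the earliest available candidate, the model reproduces both the takeover when the master host fails and the switch back when it recovers.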