LSF Version 7.3 - Administering Platform LSF

Chapter 42
Achieving Performance and Scalability
Contents
Optimizing Performance in Large Sites
Tuning UNIX for Large Clusters
Tuning LSF for Large Clusters
Monitoring Performance Metrics in Real Time
Optimizing Performance in Large Sites
As your site grows, you must tune your LSF cluster to support a large number of
hosts and an increased workload.
This chapter discusses how to efficiently tune querying, scheduling, and event
logging in a large cluster that scales to 5000 hosts and 100,000 jobs at any one time.
To optimize performance for a cluster with 5000 hosts and 100,000 jobs, you must:
Configure your operating system. See Tuning UNIX for Large Clusters.
Fine-tune LSF. See Tuning LSF for Large Clusters.
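As a quick pre-check before tuning the operating system, you can inspect the per-process open-file limit on the master host. The following Python sketch is illustrative only. It assumes a rough rule of thumb in which the file descriptor limit should be at least as high as the number of hosts in the cluster. That threshold and the `fd_limit_ok` helper are hypothetical and are not LSF commands or LSF-defined formulas. The sketch uses only the standard `resource` module, which is available on UNIX.

```python
import resource

# Target cluster size discussed in this chapter.
TARGET_HOSTS = 5000


def fd_limit_ok(required: int = TARGET_HOSTS) -> tuple[bool, int]:
    """Hypothetical helper: compare this process's soft open-file limit
    against `required`.

    Assumption: one file descriptor per host is used here as a rough
    lower bound. This is a rule of thumb, not an LSF-defined value.
    Returns (ok, soft_limit).
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unlimited soft limit always satisfies the requirement.
    if soft == resource.RLIM_INFINITY:
        return True, soft
    return soft >= required, soft


if __name__ == "__main__":
    ok, soft = fd_limit_ok()
    status = "sufficient" if ok else "too low"
    print(f"soft open-file limit = {soft} ({status} for {TARGET_HOSTS} hosts)")
```

Run this check as the user that starts the LSF daemons, because limits can differ between users and login sessions. Follow your operating system documentation to raise the limit if it is too low.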
What's new in LSF performance?
LSF provides parameters for tuning your cluster, which you will learn about in this
chapter. However, before you calculate the values to use for tuning your cluster,
consider the following enhancements to the general performance of LSF daemons,
job dispatching, and event replaying:
Both scheduling and querying are much faster.
Switching and replaying the events log file, lsb.events, is much faster. The length of the events file no longer impacts performance.
Restarting and reconfiguring your cluster is much faster.
Job submission time is constant: it does not depend on how many jobs are in the system.
The scalability of load updates from the slaves to the master has increased.