ClusterPack Index of Tutorial Sections
Index | Administrators Guide | Users Guide | Tool Overview | Related Documents | Dictionary
Administrators Guide
1.0 ClusterPack Install QuickStart
1.1 ClusterPack General Overview
1.2 Comprehensive Install Instructions
1.3 Installation and Configuration of Optional Components
1.4 Software Upgrades and Reinstalls
1.5 Golden Image Tasks
1.6 System Maintenance Tasks
1.7 System Monitoring Tasks
1.8 Workload Management Tasks
1.9 System Troubleshooting Tasks
Users Guide
2.1 Job Management Tasks
2.2 File Transfer Tasks
2.3 Miscellaneous Tasks
Tool Overview
3.1 Cluster Management Utility Zone Overview
3.2 Service ControlManager (SCM) Overview
3.3 System Inventory Manager Overview
3.4 Application ReStart (AppRS) Overview
3.5 Cluster Management Utility (CMU) Overview
Dictionary of Cluster Terms
ClusterPack Install QuickStart Index | Administrators Guide | Users Guide | Tool Overview | Related Documents | Dictionary 1.0 ClusterPack Install QuickStart
Step Q1 Fill Out the ClusterPack Installation Worksheet Print out this form and fill out all information for each node in your cluster. Installation Worksheet (pdf) Note: You will not be able to complete the following steps if you have not collected all of this information. For more information, see the Comprehensive Instructions for this step.
The Compute Nodes must have Management Processor (MP) cards. ClusterPack depends on certain open source software which is normally installed as a part of the operating environment. The minimum release versions required are:
- MySQL Version 3.23.58 or higher
- Perl Version 5.8 or higher
For more information, see the Comprehensive Instructions for this step.
References:
- Step 2 Install Prerequisites
Back to Top
Step Q3 Allocate File System Space
Allocate file system space on the Management Server.
Note: It may take up to 24 hours to receive the license file. Plan accordingly. For more information, see the Comprehensive Instructions for this step. References: z Step 4 Obtain a License File Back to Top Step Q5 Prepare Hardware Access Get a serial console cable long enough to reach all the Compute Nodes from the Management Server. Note: If you are installing ClusterPack on Compute Nodes for the first time, DO NOT power up the systems, ClusterPack will do that for you automatically.
Step Q7 Configure the ProCurve Switch
- Select an IP address from the same IP subnet that will be used for the Compute Nodes.
- Connect a console to the switch.
- Log onto the switch through the console.
- Type 'set-up'.
- Select IP Config and select the "manual" option.
- Select the IP address field and enter the IP address to be used for the switch.
For more information, see the Comprehensive Instructions for this step.
References:
- Step 9 Install ClusterPack on the Management Server
Back to Top
Step Q10 Run manager_config on the Management Server
Provide the following information to the manager_config program:
- The path to the license file(s),
- The DNS domain and optional NIS domain for the cluster,
- The host name of the manager and the name of the cluster,
- The management LAN interface on the Management Server,
- The IP address(es) of the Compute Node(s),
- Whether to mount a home directory,
- Whether to configure ...
- Step 11 Run mp_register on the Management Server
Back to Top
Step Q12 Power up the Compute Nodes
Use the clbootnodes program to power up all Compute Nodes that have a connected Management Processor that you specified in the previous step. The clbootnodes program will provide the following information to the Compute Nodes:
- Language to use,
- Host name,
- Time and time zone settings,
- Network configuration,
- Root password.
For more information, see the Comprehensive Instructions for this step.
If diagnostic error messages are reported, repeat the installation process, performing all steps in the order specified. For more information, see the Comprehensive Instructions for this step.
ClusterPack General Overview Index | Administrators Guide | Users Guide | Tool Overview | Related Documents | Dictionary 1.1.1 ClusterPack Overview 1.1.2 Who should use the material in this tutorial? 1.1.3 What is the best order to review the material in the tutorial? 1.1.4 Operating System and Operating Environment Requirements 1.1.5 System Requirements
of Gigabit Ethernet or InfiniBand. The common components of a cluster are:
- Head Node - provides user access to the cluster. In smaller clusters, the Head Node may also serve as a Management Server.
- Management Server - server that provides a single point of management for all system components in the cluster.
- Management LAN/switch - usually an Ethernet network used to monitor and control all the major system components. May also handle traffic to the file server.
latency and higher bandwidth. A cluster LAN is also configured to separate the system management traffic from application message passing and file serving traffics. Management Software and Head Node The ability to manage and use a cluster as easily as a single compute system is critical to the success of any cluster solution. To facilitate ease of use for both system administrators and end-users, HP has created a software package called ClusterPack.
ClusterPack runs on HP-UX 11i Version 2.0. It has a server component that runs on a Management Server, and client agents that run on the managed Integrity compute servers.
The Data Dictionary contains definitions for common terms that are used through the tutorial. Back to Top 1.1.3 What is the best order to review the material in the tutorial? System Administrators Initial installation and configuration of the cluster requires a complete understanding of the steps involved and the information required. Before installing a new cluster, the system administrator should read and understand all of the steps involved before beginning the actual installation.
a link to the printable version at the bottom of the page.
References:
- Printable Version
Back to Top
1.1.4 Operating System and Operating Environment Requirements
The key components of the HP Integrity Server Technical Cluster are:
- Management Server: HP Integrity server with HP-UX 11i Version 2.0 TCOE
- Compute Nodes: HP Integrity servers with HP-UX 11i Version 2.0 TCOE
- Cluster Management Software: ClusterPack V2.4
The following prerequisites are assumed:
- HP-UX 11i V2.
Comprehensive Install Instructions Index | Administrators Guide | Users Guide | Tool Overview | Related Documents | Dictionary
- ... Management Processor.
- Verify the Management Server and the initial Compute Node.
- Configure the remaining Compute Nodes with a Golden Image:
  - Create a Golden Image.
  - Add nodes to the configuration that will receive the Golden Image.
  - Distribute the Golden Image to remaining nodes.
  - Install and configure the Compute Nodes that received the Golden Image.
  - Verify the final cluster configuration.
These processes are further broken down into a number of discrete steps.
Note: You will not be able to complete the following steps if you have not collected all of this information.
Details
At various points during the configuration you will be queried for the following information:
- DNS Domain name [ex. domain.com]
- NIS Domain name [ex. hpcluster]
- Network Connectivity:
  - Information on which network cards in each Compute Node connect to the Management Server
  - Information on which network card in the Management Server connects to the Compute Nodes.
- HP-UX 11i Ignite-UX
- HP-UX 11i V2.0 TCOE
ClusterPack depends on certain open source software which is normally installed as a part of the operating environment. The minimum release versions required are:
- MySQL Version 3.23.58 or higher
- Perl Version 5.8 or higher
The Management Server requires a minimum of two LAN connections. One connection must be configured prior to installing ClusterPack. The Compute Nodes must have Management Processor (MP) cards.
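A quick way to confirm the open source prerequisites on the Management Server is sketched below. This is only an informal check; the bundle names reported by swlist and the exact client paths can differ from installation to installation.
% /usr/sbin/swlist | grep -i perl
% /usr/sbin/swlist | grep -i mysql
% perl -v
% mysql --version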
- /var - 4GB
- /share - 500MB (Clusterware edition only)
Details
Allocate space for these file systems when you do a fresh install of HP-UX on the Management Server.
To resize /opt
1. Go to single user mode.
% # /usr/sbin/shutdown -r now
2. Interrupt auto boot.
3. Select the EFI shell.
4. Select the appropriate file system. (Should be fs0: but may be fs1:)
% Shell> fs0:
5. Boot HP-UX.
% fs0:\>hpux
6. Interrupt auto boot.
7. Boot to single user mode.
% HPUX> boot vmunix -is
8.
Step 4 Obtain a License File Background For ClusterPack Base Edition, please refer to the Base Edition License certificate for instructions on redeeming your license. For ClusterPack Clusterware Edition, you will need to redeem BOTH the Base Edition license certificate AND the Clusterware Edition license certificate. You will need TWO license files in order to run manager_config.
Background This document does not cover hardware details. It is necessary, however, to make certain hardware preparations in order to run the software. Overview Get a serial console cable long enough to reach all the Compute Nodes from the Management Server. Details To allow the Management Server to aid in configuring the Management Processors, it is necessary to have a serial console cable to connect the serial port on the Management Server to the console port on the Management Processor to be configured.
% /opt/clusterpack/bin/manager_config
Back to Top
Step 7 Configure the ProCurve Switch
Background
The ProCurve Switch is used for the management network of the cluster.
Overview
The IP address for the ProCurve Switch should be selected from the same IP subnet that will be used for the Compute Nodes.
Details
- Select an IP address from the same IP subnet that will be used for the Compute Nodes.
% > lcd /tmp
% > get cpack.lic
% > bye
Back to Top
Step 9 Install ClusterPack on the Management Server
Background
The ClusterPack software is delivered on a DVD.
Overview
- Mount and register the ClusterPack DVD as a software depot.
- Install the ClusterPack Manager software (CPACK-MGR) using swinstall.
- Leave the DVD in the DVD drive for the next step.
Details
How to mount a DVD on a remote system to a local directory
On the system with the DVD drive (i.e. the remote system):
1. Mount the DVD.
Note: You cannot be in the /mnt/dvdrom directory when you try to mount. You will get a file busy error.
When you are finished, on the local machine:
6. Unmount the DVD file system.
% /etc/umount /mnt/dvdrom
On the remote system:
7. Unexport the DVD file system.
% exportfs -u -i /mnt/dvdrom
8. Unmount the DVD.
% /etc/umount /mnt/dvdrom
How to enable a DVD as a software depot
During the installation process, two DVDs will be required.
- Using the ClusterPack DVD, mount and register the DVD as a software depot.
- Install the ClusterPack Manager software (CPACK-MGR) on the Management Server using swinstall. On the Management Server:
% /usr/sbin/swinstall -s <remote_system>:/mnt/dvdrom CPACK-MGR
- The ClusterPack DVD will be referenced again in the installation process. Please leave it in the DVD drive until the "Invoke /opt/clusterpack/bin/manager_config on Management Server" step has completed. A consolidated example of the remote DVD depot steps is shown below.
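The following is a minimal end-to-end sketch of making a ClusterPack DVD in a remote drive available as a depot and installing CPACK-MGR from it. The device file, mount point, and <remote_system> host name are assumptions; substitute the values for your site, and use the depot registration procedure from the "How to enable a DVD as a software depot" section if it differs.
On the system with the DVD drive:
% /usr/sbin/mount -o ro /dev/dsk/c1t2d0 /mnt/dvdrom
% /usr/sbin/swreg -l depot /mnt/dvdrom
On the Management Server:
% /usr/sbin/swinstall -s <remote_system>:/mnt/dvdrom CPACK-MGR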
- Enable auto-startup of Cluster Management Software components after reboots.
- Configure Cluster Management Software tools. The Management Server components of HP System Management Tools (HP Systems Insight Manager) are also configured if selected.
- Print a PASS diagnostic message if all of the configuration steps are successful.
manager_config Invocation manager_config is an interactive tool that configures the Management Server based on some simple queries (most of the queries have default values assigned, and you just need to press RETURN to accept those default values).
When you telnet to an MP, you will initially access the console of the associated server. Other options such as remote console access, power management, remote reboot operations, and temperature monitoring are available by typing Control-B from the console mode. It is also possible to access the MP as a web console. However, before it is possible to access the MP remotely, it is first necessary to assign an IP address to each MP.
console port on the MP card of each Compute Node. When you are ready to run mp_register, use this command: % /opt/clusterpack/bin/mp_register Back to Top Step 12 Power up the Compute Nodes Background The clbootnodes utility is intended to ease the task of booting Compute Nodes for the first time. To use clbootnodes, the nodes' MP cards must have been registered and/or configured with mp_register.
When booting a node, clbootnodes answers the first boot questions so that you do not have to answer them manually. The questions are answered using the following information:
- Language selection: All language selection options are set to English.
- Keyboard selection: The keyboard selection is US English.
- Timezone: The time zone information is determined based on the setting of the Management Server.
- Time: The current time is accepted.
Background
This tool is the driver that installs and configures appropriate components on every Compute Node. It:
- Registers Compute Nodes with HP Systems Insight Manager or SCM on the Management Server.
- Pushes agent components to all Compute Nodes.
- Sets up each Compute Node as an NTP client, NIS client, and NFS client.
- Starts necessary agents in each of the Compute Nodes.
- Modifies configuration files on all Compute Nodes to enable auto-startup of agents after reboots.
Execute the following command.
% /opt/clusterpack/bin/compute_config
Back to Top
Step 14 Set up HyperFabric (optional)
Background
The utility clnetworks assists in setting up a HyperFabric network within a cluster. For clnetworks to recognize the HyperFabric (clic) interface, it is necessary to first install the drivers and/or kernel patches that are needed. Once the clic interface is recognized by lanscan, clnetworks can be used to set (or change) the IP address and configure the card.
ClusterPack can configure IP over InfiniBand (IPoIB) if the appropriate InfiniBand drivers are installed on the systems. Overview If the InfiniBand IPoIB drivers are installed prior to running compute_config, the InfiniBand HCAs are detected and the administrator is given a chance to configure them. The administrator can also configure the InfiniBand HCAs with IP addresses by invoking /opt/clusterpack/bin/clnetworks. See the man pages for clnetworks for usage instructions.
The finalize_config tool can be run at any time to validate the cluster configuration and to determine if there are any errors in the ClusterPack software suite. Overview This program verifies the Cluster Management Software, and validates the installation of the single Compute Node. If it reports diagnostic error messages, repeat the installation process up to this point, performing all steps in the order specified. Details Finalize and validate the installation and configuration of the ClusterPack software.
LSF jobs while the archive is being made: % badmin hclose z In addition, you should either wait until all running jobs complete, or suspend them: % bstop -a -u all -m z Execute sysimage_create on the Management Server and pass the name of the file from which you would like the image to be made. For example: % /opt/clusterpack/bin/sysimage_create z Monitor the output for possible error conditions.
Overview z z Register the image. Distribute the image to selected nodes. Details To distribute a Golden Image to a set of Compute Nodes, you need to first register the image.
"-a" in front of each node name. % /opt/clusterpack/bin/compute_config -a Back to Top Step 21 Verify the final cluster configuration Background This step completes the installation and configuration process, performs verification checks on the Cluste Management Software, and validates the installation. It prints out diagnostic error messages if the installation is not successful.
Installation and Configuration of Optional Components Index | Administrators Guide | Users Guide | Tool Overview | Related Documents | Dictionary 1.3.1 HP-UX IPFilter 1.3.2 External /home File Server 1.3.3 Adding Head Nodes to a ClusterPack cluster 1.3.4 Set up TCP-CONTROL
Nodes in a private IP sub-net (10.x.y.z range, 192.168.p.q range), which also alleviates the need for numerous public IP addresses. IP Aliasing or Network Address Translation (NAT) ClusterPack comes with HP-UX IPFilter, a software component with powerful packet filtering and firewalling capabilities. One of the features that it supports is Network Address Translation. For more information on HP-UX IPFilter, please refer to the HP-UX IPFilter manual and release notes at docs.hp.com: http://docs.hp.com
HP-UX IPFilter Validation HP-UX IPFilter is installed with the default HP-UX 11i V2 TCOE bundle. To validate its installation, run the following command: % swverify B9901AA Automatic setup of HP-UX IPFilter rules ClusterPack V2.4 provides a utility called nat.server to automatically set up the NAT rules, based on the cluster configuration. This tool can be invoked as follows: % /opt/clusterpack/lbin/nat.server
% man 8 ipf
- List the input and output filter rules
% ipfstat -hio
Setup the NAT rules
In this section, we will walk through the steps of setting up HP-UX IPFilter rules that translate the source IP addresses of all packets from the compute private subnet to the IP address of the gateway node. For adding more sophisticated NAT rules, please refer to the IPFilter documentation.
1. Create a file with NAT rules.
Example 1: Map packets from all Compute Nodes in the 192.168.0.x subnet to a single IP address 15.99.84.
map lan0 192.168.0.4/32 -> 15.99.84.23/32 portmap tcp/udp 40000:60000 map lan0 192.168.0.4/32 -> 15.99.84.23/32 EOF More examples of NAT and other IPFilter rules are available at /opt/ipf/examples. 2. Enable NAT based on this rule set % ipnat -f /tmp/nat.rules Note: If there are existing NAT rules that you want to replace, you must flush and delete that rule set before loading the new rules: % ipnat -FC -f /tmp/nat.rules For more complicated manipulations of the rules, refer to ipnat man pages.
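To double-check that the rule set is active, the loaded NAT mappings can be listed with ipnat. This is only a quick verification step; the -l option prints the current NAT rule list and active sessions.
% ipnat -l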
If there is no packet loss, then NAT is enabled. z DISPLAY Server Interaction Test 1. On the Compute Node, set the DISPLAY variable to a display server that is not part of the cluster, for instance your local desktop. % setenv DISPLAY 15.99.22.42:0.0 (if it is csh) 2. Try to bring up an xterm on the DISPLAY server: % xterm & If the xterm is brought up in the DISPLAY server, then NAT is enabled. References: z 3.6.1 Introduction to NAT (Network Address Translation) Back to Top 1.3.
The default use model of a ClusterPack cluster is that end users will submit jobs remotely through the ClusterWare GUI or by using the ClusterWare CLI from the Management Node. Cluster administrators generally discourage users from logging into the Compute Nodes directly. Users are encouraged to use the Management Server for accessing files and performing routine tasks.
ALL:ALL@<management_server>
By uncommenting these lines, all users from the Management Server will be denied access. There is also a /etc/hosts.allow file that explicitly permits access to some users. It is configured, by default, to allow access to root and lsfadmin:
ALL:root@ALL
ALL:lsfadmin@ALL
Although the hosts.deny file disallows all access, the entries in hosts.allow override the settings of hosts.deny.
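To grant an additional user access under this scheme, an entry in the same format can be added to the /etc/hosts.allow file on the Compute Nodes. The user name below is only an illustration:
ALL:jdoe@ALL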
Software Upgrades and Reinstalls ClusterPack Software Upgrades and Reinstalls Index | Administrators Guide | Users Guide | Tool Overview | Related Documents | Dictionary 1.4.1 Software Upgrades and Reinstalls Overview 1.4.2 Prerequisites for Software Upgrades and Reinstalls 1.4.3 Reinstallation and Configuration Steps 1.4.
nature can only be accomplished by a complete re-configuration of the cluster (See Initial Installation and Setup). The reinstallation path is only meant to ensure that all of the ClusterPack software is correctly installed and the cluster layout described by earlier invocations of manager_config is configured correctly. References: z 1.2.1 Comprehensive Installation Overview ClusterPack V2.4 supports an upgrade path from ClusterPack V2.3 and ClusterPack V2.2 Back to Top 1.4.
1.4.4 Upgrading from Base Edition to Clusterware Edition Upgrading from Base Edition to Clusterware Edition is done using the "forced reinstall" path that is documented below. During manager_config you will be given an opportunity to provide a valid Clusterware License key. If you have a key, Clusterware will be installed and integrated into the remaining ClusterPack tools. Please obtain your Clusterware license key BEFORE reinstalling the ClusterPack software.
This tool is the main installation and configuration driver. Invoke this tool with "force install" option -F: % /opt/clusterpack/bin/manager_config -F Note: manager_config will ask for the same software depot that was used the last time the cluster was installed. If you are using the ClusterPack V2.
1.4.5 Upgrading from V2.2 to V2.4 ClusterPack V2.4 supports an upgrade path from ClusterPack V2.2. Customers that currently deploy ClusterPack V2.2 on HP Integrity servers use HP-UX 11i Version 2.0 TCOE. ClusterPack V2.4 provides a mechanism for the use of the majority of V2.2 configuration settings for the V2.4 configuration. Before starting the upgrade, it is important to have all of your Compute Nodes in good working order. All Compute Nodes and MP cards should be accessible.
% /opt/clusterpack/bin/compute_config -u z Verify that everything is working as expected. % /opt/clusterpack/bin/finalize_config Back to Top 1.4.6 Upgrading from V2.3 to V2.4 ClusterPack V2.4 supports an upgrade path from ClusterPack V2.3. Customers that currently deploy ClusterPack V2.3 on HP Integrity servers use HP-UX 11i Version 2.0 TCOE. ClusterPack V2.4 provides a mechanism for the use of the majority of V2.3 configuration settings for the V2.4 configuration.
z Verify that everything is working as expected.
Golden Image Tasks ClusterPack Golden Image Tasks Index | Administrators Guide | Users Guide | Tool Overview | Related Documents | Dictionary 1.5.1 Create a Golden Image of a Compute Node from the Management Server 1.5.2 Distribute Golden Image to a set of Compute Nodes 1.5.3 Managing system files on the compute nodes 1.5.4 Adding software bundles to Golden Images 1.5.1 Create a Golden Image of a Compute Node from the Management Server A system image is an archive of a computer's file system.
- Ensure that the system is not being used. It is advisable that the system stop accepting new LSF jobs while the archive is being made:
% badmin hclose
- In addition, you should either wait until all running jobs complete, or suspend them:
% bstop -a -u all -m <host_name>
- Execute sysimage_create on the Management Server and pass the name of the node from which you would like the image to be made.
1.5.2 Distribute Golden Image to a set of Compute Nodes To distribute a golden image to a set of Compute Nodes, you need to first register the image. To register the image, use the command: % /opt/clusterpack/bin/sysimage_register If the image was created with sysimage_create, the full path of the image was displayed by sysimage_create.
clsysfile creates an SD bundle CPACK-FILES. This bundle of files can be used to customize the files on the compute nodes. The revision number of the bundle is automatically incremented each time clsysfile is run. On the management server, clsysfile uses the working directory: /var/opt/clusterpack/sysfiles clsysfile builds the SD control files required to create an SD bundle of files. Three control files are created by clsysfile: SysFile.psf, SysFile.configure, and Sysfile.unconfigure.
To install a CPACK-FILES bundle on an individual compute node, or group of compute nodes, the clsh utility can be used:
% /opt/clusterpack/bin/clsh -C <node_group> "/usr/sbin/swinstall -s <management_server>:/var/opt/clusterpack/depot CPACK-FILES"
References:
- 3.1.5 clsh - Runs commands on one, some, or all nodes in the cluster.
- 1.5.4 Adding software bundles to Golden Images
Back to Top
1.5.4 Adding software bundles to Golden Images
System Maintenance Tasks ClusterPack System Maintenance Tasks Index | Administrators Guide | Users Guide | Tool Overview | Related Documents | Dictionary 1.6.1 Add Node(s) to the Cluster 1.6.2 Remove Node(s) from the Cluster 1.6.3 Install Software in Compute Nodes 1.6.4 Remove Software from Compute Nodes 1.6.5 Update Software in Compute Nodes 1.6.6 Add Users to Compute Nodes 1.6.7 Remove Users from Compute Nodes 1.6.8 Change System Parameters in Compute Nodes 1.6.
The steps in this section have to be followed in the specified order to ensure that everything works correctly. Step 1 Invoke /opt/clusterpack/bin/manager_config on Management Server Invoke /opt/clusterpack/bin/manager_config with the "add node" option -a.
% /opt/clusterpack/bin/manager_config -a <hostname>:<IP address>
This command adds the new node with the specified hostname and IP address to the cluster.
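For example, to add a node called node9 with address 192.168.0.9 (both values are illustrative; use the entries from your installation worksheet):
% /opt/clusterpack/bin/manager_config -a node9:192.168.0.9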
In the latter case, the utility will prompt you (for each node in the cluster) whether to boot it or skip it. To boot a compute node with a system image, use the "-i" option to clbootnodes and specify the image, as in the example below. The image must have been created by sysimage_create and registered with sysimage_register.
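A minimal sketch of such an invocation follows. The image path is an assumption for illustration only; use the path reported by sysimage_create when the image was made.
% /opt/clusterpack/bin/clbootnodes -i /var/opt/clusterpack/images/node1_golden.gz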
z z z Installation and configuration of the Management Server Installation and configuration of the Compute Nodes Verification of the Management Server and Compute Nodes The steps in this section must be followed in the specified order to ensure that everything works correctly. Step 1 Invoke /opt/clusterpack/bin/manager_config on Management Server Invoke /opt/clusterpack/bin/manager_config with a "remove node" option -r.
Using CLI Software can also be installed on Compute Nodes using the /opt/clusterpack/bin/clsh tool to run the swinstall command. However, this may not work in a guarded cluster.
Using the CLI
Software can also be removed from Compute Nodes using the /opt/clusterpack/bin/clsh tool to run the swremove command:
- To remove product PROD1 on all Compute Nodes
% /opt/clusterpack/bin/clsh /usr/sbin/swremove PROD1
- To remove product PROD1 on just the Compute Node group "cae"
% /opt/clusterpack/bin/clsh -C cae /usr/sbin/swremove PROD1
Using the HPSIM GUI
To remove software from Compute Nodes using HPSIM GUI, do the following:
- Select "Deploy", "Software Distributor", and then click
The process for updating software is the same as for installing software. (See "Install Software in Compute Nodes"). swinstall will verify that the software you are installing is a newer version than what is already present. For patches, and software in non-depot format, it will be necessary to follow the specific directions given with the patch/update. References: z 1.6.3 Install Software in Compute Nodes Back to Top 1.6.
account parameters to use in creating the account. If NIS is configured in the cluster, all user accounts are administered from the Management Server. Any changes to a user's account will be pushed to all the Compute Nodes using NIS. References: z 3.2.3 How to Run SCM Web-based GUI Back to Top 1.6.7 Remove Users from Compute Nodes Using the CLI User accounts should be removed from the Management Server as normal with userdel (man userdel(1M) for more information).
account to remove. All user accounts are administered from the Management Server. Any changes to a users account will be pushed to all the Compute Nodes using NIS. References: z 3.2.3 How to Run SCM Web-based GUI Back to Top 1.6.8 Change System Parameters in Compute Nodes Using the HPSIM GUI To change System Parameters in Compute Nodes using HPSIM GUI, do the following: z z z Select "Configure", "HP-UX Configuration", and then double-click on "Kernel Configuration - kcweb".
Back to Top 1.6.9 Define Compute Node Inventory Data Collection for Consistency checks Scheduling Data Collection tasks are done using the HP System Management Tools: Using the HPSIM GUI To create a Data Collection task using HPSIM GUI, do the following: z z z z Select "Options", then click on "Data Collection". The Data Collection page appears. Select the node(s) and/or node group to install on. Specify how to save data after data collection.
z 3.2.3 How to Run SCM Web-based GUI Back to Top 1.6.10 Define Consistency Check Timetables on Compute Node Inventories Scheduling Data Collection tasks are done using the HP System Management Tools: Using the HPSIM GUI To create a Data Collection task using HPSIM GUI, do the following: z z z z Select "Options", then click on "Data Collection". The Data Collection page appears. Select the node(s) and/or node group to install on. Specify how to save data after data collection.
z 3.2.3 How to Run SCM Web-based GUI Back to Top 1.6.11 Compare the Inventories of a Set of Nodes Comparing the results of Data Collection tasks is done using the HP System Management Tools: Using the HPSIM GUI To create a Data Collection task using HPSIM GUI, do the following: z z z z Select "Reports", then click on "Snapshot Comparison". The Snapshot Comparison window appears. Select the target node(s). Select between two and four snapshots for the systems from the Select Snapshots page.
z 3.2.3 How to Run SCM Web-based GUI Back to Top 1.6.12 Execute remote commands on one or more nodes A remote command can be executed on one or more nodes in the cluster from any node by using the 'clsh' command in /opt/clusterpack/bin.
- Update /etc/checklist on node1, node3 and node5 with the local /etc/checklist
% clcp -C node1+node3+node5 /etc/checklist %h:/etc/checklist
- Copy multiple local files to all nodes
% clcp a.txt b.txt c.txt %h:/tmp
- Copy multiple remote files to multiple local files
% clcp %h:/tmp/a.txt /tmp/a.%h.txt
For more details on the usage of clcp, invoke:
% man clcp
Back to Top
Because using PIDs on a cluster is not feasible (there will be different PIDs on different hosts), clkill can kill processes by name.
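For instance, to kill every process named my_solver on the nodes in group "cae" (the process name is hypothetical, and the option syntax should be confirmed against man clkill; the -C group selector is the same one used by clps and clsh):
% /opt/clusterpack/bin/clkill -C cae my_solver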
Groups of Compute Nodes can be removed from ClusterPack using /opt/clusterpack/bin/clgroup. The following example removes the node group "cae": % /opt/clusterpack/bin/clgroup -r cae Note that the above-mentioned command just removes the group; the nodes are still part of the cluster, and users can submit jobs to the nodes. For more details on the usage of clgroup, invoke the command: % man clgroup Back to Top 1.6.
Back to Top
1.6.20 Add File Systems to Compute Nodes
The file system for Compute Nodes can be defined using System Administration Manager (SAM). Invoke SAM from the command line, or from within the HP System Management tools or SCM, and select "Disks and File Systems". Select "Actions->Add Local File System->Using the Logical Volume Manager" and enter the required information. Repeat this operation for each Compute Node.
% /sbin/init.d/cpack.server stop ClusterPack Clusterware Edition Every installation of ClusterPack Clusterware Edition includes a fully functional Base Edition license manager. All Base Edition license server functions should be used to manage that portion of the license server. Platform Computing's Clusterware Pro V5.1 uses a proprietary licensing scheme. For more information on managing the Clusterware Pro license functionality, Please see the "Platform Computing Clusterware Pro V5.
System Monitoring Tasks ClusterPack System Monitoring Tasks Index | Administrators Guide | Users Guide | Tool Overview | Related Documents | Dictionary 1.7.1 Get an Overview of Cluster Health 1.7.2 Get an Overview of the Job Queue Status 1.7.3 Get details on health of specific Compute Nodes 1.7.4 View Usage of Resources in Compute Node(s) 1.7.5 Monitor Compute Nodes based on resource thresholds 1.7.
z z State refers to the state of the host. Batch State refers to the state of the host, and the state of the daemons running on that host. A detailed list of batch states is shown below. For more information, select the online help: z z z Select Help->Platform Help Select "View" under the "Hosts" section in the left hand pane. Select "Change your hostview" to see a description of the icons. Using the Clusterware Pro V5.
- closed_Busy - The host is not accepting jobs because one or more load indices have exceeded their thresholds.
- closed_Excl - The host is not accepting jobs until the exclusive job running on it completes.
- closed_Full - The host is not accepting new jobs. The configured maximum number of jobs that can run on it has been reached.
- closed_Wind - The host is not accepting jobs. The dispatch window that has been defined for it is closed.
- unlicensed - The host is not accepting jobs. It does not have a valid LSF license for sbatchd and LIM is down.
or % bqueues -l For more information, see the man page: % man bqueues Common Terms Both the Web interface and the CLI use the same terms for the health and status of the job submission queues. These terms are used to define the State of an individual queue. z z z z Open - The queue is able to accept jobs. Closed - The queue is not able to accept jobs. Active - Jobs in the queue may be started. Inactive - Jobs in the queue cannot be started for the time being. References: z z 3.7.
Default status from each node is available using: % bhosts STATUS shows the current status of the host and the SBD daemon. Batch jobs can only be dispatched to hosts with an ok status.
1.7.4 View Usage of Resources in Compute Node(s) Using the Clusterware Pro V5.1 Web Interface: From the Hosts Tab: z z z z z Select the host to be monitored using the checkbox next to each host. More than one host can be selected. From the menu select Host->Monitor A new window will open that displays the current resource usage of one of the selected hosts. Four resources are displayed: total system memory, CPU Utilization, swap space available, and /tmp space available.
z 3.7.9 How do I access the Clusterware Pro V5.1 Command Line Interface? Back to Top 1.7.5 Monitor Compute Nodes based on resource thresholds Using the Clusterware Pro V5.1 Web Interface: From the Hosts Tab z z z z From the View menu select View->Choose Columns Add the Available Column resource to the Displayed Columns list. Click OK The new resource to be monitored will be displayed on the Host tab screen. Using the Clusterware Pro V5.1 CLI: Using the lshosts command, a resource can be specified.
Workload Management Tasks ClusterPack Workload Management Tasks Index | Administrators Guide | Users Guide | Tool Overview | Related Documents | Dictionary 1.8.1 Add new Job Submission Queues 1.8.2 Remove Queues 1.8.3 Restrict user access to specific queues 1.8.4 Add resource constraints to specified queues 1.8.5 Change priority of specified queues 1.8.6 Add pre/post run scripts to specified queues 1.8.7 Kill a job in a queue 1.8.8 Kill all jobs owned by a user 1.8.9 Kill all jobs in a queue 1.8.
After adding, removing or modifying queues, it is necessary to reconfigure LSF to read the new queue information. This is done from the Management Server using the Clusterware Pro V5.1 CLI: % badmin reconfig Verify the queue has been added by using the Clusterware Pro V5.1 CLI: % bqueues -l References: z 3.7.9 How do I access the Clusterware Pro V5.1 Command Line Interface? Back to Top 1.8.
Back to Top
1.8.3 Restrict user access to specific queues
Using the Clusterware Pro V5.1 CLI:
The file /share/platform/clusterware/conf/lsbatch/<cluster_name>/configdir/lsb.queues controls which users can submit to a specific queue. The name of your cluster can be determined by using the Clusterware Pro V5.1 CLI:
% lsid
Edit the lsb.queues file and look for a USERS line for the queue you wish to restrict. If a USERS line exists, you can add or remove users from it.
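For example, a queue definition that should only accept jobs from two users might contain a line like the following inside its stanza in lsb.queues. The user names are illustrative; USERS is the standard LSF keyword for this restriction.
USERS = user1 user2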
% lsid
Find the queue definition you wish to modify. The following entries for maximum resource usage can be modified or added for each queue definition (see the example below):
- CPULIMIT = minutes on a host
- FILELIMIT = file size limit
- MEMLIMIT = bytes per job
- DATALIMIT = bytes for data segment
- STACKLIMIT = bytes for stack
- CORELIMIT = bytes for core files
- PROCLIMIT = processes per job
RES_REQ is a resource requirement string specifying the condition for dispatching a job to a host.
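A sketch of a complete queue definition with resource limits, in standard lsb.queues syntax, is shown below. The queue name and limit values are illustrative only.
Begin Queue
QUEUE_NAME = short
CPULIMIT   = 60
MEMLIMIT   = 2000000
PROCLIMIT  = 8
RES_REQ    = select[mem>512]
End Queue
After editing the file, run badmin reconfig as described in the next section so the scheduler picks up the change.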
Add PRIORITY = <integer value> to the queue definition. Queues with higher priority values are searched first during scheduling. After adding, removing or modifying queues, it is necessary to reconfigure LSF to read the new queue information. This is done from the Management Server using the Clusterware Pro V5.1 CLI: % badmin reconfig Verify the queue has been modified by using the Clusterware Pro V5.1 CLI: % bqueues -l References: - 1.8.1 Add new Job Submission Queues - 3.7.9 How do I access the Clusterware Pro V5.1 Command Line Interface?
Add PRE_EXEC = <command> and/or POST_EXEC = <command> to the queue definition. The command or tool should be accessible and runnable on all nodes that the queue services. After adding, removing or modifying queues, it is necessary to reconfigure LSF to read the new queue information. This is done from the Management Server using the Clusterware Pro V5.1 CLI: % badmin reconfig Verify the queue has been modified by using the Clusterware Pro V5.1 CLI: % bqueues -l References: - 1.8.1 Add new Job Submission Queues Back to Top 1.8.7 Kill a job in a queue
Users can kill their own jobs. Queue administrators can kill jobs associated with a particular queue. References: - 3.7.9 How do I access the Clusterware Pro V5.1 Command Line Interface? Back to Top 1.8.9 Kill all jobs in a queue Using the Clusterware Pro V5.1 CLI: All of the jobs in a queue can be killed by using the bkill command with the -q option: % bkill -q <queue_name> -u all 0 Users can kill their own jobs. Queue administrators can kill jobs associated with a particular queue. References: - 3.7.9 How do I access the Clusterware Pro V5.1 Command Line Interface?
1.8.11 Suspend all jobs owned by a user Using the Clusterware Pro V5.1 CLI: All of a user's jobs can be suspended using the special 0 job id: % bstop -u <user_name> 0 Users can suspend their own jobs. Queue administrators can suspend jobs associated with a particular queue. References: - 3.7.9 How do I access the Clusterware Pro V5.1 Command Line Interface? Back to Top 1.8.12 Suspend all jobs in a queue Using the Clusterware Pro V5.
- 3.7.9 How do I access the Clusterware Pro V5.1 Command Line Interface? Back to Top 1.8.14 Resume all suspended jobs owned by a user Using the Clusterware Pro V5.1 CLI: All of a user's jobs can be resumed using the Clusterware Pro V5.1 CLI by using the special 0 job id: % bresume -u <user_name> 0 Users can resume their own jobs. Queue administrators can resume jobs associated with a particular queue. References: - 3.7.9 How do I access the Clusterware Pro V5.1 Command Line Interface? Back to Top 1.8.
System Troubleshooting Tasks ClusterPack System Troubleshooting Tasks Index | Administrators Guide | Users Guide | Tool Overview | Related Documents | Dictionary 1.9.1 Locate a Compute Node that is down 1.9.2 Get to the console of a Compute Node that is down 1.9.3 Bring up a Compute Node with a recovery image 1.9.4 View system logs for cause of a crash 1.9.5 Bring up the Management Server from a crash 1.9.6 Troubleshoot SCM problems 1.9.7 Replace a Compute Node that has failed with a new machine 1.9.
% lshosts -l % bhosts -l References: z z z z 1.7.1 Get an Overview of Cluster Health 1.7.3 Get details on health of specific Compute Nodes 3.7.8 How do I access the Clusterware Pro V5.1 Web Interface? 3.7.9 How do I access the Clusterware Pro V5.1 Command Line Interface? Back to Top 1.9.
This will reboot the machine <hostname> and cause it to install from the golden image you specified. References: - 1.5.2 Distribute Golden Image to a set of Compute Nodes Back to Top 1.9.4 View system logs for cause of a crash The system logs are located in /var/adm/syslog/syslog.log. The crash logs are stored in /var/adm/crash. The installation and configuration logs for ClusterPack are stored in /var/opt/clusterpack/log. Back to Top 1.9.
Problem: When I try to add a node, I get "Properties file for <node> doesn't exist."
Solution:
- Make sure that the hostname is fully qualified in /etc/hosts on both the Management Server and the managed node, if it exists in /etc/hosts, and that any shortened host names are aliases instead of primary names. For example:
  10.1.2.3 cluster.domain.com cluster
  should be used instead of:
  10.1.2.3 cluster
- Make sure that AgentConfig is installed on the managed node, and that mxrmi and mxagent are running.
be added to the cluster using the IP address and hostname of the failed node or can be added with a new name and IP address. Replacing with a new hostname and IP address In this case, the replacement node is handled simply by removing the failed node and adding the new node. Remove the failed node from the cluster using the following commands: % manager_config -r <hostname> % compute_config -r <hostname> The node's MP will automatically be removed from the MP register database.
Job Management Tasks ClusterPack Job Management Tasks Index | Administrators Guide | Users Guide | Tool Overview | Related Documents | Dictionary 2.1.1 Invoke the Workload Management Interface from the Management Server 2.1.2 Invoke the Workload Management Interface from the intranet 2.1.3 Prepare for job submission 2.1.4 Submit a job to a queue 2.1.5 Submit a job to a group 2.1.6 Set a priority for a submitted job 2.1.7 Check the status of a submitted job 2.1.8 Check the status of all submitted jobs 2.1.
- Go to the following URL in the web browser:
% /opt/netscape/netscape http://<management_server>:8080/Platform/login/Login.jsp
- Enter your Unix user name and password. This assumes that the gaadmin services have been started by the LSF Administrator.
Note: The user submitting a job must have access to the Management Server and to all the Compute Nodes that will execute the job. To prevent security problems, the super user account (i.e. root) cannot submit any jobs.
References:
- 3.7.8 How do I access the Clusterware Pro V5.1 Web Interface?
- 3.7.9 How do I access the Clusterware Pro V5.1 Command Line Interface?
Using the Clusterware Pro V5.1 Web Interface: From the jobs tab: z z z Select Job->Submit. Enter job data. Click Submit. Data files required for the job may be specified using the '-f' option to the bsub command. This optional information can be supplied on the "Advanced" tab within the Job Submission screen. For an explanation of the '-f' options please see "Transfer a file from intranet to specific Compute Nodes in the cluster". Using the Clusterware Pro V5.
Using the Clusterware Pro V5.1 CLI:
% bsub -q <queue_name>
Use bqueues to list available Queues.
% bqueues
References:
- 3.7.8 How do I access the Clusterware Pro V5.1 Web Interface?
- 3.7.9 How do I access the Clusterware Pro V5.1 Command Line Interface?
Back to Top
2.1.5 Submit a job to a group
Using the Clusterware Pro V5.1 Web Interface:
From the Jobs tab:
- Select Job->Submit.
- Enter relevant Job information.
- Select the "Resources" tab.
Using the Clusterware Pro V5.1 Web Interface: Set a priority at submission by: z z From the Jobs Tab, select Job->Submit. Using the Queue pull down menu, select a queue with a high priority. After submission: z z z From the Jobs Tab, select the job from the current list of pending jobs. Select Job->Switch Queue. Switch the job to a queue with a higher priority The relative priority of the different Queues can be found on the "Queue Tab". Using the Clusterware Pro V5.
References:
- 3.7.8 How do I access the Clusterware Pro V5.1 Web Interface?
- 3.7.9 How do I access the Clusterware Pro V5.1 Command Line Interface?
Back to Top
2.1.8 Check the status of all submitted jobs
Using the Clusterware Pro V5.1 Web Interface:
From the Jobs tab:
- Review the Jobs table. Use the Previous and Next buttons to view more jobs.
Using the Clusterware Pro V5.1 CLI:
% bjobs
% bjobs -l
References:
- 3.7.8 How do I access the Clusterware Pro V5.1 Web Interface?
- 3.7.9 How do I access the Clusterware Pro V5.1 Command Line Interface?
Using the Clusterware Pro V5.1 Web Interface: From the Jobs tab: z z z z Select Job->Submit. Click Advanced. Select "Send email notification when job is done". Enter the email address in the email to field. Using the Clusterware Pro V5.1 CLI: Using the CLI, users are automatically notified when a job completes. References: z z 3.7.8 How do I access the Clusterware Pro V5.1 Web Interface? 3.7.9 How do I access the Clusterware Pro V5.1 Command Line Interface? Back to Top 2.1.
Using the Clusterware Pro V5.1 Web Interface:
From the Jobs tab:
- Select Tools->Find.
- Select User from the Field list.
- Type the user name in the Value field.
- Click Find.
- Click Select All.
- Click Kill.
Using the Clusterware Pro V5.1 CLI:
% bkill -u <user_name> 0
References:
- 3.7.8 How do I access the Clusterware Pro V5.1 Web Interface?
- 3.7.9 How do I access the Clusterware Pro V5.1 Command Line Interface?
Back to Top
z z 3.7.8 How do I access the Clusterware Pro V5.1 Web Interface? 3.7.9 How do I access the Clusterware Pro V5.1 Command Line Interface? Back to Top 2.1.14 Suspend a submitted job in a queue Using the Clusterware Pro V5.1 Web Interface: From the Jobs tab: z z Select the job from the Jobs table. Select Job->Suspend. Using the Clusterware Pro V5.1 CLI: % bstop References: z z 3.7.8 How do I access the Clusterware Pro V5.1 Web Interface? 3.7.9 How do I access the Clusterware Pro V5.
z z 3.7.8 How do I access the Clusterware Pro V5.1 Web Interface? 3.7.9 How do I access the Clusterware Pro V5.1 Command Line Interface? Back to Top 2.1.16 Suspend all jobs submitted by the user in a queue Using the Clusterware Pro V5.1 Web Interface: From the Jobs tab: z z z z z z z z z z z Select Tools->Find. Select the Advanced tab. Select User from the Field list in the Define Criteria section. Type the user name in the Value field. Click << Select Queue from the Field list.
Using the Clusterware Pro V5.1 CLI: % bresume References: z z 3.7.8 How do I access the Clusterware Pro V5.1 Web Interface? 3.7.9 How do I access the Clusterware Pro V5.1 Command Line Interface? Back to Top 2.1.18 Resume all suspended jobs submitted by the user Using the Clusterware Pro V5.1 Web Interface: From the Jobs tab: z z z z z z z z z z z Select Tools->Find. Select the Advanced tab. Select User from the Field list in the Define Criteria section. Type the user name in the Value field.
From the Jobs tab:
- Select Tools->Find.
- Select the Advanced tab.
- Select User from the Field list in the Define Criteria section.
- Type the user name in the Value field.
- Click <<
- Select Queue from the Field list.
- Select the queue from the Queue list.
- Click <<
- Click Find.
- Click Select All.
- Click Resume.
Using the Clusterware Pro V5.1 CLI:
% bresume -u <user_name> -q <queue_name> 0
References:
- 3.7.8 How do I access the Clusterware Pro V5.1 Web Interface?
- 3.7.9 How do I access the Clusterware Pro V5.1 Command Line Interface?
2.1.21 Suspend a submitted MPI job Using the Clusterware Pro V5.1 Web Interface: From the Jobs tab: z z Select the job from the Jobs table. Select Job->Suspend. Using the Clusterware Pro V5.1 CLI: % bstop References: z z 3.7.8 How do I access the Clusterware Pro V5.1 Web Interface? 3.7.9 How do I access the Clusterware Pro V5.1 Command Line Interface? Back to Top 2.1.22 Resume a suspended MPI job Using the Clusterware Pro V5.
File Transfer Tasks ClusterPack File Transfer Tasks Index | Administrators Guide | Users Guide | Tool Overview | Related Documents | Dictionary 2.2.1 Transfer a file from intranet to the Management Server in the cluster 2.2.2 Transfer a file from intranet to all Compute Nodes in the cluster 2.2.3 Transfer a file from intranet to specific Compute Nodes in the cluster 2.2.4 Transfer a file from a Compute Node to a system outside the cluster 2.2.
Back to Top
2.2.2 Transfer a file from intranet to all Compute Nodes in the cluster
If the cluster is a Guarded Cluster, this operation is done in two steps:
- FTP the file to the Management Server.
- Copy the file to all nodes in the cluster.
% clcp /a/input.data %h:/data/input.data
% clcp /a/input.data cluster:/data/input.data
For more details on the usage of clcp, invoke the command:
% man clcp
References:
- 2.2.1 Transfer a file from intranet to the Management Server in the cluster
Back to Top
- < : Copies the remote file to the local file after the job completes. Overwrites the local file if it exists.
% bsub -f "<local_file> < <remote_file>" <command>
- << : Appends the remote file to the local file after the job completes. The local file must exist.
% bsub -f "<local_file> << <remote_file>" <command>
- >< : Copies the local file to the remote file before the job starts. Overwrites the remote file if it exists. Then copies the remote file to the local file after the job completes. Overwrites the local file.
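As a concrete illustration, the following submission (file names and application are hypothetical) copies an input file to the execution host before the job starts and retrieves the result afterwards; the > operator copies the local file to the remote file before the job runs, per standard LSF bsub -f semantics:
% bsub -f "/home/user1/input.dat > input.dat" -f "output.dat < /home/user1/output.dat" my_app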
z FTP the file from the Head node to the external target. References: z Guarded Cluster Back to Top 2.2.5 Transfer a file from a Compute Node to another Compute node in the cluster The 'clcp' command in /opt/clusterpack/bin is used to copy files between cluster nodes. This command can be invoked either from the Management Server or any Compute Node. [From the Management Server] % clcp node1:/a/data node2:/b/data Back to Top 2.2.
For more details on the usage of clcp, invoke the command:
% man clcp
Back to Top
Miscellaneous Tasks ClusterPack Miscellaneous Tasks Index | Administrators Guide | Users Guide | Tool Overview | Related Documents | Dictionary 2.3.1 Run a tool on a set of Compute Nodes 2.3.2 Check resource usage on a Compute Node 2.3.3 Check Queue status 2.3.4 Remove temporary files from Compute Nodes 2.3.5 Prepare application for checkpoint restart 2.3.6 Restart application from a checkpoint if a Compute Node crashes 2.3.7 Determine if the application fails to complete 2.3.
Using the Clusterware Pro V5.1 Web Interface:
From the Jobs tab:
- Select Jobs->Submit.
- Enter job information.
- Click Advanced.
- On the Advanced dialog, enter script details in the Pre-execution command field.
- Click OK.
- Click Submit.
Using the CLI:
% bsub -E 'pre_exec_cmd [args ...]' command
References:
- 3.7.8 How do I access the Clusterware Pro V5.1 Web Interface?
- 3.7.9 How do I access the Clusterware Pro V5.1 Command Line Interface?
Back to Top
2.3.3 Check Queue status Using the Clusterware Pro V5.1 Web Interface: From the Jobs tab: z Review the Queues table. Use the Previous and Next buttons to view more Queues. Using the Clusterware Pro V5.1 CLI: % bqueues [] References: z z 3.7.8 How do I access the Clusterware Pro V5.1 Web Interface? 3.7.9 How do I access the Clusterware Pro V5.1 Command Line Interface? Back to Top 2.3.
and should not be used while AppRS jobs are running. % apprs_clean all For jobs submitted to non-AppRS queues, the user's job submission script should include commands to remove files that are no longer needed when the job completes. In the event that the job fails to run to completion it may be necessary to remove these files manually.
#APPRS TARGETUTIL 1.0 #APPRS TARGETTIME 10 #APPRS REDUNDANCY 4 # Your job goes here: if [ "$APPRS_RESTART" = "Y" ]; then # job as it is run under restart conditions else # job as it is run under normal conditions fi The names of all files that need to be present for the application to run from a restart should be listed with the HIGHLYAVAILABLE tag: #APPRS HIGHLYAVAILABLE Other AppRS options can be set in the job submission script.
2.3.6 Restart application from a checkpoint if a Compute Node crashes If a Compute Node crashes, jobs submitted to an AppRS queue will automatically be restarted on a new node or set of nodes as those resources become available. No user intervention is necessary. Back to Top 2.3.7 Determine if the application fails to complete The job state of EXIT is assigned to jobs that end abnormally. Using the Clusterware Pro V5.1 Web Interface: From the Jobs tab: z z Review the job states in the Jobs table.
% bhist z or for more information: % bhist -l z For jobs submitted to an AppRS queue, details of the job, including failover progress can be viewed using the command: % apprs_hist References: z 3.7.9 How do I access the Clusterware Pro V5.1 Command Line Interface? Back to Top 2.3.9 Get a high-level view of the status of the Compute Nodes Using the Clusterware Pro V5.1 Web Interface: From the Jobs tab: z z Review the Hosts table.
Cluster Management Utility Zone Overview ClusterPack Cluster Management Utility Zone Overview Index | Administrators Guide | Users Guide | Tool Overview | Related Documents | Dictionary 3.1.1 What is Cluster Management Utility Zone? 3.1.2 What are the Easy Install Tools? 3.1.3 What are the system imaging tools? 3.1.4 What are the Cluster Aware Tools? 3.1.5 clsh - Runs commands on one, some, or all nodes in the cluster. 3.1.6 clcp - Copies files to one, some, or all cluster nodes. 3.1.
3.1.2 What are the Easy Install Tools? The ClusterPack suite includes a set of utilities for setting up a cluster of Itanium 2 nodes. The tools manager_config, mp_register, clbootnodes, compute_config and finalize_config are key components for establishing and administering an Itanium 2 cluster.
z z z sysimage_create sysimage_register sysimage_distribute These scripts use ClusterPack's knowledge of the cluster configuration to simplify the creation and distribution of system (golden) images. With the use of scripts, creating and distributing images is as simple as running these three tools and providing the name of a host and/or path of the image. References: z z 1.5.1 Create a Golden Image of a Compute Node from the Management Server 1.5.
new command will not begin until the previous one is finished, i.e. these do not run in parallel. Sending a SIGINT (usually a ^C) will cause the current host to be skipped, and sending a SIGQUIT (usually a ^\) will immediately abort the whole clsh command. Percent interpolation, as in clcp, is also supported. clsh exits with a non-zero status if there are problems running the remote shell commands. A summary of hosts on which problems occurred is printed at the end.
- single local to single local
% clcp src dst
- single local to multiple local
% clcp src dst.%h
- single local to multiple remote
% clcp src dst:%h or clcp src cluster-group:dst
- multiple local to multiple remote
% clcp src dst.%h %h:dst
- multiple remote to multiple local
% clcp %h:src dst.%h
Examples
1. Assume that the file /etc/checklist needs to be updated on all HP hosts. Also assume that this file is different on all hosts.
Make necessary changes.
% clcp checklist.%c %h:/etc/checklist
which maps to:
% rcp host0:/etc/checklist checklist.0
% rcp host1:/etc/checklist checklist.1
% vi checklist.0 checklist.1
% rcp checklist.0 host0:/etc/checklist
% rcp checklist.1 host1:/etc/checklist
3. The following is an example if log files are needed:
% clcp %h:/usr/spool/mqueue/syslog %h/syslog.%Y%M%D.%T
This would save the files in directories (which are the host names) with file names of the form: YYMMDD.TT:TT.
cluptime is used as follows:
% cluptime [[-C] cluster-group]
For more details on the usage of cluptime, invoke the command:
% man cluptime
Back to Top
3.1.8 clps - Cluster-wide ps command
clps and clkill are the same program, with clps producing a "ps" output that includes the host name and clkill allowing processes to be killed. clps is used as follows:
% clps [[-C] cluster][-ad] {tty user command pid regexp}
For more details on the usage of clps, invoke the command:
% man clps
Back to Top
3.1.10 clinfo - Shows nodes and cluster information. The clinfo command lists which hosts make up a cluster. By default, with no arguments, the current cluster is listed. Non-flag arguments are interpreted as cluster names. Three different output modes are supported. z Short format (enabled by the -s option) The short format lists the cluster (followed by a colon) and the hosts it contains; one cluster per line. Long lines do not wrap.
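For example, to list the hosts in the current cluster, and again in the compact one-line-per-cluster form described above (the -s flag selects the short format):
% clinfo
% clinfo -s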
core tools of ClusterPack, including PCC ClusterWare Pro™ and the HP Systems Insight Manager. Node groups are collections of nodes that are subsets of the entire node membership of the compute cluster. They may have overlapping memberships such that a single node may be a member of more than one group. The node grouping mechanism allows flexible partitioning of a compute cluster into logical collections that match their use model.
% clgroup -l group1 For more details on the usage of clgroup, invoke the command: % man clgroup Back to Top 3.1.12 clbroadcast - Telnet and MP based broadcast commands on cluster nodes. The clbroadcast command is used to broadcast commands to various nodes in the cluster using the Management Processor (MP) interface or telnet interface.
The clpower utility performs the specified power operation on a node or list of nodes using the Management Processor (MP) interface.
Service ControlManager (SCM) Overview ClusterPack Service ControlManager (SCM) Overview Index | Administrators Guide | Users Guide | Tool Overview | Related Documents | Dictionary 3.2.1 What is ServiceControl Manager? 3.2.2 How to install, configure, manage, and troubleshoot SCM: 3.2.3 How to Run SCM Web-based GUI 3.2.1 What is ServiceControl Manager? ServiceControl Manager (SCM) makes system administration more effective, by distributing the effects of existing tools efficiently across nodes.
3.2.2 How to install, configure, manage, and troubleshoot SCM: ServiceControl Manager must be installed prior to installation of ClusterPack.
References:
- 4.1.2 HP-UX ServiceControl Manager
Back to Top 3.2.3 How to Run SCM Web-based GUI This release of ClusterPack includes a version of SCM that has a Web-based GUI.
System Inventory Manager Overview ClusterPack System Inventory Manager Overview Index | Administrators Guide | Users Guide | Tool Overview | Related Documents | Dictionary 3.3.1 What is System Inventory Manager? 3.3.2 How to invoke Systems Inventory Manager 3.3.1 What is System Inventory Manager? The Systems Inventory Manager application is a tool that allows you to easily collect, store and manage inventory and configuration information for the Compute Nodes in the HP-UX Itanium 2 cluster.
- The filtering facility allows you to define and view only the information that you need at any given time.
- The Command Line Interface (CLI) that is provided enables scripting capabilities.
Online help is available by clicking the Help Tab in the Systems Inventory Manager GUI.
References:
- 4.1.4 HP System Inventory Manager
Back to Top 3.3.
Application ReStart (AppRS) Overview ClusterPack Application ReStart (AppRS) Overview Index | Administrators Guide | Users Guide | Tool Overview | Related Documents | Dictionary 3.4.1 What is AppRS? 3.4.1 What is AppRS? AppRS is a collection of software that works in conjunction with Platform Computing's Clusterware™ to provide a fail-over system that preserves the current working directory (CWD) contents of applications in the event of a fail-over.
To use AppRS, users must add the following line to their ~/.cshrc file:
source /share/platform/clusterware/conf/cshrc.lsf
and the following line to their ~/.profile file:
. /share/platform/clusterware/conf/profile.lsf
(A brief sketch of making this change from the command line follows the references below.)
References:
- 2.3.4 Remove temporary files from Compute Nodes
- 2.3.5 Prepare application for checkpoint restart
- 2.3.
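As noted above, a quick way to make and verify this change for a user of a POSIX shell is sketched below; csh users would append the cshrc.lsf line to ~/.cshrc instead:
% echo '. /share/platform/clusterware/conf/profile.lsf' >> ~/.profile
% grep profile.lsf ~/.profile
. /share/platform/clusterware/conf/profile.lsf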
Cluster Management Utility (CMU) Overview ClusterPack Cluster Management Utility (CMU) Overview Index | Administrators Guide | Users Guide | Tool Overview | Related Documents | Dictionary 3.5.1 What is CMU? 3.5.2 Command line utilities 3.5.3 Nodes monitoring 3.5.4 Invoking CMU 3.5.5 Stopping CMU 3.5.6 CMU main window 3.5.7 Monitoring By Logical Group 3.5.8 Contextual Menu 3.5.9 Logical Group Administration Menu 3.5.1 What is CMU? CMU is designed to manage a large group of Compute Nodes.
3.5.3 Nodes monitoring
- Cluster monitoring: Enhanced monitoring capabilities for up to 1024 nodes in a single window (with vertical scrollbars).
- Monitoring tools: Provides tools to monitor remote node activities.
- Node Administration: Allows execution of an action on several nodes with one command. The actions are: 1. Boot and reboot selected nodes. 2.
window enabled. CMU will display the last monitored logical group. Note: When starting the CMU window for the first time, the monitoring action is performed with the “Default” Logical Group. Note: Some of the menus and functions within CMU will allow the user to act on more than one selected item at a time. When appropriate, the user can select multiple items by using the Ctrl or Shift keys in conjunction with the left mouse button.
- Terminal Server Configuration
- PDU Configuration
- Network Topology Adaptation
- Node Management
- Event Handling Configuration
Back to Top 3.5.7 Monitoring By Logical Group The following section describes the different actions that the user can perform in the "Monitoring By Logical Group" window.
- Select/Unselect one node: Left click on the name of this node. The node becomes darker when selected, or returns to its original color when unselected.
A contextual menu appears when you right-click a node displayed in the central frame of the main monitoring CMU window. The following menu options are available:
- Telnet Connection: Launches a telnet session to this node. The telnet session is embedded in an Xterm window.
- Management Card Connection: Launches a telnet connection to the management card of this node. The telnet session is embedded in an Xterm window.
Many management actions, such as boot, reboot, halt, or monitoring, will be applied to all of the selected nodes.
- Halt: This sub-menu allows a system administrator to issue the halt command on all of the selected nodes. The halt command can be performed immediately (the default) or delayed for a given time (between 1 and 60 minutes). The administrator can also have a message sent to all the users on the selected nodes by typing in the "Message" edit box.
before booting a node.
- Reboot: This sub-menu allows a system administrator to issue the reboot command on all of the selected nodes. The reboot command can be performed immediately (the default) or delayed for a given time (between 1 and 60 minutes). The administrator can also have a message sent to all the users on the selected nodes by typing in the "Message" edit box. Note: The reboot command is performed on the nodes using "rsh".
To improve the appearance of the Xterm window display, every window can be shifted (in x and y) from the previous one to make sure that they fit nicely on the screen. By default, the shift values are computed so that the windows tile the screen and no window is displayed outside of the screen. If the user does not need to view the telnet sessions, or does not want to crowd the display, the Xterm windows can be started minimized.
NAT/IPFilter Overview ClusterPack NAT/IPFilter Overview Index | Administrators Guide | Users Guide | Tool Overview | Related Documents | Dictionary 3.6.1 Introduction to NAT (Network Address Translation) 3.6.1 Introduction to NAT (Network Address Translation) Network Address Translation (NAT) or IP Aliasing provides a mechanism to configure multiple IP addresses in the cluster to present a single image view with a single external IP address.
IP Aliasing or Network Address Translation (NAT) ClusterPack comes with HP-UX IPFilter, a software component with powerful packet filtering and firewalling capabilities. One of the features that it supports is Network Address Translation. For information on HP-UX IPFilter, please refer to the HP-UX IPFilter manual and release notes at docs.hp.com: http://docs.hp.com/hpux/internet/index.html#IPFilter/9000 For information on the NAT features of HP-UX IPFilter, refer to the public domain how-to document.
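ClusterPack generates the required NAT configuration automatically, but for orientation, IPFilter NAT rules (kept in ipnat.conf, typically under /etc/opt/ipf on HP-UX) generally take the following shape; the interface name and addresses below are purely illustrative:
# map outbound traffic from the private cluster subnet to the external
# address of the Management Server, remapping TCP/UDP source ports
map lan0 192.168.1.0/24 -> 10.10.10.5/32 portmap tcp/udp auto
map lan0 192.168.1.0/24 -> 10.10.10.5/32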
Platform Computing Clusterware Pro V5.1 Overview ClusterPack Platform Computing Clusterware Pro V5.1 Overview Index | Administrators Guide | Users Guide | Tool Overview | Related Documents | Dictionary 3.7.1 What is Clusterware Pro? 3.7.2 How do I obtain and install the Clusterware Pro V5.1 license file? 3.7.3 Where is Clusterware Pro V5.1 installed on the system? 3.7.4 How can I tell if Clusterware Pro V5.1 is running? 3.7.5 How do I start and stop the Clusterware Pro V5.1 daemons? 3.7.
- Organizations experience increased productivity from transparent single-system, cluster-as-server access to compute resources.
- Platform Computing's Clusterware Pro V5.1 solution dramatically reduces time to market through continuous access to the cluster's compute power.
- Platform Computing's Clusterware Pro V5.1 solution enables organizations to achieve higher quality results by running simulations and analyses faster than previously possible.
Setup and Configuration of a DEMO license The use of a DEMO license file (license.dat) for Clusterware Pro, as part of the ClusterPack V2.4 Clusterware Edition, requires some modification of installed configuration files. These modifications will have to be removed in order to use a purchased license key (LSF_license.oem).
1. Place the DEMO license key onto the Management Server as /share/platform/clusterware/conf/license.dat
2. Modify the /share/platform/clusterware/conf/lsf.
The /etc/exports file on the Management Server and the /etc/fstab file on each Compute Node are updated automatically by ClusterPack. Back to Top 3.7.4 How can I tell if Clusterware Pro V5.1 is running? On the Management Server, several Clusterware Pro V5.1 services must be running in order to provide full functionality for the tool. All of these services are located in /share/platform/clusterware.
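One rough check (a sketch only, not a substitute for the Clusterware Pro status commands) is to look for processes running out of that directory tree on the Management Server:
% ps -ef | grep /share/platform/clusterware | grep -v grep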
To START services on the Management Server, issue the following command on the Management Server as the super user (i.e. root):
% /share/platform/clusterware/lbin/cwmgr start
To STOP services on the Management Server, issue the following command on the Management Server as the super user (i.e. root):
% /share/platform/clusterware/lbin/cwmgr stop
To START services on ALL Compute Nodes, issue the following command on the Management Server as the super user (i.e.
% /share/platform/clusterware/lbin/cwagent stop
References:
- 3.1.5 clsh - Runs commands on one, some, or all nodes in the cluster.
Back to Top 3.7.6 How do I start and stop the Clusterware Pro V5.1 Web GUI? The Web GUI is started and stopped as part of the tools that are used to start and stop the other Clusterware Pro V5.1 services. No additional steps are required. Note: The Clusterware Pro Web GUI is not automatically started during a reboot of the Management Server.
- The username and password are the same as for any normal user account on the Management Server.
References:
- 3.7.6 How do I start and stop the Clusterware Pro V5.1 Web GUI?
Back to Top 3.7.9 How do I access the Clusterware Pro V5.1 Command Line Interface? Before using the Clusterware Pro V5.1 CLI, you must set a number of environment variables. This must be done once in each shell before using any of the Clusterware Pro V5.1 commands.
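In practice this is typically done by sourcing the same Clusterware Pro environment file mentioned in the AppRS overview; the line below is a sketch for a POSIX shell (csh users would source cshrc.lsf instead). Once the environment is set, Clusterware Pro CLI commands, such as the badmin commands shown below, are available in that shell:
% . /share/platform/clusterware/conf/profile.lsf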
% badmin reconfig
% badmin mbdrestart -f
Restarting the Clusterware Pro V5.1 Services As an alternative, the Clusterware Pro V5.1 services can simply be restarted on all nodes in the cluster. This will cause any information about jobs that are running to be lost, but the jobs will continue to run. Please see "How do I start and stop the Clusterware Pro V5.1 daemons?" for more information.
References:
- 3.7.5 How do I start and stop the Clusterware Pro V5.1 daemons?
Back to Top 3.7.
Management Processor (MP) Card Interface Overview ClusterPack Management Processor (MP) Card Interface Overview Index | Administrators Guide | Users Guide | Tool Overview | Related Documents | Dictionary 3.8.1 Using the MP Card Interface 3.8.1 Using the MP Card Interface The MP cards allow the Compute Nodes to be remotely powered up. This eases the initial installation and configuration of the Compute Nodes. In order to access the MP Card Interface (using HPUX 11i V2.
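An MP card is typically reached over the network with a telnet session to the card's IP address; the address below is hypothetical, and CMU's "Management Card Connection" menu item opens the same kind of session automatically:
% telnet 192.168.1.201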
References:
- Step 11 Run mp_register on the Management Server
Back to Top Index | Administrators Guide | Users Guide | Tool Overview | Related Documents | Dictionary Copyright 1994-2004 hewlett-packard company
HP Systems Insight Manager (HPSIM) Overview ClusterPack HP Systems Insight Manager (HPSIM) Overview Index | Administrators Guide | Users Guide | Tool Overview | Related Documents | Dictionary 3.9.1 What is HP Systems Insight Manager 3.9.2 What are the key features of HP Systems Insight Manager 3.9.3 How to install, configure, manage, and troubleshoot HP Systems Insight Manager 3.9.4 How to run HPSIM Web-based GUI 3.9.
conditions automatically through automated event handling.
- Facilitates secure, scheduled execution of OS commands, batch files, and custom or off-the-shelf applications across groups of Windows, Linux, or HP-UX systems.
- Enables centralized updates of BIOS, drivers, and agents across multiple ProLiant servers with system software version control.
- Enables secure management through support for SSL, SSH, OS authentication, and role-based security.
Back to Top 3.9.
Copyright 1994-2004 hewlett-packard company
Related Documents ClusterPack Related Documents Index | Administrators Guide | Users Guide | Tool Overview | Related Documents | Dictionary 4.1.1 HP-UX 11i Operating Environments 4.1.2 HP-UX ServiceControl Manager 4.1.3 HP Application ReStart 4.1.4 HP System Inventory Manager 4.1.5 HP-UX IPFilter 4.1.6 ClusterPack V2.3 4.1.7 HP Systems Insight Manager 4.1.1 HP-UX 11i Operating Environments HP-UX 11i v2 Operating Environment Document Collection http://www.docs.hp.com/en/oshpux11iv2.
http://docs.hp.com/en/5990-8540/index.html ServiceControl Manager Troubleshooting Guide http://docs.hp.com/en/5187-4198/index.html Back to Top 4.1.3 HP Application ReStart HP Application ReStart Release Note AppRS Release Notes (pdf) HP Application Restart User's Guide AppRS User's Guide (pdf) Back to Top 4.1.4 HP System Inventory Manager Systems Inventory Manager User's Guide http://docs.hp.com/en/5187-4238/index.html Systems Inventory Manager Troubleshooting Guide http://docs.hp.com/en/5187-4239/index.
4.1.6 ClusterPack V2.3 ClusterPack V2.3 Release Note http://www.docs.hp.com/hpux/onlinedocs/T1843-90009/T1843-90009.htm Back to Top 4.1.7 HP Systems Insight Manager HP Systems Insight Manager Product Information http://h18013.www1.hp.com/products/servers/management/hpsim/index.
Cluster LAN/Switch A Cluster LAN/Switch is usually an Ethernet network used to monitor and control all the major system components. It may also handle traffic to the file server. Back to Top Cluster Management Software The Cluster Management Software is ClusterPack, for system administrators and end users. Back to Top Guarded Cluster A cluster where only the Management Server has a network connection to nodes outside of the cluster.
Interconnect Switch An Interconnect Switch provides high-speed connectivity between Compute Nodes. It is used for message passing and remote memory access capabilities for parallel applications. Back to Top Management Processor (MP) The Management Processor (MP) controls the system console, reset, and power management functions. Back to Top Management Server The Management Server provides a single point of management for all system components in the cluster.
Back to Top Index | Administrators Guide | Users Guide | Tool Overview | Related Documents | Dictionary Copyright 1994-2004 hewlett-packard company