Managing Serviceguard
Fourteenth Edition
Manufacturing Part Number: B3936-90117
June 2007
Legal Notices © Copyright 1995-2007 Hewlett-Packard Development Company, L.P. Confidential computer software. Valid license from HP required for possession, use, or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor’s standard commercial license. The information contained herein is subject to change without notice.
Printing History

Table 1  Printing History

  Printing Date     Part Number    Edition
  January 1995      B3936-90001    First
  June 1995         B3936-90003    Second
  December 1995     B3936-90005    Third
  August 1997       B3936-90019    Fourth
  January 1998      B3936-90024    Fifth
  October 1998      B3936-90026    Sixth
  December 2000     B3936-90045    Seventh
  September 2001    B3936-90053    Eighth
  March 2002        B3936-90065    Ninth
  June 2003         B3936-90070    Tenth
  June 2004         B3936-90076    Eleventh
  June 2005         B3936-90076    Eleventh, First reprint
HP Printing Division: Infrastructure Solutions Division Hewlett-Packard Co. 19111 Pruneridge Ave.
Preface This fourteenth printing of the manual applies to Serviceguard Version A.11.18. Earlier versions are available at http://www.docs.hp.com -> High Availability -> Serviceguard. This guide describes how to configure Serviceguard to run on HP 9000 or HP Integrity servers under the HP-UX operating system. The contents are as follows: • Chapter 1, “Serviceguard at a Glance,” describes a Serviceguard cluster and provides a roadmap for using this guide.
• Appendix C, “Designing Highly Available Cluster Applications,” gives guidelines for creating cluster-aware applications that provide optimal performance in a Serviceguard environment.
• Appendix D, “Integrating HA Applications with Serviceguard,” presents suggestions for integrating your existing applications with Serviceguard.
• Appendix E, “Software Upgrades,” shows how to move from one Serviceguard or HP-UX release to another without bringing down your applications.
Related Publications
— Managing HP Serviceguard for Linux, Sixth Edition, August 2006
• Documentation for your version of Veritas storage products from http://www.docs.hp.com -> High Availability -> HP Serviceguard Storage Management Suite
• Before using Veritas Volume Manager (VxVM) storage with Serviceguard, refer to the Veritas documentation posted at http://docs.hp.com. From the heading Operating Environments, choose 11i v3. Then, scroll down to the section Veritas Volume Manager and File System.
— Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters
• From http://www.docs.hp.com -> High Availability -> HP Serviceguard Extension for Faster Failover:
— HP Serviceguard Extension for Faster Failover, Version A.01.00, Release Notes
• From http://www.docs.hp.com -> High Availability -> Serviceguard Extension for SAP:
— Managing Serviceguard Extension for SAP
• From http://www.docs.hp.
— HP Auto Port Aggregation Release Notes and other Auto Port Aggregation documents
Problem Reporting
If you have any problems with the software or documentation, please contact your local Hewlett-Packard Sales Office or Customer Service Center.
1 Serviceguard at a Glance
This chapter introduces Serviceguard on HP-UX, and shows where to find different kinds of information in this book. The following topics are presented:
• What is Serviceguard?
• Using Serviceguard Manager
• A Roadmap for Configuring Clusters and Packages
If you are ready to start setting up Serviceguard clusters, skip ahead to Chapter 4, “Planning and Documenting an HA Cluster,” on page 131.
Serviceguard at a Glance What is Serviceguard? What is Serviceguard? Serviceguard allows you to create high availability clusters of HP 9000 or HP Integrity servers (or a mixture of both; see the release notes for your version for details and restrictions). A high availability computer system allows application services to continue in spite of a hardware or software failure.
Serviceguard at a Glance What is Serviceguard? A multi-node package can be configured to run on one or more cluster nodes. It is considered UP as long as it is running on any of its configured nodes. In Figure 1-1, node 1 (one of two SPU's) is running failover package A, and node 2 is running package B. Each package has a separate group of disks associated with it, containing data needed by the package's applications, and a mirror copy of the data.
Figure 1-2 Typical Cluster After Failover
After this transfer, the failover package typically remains on the adoptive node as long as the adoptive node continues running. If you wish, however, you can configure the package to return to its primary node as soon as the primary node comes back online. Alternatively, you may manually transfer control of the package back to the primary node at the appropriate time.
• Mirrordisk/UX or Veritas Volume Manager, which provide disk redundancy to eliminate single points of failure in the disk subsystem;
• Event Monitoring Service (EMS), which lets you monitor and detect failures that are not directly handled by Serviceguard;
• disk arrays, which use various RAID levels for data protection;
• HP-supported uninterruptible power supplies (UPS), such as HP PowerTrust, which eliminate failures related to power outage.
Serviceguard at a Glance Using Serviceguard Manager Using Serviceguard Manager Serviceguard Manager is the graphical user interface for Serviceguard. It is available as a “plug-in” to the System Management Homepage (SMH). SMH is a web-based graphical user interface (GUI) that replaces SAM as the system administration GUI as of HP-UX 11i v3 (but you can still run the SAM terminal interface; see “Using SAM” on page 32).
Serviceguard at a Glance Using Serviceguard Manager Configuring Clusters with Serviceguard Manager You can configure clusters and legacy packages in Serviceguard Manager; modular packages must be configured by means of Serviceguard commands (see “How the Package Manager Works” on page 71; Chapter 6, “Configuring Packages and Their Services,” on page 271; and “Configuring a Legacy Package” on page 363). You must have root (UID=0) access to the cluster nodes.
Using SAM
You can use SAM, the System Administration Manager, to do many of the HP-UX system administration tasks described in this manual (that is, tasks such as configuring disks and filesystems that are not specifically Serviceguard tasks). To launch SAM, enter /usr/sbin/sam on the command line. As of HP-UX 11i v3, SAM offers a Terminal User Interface (TUI) which also acts as a gateway to the web-based System Management Homepage (SMH).
Serviceguard at a Glance What are the Distributed Systems Administration Utilities? What are the Distributed Systems Administration Utilities? HP Distributed Systems Administration Utilities (DSAU) simplify the task of managing multiple systems, including Serviceguard clusters.
Serviceguard at a Glance A Roadmap for Configuring Clusters and Packages A Roadmap for Configuring Clusters and Packages This manual presents the tasks you need to perform in order to create a functioning HA cluster using Serviceguard. These tasks are shown in Figure 1-3. Figure 1-3 Tasks in Configuring a Serviceguard Cluster The tasks in Figure 1-3 are covered in step-by-step detail in chapters 4 through 7. HP recommends you gather all the data that is needed for configuration before you start.
2 Understanding Serviceguard Hardware Configurations
This chapter gives a broad overview of how the Serviceguard hardware components work. The following topics are presented:
• Redundancy of Cluster Components
• Redundant Network Components
• Redundant Disk Storage
• Redundant Power Supplies
• Larger Clusters
Refer to the next chapter for information about Serviceguard software components.
Understanding Serviceguard Hardware Configurations Redundancy of Cluster Components Redundancy of Cluster Components In order to provide a high level of availability, a typical cluster uses redundant system components, for example two or more SPUs and two or more independent disks. This redundancy eliminates single points of failure. In general, the more redundancy, the greater your access to applications, data, and supportive services in the event of a failure.
Note that a package that does not access data from a disk on a shared bus can be configured to fail over to as many nodes as you have configured in the cluster (regardless of disk technology). For instance, if a package only runs local executables, it can be configured to fail over to all nodes in the cluster that have local copies of those executables, regardless of the type of disk connectivity.
Understanding Serviceguard Hardware Configurations Redundant Network Components Redundant Network Components To eliminate single points of failure for networking, each subnet accessed by a cluster node is required to have redundant network interfaces. Redundant cables are also needed to protect against cable failures. Each interface card is connected to a different cable, and the cables themselves are connected by a component such as a hub or a bridge.
Understanding Serviceguard Hardware Configurations Redundant Network Components Figure 2-1 Redundant LANs In the figure, a two-node Serviceguard cluster has one bridged net configured with both a primary and a standby LAN card for the data/heartbeat subnet (Subnet A). Another LAN card provides an optional dedicated heartbeat LAN. Note that the primary and standby LAN segments are connected by a hub to provide a redundant data/heartbeat subnet. Each node has its own IP address for this subnet.
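For reference, a node's interfaces in a configuration like this one are later listed in the cluster configuration file. The following sketch uses the standard cluster-file parameter names; the node name, interface names, and IP addresses are examples only, not values taken from the figure:

  NODE_NAME node1
    NETWORK_INTERFACE lan0
      HEARTBEAT_IP 15.13.168.92      # primary interface on the data/heartbeat subnet (Subnet A)
    NETWORK_INTERFACE lan1            # standby interface; no IP address is assigned to a standby
    NETWORK_INTERFACE lan2
      HEARTBEAT_IP 10.10.30.92        # dedicated heartbeat LAN

A standby interface is listed with no IP address; Serviceguard moves the primary interface's addresses to it if the primary interface fails (see "How the Network Manager Works" in Chapter 3).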
NOTE You should verify that network traffic is not too high on the heartbeat/data LAN. If traffic is too high, this LAN might not perform adequately in transmitting heartbeats if the dedicated heartbeat LAN fails.
Replacing Failed Network Cards
Depending on the system configuration, it is possible to replace failed network cards while the cluster is running.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Redundant Disk Storage Each node in a cluster has its own root disk, but each node is also physically connected to several other disks in such a way that more than one node can obtain access to the data and programs associated with a package it is configured for.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage When planning and assigning SCSI bus priority, remember that one node can dominate a bus shared by multiple nodes, depending on what SCSI addresses are assigned to the controller for each node on the shared bus. All SCSI addresses, including the addresses of all interface cards, must be unique for all devices on a shared bus.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage another node until the failing node is halted. Mirroring the root disk can allow the system to continue normal operation when a root disk failure occurs, and help avoid this downtime. Disk Arrays using RAID Levels and Multiple Data Paths An alternate method of achieving protection for your data is to employ a disk array with hardware RAID levels that provide data redundancy, such as RAID Level 1 or RAID Level 5.
set up to trigger a package failover or to report disk failure events to Serviceguard, to another application, or by email. For more information, refer to the manual Using High Availability Monitors (B5736-90046), available at http://docs.hp.com -> High Availability.
Replacement of Failed Disk Mechanisms
Mirroring provides data protection, but after a disk failure, the failed disk must be replaced.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Figure 2-2 Mirrored Disks Connected for High Availability Figure 2-3 below shows a similar cluster with a disk array connected to each node on two I/O channels. See “About Multipathing” on page 43.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Figure 2-3 Cluster with High Availability Disk Array Details on logical volume configuration for Serviceguard are in the chapter “Building an HA Cluster Configuration.” Sample Fibre Channel Disk Configuration In Figure 2-4 below, the root disks are shown with simple mirroring, but the shared storage is now accessed via redundant Fibre Channel switches attached to a disk array.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Figure 2-4 Cluster with Fibre Channel Switched Disk Array This type of configuration uses native HP-UX or other multipathing software; see “About Multipathing” on page 43.
Understanding Serviceguard Hardware Configurations Redundant Power Supplies Redundant Power Supplies You can extend the availability of your hardware by providing battery backup to your nodes and disks. HP-supported uninterruptible power supplies (UPS), such as HP PowerTrust, can provide this protection from momentary power loss. Disks should be attached to power circuits in such a way that mirror copies are attached to different power sources.
Understanding Serviceguard Hardware Configurations Larger Clusters Larger Clusters You can create clusters of up to 16 nodes with Serviceguard. Clusters of up to 16 nodes may be built by connecting individual SPUs via Ethernet. The possibility of configuring a cluster consisting of 16 nodes does not mean that all types of cluster configuration behave in the same way in a 16-node configuration.
Figure 2-5 Eight-Node Active/Standby Cluster
Point to Point Connections to Storage Devices
Some storage devices allow point-to-point connection to a large number of host nodes without using a shared SCSI bus. An example is shown in Figure 2-6, a cluster consisting of eight nodes with a SCSI interconnect. The nodes access shared data on an XP or EMC disk array configured with 16 SCSI I/O ports.
Understanding Serviceguard Hardware Configurations Larger Clusters Figure 2-6 Eight-Node Cluster with XP or EMC Disk Array Fibre Channel switched configurations also are supported using either an arbitrated loop or fabric login topology. For additional information about supported cluster configurations, refer to the HP Unix Servers Configuration Guide, available through your HP representative.
3 Understanding Serviceguard Software Components
This chapter gives a broad overview of how the Serviceguard software components work.
Understanding Serviceguard Software Components Serviceguard Architecture Serviceguard Architecture The following figure shows the main software components used by Serviceguard. This chapter discusses these components in some detail. NOTE Veritas CFS may not yet be supported on the version of HP-UX you are running; see “About Veritas CFS and CVM from Symantec” on page 29.
• /usr/lbin/cmclconfd—Serviceguard Configuration Daemon
• /usr/lbin/cmcld—Serviceguard Cluster Daemon
• /usr/lbin/cmfileassistd—Serviceguard File Management daemon
• /usr/lbin/cmlogd—Serviceguard Syslog Log Daemon
• /usr/lbin/cmlvmd—Cluster Logical Volume Manager Daemon
• /opt/cmom/lbin/cmomd—Cluster Object Manager Daemon
• /usr/lbin/cmsnmpd—Cluster SNMP subagent (optionally running)
• /usr/lbin/cmsrvassistd—Serviceguard
Understanding Serviceguard Software Components Serviceguard Architecture hacl-cfg dgram udp wait root /usr/lbin/cmclconfd cmclconfd -p hacl-cfg stream tcp nowait root /usr/lbin/cmclconfd cmclconfd -c Then force inetd to re-read inetd.conf: /usr/sbin/inetd -c You can check that this did in fact disable Serviceguard by trying the following command: cmquerycl -n nodename where nodename is the name of the local system. If the command fails, you have successfully disabled Serviceguard.
Understanding Serviceguard Software Components Serviceguard Architecture from the expiration of the safety timer, messages will be written to /var/adm/syslog/syslog.log and the kernel’s message buffer, and a system dump is performed.
Understanding Serviceguard Software Components Serviceguard Architecture Cluster Object Manager Daemon: cmomd This daemon is responsible for providing information about the cluster to clients—external products or tools that depend on knowledge of the state of cluster objects. Clients send queries to the object manager and receive responses from it (this communication is done indirectly, through a Serviceguard API).
Understanding Serviceguard Software Components Serviceguard Architecture For services, cmcld monitors the service process and, depending on the number of service retries, cmcld either restarts the service through cmsrvassistd or it causes the package to halt and moves the package to an available alternate node. Quorum Server Daemon: qs Using a quorum server is one way to break a tie and establish a quorum when the cluster is re-forming; the other way is to use a cluster lock.
Understanding Serviceguard Software Components Serviceguard Architecture Utility Daemon: cmlockd Runs on every node on which cmcld is running (though currently not actually used by Serviceguard on HP-UX systems). CFS Components The HP Serviceguard Storage Management Suite offers additional components for interfacing with the Veritas Cluster File System on some current versions of HP-UX (see “About Veritas CFS and CVM from Symantec” on page 29). Documents for the management suite are posted on http://docs.
Understanding Serviceguard Software Components Serviceguard Architecture • Chapter 3 cmvxping - The Serviceguard-to-Veritas daemon activates certain subsystems of the Veritas Clustered File System product. (Only present when Veritas CFS is installed.
Understanding Serviceguard Software Components How the Cluster Manager Works How the Cluster Manager Works The cluster manager is used to initialize a cluster, to monitor the health of the cluster, to recognize node failure if it should occur, and to regulate the re-formation of the cluster when a node joins or leaves the cluster. The cluster manager operates as a daemon process that runs on each node.
Understanding Serviceguard Software Components How the Cluster Manager Works (described further in this chapter, in “How the Package Manager Works” on page 71). Failover packages that were running on nodes that are no longer in the new cluster are transferred to their adoptive nodes. Note that if there is a transitory loss of heartbeat, the cluster may re-form with the same nodes as before.
Understanding Serviceguard Software Components How the Cluster Manager Works Manual Startup of Entire Cluster A manual startup forms a cluster out of all the nodes in the cluster configuration. Manual startup is normally done the first time you bring up the cluster, after cluster-wide maintenance or upgrade, or after reconfiguration. Before startup, the same binary cluster configuration file must exist on all nodes in the cluster.
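For example, you can start the cluster manually with cmruncl; the node names below are placeholders for nodes in your cluster configuration:

  cmruncl -v                        # form the cluster out of all configured nodes
  cmruncl -v -n ftsys9 -n ftsys10   # or form the cluster on just the specified nodes

The -v option produces verbose output while the nodes join the cluster.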
• A node halts because of a package failure.
• A node halts because of a service failure.
• Heavy network traffic prohibited the heartbeat signal from being received by the cluster.
• The heartbeat network failed, and another network is not configured to carry heartbeat.
Typically, re-formation results in a cluster with a different composition.
Understanding Serviceguard Software Components How the Cluster Manager Works possibility of two sub-clusters running at the same time. If the two sub-clusters are of unequal size, the sub-cluster with greater than 50% of the nodes will form the new cluster, and the cluster lock is not used. If you have a two-node cluster, you are required to configure a cluster lock.
Understanding Serviceguard Software Components How the Cluster Manager Works Figure 3-2 Lock Disk or Lock LUN Operation Serviceguard periodically checks the health of the lock disk or LUN and writes messages to the syslog file if the device fails the health check. This file should be monitored for early detection of lock disk problems. If you are using a lock disk, you can choose between two lock disk options—a single or dual lock disk—based on the kind of high availability configuration you are building.
Understanding Serviceguard Software Components How the Cluster Manager Works either node, and a lock disk must be an external disk. For three or four node clusters, the disk should not share a power circuit with 50% or more of the nodes.
Understanding Serviceguard Software Components How the Cluster Manager Works If one of the dual lock disks fails, Serviceguard will detect this when it carries out periodic checking, and it will write a message to the syslog file. After the loss of one of the lock disks, the failure of a cluster node could cause the cluster to go down if the remaining node(s) cannot access the surviving cluster lock disk. Use of the Quorum Server as the Cluster Lock A quorum server can be used in clusters of any size.
Understanding Serviceguard Software Components How the Cluster Manager Works Figure 3-3 Quorum Server Operation The quorum server runs on a separate system, and can provide quorum services for multiple clusters. No Cluster Lock Normally, you should not configure a cluster of three or fewer nodes without a cluster lock. In two-node clusters, a cluster lock is required.
Understanding Serviceguard Software Components How the Package Manager Works How the Package Manager Works Packages are the means by which Serviceguard starts and halts configured applications. A package is a collection of services, disk volumes and IP addresses that are managed by Serviceguard to ensure they are available. Each node in the cluster runs an instance of the package manager; the package manager residing on the cluster coordinator is known as the package coordinator.
Understanding Serviceguard Software Components How the Package Manager Works Failover Packages A failover package starts up on an appropriate node when the cluster starts. A package failover takes place when the package coordinator initiates the start of a package on a new node. A package failover involves both halting the existing package (in the case of a service, network, or resource failure), and starting the new instance of the package.
Understanding Serviceguard Software Components How the Package Manager Works Deciding When and Where to Run and Halt Failover Packages The package configuration file assigns a name to the package and includes a list of the nodes on which the package can run. Failover packages list the nodes in order of priority (i.e., the first node in the list is the highest priority node). In addition, failover packages’ files contain three parameters that determine failover behavior.
Understanding Serviceguard Software Components How the Package Manager Works Figure 3-5 Before Package Switching Figure 3-6 shows the condition where Node 1 has failed and Package 1 has been transferred to Node 2. Package 1's IP address was transferred to Node 2 along with the package. Package 1 continues to be available and is now running on Node 2. Also note that Node 2 can now access both Package1’s disk and Package2’s disk.
Understanding Serviceguard Software Components How the Package Manager Works Figure 3-6 After Package Switching Failover Policy The Package Manager selects a node for a failover package to run on based on the priority list included in the package configuration file together with the failover_policy parameter, also in the configuration file.
Understanding Serviceguard Software Components How the Package Manager Works If you use min_package_node as the value for the failover policy, the package will start up on the node that is currently running the fewest other packages. (Note that this does not mean the lightest load; the only thing that is checked is the number of packages currently running on the node.
Understanding Serviceguard Software Components How the Package Manager Works If a failure occurs, any package would fail over to the node containing fewest running packages, as in Figure 3-8, which shows a failure on node 2: Figure 3-8 Rotating Standby Configuration after Failover NOTE Using the min_package_node policy, when node 2 is repaired and brought back into the cluster, it will then be running the fewest packages, and thus will become the new standby node.
Understanding Serviceguard Software Components How the Package Manager Works Figure 3-9 CONFIGURED_NODE Policy Packages after Failover If you use configured_node as the failover policy, the package will start up on the highest priority node in the node list, assuming that the node is running as a member of the cluster. When a failover occurs, the package will move to the next highest priority node in the list that is available.
Figure 3-10 Automatic Failback Configuration before Failover

Table 3-2 Node Lists in Sample Cluster

  Package Name    NODE_NAME List    FAILOVER POLICY    FAILBACK POLICY
  pkgA            node1, node4      CONFIGURED_NODE    AUTOMATIC
  pkgB            node2, node4      CONFIGURED_NODE    AUTOMATIC
  pkgC            node3, node4      CONFIGURED_NODE    AUTOMATIC

Node1 panics, and after the cluster reforms, pkgA starts running on node4:

Figure 3-11 Automatic Failback Config
Understanding Serviceguard Software Components How the Package Manager Works After rebooting, node 1 rejoins the cluster. At that point, pkgA will be automatically stopped on node 4 and restarted on node 1. Figure 3-12 Automatic Failback Configuration After Restart of Node 1 NOTE Setting the failback_policy to automatic can result in a package failback and application outage during a critical production period.
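As an illustration of the behavior just described (this is only an excerpt, using the modular, lower-case parameter names covered in Chapter 6), pkgA's failover and failback settings would look something like this in its package configuration file:

  package_name      pkgA
  node_name         node1
  node_name         node4
  failover_policy   configured_node
  failback_policy   automatic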
Understanding Serviceguard Software Components How the Package Manager Works For full details of the current parameters and their default values, see Chapter 6, “Configuring Packages and Their Services,” on page 271, and the package configuration file template itself. Using the Event Monitoring Service Basic package resources include cluster nodes, LAN interfaces, and services, which are the individual processes within an application. All of these are monitored by Serviceguard directly.
Understanding Serviceguard Software Components How the Package Manager Works • File system utilization • LAN health Once a monitor is configured as a package resource dependency, the monitor will notify the package manager if an event occurs showing that a resource is down. The package may then be failed over to an adoptive node. The EMS HA Monitors can also be used to report monitored events to a target application such as OpenView IT/Operations for graphical display or for operator notification.
Table 3-3 Package Failover Behavior (Continued)

  Switching Behavior: Package fails over to the node with the fewest active packages.
    Options in Serviceguard Manager:
    • Failover Policy set to minimum package node.
    Parameters in Configuration File:
    • failover_policy set to min_package_node.

  Switching Behavior: Package fails over to the node that is next on the list of nodes. (Default)
    Options in Serviceguard Manager:
    • Failover Policy set to configured node.

  Switching Behavior: All packages switch following a system reset on the node when any service fails. An attempt is first made to reboot the system prior to the system reset.
    Options in Serviceguard Manager:
    • Service Failfast set for all services.
    • Auto Run set for all packages.
    Parameters in Configuration File:
    • service_fail_fast_enabled set to yes for all services.
    • auto_run set to yes for all packages.
Understanding Serviceguard Software Components How Packages Run How Packages Run Packages are the means by which Serviceguard starts and halts configured applications. Failover packages are also units of failover behavior in Serviceguard. A package is a collection of services, disk volumes and IP addresses that are managed by Serviceguard to ensure they are available. There can be a maximum of 150 packages per cluster and a total of 900 services per cluster.
Understanding Serviceguard Software Components How Packages Run package, that node switching is disabled for the package on particular nodes, or that the package has a dependency that is not being met. When a package has failed on one node and is enabled to switch to another node, it will start up automatically in a new location where its dependencies are met. This process is known as package switching, or remote switching.
Figure 3-13 Legacy Package Time Line Showing Important Events
The following are the most important moments in a package’s life:
1. Before the control script starts. (For modular packages, this is the master control script.)
2. During run script execution. (For modular packages, during control script execution to start the package.)
3. While services are running
4.
Understanding Serviceguard Software Components How Packages Run Before the Control Script Starts First, a node is selected. This node must be in the package’s node list, it must conform to the package’s failover policy, and any resources required by the package must be available on the chosen node. One resource is the subnet that is monitored for the package. If the subnet is not available, the package cannot start on this node.
Understanding Serviceguard Software Components How Packages Run Figure 3-14 Package Time Line (Legacy Package) At any step along the way, an error will result in the script exiting abnormally (with an exit code of 1). For example, if a package service is unable to be started, the control script will exit with an error. NOTE This diagram is specific to legacy packages. Modular packages also run external scripts and “pre-scripts” as explained above.
Understanding Serviceguard Software Components How Packages Run the package is running. If a number of Restarts is specified for a service in the package control script, the service may be restarted if the restart count allows it, without re-running the package run script. Normal and Abnormal Exits from the Run Script Exit codes on leaving the run script determine what happens to the package next.
Understanding Serviceguard Software Components How Packages Run legacy package; for more information about configuring services in modular packages, see the discussion starting on page 289, and the comments in the package configuration template file.
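For example, a service in a modular package is described by a small group of parameters in the package configuration file. The sketch below is hypothetical; the service name, command path, and values are assumptions, not part of any template:

  service_name                 db_monitor
  service_cmd                  "/usr/local/bin/db_monitor.sh"
  service_restart              3      # how many times the service may be restarted in place
  service_fail_fast_enabled    no
  service_halt_timeout         300    # seconds allowed for the service to halt before it is killed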
Understanding Serviceguard Software Components How Packages Run When a Service, Subnet, or Monitored Resource Fails, or a Dependency is Not Met What happens when something goes wrong? If a service fails and there are no more restarts, if a subnet fails and there are no standbys, if a configured resource fails, or if a configured dependency on a special-purpose package is not met, then a failover package will halt on its current node and, depending on the setting of the package switching flags, may be resta
Understanding Serviceguard Software Components How Packages Run NOTE If you use cmhaltpkg command with the -n option, the package is halted only if it is running on that node. The cmmodpkg command cannot be used to halt a package, but it can disable switching either on particular nodes or on all nodes. A package can continue running when its switching has been disabled, but it will not be able to start on other nodes if it stops running on its current node.
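The commands below show typical uses of cmhaltpkg and cmmodpkg; the package and node names are placeholders:

  cmhaltpkg pkg1                # halt pkg1 wherever it is running
  cmhaltpkg -n node1 pkg1       # halt pkg1 only if it is running on node1
  cmmodpkg -d pkg1              # disable switching for pkg1 on all nodes
  cmmodpkg -d -n node2 pkg1     # disable switching for pkg1 on node2 only
  cmmodpkg -e pkg1              # re-enable switching for pkg1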
Understanding Serviceguard Software Components How Packages Run Figure 3-15 Legacy Package Time Line for Halt Script Execution At any step along the way, an error will result in the script exiting abnormally (with an exit code of 1). Also, if the halt script execution is not complete before the time specified in the HALT_SCRIPT_TIMEOUT, the package manager will kill the script. During halt script execution, messages are written to a log file.
Understanding Serviceguard Software Components How Packages Run • 1—abnormal exit, also known as no_restart exit. The package did not halt normally. Services are killed, and the package is disabled globally. It is not disabled on the current node, however. • Timeout—Another type of exit occurs when the halt_script_timeout is exceeded. In this scenario, the package is killed and disabled globally. It is not disabled on the current node, however.
Table 3-4 Error Conditions and Package Movement for Failover Packages

  Package Error Condition: Run Script Exit 2
    Node Failfast Enabled: YES;  Service Failfast Enabled: Either Setting
    HP-UX Status on Primary after Error: system reset
    Halt script runs after Error or Exit: No
    Package Allowed to Run on Primary Node after Error: N/A (system reset)
    Package Allowed to Run on Alternate Node: Yes

  Package Error Condition: Run Script Exit 2
    Node Failfast Enabled: NO;  Service Failfast Enabled: Either Setting
    HP-UX Status on Primary after Error: Running
    Halt script runs after Error or Exit: No
    Package Allowed to Run on Primary Node after Error: No
    Package Allowed to Run on Alternate Node: Yes

  Package Error Condition: Run Script Timeout
    Node Failfast Enabled: YES

  Package Error Condition: Service Failure
    Node Failfast Enabled: Either Setting;  Service Failfast Enabled: NO
    HP-UX Status on Primary after Error: Running
    Halt script runs after Error or Exit: Yes
    Package Allowed to Run on Primary Node after Error: No
    Package Allowed to Run on Alternate Node: Yes

  Package Error Condition: Loss of Network
    Node Failfast Enabled: YES;  Service Failfast Enabled: Either Setting
    HP-UX Status on Primary after Error: system reset
    Halt script runs after Error or Exit: No
    Package Allowed to Run on Primary Node after Error: N/A (system reset)
    Package Allowed to Run on Alternate Node: Yes

  Package Error Condition: Loss of Network
    Node Failfast Enabled: NO;  Service Failfast Enabled: Either Setting
Understanding Serviceguard Software Components How the Network Manager Works How the Network Manager Works The purpose of the network manager is to detect and recover from network card and cable failures so that network services remain highly available to clients. In practice, this means assigning IP addresses for each package to the primary LAN interface card on the node where the package is running and monitoring the health of all interfaces, switching them when necessary.
Understanding Serviceguard Software Components How the Network Manager Works Both stationary and relocatable IP addresses will switch to a standby LAN interface in the event of a LAN card failure. In addition, relocatable addresses (but not stationary addresses) can be taken over by an adoptive node if control of the package is transferred. This means that applications can access the package via its relocatable address without knowing which node the package currently resides on.
Understanding Serviceguard Software Components How the Network Manager Works Monitoring LAN Interfaces and Detecting Failure At regular intervals, Serviceguard polls all the network interface cards specified in the cluster configuration file. Network failures are detected within each single node in the following manner. One interface on the node is assigned to be the poller.
Understanding Serviceguard Software Components How the Network Manager Works This option is not suitable for all environments. Before choosing it, be sure these conditions are met: — All bridged nets in the cluster should have more than two interfaces each. — Each primary interface should have at least one standby interface, and it should be connected to a standby switch. — The primary switch should be directly connected to its standby.
Understanding Serviceguard Software Components How the Network Manager Works Within the Ethernet family, local switching is supported in the following configurations: • 1000Base-SX and 1000Base-T • 1000Base-T or 1000BaseSX and 100Base-T On HP-UX 11i, however, Jumbo Frames can only be used when the 1000Base-T or 1000Base-SX cards are configured. The 100Base-T and 10Base-T do not support Jumbo Frames.
Understanding Serviceguard Software Components How the Network Manager Works Figure 3-16 Cluster Before Local Network Switching Node 1 and Node 2 are communicating over LAN segment 2. LAN segment 1 is a standby. In Figure 3-17, we see what would happen if the LAN segment 2 network interface card on Node 1 were to fail.
Understanding Serviceguard Software Components How the Network Manager Works Figure 3-17 Cluster After Local Network Switching As the standby interface takes over, IP addresses will be switched to the hardware path associated with the standby interface. The switch is transparent at the TCP/IP level. All applications continue to run on their original nodes. During this time, IP traffic on Node 1 will be delayed as the transfer occurs.
Understanding Serviceguard Software Components How the Network Manager Works Figure 3-18 Local Switching After Cable Failure Local network switching will work with a cluster containing one or more nodes. You may wish to design a single-node cluster in order to take advantage of this local network switching feature in situations where you need only one node and do not wish to set up a more complex cluster.
Understanding Serviceguard Software Components How the Network Manager Works Remote Switching A remote switch (that is, a package switch) involves moving packages and their associated IP addresses to a new system. The new system must already have the same subnetwork configured and working properly, otherwise the packages will not be started. With remote switching, TCP connections are lost. TCP applications must reconnect to regain connectivity; this is not handled automatically.
Understanding Serviceguard Software Components How the Network Manager Works recovery for environments which require high availability. Port aggregation capability is sometimes referred to as link aggregation or trunking. APA is also supported on dual-stack kernel. Once enabled, each link aggregate can be viewed as a single logical link of multiple physical ports with only one IP and MAC address.
Understanding Serviceguard Software Components How the Network Manager Works Both the Single and Dual ported LANs in the non-aggregated configuration have four LAN cards, each associated with a separate non-aggregated IP address and MAC address, and each with its own LAN name (lan0, lan1, lan2, lan3). When these ports are aggregated all four ports are associated with a single IP address and MAC address.
Understanding Serviceguard Software Components How the Network Manager Works failover of VLAN interfaces when failure is detected. Failure of a VLAN interface is typically the result of the failure of the underlying physical NIC port or aggregated (APA) ports. Configuration Restrictions HP-UX allows up to 1024 VLANs to be created from a physical NIC port.
Understanding Serviceguard Software Components How the Network Manager Works 1. VLAN heartbeat networks must be configured on separate physical NICs or APA aggregates, to avoid single points of failure. 2. Heartbeats are still recommended on all cluster networks, including VLANs. 3. If you are using VLANs, but decide not to use VLANs for heartbeat networks, heartbeats are recommended for all other physical networks or APA aggregates specified in the cluster configuration file.
Understanding Serviceguard Software Components Volume Managers for Data Storage Volume Managers for Data Storage A volume manager is a tool that lets you create units of disk storage known as storage groups. Storage groups contain logical volumes for use on single systems and in high availability clusters. In Serviceguard clusters, storage groups are activated by package control scripts.
Understanding Serviceguard Software Components Volume Managers for Data Storage For instructions on migrating a system to agile addressing, see the white paper Migrating from HP-UX 11i v2 to HP-UX 11i v3 at http://docs.hp.com. NOTE It is possible, though not a best practice, to use legacy DSFs (that is, DSFs using the older naming convention) on some nodes after migrating to agile addressing on others; this allows you to migrate different nodes at different times, if necessary.
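On HP-UX 11i v3 you can see how the agile (persistent) device special files map to the legacy names before deciding when to migrate each node. This is a sketch; the device name shown is an example:

  ioscan -m dsf                      # map all persistent DSFs to their legacy DSFs
  ioscan -m dsf /dev/rdisk/disk14    # show the legacy DSFs for a single persistent DSF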
Each of two nodes also has two (non-shared) internal disks which are used for the root file system, swap, etc. Each shared storage unit has three disks. The device file names of the three disks on one of the two storage units are c0t0d0, c0t1d0, and c0t2d0. On the other, they are c1t0d0, c1t1d0, and c1t2d0.
Understanding Serviceguard Software Components Volume Managers for Data Storage Figure 3-21 Mirrored Physical Disks Figure 3-22 shows the mirrors configured into LVM volume groups, shown in the figure as /dev/vgpkgA and /dev/vgpkgB. The volume groups are activated by Serviceguard packages for use by highly available applications.
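As a sketch of how storage like this is built (the full procedure is in the chapter "Building an HA Cluster Configuration"), the commands below create one of the volume groups from a disk in each storage unit and mirror a logical volume across the two units with Mirrordisk/UX. The size and the group-file minor number are assumptions; only the disk and volume group names come from this example:

  pvcreate -f /dev/rdsk/c0t1d0
  pvcreate -f /dev/rdsk/c1t1d0
  mkdir /dev/vgpkgA
  mknod /dev/vgpkgA/group c 64 0x010000            # minor number must be unique on the node
  vgcreate /dev/vgpkgA /dev/dsk/c0t1d0 /dev/dsk/c1t1d0
  lvcreate -L 1024 -n lvol1 /dev/vgpkgA            # 1024 MB logical volume
  lvextend -m 1 /dev/vgpkgA/lvol1 /dev/dsk/c1t1d0  # add a mirror copy on the disk in the other storage unit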
Understanding Serviceguard Software Components Volume Managers for Data Storage Examples of Storage on Disk Arrays Figure 3-23 shows an illustration of storage configured on a disk array. Physical disks are configured by an array utility program into logical units or LUNs which are then seen by the operating system. Figure 3-23 Physical Disks Combined into LUNs NOTE LUN definition is normally done using utility programs provided by the disk array manufacturer.
Understanding Serviceguard Software Components Volume Managers for Data Storage Figure 3-24 Multiple Paths to LUNs Finally, the multiple paths are configured into volume groups as shown in Figure 3-25.
Understanding Serviceguard Software Components Volume Managers for Data Storage Types of Volume Manager Serviceguard allows a choice of volume managers for data storage: • HP-UX Logical Volume Manager (LVM) and (optionally) Mirrordisk/UX • Veritas Volume Manager for HP-UX (VxVM)—Base and add-on Products • Veritas Cluster Volume Manager for HP-UX (CVM), if available (see “About Veritas CFS and CVM from Symantec” on page 29) Separate sections in Chapters 5 and 6 explain how to configure cluster storage
Understanding Serviceguard Software Components Volume Managers for Data Storage Veritas Volume Manager (VxVM) The Base Veritas Volume Manager for HP-UX (Base-VXVM) is provided at no additional cost with HP-UX 11i. This includes basic volume manager features, including a Java-based GUI, known as VEA. It is possible to configure cluster storage for Serviceguard with only Base-VXVM. However, only a limited set of features is available.
Understanding Serviceguard Software Components Volume Managers for Data Storage Veritas Cluster Volume Manager (CVM) NOTE CVM (and CFS - Cluster File System) are supported on some, but not all, current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability - > Serviceguard). You may choose to configure cluster storage with the Veritas Cluster Volume Manager (CVM) instead of the Volume Manager (VxVM).
Understanding Serviceguard Software Components Volume Managers for Data Storage CVM 4.1 and later can be used with Veritas Cluster File System (CFS) in Serviceguard. Several of the HP Serviceguard Storage Management Suite bundles include features to enable both CVM and CFS.
Redundant Heartbeat Subnet Required
HP recommends that you configure all subnets that connect cluster nodes as heartbeat networks; this increases protection against multiple faults at no additional cost. Heartbeats are configured differently depending on whether you are using CVM 3.5 or 4.1 and later.
Table 3-5 Pros and Cons of Volume Managers with Serviceguard

  Product: Logical Volume Manager (LVM), Mirrordisk/UX, Shared Logical Volume Manager (SLVM)
  Advantages:
  • Software is provided with all versions of HP-UX.
  • Provides up to 3-way mirroring using optional Mirrordisk/UX software.
  • Dynamic multipathing (DMP) is active by default as of HP-UX 11i v3.

  Product: Base-VxVM; Veritas Volume Manager—Full VxVM product: B9116AA (VxVM 3.5), B9116BA (VxVM 4.1), B9116CA (VxVM 5.0)
  Advantages:
  • Software is supplied free with HP-UX 11i releases.
  • Java-based administration through graphical user interface.
  Tradeoffs:
  • Cannot be used for a cluster lock
  • root/boot disk supported only on VxVM 3.

  Product: Veritas Cluster Volume Manager—B9117AA (CVM 3.5), B9117BA (CVM 4.1), B9117CA (CVM 5.0)
  Advantages:
  • Provides volume configuration propagation.
  • Supports cluster shareable disk groups.
  • Package startup time is faster than with VxVM.
  Tradeoffs:
  • Disk groups must be configured on a master node
  • CVM can only be used with up to 8 cluster nodes.
Understanding Serviceguard Software Components Responses to Failures Responses to Failures Serviceguard responds to different kinds of failures in specific ways. For most hardware failures, the response is not user-configurable, but for package and service failures, you can choose the system’s response, within limits.
Understanding Serviceguard Software Components Responses to Failures 2. If the node cannot get a quorum (if it cannot get the cluster lock) then 3. The node halts (system reset). Example Situation. Assume a two-node cluster, with Package1 running on SystemA and Package2 running on SystemB. Volume group vg01 is exclusively activated on SystemA; volume group vg02 is exclusively activated on SystemB. Package IP addresses are assigned to SystemA and SystemB respectively. Failure.
Understanding Serviceguard Software Components Responses to Failures For more information on cluster failover, see the white paper Optimizing Failover Time in a Serviceguard Environment at http://www.docs.hp.com->High Availability->Serviceguard->White Papers.
Understanding Serviceguard Software Components Responses to Failures Serviceguard does not respond directly to power failures, although a loss of power to an individual cluster component may appear to Serviceguard like the failure of that component, and will result in the appropriate switching behavior. Power protection is provided by HP-supported uninterruptible power supplies (UPS), such as HP PowerTrust.
Understanding Serviceguard Software Components Responses to Failures NOTE In a very few cases, Serviceguard will attempt to reboot the system before a system reset when this behavior is specified. If there is enough time to flush the buffers in the buffer cache, the reboot succeeds, and a system reset does not take place. Either way, the system will be guaranteed to come down within a predetermined number of seconds.
4 Planning and Documenting an HA Cluster
Building a Serviceguard cluster begins with a planning phase in which you gather and record information about all the hardware and software components of the configuration.
General Planning
A clear understanding of your high availability objectives will help you to define your hardware requirements and design your system. Use the following questions as a guide for general planning:
1. What applications must continue to be available in the event of a failure?
2. What system resources (processing power, networking, SPU, memory, disk space) are needed to support these applications?
3.
Planning and Documenting an HA Cluster General Planning additional disk hardware for shared data storage. If you intend to expand your cluster without the need to bring it down, careful planning of the initial configuration is required. Use the following guidelines: • Remember the rules for cluster locks when considering expansion. A one-node cluster does not require a cluster lock. A two-node cluster must have a cluster lock. In clusters larger than 3 nodes, a cluster lock is strongly recommended.
Planning and Documenting an HA Cluster Hardware Planning Hardware Planning Hardware planning requires examining the physical hardware itself. One useful procedure is to sketch the hardware configuration in a diagram that shows adapter cards and buses, cabling, disks and peripherals. A sample diagram for a two-node cluster is shown in Figure 4-1. NOTE Under agile addressing, the storage units in this example would have names such as disk1, disk2, disk3, etc.
Planning and Documenting an HA Cluster Hardware Planning Create a similar sketch for your own cluster, and record the information on the Hardware Worksheet. Indicate which device adapters occupy which slots, and determine the bus address for each adapter. Update the details as you do the cluster configuration (described in Chapter 5). Use one form for each SPU.
Planning and Documenting an HA Cluster Hardware Planning Serviceguard communication relies on the exchange of DLPI (Data Link Provider Interface) traffic at the data link layer and the UDP/TCP (User Datagram Protocol/Transmission Control Protocol) traffic at the Transport layer between cluster nodes. LAN Information While a minimum of one LAN interface per subnet is required, at least two LAN interfaces, one primary and one or more standby, are needed to eliminate single points of network failure.
Planning and Documenting an HA Cluster Hardware Planning When there is a primary and a standby network card, Serviceguard needs to determine when a card has failed, so it knows whether to fail traffic over to the other card. The configuration file specifies one of two ways to decide when the network interface card has failed: • INOUT • INONLY_OR_INOUT The default is INOUT. See “Monitoring LAN Interfaces and Detecting Failure” on page 100 for more information.
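In the cluster configuration file this choice typically appears as a single parameter, for example:

  NETWORK_FAILURE_DETECTION INOUT              # the default
  # NETWORK_FAILURE_DETECTION INONLY_OR_INOUT  # the alternative setting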
Planning and Documenting an HA Cluster Hardware Planning SCSI address must be uniquely set on the interface cards in all four systems, and must be high priority addresses.
Planning and Documenting an HA Cluster Hardware Planning Disk I/O Information This part of the worksheet lets you indicate where disk device adapters are installed. Enter the following items on the worksheet for each disk connected to each disk device adapter on the node: Bus Type Indicate the type of bus. Supported busses are Fibre Channel and SCSI. Slot Number Indicate the slot number in which the interface card is inserted in the backplane of the computer.
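One way to gather the bus, hardware path, and device file information for this part of the worksheet is with ioscan, for example:

  ioscan -fnC disk       # disks, with hardware paths and device file names
  ioscan -fnC ext_bus    # bus adapters (interface cards) and their hardware paths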
Planning and Documenting an HA Cluster Hardware Planning Hardware Configuration Worksheet The following worksheet will help you organize and record your specific cluster hardware configuration. This worksheet is an example; blank worksheets are in Appendix F. Make as many copies as you need. Complete the worksheet and keep it for future reference.
Planning and Documenting an HA Cluster Power Supply Planning Power Supply Planning There are two sources of power for your cluster which you will have to consider in your design: line power and uninterruptible power sources (UPS). Loss of a power circuit should not bring down the cluster. Frequently, servers, mass storage devices, and other hardware have two or three separate power supplies, so they can survive the loss of power to one or more power supplies or power circuits.
Planning and Documenting an HA Cluster Power Supply Planning Other Unit Enter the number of any other unit. Power Supply Enter the power supply unit number of the UPS to which the host or other device is connected. Be sure to follow UPS and cabinet power limits as well as SPU power limits. Power Supply Configuration Worksheet The following worksheet will help you organize and record your specific power supply configuration. This worksheet is an example; blank worksheets are in Appendix F.
Planning and Documenting an HA Cluster Power Supply Planning Unit Name __________________________ Power Supply _____________________ Unit Name __________________________ Power Supply _____________________
Planning and Documenting an HA Cluster Cluster Lock Planning Cluster Lock Planning The purpose of the cluster lock is to ensure that only one new cluster is formed in the event that exactly half of the previously clustered nodes try to form a new cluster. It is critical that only one new cluster is formed and that it alone has access to the disks specified in its packages. You can specify an LVM lock disk, a lock LUN, or a quorum server as the cluster lock.
Planning and Documenting an HA Cluster Cluster Lock Planning Planning for Expansion Bear in mind that a cluster with more than 4 nodes cannot use a lock disk or lock LUN. Thus, if you plan to add enough nodes to bring the total to more than 4, you should use a quorum server. Using a Quorum Server The Quorum Server is described in detail under “Use of the Quorum Server as the Cluster Lock” on page 69. See also “Cluster Lock” on page 65.
Planning and Documenting an HA Cluster Cluster Lock Planning Quorum Server Worksheet Use the Quorum Server Worksheet to identify a quorum server for use with one or more clusters. You should also enter quorum server host and timing parameters on the Cluster Configuration Worksheet. Blank worksheets are in Appendix F. Make as many copies as you need. Fill out the worksheet and keep it for future reference. On the QS worksheet, enter the following: Quorum Server Host Enter the host name for the quorum server.
Planning and Documenting an HA Cluster Cluster Lock Planning Host Names ____________________________________________
Planning and Documenting an HA Cluster LVM Planning LVM Planning You can create storage groups using the HP-UX Logical Volume Manager (LVM), or using Veritas VxVM software (and CVM if available) as described in the next section. When designing your disk layout using LVM, you should consider the following: • The root disk should belong to its own volume group.
Planning and Documenting an HA Cluster LVM Planning LVM Worksheet The following worksheet will help you organize and record your specific physical disk configuration. This worksheet is an example; blank worksheets are in Appendix F. Make as many copies as you need. Fill out the worksheet and keep it for future reference. This worksheet only includes volume groups and physical volumes.
Planning and Documenting an HA Cluster LVM Planning Physical Volume Name: _____________________________________________________ Physical Volume Name: _____________________________________________________ Physical Volume Name: _____________________________________________________ Physical Volume Name: _____________________________________________________ Physical Volume Name: _____________________________________________________
Planning and Documenting an HA Cluster CVM and VxVM Planning CVM and VxVM Planning You can create storage groups using the HP-UX Logical Volume Manager (LVM, described in the previous section), or using Veritas VxVM and CVM software. NOTE CVM (and CFS - Cluster File System) are supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability -> Serviceguard).
Planning and Documenting an HA Cluster CVM and VxVM Planning • A cluster lock disk must be configured into an LVM volume group; you cannot use a VxVM or CVM disk group. (See “Cluster Lock Planning” on page 144 for information about cluster lock options.) • VxVM disk group names should not be entered into the cluster configuration file. These names are not inserted into the cluster configuration file by cmquerycl.
Planning and Documenting an HA Cluster Cluster Configuration Planning Cluster Configuration Planning A cluster should be designed to provide the quickest possible recovery from failures. The actual time required to recover from a failure depends on several factors: • The length of the cluster heartbeat interval and node timeout. See the parameter descriptions for HEARTBEAT_INTERVAL and NODE_TIMEOUT under “Cluster Configuration Parameters” on page 154 for recommendations.
Planning and Documenting an HA Cluster Cluster Configuration Planning Heartbeat Subnet and Re-formation Time The speed of cluster re-formation is partially dependent on the type of heartbeat network that is used. If two or more heartbeat subnets are used, the one with the fastest failover time is used. Cluster Configuration Parameters You need to define a set of cluster parameters. These are stored in the binary cluster configuration file, which is distributed to each node in the cluster.
Planning and Documenting an HA Cluster Cluster Configuration Planning QS_HOST The name or IP address of a host system outside the current cluster that is providing quorum server functionality. This parameter is only used when you employ a quorum server for tie-breaking services in the cluster. QS_POLLING_INTERVAL The time (in microseconds) between attempts to contact the quorum server to make sure it is running. Default is 300,000,000 microseconds (5 minutes).
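As an illustration only (the host name below is hypothetical), the quorum server entries in the cluster configuration file might look like this, using the default polling interval and an optional timeout extension:
QS_HOST qs-host.example.com
QS_POLLING_INTERVAL 300000000
QS_TIMEOUT_EXTENSION 2000000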
Planning and Documenting an HA Cluster Cluster Configuration Planning NODE_NAME The hostname of each system that will be a node in the cluster. Do not use the full domain name. For example, enter ftsys9, not ftsys9.cup.hp.com. A Serviceguard cluster can contain up to 16 nodes (though not in all third-party configurations; see “Veritas Cluster Volume Manager (CVM)” on page 119, and the latest Release Notes for your version of Serviceguard).
Planning and Documenting an HA Cluster Cluster Configuration Planning Heartbeat configuration requirements: A minimum Serviceguard configuration on HP-UX 11i v2 or 11i v3 needs two network interface cards for the heartbeat in all cases, using one of the following configurations: • Two heartbeat subnets; or • One heartbeat subnet with a standby; or • One heartbeat subnet using APA with two physical ports in hot standby mode or LAN monitor mode.
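As a sketch of one of these configurations (interface names and IP addresses are examples only), a node entry with one heartbeat subnet plus a standby might look like this in the cluster configuration file; the standby interface is listed with no IP address:
NODE_NAME ftsys9
  NETWORK_INTERFACE lan0
  HEARTBEAT_IP 192.6.143.10
  NETWORK_INTERFACE lan1
  STATIONARY_IP 192.6.144.10
  NETWORK_INTERFACE lan2
# lan2 has no IP address and serves as the standby for lan0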
Planning and Documenting an HA Cluster Cluster Configuration Planning The use of a private heartbeat network is not advisable if you plan to use Remote Procedure Call (RPC) protocols and services. RPC assumes that each network adapter device or I/O card is connected to a route-able network. An isolated or private heartbeat LAN is not route-able, and could cause an RPC request-reply, directed to that LAN, to risk time-out without being serviced.
Planning and Documenting an HA Cluster Cluster Configuration Planning You cannot create a dual cluster-lock configuration using LUNs. FIRST_CLUSTER_LOCK_PV, SECOND_CLUSTER_LOCK_PV The name of the physical volume within the Lock Volume Group that will have the cluster lock written on it. Used only if a lock disk is used for tie-breaking services. This parameter is FIRST_CLUSTER_LOCK_PV for the first physical lock volume and SECOND_CLUSTER_LOCK_PV for the second physical lock volume.
Planning and Documenting an HA Cluster Cluster Configuration Planning NODE_TIMEOUT The time, in microseconds, after which a node may decide that another node has become unavailable and initiate cluster reformation. Maximum value: 60,000,000 microseconds (60 seconds). Minimum value: 2 * HEARTBEAT_INTERVAL Default value: 2,000,000 microseconds (2 seconds).
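For example, typical settings (expressed in microseconds, as in the configuration file) that satisfy the rule that NODE_TIMEOUT must be at least twice HEARTBEAT_INTERVAL are:
HEARTBEAT_INTERVAL 1000000
NODE_TIMEOUT 2000000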
Planning and Documenting an HA Cluster Cluster Configuration Planning The amount of time a node waits before it stops trying to join a cluster during automatic cluster startup. In the cluster configuration file, this parameter is AUTO_START_TIMEOUT. All nodes wait this amount of time for other nodes to begin startup before the cluster completes the operation. The time should be selected based on the slowest boot time in the cluster.
Planning and Documenting an HA Cluster Cluster Configuration Planning Access Control Policies (also known as Role Based Access) For each policy, specify USER_NAME, USER_HOST, and USER_ROLE. Policies set in the configuration file of a cluster and its packages must not be conflicting or redundant. For more information, see “Access Roles” on page 204. FAILOVER_OPTIMIZATION You will only see this parameter if you have installed Serviceguard Extension for Faster Failover, a separately purchased product.
Planning and Documenting an HA Cluster Cluster Configuration Planning Name and Nodes: =============================================================================== Cluster Name: ___ourcluster_______________ Node Names: ____node1_________________ ____node2_________________ Maximum Configured Packages: ______12________ =============================================================================== Quorum Server Data: =============================================================================== Quorum S
Planning and Documenting an HA Cluster Cluster Configuration Planning | Disk Unit No: ________ | Power Supply No: ________ =========================================================================== Cluster Lock LUN: Pathname on Node 1: ___________________ Pathname on Node 2: ___________________ Pathname on Node 3: ___________________ Pathname on Node 4: ___________________ =========================================================================== Timing Parameters: ====================================
Planning and Documenting an HA Cluster Package Configuration Planning Package Configuration Planning Planning for packages involves assembling information about each group of highly available services. NOTE As of Serviceguard A.11.18, there is a new and simpler way to configure packages.
Planning and Documenting an HA Cluster Package Configuration Planning failed node are deactivated on the failed node and activated on the adoptive node. In order for this to happen, you must configure the volume groups so that they can be transferred from the failed node to the adoptive node.
Planning and Documenting an HA Cluster Package Configuration Planning NOTE Do not use /etc/fstab to mount file systems that are used by Serviceguard packages. Planning Veritas Cluster Volume Manager (CVM) and Cluster File System (CFS) NOTE CVM and CFS are supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability -> Serviceguard).
Planning and Documenting an HA Cluster Package Configuration Planning CVM 4.1 and later and the SG-CFS-pkg require you to configure multiple heartbeat networks, or a single heartbeat with a standby. Using APA, Infiniband, or VLAN interfaces as the heartbeat network is not supported. CVM 4.1 and later with CFS CFS (Veritas Cluster File System) is supported for use with Veritas Cluster Volume Manager Version 4.1 and later. The system multi-node package SG-CFS-pkg manages the cluster’s volumes.
Planning and Documenting an HA Cluster Package Configuration Planning CAUTION Once you create the disk group and mount point packages, it is critical that you administer the cluster with the cfs commands, including cfsdgadm, cfsmntadm, cfsmount, and cfsumount. If you use the general commands such as mount and umount, it could cause serious problems such as writing to the local file system instead of the cluster file system.
Planning and Documenting an HA Cluster Package Configuration Planning When adding packages, be sure not to exceed the value of max_configured_packages as defined in the cluster configuration file. (see “Cluster Configuration Parameters” on page 154). You can modify this parameter while the cluster is running if you need to.
Planning and Documenting an HA Cluster Package Configuration Planning Serviceguard will start up resource monitoring for automatic resources automatically when the Serviceguard cluster daemon starts up on the node. Serviceguard will not attempt to start deferred resource monitoring during node startup, but will start monitoring these resources when the package runs. The following is an example of how to configure deferred and automatic resources.
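For instance, entries along the following lines (the EMS resource names are illustrative) configure one deferred and one automatic resource in a legacy package configuration file; deferred resources must also be listed with DEFERRED_RESOURCE_NAME in the legacy package control script, as noted later in this chapter:
RESOURCE_NAME              /net/interfaces/lan/status/lan0
RESOURCE_POLLING_INTERVAL  60
RESOURCE_START             DEFERRED
RESOURCE_UP_VALUE          = UP
RESOURCE_NAME              /net/interfaces/lan/status/lan1
RESOURCE_POLLING_INTERVAL  60
RESOURCE_START             AUTOMATIC
RESOURCE_UP_VALUE          = UP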
Planning and Documenting an HA Cluster Package Configuration Planning In Serviceguard A.11.17, package dependencies are supported only for use with certain applications specified by HP, such as the multi-node and system multi-node packages that HP supplies for use with Veritas Cluster File System (CFS) on systems that support it. As of Serviceguard A.11.
Planning and Documenting an HA Cluster Package Configuration Planning — a failover package whose failover_policy is configured_node. • pkg2 cannot be a failover package whose failover_policy is min_package_node. • pkg2’s node list (see node_name, page 282) must contain all of the nodes on pkg1’s. — Preferably the nodes should be listed in the same order if the dependency is between packages whose failover_policy is configured_node; cmcheckconf and cmapplyconf will warn you if they are not.
Planning and Documenting an HA Cluster Package Configuration Planning successor of a package depends on that package; in our example, pkg1 is a successor of pkg2; conversely pkg2 can be referred to as a predecessor of pkg1.) Dragging Rules The priority parameter gives you a way to influence the startup, failover, and failback behavior of a set of failover packages that have a configured_node failover_policy, when one or more of those packages depend on another or others.
Planning and Documenting an HA Cluster Package Configuration Planning HP recommends assigning values in increments of 20 so as to leave gaps in the sequence; otherwise you may have to shuffle all the existing priorities when assigning priority to a new package. no_priority, the default, is treated as a lower priority than any numerical value. 3.
Planning and Documenting an HA Cluster Package Configuration Planning If pkg1 depends on pkg2, and pkg1’s priority is lower than or equal to pkg2’s, pkg2’s node order dominates. Assuming pkg2’s node order is node1, node2, node3, then: • On startup: — pkg2 will start on node1, or node2 if node1 is not available or does not at present meet all of its dependencies, etc.
Planning and Documenting an HA Cluster Package Configuration Planning — if pkg2 has failed back to node1 and node1 does not meet all of pkg1’s dependencies, pkg1 will halt. If pkg1 depends on pkg2, and pkg1’s priority is higher than pkg2’s, pkg1’s node order dominates. Assuming pkg1’s node order is node1, node2, node3, then: • On startup: — pkg1 will select node1 to start on. — pkg2 will start on node1, provided it can run there (no matter where node1 appears on pkg2’s node_name list).
Planning and Documenting an HA Cluster Package Configuration Planning But you also need to weigh the relative importance of the packages. If pkg2 runs a database that is central to your business, you probably want it to run undisturbed, no matter what happens to application packages that depend on it. In this case, the database package should have the highest priority. Note that, if no priorities are set, the dragging rules favor a package that is depended on over a package that depends on it.
Planning and Documenting an HA Cluster Package Configuration Planning • During package execution, after volume-groups and file systems are activated, and IP addresses are assigned, and before the service and resource functions are executed; and again, in the reverse order, on package shutdown. These scripts are invoked by external_script (see page 297). The scripts are also run when the package is validated by cmcheckconf and cmapplyconf, and must have an entry point for validation; see below.
Planning and Documenting an HA Cluster Package Configuration Planning uses a parameter PEV_MONITORING_INTERVAL, defined in the package configuration file, to periodically poll the application it wants to monitor; for example:
PEV_MONITORING_INTERVAL 60
At validation time, the sample script makes sure the PEV_MONITORING_INTERVAL and the monitoring service are configured properly; at start and stop time it prints out the interval to the log file.
#!/bin/sh
# Source utility functions.
Planning and Documenting an HA Cluster Package Configuration Planning
    sg_log 0 "ERROR: PEV_MONITORING_INTERVAL value ($PEV_MONITORING_INTERVAL) not within legal limits!"
    ret=1
fi
# check monitoring service we are expecting for this package is configured
while (( i < ${#SG_SERVICE_NAME[*]} ))
do
  case ${SG_SERVICE_CMD[i]} in
  *monitor.
Planning and Documenting an HA Cluster Package Configuration Planning
    return 0
}

typeset -i exit_val=0

case ${1} in
  start)
    start_command $*
    exit_val=$?
    ;;
  stop)
    stop_command $*
    exit_val=$?
    ;;
  validate)
    validate_command $*
    exit_val=$?
    ;;
  *)
    sg_log 0 "Unknown entry point $1"
    ;;
esac
exit $exit_val
For more information about integrating an application with Serviceguard, see the white paper Framework for HP Serviceguard Toolkits, which includes a suite of customizable scripts.
Planning and Documenting an HA Cluster Package Configuration Planning To avoid this situation, it is a good idea to specify a run_script_timeout and halt_script_timeout for all packages, especially packages that use Serviceguard commands in their external scripts. If a timeout is not specified and your configuration has a command loop as described above, inconsistent results can occur, including a hung cluster.
Planning and Documenting an HA Cluster Package Configuration Planning the first adoptive node name, then the second adoptive node name, followed, in order of preference, by additional node names. In case of a failover, control of the package will be transferred to the next adoptive node name listed in the package configuration file, or (if that node is not available or cannot run the package at that time) to the next node in the list, and so on.
Planning and Documenting an HA Cluster Package Configuration Planning NOTE If the package halt function fails with “exit 1”, Serviceguard does not halt the node, but sets no_restart for the package, which disables package switching (auto_run), thereby preventing the package from starting on any adoptive node. Possible values are yes and no. The default is no. run_script_timeout and halt_script_timeout The time allowed for the package to start and halt, respectively.
Planning and Documenting an HA Cluster Package Configuration Planning script_log_file The script log file documents the package run and halt activities. More details in Chapter 6, under script_log_file (see page 285). log_level Determines the amount of information printed to stdout when the package is validated, and to the script_log_file (see page 285) when the package is started and halted. Valid values are 0 through 5; more details in Chapter 6 under log_level (see page 285).
Planning and Documenting an HA Cluster Package Configuration Planning selected as the Package Failover Policy, the primary node is now running fewer packages than the current node. See also “About Package Dependencies” on page 171. priority Assigns a priority to the package. Used to decide whether a package can “drag” another package it depends on to another node. See “About Package Dependencies” on page 171. Valid values are 1 through 3000, or no_priority. The default is no_priority.
Planning and Documenting an HA Cluster Package Configuration Planning cluster_interconnect_subnet For use in a Serviceguard Extension for Real Application Cluster (SGeRAC) installation only. See the latest version of Using Serviceguard Extension for RAC at http://www.docs.hp.com -> High Availability -> Serviceguard Extension for Real Application Cluster (ServiceGuard OPS Edition) for more information. ip_subnet and ip_address Specifies an IP subnet and relocatable IP addresses used by the package.
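For instance (the addresses are illustrative only), a failover package might declare its monitored subnet and a relocatable address like this in the modular configuration file:
ip_subnet   192.10.25.0
ip_address  192.10.25.12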
Planning and Documenting an HA Cluster Package Configuration Planning If the parameter is set to yes, and the service fails, Serviceguard will halt the node on which the service is running (HP-UX system reset). (An attempt is made to reboot the node first.) The default is no. service_halt_timeout In the event of a service halt, Serviceguard will first send out a SIGTERM signal to terminate the service.
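As an illustrative sketch (the service name and command path are hypothetical), the service entries for a modular package might look like this:
service_name               pkg11_app_monitor
service_cmd                "/etc/cmcluster/pkg11/monitor.sh"
service_restart            none
service_fail_fast_enabled  no
service_halt_timeout       300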
Planning and Documenting an HA Cluster Package Configuration Planning Default is 60 seconds. The minimum value is 1. (There is no practical maximum.). resource_start Determines whether Serviceguard should start monitoring this resource when the package starts, or when the node joins the cluster. See “Parameters for Configuring EMS Resources” on page 170. resource_up_value The criteria for judging whether a package resource has failed or not. You can configure a total of 15 resource_up_values per package.
Planning and Documenting an HA Cluster Package Configuration Planning Details in Chapter 6 under cvm_activation_cmd (see page 293), and in the package configuration file. NOTE vxvol_cmd Controls the method of mirror recovery for mirrored VxVM volumes. vg An LVM volume group that will be activated by the package. cvm_dg A CVM disk group used by the package (on systems that support CVM; see “About Veritas CFS and CVM from Symantec” on page 29).
Planning and Documenting an HA Cluster Package Configuration Planning Specifies the number of concurrent mounts and umounts to allow during package startup or shutdown. The default is 1; consider increasing it if the package will mount a large number of file systems. fs_mount_retry_count The number of mount retries for each file system. The default is zero. Details in Chapter 6 under fs_mount_retry_count (see page 295).
Planning and Documenting an HA Cluster Package Configuration Planning pev_ Specifies a package environment variable that can be passed to external_pre_script, external_script, or both, by means of the cmgetpkgenv (1m) command. More details in Chapter 6, under pev_ (see page 296).
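As an illustration only (the variable name beyond the required PEV_ prefix and the script path are hypothetical), a package using the monitoring approach discussed earlier might define:
PEV_MONITORING_INTERVAL 60
external_script /etc/cmcluster/pkg11/monitor_app.sh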
Planning and Documenting an HA Cluster Package Configuration Planning Package Configuration File Data: ========================================================================== Package Name: ______pkg11____________Package Type:___Failover____________ Primary Node: ______ftsys9_______________ First Failover Node:____ftsys10_______________ Additional Failover Nodes:__________________________________ Run Script Timeout: _no_timeout_____ Halt Script Timeout: _no_timeout___ Package AutoRun Enabled? Node Failfas
Planning and Documenting an HA Cluster Package Configuration Planning cvm_activation_cmd: ______________________________________________ VxVM Disk Groups: vxvm_dg___/dev/vx/dg01____vxvm_dg____________vxvm_dg_____________ vxvol_cmd ______________________________________________________ ________________________________________________________________________________ Logical Volumes and File Systems: fs_name___/dev/vg01/1v011___fs_directory____/mnt1______fs_mount_opt_"-o rw"___ fs_umount_opt_"-s"__________fs_
Planning and Documenting an HA Cluster Package Configuration Planning Service Name: _______ Command: _________ Restart: __ Fail Fast enabled:_____ Service Name: _______ Command: _________ Restart: __ Fail Fast enabled:_____ ================================================================================ Package environment variable:__PEV_pkg11_var________________________________ Package environment variable:_______________________________________________ External pre-script:________________________________
Planning and Documenting an HA Cluster Package Configuration Planning Additional Parameters Used Only by Legacy Packages IMPORTANT The following parameters are used only by legacy packages. Do not try to use them in modular packages. See “Creating the Legacy Package Configuration” on page 363 for more information. PATH Specifies the path to be used by the script. SUBNET Specifies the IP subnets that are to be monitored for the package.
Planning and Documenting an HA Cluster Package Configuration Planning In most cases, though, HP recommends that you use the same script for both run and halt instructions. (When the package starts, the script is passed the parameter start; when it halts, it is passed the parameter stop.) DEFERRED_RESOURCE_NAME Add DEFERRED_RESOURCE_NAME to a legacy package control script for any resource that has a RESOURCE_START setting of DEFERRED.
Building an HA Cluster Configuration 5 Building an HA Cluster Configuration This chapter and the next take you through the configuration tasks required to set up a Serviceguard cluster. These procedures are carried out on one node, called the configuration node, and the resulting binary file is distributed by Serviceguard to all the nodes in the cluster. In the examples in this chapter, the configuration node is named ftsys9, and the sample target node is called ftsys10.
Building an HA Cluster Configuration Preparing Your Systems Preparing Your Systems Before configuring your cluster, ensure that all cluster nodes possess the appropriate security files, kernel configuration, and NTP (network time protocol) configuration. Installing and Updating Serviceguard For information about installing Serviceguard, see the Release Notes for your version at http://docs.hp.com -> High Availability -> Serviceguard -> Release Notes.
Building an HA Cluster Configuration Preparing Your Systems
SGAUTOSTART=/etc/rc.config.d/cmcluster
SGFFLOC=/opt/cmcluster/cmff
CMSNMPD_LOG_FILE=/var/adm/SGsnmpsuba.log
NOTE If these variables are not defined on your system, then source the file /etc/cmcluster.conf in your login profile for user root. For example, you can add this line to root’s .profile file: . /etc/cmcluster.conf Throughout this book, system filenames are usually given with one of these location prefixes.
Building an HA Cluster Configuration Preparing Your Systems Configuring IP Address Resolution Serviceguard uses the name resolution services built in to HP-UX. HP recommends that you define name resolutions in each node’s /etc/hosts file first, rather than rely solely on DNS or NIS services. For example, consider a two node cluster (gryf and sly) with two private subnets and a public subnet. These nodes will be granting permission to a non-cluster node (bit) which does not share the private subnets.
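A minimal sketch of the corresponding /etc/hosts entries, with all IP addresses and the domain purely illustrative, might look like this (each node appears once per subnet, and the non-cluster node bit is included so it can be granted access):
15.145.162.131   gryf.example.com   gryf
10.8.0.131       gryf.example.com   gryf
10.8.1.131       gryf.example.com   gryf
15.145.162.132   sly.example.com    sly
10.8.0.132       sly.example.com    sly
10.8.1.132       sly.example.com    sly
15.145.162.150   bit.example.com    bit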
Building an HA Cluster Configuration Preparing Your Systems NOTE Configure the name service switch to consult the /etc/hosts file before other services such as DNS, NIS, or LDAP. See “Defining Name Resolution Services” on page 209 for instructions. Username Validation Serviceguard relies on the identd daemon (usually started by inetd from /etc/inetd.conf) to verify the username of the incoming network connection.
Building an HA Cluster Configuration Preparing Your Systems Access Roles Serviceguard access control policies define what a user on a remote node can do on the local node. These are known as Access Roles or Role Based Access (RBA). This manual uses Access Roles. Serviceguard recognizes two levels of access, root and non-root: • Root Access: Users authorized for root access have total control over the configuration of the cluster and packages.
Building an HA Cluster Configuration Preparing Your Systems NOTE When you upgrade a cluster from Version A.11.15 or earlier, entries in $SGCONF/cmclnodelist are automatically updated into Access Control Policies in the cluster configuration file. All non-root user-hostname pairs are assigned the role of Monitor (view only).
Building an HA Cluster Configuration Preparing Your Systems
###########################################################
# Do not edit this file!
# Serviceguard uses this file only to authorize access to an
# unconfigured node. Once a cluster is created, Serviceguard
# will not consult this file.
Building an HA Cluster Configuration Preparing Your Systems Setting Access Controls for Configured Cluster Nodes Once nodes are configured in a cluster, access-control policies govern cluster-wide security; changes to cmclnodelist are ignored. The root user on each cluster node is automatically granted root access to all other nodes. Other users can be authorized for non-root roles. NOTE Users on systems outside the cluster cannot gain root access to cluster nodes.
Building an HA Cluster Configuration Preparing Your Systems MONITOR and FULL_ADMIN can only be set in the cluster configuration file and they apply to the entire cluster. PACKAGE_ADMIN can be set in the cluster or a package configuration file. If it is set in the cluster configuration file, PACKAGE_ADMIN applies to all configured packages; if it is set in a package configuration file, it applies to that package only.
Building an HA Cluster Configuration Preparing Your Systems
USER_NAME ANY_USER
USER_HOST ANY_SERVICEGUARD_NODE
USER_ROLE MONITOR
In the above example, the configuration would fail because user john is assigned two roles. (In any case, Policy 2 is unnecessary, because PACKAGE_ADMIN includes the role of MONITOR.) Policy 3 does not conflict with any other policies, even though the wildcard ANY_USER includes the individual user john. NOTE Be careful when granting access to ANY_SERVICEGUARD_NODE.
Building an HA Cluster Configuration Preparing Your Systems NOTE HP also recommends that you make DNS highly available, either by using multiple DNS servers or by configuring DNS into a Serviceguard package. Safeguarding against Loss of Name Resolution Services This section explains how to create a robust name-resolution configuration that will allow cluster nodes to continue communicating with one another if DNS or NIS services fail.
Building an HA Cluster Configuration Preparing Your Systems nameserver 15.243.160.51 3. Edit or create the /etc/nsswitch.
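One commonly used form of the hosts entry in the name service switch file, which consults /etc/hosts first and falls back to DNS, is sketched below; verify the exact syntax recommended for your HP-UX release:
hosts: files [NOTFOUND=continue UNAVAIL=continue] dns [NOTFOUND=return UNAVAIL=return]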
Building an HA Cluster Configuration Preparing Your Systems NOTE Under agile addressing, the physical devices in these examples would have names such as /dev/[r]disk/disk1, and /dev/[r]disk/disk2. See “About Device File Names (Device Special Files)” on page 111. 1. Create a bootable LVM disk to be used for the mirror. pvcreate -B /dev/rdsk/c4t6d0 2. Add this disk to the current root volume group. vgextend /dev/vg00 /dev/dsk/c4t6d0 3. Make the new disk a boot disk. mkboot -l /dev/rdsk/c4t6d0 4.
Building an HA Cluster Configuration Preparing Your Systems 6. Verify that the mirrors were properly created. lvlnboot -v The output of this command is shown in a display like the following:
Boot Definitions for Volume Group /dev/vg00:
Physical Volumes belonging in Root Volume Group:
   /dev/dsk/c4t5d0 (10/0.5.0) -- Boot Disk
   /dev/dsk/c4t6d0 (10/0.6.
Building an HA Cluster Configuration Preparing Your Systems Backing Up Cluster Lock Disk Information After you configure the cluster and create the cluster lock volume group and physical volume, you should create a backup of the volume group configuration data on each lock volume group. Use the vgcfgbackup command for each lock volume group you have configured, and save the backup file in case the lock configuration must be restored to a new disk with the vgcfgrestore command following a disk failure.
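For example, if the lock volume group is /dev/vglock, the backup can be taken with:
vgcfgbackup /dev/vglock
By default the configuration data is saved under /etc/lvmconf/; it can later be restored with vgcfgrestore if the lock disk has to be replaced.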
Building an HA Cluster Configuration Preparing Your Systems IMPORTANT • If you are using a disk array, create the smallest LUN the array will allow, or, on an HP Integrity server, you can partition a LUN; see “Creating a Disk Partition on an HP Integrity System”. • If you are using individual disks, use either a small disk, or a portion of a disk. On an HP Integrity server, you can partition a disk; see “Creating a Disk Partition on an HP Integrity System”.
Building an HA Cluster Configuration Preparing Your Systems Step 1. Use a text editor to create a file that contains the partition information. You need to create at least three partitions, for example:
3
EFI 100MB
HPUX 1MB
HPUX 100%
This defines: • A 100 MB EFI (Extensible Firmware Interface) partition (this is required) • A 1 MB partition that can be used for the lock LUN • A third partition that consumes the remainder of the disk and can be used for whatever purpose you like. Step 2.
Building an HA Cluster Configuration Preparing Your Systems Use the command insf -e on each node. This will create device files corresponding to the three partitions, though the names themselves may differ from node to node depending on each node’s I/O configuration. Step 5. Define the lock LUN; see “Defining the Lock LUN”. Defining the Lock LUN Use cmquerycl -L to create a cluster configuration file that defines the lock LUN.
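As an approximate sketch (the device file name is an example; see the cmquerycl(1m) man page for the exact options in your release), the command might look like this when the lock LUN has the same device name on both nodes:
cmquerycl -C $SGCONF/config.ascii -L /dev/dsk/c0t1d1s2 -n ftsys9 -n ftsys10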
Building an HA Cluster Configuration Preparing Your Systems The quorum server executable file, qs, is installed in the /usr/lbin directory. When the installation is complete, you need to create an authorization file on the server where the QS will be running to allow specific host systems to obtain quorum services. The required pathname for this file is /etc/cmcluster/qs_authfile. Add to the file the names of all cluster nodes that will access cluster services from this quorum server.
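For example, assuming two cluster nodes named ftsys9 and ftsys10 (host names are illustrative), the entries could be appended on the quorum server system as follows:
echo ftsys9 >> /etc/cmcluster/qs_authfile
echo ftsys10 >> /etc/cmcluster/qs_authfile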
Building an HA Cluster Configuration Preparing Your Systems For a complete discussion of how the quorum server operates, see to “Cluster Quorum to Prevent Split-Brain Syndrome” on page 65. See the section “Specifying a Quorum Server” on page 237 for a description of how to use the cmquerycl command to specify a quorum server in the cluster configuration file. For more information, see the Release Notes for your version of Quorum Server at http://docs.hp.com -> High Availability -> Quorum Server.
Building an HA Cluster Configuration Preparing Your Systems If you experience problems, return the parameters to their default values. When contacting HP support for any issues regarding Serviceguard and networking, please be sure to share all information about any parameters that were changed from the defaults. Third-party applications that are running in a Serviceguard environment may require tuning of network and kernel parameters: • ndd is the network tuning utility.
Building an HA Cluster Configuration Preparing Your Systems Preparing for Changes in Cluster Size If you intend to add additional nodes to the cluster online, while it is running, ensure that they are connected to the same heartbeat subnets and to the same lock disks as the other cluster nodes. In selecting a cluster lock configuration, be careful to anticipate any potential need for additional cluster nodes.
Building an HA Cluster Configuration Preparing Your Systems NOTE If you are configuring volume groups that use mass storage on HP's HA disk arrays, you should use redundant I/O channels from each node, connecting them to separate ports on the array. As of HP-UX 11i v3, the I/O subsystem performs load balancing and multipathing automatically. Creating a Storage Infrastructure with LVM This section describes storage configuration with LVM.
Building an HA Cluster Configuration Preparing Your Systems When you have created the logical volumes and created or extended the volume groups, specify the filesystem that is to be mounted on the volume group, then skip ahead to the section “Deactivating the Volume Group”. To configure the volume groups from the command line, proceed as follows. If your volume groups have not been set up, use the procedures that follow.
Building an HA Cluster Configuration Preparing Your Systems 2. Next, create a control file named group in the directory /dev/vgdatabase, as follows: mknod /dev/vgdatabase/group c 64 0xhh0000 The major number is always 64, and the hexadecimal minor number has the form 0xhh0000 where hh must be unique to the volume group you are creating. Use a unique minor number that is available across all the nodes for the mknod command above.
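The next step, sketched here with example disk device names, is to create the volume group itself and add any additional disks to it:
vgcreate /dev/vgdatabase /dev/dsk/c1t2d0
vgextend /dev/vgdatabase /dev/dsk/c0t2d0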
Building an HA Cluster Configuration Preparing Your Systems Creating File Systems If your installation uses filesystems, create them next. Use the following commands to create a filesystem for mounting on the logical volume just created: 1. Create the filesystem on the newly created logical volume: newfs -F vxfs /dev/vgdatabase/rlvol1 Note the use of the raw device file for the logical volume. 2. Create a directory to mount the disk: mkdir /mnt1 3.
Building an HA Cluster Configuration Preparing Your Systems same physical volume that was available on ftsys9. You must carry out the same procedure separately for each node on which the volume group's package can run. To set up the volume group on ftsys10, use the following steps: 1. On ftsys9, copy the mapping of the volume group to a specified file. vgexport -p -s -m /tmp/vgdatabase.map /dev/vgdatabase 2. Still on ftsys9, copy the map file to ftsys10: rcp /tmp/vgdatabase.map ftsys10:/tmp/vgdatabase.
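The remaining steps on ftsys10 follow the same pattern as the volume group creation above; a sketch (choose a minor number that fits your own conventions):
mkdir /dev/vgdatabase
mknod /dev/vgdatabase/group c 64 0xhh0000
vgimport -s -m /tmp/vgdatabase.map /dev/vgdatabase
vgchange -a y /dev/vgdatabase    # activate briefly to verify the import
vgchange -a n /dev/vgdatabase    # then deactivate again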
Building an HA Cluster Configuration Preparing Your Systems NOTE When you use PVG-strict mirroring, the physical volume group configuration is recorded in the /etc/lvmpvg file on the configuration node. This file defines the physical volume groups which are the basis of mirroring and indicate which physical volumes belong to each physical volume group.
Building an HA Cluster Configuration Preparing Your Systems disks. To make merging the files easier, be sure to keep a careful record of the physical volume group names on the volume group planning worksheet (described in Chapter 4). Use the following procedure to merge files between the configuration node (ftsys9) and a new node (ftsys10) to which you are importing volume groups: 1. Copy /etc/lvmpvg from ftsys9 to /etc/lvmpvg.new on ftsys10. 2. If there are volume groups in /etc/lvmpvg.
Building an HA Cluster Configuration Preparing Your Systems This section shows how to configure new storage using the command set of the Veritas Volume Manager (VxVM). Once you have created the root disk group (described next), you can use VxVM commands or the Storage Administrator GUI, VEA, to carry out configuration tasks. For more information, see the Veritas Volume Manager documentation posted at http://docs.hp.com -> 11i v3 -> VxVM (or -> 11i v2 -> VxVM, depending on your HP-UX version).
Building an HA Cluster Configuration Preparing Your Systems Converting Disks from LVM to VxVM You can use the vxvmconvert(1m) utility to convert LVM volume groups into VxVM disk groups. Before you can do this, the volume group must be deactivated, which means that any package that uses the volume group must be halted. Follow the conversion procedures outlined in the Veritas Volume Manager Migration Guide for your version of VxVM.
Building an HA Cluster Configuration Preparing Your Systems /usr/lib/vxvm/bin/vxdisksetup -i c0t3d2 Creating Disk Groups Use vxdiskadm, or use the vxdg command, to create disk groups, as in the following example: vxdg init logdata c0t3d2 Verify the configuration with the following command: vxdg list
NAME       STATE      ID
rootdg     enabled    971995699.1025.node1
logdata    enabled    972078742.1084.node1
Creating Volumes Use the vxassist command to create logical volumes.
Building an HA Cluster Configuration Preparing Your Systems Creating File Systems If your installation uses filesystems, create them next. Use the following commands to create a filesystem for mounting on the logical volume just created: 1. Create the filesystem on the newly created volume: newfs -F vxfs /dev/vx/rdsk/logdata/log_files 2. Create a directory to mount the volume: mkdir /logs 3. Mount the volume: mount /dev/vx/dsk/logdata/log_files /logs 4.
Building an HA Cluster Configuration Preparing Your Systems
vxvol -g dg_01 startall
mount /dev/vx/dsk/dg_01/myvol /mountpoint
NOTE Unlike LVM volume groups, VxVM disk groups are not entered in the cluster configuration file, nor in the package configuration file.
Building an HA Cluster Configuration Configuring the Cluster Configuring the Cluster This section describes how to define the basic cluster configuration. This must be done on a system that is not part of a Serviceguard cluster (that is, on which Serviceguard is installed but not configured). NOTE You can use Serviceguard Manager to configure a cluster: open the System Management Homepage (SMH) and choose Tools->Serviceguard Manager. See “Using Serviceguard Manager” on page 30 for more information.
Building an HA Cluster Configuration Configuring the Cluster -w full lets you specify full network probing, in which actual connectivity is verified among all LAN interfaces on all nodes in the cluster. This is the default. -w none skips network querying. If you have recently checked the networks, this option will save time. For more details, see the cmquerycl(1m) man page. The example above creates a template file, by default /etc/cmcluster/clust1.config.
Building an HA Cluster Configuration Configuring the Cluster To display the failover times of disks, use the cmquerycl command, specifying all the nodes in the cluster. The output of the command lists the disks connected to each node together with the re-formation time associated with each. Do not include the node’s entire domain name; for example, specify ftsys9, not ftsys9.cup.hp.
Building an HA Cluster Configuration Configuring the Cluster Specifying a Lock LUN A cluster lock disk, lock LUN, or quorum server, is required for two-node clusters. The lock must be accessible to all nodes and must be powered separately from the nodes. See “Cluster Lock” on page 65 and “Setting Up a Lock LUN” on page 214 for more information.
Building an HA Cluster Configuration Configuring the Cluster
# HEARTBEAT_INTERVAL. If you are experiencing quorum server
# timeouts, you can adjust these parameters, or you can include
# the QS_TIMEOUT_EXTENSION parameter.
Building an HA Cluster Configuration Configuring the Cluster Specifying Maximum Number of Configured Packages This specifies the most packages that can be configured in the cluster. The parameter value must be equal to or greater than the number of packages currently configured in the cluster. The count includes all types of packages: failover, multi-node, and system multi-node. As of Serviceguard A.11.17, the default is 150, which is the maximum allowable number of packages in a cluster.
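In the cluster configuration file this is a single line, for example:
MAX_CONFIGURED_PACKAGES 150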
Building an HA Cluster Configuration Configuring the Cluster SGeFF has requirements for cluster configuration, as outlined in the cluster configuration template file. For more information, see the Serviceguard Extension for Faster Failover Release Notes posted on http://www.docs.hp.com -> High Availability. See also Optimizing Failover Time in a Serviceguard Environment at http://www.docs.hp.com -> High Availability -> Serviceguard -> White Papers.
Building an HA Cluster Configuration Configuring the Cluster NOTE If you are using CVM disk groups, they should be configured after cluster configuration is done, using the procedures described in “Creating the Storage Infrastructure and Filesystems with Veritas Cluster Volume Manager (CVM)” on page 256. Veritas disk groups are added to the package configuration file, as described in Chapter 6.CVM is not supported on all systems; see “About Veritas CFS and CVM from Symantec” on page 29.
Building an HA Cluster Configuration Configuring the Cluster • Heartbeat network minimum requirement is met. See the entry for HEARTBEAT_IP under “Cluster Configuration Parameters” starting on page 154. • At least one NODE_NAME is specified. • Each node is connected to each heartbeat network. • All heartbeat networks are of the same type of LAN. • The network interface device files specified are valid LAN device files. • VOLUME_GROUP entries are not currently marked as cluster-aware.
Building an HA Cluster Configuration Configuring the Cluster vgchange -a y /dev/vglock • Generate the binary configuration file and distribute it: cmapplyconf -k -v -C /etc/cmcluster/clust1.config or cmapplyconf -k -v -C /etc/cmcluster/clust1.ascii Using the -k option means that cmapplyconf only checks disk connectivity to the LVM disks that are identified in the ASCII file. Omitting the -k option (the default behavior) means that cmapplyconf tests the connectivity of all LVM disks on all nodes.
Building an HA Cluster Configuration Configuring the Cluster Be sure to use vgcfgbackup for all volume groups, especially the cluster lock volume group. NOTE You must use the vgcfgbackup command to store a copy of the cluster lock disk's configuration data whether you created the volume group using the System Management Homepage (SMH), SAM, or HP-UX commands.
Building an HA Cluster Configuration Creating a Storage Infrastructure with Veritas Cluster File System (CFS) Creating a Storage Infrastructure with Veritas Cluster File System (CFS) NOTE CFS (and CVM - Cluster Volume Manager) are supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability -> Serviceguard).
Building an HA Cluster Configuration Creating a Storage Infrastructure with Veritas Cluster File System (CFS) Preparing the Cluster and the System Multi-node Package 1. First, be sure the cluster is running: cmviewcl 2. If it is not, start it: cmruncl 3. If you have not initialized your disk groups, or if you have an old install that needs to be re-initialized, use the vxinstall command to initialize VxVM/CVM disk groups. See “Initializing the Veritas Volume Manager” on page 257. 4.
Building an HA Cluster Configuration Creating a Storage Infrastructure with Veritas Cluster File System (CFS) cfscluster config -t 900 -s 5. Verify the system multi-node package is running and CVM is up, using the cmviewcl or cfscluster command. Following is an example of using the cfscluster command.
Building an HA Cluster Configuration Creating a Storage Infrastructure with Veritas Cluster File System (CFS)
NAME       STATE                   ID
logdata    enabled, shared, cds    11192287592.39.ftsys9
NOTE If you want to create a cluster with CVM only - without CFS, stop here. Then, in your application package’s configuration file, add the dependency triplet, with DEPENDENCY_CONDITION set to SG-DG-pkg-id#=UP and DEPENDENCY_LOCATION set to SAME_NODE.
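For example (the dependency name is arbitrary, and the disk group package name must match the SG-CFS-DG-id# package created for your disk group), the application package’s configuration file might contain:
DEPENDENCY_NAME       SG-CFS-DG-1_dep
DEPENDENCY_CONDITION  SG-CFS-DG-1=UP
DEPENDENCY_LOCATION   SAME_NODE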
Building an HA Cluster Configuration Creating a Storage Infrastructure with Veritas Cluster File System (CFS)
NODE NAME       ACTIVATION MODE
ftsys9          sw (sw)
  MOUNT POINT   SHARED VOLUME   TYPE
ftsys10         sw (sw)
  MOUNT POINT   SHARED VOLUME   TYPE
5. To view the package name that is monitoring a disk group, use the cfsdgadm show_package command: cfsdgadm show_package logdata sg_cfs_dg-1 Creating Volumes 1. Make log_files volume on the logdata disk group: vxassist -g logdata make log_files 1024m 2.
Building an HA Cluster Configuration Creating a Storage Infrastructure with Veritas Cluster File System (CFS) CAUTION Once you create the disk group and mount point packages, it is critical that you administer the cluster with the cfs commands, including cfsdgadm, cfsmntadm, cfsmount, and cfsumount. Using general, non-cfs commands (such as mount and umount directly) could cause conflicts with subsequent command operations on the file system or Serviceguard packages.
Building an HA Cluster Configuration Creating a Storage Infrastructure with Veritas Cluster File System (CFS)
CLUSTER        STATUS
cfs_cluster    up

NODE           STATUS    STATE
ftsys9         up        running
ftsys10        up        running

MULTI_NODE_PACKAGES

PACKAGE        STATUS    STATE      AUTO_RUN    SYSTEM
SG-CFS-pkg     up        running    enabled     yes
SG-CFS-DG-1    up        running    enabled     no
SG-CFS-MP-1    up        running    enabled     no

ftsys9/etc/cmcluster/cfs> bdf
Filesystem                      kbytes   used    avail    %used   Mounted on
/dev/vx/dsk/logdata/log_files   10485    17338   966793   2%      /tmp/logdata/log_files
ftsys9/etc/cmcluster/cfs>
Building an HA Cluster Configuration Creating a Storage Infrastructure with Veritas Cluster File System (CFS) Creating Checkpoint and Snapshot Packages for CFS The storage checkpoints and snapshots are two additional mount point package types. They can be associated with the cluster via the cfsmntadm(1m) command.
Building an HA Cluster Configuration Creating a Storage Infrastructure with Veritas Cluster File System (CFS) Package name "SG-CFS-CK-2" was generated to control the resource Mount point "/tmp/check_logfiles" was associated to the cluster cfsmount /tmp/check_logfiles 3. Verify.
Building an HA Cluster Configuration Creating a Storage Infrastructure with Veritas Cluster File System (CFS) Mount Point Packages for Snapshot Images A snapshot is a frozen image of an active file system that does not change when the contents of target file system changes. On cluster file systems, snapshots can be created on any node in the cluster, and backup operations can be performed from that node.
Building an HA Cluster Configuration Creating a Storage Infrastructure with Veritas Cluster File System (CFS) SG-CFS-SN-1 up running disabled no The snapshot file system /local/snap1 is now mounted and provides a point in time view of /tmp/logdata/log_files.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with Veritas Cluster Volume Manager (CVM) Creating the Storage Infrastructure and Filesystems with Veritas Cluster Volume Manager (CVM) NOTE CVM (and CFS - Cluster File System) are supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability - > Serviceguard).
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with Veritas Cluster Volume Manager (CVM) Separate procedures are given below for: • Initializing the Volume Manager • Preparing the Cluster for Use with CVM • Creating Disk Groups for Shared Storage • Creating File Systems with CVM For more information, including details about configuration of plexes (mirrors), multipathing, and RAID, refer to the HP-UX documentation for the Veritas Volume Manager.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with Veritas Cluster Volume Manager (CVM) Preparing the Cluster for Use with CVM In order to use the Veritas Cluster Volume Manager (CVM), you need a cluster that is running with a Serviceguard-supplied CVM system multi-node package. This means that the cluster must already be configured and running before you create disk groups.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with Veritas Cluster Volume Manager (CVM) • Veritas CVM 3.5: cmapplyconf -P /etc/cmcluster/cvm/VxVM-CVM-pkg.conf • Veritas CVM 4.1 and later: If you are not using Veritas Cluster File System, use the cmapplyconf command. (If you are using CFS, you will set up CVM as part of the CFS components.): cmapplyconf -P /etc/cmcluster/cfs/SG-CFS-pkg.conf Begin package verification ...
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with Veritas Cluster Volume Manager (CVM) vxdctl -c mode One node will identify itself as the master. Create disk groups from this node. Initializing Disks for CVM You need to initialize the physical disks that will be employed in CVM disk groups.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with Veritas Cluster Volume Manager (CVM) This command creates a 1024 MB volume named log_files in a disk group named logdata. The volume can be referenced with the block device file /dev/vx/dsk/logdata/log_files or the raw (character) device file /dev/vx/rdsk/logdata/log_files.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with Veritas Cluster Volume Manager (CVM) You also need to identify the CVM disk groups, filesystems, logical volumes, and mount options in the package control script. The package configuration process is described in detail in Chapter 6.
Building an HA Cluster Configuration Using DSAU during Configuration Using DSAU during Configuration As explained under “What are the Distributed Systems Administration Utilities?” on page 33, you can use DSAU to centralize and simplify configuration and monitoring tasks. See the Distributed Systems Administration Utilities User’s Guide posted at http://docs.hp.com.
Building an HA Cluster Configuration Managing the Running Cluster Managing the Running Cluster This section describes some approaches to routine management of the cluster. Additional tools and suggestions are found in Chapter 7, “Cluster and Package Maintenance.” Checking Cluster Operation with Serviceguard Manager You can check configuration and status information using Serviceguard Manager: from the System Management Homepage (SMH), choose Tools-> Serviceguard Manager.
Building an HA Cluster Configuration Managing the Running Cluster You can use these commands to test cluster operation, as in the following: 1. If the cluster is not already running, start it. From the Serviceguard Manager menu, choose Run Cluster. From the command line, use cmruncl -v. By default, cmruncl will check the networks. Serviceguard will probe the actual network configuration with the network information in the cluster configuration.
Building an HA Cluster Configuration Managing the Running Cluster Preventing Automatic Activation of LVM Volume Groups It is important to prevent LVM volume groups that are to be used in packages from being activated at system boot time by the /etc/lvmrc file. One way to ensure that this does not happen is to edit the /etc/lvmrc file on all nodes, setting AUTO_VG_ACTIVATE to 0, then including all the volume groups that are not cluster-bound in the custom_vg_activation function.
Building an HA Cluster Configuration Managing the Running Cluster To enable automatic cluster start, set the flag AUTOSTART_CMCLD to 1 in the /etc/rc.config.d/cmcluster file on each node in the cluster; the nodes will then join the cluster at boot time. Here is an example of the /etc/rc.config.d/cmcluster file:
#************************ CMCLUSTER ************************
# Highly Available Cluster configuration
#
# @(#) $Revision: 72.
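The portion of the file that controls automatic startup is a single flag; setting it to 1 enables automatic cluster start at boot (only the relevant line is sketched here):
AUTOSTART_CMCLD=1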
Building an HA Cluster Configuration Managing the Running Cluster Managing a Single-Node Cluster The number of nodes you will need for your Serviceguard cluster depends on the processing requirements of the applications you want to protect. You may want to configure a single-node cluster to take advantage of Serviceguard’s network failure protection. In a single-node cluster, a cluster lock is not required, since there is no other node in the cluster.
Building an HA Cluster Configuration Managing the Running Cluster Deleting the Cluster Configuration With root login, you can delete a cluster configuration from all cluster nodes by using Serviceguard Manager, or on the command line. The cmdeleteconf command prompts for a verification before deleting the files unless you use the -f option. You can only delete the configuration when the cluster is down.
Configuring Packages and Their Services 6 Configuring Packages and Their Services Serviceguard packages group together applications and the services and resources they depend on. The typical Serviceguard package is a failover package that starts on one node but can be moved (“failed over”) to another if necessary. See “What is Serviceguard?” on page 26, “How the Package Manager Works” on page 71, and “Package Configuration Planning” on page 165 for more information.
Configuring Packages and Their Services allowing you to build packages from smaller modules, and eliminating the separate package control script and the need to distribute it manually. Packages created using Serviceguard A.11.17 or earlier are referred to as legacy packages. If you need to reconfigure a legacy package (rather than create a new package), see “Configuring a Legacy Package” on page 363.
Configuring Packages and Their Services Choosing Package Modules Choosing Package Modules IMPORTANT Before you start, you need to do the package-planning tasks described under “Package Configuration Planning” on page 165. To choose the right package modules, you need to decide the following things about the package you are creating: • What type of package it is; see “Types of Package: Failover, Multi-Node, System Multi-Node” on page 273.
Configuring Packages and Their Services Choosing Package Modules Relocatable IP addresses cannot be assigned to multi_node packages. Examples are the Veritas Cluster File System (CFS) system multi-node packages; but support for multi-node packages is no longer restricted to CVM/CFS; you can create a multi-node package for any purpose. IMPORTANT But if the package uses volume groups, they must be activated in shared mode: vgchange -a s, which is available only if the SGeRAC add-on product is installed.
Configuring Packages and Their Services Choosing Package Modules NOTE The following parameters cannot be configured for multi-node or system multi-node packages: • failover_policy • failback_policy • ip_subnet • ip_address Volume groups configured for packages of these types must be activated in shared mode. For more information about types of packages and how they work, see “How the Package Manager Works” on page 71.
Configuring Packages and Their Services Choosing Package Modules (The output will be written to $SGCONF/sg-all.) Base Package Modules At least one base module (or default or all, which include the base module) must be specified on the cmmakepkg command line. Parameters marked with an asterisk (*) are new or changed as of Serviceguard A.11.18. (S) indicates that the parameter (or its equivalent) has moved from the package control script to the package configuration file for modular packages.
Configuring Packages and Their Services Choosing Package Modules Table 6-1 Base Modules (Continued) Module Name Parameters (page) Comments multi_node package_name (282) * module_name (282) * module_version (282) * package_type (282) node_name (282) auto_run (283) node_fail_fast_enabled (283) run_script_timeout (283) halt_script_timeout (284) successor_halt_timeout (284) * script_log_file (285) operation_sequence (285) * log_level (285) priority (286) * Base module.
Configuring Packages and Their Services Choosing Package Modules its equivalent) has moved from the package control script to the package configuration file for modular packages. See the “Package Parameter Explanations” on page 281 for more information. Table 6-2 Module Name Optional Modules Parameters (page) Comments dependency dependency_name (286) * dependency_condition (287) dependency_location (287) Add to a base module to create a package that depends on one or more other packages.
Configuring Packages and Their Services Choosing Package Modules Table 6-2 Module Name Optional Modules (Continued) Parameters (page) Comments volume_group concurrent_vgchange_operations (292) (S) vgchange_cmd (292) * (S) cvm_activation_cmd (293) (S) vxvol_cmd (293) * (S) vg (294) (S) cvm_dg (294) (S) vxvm_dg (294) (S) deactivation_retry_count (294) (S) kill_processes_accessing_raw_devices (294) (S) Add to a base module if the package needs to mount file systems on LVM or VxVM volumes, or uses CVM vo
Configuring Packages and Their Services Choosing Package Modules Table 6-2 Module Name Optional Modules (Continued) Parameters (page) Comments external_pre external_pre_script (297) * Add to a base module to specify additional programs to be run before volume groups and disk groups are activated while the package is starting and after they are deactivated while the package is halting.
Configuring Packages and Their Services Choosing Package Modules Table 6-2 Optional Modules (Continued) Module Name default NOTE Parameters (page) (all parameters) Comments A symbolic link to the all module; used if a base module is not specified on the cmmakepkg command line; see “cmmakepkg Examples” on page 299. The default form for parameter names in the modular package configuration file is lower case; for legacy packages the default is upper case.
Configuring Packages and Their Services Choosing Package Modules More detailed instructions for running cmmakepkg are in the next section, “Generating the Package Configuration File” on page 299. See also “Package Configuration Planning” on page 165. package_name Any name, up to a maximum of 39 characters, that: IMPORTANT • starts and ends with an alphanumeric character • otherwise contains only alphanumeric characters or dot (.
Configuring Packages and Their Services Choosing Package Modules node_name node_name node_name Serviceguard uses the order of priority specified by this list to choose which node to run the package on. IMPORTANT See “Cluster Configuration Parameters” on page 154 for important information about node names. auto_run Can be set to yes or no. The default is yes.
Configuring Packages and Their Services Choosing Package Modules If the package does not complete its startup in the time specified by run_script_timeout, Serviceguard will terminate it and prevent it from switching to another node. In this case, if node_fail_fast_enabled is set to yes, the node will be halted (HP-UX system reset). If no timeout is specified (no_timeout), Serviceguard will wait indefinitely for the package to start. If a timeout occurs: • • Switching will be disabled.
Configuring Packages and Their Services Choosing Package Modules New as of A.11.18 (for both modular and legacy packages). See also “About Package Dependencies” on page 171. script_log_file The full pathname of the package’s log file. The default is $SGRUN/log/<package_name>.log. (See “Understanding Where Files Are Located” on page 200 for more information about Serviceguard pathnames.) operation_sequence Defines the order in which the scripts defined by the package’s component modules will start up.
Configuring Packages and Their Services Choosing Package Modules This parameter can be set for failover packages only. If this package will depend on another package or vice versa, see also “About Package Dependencies” on page 171. failback_policy Specifies whether or not Serviceguard will automatically move a package that is not running on its primary node (the first node on its node_name list) when the primary node is once again available. Can be set to automatic or manual. The default is manual.
Configuring Packages and Their Services Choosing Package Modules IMPORTANT Restrictions on dependency names in previous Serviceguard releases were less stringent. Packages that specify dependency_names that do not conform to the above rules will continue to run, but if you reconfigure them, you will need to change the dependency_name; cmcheckconf and cmapplyconf will enforce the new rules.
Configuring Packages and Their Services Choosing Package Modules local_lan_failover_allowed Specifies whether or not Serviceguard can switch LANs on a cluster node (that is, switch to a standby LAN card). Legal values are yes and no. Default is yes. monitored_subnet The IP address of a LAN subnet that is to be monitored for this package. Replaces legacy SUBNET which is still supported in the package configuration file for legacy packages; see “Configuring a Legacy Package” on page 363.
Configuring Packages and Their Services Choosing Package Modules ip_address A relocatable IP address on a specified ip_subnet (see page 288). Replaces IP, which is still supported in the package control script for legacy packages; see “Configuring a Legacy Package” on page 363. For more information about relocatable IP addresses, see “Stationary and Relocatable IP Addresses” on page 98. This parameter can be set for failover packages only.
Configuring Packages and Their Services Choosing Package Modules service_cmd The command that runs the application or service for this service_name, for example, /usr/bin/X11/xclock -display 15.244.58.208:0 An absolute pathname is required; neither the PATH variable nor any other environment variable is passed to the command. The default shell is /usr/bin/sh. NOTE Be careful when defining service run commands.
Configuring Packages and Their Services Choosing Package Modules service_halt_timeout The length of time, in seconds, Serviceguard will wait for the service to halt before forcing termination of the service’s process. The value should be large enough to allow any cleanup required by the service to complete. Legal values are none, unlimited, or any number greater than zero. unlimited means Serviceguard will never force the process to terminate.
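Taken together, a service definition in a modular package configuration file might look like the following sketch (the service name, command, and values are illustrative only):
service_name pkg1_service
service_cmd "/usr/bin/X11/xclock -display 15.244.58.208:0"
service_restart none
service_halt_timeout 300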
Configuring Packages and Their Services Choosing Package Modules You can configure a total of 15 resource_up_values per package. For example, if there is only one resource (resource_name) in the package, then a maximum of 15 resource_up_values can be defined. If two resource_names are defined and one of them has 10 resource_up_values, then the other resource_name can have only 5 resource_up_values.
Configuring Packages and Their Services Choosing Package Modules (SGeRAC) is installed. (See the latest version of Using Serviceguard Extension for RAC at http://www.docs.hp.com -> High Availability -> Serviceguard Extension for Real Application Cluster (ServiceGuard OPS Edition) for more information.) Shared LVM volume groups must not contain a file system.
Configuring Packages and Their Services Choosing Package Modules This allows package startup to continue while mirror re-synchronization is in progress. vg Specifies an LVM volume group (one per vg, each on a new line) on which a file system needs to be mounted. A corresponding vgchange_cmd (see page 292) specifies how the volume group is to be activated. The package script generates the necessary filesystem commands on the basis of the fs_ parameters (see page 295).
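For example, a package that activates two volume groups in exclusive mode might contain entries such as these (the volume group names are illustrative):
vgchange_cmd "vgchange -a e"
vg vg01
vg vg02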
Configuring Packages and Their Services Choosing Package Modules Legal value is any number greater than zero. The default is 1. If the package needs to mount and unmount a large number of filesystems, you can improve performance by carefully tuning this parameter during testing (increase it a little at time and monitor performance each time). fs_mount_retry_count The number of mount retries for each file system. Legal value is zero or any greater number. The default is zero.
Configuring Packages and Their Services Choosing Package Modules NOTE A volume group must be defined in this file (using vg; see page 294) for each logical volume specified by an fs_name entry. fs_directory The root of the file system specified by fs_name. Replaces FS, which is still supported in the package control script for legacy packages; see “Configuring a Legacy Package” on page 363. See the mount (1m) manpage for more information. fs_type The type of the file system specified by fs_name.
Configuring Packages and Their Services Choosing Package Modules The variable name and value can each consist of a maximum of MAXPATHLEN characters (1024 on HP-UX systems). You can define more than one variable. See “About External Scripts” on page 178, as well as the comments in the configuration file, for more information.
Configuring Packages and Their Services Choosing Package Modules NOTE The only access role that can be granted in the package configuration file is package_admin for this particular package; you grant other roles in the cluster configuration file. See “Setting Access Controls for Configured Cluster Nodes” on page 207 for further discussion and examples. Legal values for user_name are any_user or a maximum of eight login names from /etc/passwd on user_host.
Configuring Packages and Their Services Generating the Package Configuration File Generating the Package Configuration File When you have chosen the configuration modules your package needs (see “Choosing Package Modules” on page 273), you are ready to generate a package configuration file that contains those modules. This file will consist of a base module (usually failover, multi-node or system multi-node) plus the modules that contain the additional parameters you have decided to include.
Configuring Packages and Their Services Generating the Package Configuration File cmmakepkg $SGCONF/pkg1/pkg1.conf • To create a generic failover package (that could be applied without editing): cmmakepkg -n pkg1 -m sg/failover $SGCONF/pkg1/pkg1.conf
Configuring Packages and Their Services Editing the Configuration File Editing the Configuration File When you have generated the configuration file that contains the modules your package needs (see “Generating the Package Configuration File” on page 299), you need to edit the file to set the package parameters to the values that will make the package function as you intend.
Configuring Packages and Their Services Editing the Configuration File Use the following bullet points as a checklist, referring to the “Package Parameter Explanations” on page 281, and the comments in the configuration file itself, for detailed specifications for each parameter. NOTE Optional parameters are commented out in the configuration file (with a # at the beginning of the line).
Configuring Packages and Their Services Editing the Configuration File • node_fail_fast_enabled. Enter yes to cause the node to be halted (system reset) if the package fails; otherwise enter no. For system multi-node packages, you must enter yes. • run_script_timeout and halt_script_timeout. Enter the number of seconds Serviceguard should wait for package startup and shutdown, respectively, to complete; or leave the default, no_timeout; see page 283. • successor_halt_timeout.
Configuring Packages and Their Services Editing the Configuration File • Use the monitored_subnet parameter to specify a subnet that is to be monitored for this package. If there are multiple subnets, repeat the parameter as many times as needed, on a new line each time. • If this is a Serviceguard Extension for Oracle RAC (SGeRAC) installation, you can use the cluster_interconnect_subnet parameter (see page 288). • If your package will use relocatable IP addresses, enter the ip_subnet and ip_address.
Configuring Packages and Their Services Editing the Configuration File
vg vg01
vg vg02
• If you are using CVM, use the cvm_dg parameters to specify the names of the disk groups to be activated, and select the appropriate cvm_activation_cmd. Enter one disk group per cvm_dg, each on a new line.
Configuring Packages and Their Services Editing the Configuration File — concurrent_mount_and_umount_operations (see page 294) You can also use the fsck_opt and fs_umount_opt parameters to specify the -s option of the fsck and mount/umount commands (see page 296). • You can use the pev_ parameter to specify a variable to be passed to external scripts. Make sure the variable name begins with the upper-case or lower-case letters pev and an underscore (_). You can specify more than one variable.
Configuring Packages and Their Services Editing the Configuration File • Configure the Access Control Policy for up to eight specific users or any_user. The only user role you can configure in the package configuration file is package_admin for the package in question. Cluster-wide roles are defined in the cluster configuration file. See “Access Roles” on page 204 for more information.
Configuring Packages and Their Services Verifying and Applying the Package Configuration Verifying and Applying the Package Configuration Serviceguard checks the configuration you enter and reports any errors. Use a command such as the following to verify the content of the package configuration file you have created, for example: cmcheckconf -v -P $SGCONF/pkg1/pkg1.config Errors are displayed on the standard output.
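When the file passes verification, apply it; for example (assuming the same file name as above):
cmapplyconf -P $SGCONF/pkg1/pkg1.config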
Configuring Packages and Their Services Verifying and Applying the Package Configuration packages; see “Configuring a Legacy Package” on page 363. And, for modular packages, you need to distribute any external scripts identified by the external_pre_script and external_script parameters.
Configuring Packages and Their Services Adding the Package to the Cluster Adding the Package to the Cluster You can add the new package to the cluster while the cluster is running, subject to the value of MAX_CONFIGURED_PACKAGES in the cluster configuration file. See “Adding a Package to a Running Cluster” on page 376.
Configuring Packages and Their Services How Control Scripts Manage VxVM Disk Groups How Control Scripts Manage VxVM Disk Groups VxVM disk groups (other than those managed by CVM, on systems that support it) are outside the control of the Serviceguard cluster. The package control script uses standard VxVM commands to import and deport these disk groups. (For details on importing and deporting disk groups, refer to the discussion of the import and deport options in the vxdg man page.
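The import command the control script runs is typically of the following form (dg_01 is an example disk group name; see the vxdg manpage for the exact options):
vxdg -tfC import dg_01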
Configuring Packages and Their Services How Control Scripts Manage VxVM Disk Groups This command takes over ownership of all the disks in disk group dg_01, even though the disk currently has a different host ID written on it. The command writes the current node’s host ID on all disks in disk group dg_01 and sets the noautoimport flag for the disks. This flag prevents a disk group from being automatically re-imported by a node following a reboot.
Configuring Packages and Their Services Configuring Veritas System Multi-node Packages Configuring Veritas System Multi-node Packages There are two system multi-node packages that regulate Veritas Cluster Volume Manager (CVM) and Cluster File System (CFS). These packages ship with the Serviceguard product. There are two versions of the package files: VxVM-CVM-pkg for CVM Version 3.5, and SG-CFS-pkg for CFS/CVM Version 4.1 and later.
Configuring Packages and Their Services Configuring Veritas System Multi-node Packages For CVM, use the cmapplyconf command to add the system multi-node packages to your cluster. If you are using the Veritas Cluster File System, use the cfscluster command to activate and halt the system multi-node package in your cluster. NOTE Do not create or modify these packages by editing a configuration file. Never edit their control script files. The CFS admin commands are listed in Appendix A.
Configuring Packages and Their Services Configuring Veritas Multi-node Packages Configuring Veritas Multi-node Packages There are two types of multi-node packages that work with the Veritas Cluster File System (CFS): SG-CFS-DG-id# for disk groups, which you configure with the cfsdgadm command, and SG-CFS-MP-id# for mount points, which you configure with the cfsmntadm command. Each package name will have a unique number, appended by Serviceguard at creation.
Configuring Packages and Their Services Configuring Veritas Multi-node Packages the dependent application package loses access and cannot read and write to the disk, it will fail; however that will not cause the DG or MP multi-node package to fail. NOTE Do not create or edit ASCII configuration files for the Serviceguard supplied packages VxVM-CVM-pkg, SG-CFS-pkg, SG-CFS-DG-id#, or SG-CFS-MP-id#. Create VxVM-CVM-pkg and SG-CFS-pkg by means of the cmapplyconf command.
Cluster and Package Maintenance 7 Cluster and Package Maintenance This chapter describes how to see cluster configuration and status information, how to start and halt a cluster or an individual node, how to perform permanent reconfiguration, and how to start, halt, move, and modify packages during routine maintenance of the cluster.
Cluster and Package Maintenance Reviewing Cluster and Package Status Reviewing Cluster and Package Status You can check status using Serviceguard Manager or from a cluster node’s command line. Reviewing Cluster and Package Status with the cmviewcl Command Information about cluster status is stored in the status database, which is maintained on each individual node in the cluster.
Cluster and Package Maintenance Reviewing Cluster and Package Status Viewing Dependencies The cmviewcl -v command output lists dependencies throughout the cluster. For a specific package’s dependencies, use the -p option.
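For example, to see the dependencies of a package named pkg1 (the name is illustrative):
cmviewcl -v -p pkg1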
Cluster and Package Maintenance Reviewing Cluster and Package Status • Failed. A node never sees itself in this state. Other active members of the cluster will see a node in this state if that node was in an active cluster, but is no longer, and is not halted. • Reforming. A node is in this state when the cluster is re-forming. The node is currently running the protocols which ensure that all nodes agree to the new membership of an active cluster.
Cluster and Package Maintenance Reviewing Cluster and Package Status • fail_wait - The package is waiting to be halted because the package or a package it depends on has failed, but must wait for a package that depends on it to halt before it can halt. • relocate_wait - The package’s halt script has completed or Serviceguard is still trying to place the package. • unknown - Serviceguard could not determine the status at the time cmviewcl was run.
Cluster and Package Maintenance Reviewing Cluster and Package Status • unknown - Serviceguard could not determine the state at the time cmviewcl was run. Package Switching Attributes cmviewcl shows the following package switching information: • AUTO_RUN: Can be enabled or disabled. For failover packages, enabled means that the package starts when the cluster starts, and Serviceguard can switch the package to another node in the event of failure.
Cluster and Package Maintenance Reviewing Cluster and Package Status Failover and Failback Policies Failover packages can be configured with one of two values for the failover_policy parameter (see page 285), as displayed in the output of cmviewcl -v: • configured_node. The package fails over to the next node in the node_name list in the package configuration file (see page 282). • min_package_node. The package fails over to the node in the cluster that has the fewest running packages.
Cluster and Package Maintenance Reviewing Cluster and Package Status Script_Parameters: ITEM STATUS Service up Subnet up MAX_RESTARTS 0 0 Node_Switching_Parameters: NODE_TYPE STATUS SWITCHING Primary up enabled Alternate up enabled NODE ftsys10 STATUS up RESTARTS 0 0 NAME ftsys9 ftsys10 (current) STATE running Network_Parameters: INTERFACE STATUS PRIMARY up STANDBY up PATH 28.1 32.1 NAME lan0 lan1 PACKAGE pkg2 STATE running AUTO_RUN enabled STATUS up NAME service1 15.13.168.
Cluster and Package Maintenance Reviewing Cluster and Package Status ftsys9 up running Quorum Server Status: NAME STATUS lp-qs up ... NODE ftsys10 STATUS up STATE running STATE running Quorum Server Status: NAME STATUS lp-qs up STATE running CVM Package Status If the cluster is using the Veritas Cluster Volume Manager (CVM), version 3.5, for disk storage, the system multi-node package VxVM-CVM-pkg must be running on all active nodes for applications to be able to access CVM disk groups.
Cluster and Package Maintenance Reviewing Cluster and Package Status PACKAGE VxVM-CVM-pkg STATUS up STATE running AUTO_RUN enabled SYSTEM yes When you use the -v option, the display shows the system multi-node package associated with each active node in the cluster, as in the following: MULTI_NODE_PACKAGES: PACKAGE STATUS VxVM-CVM-pkg up STATE running AUTO_RUN enabled NODE ftsys7 STATUS down SWITCHING disabled NODE ftsys8 STATUS down SWITCHING disabled NODE STATUS ftsys9 up Script_Parameters
Cluster and Package Maintenance Reviewing Cluster and Package Status NOTE CFS is supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability - > Serviceguard).
Cluster and Package Maintenance Reviewing Cluster and Package Status CLUSTER example NODE ftsys9 STATUS up STATUS up STATE running Network_Parameters: INTERFACE STATUS PRIMARY up STANDBY up PATH 56/36.
Cluster and Package Maintenance Reviewing Cluster and Package Status Failover Failback configured_node manual Script_Parameters: ITEM STATUS Resource up Subnet up Resource down Subnet up NODE_NAME ftsys9 ftsys9 ftsys10 ftsys10 Node_Switching_Parameters: NODE_TYPE STATUS SWITCHING Primary up enabled Alternate up enabled NAME /example/float 15.13.168.0 /example/float 15.13.168.0 NAME ftsys10 ftsys9 pkg2 now has the status down, and it is shown as unowned, with package switching disabled.
Cluster and Package Maintenance Reviewing Cluster and Package Status Policy_Parameters: POLICY_NAME CONFIGURED_VALUE Failover configured_node Failback manual Script_Parameters: ITEM STATUS Service up Subnet up Resource up MAX_RESTARTS 0 Node_Switching_Parameters: NODE_TYPE STATUS SWITCHING Primary up enabled Alternate up enabled PACKAGE pkg2 STATUS up STATE running RESTARTS 0 NAME ftsys9 ftsys10 AUTO_RUN disabled NAME service1 15.13.168.
Cluster and Package Maintenance Reviewing Cluster and Package Status Now pkg2 is running on node ftsys9. Note that switching is still disabled.
Cluster and Package Maintenance Reviewing Cluster and Package Status Viewing Information about Unowned Packages The following example shows packages that are currently unowned, that is, not running on any configured node. cmviewcl provides information on monitored resources for each node on which the package can run; this allows you to identify the cause of a failure and decide where to start the package up again.
Cluster and Package Maintenance Reviewing Cluster and Package Status manx up PACKAGE pkg1 NODE tabby running STATUS up STATUS up PACKAGE pkg2 STATE running AUTO_RUN enabled NODE manx AUTO_RUN enabled NODE tabby STATE running STATUS up STATE running SYSTEM_MULTI_NODE_PACKAGES: PACKAGE VxVM-CVM-pkg STATUS up STATE running Checking Status of the Cluster File System (CFS) If the cluster is using the cluster file system, you can check status with the cfscluster command, as shown in the example b
Cluster and Package Maintenance Reviewing Cluster and Package Status Cluster Manager : up CVM state : up MOUNT POINT TYPE /var/opt/sgtest/ tmp/mnt/dev/vx/dsk/ vg_for_cvm1_dd5/lvol1 /var/opt/sgtest/ tmp/mnt/dev/vx/dsk/ vg_for_cvm1_dd5/lvol4 SHARED VOLUME DISK GROUP STATUS regular lvol1 vg_for_cvm_veggie_dd5 MOUNTED regular lvol4 vg_for_cvm_dd5 MOUNTED Status of the Packages with a Cluster File System Installed You can use cmviewcl to see the status of the package and the cluster file system on al
Cluster and Package Maintenance Reviewing Cluster and Package Status Status of CFS Disk Group Packages To see the status of the disk group, use the cfsdgadm display command. For example, for the diskgroup logdata, enter: cfsdgadm display -v logdata NODE NAME ACTIVATION MODE ftsys9 sw (sw) MOUNT POINT SHARED VOLUME TYPE ftsys10 sw (sw) MOUNT POINT SHARED VOLUME TYPE ... To see which package is monitoring a disk group, use the cfsdgadm show_package command.
Cluster and Package Maintenance Managing the Cluster and Nodes Managing the Cluster and Nodes Managing the cluster involves the following tasks: • Starting the Cluster When All Nodes are Down • Adding Previously Configured Nodes to a Running Cluster • Removing Nodes from Operation in a Running Cluster • Halting the Entire Cluster In Serviceguard A.11.
Cluster and Package Maintenance Managing the Cluster and Nodes Using Serviceguard Commands to Start the Cluster Use the cmruncl command to start the cluster when all cluster nodes are down. Particular command options can be used to start the cluster under specific circumstances.
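For example, the following commands start the cluster on all configured nodes, or on a subset of nodes, respectively (node names are illustrative):
cmruncl -v
cmruncl -v -n ftsys9 -n ftsys10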
Cluster and Package Maintenance Managing the Cluster and Nodes Using Serviceguard Commands to Add Previously Configured Nodes to a Running Cluster Use the cmrunnode command to join one or more nodes to an already running cluster. Any node you add must already be a part of the cluster configuration. The following example adds node ftsys8 to the cluster that was just started with only nodes ftsys9 and ftsys10.
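For this example, the command would be:
cmrunnode -v ftsys8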
Cluster and Package Maintenance Managing the Cluster and Nodes NOTE HP recommends that you remove a node from participation in the cluster (by running cmhaltnode as shown below, or Halt Node in Serviceguard Manager) before running the HP-UX shutdown command, especially in cases in which a packaged application might have trouble during shutdown and not halt cleanly. Use cmhaltnode to halt one or more nodes in a cluster.
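For example, to halt node ftsys9 and force any packages running on it to halt and fail over to an adoptive node (the node name is illustrative):
cmhaltnode -f -v ftsys9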
Cluster and Package Maintenance Managing the Cluster and Nodes Automatically Restarting the Cluster You can configure your cluster to automatically restart after an event, such as a long-term power failure, which brought down all nodes in the cluster. This is done by setting AUTOSTART_CMCLD to 1 in the /etc/rc.config.d/cmcluster file.
Cluster and Package Maintenance Managing Packages and Services Managing Packages and Services Managing packages and services involves the following tasks: • Starting a Package • Halting a Package • Moving a Package (halt, then start) • Changing Package Switching Behavior Non-root users can perform these tasks, as regulated by access policies in the cluster’s configuration files. See “Editing Security Files” on page 201 for more information about configuring access.
Cluster and Package Maintenance Managing Packages and Services You cannot start a package unless all the packages that it depends on are running. If you try, you’ll see a Serviceguard message telling you why the operation failed, and the package will not start. If this happens, you can repeat the run command, this time including the package(s) this package depends on; Serviceguard will start all the packages in the correct order.
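For example, to start a failover package on a specific node and then enable switching for it (package and node names are illustrative):
cmrunpkg -n ftsys9 pkg1
cmmodpkg -e pkg1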
Cluster and Package Maintenance Managing Packages and Services System multi-node packages run on all cluster nodes simultaneously; halting these packages stops them running on all nodes. A multi-node package can run on several nodes simultaneously; you can halt it on all the nodes it is running on, or you can specify individual nodes. Halting a Package that Has Dependencies Before halting a package, it is a good idea to use the cmviewcl command to check for package dependencies.
Cluster and Package Maintenance Managing Packages and Services Using Serviceguard Commands to Move a Running Failover Package Before you move a failover package to a new node, it is a good idea to run cmviewcl -v -l package and look at dependencies. If the package has dependencies, be sure they can be met on the new node. To move the package, first halt it where it is running using the cmhaltpkg command. This action not only halts the package, but also disables package switching.
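A typical sequence for moving a failover package is therefore to halt it, run it on the new node, and re-enable switching, for example (package and node names are illustrative):
cmhaltpkg pkg1
cmrunpkg -n ftsys10 pkg1
cmmodpkg -e pkg1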
Cluster and Package Maintenance Managing Packages and Services Changing Package Switching with Serviceguard Commands You can change package switching behavior either temporarily or permanently using Serviceguard commands. To temporarily disable switching to other nodes for a running package, use the cmmodpkg command.
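For example, to temporarily disable switching for a running package named pkg1 (the name is illustrative):
cmmodpkg -d pkg1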
Cluster and Package Maintenance Reconfiguring a Cluster Reconfiguring a Cluster You can reconfigure a cluster either when it is halted or while it is still running. Some operations can only be done when the cluster is halted. Table 7-1 shows the required cluster state for many kinds of changes. Table 7-1 Types of Changes to the Cluster Configuration Change to the Cluster Configuration Required Cluster State Add a new node All systems configured as members of this cluster must be running.
Cluster and Package Maintenance Reconfiguring a Cluster Table 7-1 Types of Changes to the Cluster Configuration (Continued) Change to the Cluster Configuration Delete NICs and their IP addresses, if any, from the cluster configuration Required Cluster State Cluster can be running. “Changing the Cluster Networking Configuration while the Cluster Is Running” on page 353. If removing the NIC from the system, see “Removing a LAN or VLAN Interface from a Node” on page 358.
Cluster and Package Maintenance Reconfiguring a Cluster Table 7-1 Types of Changes to the Cluster Configuration (Continued) Change to the Cluster Configuration Failover Optimization to enable or disable Faster Failover product NOTE Required Cluster State Cluster must not be running. If you are using CVM or CFS, you cannot change HEARTBEAT_INTERVAL, NODE_TIMEOUT, or AUTO_START_TIMEOUT while the cluster is running.
Cluster and Package Maintenance Reconfiguring a Cluster To update the values of the FIRST_CLUSTER_LOCK_PV and SECOND_CLUSTER_LOCK_PV parameters without bringing down the cluster, proceed as follows: Step 1. Halt the node (cmhaltnode) on which you want to make the changes. Step 2. In the cluster configuration file, modify the values of FIRST_CLUSTER_LOCK_PV and SECOND_CLUSTER_LOCK_PV for this node. Step 3. Run cmcheckconf to check the configuration. Step 4. Run cmapplyconf to apply the configuration. Step 5.
Cluster and Package Maintenance Reconfiguring a Cluster Step 3. Run cmcheckconf to check the configuration. Step 4. Run cmapplyconf to apply the configuration. For information about replacing the physical device, see “Replacing a Lock LUN” on page 395. Reconfiguring a Halted Cluster You can make a permanent change in the cluster configuration when the cluster is halted.
Cluster and Package Maintenance Reconfiguring a Cluster • You cannot delete an active volume group from the cluster configuration. You must halt any package that uses the volume group and ensure that the volume is inactive before deleting it. • The only configuration change allowed while a node is unreachable (for example, completely disconnected from the network) is to delete the unreachable node from the cluster configuration.
Cluster and Package Maintenance Reconfiguring a Cluster Use cmrunnode to start the new node, and, if you so decide, set the AUTOSTART_CMCLD parameter to 1 in the /etc/rc.config.d/cmcluster file to enable the new node to join the cluster automatically each time it reboots. NOTE Before you can add a node to a running cluster that uses Veritas CVM (on systems that support it), the node must already be connected to the disk devices for all CVM disk groups.
Cluster and Package Maintenance Reconfiguring a Cluster cmquerycl -C clconfig.ascii -c cluster1 -n ftsys8 -n ftsys9 Step 3. Edit the file clconfig.ascii to check the information about the nodes that remain in the cluster. Step 4. Halt the node you are going to remove (ftsys10 in this example): cmhaltnode -f -v ftsys10 Step 5. Verify the new configuration: cmcheckconf -C clconfig.ascii Step 6.
Cluster and Package Maintenance Reconfiguring a Cluster • Change the NETWORK_POLLING_INTERVAL. • Change the NETWORK_FAILURE_DETECTION parameter. • A combination of any of these in one transaction (cmapplyconf), given the restrictions below. What You Must Keep in Mind The following restrictions apply: • You must not change the configuration of all heartbeats at one time, or change or delete the only configured heartbeat. At least one working heartbeat, preferably with a standby, must remain unchanged.
Cluster and Package Maintenance Reconfiguring a Cluster See page 288 for more information about the package networking parameters. • You cannot change the IP configuration of an interface used by the cluster in a single transaction (cmapplyconf). You must first delete the NIC from the cluster configuration, then reconfigure the NIC (using ifconfig (1m), for example), then add the NIC back into the cluster.
Cluster and Package Maintenance Reconfiguring a Cluster #STATIONARY_IP NETWORK_INTERFACE 15.13.170.18 lan3 # Possible standby Network Interfaces for lan1, lan0: lan2. NODE_NAME NETWORK_INTERFACE HEARTBEAT_IP #NETWORK_INTERFACE # STATIONARY_IP NETWORK_INTERFACE ftsys10 lan1 192.3.17.19 lan0 15.13.170.19 lan3 # Possible standby Network Interfaces for lan0, lan1: lan2 Step 2.
Cluster and Package Maintenance Reconfiguring a Cluster Step 4. Apply the changes to the configuration and distribute the new binary configuration file to all cluster nodes: cmapplyconf -C clconfig.ascii If you were configuring the subnet for data instead, and wanted to add it to a package configuration, you would now need to: 1. Halt the package 2. Add the new networking information to the package configuration file 3.
Cluster and Package Maintenance Reconfiguring a Cluster NODE_NAME NETWORK_INTERFACE HEARTBEAT_IP ftsys9 lan1 192.3.17.18 # NETWORK_INTERFACE lan0 # STATIONARY_IP 15.13.170.18 # NETWORK_INTERFACE lan3 # Possible standby Network Interfaces for lan1, lan0: lan2. NODE_NAME NETWORK_INTERFACE HEARTBEAT_IP # NETWORK_INTERFACE # STATIONARY_IP # NETWORK_INTERFACE ftsys10 lan1 192.3.17.19 lan0 15.13.170.19 lan3 # Possible standby Network Interfaces for lan0, lan1: lan2 Step 4.
Cluster and Package Maintenance Reconfiguring a Cluster Step 1. If you are not sure whether or not a physical interface (NIC) is part of the cluster configuration, run olrad -C with the affected I/O slot ID as argument. If the NIC is part of the cluster configuration, you’ll see a warning message telling you to remove it from the configuration before you proceed. See the olrad(1M) manpage for more information about olrad. Step 2.
Cluster and Package Maintenance Reconfiguring a Cluster 1. Use the cmgetconf command to store a copy of the cluster's existing cluster configuration in a temporary file. For example: cmgetconf clconfig.ascii 2. Edit the file clconfig.ascii to add or delete volume groups. 3. Use the cmcheckconf command to verify the new configuration. 4. Use the cmapplyconf command to apply the changes to the configuration and distribute the new binary configuration file to all cluster nodes.
Cluster and Package Maintenance Reconfiguring a Cluster • For CVM 4.1 and later with CFS, edit the configuration file of the package that uses CFS. Configure the three dependency_ parameters. Then run the cmapplyconf command. Similarly, you can delete VxVM or CVM disk groups provided they are not being used by a cluster node at the time.
Cluster and Package Maintenance Reconfiguring a Cluster Use cmapplyconf to apply the changes to the configuration and send the new configuration file to all cluster nodes. Using -k or -K can significantly reduce the response time.
Cluster and Package Maintenance Configuring a Legacy Package Configuring a Legacy Package IMPORTANT You can still create a new legacy package. If you are using a Serviceguard Toolkit such as Serviceguard NFS Toolkit, consult the documentation for that product. Otherwise, use this section to maintain and re-work existing legacy packages rather than to create new ones.
Cluster and Package Maintenance Configuring a Legacy Package You can create a legacy package and its control script in Serviceguard Manager; use the Help for detailed instructions. Otherwise, use the following procedure to create a legacy package. NOTE For instructions on creating Veritas special-purpose system multi-node and multi-node packages, see “Configuring Veritas System Multi-node Packages” on page 313 and “Configuring Veritas Multi-node Packages” on page 315. Step 1.
Cluster and Package Maintenance Configuring a Legacy Package Configuring a Package in Stages It is a good idea to configure failover packages in stages, as follows: 1. Configure volume groups and mount points only. 2. Distribute the control script to all nodes. 3. Apply the configuration. 4. Run the package and ensure that it can be moved from node to node. 5. Halt the package. 6. Configure package IP addresses and application services in the control script. 7. Distribute the control script to all nodes. 8.
Cluster and Package Maintenance Configuring a Legacy Package For modular packages, the default form for parameter names in the package configuration file is lower case; for legacy packages the default is upper case. There are no compatibility issues; Serviceguard is case-insensitive as far as the parameter names are concerned.
Cluster and Package Maintenance Configuring a Legacy Package • STORAGE_GROUP. On systems that support Veritas Cluster Volume manager (CVM), specify the names of any CVM storage groups that will be used by this package. Enter each storage group (CVM disk group) on a separate line. Note that CVM storage groups are not entered in the cluster configuration file. You should not enter LVM volume groups or VxVM disk groups in this file.
Cluster and Package Maintenance Configuring a Legacy Package For legacy packages, DEFERRED resources must be specified in the package control script. NOTE • ACCESS_CONTROL_POLICY. You can grant a non-root user PACKAGE_ADMIN privileges for this package. See the entries for user_name, user_host, and user_role on page 297, and “Access Roles” on page 204, for more information. • If the package will depend on another package, enter values for DEPENDENCY_NAME, DEPENDENCY_CONDITION, and DEPENDENCY_LOCATION.
Cluster and Package Maintenance Configuring a Legacy Package edit the configuration or control script files for these packages, although Serviceguard does not forbid it. Create and modify the information using cfs admin commands only. Use cmmakepkg to create the control script, then edit the control script. Use the following procedure to create the template for the sample failover package pkg1. First, generate a control script template, for example: cmmakepkg -s /etc/cmcluster/pkg1/pkg1.
Cluster and Package Maintenance Configuring a Legacy Package Do not include CFS-based disk groups in the package control script; on systems that support CFS and CVM, they are activated by the CFS multi-node packages before standard packages are started. • If you are using mirrored VxVM disks, specify the mirror recovery option VXVOL. • Add the names of logical volumes and the file system that will be mounted on them.
Cluster and Package Maintenance Configuring a Legacy Package
# START OF CUSTOMER DEFINED FUNCTIONS
# This function is a place holder for customer defined functions.
# You should define all actions you want to happen here, before the service is
# started. You can create as many functions as you need.
function customer_defined_run_cmds
{
# ADD customer defined run commands.
: # do nothing instruction, because a function must contain some command.
date >> /tmp/pkg1.datelog
echo 'Starting pkg1' >> /tmp/pkg1.datelog
Cluster and Package Maintenance Configuring a Legacy Package To avoid this situation, it is a good idea to always specify a RUN_SCRIPT_TIMEOUT and a HALT_SCRIPT_TIMEOUT for all packages, especially packages that use Serviceguard commands in their control scripts. If a timeout is not specified and your configuration has a command loop as described above, inconsistent results can occur, including a hung cluster.
Cluster and Package Maintenance Configuring a Legacy Package • Configured resources are available on cluster nodes. • If a dependency is configured, the dependency package must already be configured in the cluster. Distributing the Configuration You can use Serviceguard Manager or HP-UX commands to distribute the binary cluster configuration file among the nodes of the cluster.
Cluster and Package Maintenance Configuring a Legacy Package cmcheckconf -C /etc/cmcluster/cmcl.config -P \ /etc/cmcluster/pkg1/pkg1.config • Activate the cluster lock volume group so that the lock disk can be initialized: vgchange -a y /dev/vg01 • Generate the binary configuration file and distribute it across the nodes. cmapplyconf -v -C /etc/cmcluster/cmcl.config -P \ /etc/cmcluster/pkg1/pkg1.config • If you are using a lock disk, deactivate the cluster lock volume group.
Cluster and Package Maintenance Reconfiguring a Package Reconfiguring a Package You reconfigure a package in much the same way as you originally configured it; for modular packages, see Chapter 6, “Configuring Packages and Their Services,” on page 271; for older packages, see “Configuring a Legacy Package” on page 363. The cluster can be either halted or running during package reconfiguration.
Cluster and Package Maintenance Reconfiguring a Package 3. Edit the package configuration file. IMPORTANT Restrictions on package names, dependency names, and service names have become more stringent as of A.11.18. Packages that have or contain names that do not conform to the new rules (spelled out under package_name on page 282) will continue to run, but if you reconfigure these packages, you will need to change the names that do not conform; cmcheckconf and cmapplyconf will enforce the new rules. 4.
Cluster and Package Maintenance Reconfiguring a Package If this is a legacy package, remember to copy the control script to the /etc/cmcluster/pkg1 directory on all nodes that can run the package. To create the CFS disk group or mount point multi-node packages on systems that support CFS, see “Creating the Disk Group Cluster Packages” on page 248 and “Creating a Filesystem and Mount Point Package” on page 249.
Cluster and Package Maintenance Reconfiguring a Package NOTE Any form of the mount command (for example, mount -o cluster, dbed_chkptmount, or sfrac_chkptmount) other than cfsmount or cfsumount in a HP Serviceguard Storage Management Suite environment with CFS should be done with caution. These non-CFS commands could cause conflicts with subsequent command operations on the file system or Serviceguard packages.
Cluster and Package Maintenance Reconfiguring a Package cmmodpkg -R -s myservice pkg1 The current value of the restart counter may be seen in the output of the cmviewcl -v command. Allowable Package States During Reconfiguration In many cases, you can make changes to a package’s configuration while the package is running. The table that follows shows exceptions - cases in which the package must not be running, or in which the results might not be what you expect.
Cluster and Package Maintenance Reconfiguring a Package Table 7-2 380 Types of Changes to Packages (Continued) Change to the Package Required Package State Change run script contents (legacy package) Package should not be running. Timing problems may occur if the script is changed while the package is running. Change halt script contents (legacy package) Package should not be running. Timing problems may occur if the script is changed while the package is running.
Cluster and Package Maintenance Reconfiguring a Package Table 7-2 Types of Changes to Packages (Continued) Change to the Package Remove a file system Package must not be running. Add, change, or delete modular external scripts and pre-scripts Package must not be running. Package auto_run Package can be either running or halted.
Cluster and Package Maintenance Responding to Cluster Events Responding to Cluster Events Serviceguard does not require much ongoing system administration intervention. As long as there are no failures, your cluster will be monitored and protected. In the event of a failure, those packages that you have designated to be transferred to another node will be transferred automatically.
Cluster and Package Maintenance Single-Node Operation Single-Node Operation In a multi-node cluster, you could have a situation in which all but one node has failed, or you have shut down all but one node, leaving your cluster in single-node operation. This remaining node will probably have applications running on it. As long as the Serviceguard daemon cmcld is active, other nodes can rejoin the cluster.
Cluster and Package Maintenance Removing Serviceguard from a System Removing Serviceguard from a System If you want to disable a node permanently from Serviceguard use, use the swremove command to delete the software. CAUTION Remove the node from the cluster first. If you run the swremove command on a server that is still a member of a cluster, it will cause that cluster to halt and the cluster configuration to be deleted. To remove Serviceguard: 1. If the node is an active member of a cluster, halt the node. 2.
Troubleshooting Your Cluster 8 Troubleshooting Your Cluster This chapter describes how to verify cluster operation, how to review cluster status, how to add and replace hardware, and how to solve some typical cluster problems.
Troubleshooting Your Cluster Testing Cluster Operation Testing Cluster Operation Once you have configured your Serviceguard cluster, you should verify that the various components of the cluster behave correctly in case of a failure. In this section, the following procedures test that the cluster responds properly in the event of a package failure, a node failure, or a LAN failure.
Troubleshooting Your Cluster Testing Cluster Operation 4. Move the package back to the primary node (see “Moving a Failover Package” on page 343). Testing the Cluster Manager To test that the cluster manager is operating correctly, perform the following steps for each node on the cluster: 1. Turn off the power to the node SPU. 2.
Troubleshooting Your Cluster Testing Cluster Operation 3. Verify that a local switch has taken place so that the Standby card is now the Primary card. In Serviceguard Manager, check the cluster properties. On the command line, use cmviewcl -v. 4. Reconnect the LAN to the original Primary card, and verify its status. In Serviceguard Manager, check the cluster properties. On the command line, use cmviewcl -v .
Troubleshooting Your Cluster Monitoring Hardware Monitoring Hardware Good standard practice in handling a high availability system includes careful fault monitoring so as to prevent failures if possible or at least to react to them swiftly when they occur.
Troubleshooting Your Cluster Monitoring Hardware action in case of a problem. For example, you could configure a disk monitor to report when a mirror was lost from a mirrored volume group being used in the cluster. Refer to the manual Using High Availability Monitors for additional information. Using EMS (Event Monitoring Service) Hardware Monitors A set of hardware monitors is available for monitoring and reporting on memory, CPU, and many other system values.
Troubleshooting Your Cluster Monitoring Hardware HP ISEE also reports error conditions directly to an HP Response Center, alerting support personnel to the potential problem. HP ISEE is available through various support contracts. For more information, contact your HP representative.
Troubleshooting Your Cluster Replacing Disks Replacing Disks The procedure for replacing a faulty disk mechanism depends on the type of disk configuration you are using. Separate descriptions are provided for replacing an array mechanism and a disk in a high availability enclosure. For more information, see the section Replacing a Bad Disk in the Logical Volume Management volume of the HP-UX System Administrator’s Guide, at http://docs.hp.
Troubleshooting Your Cluster Replacing Disks new device. See the section Replacing a Bad Disk in the Logical Volume Management volume of the HP-UX System Administrator’s Guide, posted at http://docs.hp.com. 2. Identify the names of any logical volumes that have extents defined on the failed physical volume. 3.
Troubleshooting Your Cluster Replacing Disks Replacing a Lock Disk You can replace an unusable lock disk while the cluster is running, provided you do not change the devicefile name (DSF).
Troubleshooting Your Cluster Replacing Disks NOTE If you restore or recreate the volume group for the lock disk and you need to re-create the cluster lock (for example if no vgcfgbackup is available), you can run cmdisklock to re-create the lock. See the cmdisklock (1m) manpage for more information. Replacing a Lock LUN You can replace an unusable lock LUN while the cluster is running, provided you do not change the devicefile name (DSF).
Troubleshooting Your Cluster Replacing Disks cmdisklock checks that the specified device is not in use by LVM, VxVM, ASM, or the file system, and will fail if the device has a label marking it as in use by any of those subsystems. cmdisklock -f overrides this check. CAUTION You are responsible for determining that the device is not being used by any subsystem on any node connected to the device before using cmdisklock -f. If you use cmdisklock -f without taking this precaution, you could lose data.
Troubleshooting Your Cluster Replacing I/O Cards Replacing I/O Cards Replacing SCSI Host Bus Adapters After a SCSI Host Bus Adapter (HBA) card failure, you can replace the card using the following steps. Normally disconnecting any portion of the SCSI bus will leave the SCSI bus in an unterminated state, which will cause I/O errors for other nodes connected to that SCSI bus, so the cluster would need to be halted before disconnecting any portion of the SCSI bus.
Troubleshooting Your Cluster Replacing LAN or Fibre Channel Cards Replacing LAN or Fibre Channel Cards If a LAN or fibre channel card fails and the card has to be replaced, you can replace it on-line or off-line depending on the type of hardware and operating system you are running. It is not necessary to bring the cluster down to do this. Off-Line Replacement Follow these steps to replace an I/O card off-line. 1. Halt the node by using the cmhaltnode command. 2.
Troubleshooting Your Cluster Replacing LAN or Fibre Channel Cards NOTE After replacing a Fibre Channel I/O card, it may be necessary to reconfigure the SAN to use the World Wide Name (WWN) of the new Fibre Channel card if Fabric Zoning or other SAN security requiring WWN is used.
Troubleshooting Your Cluster Replacing a Failed Quorum Server System Replacing a Failed Quorum Server System When a quorum server fails or becomes unavailable to the clusters it is providing quorum services for, this will not cause a failure on any cluster. However, the loss of the quorum server does increase the vulnerability of the clusters in case there is an additional failure. Use the following procedure to replace a defective quorum server system.
Troubleshooting Your Cluster Replacing a Failed Quorum Server System The command will output an error message if the specified nodes cannot communicate with the quorum server.
Troubleshooting Your Cluster Troubleshooting Approaches Troubleshooting Approaches The following sections offer a few suggestions for troubleshooting by reviewing the state of the running system and by examining cluster status data, log files, and configuration files.
Troubleshooting Your Cluster Troubleshooting Approaches
IPv6:
Name     Mtu    Address/Prefix   Ipkts   Opkts
lan1*    1500   none             0       0
lo0      4136   ::1/128          10690   10690
Reviewing the System Log File Messages from the Cluster Manager and Package Manager are written to the system log file. The default location of the log file is /var/adm/syslog/syslog.log. Also, package-related messages are logged into the package log file. The package log file is located in the package directory, by default.
Troubleshooting Your Cluster Troubleshooting Approaches Dec 14 14:34:44 star04 cmcld[2048]: Request from node star04 to start package pkg5 on node star04. Dec 14 14:34:44 star04 cmcld[2048]: Executing '/etc/cmcluster/pkg5/pkg5_run start' for package pkg5. Dec 14 14:34:45 star04 LVM[2066]: vgchange -a n /dev/vg02 Dec 14 14:34:45 star04 cmcld[2048]: Package pkg5 run script exited with NO_RESTART. Dec 14 14:34:45 star04 cmcld[2048]: Examine the file /etc/cmcluster/pkg5/pkg5_run.log for more details.
Troubleshooting Your Cluster Troubleshooting Approaches Reviewing the System Multi-node Package Files If you are running Veritas Cluster Volume Manager (supported on some versions of HP-UX), and you have problems starting the cluster, check the log file for the system multi-node package. For Cluster Volume Manager (CVM) 3.5, the file is VxVM-CVM-pkg.log. For CVM 4.1 and later, the file is SG-CFS-pkg.log.
Troubleshooting Your Cluster Troubleshooting Approaches cmquerycl -v -C /etc/cmcluster/verify.ascii -n ftsys9 -n ftsys10 cmcheckconf -v -C /etc/cmcluster/verify.ascii The cmcheckconf command checks: • The network addresses and connections. • The cluster lock disk connectivity. • The validity of configuration parameters of the cluster and packages for: — The uniqueness of names. — The existence and permission of scripts. It doesn’t check: • The correct setup of the power circuits.
Troubleshooting Your Cluster Troubleshooting Approaches
Table 8-1 Data Displayed by the cmscancl Command (Continued)
Description / Source of Data
file systems: mount command
LVM configuration: /etc/lvmtab file
LVM physical volume group data: /etc/lvmpvg file
link level connectivity for all links: linkloop command
binary configuration file: cmviewconf command
Using the cmviewconf Command cmviewconf allows you to examine the binary cluster configuration file, even when the cluster is not running.
Troubleshooting Your Cluster Troubleshooting Approaches • cmscancl can be used to verify that primary and standby LANs are on the same bridged net. • cmviewcl -v shows the status of primary and standby LANs. Use these commands on all nodes.
Troubleshooting Your Cluster Solving Problems Solving Problems Problems with Serviceguard may be of several types. The following is a list of common categories of problem: • Serviceguard Command Hangs. • Cluster Re-formations. • System Administration Errors. • Package Control Script Hangs. • Problems with VxVM Disk Groups. • Package Movement Errors. • Node and Network Failures. • Quorum Server Problems.
Troubleshooting Your Cluster Solving Problems Name: ftsys9.cup.hp.com Address: 15.13.172.229 If the output of this command does not include the correct IP address of the node, then check your name resolution services further. Cluster Re-formations Cluster re-formations may occur from time to time due to current cluster conditions. Some of the causes are as follows: • local switch on an Ethernet LAN if the switch takes longer than the cluster NODE_TIMEOUT value.
Troubleshooting Your Cluster Solving Problems
You can use the following commands to check the status of your disks (a sample invocation follows the list):
• bdf - to see if your package's volume group is mounted.
• vgdisplay -v - to see if all volumes are present.
• lvdisplay -v - to see if the mirrors are synchronized.
• strings /etc/lvmtab - to ensure that the configuration is correct.
• ioscan -fnC disk - to see physical disks.
• diskinfo -v /dev/rdsk/cxtydz - to display information about a disk.
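For example, to check one package volume group and one of its logical volumes, you might enter the following; the names vg01 and lvol1 are illustrations only, so substitute your own:
# vgdisplay -v /dev/vg01
# lvdisplay -v /dev/vg01/lvol1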
Troubleshooting Your Cluster Solving Problems NOTE In an HP Serviceguard Storage Management Suite environment with CFS, use any form of the mount command (for example, mount -o cluster, dbed_chkptmount, or sfrac_chkptmount) other than cfsmount or cfsumount with caution. These non-CFS commands could cause conflicts with subsequent command operations on the file system or Serviceguard packages.
Troubleshooting Your Cluster Solving Problems
Next, deactivate the package volume groups. These are specified by the VG[] array entries in the package control script:
vgchange -a n
4. Finally, re-enable the package for switching.
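For example, assuming a package named pkg1 (the name is illustrative), the last step can be done as follows:
# cmmodpkg -e pkg1
This re-enables package switching, which was disabled when the package was halted.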
Troubleshooting Your Cluster Solving Problems 3. v w - cvm 4. f - cfs In an HP Serviceguard Storage Management Suite environment with CFS, use any form of the mount command (for example, mount -o cluster, dbed_chkptmount, or sfrac_chkptmount) other than cfsmount or cfsumount with caution. These non-CFS commands could cause conflicts with subsequent command operations on the file system or Serviceguard packages.
Troubleshooting Your Cluster Solving Problems When the package starts up on another node in the cluster, a series of messages is printed in the package log file. Follow the instructions in the messages to use the force import option (-C) to allow the current node to import the disk group. Then deport the disk group, after which it can be used again by the package.
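The sequence typically looks something like the following sketch; the disk group name dg_01 is only an illustration, and you should use the exact command given in the package log file messages:
# vxdg -tfC import dg_01
# vxdg deport dg_01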
Troubleshooting Your Cluster Solving Problems • HPMC. This is a High Priority Machine Check, a system panic caused by a hardware error. • TOC • Panics • Hangs • Power failures In the event of a TOC, a system dump is performed on the failed node and numerous messages are also displayed on the console. You can use the following commands to check the status of your network and subnets: • netstat -in - to display LAN status and check to see if the package IP is stacked on the LAN card.
Troubleshooting Your Cluster Solving Problems
Unable to set client version at quorum server 192.6.7.2:reply timed out
Probe of quorum server 192.6.7.2 timed out
These messages could be an indication of an intermittent network, or the default quorum server timeout may not be sufficient. You can set QS_TIMEOUT_EXTENSION to increase the timeout, or you can increase the heartbeat or node timeout value.
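For example, the quorum server entries in the cluster configuration file might look like the following sketch; the host name and the extension value (in microseconds) are illustrative assumptions, and the change must be re-applied with cmapplyconf:
QS_HOST qshost.example.com
QS_TIMEOUT_EXTENSION 2000000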
Serviceguard Commands A Serviceguard Commands The following is an alphabetical list of commands used for Serviceguard cluster configuration and maintenance. Manpages for these commands are available on your system after installation. NOTE Veritas Cluster Volume Manager (CVM) and Cluster File System (CFS) are supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability -> Serviceguard).
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cfsdgadm Description • Display the status of CFS disk groups. • Add shared disk groups to a Veritas Cluster File System CFS cluster configuration, or remove existing CFS disk groups from the configuration. Serviceguard automatically creates the multi-node package SG-CFS-DG-id# to regulate the disk groups. This package has a dependency on the SG-CFS-pkg created by cfscluster command.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmapplyconf Description Verify and apply Serviceguard cluster configuration and package configuration files. cmapplyconf verifies the cluster configuration and package configuration specified in the cluster_ascii_file and the associated pkg_ascii_file(s), creates or updates the binary configuration file, called cmclconfig, and distributes it to all nodes.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmapplyconf (continued) Description Run cmgetconf to get either the cluster configuration file or package configuration file whenever changes to the existing configuration are required. Note that cmapplyconf will verify and distribute cluster configuration or package files. It will not cause the cluster daemon to start or be removed from the cluster configuration.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmdeleteconf Description Delete either the cluster or the package configuration. cmdeleteconf deletes either the entire cluster configuration, including all its packages, or only the specified package configuration. If neither cluster_name nor package_name is specified, cmdeleteconf will delete the local cluster’s configuration and all its packages.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmhaltcl Description Halt a high availability cluster. cmhaltcl causes all nodes in a configured cluster to stop their cluster daemons, optionally halting all packages or applications in the process. This command will halt all the daemons on all currently running systems. If you want to halt the daemons on only a subset of nodes, use the cmhaltnode command instead. cmhaltnode Halt a node in a high availability cluster.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmhaltserv Description Halt a service from the high availability package halt script. This is not a command-line executable command; it runs only from within the package control script. cmhaltserv is used in the high availability package halt script to halt a service. If any part of the package is marked down, the package halt script is executed as part of the recovery process.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmmodnet Description Add or remove an address from a high availability cluster. cmmodnet is used to add or remove a relocatable package IP_address for the current network interface running the given subnet_name. cmmodnet can also be used to enable or disable a LAN_name currently configured in a cluster.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmquerycl Description Query cluster or node configuration information. cmquerycl searches all specified nodes for cluster configuration and Logical Volume Manager (LVM) information. Cluster configuration information includes network information such as LAN interface, IP addresses, bridged networks and possible heartbeat networks. LVM information includes volume group (VG) interconnection and file system mount point information.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmruncl Description Run a high availability cluster. cmruncl causes all nodes in a configured cluster or all nodes specified to start their cluster daemons and form a new cluster. This command should be run only when the cluster is not active on any of the configured nodes.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmrunserv Description Run a service from the high availability package run script. This is not a command-line executable command; it runs only from within the package control script. cmrunserv is used in the high availability package run script to run a service. If the service process dies, cmrunserv updates the status of the service to down.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmscancl Description Gather system configuration information from nodes with Serviceguard installed. cmscancl is a configuration report and diagnostic tool which gathers system software and hardware configuration information from a list of nodes, or from all the nodes in a cluster.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmstartres Description This command is run by package control scripts, and not by users! Starts resource monitoring on the local node for an EMS resource that is configured in a Serviceguard package. cmstartres starts resource monitoring for an EMS resource on the local node. This resource must be configured in the specified package_name.
Enterprise Cluster Master Toolkit B Enterprise Cluster Master Toolkit The Enterprise Cluster Master Toolkit (ECMT) provides a group of example scripts and package configuration files for creating Serviceguard packages for several major database and internet software products. Each toolkit contains a README file that explains how to customize the package for your needs. The ECMT can be installed on HP-UX 11i v1, 11i v2, or 11i v3.
Designing Highly Available Cluster Applications
C Designing Highly Available Cluster Applications
This appendix describes how to create or port applications for high availability, with emphasis on the following topics:
• Automating Application Operation
• Controlling the Speed of Application Failover
• Designing Applications to Run on Multiple Systems
• Restoring Client Connections
• Handling Application Failures
• Minimizing Planned Downtime
Designing for high availability means reducing the amount of planned and unplanned downtime that users will experience.
Designing Highly Available Cluster Applications Automating Application Operation Automating Application Operation Can the application be started and stopped automatically or does it require operator intervention? This section describes how to automate application operations to avoid the need for user intervention. One of the first rules of high availability is to avoid manual intervention.
Designing Highly Available Cluster Applications Automating Application Operation Define Application Startup and Shutdown Applications must be restartable without manual intervention. If the application requires a switch to be flipped on a piece of hardware, then automated restart is impossible. Procedures for application startup, shutdown and monitoring must be created so that the HA software can perform these functions automatically.
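As a minimal sketch of what such automated procedures can look like, the following wrapper script accepts start, stop, and monitor arguments; the application name, paths, and process check are illustrative assumptions, not part of Serviceguard itself:
#!/usr/bin/sh
# Illustrative application wrapper; the /opt/myapp paths and the
# process name "myapp_server" are assumptions for this example.
case "$1" in
start)
    /opt/myapp/bin/myapp_server -c /etc/opt/myapp/myapp.conf &
    ;;
stop)
    /opt/myapp/bin/myapp_shutdown
    ;;
monitor)
    # Exit non-zero if the application process is not running.
    ps -ef | grep myapp_server | grep -v grep > /dev/null
    ;;
*)
    echo "usage: $0 {start|stop|monitor}"
    exit 1
    ;;
esac
A script of this kind can then be invoked by the package control script or by a service monitor, so that no operator has to intervene when the application must be started, stopped, or checked.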
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Controlling the Speed of Application Failover What steps can be taken to ensure the fastest failover? If a failure does occur causing the application to be moved (failed over) to another node, there are many things the application can do to reduce the amount of time it takes to get the application back up and running.
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Use Raw Volumes If your application uses data, use raw volumes rather than filesystems. Raw volumes do not require an fsck of the filesystem, thus eliminating one of the potentially lengthy steps during a failover. Evaluate the Use of JFS If a file system must be used, a JFS offers significantly faster file system recovery as compared to an HFS. However, performance of the JFS may vary with the application.
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Keep Logs Small Some databases permit logs to be buffered in memory to increase online performance. Of course, when a failure occurs, any in-flight transaction will be lost. However, minimizing the size of this in-memory log will reduce the amount of completed transaction data that would be lost in case of failure.
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Another example is an application where a clerk is entering data about a new employee. Suppose this application requires that employee numbers be unique, and that after the name and number of the new employee is entered, a failure occurs.
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Design for Multiple Servers If you use multiple active servers, multiple service points can provide relatively transparent service to a client. However, this capability requires that the client be smart enough to have knowledge about the multiple servers and the priority for addressing them. It also requires access to the data of the failed server or replicated data.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems Designing Applications to Run on Multiple Systems If an application can be failed to a backup node, how will it work on that different system? The previous sections discussed methods to ensure that an application can be automatically restarted. This section will discuss some ways to ensure the application can run on multiple systems.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems Each application or package should be given a unique name as well as a relocatable IP address. Following this rule separates the application from the system on which it runs, thus removing the need for user knowledge of which system the application runs on. It also makes it easier to move the application among different systems in a cluster for load balancing or other reasons.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems Avoid Using SPU IDs or MAC Addresses Design the application so that it does not rely on the SPU ID or MAC (link-level) addresses. The SPU ID is a unique hardware ID contained in non-volatile memory, which cannot be changed. A MAC address (also known as a LANIC id) is a link-specific address associated with the LAN hardware.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems Applications should not reference official hostnames or IP addresses. The official hostname and corresponding IP address for the hostname refer to the primary LAN card and the stationary IP address for that card.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems develop alternate means of verifying where they are running. For example, an application might check a list of hostnames that have been provided in a configuration file. Bind to a Fixed Port When binding a socket, a port address can be specified or one can be assigned dynamically.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems For TCP stream sockets, the TCP level of the protocol stack resolves this problem for the client since it is a connection-based protocol. On the client, TCP ignores the stationary IP address and continues to use the previously bound relocatable IP address originally used by the client. With UDP datagram sockets, however, there is a problem.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems applications must move together. If the applications’ data stores are in separate volume groups, they can switch to different nodes in the event of a failover. The application data should be set up on different disk drives and if applicable, different mount points. The application should be designed to allow for different disks and separate mount points.
Designing Highly Available Cluster Applications Restoring Client Connections Restoring Client Connections How does a client reconnect to the server after a failure? It is important to write client applications to specifically differentiate between the loss of a connection to the server and other application-oriented errors that might be returned. The application should take special action in case of connection loss.
Designing Highly Available Cluster Applications Restoring Client Connections the retry to the current server should continue for the amount of time it takes to restart the server locally (a simple retry loop is sketched below). This will keep the client from having to switch to the second server in the event of an application failure.
• Use a transaction processing monitor or message queueing software to increase robustness.
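The retry behavior mentioned above can be as simple as the following sketch; the client command, retry count, and sleep interval are assumptions for illustration only:
#!/usr/bin/sh
# Retry the connection a fixed number of times before giving up.
RETRIES=12
i=0
while [ $i -lt $RETRIES ]
do
    if /opt/myapp/bin/myapp_client -connect appserver
    then
        exit 0          # connected successfully
    fi
    sleep 5             # wait before the next attempt
    i=`expr $i + 1`
done
exit 1                  # could not reconnect within the retry window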
Designing Highly Available Cluster Applications Handling Application Failures Handling Application Failures What happens if part or all of an application fails? All of the preceding sections have assumed the failure in question was not a failure of the application, but of another component of the cluster. This section deals specifically with application problems.
Designing Highly Available Cluster Applications Handling Application Failures ensure that the application is behaving correctly. If the application fails and it is not detected automatically, it might take hours for a user to determine the cause of the downtime and recover from it.
Designing Highly Available Cluster Applications Minimizing Planned Downtime Minimizing Planned Downtime Planned downtime (as opposed to unplanned downtime) is scheduled; examples include backups, system upgrades to new operating system revisions, or hardware replacements. For planned downtime, application designers should consider: • Reducing the time needed for application upgrades/patches.
Designing Highly Available Cluster Applications Minimizing Planned Downtime Provide for Rolling Upgrades Provide for a “rolling upgrade” in a client/server environment. For a system with many components, the typical scenario is to bring down the entire system, upgrade every node to the new version of the software, and then restart the application on all the affected nodes. For large systems, this could result in a long downtime. An alternative is to provide for a rolling upgrade.
Designing Highly Available Cluster Applications Minimizing Planned Downtime Providing Online Application Reconfiguration Most applications have some sort of configuration information that is read when the application is started. If the application must be halted and a new configuration file read in order to make a configuration change, downtime is incurred. To avoid this downtime, use configuration tools that interact with an application and make dynamic changes online.
Integrating HA Applications with Serviceguard D Integrating HA Applications with Serviceguard The following is a summary of the steps you should follow to integrate an application into the Serviceguard environment: 1. Read the rest of this book, including the chapters on cluster and package configuration, and the Appendix “Designing Highly Available Cluster Applications.” 2.
Integrating HA Applications with Serviceguard NOTE
• Can the application be installed cluster-wide?
• Does the application work with a cluster-wide file name space?
• Will the application run correctly with the data (file system) available on all nodes in the cluster? This includes being available on cluster nodes where the application is not currently running.
Integrating HA Applications with Serviceguard Checklist for Integrating HA Applications Checklist for Integrating HA Applications This section contains a checklist for integrating HA applications in both single and multiple systems. Defining Baseline Application Behavior on a Single System Define a baseline behavior for the application on a standalone system: 1. Install the application, database, and other required resources on one of the systems.
Integrating HA Applications with Serviceguard Checklist for Integrating HA Applications c. Install the appropriate executables. d. With the application not running on the first system, try to bring it up on the second system. You might use the script you created in the step above. Is there anything different that you must do? Does it run? e. Repeat this process until you can get the application to run on the second system. 2. Configure the Serviceguard cluster: a. Create the cluster configuration. b.
Integrating HA Applications with Serviceguard Checklist for Integrating HA Applications NOTE Appendix D CVM and CFS are supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability - > Serviceguard).
Integrating HA Applications with Serviceguard Checklist for Integrating HA Applications Testing the Cluster 1. Test the cluster: • Have clients connect. • Provide a normal system load. • Halt the package on the first node and move it to the second node: # cmhaltpkg pkg1 # cmrunpkg -n node2 pkg1 # cmmodpkg -e pkg1 • Move it back. # cmhaltpkg pkg1 # cmrunpkg -n node1 pkg1 # cmmodpkg -e pkg1 • Fail one of the systems. For example, turn off the power on node 1.
Software Upgrades E Software Upgrades There are three types of upgrade you can do: • rolling upgrade • non-rolling upgrade • migration with cold install Each of these is discussed below.
Software Upgrades Types of Upgrade Types of Upgrade Rolling Upgrade In a rolling upgrade, you upgrade the HP-UX operating system (if necessary) and the Serviceguard software one node at a time without bringing down your cluster. A rolling upgrade can also be done any time one system needs to be taken offline for hardware maintenance or patch installations. This method is the least disruptive, but your cluster must meet both general and release-specific requirements.
Software Upgrades Guidelines for Rolling Upgrade Guidelines for Rolling Upgrade You can normally do a rolling upgrade if: • You are not upgrading the nodes to a new version of HP-UX; or • You are upgrading to a new version of HP-UX, but using the update process (update-ux), rather than a cold install. update-ux supports many, but not all, upgrade paths. For more information, see the HP-UX Installation and Update Guide for the target version of HP-UX.
Software Upgrades Performing a Rolling Upgrade Performing a Rolling Upgrade Limitations of Rolling Upgrades The following limitations apply to rolling upgrades: • During a rolling upgrade, you should issue Serviceguard commands (other than cmrunnode and cmhaltnode) only on a node containing the latest revision of the software. Performing tasks on a node containing an earlier revision of the software will not work or will cause inconsistent results.
Software Upgrades Performing a Rolling Upgrade • Rolling upgrades are not intended as a means of using mixed releases of Serviceguard or HP-UX within the cluster. HP strongly recommends that you upgrade all cluster nodes as quickly as possible to the new release level. • You cannot delete Serviceguard software (via swremove) from a node while a rolling upgrade is in progress.
Software Upgrades Performing a Rolling Upgrade If the cluster fails before the rolling upgrade is complete (because of a catastrophic power failure, for example), you can restart the cluster by entering the cmruncl command from a node which has been upgraded to the latest version of the software. Keeping Kernels Consistent If you change kernel parameters as a part of doing an upgrade, be sure to change the parameters to the same values on all nodes that can run the same packages in case of failover.
Software Upgrades Example of a Rolling Upgrade Example of a Rolling Upgrade NOTE Warning messages may appear during a rolling upgrade while the node is determining what version of software is running. This is a normal occurrence and not a cause for concern. The following example shows a simple rolling upgrade on two nodes running one package each, as shown in Figure E-1.
Software Upgrades Example of a Rolling Upgrade This will cause pkg1 to be halted cleanly and moved to node 2. The Serviceguard daemon on node 1 is halted, and the result is shown in Figure E-2. Figure E-2 Running Cluster with Packages Moved to Node 2 Step 2. Upgrade node 1 to the next operating system release (“HP-UX (new)”), and install the next version of Serviceguard (“SG (new)”).
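A sketch of the installation step, assuming the new Serviceguard software has been copied to a local depot; the depot path and the software selection name are placeholders, not values from this manual:
# swinstall -s /var/tmp/sg_depot Serviceguard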
Software Upgrades Example of a Rolling Upgrade Step 3. When upgrading is finished, enter the following command on node 1 to restart the cluster on node 1. # cmrunnode -n node1 At this point, different versions of the Serviceguard daemon (cmcld) are running on the two nodes, as shown in Figure E-4. Figure E-4 Node 1 Rejoining the Cluster Step 4. Repeat the process on node 2. Halt the node, as follows: # cmhaltnode -f node2 This causes both packages to move to node 1.
Software Upgrades Example of a Rolling Upgrade Figure E-5 Running Cluster with Packages Moved to Node 1 Step 5. Move pkg2 back to its original node. Use the following commands: # cmhaltpkg pkg2 # cmrunpkg -n node2 pkg2 # cmmodpkg -e pkg2 The cmmodpkg command re-enables switching of the package, which was disabled by the cmhaltpkg command. The final running cluster is shown in Figure E-6.
Software Upgrades Example of a Rolling Upgrade Figure E-6 Running Cluster After Upgrades
Software Upgrades Guidelines for Non-Rolling Upgrade
Guidelines for Non-Rolling Upgrade
Do a non-rolling upgrade if:
• Your cluster does not meet the requirements for rolling upgrade as specified in the Release Notes for the target version of Serviceguard; or
• The limitations imposed by rolling upgrades make it impractical for you to do a rolling upgrade (see “Limitations of Rolling Upgrades” on page 466); or
• For some other reason you need or prefer to bring the cluster down before performing the upgrade.
Software Upgrades Performing a Non-Rolling Upgrade Performing a Non-Rolling Upgrade Limitations of Non-Rolling Upgrades The following limitations apply to non-rolling upgrades: • Binary configuration files may be incompatible between releases of Serviceguard. Do not manually copy configuration files between nodes. • You must halt the entire cluster before performing a non-rolling upgrade. Steps for Non-Rolling Upgrades Use the following steps for a non-rolling software upgrade: Step 1.
Software Upgrades Guidelines for Migrating a Cluster with Cold Install Guidelines for Migrating a Cluster with Cold Install There may be circumstances when you prefer to do a cold install of the HP-UX operating system rather than an upgrade. A cold install erases the existing operating system and data and then installs the new operating system and software; you must then restore the data. CAUTION The cold install process erases the existing software, operating system, and data.
Software Upgrades Guidelines for Migrating a Cluster with Cold Install See “Creating the Storage Infrastructure and Filesystems with LVM and VxVM” on page 221 for more information. 2. Halt the cluster applications, and then halt the cluster. 3. Do a cold install of the HP-UX operating system. For more information on the cold install process, see the HP-UX Installation and Update Guide for the target version of HP-UX: go to http://docs.hp.com.
Blank Planning Worksheets F Blank Planning Worksheets This appendix reprints blank versions of the planning worksheets described in the chapter “Planning and Documenting an HA Cluster.” You can duplicate any of these worksheets that you find useful and fill them in as a part of the planning process.
Blank Planning Worksheets Worksheet for Hardware Planning Worksheet for Hardware Planning HARDWARE WORKSHEET Page ___ of ____ =============================================================================== Node Information: Host Name _____________________ Series No _____________________ Memory Capacity ____________________ Number of I/O Slots ________________ =============================================================================== LAN Information: Name of Subnet _________ Name of IP Interface
Blank Planning Worksheets Power Supply Worksheet Power Supply Worksheet POWER SUPPLY WORKSHEET Page ___ of ____ =============================================================================== SPU Power: Host Name _____________________ Power Supply _______________________ Host Name _____________________ Power Supply _______________________ =============================================================================== Disk Power: Disk Unit __________________________ Power Supply _____________________
Blank Planning Worksheets Quorum Server Worksheet Quorum Server Worksheet Quorum Server Data: ============================================================================== QS Hostname: _________________IP Address: ______________________ ============================================================================== Quorum Services are Provided for: Cluster Name: ___________________________________________________________ Host Names ____________________________________________ Host Names _________________
Blank Planning Worksheets LVM Volume Group and Physical Volume Worksheet LVM Volume Group and Physical Volume Worksheet PHYSICAL VOLUME WORKSHEET Page ___ of ____ =============================================================================== Volume Group Name: ______________________________________________________ Physical Volume Name:_____________________________________________________ Physical Volume Name:_____________________________________________________ Physical Volume Name:_____________________
Blank Planning Worksheets LVM Volume Group and Physical Volume Worksheet Physical Volume Name: _____________________________________________________ 484 Appendix F
Blank Planning Worksheets VxVM Disk Group and Disk Worksheet VxVM Disk Group and Disk Worksheet DISK GROUP WORKSHEET Page ___ of ____ =========================================================================== Disk Group Name: __________________________________________________________ Physical Volume Name:______________________________________________________ Physical Volume Name:______________________________________________________ Physical Volume Name:____________________________________________________
Blank Planning Worksheets Cluster Configuration Worksheet Cluster Configuration Worksheet =============================================================================== Name and Nodes: =============================================================================== Cluster Name: __________________________ RAC Version: _______________ Node Names: _________________________________________________________ Volume Groups (for packages):________________________________________ ===================================
Blank Planning Worksheets Cluster Configuration Worksheet Autostart Delay: ___________ =============================================================================== Access Policies: User name: Host node: Role: =============================================================================== Appendix F 487
Blank Planning Worksheets Package Configuration Worksheet Package Configuration Worksheet Package Configuration File Data: ========================================================================== Package Name: __________________Package Type:______________ Primary Node: ____________________ First Failover Node:__________________ Additional Failover Nodes:__________________________________ Run Script Timeout: _____ Halt Script Timeout: _____________ Package AutoRun Enabled? ______ Local LAN Failover Allow
Blank Planning Worksheets Package Configuration Worksheet CVM Disk Groups [ignore CVM items if CVM is not being used]: cvm_vg___________cvm_dg_____________cvm_vg_______________ cvm_activation_cmd: ______________________________________________ VxVM Disk Groups: vxvm_dg_________vxvm_dg____________vxvm_dg_____________ vxvol_cmd ______________________________________________________ ________________________________________________________________________________ Logical Volumes and File Systems: fs_name_____
Blank Planning Worksheets Package Configuration Worksheet Package environment variable:________________________________________________ Package environment variable:________________________________________________ External pre-script:_________________________________________________________ External script:_____________________________________________________________ ================================================================================ NOTE 490 CVM (and CFS - Cluster File System) are supported
Blank Planning Worksheets Package Control Script Worksheet Package Control Script Worksheet =============================================================================== Name and Nodes: =============================================================================== Cluster Name: __________________________ RAC Version: _______________ Node Names: _________________________________________________________ Volume Groups (for packages):________________________________________ ===============================
Blank Planning Worksheets Package Control Script Worksheet First Lock Volume Group: | Physical Volume: | ________________ | Name on Node 1: ___________________ | Name on Node 2: ___________________ | Disk Unit No: _________ | Power Supply No: _________ =============================================================================== Cluster Lock LUN: Pathname on Node 1: ___________________ Pathname on Node 2: ___________________ Pathname on Node 3: ___________________ Pathname on Node 4: _________
Blank Planning Worksheets Package Control Script Worksheet NOTE Appendix F CVM (and CFS - Cluster File System) are supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability - > Serviceguard).
Migrating from LVM to VxVM Data Storage G Migrating from LVM to VxVM Data Storage This appendix describes how to migrate LVM volume groups to VxVM disk groups for use with the Veritas Volume Manager (VxVM), or with the Cluster Volume Manager (CVM) on systems that support it.
Migrating from LVM to VxVM Data Storage Loading VxVM Loading VxVM Before you can begin migrating data, you must install the Veritas Volume Manager software and all required VxVM licenses on all cluster nodes. This step requires each system to be rebooted, so it requires you to remove the node from the cluster before the installation, and restart the node after installation. This can be done as a part of a rolling upgrade procedure, described in Appendix E.
Migrating from LVM to VxVM Data Storage Migrating Volume Groups Migrating Volume Groups The following procedure shows how to do the migration of individual volume groups for packages that are configured to run on a given node. You should convert all the volume groups for a package at the same time. It is assumed that VxVM software and an appropriate version of HP-UX and Serviceguard have been installed on the node, and that the node has rebooted and rejoined the cluster.
Migrating from LVM to VxVM Data Storage Migrating Volume Groups utility is described along with its limitations and cautions in the Veritas Volume Manager Migration Guide for your version, available from http://www.docs.hp.com. If using the vxconvert(1M) utility, then skip the next step and go ahead to the following section. NOTE Remember that the cluster lock disk, if used, must be configured on an LVM volume group and physical volume.
Migrating from LVM to VxVM Data Storage Customizing Packages for VxVM Customizing Packages for VxVM After creating the VxVM disk group, you need to customize the Serviceguard package that will access the storage. Use the following procedure for a legacy package that will be used with Veritas Volume Manager (VxVM) disk groups. If you are using the Cluster Volume Manager (CVM), skip ahead to the next section.
Migrating from LVM to VxVM Data Storage Customizing Packages for VxVM
LV[0]="/dev/vx/dsk/dg01/lvol101"
LV[1]="/dev/vx/dsk/dg01/lvol102"
LV[2]="/dev/vx/dsk/dg02/lvol201"
LV[3]="/dev/vx/dsk/dg02/lvol202"
FS[0]="/mnt_dg0101"
FS[1]="/mnt_dg0102"
FS[2]="/mnt_dg0201"
FS[3]="/mnt_dg0202"
FS_MOUNT_OPT[0]="-o ro"
FS_MOUNT_OPT[1]="-o rw"
FS_MOUNT_OPT[2]="-o ro"
FS_MOUNT_OPT[3]="-o rw"
4.
Migrating from LVM to VxVM Data Storage Customizing Packages for CVM Customizing Packages for CVM NOTE CVM (and CFS - Cluster File System) are supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability -> Serviceguard). After creating the CVM disk group, you need to customize the Serviceguard package that will access the storage.
Migrating from LVM to VxVM Data Storage Customizing Packages for CVM • The LV[], FS[] and FS_MOUNT_OPT[] arrays are used the same as they are for LVM. LV[] defines the logical volumes, FS[] defines the mount points, and FS_MOUNT_OPT[] defines any mount options. For example, let's say we have two volumes defined in each of the two disk groups from above, lvol101 and lvol102, and lvol201 and lvol202. These are mounted on /mnt_dg0101 and /mnt_dg0102, and /mnt_dg0201 and /mnt_dg0202, respectively.
Migrating from LVM to VxVM Data Storage Customizing Packages for CVM
Then re-apply the package configuration:
# cmapplyconf -P PackageName.ascii
8. Test to make sure the disk group and data are intact.
9. Deport the disk group:
# vxdg deport DiskGroupName
10. Start the cluster, if it is not already running:
# cmruncl
This will activate the special CVM package.
11. When CVM starts up, it selects a master node, and this is the node from which you must issue the disk group configuration commands.
Migrating from LVM to VxVM Data Storage Removing LVM Volume Groups Removing LVM Volume Groups After testing the new VxVM disk groups, remove any LVM volume groups that are no longer wanted from the system using the standard LVM commands lvremove, pvremove, and vgremove. At a convenient time, you should also edit the cluster ASCII configuration file to remove the VOLUME_GROUP statements that refer to the LVM volume groups that are no longer used in the cluster.
IPv6 Network Support H IPv6 Network Support This appendix describes some of the characteristics of IPv6 network addresses.
IPv6 Network Support IPv6 Address Types IPv6 Address Types Several types of IPv6 addressing schemes are specified in RFC 2373 (IPv6 Addressing Architecture). IPv6 addresses are 128-bit identifiers for interfaces and sets of interfaces. There are various address formats for IPv6 defined by RFC 2373. IPv6 addresses are broadly classified into three types, which the following table explains: unicast, anycast, and multicast.
IPv6 Network Support IPv6 Address Types multiple groups of 16 bits of zeros. The “::” can appear only once in an address, and it can be used to compress the leading, trailing, or contiguous sixteen-bit zeroes in an address. Example: fec0:1:0:0:0:0:0:1234 can be represented as fec0:1::1234. • When dealing with a mixed environment of IPv4 and IPv6 nodes, there is an alternative form of IPv6 address that will be used. It is x:x:x:x:x:x:d.d.d.d, where the x values are the hexadecimal values of the six high-order 16-bit pieces of the address and the d values are the decimal values of the four low-order 8-bit pieces (standard IPv4 representation).
IPv6 Network Support IPv6 Address Types
Unicast Addresses
IPv6 unicast addresses are classified into different types: global aggregatable unicast addresses, site-local addresses, and link-local addresses. Typically a unicast address is logically divided as follows:
Table H-2
n bits          128-n bits
Subnet prefix   Interface ID
Interface identifiers in an IPv6 unicast address are used to identify the interfaces on a link. Interface identifiers are required to be unique on that link.
IPv6 Network Support IPv6 Address Types
IPv4 Mapped IPv6 Address
There is a special type of IPv6 address that holds an embedded IPv4 address. This address is used to represent the addresses of IPv4-only nodes as IPv6 addresses. These addresses are used especially by applications that support both IPv6 and IPv4. These addresses are called IPv4 Mapped IPv6 Addresses. The format of these addresses is as follows:
Table H-4
80 bits   16 bits   32 bits
zeros     FFFF      IPv4 address
Example: ::ffff:192.168.0.1
IPv6 Network Support IPv6 Address Types
Link-Local Addresses
Link-local addresses have the following format:
Table H-6
10 bits      54 bits   64 bits
1111111010   0         interface ID
Link-local addresses are supposed to be used for addressing nodes on a single link. Packets originating from or destined to a link-local address will not be forwarded by a router.
IPv6 Network Support IPv6 Address Types “FF” at the beginning of the address identifies the address as a multicast address. The “flgs” field is a set of 4 flags “000T”. The higher order 3 bits are reserved and must be zero. The last bit ‘T’ indicates whether it is permanently assigned or not. A value of zero indicates that it is permanently assigned; otherwise it is a temporary assignment. The “scop” field is a 4-bit field that is used to limit the scope of the multicast group.
IPv6 Network Support Network Configuration Restrictions Network Configuration Restrictions Serviceguard supports IPv6 for data links only. The heartbeat IP must still be IPv4, but the package IPs can be IPv4 or IPv6. To configure IPv6, the system should be set up in what is called a dual-stack configuration, which requires the IPv6 product bundle. The restrictions for supporting IPv6 in Serviceguard are listed below.
• The heartbeat IP address must be IPv4.
IPv6 Network Support Network Configuration Restrictions NOTE Even though link-local IP addresses are not supported in the Serviceguard cluster configuration, the primary link-local address on the Serviceguard primary interface will be switched over to the standby during a local switch. This is because of two requirements: First, the dual-stack (IPv4/IPv6) kernel requires that the primary IP address associated with an interface must always be a link-local address.
IPv6 Network Support IPv6 Relocatable Address and Duplicate Address Detection Feature IPv6 Relocatable Address and Duplicate Address Detection Feature The IPv6 networking stack has a new feature, Duplicate Address Detection (DAD), that was not previously available in IPv4. When an address is being added, the DAD detects a duplicate address that is already being used on the network.
IPv6 Network Support IPv6 Relocatable Address and Duplicate Address Detection Feature
# TRANSPORT_NAME[index]=ip6
# NDD_NAME[index]=ip6_nd_dad_solicit_count
# NDD_VALUE[index]=n
Where index is the next available integer value of the nddconf file, and n is a number: either 1 to turn the feature ON or 0 to turn it OFF.
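For example, a concrete entry in /etc/rc.config.d/nddconf that turns the feature off might look like the following sketch; the index value 2 is an assumption, so use the next free index in your own file:
TRANSPORT_NAME[2]=ip6
NDD_NAME[2]=ip6_nd_dad_solicit_count
NDD_VALUE[2]=0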
IPv6 Network Support Local Primary/Standby LAN Patterns Local Primary/Standby LAN Patterns The use of IPv6 allows a number of different patterns of failover among LAN cards configured in the cluster. This is true because each LAN card can support several IP addresses when a dual IPv4/IPv6 configuration is used. This section describes several ways in which local failover to a standby LAN can be configured.
IPv6 Network Support Example Configurations Example Configurations An example of a LAN configuration on a cluster node using both IPv4 and IPv6 addresses is shown below. Figure H-1 Example 1: IPv4 and IPv6 Addresses in Standby Configuration Following the loss of lan0 or lan2, lan1 can adopt either address, as shown below. Figure H-2 Example 1: IPv4 and IPv6 Addresses after Failover to Standby The same LAN card can be configured with both IPv4 and IPv6 addresses, as shown below.
IPv6 Network Support Example Configurations Figure H-3 Example 2: IPv4 and IPv6 Addresses in Standby Configuration This type of configuration allows failover of both addresses to the standby. This is shown below.
Maximum and Minimum Values for Cluster and Package Configuration Parameters I Maximum and Minimum Values for Cluster and Package Configuration Parameters Table I-1 shows the range of possible values for cluster configuration parameters.
Maximum and Minimum Values for Cluster and Package Configuration Parameters
Table I-2 shows the range of possible values for package configuration parameters.
Table I-2 Minimum and Maximum Values of Package Configuration Parameters
Package Parameter: Run Script Timeout
Minimum Value: 10 seconds
Maximum Value: 4294 seconds if a non-zero value is specified, or 0 (NO_TIMEOUT). This is a recommended value.
A Access Control Policies, 193, 204 Access Control Policy, 162 Access roles, 162 active node, 27 adding a package to a running cluster, 376 adding cluster nodes advance planning, 221 adding nodes to a running cluster, 337 adding packages on a running cluster, 310 additional package resource parameter in package configuration, 189, 190 additional package resources monitoring, 81 addressing, SCSI, 137 administration adding nodes to a ruuning cluster, 337 cluster and package states, 319 halting a package, 342
hardware planning, 139 C CFS Creating a storage infrastructure, 245 creating a storage infrastructure, 245 not supported on all HP-UX versions, 29 changes in cluster membership, 64 changes to cluster allowed while the cluster is running, 346 changes to packages allowed while the cluster is running, 379 changing the volume group configuration while the cluster is running, 359 checkpoints, 441 client connections restoring in applications, 450 cluster configuring with commands, 234 redundancy of components, 36
creating physical volumes, 223 parameter in cluster manager configuration, 161 cluster with high availability disk array figure, 46, 47 clusters active/standby type, 49 larger size, 49 cmapplyconf, 242, 373 cmassistd daemon, 55 cmcheckconf, 241, 308, 372 troubleshooting, 405 cmclconfd daemon, 55 cmcld daemon, 55 and node TOC, 56 and safety timer, 56 functions, 56 runtime priority, 57 cmclnodelist bootstrap file, 204 cmdeleteconf deleting a package configuration, 377 deleting the cluster configuration, 269 c
deleting a package configuration using cmdeleteconf, 377 deleting a package from a running cluster, 377 deleting nodes while the cluster is running, 352, 360 deleting the cluster configuration using cmdeleteconf, 269 dependencies configuring, 171 designing applications to run on multiple systems, 443 detecting failures in network manager, 100 device special files (DSFs) agile addressing, 111, 466 legacy, 112 migrating cluster lock disks to, 348 disk choosing for volume groups, 223 data, 41 interfaces, 41
F failback policy package configuration file parameter, 186 used by package manager, 78 FAILBACK_POLICY parameter in package configuration file, 186 used by package manager, 78 failover controlling the speed in applications, 438 defined, 27 failover behavior in packages, 82 failover package, 71 failover policy package configuration parameter, 186 used by package manager, 75 FAILOVER_POLICY parameter in package configuration file, 186 used by package manager, 75 failure kinds of responses, 125 network commun
host IP address, 136, 146 host name, 135 I/O bus addresses, 139 I/O slot numbers, 139 LAN information, 136 LAN interface name, 136, 146 LAN traffic type, 136 memory capacity, 135 number of I/O slots, 135 planning the configuration, 134 S800 series number, 135 SPU information, 135 subnet, 136, 146 worksheet, 140 heartbeat interval parameter in cluster manager configuration, 159 heartbeat messages, 27 defined, 62 heartbeat subnet address parameter in cluster manager configuration, 156 HEARTBEAT_INTERVAL and n
L LAN Critical Resource Analysis (CRA), 358 heartbeat, 62 interface name, 136, 146 planning information, 136 LAN CRA (Critical Resource Analysis), 358 LAN failure Serviceguard behavior, 36 LAN interfaces monitoring with network manager, 100 primary and secondary, 38 LAN planning host IP address, 136, 146 traffic type, 136 LANs, standby and safety timer, 57 larger clusters, 49 legacy DSFs defined, 112 legacy package, 363 link-level addresses, 445 LLT for CVM and CFS, 60 load balancing HP-UX and Veritas DMP,
monitoring LAN interfaces in network manager, 100 mount options in control script, 191, 192 in package configuration, 191 moving a package, 343 multi-node package, 71 multi-node package configuration, 315 multi-node packages configuring, 315 multipathing and Veritas DMP, 43 automatically configured, 43 native, 43 sources of information, 43 multiple systems designing applications for, 443 N name resolution services, 209 native mutipathing defined, 43 network adding and deleting package IP addresses, 99 fail
O olrad command removing a LAN or VLAN interface, 359 online hardware maintenance by means of in-line SCSI terminators, 396 OTS/9000 support, 519 outages insulating users from, 436 P package adding and deleting package IP addresses, 99 base modules, 276 basic concepts, 36 changes allowed while the cluster is running, 379 configuring legacy, 363 failure, 125 halting, 342 legacy, 363 local interface switching, 101 modular, 275 modular and legacy, 271 modules, 275 moving, 343 optional modules, 277 parameters,
package type parameter in package configuration, 183 Package types, 26 failover, 26 multi-node, 26 system multi-node, 26 package types, 26 package_configuration file cvm_activation_cmd, 190 PACKAGE_NAME parameter in package ASCII configuration file, 183 PACKAGE_TYPE parameter in package ASCII configuration file, 183 packages deciding where and when to run, 72, 73 managed by cmcld, 57 parameters for failover, 82 parameters for cluster manager initial configuration, 62 PATH, 197 performance variables in packa
parameter in cluster manager configuration, 155 quorum and cluster reformation, 126 quorum server and safety timer, 57 blank planning worksheet, 482 installing, 217 parameters in cluster manager configuration, 155 planning, 145 status and state, 324 use in re-forming a cluster, 69 worksheet, 146 R RAID for data protection, 42 raw volumes, 439 README for database toolkits, 433 reconfiguring a package while the cluster is running, 375 reconfiguring a package with the cluster offline, 376 reconfiguring a runni
and node TOC, 56 and syslog.
parameter in cluster manager configuration, 158 status cmviewcl, 318 multi-node packages, 319 of cluster and package, 319 package IP address, 402 system log file, 403 stopping a cluster, 339 storage management, 111 SUBNET array variable in package control script, 188, 192 in sample package control script, 369 parameter in package configuration, 187, 197 subnet hardware planning, 136, 146 parameter in package configuration, 187, 197 successor_halt_timeout parameter, 185, 284 supported disks in Serviceguar
VG in sample package control script, 369 vgcfgbackup and cluster lock data, 243 VGCHANGE in package control script, 369 vgextend creating a root mirror with, 212 vgimport using to set up volume groups on another node, 226 VLAN Critical Resource Analysis (CRA), 358 Volume, 111 volume group creating for a cluster, 223 creating physical volumes for clusters, 223 deactivating before export to another node, 225 for cluster lock, 66 planning, 148 relinquishing exclusive access via TOC, 126 setting up on another