Managing Serviceguard
Fourteenth Edition
Manufacturing Part Number: B3936-90117
June 2007
Legal Notices © Copyright 1995-2007 Hewlett-Packard Development Company, L.P. Confidential computer software. Valid license from HP required for possession, use, or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor’s standard commercial license. The information contained herein is subject to change without notice.
Printing History

Table 1  Printing History

  Printing Date     Part Number    Edition
  January 1995      B3936-90001    First
  June 1995         B3936-90003    Second
  December 1995     B3936-90005    Third
  August 1997       B3936-90019    Fourth
  January 1998      B3936-90024    Fifth
  October 1998      B3936-90026    Sixth
  December 2000     B3936-90045    Seventh
  September 2001    B3936-90053    Eighth
  March 2002        B3936-90065    Ninth
  June 2003         B3936-90070    Tenth
  June 2004         B3936-90076    Eleventh
  June 2005         B3936-90076    Eleventh, First reprint
HP Printing Division: Infrastructure Solutions Division Hewlett-Packard Co. 19111 Pruneridge Ave.
Preface This fourteenth printing of the manual applies to Serviceguard Version A.11.18. Earlier versions are available at http://www.docs.hp.com -> High Availability -> Serviceguard. This guide describes how to configure Serviceguard to run on HP 9000 or HP Integrity servers under the HP-UX operating system. The contents are as follows: • Chapter 1, “Serviceguard at a Glance,” describes a Serviceguard cluster and provides a roadmap for using this guide.
• Appendix C, “Designing Highly Available Cluster Applications,” gives guidelines for creating cluster-aware applications that provide optimal performance in a Serviceguard environment.
• Appendix D, “Integrating HA Applications with Serviceguard,” presents suggestions for integrating your existing applications with Serviceguard.
• Appendix E, “Software Upgrades,” shows how to move from one Serviceguard or HP-UX release to another without bringing down your applications.
Related Publications
— Managing HP Serviceguard for Linux, Sixth Edition, August 2006
• Documentation for your version of Veritas storage products from http://www.docs.hp.com -> High Availability -> HP Serviceguard Storage Management Suite
• Before using Veritas Volume Manager (VxVM) storage with Serviceguard, refer to the Veritas documentation posted at http://docs.hp.com. From the heading Operating Environments, choose 11i v3. Then, scroll down to the section Veritas Volume Manager and File System.
— Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters
• From http://www.docs.hp.com -> High Availability -> HP Serviceguard Extension for Faster Failover:
— HP Serviceguard Extension for Faster Failover, Version A.01.00, Release Notes
• From http://www.docs.hp.com -> High Availability -> Serviceguard Extension for SAP:
— Managing Serviceguard Extension for SAP
• From http://www.docs.hp.
— HP Auto Port Aggregation Release Notes and other Auto Port Aggregation documents
Problem Reporting
If you have any problems with the software or documentation, please contact your local Hewlett-Packard Sales Office or Customer Service Center.
1 Serviceguard at a Glance
This chapter introduces Serviceguard on HP-UX, and shows where to find different kinds of information in this book. The following topics are presented:
• What is Serviceguard?
• Using Serviceguard Manager
• A Roadmap for Configuring Clusters and Packages
If you are ready to start setting up Serviceguard clusters, skip ahead to Chapter 4, “Planning and Documenting an HA Cluster,” on page 131.
Serviceguard at a Glance What is Serviceguard? What is Serviceguard? Serviceguard allows you to create high availability clusters of HP 9000 or HP Integrity servers (or a mixture of both; see the release notes for your version for details and restrictions). A high availability computer system allows application services to continue in spite of a hardware or software failure.
Serviceguard at a Glance What is Serviceguard? A multi-node package can be configured to run on one or more cluster nodes. It is considered UP as long as it is running on any of its configured nodes. In Figure 1-1, node 1 (one of two SPU's) is running failover package A, and node 2 is running package B. Each package has a separate group of disks associated with it, containing data needed by the package's applications, and a mirror copy of the data.
Figure 1-2 Typical Cluster After Failover
After this transfer, the failover package typically remains on the adoptive node as long as the adoptive node continues running. If you wish, however, you can configure the package to return to its primary node as soon as the primary node comes back online. Alternatively, you may manually transfer control of the package back to the primary node at the appropriate time.
• Mirrordisk/UX or Veritas Volume Manager, which provide disk redundancy to eliminate single points of failure in the disk subsystem;
• Event Monitoring Service (EMS), which lets you monitor and detect failures that are not directly handled by Serviceguard;
• disk arrays, which use various RAID levels for data protection;
• HP-supported uninterruptible power supplies (UPS), such as HP PowerTrust, which eliminate failures related to power outage.
Serviceguard at a Glance Using Serviceguard Manager Using Serviceguard Manager Serviceguard Manager is the graphical user interface for Serviceguard. It is available as a “plug-in” to the System Management Homepage (SMH). SMH is a web-based graphical user interface (GUI) that replaces SAM as the system administration GUI as of HP-UX 11i v3 (but you can still run the SAM terminal interface; see “Using SAM” on page 32).
Serviceguard at a Glance Using Serviceguard Manager Configuring Clusters with Serviceguard Manager You can configure clusters and legacy packages in Serviceguard Manager; modular packages must be configured by means of Serviceguard commands (see “How the Package Manager Works” on page 71; Chapter 6, “Configuring Packages and Their Services,” on page 271; and “Configuring a Legacy Package” on page 363). You must have root (UID=0) access to the cluster nodes.
Using SAM
You can use SAM, the System Administration Manager, to do many of the HP-UX system administration tasks described in this manual (that is, tasks such as configuring disks and filesystems that are not specifically Serviceguard tasks). To launch SAM, enter /usr/sbin/sam on the command line. As of HP-UX 11i v3, SAM offers a Terminal User Interface (TUI) which also acts as a gateway to the web-based System Management Homepage (SMH).
Serviceguard at a Glance What are the Distributed Systems Administration Utilities? What are the Distributed Systems Administration Utilities? HP Distributed Systems Administration Utilities (DSAU) simplify the task of managing multiple systems, including Serviceguard clusters.
Serviceguard at a Glance A Roadmap for Configuring Clusters and Packages A Roadmap for Configuring Clusters and Packages This manual presents the tasks you need to perform in order to create a functioning HA cluster using Serviceguard. These tasks are shown in Figure 1-3. Figure 1-3 Tasks in Configuring a Serviceguard Cluster The tasks in Figure 1-3 are covered in step-by-step detail in chapters 4 through 7. HP recommends you gather all the data that is needed for configuration before you start.
2 Understanding Serviceguard Hardware Configurations
This chapter gives a broad overview of how the Serviceguard hardware components work. The following topics are presented:
• Redundancy of Cluster Components
• Redundant Network Components
• Redundant Disk Storage
• Redundant Power Supplies
• Larger Clusters
Refer to the next chapter for information about Serviceguard software components.
Understanding Serviceguard Hardware Configurations Redundancy of Cluster Components Redundancy of Cluster Components In order to provide a high level of availability, a typical cluster uses redundant system components, for example two or more SPUs and two or more independent disks. This redundancy eliminates single points of failure. In general, the more redundancy, the greater your access to applications, data, and supportive services in the event of a failure.
Note that a package that does not access data from a disk on a shared bus can be configured to fail over to as many nodes as you have configured in the cluster (regardless of disk technology). For instance, if a package only runs local executables, it can be configured to fail over to all nodes in the cluster that have local copies of those executables, regardless of the type of disk connectivity.
Understanding Serviceguard Hardware Configurations Redundant Network Components Redundant Network Components To eliminate single points of failure for networking, each subnet accessed by a cluster node is required to have redundant network interfaces. Redundant cables are also needed to protect against cable failures. Each interface card is connected to a different cable, and the cables themselves are connected by a component such as a hub or a bridge.
Understanding Serviceguard Hardware Configurations Redundant Network Components Figure 2-1 Redundant LANs In the figure, a two-node Serviceguard cluster has one bridged net configured with both a primary and a standby LAN card for the data/heartbeat subnet (Subnet A). Another LAN card provides an optional dedicated heartbeat LAN. Note that the primary and standby LAN segments are connected by a hub to provide a redundant data/heartbeat subnet. Each node has its own IP address for this subnet.
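For reference, a node's interfaces in a configuration like this one are later listed in the cluster configuration file. The following sketch uses the standard cluster-file parameter names; the node name, interface names, and IP addresses are examples only, not values taken from the figure:

  NODE_NAME node1
    NETWORK_INTERFACE lan0
      HEARTBEAT_IP 15.13.168.92      # primary interface on the data/heartbeat subnet (Subnet A)
    NETWORK_INTERFACE lan1            # standby interface; no IP address is assigned to a standby
    NETWORK_INTERFACE lan2
      HEARTBEAT_IP 10.10.30.92        # dedicated heartbeat LAN

A standby interface is listed with no IP address; Serviceguard moves the primary interface's addresses to it if the primary interface fails (see "How the Network Manager Works" in Chapter 3).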
NOTE You should verify that network traffic is not too high on the heartbeat/data LAN. If traffic is too high, this LAN might not perform adequately in transmitting heartbeats if the dedicated heartbeat LAN fails.
Replacing Failed Network Cards
Depending on the system configuration, it is possible to replace failed network cards while the cluster is running.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Redundant Disk Storage Each node in a cluster has its own root disk, but each node is also physically connected to several other disks in such a way that more than one node can obtain access to the data and programs associated with a package it is configured for.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage When planning and assigning SCSI bus priority, remember that one node can dominate a bus shared by multiple nodes, depending on what SCSI addresses are assigned to the controller for each node on the shared bus. All SCSI addresses, including the addresses of all interface cards, must be unique for all devices on a shared bus.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage another node until the failing node is halted. Mirroring the root disk can allow the system to continue normal operation when a root disk failure occurs, and help avoid this downtime. Disk Arrays using RAID Levels and Multiple Data Paths An alternate method of achieving protection for your data is to employ a disk array with hardware RAID levels that provide data redundancy, such as RAID Level 1 or RAID Level 5.
set up to trigger a package failover or to report disk failure events to Serviceguard, to another application, or by email. For more information, refer to the manual Using High Availability Monitors (B5736-90046), available at http://docs.hp.com -> High Availability.
Replacement of Failed Disk Mechanisms
Mirroring provides data protection, but after a disk failure, the failed disk must be replaced.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Figure 2-2 Mirrored Disks Connected for High Availability Figure 2-3 below shows a similar cluster with a disk array connected to each node on two I/O channels. See “About Multipathing” on page 43.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Figure 2-3 Cluster with High Availability Disk Array Details on logical volume configuration for Serviceguard are in the chapter “Building an HA Cluster Configuration.” Sample Fibre Channel Disk Configuration In Figure 2-4 below, the root disks are shown with simple mirroring, but the shared storage is now accessed via redundant Fibre Channel switches attached to a disk array.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Figure 2-4 Cluster with Fibre Channel Switched Disk Array This type of configuration uses native HP-UX or other multipathing software; see “About Multipathing” on page 43.
Understanding Serviceguard Hardware Configurations Redundant Power Supplies Redundant Power Supplies You can extend the availability of your hardware by providing battery backup to your nodes and disks. HP-supported uninterruptible power supplies (UPS), such as HP PowerTrust, can provide this protection from momentary power loss. Disks should be attached to power circuits in such a way that mirror copies are attached to different power sources.
Understanding Serviceguard Hardware Configurations Larger Clusters Larger Clusters You can create clusters of up to 16 nodes with Serviceguard. Clusters of up to 16 nodes may be built by connecting individual SPUs via Ethernet. The possibility of configuring a cluster consisting of 16 nodes does not mean that all types of cluster configuration behave in the same way in a 16-node configuration.
Figure 2-5 Eight-Node Active/Standby Cluster
Point to Point Connections to Storage Devices
Some storage devices allow point-to-point connection to a large number of host nodes without using a shared SCSI bus. An example is shown in Figure 2-6, a cluster consisting of eight nodes with a SCSI interconnect. The nodes access shared data on an XP or EMC disk array configured with 16 SCSI I/O ports.
Understanding Serviceguard Hardware Configurations Larger Clusters Figure 2-6 Eight-Node Cluster with XP or EMC Disk Array Fibre Channel switched configurations also are supported using either an arbitrated loop or fabric login topology. For additional information about supported cluster configurations, refer to the HP Unix Servers Configuration Guide, available through your HP representative.
3 Understanding Serviceguard Software Components
This chapter gives a broad overview of how the Serviceguard software components work.
Understanding Serviceguard Software Components Serviceguard Architecture Serviceguard Architecture The following figure shows the main software components used by Serviceguard. This chapter discusses these components in some detail. NOTE Veritas CFS may not yet be supported on the version of HP-UX you are running; see “About Veritas CFS and CVM from Symantec” on page 29.
• /usr/lbin/cmclconfd—Serviceguard Configuration Daemon
• /usr/lbin/cmcld—Serviceguard Cluster Daemon
• /usr/lbin/cmfileassistd—Serviceguard File Management daemon
• /usr/lbin/cmlogd—Serviceguard Syslog Log Daemon
• /usr/lbin/cmlvmd—Cluster Logical Volume Manager Daemon
• /opt/cmom/lbin/cmomd—Cluster Object Manager Daemon
• /usr/lbin/cmsnmpd—Cluster SNMP subagent (optionally running)
• /usr/lbin/cmsrvassistd—Serviceguard
Understanding Serviceguard Software Components Serviceguard Architecture hacl-cfg dgram udp wait root /usr/lbin/cmclconfd cmclconfd -p hacl-cfg stream tcp nowait root /usr/lbin/cmclconfd cmclconfd -c Then force inetd to re-read inetd.conf: /usr/sbin/inetd -c You can check that this did in fact disable Serviceguard by trying the following command: cmquerycl -n nodename where nodename is the name of the local system. If the command fails, you have successfully disabled Serviceguard.
Understanding Serviceguard Software Components Serviceguard Architecture from the expiration of the safety timer, messages will be written to /var/adm/syslog/syslog.log and the kernel’s message buffer, and a system dump is performed.
Understanding Serviceguard Software Components Serviceguard Architecture Cluster Object Manager Daemon: cmomd This daemon is responsible for providing information about the cluster to clients—external products or tools that depend on knowledge of the state of cluster objects. Clients send queries to the object manager and receive responses from it (this communication is done indirectly, through a Serviceguard API).
Understanding Serviceguard Software Components Serviceguard Architecture For services, cmcld monitors the service process and, depending on the number of service retries, cmcld either restarts the service through cmsrvassistd or it causes the package to halt and moves the package to an available alternate node. Quorum Server Daemon: qs Using a quorum server is one way to break a tie and establish a quorum when the cluster is re-forming; the other way is to use a cluster lock.
Understanding Serviceguard Software Components Serviceguard Architecture Utility Daemon: cmlockd Runs on every node on which cmcld is running (though currently not actually used by Serviceguard on HP-UX systems). CFS Components The HP Serviceguard Storage Management Suite offers additional components for interfacing with the Veritas Cluster File System on some current versions of HP-UX (see “About Veritas CFS and CVM from Symantec” on page 29). Documents for the management suite are posted on http://docs.
Understanding Serviceguard Software Components Serviceguard Architecture • Chapter 3 cmvxping - The Serviceguard-to-Veritas daemon activates certain subsystems of the Veritas Clustered File System product. (Only present when Veritas CFS is installed.
Understanding Serviceguard Software Components How the Cluster Manager Works How the Cluster Manager Works The cluster manager is used to initialize a cluster, to monitor the health of the cluster, to recognize node failure if it should occur, and to regulate the re-formation of the cluster when a node joins or leaves the cluster. The cluster manager operates as a daemon process that runs on each node.
Understanding Serviceguard Software Components How the Cluster Manager Works (described further in this chapter, in “How the Package Manager Works” on page 71). Failover packages that were running on nodes that are no longer in the new cluster are transferred to their adoptive nodes. Note that if there is a transitory loss of heartbeat, the cluster may re-form with the same nodes as before.
Understanding Serviceguard Software Components How the Cluster Manager Works Manual Startup of Entire Cluster A manual startup forms a cluster out of all the nodes in the cluster configuration. Manual startup is normally done the first time you bring up the cluster, after cluster-wide maintenance or upgrade, or after reconfiguration. Before startup, the same binary cluster configuration file must exist on all nodes in the cluster.
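For example, you can start the cluster manually with cmruncl; the node names below are placeholders for nodes in your cluster configuration:

  cmruncl -v                        # form the cluster out of all configured nodes
  cmruncl -v -n ftsys9 -n ftsys10   # or form the cluster on just the specified nodes

The -v option produces verbose output while the nodes join the cluster.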
• A node halts because of a package failure.
• A node halts because of a service failure.
• Heavy network traffic prohibited the heartbeat signal from being received by the cluster.
• The heartbeat network failed, and another network is not configured to carry heartbeat.
Typically, re-formation results in a cluster with a different composition.
Understanding Serviceguard Software Components How the Cluster Manager Works possibility of two sub-clusters running at the same time. If the two sub-clusters are of unequal size, the sub-cluster with greater than 50% of the nodes will form the new cluster, and the cluster lock is not used. If you have a two-node cluster, you are required to configure a cluster lock.
Understanding Serviceguard Software Components How the Cluster Manager Works Figure 3-2 Lock Disk or Lock LUN Operation Serviceguard periodically checks the health of the lock disk or LUN and writes messages to the syslog file if the device fails the health check. This file should be monitored for early detection of lock disk problems. If you are using a lock disk, you can choose between two lock disk options—a single or dual lock disk—based on the kind of high availability configuration you are building.
Understanding Serviceguard Software Components How the Cluster Manager Works either node, and a lock disk must be an external disk. For three or four node clusters, the disk should not share a power circuit with 50% or more of the nodes.
Understanding Serviceguard Software Components How the Cluster Manager Works If one of the dual lock disks fails, Serviceguard will detect this when it carries out periodic checking, and it will write a message to the syslog file. After the loss of one of the lock disks, the failure of a cluster node could cause the cluster to go down if the remaining node(s) cannot access the surviving cluster lock disk. Use of the Quorum Server as the Cluster Lock A quorum server can be used in clusters of any size.
Understanding Serviceguard Software Components How the Cluster Manager Works Figure 3-3 Quorum Server Operation The quorum server runs on a separate system, and can provide quorum services for multiple clusters. No Cluster Lock Normally, you should not configure a cluster of three or fewer nodes without a cluster lock. In two-node clusters, a cluster lock is required.
Understanding Serviceguard Software Components How the Package Manager Works How the Package Manager Works Packages are the means by which Serviceguard starts and halts configured applications. A package is a collection of services, disk volumes and IP addresses that are managed by Serviceguard to ensure they are available. Each node in the cluster runs an instance of the package manager; the package manager residing on the cluster coordinator is known as the package coordinator.
Understanding Serviceguard Software Components How the Package Manager Works Failover Packages A failover package starts up on an appropriate node when the cluster starts. A package failover takes place when the package coordinator initiates the start of a package on a new node. A package failover involves both halting the existing package (in the case of a service, network, or resource failure), and starting the new instance of the package.
Understanding Serviceguard Software Components How the Package Manager Works Deciding When and Where to Run and Halt Failover Packages The package configuration file assigns a name to the package and includes a list of the nodes on which the package can run. Failover packages list the nodes in order of priority (i.e., the first node in the list is the highest priority node). In addition, failover packages’ files contain three parameters that determine failover behavior.
Understanding Serviceguard Software Components How the Package Manager Works Figure 3-5 Before Package Switching Figure 3-6 shows the condition where Node 1 has failed and Package 1 has been transferred to Node 2. Package 1's IP address was transferred to Node 2 along with the package. Package 1 continues to be available and is now running on Node 2. Also note that Node 2 can now access both Package1’s disk and Package2’s disk.
Understanding Serviceguard Software Components How the Package Manager Works Figure 3-6 After Package Switching Failover Policy The Package Manager selects a node for a failover package to run on based on the priority list included in the package configuration file together with the failover_policy parameter, also in the configuration file.
Understanding Serviceguard Software Components How the Package Manager Works If you use min_package_node as the value for the failover policy, the package will start up on the node that is currently running the fewest other packages. (Note that this does not mean the lightest load; the only thing that is checked is the number of packages currently running on the node.
Understanding Serviceguard Software Components How the Package Manager Works If a failure occurs, any package would fail over to the node containing fewest running packages, as in Figure 3-8, which shows a failure on node 2: Figure 3-8 Rotating Standby Configuration after Failover NOTE Using the min_package_node policy, when node 2 is repaired and brought back into the cluster, it will then be running the fewest packages, and thus will become the new standby node.
Understanding Serviceguard Software Components How the Package Manager Works Figure 3-9 CONFIGURED_NODE Policy Packages after Failover If you use configured_node as the failover policy, the package will start up on the highest priority node in the node list, assuming that the node is running as a member of the cluster. When a failover occurs, the package will move to the next highest priority node in the list that is available.
Figure 3-10 Automatic Failback Configuration before Failover

Table 3-2 Node Lists in Sample Cluster

  Package Name    NODE_NAME List    FAILOVER POLICY    FAILBACK POLICY
  pkgA            node1, node4      CONFIGURED_NODE    AUTOMATIC
  pkgB            node2, node4      CONFIGURED_NODE    AUTOMATIC
  pkgC            node3, node4      CONFIGURED_NODE    AUTOMATIC

Node1 panics, and after the cluster reforms, pkgA starts running on node4:

Figure 3-11 Automatic Failback Config
Understanding Serviceguard Software Components How the Package Manager Works After rebooting, node 1 rejoins the cluster. At that point, pkgA will be automatically stopped on node 4 and restarted on node 1. Figure 3-12 Automatic Failback Configuration After Restart of Node 1 NOTE Setting the failback_policy to automatic can result in a package failback and application outage during a critical production period.
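As an illustration of the behavior just described (this is only an excerpt, using the modular, lower-case parameter names covered in Chapter 6), pkgA's failover and failback settings would look something like this in its package configuration file:

  package_name      pkgA
  node_name         node1
  node_name         node4
  failover_policy   configured_node
  failback_policy   automatic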
Understanding Serviceguard Software Components How the Package Manager Works For full details of the current parameters and their default values, see Chapter 6, “Configuring Packages and Their Services,” on page 271, and the package configuration file template itself. Using the Event Monitoring Service Basic package resources include cluster nodes, LAN interfaces, and services, which are the individual processes within an application. All of these are monitored by Serviceguard directly.
Understanding Serviceguard Software Components How the Package Manager Works • File system utilization • LAN health Once a monitor is configured as a package resource dependency, the monitor will notify the package manager if an event occurs showing that a resource is down. The package may then be failed over to an adoptive node. The EMS HA Monitors can also be used to report monitored events to a target application such as OpenView IT/Operations for graphical display or for operator notification.
Table 3-3 Package Failover Behavior (Continued)

  Switching Behavior: Package fails over to the node with the fewest active packages.
    Options in Serviceguard Manager:
    • Failover Policy set to minimum package node.
    Parameters in Configuration File:
    • failover_policy set to min_package_node.

  Switching Behavior: Package fails over to the node that is next on the list of nodes. (Default)
    Options in Serviceguard Manager:
    • Failover Policy set to configured node.

  Switching Behavior: All packages switch following a system reset on the node when any service fails. An attempt is first made to reboot the system prior to the system reset.
    Options in Serviceguard Manager:
    • Service Failfast set for all services.
    • Auto Run set for all packages.
    Parameters in Configuration File:
    • service_fail_fast_enabled set to yes for all services.
    • auto_run set to yes for all packages.
Understanding Serviceguard Software Components How Packages Run How Packages Run Packages are the means by which Serviceguard starts and halts configured applications. Failover packages are also units of failover behavior in Serviceguard. A package is a collection of services, disk volumes and IP addresses that are managed by Serviceguard to ensure they are available. There can be a maximum of 150 packages per cluster and a total of 900 services per cluster.
Understanding Serviceguard Software Components How Packages Run package, that node switching is disabled for the package on particular nodes, or that the package has a dependency that is not being met. When a package has failed on one node and is enabled to switch to another node, it will start up automatically in a new location where its dependencies are met. This process is known as package switching, or remote switching.
Figure 3-13 Legacy Package Time Line Showing Important Events
The following are the most important moments in a package’s life:
1. Before the control script starts. (For modular packages, this is the master control script.)
2. During run script execution. (For modular packages, during control script execution to start the package.)
3. While services are running
4.
Understanding Serviceguard Software Components How Packages Run Before the Control Script Starts First, a node is selected. This node must be in the package’s node list, it must conform to the package’s failover policy, and any resources required by the package must be available on the chosen node. One resource is the subnet that is monitored for the package. If the subnet is not available, the package cannot start on this node.
Understanding Serviceguard Software Components How Packages Run Figure 3-14 Package Time Line (Legacy Package) At any step along the way, an error will result in the script exiting abnormally (with an exit code of 1). For example, if a package service is unable to be started, the control script will exit with an error. NOTE This diagram is specific to legacy packages. Modular packages also run external scripts and “pre-scripts” as explained above.
Understanding Serviceguard Software Components How Packages Run the package is running. If a number of Restarts is specified for a service in the package control script, the service may be restarted if the restart count allows it, without re-running the package run script. Normal and Abnormal Exits from the Run Script Exit codes on leaving the run script determine what happens to the package next.
Understanding Serviceguard Software Components How Packages Run legacy package; for more information about configuring services in modular packages, see the discussion starting on page 289, and the comments in the package configuration template file.
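For example, a service in a modular package is described by a small group of parameters in the package configuration file. The sketch below is hypothetical; the service name, command path, and values are assumptions, not part of any template:

  service_name                 db_monitor
  service_cmd                  "/usr/local/bin/db_monitor.sh"
  service_restart              3      # how many times the service may be restarted in place
  service_fail_fast_enabled    no
  service_halt_timeout         300    # seconds allowed for the service to halt before it is killed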
Understanding Serviceguard Software Components How Packages Run When a Service, Subnet, or Monitored Resource Fails, or a Dependency is Not Met What happens when something goes wrong? If a service fails and there are no more restarts, if a subnet fails and there are no standbys, if a configured resource fails, or if a configured dependency on a special-purpose package is not met, then a failover package will halt on its current node and, depending on the setting of the package switching flags, may be resta
Understanding Serviceguard Software Components How Packages Run NOTE If you use cmhaltpkg command with the -n option, the package is halted only if it is running on that node. The cmmodpkg command cannot be used to halt a package, but it can disable switching either on particular nodes or on all nodes. A package can continue running when its switching has been disabled, but it will not be able to start on other nodes if it stops running on its current node.
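The commands below show typical uses of cmhaltpkg and cmmodpkg; the package and node names are placeholders:

  cmhaltpkg pkg1                # halt pkg1 wherever it is running
  cmhaltpkg -n node1 pkg1       # halt pkg1 only if it is running on node1
  cmmodpkg -d pkg1              # disable switching for pkg1 on all nodes
  cmmodpkg -d -n node2 pkg1     # disable switching for pkg1 on node2 only
  cmmodpkg -e pkg1              # re-enable switching for pkg1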
Understanding Serviceguard Software Components How Packages Run Figure 3-15 Legacy Package Time Line for Halt Script Execution At any step along the way, an error will result in the script exiting abnormally (with an exit code of 1). Also, if the halt script execution is not complete before the time specified in the HALT_SCRIPT_TIMEOUT, the package manager will kill the script. During halt script execution, messages are written to a log file.
Understanding Serviceguard Software Components How Packages Run • 1—abnormal exit, also known as no_restart exit. The package did not halt normally. Services are killed, and the package is disabled globally. It is not disabled on the current node, however. • Timeout—Another type of exit occurs when the halt_script_timeout is exceeded. In this scenario, the package is killed and disabled globally. It is not disabled on the current node, however.
Table 3-4 Error Conditions and Package Movement for Failover Packages

  Package Error Condition: Run Script Exit 2
    Node Failfast Enabled: YES;  Service Failfast Enabled: Either Setting
    HP-UX Status on Primary after Error: system reset
    Halt script runs after Error or Exit: No
    Package Allowed to Run on Primary Node after Error: N/A (system reset)
    Package Allowed to Run on Alternate Node: Yes

  Package Error Condition: Run Script Exit 2
    Node Failfast Enabled: NO;  Service Failfast Enabled: Either Setting
    HP-UX Status on Primary after Error: Running
    Halt script runs after Error or Exit: No
    Package Allowed to Run on Primary Node after Error: No
    Package Allowed to Run on Alternate Node: Yes

  Package Error Condition: Run Script Timeout
    Node Failfast Enabled: YES

  Package Error Condition: Service Failure
    Node Failfast Enabled: Either Setting;  Service Failfast Enabled: NO
    HP-UX Status on Primary after Error: Running
    Halt script runs after Error or Exit: Yes
    Package Allowed to Run on Primary Node after Error: No
    Package Allowed to Run on Alternate Node: Yes

  Package Error Condition: Loss of Network
    Node Failfast Enabled: YES;  Service Failfast Enabled: Either Setting
    HP-UX Status on Primary after Error: system reset
    Halt script runs after Error or Exit: No
    Package Allowed to Run on Primary Node after Error: N/A (system reset)
    Package Allowed to Run on Alternate Node: Yes

  Package Error Condition: Loss of Network
    Node Failfast Enabled: NO;  Service Failfast Enabled: Either Setting
Understanding Serviceguard Software Components How the Network Manager Works How the Network Manager Works The purpose of the network manager is to detect and recover from network card and cable failures so that network services remain highly available to clients. In practice, this means assigning IP addresses for each package to the primary LAN interface card on the node where the package is running and monitoring the health of all interfaces, switching them when necessary.
Understanding Serviceguard Software Components How the Network Manager Works Both stationary and relocatable IP addresses will switch to a standby LAN interface in the event of a LAN card failure. In addition, relocatable addresses (but not stationary addresses) can be taken over by an adoptive node if control of the package is transferred. This means that applications can access the package via its relocatable address without knowing which node the package currently resides on.
Understanding Serviceguard Software Components How the Network Manager Works Monitoring LAN Interfaces and Detecting Failure At regular intervals, Serviceguard polls all the network interface cards specified in the cluster configuration file. Network failures are detected within each single node in the following manner. One interface on the node is assigned to be the poller.
Understanding Serviceguard Software Components How the Network Manager Works This option is not suitable for all environments. Before choosing it, be sure these conditions are met: — All bridged nets in the cluster should have more than two interfaces each. — Each primary interface should have at least one standby interface, and it should be connected to a standby switch. — The primary switch should be directly connected to its standby.
Understanding Serviceguard Software Components How the Network Manager Works Within the Ethernet family, local switching is supported in the following configurations: • 1000Base-SX and 1000Base-T • 1000Base-T or 1000BaseSX and 100Base-T On HP-UX 11i, however, Jumbo Frames can only be used when the 1000Base-T or 1000Base-SX cards are configured. The 100Base-T and 10Base-T do not support Jumbo Frames.
Understanding Serviceguard Software Components How the Network Manager Works Figure 3-16 Cluster Before Local Network Switching Node 1 and Node 2 are communicating over LAN segment 2. LAN segment 1 is a standby. In Figure 3-17, we see what would happen if the LAN segment 2 network interface card on Node 1 were to fail.
Understanding Serviceguard Software Components How the Network Manager Works Figure 3-17 Cluster After Local Network Switching As the standby interface takes over, IP addresses will be switched to the hardware path associated with the standby interface. The switch is transparent at the TCP/IP level. All applications continue to run on their original nodes. During this time, IP traffic on Node 1 will be delayed as the transfer occurs.
Understanding Serviceguard Software Components How the Network Manager Works Figure 3-18 Local Switching After Cable Failure Local network switching will work with a cluster containing one or more nodes. You may wish to design a single-node cluster in order to take advantage of this local network switching feature in situations where you need only one node and do not wish to set up a more complex cluster.
Understanding Serviceguard Software Components How the Network Manager Works Remote Switching A remote switch (that is, a package switch) involves moving packages and their associated IP addresses to a new system. The new system must already have the same subnetwork configured and working properly, otherwise the packages will not be started. With remote switching, TCP connections are lost. TCP applications must reconnect to regain connectivity; this is not handled automatically.
Understanding Serviceguard Software Components How the Network Manager Works recovery for environments which require high availability. Port aggregation capability is sometimes referred to as link aggregation or trunking. APA is also supported on dual-stack kernel. Once enabled, each link aggregate can be viewed as a single logical link of multiple physical ports with only one IP and MAC address.
Understanding Serviceguard Software Components How the Network Manager Works Both the Single and Dual ported LANs in the non-aggregated configuration have four LAN cards, each associated with a separate non-aggregated IP address and MAC address, and each with its own LAN name (lan0, lan1, lan2, lan3). When these ports are aggregated all four ports are associated with a single IP address and MAC address.
Understanding Serviceguard Software Components How the Network Manager Works failover of VLAN interfaces when failure is detected. Failure of a VLAN interface is typically the result of the failure of the underlying physical NIC port or aggregated (APA) ports. Configuration Restrictions HP-UX allows up to 1024 VLANs to be created from a physical NIC port.
Understanding Serviceguard Software Components How the Network Manager Works 1. VLAN heartbeat networks must be configured on separate physical NICs or APA aggregates, to avoid single points of failure. 2. Heartbeats are still recommended on all cluster networks, including VLANs. 3. If you are using VLANs, but decide not to use VLANs for heartbeat networks, heartbeats are recommended for all other physical networks or APA aggregates specified in the cluster configuration file.
Understanding Serviceguard Software Components Volume Managers for Data Storage Volume Managers for Data Storage A volume manager is a tool that lets you create units of disk storage known as storage groups. Storage groups contain logical volumes for use on single systems and in high availability clusters. In Serviceguard clusters, storage groups are activated by package control scripts.
Understanding Serviceguard Software Components Volume Managers for Data Storage For instructions on migrating a system to agile addressing, see the white paper Migrating from HP-UX 11i v2 to HP-UX 11i v3 at http://docs.hp.com. NOTE It is possible, though not a best practice, to use legacy DSFs (that is, DSFs using the older naming convention) on some nodes after migrating to agile addressing on others; this allows you to migrate different nodes at different times, if necessary.
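On HP-UX 11i v3 you can see how the agile (persistent) device special files map to the legacy names before deciding when to migrate each node. This is a sketch; the device name shown is an example:

  ioscan -m dsf                      # map all persistent DSFs to their legacy DSFs
  ioscan -m dsf /dev/rdisk/disk14    # show the legacy DSFs for a single persistent DSF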
Each of two nodes also has two (non-shared) internal disks which are used for the root file system, swap, etc. Each shared storage unit has three disks. The device file names of the three disks on one of the two storage units are c0t0d0, c0t1d0, and c0t2d0. On the other, they are c1t0d0, c1t1d0, and c1t2d0.
Understanding Serviceguard Software Components Volume Managers for Data Storage Figure 3-21 Mirrored Physical Disks Figure 3-22 shows the mirrors configured into LVM volume groups, shown in the figure as /dev/vgpkgA and /dev/vgpkgB. The volume groups are activated by Serviceguard packages for use by highly available applications.
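As a sketch of how storage like this is built (the full procedure is in the chapter "Building an HA Cluster Configuration"), the commands below create one of the volume groups from a disk in each storage unit and mirror a logical volume across the two units with Mirrordisk/UX. The size and the group-file minor number are assumptions; only the disk and volume group names come from this example:

  pvcreate -f /dev/rdsk/c0t1d0
  pvcreate -f /dev/rdsk/c1t1d0
  mkdir /dev/vgpkgA
  mknod /dev/vgpkgA/group c 64 0x010000            # minor number must be unique on the node
  vgcreate /dev/vgpkgA /dev/dsk/c0t1d0 /dev/dsk/c1t1d0
  lvcreate -L 1024 -n lvol1 /dev/vgpkgA            # 1024 MB logical volume
  lvextend -m 1 /dev/vgpkgA/lvol1 /dev/dsk/c1t1d0  # add a mirror copy on the disk in the other storage unit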
Understanding Serviceguard Software Components Volume Managers for Data Storage Examples of Storage on Disk Arrays Figure 3-23 shows an illustration of storage configured on a disk array. Physical disks are configured by an array utility program into logical units or LUNs which are then seen by the operating system. Figure 3-23 Physical Disks Combined into LUNs NOTE LUN definition is normally done using utility programs provided by the disk array manufacturer.
Understanding Serviceguard Software Components Volume Managers for Data Storage Figure 3-24 Multiple Paths to LUNs Finally, the multiple paths are configured into volume groups as shown in Figure 3-25.
Understanding Serviceguard Software Components Volume Managers for Data Storage Types of Volume Manager Serviceguard allows a choice of volume managers for data storage: • HP-UX Logical Volume Manager (LVM) and (optionally) Mirrordisk/UX • Veritas Volume Manager for HP-UX (VxVM)—Base and add-on Products • Veritas Cluster Volume Manager for HP-UX (CVM), if available (see “About Veritas CFS and CVM from Symantec” on page 29) Separate sections in Chapters 5 and 6 explain how to configure cluster storage
Understanding Serviceguard Software Components Volume Managers for Data Storage Veritas Volume Manager (VxVM) The Base Veritas Volume Manager for HP-UX (Base-VXVM) is provided at no additional cost with HP-UX 11i. This includes basic volume manager features, including a Java-based GUI, known as VEA. It is possible to configure cluster storage for Serviceguard with only Base-VXVM. However, only a limited set of features is available.
Understanding Serviceguard Software Components Volume Managers for Data Storage Veritas Cluster Volume Manager (CVM) NOTE CVM (and CFS - Cluster File System) are supported on some, but not all, current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability - > Serviceguard). You may choose to configure cluster storage with the Veritas Cluster Volume Manager (CVM) instead of the Volume Manager (VxVM).
Understanding Serviceguard Software Components Volume Managers for Data Storage CVM 4.1 and later can be used with Veritas Cluster File System (CFS) in Serviceguard. Several of the HP Serviceguard Storage Management Suite bundles include features to enable both CVM and CFS.
Redundant Heartbeat Subnet Required
HP recommends that you configure all subnets that connect cluster nodes as heartbeat networks; this increases protection against multiple faults at no additional cost. Heartbeats are configured differently depending on whether you are using CVM 3.5 or 4.1 and later.
Table 3-5 Pros and Cons of Volume Managers with Serviceguard

  Product: Logical Volume Manager (LVM), Mirrordisk/UX, Shared Logical Volume Manager (SLVM)
  Advantages:
  • Software is provided with all versions of HP-UX.
  • Provides up to 3-way mirroring using optional Mirrordisk/UX software.
  • Dynamic multipathing (DMP) is active by default as of HP-UX 11i v3.

  Product: Base-VxVM; Veritas Volume Manager—Full VxVM product: B9116AA (VxVM 3.5), B9116BA (VxVM 4.1), B9116CA (VxVM 5.0)
  Advantages:
  • Software is supplied free with HP-UX 11i releases.
  • Java-based administration through graphical user interface.
  Tradeoffs:
  • Cannot be used for a cluster lock
  • root/boot disk supported only on VxVM 3.

  Product: Veritas Cluster Volume Manager—B9117AA (CVM 3.5), B9117BA (CVM 4.1), B9117CA (CVM 5.0)
  Advantages:
  • Provides volume configuration propagation.
  • Supports cluster shareable disk groups.
  • Package startup time is faster than with VxVM.
  Tradeoffs:
  • Disk groups must be configured on a master node
  • CVM can only be used with up to 8 cluster nodes.
Understanding Serviceguard Software Components Responses to Failures Responses to Failures Serviceguard responds to different kinds of failures in specific ways. For most hardware failures, the response is not user-configurable, but for package and service failures, you can choose the system’s response, within limits.
Understanding Serviceguard Software Components Responses to Failures 2. If the node cannot get a quorum (if it cannot get the cluster lock) then 3. The node halts (system reset). Example Situation. Assume a two-node cluster, with Package1 running on SystemA and Package2 running on SystemB. Volume group vg01 is exclusively activated on SystemA; volume group vg02 is exclusively activated on SystemB. Package IP addresses are assigned to SystemA and SystemB respectively. Failure.
Understanding Serviceguard Software Components Responses to Failures For more information on cluster failover, see the white paper Optimizing Failover Time in a Serviceguard Environment at http://www.docs.hp.com->High Availability->Serviceguard->White Papers.
Understanding Serviceguard Software Components Responses to Failures Serviceguard does not respond directly to power failures, although a loss of power to an individual cluster component may appear to Serviceguard like the failure of that component, and will result in the appropriate switching behavior. Power protection is provided by HP-supported uninterruptible power supplies (UPS), such as HP PowerTrust.
Understanding Serviceguard Software Components Responses to Failures NOTE In a very few cases, Serviceguard will attempt to reboot the system before a system reset when this behavior is specified. If there is enough time to flush the buffers in the buffer cache, the reboot succeeds, and a system reset does not take place. Either way, the system will be guaranteed to come down within a predetermined number of seconds.
4 Planning and Documenting an HA Cluster
Building a Serviceguard cluster begins with a planning phase in which you gather and record information about all the hardware and software components of the configuration.
General Planning
A clear understanding of your high availability objectives will help you to define your hardware requirements and design your system. Use the following questions as a guide for general planning:
1. What applications must continue to be available in the event of a failure?
2. What system resources (processing power, networking, SPU, memory, disk space) are needed to support these applications?
3.
Planning and Documenting an HA Cluster General Planning additional disk hardware for shared data storage. If you intend to expand your cluster without the need to bring it down, careful planning of the initial configuration is required. Use the following guidelines: • Remember the rules for cluster locks when considering expansion. A one-node cluster does not require a cluster lock. A two-node cluster must have a cluster lock. In clusters larger than 3 nodes, a cluster lock is strongly recommended.
Planning and Documenting an HA Cluster Hardware Planning Hardware Planning Hardware planning requires examining the physical hardware itself. One useful procedure is to sketch the hardware configuration in a diagram that shows adapter cards and buses, cabling, disks and peripherals. A sample diagram for a two-node cluster is shown in Figure 4-1. NOTE Under agile addressing, the storage units in this example would have names such as disk1, disk2, disk3, etc.
Planning and Documenting an HA Cluster Hardware Planning Create a similar sketch for your own cluster, and record the information on the Hardware Worksheet. Indicate which device adapters occupy which slots, and determine the bus address for each adapter. Update the details as you do the cluster configuration (described in Chapter 5). Use one form for each SPU.
Planning and Documenting an HA Cluster Hardware Planning Serviceguard communication relies on the exchange of DLPI (Data Link Provider Interface) traffic at the data link layer and the UDP/TCP (User Datagram Protocol/Transmission Control Protocol) traffic at the Transport layer between cluster nodes. LAN Information While a minimum of one LAN interface per subnet is required, at least two LAN interfaces, one primary and one or more standby, are needed to eliminate single points of network failure.
Planning and Documenting an HA Cluster Hardware Planning When there is a primary and a standby network card, Serviceguard needs to determine when a card has failed, so it knows whether to fail traffic over to the other card. The configuration file specifies one of two ways to decide when the network interface card has failed: • INOUT • INONLY_OR_INOUT The default is INOUT. See “Monitoring LAN Interfaces and Detecting Failure” on page 100 for more information.
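In the cluster configuration file this choice typically appears as a single parameter, for example:

  NETWORK_FAILURE_DETECTION INOUT              # the default
  # NETWORK_FAILURE_DETECTION INONLY_OR_INOUT  # the alternative setting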
Planning and Documenting an HA Cluster Hardware Planning SCSI address must be uniquely set on the interface cards in all four systems, and must be high priority addresses.
Planning and Documenting an HA Cluster Hardware Planning Disk I/O Information This part of the worksheet lets you indicate where disk device adapters are installed. Enter the following items on the worksheet for each disk connected to each disk device adapter on the node: Bus Type Indicate the type of bus. Supported busses are Fibre Channel and SCSI. Slot Number Indicate the slot number in which the interface card is inserted in the backplane of the computer.
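One way to gather the bus, hardware path, and device file information for this part of the worksheet is with ioscan, for example:

  ioscan -fnC disk       # disks, with hardware paths and device file names
  ioscan -fnC ext_bus    # bus adapters (interface cards) and their hardware paths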
Planning and Documenting an HA Cluster Hardware Planning Hardware Configuration Worksheet The following worksheet will help you organize and record your specific cluster hardware configuration. This worksheet is an example; blank worksheets are in Appendix F. Make as many copies as you need. Complete the worksheet and keep it for future reference.
Planning and Documenting an HA Cluster Power Supply Planning Power Supply Planning There are two sources of power for your cluster which you will have to consider in your design: line power and uninterruptible power sources (UPS). Loss of a power circuit should not bring down the cluster. Frequently, servers, mass storage devices, and other hardware have two or three separate power supplies, so they can survive the loss of power to one or more power supplies or power circuits.
Planning and Documenting an HA Cluster Power Supply Planning Other Unit Enter the number of any other unit. Power Supply Enter the power supply unit number of the UPS to which the host or other device is connected. Be sure to follow UPS and cabinet power limits as well as SPU power limits. Power Supply Configuration Worksheet The following worksheet will help you organize and record your specific power supply configuration. This worksheet is an example; blank worksheets are in Appendix F.
Planning and Documenting an HA Cluster Power Supply Planning Unit Name __________________________ Power Supply _____________________ Unit Name __________________________ Power Supply _____________________
Planning and Documenting an HA Cluster Cluster Lock Planning Cluster Lock Planning The purpose of the cluster lock is to ensure that only one new cluster is formed in the event that exactly half of the previously clustered nodes try to form a new cluster. It is critical that only one new cluster is formed and that it alone has access to the disks specified in its packages. You can specify an LVM lock disk, a lock LUN, or a quorum server as the cluster lock.
Planning and Documenting an HA Cluster Cluster Lock Planning Planning for Expansion Bear in mind that a cluster with more than 4 nodes cannot use a lock disk or lock LUN. Thus, if you plan to add enough nodes to bring the total to more than 4, you should use a quorum server. Using a Quorum Server The Quorum Server is described in detail under “Use of the Quorum Server as the Cluster Lock” on page 69. See also “Cluster Lock” on page 65.
Planning and Documenting an HA Cluster Cluster Lock Planning Quorum Server Worksheet Use the Quorum Server Worksheet to identify a quorum server for use with one or more clusters. You should also enter quorum server host and timing parameters on the Cluster Configuration Worksheet. Blank worksheets are in Appendix F. Make as many copies as you need. Fill out the worksheet and keep it for future reference. On the QS worksheet, enter the following: Quorum Server Host Enter the host name for the quorum server.
Planning and Documenting an HA Cluster Cluster Lock Planning Host Names ____________________________________________
Planning and Documenting an HA Cluster LVM Planning LVM Planning You can create storage groups using the HP-UX Logical Volume Manager (LVM), or using Veritas VxVM software (and CVM if available) as described in the next section. When designing your disk layout using LVM, you should consider the following: • The root disk should belong to its own volume group.
Planning and Documenting an HA Cluster LVM Planning LVM Worksheet The following worksheet will help you organize and record your specific physical disk configuration. This worksheet is an example; blank worksheets are in Appendix F. Make as many copies as you need. Fill out the worksheet and keep it for future reference. This worksheet only includes volume groups and physical volumes.
Planning and Documenting an HA Cluster LVM Planning Physical Volume Name: _____________________________________________________ Physical Volume Name: _____________________________________________________ Physical Volume Name: _____________________________________________________ Physical Volume Name: _____________________________________________________ Physical Volume Name: _____________________________________________________
Planning and Documenting an HA Cluster CVM and VxVM Planning CVM and VxVM Planning You can create storage groups using the HP-UX Logical Volume Manager (LVM, described in the previous section), or using Veritas VxVM and CVM software. NOTE CVM (and CFS - Cluster File System) are supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability -> Serviceguard).
Planning and Documenting an HA Cluster CVM and VxVM Planning • A cluster lock disk must be configured into an LVM volume group; you cannot use a VxVM or CVM disk group. (See “Cluster Lock Planning” on page 144 for information about cluster lock options.) • VxVM disk group names should not be entered into the cluster configuration file. These names are not inserted into the cluster configuration file by cmquerycl.
Planning and Documenting an HA Cluster Cluster Configuration Planning Cluster Configuration Planning A cluster should be designed to provide the quickest possible recovery from failures. The actual time required to recover from a failure depends on several factors: • The length of the cluster heartbeat interval and node timeout. See the parameter descriptions for HEARTBEAT_INTERVAL and NODE_TIMEOUT under “Cluster Configuration Parameters” on page 154 for recommendations.
Planning and Documenting an HA Cluster Cluster Configuration Planning Heartbeat Subnet and Re-formation Time The speed of cluster re-formation is partially dependent on the type of heartbeat network that is used. If two or more heartbeat subnets are used, the one with the fastest failover time is used. Cluster Configuration Parameters You need to define a set of cluster parameters. These are stored in the binary cluster configuration file, which is distributed to each node in the cluster.
Planning and Documenting an HA Cluster Cluster Configuration Planning QS_HOST The name or IP address of a host system outside the current cluster that is providing quorum server functionality. This parameter is only used when you employ a quorum server for tie-breaking services in the cluster. QS_POLLING_INTERVAL The time (in microseconds) between attempts to contact the quorum server to make sure it is running. Default is 300,000,000 microseconds (5 minutes).
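As an illustration only (the host name below is hypothetical), the quorum server entries in the cluster configuration file might look like this, using the default polling interval and an optional timeout extension:
QS_HOST qs-host.example.com
QS_POLLING_INTERVAL 300000000
QS_TIMEOUT_EXTENSION 2000000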
Planning and Documenting an HA Cluster Cluster Configuration Planning NODE_NAME The hostname of each system that will be a node in the cluster. Do not use the full domain name. For example, enter ftsys9, not ftsys9.cup.hp.com. A Serviceguard cluster can contain up to 16 nodes (though not in all third-party configurations; see “Veritas Cluster Volume Manager (CVM)” on page 119, and the latest Release Notes for your version of Serviceguard).
Planning and Documenting an HA Cluster Cluster Configuration Planning Heartbeat configuration requirements: A minimum Serviceguard configuration on HP-UX 11i v2 or 11i v3 needs two network interface cards for the heartbeat in all cases, using one of the following configurations: • Two heartbeat subnets; or • One heartbeat subnet with a standby; or • One heartbeat subnet using APA with two physical ports in hot standby mode or LAN monitor mode.
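As a sketch of one of these configurations (interface names and IP addresses are examples only), a node entry with one heartbeat subnet plus a standby might look like this in the cluster configuration file; the standby interface is listed with no IP address:
NODE_NAME ftsys9
  NETWORK_INTERFACE lan0
  HEARTBEAT_IP 192.6.143.10
  NETWORK_INTERFACE lan1
  STATIONARY_IP 192.6.144.10
  NETWORK_INTERFACE lan2
# lan2 has no IP address and serves as the standby for lan0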
Planning and Documenting an HA Cluster Cluster Configuration Planning The use of a private heartbeat network is not advisable if you plan to use Remote Procedure Call (RPC) protocols and services. RPC assumes that each network adapter device or I/O card is connected to a route-able network. An isolated or private heartbeat LAN is not route-able, and could cause an RPC request-reply, directed to that LAN, to risk time-out without being serviced.
Planning and Documenting an HA Cluster Cluster Configuration Planning You cannot create a dual cluster-lock configuration using LUNs. FIRST_CLUSTER_LOCK_PV, SECOND_CLUSTER_LOCK_PV The name of the physical volume within the Lock Volume Group that will have the cluster lock written on it. Used only if a lock disk is used for tie-breaking services. This parameter is FIRST_CLUSTER_LOCK_PV for the first physical lock volume and SECOND_CLUSTER_LOCK_PV for the second physical lock volume.
Planning and Documenting an HA Cluster Cluster Configuration Planning NODE_TIMEOUT The time, in microseconds, after which a node may decide that another node has become unavailable and initiate cluster reformation. Maximum value: 60,000,000 microseconds (60 seconds). Minimum value: 2 * HEARTBEAT_INTERVAL Default value: 2,000,000 microseconds (2 seconds).
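For example, typical settings (expressed in microseconds, as in the configuration file) that satisfy the rule that NODE_TIMEOUT must be at least twice HEARTBEAT_INTERVAL are:
HEARTBEAT_INTERVAL 1000000
NODE_TIMEOUT 2000000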
Planning and Documenting an HA Cluster Cluster Configuration Planning The amount of time a node waits before it stops trying to join a cluster during automatic cluster startup. In the cluster configuration file, this parameter is AUTO_START_TIMEOUT. All nodes wait this amount of time for other nodes to begin startup before the cluster completes the operation. The time should be selected based on the slowest boot time in the cluster.
Planning and Documenting an HA Cluster Cluster Configuration Planning Access Control Policies (also known as Role Based Access) For each policy, specify USER_NAME, USER_HOST, and USER_ROLE. Policies set in the configuration file of a cluster and its packages must not be conflicting or redundant. For more information, see “Access Roles” on page 204. FAILOVER_OPTIMIZATION You will only see this parameter if you have installed Serviceguard Extension for Faster Failover, a separately purchased product.
Planning and Documenting an HA Cluster Cluster Configuration Planning Name and Nodes: =============================================================================== Cluster Name: ___ourcluster_______________ Node Names: ____node1_________________ ____node2_________________ Maximum Configured Packages: ______12________ =============================================================================== Quorum Server Data: =============================================================================== Quorum S
Planning and Documenting an HA Cluster Cluster Configuration Planning | Disk Unit No: ________ | Power Supply No: ________ =========================================================================== Cluster Lock LUN: Pathname on Node 1: ___________________ Pathname on Node 2: ___________________ Pathname on Node 3: ___________________ Pathname on Node 4: ___________________ =========================================================================== Timing Parameters: ====================================
Planning and Documenting an HA Cluster Package Configuration Planning Package Configuration Planning Planning for packages involves assembling information about each group of highly available services. NOTE As of Serviceguard A.11.18, there is a new and simpler way to configure packages.
Planning and Documenting an HA Cluster Package Configuration Planning failed node are deactivated on the failed node and activated on the adoptive node. In order for this to happen, you must configure the volume groups so that they can be transferred from the failed node to the adoptive node.
Planning and Documenting an HA Cluster Package Configuration Planning NOTE Do not use /etc/fstab to mount file systems that are used by Serviceguard packages. Planning Veritas Cluster Volume Manager (CVM) and Cluster File System (CFS) NOTE CVM and CFS are supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability -> Serviceguard).
Planning and Documenting an HA Cluster Package Configuration Planning CVM 4.1 and later and the SG-CFS-pkg require you to configure multiple heartbeat networks, or a single heartbeat with a standby. Using APA, Infiniband, or VLAN interfaces as the heartbeat network is not supported. CVM 4.1 and later with CFS CFS (Veritas Cluster File System) is supported for use with Veritas Cluster Volume Manager Version 4.1 and later. The system multi-node package SG-CFS-pkg manages the cluster’s volumes.
Planning and Documenting an HA Cluster Package Configuration Planning CAUTION Once you create the disk group and mount point packages, it is critical that you administer the cluster with the cfs commands, including cfsdgadm, cfsmntadm, cfsmount, and cfsumount. If you use the general commands such as mount and umount, it could cause serious problems such as writing to the local file system instead of the cluster file system.
Planning and Documenting an HA Cluster Package Configuration Planning When adding packages, be sure not to exceed the value of max_configured_packages as defined in the cluster configuration file. (see “Cluster Configuration Parameters” on page 154). You can modify this parameter while the cluster is running if you need to.
Planning and Documenting an HA Cluster Package Configuration Planning Serviceguard will start up resource monitoring for automatic resources automatically when the Serviceguard cluster daemon starts up on the node. Serviceguard will not attempt to start deferred resource monitoring during node startup, but will start monitoring these resources when the package runs. The following is an example of how to configure deferred and automatic resources.
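For instance, entries along the following lines (the EMS resource names are illustrative) configure one deferred and one automatic resource in a legacy package configuration file; deferred resources must also be listed with DEFERRED_RESOURCE_NAME in the legacy package control script, as noted later in this chapter:
RESOURCE_NAME              /net/interfaces/lan/status/lan0
RESOURCE_POLLING_INTERVAL  60
RESOURCE_START             DEFERRED
RESOURCE_UP_VALUE          = UP
RESOURCE_NAME              /net/interfaces/lan/status/lan1
RESOURCE_POLLING_INTERVAL  60
RESOURCE_START             AUTOMATIC
RESOURCE_UP_VALUE          = UP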
Planning and Documenting an HA Cluster Package Configuration Planning In Serviceguard A.11.17, package dependencies are supported only for use with certain applications specified by HP, such as the multi-node and system multi-node packages that HP supplies for use with Veritas Cluster File System (CFS) on systems that support it. As of Serviceguard A.11.
Planning and Documenting an HA Cluster Package Configuration Planning — a failover package whose failover_policy is configured_node. • pkg2 cannot be a failover package whose failover_policy is min_package_node. • pkg2’s node list (see node_name, page 282) must contain all of the nodes on pkg1’s. — Preferably the nodes should be listed in the same order if the dependency is between packages whose failover_policy is configured_node; cmcheckconf and cmapplyconf will warn you if they are not.
Planning and Documenting an HA Cluster Package Configuration Planning successor of a package depends on that package; in our example, pkg1 is a successor of pkg2; conversely pkg2 can be referred to as a predecessor of pkg1.) Dragging Rules The priority parameter gives you a way to influence the startup, failover, and failback behavior of a set of failover packages that have a configured_node failover_policy, when one or more of those packages depend on another or others.
Planning and Documenting an HA Cluster Package Configuration Planning HP recommends assigning values in increments of 20 so as to leave gaps in the sequence; otherwise you may have to shuffle all the existing priorities when assigning priority to a new package. no_priority, the default, is treated as a lower priority than any numerical value. 3.
Planning and Documenting an HA Cluster Package Configuration Planning If pkg1 depends on pkg2, and pkg1’s priority is lower than or equal to pkg2’s, pkg2’s node order dominates. Assuming pkg2’s node order is node1, node2, node3, then: • On startup: — pkg2 will start on node1, or node2 if node1 is not available or does not at present meet all of its dependencies, etc.
Planning and Documenting an HA Cluster Package Configuration Planning — if pkg2 has failed back to node1 and node1 does not meet all of pkg1’s dependencies, pkg1 will halt. If pkg1 depends on pkg2, and pkg1’s priority is higher than pkg2’s, pkg1’s node order dominates. Assuming pkg1’s node order is node1, node2, node3, then: • On startup: — pkg1 will select node1 to start on. — pkg2 will start on node1, provided it can run there (no matter where node1 appears on pkg2’s node_name list).
Planning and Documenting an HA Cluster Package Configuration Planning But you also need to weigh the relative importance of the packages. If pkg2 runs a database that is central to your business, you probably want it to run undisturbed, no matter what happens to application packages that depend on it. In this case, the database package should have the highest priority. Note that, if no priorities are set, the dragging rules favor a package that is depended on over a package that depends on it.
Planning and Documenting an HA Cluster Package Configuration Planning • During package execution, after volume-groups and file systems are activated, and IP addresses are assigned, and before the service and resource functions are executed; and again, in the reverse order, on package shutdown. These scripts are invoked by external_script (see page 297). The scripts are also run when the package is validated by cmcheckconf and cmapplyconf, and must have an entry point for validation; see below.
Planning and Documenting an HA Cluster Package Configuration Planning uses a parameter PEV_MONITORING_INTERVAL, defined in the package configuration file, to periodically poll the application it wants to monitor; for example:
PEV_MONITORING_INTERVAL 60
At validation time, the sample script makes sure the PEV_MONITORING_INTERVAL and the monitoring service are configured properly; at start and stop time it prints out the interval to the log file.
#!/bin/sh
# Source utility functions.
Planning and Documenting an HA Cluster Package Configuration Planning
    sg_log 0 "ERROR: PEV_MONITORING_INTERVAL value ($PEV_MONITORING_INTERVAL) not within legal limits!"
    ret=1
fi
# check monitoring service we are expecting for this package is configured
while (( i < ${#SG_SERVICE_NAME[*]} ))
do
  case ${SG_SERVICE_CMD[i]} in
  *monitor.
Planning and Documenting an HA Cluster Package Configuration Planning
    return 0
}

typeset -i exit_val=0

case ${1} in
  start)
    start_command $*
    exit_val=$?
    ;;
  stop)
    stop_command $*
    exit_val=$?
    ;;
  validate)
    validate_command $*
    exit_val=$?
    ;;
  *)
    sg_log 0 "Unknown entry point $1"
    ;;
esac
exit $exit_val
For more information about integrating an application with Serviceguard, see the white paper Framework for HP Serviceguard Toolkits, which includes a suite of customizable scripts.
Planning and Documenting an HA Cluster Package Configuration Planning To avoid this situation, it is a good idea to specify a run_script_timeout and halt_script_timeout for all packages, especially packages that use Serviceguard commands in their external scripts. If a timeout is not specified and your configuration has a command loop as described above, inconsistent results can occur, including a hung cluster.
Planning and Documenting an HA Cluster Package Configuration Planning the first adoptive node name, then the second adoptive node name, followed, in order of preference, by additional node names. In case of a failover, control of the package will be transferred to the next adoptive node name listed in the package configuration file, or (if that node is not available or cannot run the package at that time) to the next node in the list, and so on.
Planning and Documenting an HA Cluster Package Configuration Planning NOTE If the package halt function fails with “exit 1”, Serviceguard does not halt the node, but sets no_restart for the package, which disables package switching (auto_run), thereby preventing the package from starting on any adoptive node. Possible values are yes and no. The default is no. run_script_timeout and halt_script_timeout The time allowed for the package to start and halt, respectively.
Planning and Documenting an HA Cluster Package Configuration Planning script_log_file The script log file documents the package run and halt activities. More details in Chapter 6, under script_log_file (see page 285). log_level Determines the amount of information printed to stdout when the package is validated, and to the script_log_file (see page 285) when the package is started and halted. Valid values are 0 through 5; more details in Chapter 6 under log_level (see page 285).
Planning and Documenting an HA Cluster Package Configuration Planning selected as the Package Failover Policy, the primary node is now running fewer packages than the current node. See also “About Package Dependencies” on page 171. priority Assigns a priority to the package. Used to decide whether a package can “drag” another package it depends on to another node. See “About Package Dependencies” on page 171. Valid values are 1 through 3000, or no_priority. The default is no_priority.
Planning and Documenting an HA Cluster Package Configuration Planning cluster_interconnect_subnet For use in a Serviceguard Extension for Real Application Cluster (SGeRAC) installation only. See the latest version of Using Serviceguard Extension for RAC at http://www.docs.hp.com -> High Availability -> Serviceguard Extension for Real Application Cluster (ServiceGuard OPS Edition) for more information. ip_subnet and ip_address Specifies an IP subnet and relocatable IP addresses used by the package.
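For instance (the addresses are illustrative only), a failover package might declare its monitored subnet and a relocatable address like this in the modular configuration file:
ip_subnet   192.10.25.0
ip_address  192.10.25.12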
Planning and Documenting an HA Cluster Package Configuration Planning If the parameter is set to yes, and the service fails, Serviceguard will halt the node on which the service is running (HP-UX system reset). (An attempt is made to reboot the node first.) The default is no. service_halt_timeout In the event of a service halt, Serviceguard will first send out a SIGTERM signal to terminate the service.
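As an illustrative sketch (the service name and command path are hypothetical), the service entries for a modular package might look like this:
service_name               pkg11_app_monitor
service_cmd                "/etc/cmcluster/pkg11/monitor.sh"
service_restart            none
service_fail_fast_enabled  no
service_halt_timeout       300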
Planning and Documenting an HA Cluster Package Configuration Planning Default is 60 seconds. The minimum value is 1. (There is no practical maximum.). resource_start Determines whether Serviceguard should start monitoring this resource when the package starts, or when the node joins the cluster. See “Parameters for Configuring EMS Resources” on page 170. resource_up_value The criteria for judging whether a package resource has failed or not. You can configure a total of 15 resource_up_values per package.
Planning and Documenting an HA Cluster Package Configuration Planning Details in Chapter 6 under cvm_activation_cmd (see page 293), and in the package configuration file. NOTE vxvol_cmd Controls the method of mirror recovery for mirrored VxVM volumes. vg An LVM volume group that will be activated by the package. cvm_dg A CVM disk group used by the package (on systems that support CVM; see “About Veritas CFS and CVM from Symantec” on page 29).
Planning and Documenting an HA Cluster Package Configuration Planning Specifies the number of concurrent mounts and umounts to allow during package startup or shutdown. The default is 1; consider increasing it if the package will mount a large number of file systems. fs_mount_retry_count The number of mount retries for each file system. The default is zero. Details in Chapter 6 under fs_mount_retry_count (see page 295).
Planning and Documenting an HA Cluster Package Configuration Planning pev_ Specifies a package environment variable that can be passed to external_pre_script, external_script, or both, by means of the cmgetpkgenv (1m) command. More details in Chapter 6, under pev_ (see page 296).
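As an illustration only (the variable name beyond the required PEV_ prefix and the script path are hypothetical), a package using the monitoring approach discussed earlier might define:
PEV_MONITORING_INTERVAL 60
external_script /etc/cmcluster/pkg11/monitor_app.sh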
Planning and Documenting an HA Cluster Package Configuration Planning Package Configuration File Data: ========================================================================== Package Name: ______pkg11____________Package Type:___Failover____________ Primary Node: ______ftsys9_______________ First Failover Node:____ftsys10_______________ Additional Failover Nodes:__________________________________ Run Script Timeout: _no_timeout_____ Halt Script Timeout: _no_timeout___ Package AutoRun Enabled? Node Failfas
Planning and Documenting an HA Cluster Package Configuration Planning cvm_activation_cmd: ______________________________________________ VxVM Disk Groups: vxvm_dg___/dev/vx/dg01____vxvm_dg____________vxvm_dg_____________ vxvol_cmd ______________________________________________________ ________________________________________________________________________________ Logical Volumes and File Systems: fs_name___/dev/vg01/1v011___fs_directory____/mnt1______fs_mount_opt_"-o rw"___ fs_umount_opt_"-s"__________fs_
Planning and Documenting an HA Cluster Package Configuration Planning Service Name: _______ Command: _________ Restart: __ Fail Fast enabled:_____ Service Name: _______ Command: _________ Restart: __ Fail Fast enabled:_____ ================================================================================ Package environment variable:__PEV_pkg11_var________________________________ Package environment variable:_______________________________________________ External pre-script:________________________________
Planning and Documenting an HA Cluster Package Configuration Planning Additional Parameters Used Only by Legacy Packages IMPORTANT The following parameters are used only by legacy packages. Do not try to use them in modular packages. See “Creating the Legacy Package Configuration” on page 363 for more information. PATH Specifies the path to be used by the script. SUBNET Specifies the IP subnets that are to be monitored for the package.
Planning and Documenting an HA Cluster Package Configuration Planning In most cases, though, HP recommends that you use the same script for both run and halt instructions. (When the package starts, the script is passed the parameter start; when it halts, it is passed the parameter stop.) DEFERRED_RESOURCE_NAME Add DEFERRED_RESOURCE_NAME to a legacy package control script for any resource that has a RESOURCE_START setting of DEFERRED.
Building an HA Cluster Configuration 5 Building an HA Cluster Configuration This chapter and the next take you through the configuration tasks required to set up a Serviceguard cluster. These procedures are carried out on one node, called the configuration node, and the resulting binary file is distributed by Serviceguard to all the nodes in the cluster. In the examples in this chapter, the configuration node is named ftsys9, and the sample target node is called ftsys10.
Building an HA Cluster Configuration Preparing Your Systems Preparing Your Systems Before configuring your cluster, ensure that all cluster nodes possess the appropriate security files, kernel configuration, and NTP (network time protocol) configuration. Installing and Updating Serviceguard For information about installing Serviceguard, see the Release Notes for your version at http://docs.hp.com -> High Availability -> Serviceguard -> Release Notes.
Building an HA Cluster Configuration Preparing Your Systems
SGAUTOSTART=/etc/rc.config.d/cmcluster
SGFFLOC=/opt/cmcluster/cmff
CMSNMPD_LOG_FILE=/var/adm/SGsnmpsuba.log
NOTE If these variables are not defined on your system, then source the file /etc/cmcluster.conf in your login profile for user root. For example, you can add this line to root’s .profile file: . /etc/cmcluster.conf Throughout this book, system filenames are usually given with one of these location prefixes.
Building an HA Cluster Configuration Preparing Your Systems Configuring IP Address Resolution Serviceguard uses the name resolution services built in to HP-UX. HP recommends that you define name resolutions in each node’s /etc/hosts file first, rather than rely solely on DNS or NIS services. For example, consider a two node cluster (gryf and sly) with two private subnets and a public subnet. These nodes will be granting permission to a non-cluster node (bit) which does not share the private subnets.
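A minimal sketch of the corresponding /etc/hosts entries, with all IP addresses and the domain purely illustrative, might look like this (each node appears once per subnet, and the non-cluster node bit is included so it can be granted access):
15.145.162.131   gryf.example.com   gryf
10.8.0.131       gryf.example.com   gryf
10.8.1.131       gryf.example.com   gryf
15.145.162.132   sly.example.com    sly
10.8.0.132       sly.example.com    sly
10.8.1.132       sly.example.com    sly
15.145.162.150   bit.example.com    bit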
Building an HA Cluster Configuration Preparing Your Systems NOTE Configure the name service switch to consult the /etc/hosts file before other services such as DNS, NIS, or LDAP. See “Defining Name Resolution Services” on page 209 for instructions. Username Validation Serviceguard relies on the identd daemon (usually started by inetd from /etc/inetd.conf) to verify the username of the incoming network connection.
Building an HA Cluster Configuration Preparing Your Systems Access Roles Serviceguard access control policies define what a user on a remote node can do on the local node. These are known as Access Roles or Role Based Access (RBA). This manual uses Access Roles. Serviceguard recognizes two levels of access, root and non-root: • Root Access: Users authorized for root access have total control over the configuration of the cluster and packages.
Building an HA Cluster Configuration Preparing Your Systems NOTE When you upgrade a cluster from Version A.11.15 or earlier, entries in $SGCONF/cmclnodelist are automatically updated into Access Control Policies in the cluster configuration file. All non-root user-hostname pairs are assigned the role of Monitor (view only).
Building an HA Cluster Configuration Preparing Your Systems
###########################################################
# Do not edit this file!
# Serviceguard uses this file only to authorize access to an
# unconfigured node. Once a cluster is created, Serviceguard
# will not consult this file.
Building an HA Cluster Configuration Preparing Your Systems Setting Access Controls for Configured Cluster Nodes Once nodes are configured in a cluster, access-control policies govern cluster-wide security; changes to cmclnodelist are ignored. The root user on each cluster node is automatically granted root access to all other nodes. Other users can be authorized for non-root roles. NOTE Users on systems outside the cluster cannot gain root access to cluster nodes.
Building an HA Cluster Configuration Preparing Your Systems MONITOR and FULL_ADMIN can only be set in the cluster configuration file and they apply to the entire cluster. PACKAGE_ADMIN can be set in the cluster or a package configuration file. If it is set in the cluster configuration file, PACKAGE_ADMIN applies to all configured packages; if it is set in a package configuration file, it applies to that package only.
Building an HA Cluster Configuration Preparing Your Systems
USER_NAME ANY_USER
USER_HOST ANY_SERVICEGUARD_NODE
USER_ROLE MONITOR
In the above example, the configuration would fail because user john is assigned two roles. (In any case, Policy 2 is unnecessary, because PACKAGE_ADMIN includes the role of MONITOR.) Policy 3 does not conflict with any other policies, even though the wildcard ANY_USER includes the individual user john. NOTE Be careful when granting access to ANY_SERVICEGUARD_NODE.
Building an HA Cluster Configuration Preparing Your Systems NOTE HP also recommends that you make DNS highly available, either by using multiple DNS servers or by configuring DNS into a Serviceguard package. Safeguarding against Loss of Name Resolution Services This section explains how to create a robust name-resolution configuration that will allow cluster nodes to continue communicating with one another if DNS or NIS services fail.
Building an HA Cluster Configuration Preparing Your Systems nameserver 15.243.160.51 3. Edit or create the /etc/nsswitch.
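One commonly used form of the hosts entry in the name service switch file, which consults /etc/hosts first and falls back to DNS, is sketched below; verify the exact syntax recommended for your HP-UX release:
hosts: files [NOTFOUND=continue UNAVAIL=continue] dns [NOTFOUND=return UNAVAIL=return]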
Building an HA Cluster Configuration Preparing Your Systems NOTE Under agile addressing, the physical devices in these examples would have names such as /dev/[r]disk/disk1, and /dev/[r]disk/disk2. See “About Device File Names (Device Special Files)” on page 111. 1. Create a bootable LVM disk to be used for the mirror. pvcreate -B /dev/rdsk/c4t6d0 2. Add this disk to the current root volume group. vgextend /dev/vg00 /dev/dsk/c4t6d0 3. Make the new disk a boot disk. mkboot -l /dev/rdsk/c4t6d0 4.
Building an HA Cluster Configuration Preparing Your Systems 6. Verify that the mirrors were properly created. lvlnboot -v The output of this command is shown in a display like the following:
Boot Definitions for Volume Group /dev/vg00:
Physical Volumes belonging in Root Volume Group:
   /dev/dsk/c4t5d0 (10/0.5.0) -- Boot Disk
   /dev/dsk/c4t6d0 (10/0.6.
Building an HA Cluster Configuration Preparing Your Systems Backing Up Cluster Lock Disk Information After you configure the cluster and create the cluster lock volume group and physical volume, you should create a backup of the volume group configuration data on each lock volume group. Use the vgcfgbackup command for each lock volume group you have configured, and save the backup file in case the lock configuration must be restored to a new disk with the vgcfgrestore command following a disk failure.
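For example, if the lock volume group is /dev/vglock, the backup can be taken with:
vgcfgbackup /dev/vglock
By default the configuration data is saved under /etc/lvmconf/; it can later be restored with vgcfgrestore if the lock disk has to be replaced.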
Building an HA Cluster Configuration Preparing Your Systems IMPORTANT • If you are using a disk array, create the smallest LUN the array will allow, or, on an HP Integrity server, you can partition a LUN; see “Creating a Disk Partition on an HP Integrity System”. • If you are using individual disks, use either a small disk, or a portion of a disk. On an HP Integrity server, you can partition a disk; see “Creating a Disk Partition on an HP Integrity System”.
Building an HA Cluster Configuration Preparing Your Systems Step 1. Use a text editor to create a file that contains the partition information. You need to create at least three partitions, for example:
3
EFI 100MB
HPUX 1MB
HPUX 100%
This defines: • A 100 MB EFI (Extensible Firmware Interface) partition (this is required) • A 1 MB partition that can be used for the lock LUN • A third partition that consumes the remainder of the disk and can be used for whatever purpose you like. Step 2.
Building an HA Cluster Configuration Preparing Your Systems Use the command insf -e on each node. This will create device files corresponding to the three partitions, though the names themselves may differ from node to node depending on each node’s I/O configuration. Step 5. Define the lock LUN; see “Defining the Lock LUN”. Defining the Lock LUN Use cmquerycl -L to create a cluster configuration file that defines the lock LUN.
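As an approximate sketch (the device file name is an example; see the cmquerycl(1m) man page for the exact options in your release), the command might look like this when the lock LUN has the same device name on both nodes:
cmquerycl -C $SGCONF/config.ascii -L /dev/dsk/c0t1d1s2 -n ftsys9 -n ftsys10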
Building an HA Cluster Configuration Preparing Your Systems The quorum server executable file, qs, is installed in the /usr/lbin directory. When the installation is complete, you need to create an authorization file on the server where the QS will be running to allow specific host systems to obtain quorum services. The required pathname for this file is /etc/cmcluster/qs_authfile. Add to the file the names of all cluster nodes that will access cluster services from this quorum server.
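For example, assuming two cluster nodes named ftsys9 and ftsys10 (host names are illustrative), the entries could be appended on the quorum server system as follows:
echo ftsys9 >> /etc/cmcluster/qs_authfile
echo ftsys10 >> /etc/cmcluster/qs_authfile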
Building an HA Cluster Configuration Preparing Your Systems For a complete discussion of how the quorum server operates, see to “Cluster Quorum to Prevent Split-Brain Syndrome” on page 65. See the section “Specifying a Quorum Server” on page 237 for a description of how to use the cmquerycl command to specify a quorum server in the cluster configuration file. For more information, see the Release Notes for your version of Quorum Server at http://docs.hp.com -> High Availability -> Quorum Server.
Building an HA Cluster Configuration Preparing Your Systems If you experience problems, return the parameters to their default values. When contacting HP support for any issues regarding Serviceguard and networking, please be sure to share all information about any parameters that were changed from the defaults. Third-party applications that are running in a Serviceguard environment may require tuning of network and kernel parameters: • ndd is the network tuning utility.
Building an HA Cluster Configuration Preparing Your Systems Preparing for Changes in Cluster Size If you intend to add additional nodes to the cluster online, while it is running, ensure that they are connected to the same heartbeat subnets and to the same lock disks as the other cluster nodes. In selecting a cluster lock configuration, be careful to anticipate any potential need for additional cluster nodes.
Building an HA Cluster Configuration Preparing Your Systems NOTE If you are configuring volume groups that use mass storage on HP's HA disk arrays, you should use redundant I/O channels from each node, connecting them to separate ports on the array. As of HP-UX 11i v3, the I/O subsystem performs load balancing and multipathing automatically. Creating a Storage Infrastructure with LVM This section describes storage configuration with LVM.
Building an HA Cluster Configuration Preparing Your Systems When you have created the logical volumes and created or extended the volume groups, specify the filesystem that is to be mounted on the volume group, then skip ahead to the section “Deactivating the Volume Group”. To configure the volume groups from the command line, proceed as follows. If your volume groups have not been set up, use the procedures that follow.
Building an HA Cluster Configuration Preparing Your Systems 2. Next, create a control file named group in the directory /dev/vgdatabase, as follows: mknod /dev/vgdatabase/group c 64 0xhh0000 The major number is always 64, and the hexadecimal minor number has the form 0xhh0000 where hh must be unique to the volume group you are creating. Use a unique minor number that is available across all the nodes for the mknod command above.
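The next step, sketched here with example disk device names, is to create the volume group itself and add any additional disks to it:
vgcreate /dev/vgdatabase /dev/dsk/c1t2d0
vgextend /dev/vgdatabase /dev/dsk/c0t2d0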
Building an HA Cluster Configuration Preparing Your Systems Creating File Systems If your installation uses filesystems, create them next. Use the following commands to create a filesystem for mounting on the logical volume just created: 1. Create the filesystem on the newly created logical volume: newfs -F vxfs /dev/vgdatabase/rlvol1 Note the use of the raw device file for the logical volume. 2. Create a directory to mount the disk: mkdir /mnt1 3.
Building an HA Cluster Configuration Preparing Your Systems same physical volume that was available on ftsys9. You must carry out the same procedure separately for each node on which the volume group's package can run. To set up the volume group on ftsys10, use the following steps: 1. On ftsys9, copy the mapping of the volume group to a specified file. vgexport -p -s -m /tmp/vgdatabase.map /dev/vgdatabase 2. Still on ftsys9, copy the map file to ftsys10: rcp /tmp/vgdatabase.map ftsys10:/tmp/vgdatabase.
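The remaining steps on ftsys10 follow the same pattern as the volume group creation above; a sketch (choose a minor number that fits your own conventions):
mkdir /dev/vgdatabase
mknod /dev/vgdatabase/group c 64 0xhh0000
vgimport -s -m /tmp/vgdatabase.map /dev/vgdatabase
vgchange -a y /dev/vgdatabase    # activate briefly to verify the import
vgchange -a n /dev/vgdatabase    # then deactivate again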
Building an HA Cluster Configuration Preparing Your Systems NOTE When you use PVG-strict mirroring, the physical volume group configuration is recorded in the /etc/lvmpvg file on the configuration node. This file defines the physical volume groups which are the basis of mirroring and indicate which physical volumes belong to each physical volume group.
Building an HA Cluster Configuration Preparing Your Systems disks. To make merging the files easier, be sure to keep a careful record of the physical volume group names on the volume group planning worksheet (described in Chapter 4). Use the following procedure to merge files between the configuration node (ftsys9) and a new node (ftsys10) to which you are importing volume groups: 1. Copy /etc/lvmpvg from ftsys9 to /etc/lvmpvg.new on ftsys10. 2. If there are volume groups in /etc/lvmpvg.
Building an HA Cluster Configuration Preparing Your Systems This section shows how to configure new storage using the command set of the Veritas Volume Manager (VxVM). Once you have created the root disk group (described next), you can use VxVM commands or the Storage Administrator GUI, VEA, to carry out configuration tasks. For more information, see the Veritas Volume Manager documentation posted at http://docs.hp.com -> 11i v3 -> VxVM (or -> 11i v2 -> VxVM, depending on your HP-UX version).
Building an HA Cluster Configuration Preparing Your Systems Converting Disks from LVM to VxVM You can use the vxvmconvert(1m) utility to convert LVM volume groups into VxVM disk groups. Before you can do this, the volume group must be deactivated, which means that any package that uses the volume group must be halted. Follow the conversion procedures outlined in the Veritas Volume Manager Migration Guide for your version of VxVM.
Building an HA Cluster Configuration Preparing Your Systems /usr/lib/vxvm/bin/vxdisksetup -i c0t3d2 Creating Disk Groups Use vxdiskadm, or use the vxdg command, to create disk groups, as in the following example: vxdg init logdata c0t3d2 Verify the configuration with the following command: vxdg list
NAME       STATE      ID
rootdg     enabled    971995699.1025.node1
logdata    enabled    972078742.1084.node1
Creating Volumes Use the vxassist command to create logical volumes.
Building an HA Cluster Configuration Preparing Your Systems Creating File Systems If your installation uses filesystems, create them next. Use the following commands to create a filesystem for mounting on the logical volume just created: 1. Create the filesystem on the newly created volume: newfs -F vxfs /dev/vx/rdsk/logdata/log_files 2. Create a directory to mount the volume: mkdir /logs 3. Mount the volume: mount /dev/vx/dsk/logdata/log_files /logs 4.
Building an HA Cluster Configuration Preparing Your Systems
vxvol -g dg_01 startall
mount /dev/vx/dsk/dg_01/myvol /mountpoint
NOTE Unlike LVM volume groups, VxVM disk groups are not entered in the cluster configuration file, nor in the package configuration file.
Building an HA Cluster Configuration Configuring the Cluster Configuring the Cluster This section describes how to define the basic cluster configuration. This must be done on a system that is not part of a Serviceguard cluster (that is, on which Serviceguard is installed but not configured). NOTE You can use Serviceguard Manager to configure a cluster: open the System Management Homepage (SMH) and choose Tools->Serviceguard Manager. See “Using Serviceguard Manager” on page 30 for more information.
Building an HA Cluster Configuration Configuring the Cluster -w full lets you specify full network probing, in which actual connectivity is verified among all LAN interfaces on all nodes in the cluster. This is the default. -w none skips network querying. If you have recently checked the networks, this option will save time. For more details, see the cmquerycl(1m) man page. The example above creates a template file, by default /etc/cmcluster/clust1.config.
Building an HA Cluster Configuration Configuring the Cluster To display the failover times of disks, use the cmquerycl command, specifying all the nodes in the cluster. The output of the command lists the disks connected to each node together with the re-formation time associated with each. Do not include the node’s entire domain name; for example, specify ftsys9, not ftsys9.cup.hp.
Building an HA Cluster Configuration Configuring the Cluster Specifying a Lock LUN A cluster lock disk, lock LUN, or quorum server, is required for two-node clusters. The lock must be accessible to all nodes and must be powered separately from the nodes. See “Cluster Lock” on page 65 and “Setting Up a Lock LUN” on page 214 for more information.
Building an HA Cluster Configuration Configuring the Cluster
# HEARTBEAT_INTERVAL. If you are experiencing quorum server
# timeouts, you can adjust these parameters, or you can include
# the QS_TIMEOUT_EXTENSION parameter.
Building an HA Cluster Configuration Configuring the Cluster Specifying Maximum Number of Configured Packages This specifies the most packages that can be configured in the cluster. The parameter value must be equal to or greater than the number of packages currently configured in the cluster. The count includes all types of packages: failover, multi-node, and system multi-node. As of Serviceguard A.11.17, the default is 150, which is the maximum allowable number of packages in a cluster.
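In the cluster configuration file this is a single line, for example:
MAX_CONFIGURED_PACKAGES 150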
Building an HA Cluster Configuration Configuring the Cluster SGeFF has requirements for cluster configuration, as outlined in the cluster configuration template file. For more information, see the Serviceguard Extension for Faster Failover Release Notes posted on http://www.docs.hp.com -> High Availability. See also Optimizing Failover Time in a Serviceguard Environment at http://www.docs.hp.com -> High Availability -> Serviceguard -> White Papers.
Building an HA Cluster Configuration Configuring the Cluster NOTE If you are using CVM disk groups, they should be configured after cluster configuration is done, using the procedures described in “Creating the Storage Infrastructure and Filesystems with Veritas Cluster Volume Manager (CVM)” on page 256. Veritas disk groups are added to the package configuration file, as described in Chapter 6.CVM is not supported on all systems; see “About Veritas CFS and CVM from Symantec” on page 29.
Building an HA Cluster Configuration Configuring the Cluster • Heartbeat network minimum requirement is met. See the entry for HEARTBEAT_IP under “Cluster Configuration Parameters” starting on page 154. • At least one NODE_NAME is specified. • Each node is connected to each heartbeat network. • All heartbeat networks are of the same type of LAN. • The network interface device files specified are valid LAN device files. • VOLUME_GROUP entries are not currently marked as cluster-aware.
Building an HA Cluster Configuration Configuring the Cluster vgchange -a y /dev/vglock • Generate the binary configuration file and distribute it: cmapplyconf -k -v -C /etc/cmcluster/clust1.config or cmapplyconf -k -v -C /etc/cmcluster/clust1.ascii Using the -k option means that cmapplyconf only checks disk connectivity to the LVM disks that are identified in the ASCII file. Omitting the -k option (the default behavior) means that cmapplyconf tests the connectivity of all LVM disks on all nodes.
Building an HA Cluster Configuration Configuring the Cluster Be sure to use vgcfgbackup for all volume groups, especially the cluster lock volume group. NOTE You must use the vgcfgbackup command to store a copy of the cluster lock disk's configuration data whether you created the volume group using the System Management Homepage (SMH), SAM, or HP-UX commands.
Building an HA Cluster Configuration Creating a Storage Infrastructure with Veritas Cluster File System (CFS) Creating a Storage Infrastructure with Veritas Cluster File System (CFS) NOTE CFS (and CVM - Cluster Volume Manager) are supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability -> Serviceguard).
Building an HA Cluster Configuration Creating a Storage Infrastructure with Veritas Cluster File System (CFS) Preparing the Cluster and the System Multi-node Package 1. First, be sure the cluster is running: cmviewcl 2. If it is not, start it: cmruncl 3. If you have not initialized your disk groups, or if you have an old install that needs to be re-initialized, use the vxinstall command to initialize VxVM/CVM disk groups. See “Initializing the Veritas Volume Manager” on page 257. 4.
Building an HA Cluster Configuration Creating a Storage Infrastructure with Veritas Cluster File System (CFS) cfscluster config -t 900 -s 5. Verify the system multi-node package is running and CVM is up, using the cmviewcl or cfscluster command. Following is an example of using the cfscluster command.
Building an HA Cluster Configuration Creating a Storage Infrastructure with Veritas Cluster File System (CFS)
NAME       STATE                   ID
logdata    enabled, shared, cds    11192287592.39.ftsys9
NOTE If you want to create a cluster with CVM only - without CFS, stop here. Then, in your application package’s configuration file, add the dependency triplet, with DEPENDENCY_CONDITION set to SG-DG-pkg-id#=UP and DEPENDENCY_LOCATION set to SAME_NODE.
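For example (the dependency name is arbitrary, and the disk group package name must match the SG-CFS-DG-id# package created for your disk group), the application package’s configuration file might contain:
DEPENDENCY_NAME       SG-CFS-DG-1_dep
DEPENDENCY_CONDITION  SG-CFS-DG-1=UP
DEPENDENCY_LOCATION   SAME_NODE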
Building an HA Cluster Configuration Creating a Storage Infrastructure with Veritas Cluster File System (CFS)
NODE NAME       ACTIVATION MODE
ftsys9          sw (sw)
  MOUNT POINT   SHARED VOLUME   TYPE
ftsys10         sw (sw)
  MOUNT POINT   SHARED VOLUME   TYPE
5. To view the package name that is monitoring a disk group, use the cfsdgadm show_package command: cfsdgadm show_package logdata sg_cfs_dg-1 Creating Volumes 1. Make log_files volume on the logdata disk group: vxassist -g logdata make log_files 1024m 2.
Building an HA Cluster Configuration Creating a Storage Infrastructure with Veritas Cluster File System (CFS) CAUTION Once you create the disk group and mount point packages, it is critical that you administer the cluster with the cfs commands, including cfsdgadm, cfsmntadm, cfsmount, and cfsumount. Using general, non-cfs commands (such as mount and umount directly) could cause conflicts with subsequent command operations on the file system or Serviceguard packages.
Building an HA Cluster Configuration Creating a Storage Infrastructure with Veritas Cluster File System (CFS)
CLUSTER        STATUS
cfs_cluster    up

NODE           STATUS    STATE
ftsys9         up        running
ftsys10        up        running

MULTI_NODE_PACKAGES

PACKAGE        STATUS    STATE      AUTO_RUN    SYSTEM
SG-CFS-pkg     up        running    enabled     yes
SG-CFS-DG-1    up        running    enabled     no
SG-CFS-MP-1    up        running    enabled     no

ftsys9/etc/cmcluster/cfs> bdf
Filesystem                      kbytes   used    avail    %used   Mounted on
/dev/vx/dsk/logdata/log_files   10485    17338   966793   2%      /tmp/logdata/log_files
ftsys9/etc/cmcluster/cfs>
Building an HA Cluster Configuration Creating a Storage Infrastructure with Veritas Cluster File System (CFS) Creating Checkpoint and Snapshot Packages for CFS The storage checkpoints and snapshots are two additional mount point package types. They can be associated with the cluster via the cfsmntadm(1m) command.
Building an HA Cluster Configuration Creating a Storage Infrastructure with Veritas Cluster File System (CFS) Package name "SG-CFS-CK-2" was generated to control the resource Mount point "/tmp/check_logfiles" was associated to the cluster cfsmount /tmp/check_logfiles 3. Verify.
Building an HA Cluster Configuration Creating a Storage Infrastructure with Veritas Cluster File System (CFS) Mount Point Packages for Snapshot Images A snapshot is a frozen image of an active file system that does not change when the contents of target file system changes. On cluster file systems, snapshots can be created on any node in the cluster, and backup operations can be performed from that node.
Building an HA Cluster Configuration Creating a Storage Infrastructure with Veritas Cluster File System (CFS) SG-CFS-SN-1 up running disabled no The snapshot file system /local/snap1 is now mounted and provides a point in time view of /tmp/logdata/log_files.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with Veritas Cluster Volume Manager (CVM) Creating the Storage Infrastructure and Filesystems with Veritas Cluster Volume Manager (CVM) NOTE CVM (and CFS - Cluster File System) are supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability - > Serviceguard).
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with Veritas Cluster Volume Manager (CVM) Separate procedures are given below for: • Initializing the Volume Manager • Preparing the Cluster for Use with CVM • Creating Disk Groups for Shared Storage • Creating File Systems with CVM For more information, including details about configuration of plexes (mirrors), multipathing, and RAID, refer to the HP-UX documentation for the Veritas Volume Manager.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with Veritas Cluster Volume Manager (CVM) Preparing the Cluster for Use with CVM In order to use the Veritas Cluster Volume Manager (CVM), you need a cluster that is running with a Serviceguard-supplied CVM system multi-node package. This means that the cluster must already be configured and running before you create disk groups.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with Veritas Cluster Volume Manager (CVM) • Veritas CVM 3.5: cmapplyconf -P /etc/cmcluster/cvm/VxVM-CVM-pkg.conf • Veritas CVM 4.1 and later: If you are not using Veritas Cluster File System, use the cmapplyconf command. (If you are using CFS, you will set up CVM as part of the CFS components.): cmapplyconf -P /etc/cmcluster/cfs/SG-CFS-pkg.conf Begin package verification ...
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with Veritas Cluster Volume Manager (CVM) vxdctl -c mode One node will identify itself as the master. Create disk groups from this node. Initializing Disks for CVM You need to initialize the physical disks that will be employed in CVM disk groups.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with Veritas Cluster Volume Manager (CVM) This command creates a 1024 MB volume named log_files in a disk group named logdata. The volume can be referenced with the block device file /dev/vx/dsk/logdata/log_files or the raw (character) device file /dev/vx/rdsk/logdata/log_files.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with Veritas Cluster Volume Manager (CVM) You also need to identify the CVM disk groups, filesystems, logical volumes, and mount options in the package control script. The package configuration process is described in detail in Chapter 6.
Building an HA Cluster Configuration Using DSAU during Configuration Using DSAU during Configuration As explained under “What are the Distributed Systems Administration Utilities?” on page 33, you can use DSAU to centralize and simplify configuration and monitoring tasks. See the Distributed Systems Administration Utilities User’s Guide posted at http://docs.hp.com.
Building an HA Cluster Configuration Managing the Running Cluster Managing the Running Cluster This section describes some approaches to routine management of the cluster. Additional tools and suggestions are found in Chapter 7, “Cluster and Package Maintenance.” Checking Cluster Operation with Serviceguard Manager You can check configuration and status information using Serviceguard Manager: from the System Management Homepage (SMH), choose Tools-> Serviceguard Manager.
Building an HA Cluster Configuration Managing the Running Cluster You can use these commands to test cluster operation, as in the following: 1. If the cluster is not already running, start it. From the Serviceguard Manager menu, choose Run Cluster. From the command line, use cmruncl -v. By default, cmruncl will check the networks. Serviceguard will probe the actual network configuration with the network information in the cluster configuration.
Building an HA Cluster Configuration Managing the Running Cluster Preventing Automatic Activation of LVM Volume Groups It is important to prevent LVM volume groups that are to be used in packages from being activated at system boot time by the /etc/lvmrc file. One way to ensure that this does not happen is to edit the /etc/lvmrc file on all nodes, setting AUTO_VG_ACTIVATE to 0, then including all the volume groups that are not cluster-bound in the custom_vg_activation function.
Building an HA Cluster Configuration Managing the Running Cluster To enable automatic cluster start, set the flag AUTOSTART_CMCLD to 1 in the /etc/rc.config.d/cmcluster file on each node in the cluster; the nodes will then join the cluster at boot time. Here is an example of the /etc/rc.config.d/cmcluster file:
#************************ CMCLUSTER ************************
# Highly Available Cluster configuration
#
# @(#) $Revision: 72.
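The portion of the file that controls automatic startup is a single flag; setting it to 1 enables automatic cluster start at boot (only the relevant line is sketched here):
AUTOSTART_CMCLD=1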
Building an HA Cluster Configuration Managing the Running Cluster Managing a Single-Node Cluster The number of nodes you will need for your Serviceguard cluster depends on the processing requirements of the applications you want to protect. You may want to configure a single-node cluster to take advantage of Serviceguard’s network failure protection. In a single-node cluster, a cluster lock is not required, since there is no other node in the cluster.
Building an HA Cluster Configuration Managing the Running Cluster Deleting the Cluster Configuration With root login, you can delete a cluster configuration from all cluster nodes by using Serviceguard Manager, or on the command line. The cmdeleteconf command prompts for a verification before deleting the files unless you use the -f option. You can only delete the configuration when the cluster is down.
Configuring Packages and Their Services 6 Configuring Packages and Their Services Serviceguard packages group together applications and the services and resources they depend on. The typical Serviceguard package is a failover package that starts on one node but can be moved (“failed over”) to another if necessary. See “What is Serviceguard?” on page 26, “How the Package Manager Works” on page 71, and “Package Configuration Planning” on page 165 for more information.
Configuring Packages and Their Services allowing you to build packages from smaller modules, and eliminating the separate package control script and the need to distribute it manually. Packages created using Serviceguard A.11.17 or earlier are referred to as legacy packages. If you need to reconfigure a legacy package (rather than create a new package), see “Configuring a Legacy Package” on page 363.
Configuring Packages and Their Services Choosing Package Modules Choosing Package Modules IMPORTANT Before you start, you need to do the package-planning tasks described under “Package Configuration Planning” on page 165. To choose the right package modules, you need to decide the following things about the package you are creating: • What type of package it is; see “Types of Package: Failover, Multi-Node, System Multi-Node” on page 273.
Configuring Packages and Their Services Choosing Package Modules Relocatable IP addresses cannot be assigned to multi_node packages. Examples are the Veritas Cluster File System (CFS) system multi-node packages; but support for multi-node packages is no longer restricted to CVM/CFS; you can create a multi-node package for any purpose. IMPORTANT But if the package uses volume groups, they must be activated in shared mode: vgchange -a s, which is available only if the SGeRAC add-on product is installed.
Configuring Packages and Their Services Choosing Package Modules NOTE The following parameters cannot be configured for multi-node or system multi-node packages: • failover_policy • failback_policy • ip_subnet • ip_address Volume groups configured for packages of these types must be activated in shared mode. For more information about types of packages and how they work, see “How the Package Manager Works” on page 71.
Configuring Packages and Their Services Choosing Package Modules (The output will be written to $SGCONF/sg-all.) Base Package Modules At least one base module (or default or all, which include the base module) must be specified on the cmmakepkg command line. Parameters marked with an asterisk (*) are new or changed as of Serviceguard A.11.18. (S) indicates that the parameter (or its equivalent) has moved from the package control script to the package configuration file for modular packages.
Configuring Packages and Their Services Choosing Package Modules Table 6-1 Base Modules (Continued) Module Name Parameters (page) Comments multi_node package_name (282) * module_name (282) * module_version (282) * package_type (282) node_name (282) auto_run (283) node_fail_fast_enabled (283) run_script_timeout (283) halt_script_timeout (284) successor_halt_timeout (284) * script_log_file (285) operation_sequence (285) * log_level (285) priority (286) * Base module.
Configuring Packages and Their Services Choosing Package Modules its equivalent) has moved from the package control script to the package configuration file for modular packages. See the “Package Parameter Explanations” on page 281 for more information. Table 6-2 Module Name Optional Modules Parameters (page) Comments dependency dependency_name (286) * dependency_condition (287) dependency_location (287) Add to a base module to create a package that depends on one or more other packages.
Configuring Packages and Their Services Choosing Package Modules Table 6-2 Module Name Optional Modules (Continued) Parameters (page) Comments volume_group concurrent_vgchange_operations (292) (S) vgchange_cmd (292) * (S) cvm_activation_cmd (293) (S) vxvol_cmd (293) * (S) vg (294) (S) cvm_dg (294) (S) vxvm_dg (294) (S) deactivation_retry_count (294) (S) kill_processes_accessing_raw_devices (294) (S) Add to a base module if the package needs to mount file systems on LVM or VxVM volumes, or uses CVM vo
Configuring Packages and Their Services Choosing Package Modules Table 6-2 Module Name Optional Modules (Continued) Parameters (page) Comments external_pre external_pre_script (297) * Add to a base module to specify additional programs to be run before volume groups and disk groups are activated while the package is starting and after they are deactivated while the package is halting.
Configuring Packages and Their Services Choosing Package Modules Table 6-2 Optional Modules (Continued) Module Name default NOTE Parameters (page) (all parameters) Comments A symbolic link to the all module; used if a base module is not specified on the cmmakepkg command line; see “cmmakepkg Examples” on page 299. The default form for parameter names in the modular package configuration file is lower case; for legacy packages the default is upper case.
Configuring Packages and Their Services Choosing Package Modules More detailed instructions for running cmmakepkg are in the next section, “Generating the Package Configuration File” on page 299. See also “Package Configuration Planning” on page 165. package_name Any name, up to a maximum of 39 characters, that: IMPORTANT • starts and ends with an alphanumeric character • otherwise contains only alphanumeric characters or dot (.
Configuring Packages and Their Services Choosing Package Modules node_name node_name node_name Serviceguard uses the order of priority specified by this list to choose which node to run the package on. IMPORTANT See “Cluster Configuration Parameters” on page 154 for important information about node names. auto_run Can be set to yes or no. The default is yes.
Configuring Packages and Their Services Choosing Package Modules If the package does not complete its startup in the time specified by run_script_timeout, Serviceguard will terminate it and prevent it from switching to another node. In this case, if node_fail_fast_enabled is set to yes, the node will be halted (HP-UX system reset). If no timeout is specified (no_timeout), Serviceguard will wait indefinitely for the package to start. If a timeout occurs: • • Switching will be disabled.
Configuring Packages and Their Services Choosing Package Modules New as of A.11.18 (for both modular and legacy packages). See also “About Package Dependencies” on page 171. script_log_file The full pathname of the package’s log file. The default is $SGRUN/log/<package_name>.log. (See “Understanding Where Files Are Located” on page 200 for more information about Serviceguard pathnames.) operation_sequence Defines the order in which the scripts defined by the package’s component modules will start up.
Configuring Packages and Their Services Choosing Package Modules This parameter can be set for failover packages only. If this package will depend on another package or vice versa, see also “About Package Dependencies” on page 171. failback_policy Specifies whether or not Serviceguard will automatically move a package that is not running on its primary node (the first node on its node_name list) when the primary node is once again available. Can be set to automatic or manual. The default is manual.
Configuring Packages and Their Services Choosing Package Modules IMPORTANT Restrictions on dependency names in previous Serviceguard releases were less stringent. Packages that specify dependency_names that do not conform to the above rules will continue to run, but if you reconfigure them, you will need to change the dependency_name; cmcheckconf and cmapplyconf will enforce the new rules.
Configuring Packages and Their Services Choosing Package Modules local_lan_failover_allowed Specifies whether or not Serviceguard can switch LANs on a cluster node (that is, switch to a standby LAN card). Legal values are yes and no. Default is yes. monitored_subnet The IP address of a LAN subnet that is to be monitored for this package. Replaces legacy SUBNET which is still supported in the package configuration file for legacy packages; see “Configuring a Legacy Package” on page 363.
Configuring Packages and Their Services Choosing Package Modules ip_address A relocatable IP address on a specified ip_subnet (see page 288). Replaces IP, which is still supported in the package control script for legacy packages; see “Configuring a Legacy Package” on page 363. For more information about relocatable IP addresses, see “Stationary and Relocatable IP Addresses” on page 98. This parameter can be set for failover packages only.
Configuring Packages and Their Services Choosing Package Modules service_cmd The command that runs the application or service for this service_name, for example, /usr/bin/X11/xclock -display 15.244.58.208:0 An absolute pathname is required; neither the PATH variable nor any other environment variable is passed to the command. The default shell is /usr/bin/sh. NOTE Be careful when defining service run commands.
Configuring Packages and Their Services Choosing Package Modules service_halt_timeout The length of time, in seconds, Serviceguard will wait for the service to halt before forcing termination of the service’s process. The value should be large enough to allow any cleanup required by the service to complete. Legal values are none, unlimited, or any number greater than zero. unlimited means Serviceguard will never force the process to terminate.
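Taken together, a service definition in a modular package configuration file might look like the following sketch (the service name, command, and values are illustrative only):
service_name pkg1_service
service_cmd "/usr/bin/X11/xclock -display 15.244.58.208:0"
service_restart none
service_halt_timeout 300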
Configuring Packages and Their Services Choosing Package Modules You can configure a total of 15 resource_up_values per package. For example, if there is only one resource (resource_name) in the package, then a maximum of 15 resource_up_values can be defined. If two resource_names are defined and one of them has 10 resource_up_values, then the other resource_name can have only 5 resource_up_values.
Configuring Packages and Their Services Choosing Package Modules (SGeRAC) is installed. (See the latest version of Using Serviceguard Extension for RAC at http://www.docs.hp.com -> High Availability -> Serviceguard Extension for Real Application Cluster (ServiceGuard OPS Edition) for more information.) Shared LVM volume groups must not contain a file system.
Configuring Packages and Their Services Choosing Package Modules This allows package startup to continue while mirror re-synchronization is in progress. vg Specifies an LVM volume group (one per vg, each on a new line) on which a file system needs to be mounted. A corresponding vgchange_cmd (see page 292) specifies how the volume group is to be activated. The package script generates the necessary filesystem commands on the basis of the fs_ parameters (see page 295).
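For example, a package that activates two volume groups in exclusive mode might contain entries such as these (the volume group names are illustrative):
vgchange_cmd "vgchange -a e"
vg vg01
vg vg02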
Configuring Packages and Their Services Choosing Package Modules Legal value is any number greater than zero. The default is 1. If the package needs to mount and unmount a large number of filesystems, you can improve performance by carefully tuning this parameter during testing (increase it a little at time and monitor performance each time). fs_mount_retry_count The number of mount retries for each file system. Legal value is zero or any greater number. The default is zero.
Configuring Packages and Their Services Choosing Package Modules NOTE A volume group must be defined in this file (using vg; see page 294) for each logical volume specified by an fs_name entry. fs_directory The root of the file system specified by fs_name. Replaces FS, which is still supported in the package control script for legacy packages; see “Configuring a Legacy Package” on page 363. See the mount (1m) manpage for more information. fs_type The type of the file system specified by fs_name.
Configuring Packages and Their Services Choosing Package Modules The variable name and value can each consist of a maximum of MAXPATHLEN characters (1024 on HP-UX systems). You can define more than one variable. See “About External Scripts” on page 178, as well as the comments in the configuration file, for more information.
Configuring Packages and Their Services Choosing Package Modules NOTE The only access role that can be granted in the package configuration file is package_admin for this particular package; you grant other roles in the cluster configuration file. See “Setting Access Controls for Configured Cluster Nodes” on page 207 for further discussion and examples. Legal values for user_name are any_user or a maximum of eight login names from /etc/passwd on user_host.
Configuring Packages and Their Services Generating the Package Configuration File Generating the Package Configuration File When you have chosen the configuration modules your package needs (see “Choosing Package Modules” on page 273), you are ready to generate a package configuration file that contains those modules. This file will consist of a base module (usually failover, multi-node or system multi-node) plus the modules that contain the additional parameters you have decided to include.
Configuring Packages and Their Services Generating the Package Configuration File cmmakepkg $SGCONF/pkg1/pkg1.conf • To create a generic failover package (that could be applied without editing): cmmakepkg -n pkg1 -m sg/failover $SGCONF/pkg1/pkg1.conf
Configuring Packages and Their Services Editing the Configuration File Editing the Configuration File When you have generated the configuration file that contains the modules your package needs (see “Generating the Package Configuration File” on page 299), you need to edit the file to set the package parameters to the values that will make the package function as you intend.
Configuring Packages and Their Services Editing the Configuration File Use the following bullet points as a checklist, referring to the “Package Parameter Explanations” on page 281, and the comments in the configuration file itself, for detailed specifications for each parameter. NOTE Optional parameters are commented out in the configuration file (with a # at the beginning of the line).
Configuring Packages and Their Services Editing the Configuration File • node_fail_fast_enabled. Enter yes to cause the node to be halted (system reset) if the package fails; otherwise enter no. For system multi-node packages, you must enter yes. • run_script_timeout and halt_script_timeout. Enter the number of seconds Serviceguard should wait for package startup and shutdown, respectively, to complete; or leave the default, no_timeout; see page 283. • successor_halt_timeout.
Configuring Packages and Their Services Editing the Configuration File • Use the monitored_subnet parameter to specify a subnet that is to be monitored for this package. If there are multiple subnets, repeat the parameter as many times as needed, on a new line each time. • If this is a Serviceguard Extension for Oracle RAC (SGeRAC) installation, you can use the cluster_interconnect_subnet parameter (see page 288). • If your package will use relocatable IP addresses, enter the ip_subnet and ip_address.
Configuring Packages and Their Services Editing the Configuration File
vg vg01
vg vg02
• If you are using CVM, use the cvm_dg parameters to specify the names of the disk groups to be activated, and select the appropriate cvm_activation_cmd. Enter one disk group per cvm_dg, each on a new line.
Configuring Packages and Their Services Editing the Configuration File — concurrent_mount_and_umount_operations (see page 294) You can also use the fsck_opt and fs_umount_opt parameters to specify the -s option of the fsck and mount/umount commands (see page 296). • You can use the pev_ parameter to specify a variable to be passed to external scripts. Make sure the variable name begins with the upper-case or lower-case letters pev and an underscore (_). You can specify more than one variable.
Configuring Packages and Their Services Editing the Configuration File • Configure the Access Control Policy for up to eight specific users or any_user. The only user role you can configure in the package configuration file is package_admin for the package in question. Cluster-wide roles are defined in the cluster configuration file. See “Access Roles” on page 204 for more information.
Configuring Packages and Their Services Verifying and Applying the Package Configuration Verifying and Applying the Package Configuration Serviceguard checks the configuration you enter and reports any errors. Use a command such as the following to verify the content of the package configuration file you have created, for example: cmcheckconf -v -P $SGCONF/pkg1/pkg1.config Errors are displayed on the standard output.
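When the file passes verification, apply it; for example (assuming the same file name as above):
cmapplyconf -P $SGCONF/pkg1/pkg1.config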
Configuring Packages and Their Services Verifying and Applying the Package Configuration packages; see “Configuring a Legacy Package” on page 363. And, for modular packages, you need to distribute any external scripts identified by the external_pre_script and external_script parameters.
Configuring Packages and Their Services Adding the Package to the Cluster Adding the Package to the Cluster You can add the new package to the cluster while the cluster is running, subject to the value of MAX_CONFIGURED_PACKAGES in the cluster configuration file. See “Adding a Package to a Running Cluster” on page 376.
Configuring Packages and Their Services How Control Scripts Manage VxVM Disk Groups How Control Scripts Manage VxVM Disk Groups VxVM disk groups (other than those managed by CVM, on systems that support it) are outside the control of the Serviceguard cluster. The package control script uses standard VxVM commands to import and deport these disk groups. (For details on importing and deporting disk groups, refer to the discussion of the import and deport options in the vxdg man page.
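The import command the control script runs is typically of the following form (dg_01 is an example disk group name; see the vxdg manpage for the exact options):
vxdg -tfC import dg_01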
Configuring Packages and Their Services How Control Scripts Manage VxVM Disk Groups This command takes over ownership of all the disks in disk group dg_01, even though the disk currently has a different host ID written on it. The command writes the current node’s host ID on all disks in disk group dg_01 and sets the noautoimport flag for the disks. This flag prevents a disk group from being automatically re-imported by a node following a reboot.
Configuring Packages and Their Services Configuring Veritas System Multi-node Packages Configuring Veritas System Multi-node Packages There are two system multi-node packages that regulate Veritas Cluster Volume Manager (CVM) and Cluster File System (CFS). These packages ship with the Serviceguard product. There are two versions of the package files: VxVM-CVM-pkg for CVM Version 3.5, and SG-CFS-pkg for CFS/CVM Version 4.1 and later.
Configuring Packages and Their Services Configuring Veritas System Multi-node Packages For CVM, use the cmapplyconf command to add the system multi-node packages to your cluster. If you are using the Veritas Cluster File System, use the cfscluster command to activate and halt the system multi-node package in your cluster. NOTE Do not create or modify these packages by editing a configuration file. Never edit their control script files. The CFS admin commands are listed in Appendix A.
Configuring Packages and Their Services Configuring Veritas Multi-node Packages Configuring Veritas Multi-node Packages There are two types of multi-node packages that work with the Veritas Cluster File System (CFS): SG-CFS-DG-id# for disk groups, which you configure with the cfsdgadm command, and SG-CFS-MP-id# for mount points, which you configure with the cfsmntadm command. Each package name will have a unique number, appended by Serviceguard at creation.
Configuring Packages and Their Services Configuring Veritas Multi-node Packages the dependent application package loses access and cannot read and write to the disk, it will fail; however that will not cause the DG or MP multi-node package to fail. NOTE Do not create or edit ASCII configuration files for the Serviceguard supplied packages VxVM-CVM-pkg, SG-CFS-pkg, SG-CFS-DG-id#, or SG-CFS-MP-id#. Create VxVM-CVM-pkg and SG-CFS-pkg by means of the cmapplyconf command.
Cluster and Package Maintenance 7 Cluster and Package Maintenance This chapter describes how to see cluster configuration and status information, how to start and halt a cluster or an individual node, how to perform permanent reconfiguration, and how to start, halt, move, and modify packages during routine maintenance of the cluster.
Cluster and Package Maintenance Reviewing Cluster and Package Status Reviewing Cluster and Package Status You can check status using Serviceguard Manager or from a cluster node’s command line. Reviewing Cluster and Package Status with the cmviewcl Command Information about cluster status is stored in the status database, which is maintained on each individual node in the cluster.
Cluster and Package Maintenance Reviewing Cluster and Package Status Viewing Dependencies The cmviewcl -v command output lists dependencies throughout the cluster. For a specific package’s dependencies, use the -p option.
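For example, to see the dependencies of a package named pkg1 (the name is illustrative):
cmviewcl -v -p pkg1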
Cluster and Package Maintenance Reviewing Cluster and Package Status • Failed. A node never sees itself in this state. Other active members of the cluster will see a node in this state if that node was in an active cluster, but is no longer, and is not halted. • Reforming. A node is in this state when the cluster is re-forming. The node is currently running the protocols which ensure that all nodes agree to the new membership of an active cluster.
Cluster and Package Maintenance Reviewing Cluster and Package Status • fail_wait - The package is waiting to be halted because the package or a package it depends on has failed, but must wait for a package that depends on it to halt before it can halt. • relocate_wait - The package’s halt script has completed or Serviceguard is still trying to place the package. • unknown - Serviceguard could not determine the status at the time cmviewcl was run.
Cluster and Package Maintenance Reviewing Cluster and Package Status • unknown - Serviceguard could not determine the state at the time cmviewcl was run. Package Switching Attributes cmviewcl shows the following package switching information: • AUTO_RUN: Can be enabled or disabled. For failover packages, enabled means that the package starts when the cluster starts, and Serviceguard can switch the package to another node in the event of failure.
Cluster and Package Maintenance Reviewing Cluster and Package Status Failover and Failback Policies Failover packages can be configured with one of two values for the failover_policy parameter (see page 285), as displayed in the output of cmviewcl -v: • configured_node. The package fails over to the next node in the node_name list in the package configuration file (see page 282). • min_package_node. The package fails over to the node in the cluster that has the fewest running packages.
Cluster and Package Maintenance Reviewing Cluster and Package Status Script_Parameters: ITEM STATUS Service up Subnet up MAX_RESTARTS 0 0 Node_Switching_Parameters: NODE_TYPE STATUS SWITCHING Primary up enabled Alternate up enabled NODE ftsys10 STATUS up RESTARTS 0 0 NAME ftsys9 ftsys10 (current) STATE running Network_Parameters: INTERFACE STATUS PRIMARY up STANDBY up PATH 28.1 32.1 NAME lan0 lan1 PACKAGE pkg2 STATE running AUTO_RUN enabled STATUS up NAME service1 15.13.168.
Cluster and Package Maintenance Reviewing Cluster and Package Status ftsys9 up running Quorum Server Status: NAME STATUS lp-qs up ... NODE ftsys10 STATUS up STATE running STATE running Quorum Server Status: NAME STATUS lp-qs up STATE running CVM Package Status If the cluster is using the Veritas Cluster Volume Manager (CVM), version 3.5, for disk storage, the system multi-node package VxVM-CVM-pkg must be running on all active nodes for applications to be able to access CVM disk groups.
Cluster and Package Maintenance Reviewing Cluster and Package Status PACKAGE VxVM-CVM-pkg STATUS up STATE running AUTO_RUN enabled SYSTEM yes When you use the -v option, the display shows the system multi-node package associated with each active node in the cluster, as in the following: MULTI_NODE_PACKAGES: PACKAGE STATUS VxVM-CVM-pkg up STATE running AUTO_RUN enabled NODE ftsys7 STATUS down SWITCHING disabled NODE ftsys8 STATUS down SWITCHING disabled NODE STATUS ftsys9 up Script_Parameters
Cluster and Package Maintenance Reviewing Cluster and Package Status NOTE CFS is supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability - > Serviceguard).
Cluster and Package Maintenance Reviewing Cluster and Package Status CLUSTER example NODE ftsys9 STATUS up STATUS up STATE running Network_Parameters: INTERFACE STATUS PRIMARY up STANDBY up PATH 56/36.
Cluster and Package Maintenance Reviewing Cluster and Package Status Failover Failback configured_node manual Script_Parameters: ITEM STATUS Resource up Subnet up Resource down Subnet up NODE_NAME ftsys9 ftsys9 ftsys10 ftsys10 Node_Switching_Parameters: NODE_TYPE STATUS SWITCHING Primary up enabled Alternate up enabled NAME /example/float 15.13.168.0 /example/float 15.13.168.0 NAME ftsys10 ftsys9 pkg2 now has the status down, and it is shown as unowned, with package switching disabled.
Cluster and Package Maintenance Reviewing Cluster and Package Status Policy_Parameters: POLICY_NAME CONFIGURED_VALUE Failover configured_node Failback manual Script_Parameters: ITEM STATUS Service up Subnet up Resource up MAX_RESTARTS 0 Node_Switching_Parameters: NODE_TYPE STATUS SWITCHING Primary up enabled Alternate up enabled PACKAGE pkg2 STATUS up STATE running RESTARTS 0 NAME ftsys9 ftsys10 AUTO_RUN disabled NAME service1 15.13.168.
Cluster and Package Maintenance Reviewing Cluster and Package Status Now pkg2 is running on node ftsys9. Note that switching is still disabled.
Cluster and Package Maintenance Reviewing Cluster and Package Status Viewing Information about Unowned Packages The following example shows packages that are currently unowned, that is, not running on any configured node. cmviewcl provides information on monitored resources for each node on which the package can run; this allows you to identify the cause of a failure and decide where to start the package up again.
Cluster and Package Maintenance Reviewing Cluster and Package Status manx up PACKAGE pkg1 NODE tabby running STATUS up STATUS up PACKAGE pkg2 STATE running AUTO_RUN enabled NODE manx AUTO_RUN enabled NODE tabby STATE running STATUS up STATE running SYSTEM_MULTI_NODE_PACKAGES: PACKAGE VxVM-CVM-pkg STATUS up STATE running Checking Status of the Cluster File System (CFS) If the cluster is using the cluster file system, you can check status with the cfscluster command, as shown in the example b
Cluster and Package Maintenance Reviewing Cluster and Package Status Cluster Manager : up CVM state : up MOUNT POINT TYPE /var/opt/sgtest/ tmp/mnt/dev/vx/dsk/ vg_for_cvm1_dd5/lvol1 /var/opt/sgtest/ tmp/mnt/dev/vx/dsk/ vg_for_cvm1_dd5/lvol4 SHARED VOLUME DISK GROUP STATUS regular lvol1 vg_for_cvm_veggie_dd5 MOUNTED regular lvol4 vg_for_cvm_dd5 MOUNTED Status of the Packages with a Cluster File System Installed You can use cmviewcl to see the status of the package and the cluster file system on al
Cluster and Package Maintenance Reviewing Cluster and Package Status Status of CFS Disk Group Packages To see the status of the disk group, use the cfsdgadm display command. For example, for the diskgroup logdata, enter: cfsdgadm display -v logdata NODE NAME ACTIVATION MODE ftsys9 sw (sw) MOUNT POINT SHARED VOLUME TYPE ftsys10 sw (sw) MOUNT POINT SHARED VOLUME TYPE ... To see which package is monitoring a disk group, use the cfsdgadm show_package command.
Cluster and Package Maintenance Managing the Cluster and Nodes Managing the Cluster and Nodes Managing the cluster involves the following tasks: • Starting the Cluster When All Nodes are Down • Adding Previously Configured Nodes to a Running Cluster • Removing Nodes from Operation in a Running Cluster • Halting the Entire Cluster In Serviceguard A.11.
Cluster and Package Maintenance Managing the Cluster and Nodes Using Serviceguard Commands to Start the Cluster Use the cmruncl command to start the cluster when all cluster nodes are down. Particular command options can be used to start the cluster under specific circumstances.
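For example, the following commands start the cluster on all configured nodes, or on a subset of nodes, respectively (node names are illustrative):
cmruncl -v
cmruncl -v -n ftsys9 -n ftsys10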
Cluster and Package Maintenance Managing the Cluster and Nodes Using Serviceguard Commands to Add Previously Configured Nodes to a Running Cluster Use the cmrunnode command to join one or more nodes to an already running cluster. Any node you add must already be a part of the cluster configuration. The following example adds node ftsys8 to the cluster that was just started with only nodes ftsys9 and ftsys10.
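For this example, the command would be:
cmrunnode -v ftsys8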
Cluster and Package Maintenance Managing the Cluster and Nodes NOTE HP recommends that you remove a node from participation in the cluster (by running cmhaltnode as shown below, or Halt Node in Serviceguard Manager) before running the HP-UX shutdown command, especially in cases in which a packaged application might have trouble during shutdown and not halt cleanly. Use cmhaltnode to halt one or more nodes in a cluster.
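For example, to halt node ftsys9 and force any packages running on it to halt and fail over to an adoptive node (the node name is illustrative):
cmhaltnode -f -v ftsys9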
Cluster and Package Maintenance Managing the Cluster and Nodes Automatically Restarting the Cluster You can configure your cluster to automatically restart after an event, such as a long-term power failure, which brought down all nodes in the cluster. This is done by setting AUTOSTART_CMCLD to 1 in the /etc/rc.config.d/cmcluster file.
Cluster and Package Maintenance Managing Packages and Services Managing Packages and Services Managing packages and services involves the following tasks: • Starting a Package • Halting a Package • Moving a Package (halt, then start) • Changing Package Switching Behavior Non-root users can perform these tasks, as regulated by access policies in the cluster’s configuration files. See “Editing Security Files” on page 201 for more information about configuring access.
Cluster and Package Maintenance Managing Packages and Services You cannot start a package unless all the packages that it depends on are running. If you try, you’ll see a Serviceguard message telling you why the operation failed, and the package will not start. If this happens, you can repeat the run command, this time including the package(s) this package depends on; Serviceguard will start all the packages in the correct order.
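For example, to start a failover package on a specific node and then enable switching for it (package and node names are illustrative):
cmrunpkg -n ftsys9 pkg1
cmmodpkg -e pkg1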
Cluster and Package Maintenance Managing Packages and Services System multi-node packages run on all cluster nodes simultaneously; halting these packages stops them running on all nodes. A multi-node package can run on several nodes simultaneously; you can halt it on all the nodes it is running on, or you can specify individual nodes. Halting a Package that Has Dependencies Before halting a package, it is a good idea to use the cmviewcl command to check for package dependencies.
Cluster and Package Maintenance Managing Packages and Services Using Serviceguard Commands to Move a Running Failover Package Before you move a failover package to a new node, it is a good idea to run cmviewcl -v -l package and look at dependencies. If the package has dependencies, be sure they can be met on the new node. To move the package, first halt it where it is running using the cmhaltpkg command. This action not only halts the package, but also disables package switching.
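A typical sequence for moving a failover package is therefore to halt it, run it on the new node, and re-enable switching, for example (package and node names are illustrative):
cmhaltpkg pkg1
cmrunpkg -n ftsys10 pkg1
cmmodpkg -e pkg1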
Cluster and Package Maintenance Managing Packages and Services Changing Package Switching with Serviceguard Commands You can change package switching behavior either temporarily or permanently using Serviceguard commands. To temporarily disable switching to other nodes for a running package, use the cmmodpkg command.
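For example, to temporarily disable switching for a running package named pkg1 (the name is illustrative):
cmmodpkg -d pkg1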
Cluster and Package Maintenance Reconfiguring a Cluster Reconfiguring a Cluster You can reconfigure a cluster either when it is halted or while it is still running. Some operations can only be done when the cluster is halted. Table 7-1 shows the required cluster state for many kinds of changes. Table 7-1 Types of Changes to the Cluster Configuration Change to the Cluster Configuration Required Cluster State Add a new node All systems configured as members of this cluster must be running.
Cluster and Package Maintenance Reconfiguring a Cluster Table 7-1 Types of Changes to the Cluster Configuration (Continued) Change to the Cluster Configuration Delete NICs and their IP addresses, if any, from the cluster configuration Required Cluster State Cluster can be running. “Changing the Cluster Networking Configuration while the Cluster Is Running” on page 353. If removing the NIC from the system, see “Removing a LAN or VLAN Interface from a Node” on page 358.
Cluster and Package Maintenance Reconfiguring a Cluster Table 7-1 Types of Changes to the Cluster Configuration (Continued) Change to the Cluster Configuration Failover Optimization to enable or disable Faster Failover product NOTE Required Cluster State Cluster must not be running. If you are using CVM or CFS, you cannot change HEARTBEAT_INTERVAL, NODE_TIMEOUT, or AUTO_START_TIMEOUT while the cluster is running.
Cluster and Package Maintenance Reconfiguring a Cluster To update the values of the FIRST_CLUSTER_LOCK_PV and SECOND_CLUSTER_LOCK_PV parameters without bringing down the cluster, proceed as follows: Step 1. Halt the node (cmhaltnode) on which you want to make the changes. Step 2. In the cluster configuration file, modify the values of FIRST_CLUSTER_LOCK_PV and SECOND_CLUSTER_LOCK_PV for this node. Step 3. Run cmcheckconf to check the configuration. Step 4. Run cmapplyconf to apply the configuration. Step 5.
Cluster and Package Maintenance Reconfiguring a Cluster Step 3. Run cmcheckconf to check the configuration. Step 4. Run cmapplyconf to apply the configuration. For information about replacing the physical device, see “Replacing a Lock LUN” on page 395. Reconfiguring a Halted Cluster You can make a permanent change in the cluster configuration when the cluster is halted.
Cluster and Package Maintenance Reconfiguring a Cluster • You cannot delete an active volume group from the cluster configuration. You must halt any package that uses the volume group and ensure that the volume is inactive before deleting it. • The only configuration change allowed while a node is unreachable (for example, completely disconnected from the network) is to delete the unreachable node from the cluster configuration.
Cluster and Package Maintenance Reconfiguring a Cluster Use cmrunnode to start the new node, and, if you so decide, set the AUTOSTART_CMCLD parameter to 1 in the /etc/rc.config.d/cmcluster file to enable the new node to join the cluster automatically each time it reboots. NOTE Before you can add a node to a running cluster that uses Veritas CVM (on systems that support it), the node must already be connected to the disk devices for all CVM disk groups.
Cluster and Package Maintenance Reconfiguring a Cluster cmquerycl -C clconfig.ascii -c cluster1 -n ftsys8 -n ftsys9 Step 3. Edit the file clconfig.ascii to check the information about the nodes that remain in the cluster. Step 4. Halt the node you are going to remove (ftsys10 in this example): cmhaltnode -f -v ftsys10 Step 5. Verify the new configuration: cmcheckconf -C clconfig.ascii Step 6.
Cluster and Package Maintenance Reconfiguring a Cluster • Change the NETWORK_POLLING_INTERVAL. • Change the NETWORK_FAILURE_DETECTION parameter. • A combination of any of these in one transaction (cmapplyconf), given the restrictions below. What You Must Keep in Mind The following restrictions apply: • You must not change the configuration of all heartbeats at one time, or change or delete the only configured heartbeat. At least one working heartbeat, preferably with a standby, must remain unchanged.
Cluster and Package Maintenance Reconfiguring a Cluster See page 288 for more information about the package networking parameters. • You cannot change the IP configuration of an interface used by the cluster in a single transaction (cmapplyconf). You must first delete the NIC from the cluster configuration, then reconfigure the NIC (using ifconfig (1m), for example), then add the NIC back into the cluster.
Cluster and Package Maintenance Reconfiguring a Cluster #STATIONARY_IP NETWORK_INTERFACE 15.13.170.18 lan3 # Possible standby Network Interfaces for lan1, lan0: lan2. NODE_NAME NETWORK_INTERFACE HEARTBEAT_IP #NETWORK_INTERFACE # STATIONARY_IP NETWORK_INTERFACE ftsys10 lan1 192.3.17.19 lan0 15.13.170.19 lan3 # Possible standby Network Interfaces for lan0, lan1: lan2 Step 2.
Cluster and Package Maintenance Reconfiguring a Cluster Step 4. Apply the changes to the configuration and distribute the new binary configuration file to all cluster nodes: cmapplyconf -C clconfig.ascii If you were configuring the subnet for data instead, and wanted to add it to a package configuration, you would now need to: 1. Halt the package 2. Add the new networking information to the package configuration file 3.
Cluster and Package Maintenance Reconfiguring a Cluster NODE_NAME NETWORK_INTERFACE HEARTBEAT_IP ftsys9 lan1 192.3.17.18 # NETWORK_INTERFACE lan0 # STATIONARY_IP 15.13.170.18 # NETWORK_INTERFACE lan3 # Possible standby Network Interfaces for lan1, lan0: lan2. NODE_NAME NETWORK_INTERFACE HEARTBEAT_IP # NETWORK_INTERFACE # STATIONARY_IP # NETWORK_INTERFACE ftsys10 lan1 192.3.17.19 lan0 15.13.170.19 lan3 # Possible standby Network Interfaces for lan0, lan1: lan2 Step 4.
Cluster and Package Maintenance Reconfiguring a Cluster Step 1. If you are not sure whether or not a physical interface (NIC) is part of the cluster configuration, run olrad -C with the affected I/O slot ID as argument. If the NIC is part of the cluster configuration, you’ll see a warning message telling you to remove it from the configuration before you proceed. See the olrad(1M) manpage for more information about olrad. Step 2.
Cluster and Package Maintenance Reconfiguring a Cluster 1. Use the cmgetconf command to store a copy of the cluster's existing cluster configuration in a temporary file. For example: cmgetconf clconfig.ascii 2. Edit the file clconfig.ascii to add or delete volume groups. 3. Use the cmcheckconf command to verify the new configuration. 4. Use the cmapplyconf command to apply the changes to the configuration and distribute the new binary configuration file to all cluster nodes.
Cluster and Package Maintenance Reconfiguring a Cluster • For CVM 4.1 and later with CFS, edit the configuration file of the package that uses CFS. Configure the three dependency_ parameters. Then run the cmapplyconf command. Similarly, you can delete VxVM or CVM disk groups provided they are not being used by a cluster node at the time.
Cluster and Package Maintenance Reconfiguring a Cluster Use cmapplyconf to apply the changes to the configuration and send the new configuration file to all cluster nodes. Using -k or -K can significantly reduce the response time.
Cluster and Package Maintenance Configuring a Legacy Package Configuring a Legacy Package IMPORTANT You can still create a new legacy package. If you are using a Serviceguard Toolkit such as Serviceguard NFS Toolkit, consult the documentation for that product. Otherwise, use this section to maintain and re-work existing legacy packages rather than to create new ones.
Cluster and Package Maintenance Configuring a Legacy Package You can create a legacy package and its control script in Serviceguard Manager; use the Help for detailed instructions. Otherwise, use the following procedure to create a legacy package. NOTE For instructions on creating Veritas special-purpose system multi-node and multi-node packages, see “Configuring Veritas System Multi-node Packages” on page 313 and “Configuring Veritas Multi-node Packages” on page 315. Step 1.
Cluster and Package Maintenance Configuring a Legacy Package Configuring a Package in Stages It is a good idea to configure failover packages in stages, as follows: 1. Configure volume groups and mount points only. 2. Distribute the control script to all nodes. 3. Apply the configuration. 4. Run the package and ensure that it can be moved from node to node. 5. Halt the package. 6. Configure package IP addresses and application services in the control script. 7. Distribute the control script to all nodes. 8.
Cluster and Package Maintenance Configuring a Legacy Package For modular packages, the default form for parameter names in the package configuration file is lower case; for legacy packages the default is upper case. There are no compatibility issues; Serviceguard is case-insensitive as far as the parameter names are concerned.
Cluster and Package Maintenance Configuring a Legacy Package • STORAGE_GROUP. On systems that support Veritas Cluster Volume manager (CVM), specify the names of any CVM storage groups that will be used by this package. Enter each storage group (CVM disk group) on a separate line. Note that CVM storage groups are not entered in the cluster configuration file. You should not enter LVM volume groups or VxVM disk groups in this file.
Cluster and Package Maintenance Configuring a Legacy Package For legacy packages, DEFERRED resources must be specified in the package control script. NOTE • ACCESS_CONTROL_POLICY. You can grant a non-root user PACKAGE_ADMIN privileges for this package. See the entries for user_name, user_host, and user_role on page 297, and “Access Roles” on page 204, for more information. • If the package will depend on another package, enter values for DEPENDENCY_NAME, DEPENDENCY_CONDITION, and DEPENDENCY_LOCATION.
Cluster and Package Maintenance Configuring a Legacy Package edit the configuration or control script files for these packages, although Serviceguard does not forbid it. Create and modify the information using cfs admin commands only. Use cmmakepkg to create the control script, then edit the control script. Use the following procedure to create the template for the sample failover package pkg1. First, generate a control script template, for example: cmmakepkg -s /etc/cmcluster/pkg1/pkg1.
Cluster and Package Maintenance Configuring a Legacy Package Do not include CFS-based disk groups in the package control script; on systems that support CFS and CVM, they are activated by the CFS multi-node packages before standard packages are started. • If you are using mirrored VxVM disks, specify the mirror recovery option VXVOL. • Add the names of logical volumes and the file system that will be mounted on them.
Cluster and Package Maintenance Configuring a Legacy Package
# START OF CUSTOMER DEFINED FUNCTIONS
# This function is a place holder for customer defined functions.
# You should define all actions you want to happen here, before the service is
# started. You can create as many functions as you need.
function customer_defined_run_cmds
{
# ADD customer defined run commands.
: # do nothing instruction, because a function must contain some command.
date >> /tmp/pkg1.datelog
echo 'Starting pkg1' >> /tmp/pkg1.datelog
Cluster and Package Maintenance Configuring a Legacy Package To avoid this situation, it is a good idea to always specify a RUN_SCRIPT_TIMEOUT and a HALT_SCRIPT_TIMEOUT for all packages, especially packages that use Serviceguard commands in their control scripts. If a timeout is not specified and your configuration has a command loop as described above, inconsistent results can occur, including a hung cluster.
Cluster and Package Maintenance Configuring a Legacy Package • Configured resources are available on cluster nodes. • If a dependency is configured, the dependency package must already be configured in the cluster. Distributing the Configuration You can use Serviceguard Manager or HP-UX commands to distribute the binary cluster configuration file among the nodes of the cluster.
Cluster and Package Maintenance Configuring a Legacy Package cmcheckconf -C /etc/cmcluster/cmcl.config -P \ /etc/cmcluster/pkg1/pkg1.config • Activate the cluster lock volume group so that the lock disk can be initialized: vgchange -a y /dev/vg01 • Generate the binary configuration file and distribute it across the nodes. cmapplyconf -v -C /etc/cmcluster/cmcl.config -P \ /etc/cmcluster/pkg1/pkg1.config • If you are using a lock disk, deactivate the cluster lock volume group.
Cluster and Package Maintenance Reconfiguring a Package Reconfiguring a Package You reconfigure a package in much the same way as you originally configured it; for modular packages, see Chapter 6, “Configuring Packages and Their Services,” on page 271; for older packages, see “Configuring a Legacy Package” on page 363. The cluster can be either halted or running during package reconfiguration.
Cluster and Package Maintenance Reconfiguring a Package 3. Edit the package configuration file. IMPORTANT Restrictions on package names, dependency names, and service names have become more stringent as of A.11.18. Packages that have or contain names that do not conform to the new rules (spelled out under package_name on page 282) will continue to run, but if you reconfigure these packages, you will need to change the names that do not conform; cmcheckconf and cmapplyconf will enforce the new rules. 4.
Cluster and Package Maintenance Reconfiguring a Package If this is a legacy package, remember to copy the control script to the /etc/cmcluster/pkg1 directory on all nodes that can run the package. To create the CFS disk group or mount point multi-node packages on systems that support CFS, see “Creating the Disk Group Cluster Packages” on page 248 and “Creating a Filesystem and Mount Point Package” on page 249.
Cluster and Package Maintenance Reconfiguring a Package NOTE Any form of the mount command (for example, mount -o cluster, dbed_chkptmount, or sfrac_chkptmount) other than cfsmount or cfsumount in a HP Serviceguard Storage Management Suite environment with CFS should be done with caution. These non-CFS commands could cause conflicts with subsequent command operations on the file system or Serviceguard packages.
Cluster and Package Maintenance Reconfiguring a Package cmmodpkg -R -s myservice pkg1 The current value of the restart counter may be seen in the output of the cmviewcl -v command. Allowable Package States During Reconfiguration In many cases, you can make changes to a package’s configuration while the package is running. The table that follows shows exceptions - cases in which the package must not be running, or in which the results might not be what you expect.
Cluster and Package Maintenance Reconfiguring a Package Table 7-2 380 Types of Changes to Packages (Continued) Change to the Package Required Package State Change run script contents (legacy package) Package should not be running. Timing problems may occur if the script is changed while the package is running. Change halt script contents (legacy package) Package should not be running. Timing problems may occur if the script is changed while the package is running.
Cluster and Package Maintenance Reconfiguring a Package Table 7-2 Types of Changes to Packages (Continued) Change to the Package Remove a file system Package must not be running. Add, change, or delete modular external scripts and pre-scripts Package must not be running. Package auto_run Package can be either running or halted.
Cluster and Package Maintenance Responding to Cluster Events Responding to Cluster Events Serviceguard does not require much ongoing system administration intervention. As long as there are no failures, your cluster will be monitored and protected. In the event of a failure, those packages that you have designated to be transferred to another node will be transferred automatically.
Cluster and Package Maintenance Single-Node Operation Single-Node Operation In a multi-node cluster, you could have a situation in which all but one node has failed, or you have shut down all but one node, leaving your cluster in single-node operation. This remaining node will probably have applications running on it. As long as the Serviceguard daemon cmcld is active, other nodes can rejoin the cluster.
Cluster and Package Maintenance Removing Serviceguard from a System Removing Serviceguard from a System If you want to disable a node permanently from Serviceguard use, use the swremove command to delete the software. CAUTION Remove the node from the cluster first. If you run the swremove command on a server that is still a member of a cluster, it will cause that cluster to halt and the cluster configuration to be deleted. To remove Serviceguard: 1. If the node is an active member of a cluster, halt the node. 2.
Troubleshooting Your Cluster 8 Troubleshooting Your Cluster This chapter describes how to verify cluster operation, how to review cluster status, how to add and replace hardware, and how to solve some typical cluster problems.
Troubleshooting Your Cluster Testing Cluster Operation Testing Cluster Operation Once you have configured your Serviceguard cluster, you should verify that the various components of the cluster behave correctly in case of a failure. In this section, the following procedures test that the cluster responds properly in the event of a package failure, a node failure, or a LAN failure.
Troubleshooting Your Cluster Testing Cluster Operation 4. Move the package back to the primary node (see “Moving a Failover Package” on page 343). Testing the Cluster Manager To test that the cluster manager is operating correctly, perform the following steps for each node on the cluster: 1. Turn off the power to the node SPU. 2.
Troubleshooting Your Cluster Testing Cluster Operation 3. Verify that a local switch has taken place so that the Standby card is now the Primary card. In Serviceguard Manager, check the cluster properties. On the command line, use cmviewcl -v. 4. Reconnect the LAN to the original Primary card, and verify its status. In Serviceguard Manager, check the cluster properties. On the command line, use cmviewcl -v .
Troubleshooting Your Cluster Monitoring Hardware Monitoring Hardware Good standard practice in handling a high availability system includes careful fault monitoring so as to prevent failures if possible or at least to react to them swiftly when they occur.
Troubleshooting Your Cluster Monitoring Hardware action in case of a problem. For example, you could configure a disk monitor to report when a mirror was lost from a mirrored volume group being used in the cluster. Refer to the manual Using High Availability Monitors for additional information. Using EMS (Event Monitoring Service) Hardware Monitors A set of hardware monitors is available for monitoring and reporting on memory, CPU, and many other system values.
Troubleshooting Your Cluster Monitoring Hardware HP ISEE also reports error conditions directly to an HP Response Center, alerting support personnel to the potential problem. HP ISEE is available through various support contracts. For more information, contact your HP representative.
Troubleshooting Your Cluster Replacing Disks Replacing Disks The procedure for replacing a faulty disk mechanism depends on the type of disk configuration you are using. Separate descriptions are provided for replacing an array mechanism and a disk in a high availability enclosure. For more information, see the section Replacing a Bad Disk in the Logical Volume Management volume of the HP-UX System Administrator’s Guide, at http://docs.hp.
Troubleshooting Your Cluster Replacing Disks new device. See the section Replacing a Bad Disk in the Logical Volume Management volume of the HP-UX System Administrator’s Guide, posted at http://docs.hp.com. 2. Identify the names of any logical volumes that have extents defined on the failed physical volume. 3.
Troubleshooting Your Cluster Replacing Disks Replacing a Lock Disk You can replace an unusable lock disk while the cluster is running, provided you do not change the devicefile name (DSF).
Troubleshooting Your Cluster Replacing Disks NOTE If you restore or recreate the volume group for the lock disk and you need to re-create the cluster lock (for example if no vgcfgbackup is available), you can run cmdisklock to re-create the lock. See the cmdisklock (1m) manpage for more information. Replacing a Lock LUN You can replace an unusable lock LUN while the cluster is running, provided you do not change the devicefile name (DSF).
Troubleshooting Your Cluster Replacing Disks cmdisklock checks that the specified device is not in use by LVM, VxVM, ASM, or the file system, and will fail if the device has a label marking it as in use by any of those subsystems. cmdisklock -f overrides this check. CAUTION You are responsible for determining that the device is not being used by any subsystem on any node connected to the device before using cmdisklock -f. If you use cmdisklock -f without taking this precaution, you could lose data.
Troubleshooting Your Cluster Replacing I/O Cards Replacing I/O Cards Replacing SCSI Host Bus Adapters After a SCSI Host Bus Adapter (HBA) card failure, you can replace the card using the following steps. Normally disconnecting any portion of the SCSI bus will leave the SCSI bus in an unterminated state, which will cause I/O errors for other nodes connected to that SCSI bus, so the cluster would need to be halted before disconnecting any portion of the SCSI bus.
Troubleshooting Your Cluster Replacing LAN or Fibre Channel Cards Replacing LAN or Fibre Channel Cards If a LAN or fibre channel card fails and the card has to be replaced, you can replace it on-line or off-line depending on the type of hardware and operating system you are running. It is not necessary to bring the cluster down to do this. Off-Line Replacement Follow these steps to replace an I/O card off-line. 1. Halt the node by using the cmhaltnode command. 2.
Troubleshooting Your Cluster Replacing LAN or Fibre Channel Cards NOTE After replacing a Fibre Channel I/O card, it may be necessary to reconfigure the SAN to use the World Wide Name (WWN) of the new Fibre Channel card if Fabric Zoning or other SAN security requiring WWN is used.
Troubleshooting Your Cluster Replacing a Failed Quorum Server System Replacing a Failed Quorum Server System When a quorum server fails or becomes unavailable to the clusters it is providing quorum services for, this will not cause a failure on any cluster. However, the loss of the quorum server does increase the vulnerability of the clusters in case there is an additional failure. Use the following procedure to replace a defective quorum server system.
Troubleshooting Your Cluster Replacing a Failed Quorum Server System The command will output an error message if the specified nodes cannot communicate with the quorum server.
Troubleshooting Your Cluster Troubleshooting Approaches Troubleshooting Approaches The following sections offer a few suggestions for troubleshooting by reviewing the state of the running system and by examining cluster status data, log files, and configuration files.
Troubleshooting Your Cluster Troubleshooting Approaches
IPv6:
Name     Mtu    Address/Prefix   Ipkts   Opkts
lan1*    1500   none             0       0
lo0      4136   ::1/128          10690   10690
Reviewing the System Log File Messages from the Cluster Manager and Package Manager are written to the system log file. The default location of the log file is /var/adm/syslog/syslog.log. Also, package-related messages are logged into the package log file. The package log file is located in the package directory, by default.
Troubleshooting Your Cluster Troubleshooting Approaches Dec 14 14:34:44 star04 cmcld[2048]: Request from node star04 to start package pkg5 on node star04. Dec 14 14:34:44 star04 cmcld[2048]: Executing '/etc/cmcluster/pkg5/pkg5_run start' for package pkg5. Dec 14 14:34:45 star04 LVM[2066]: vgchange -a n /dev/vg02 Dec 14 14:34:45 star04 cmcld[2048]: Package pkg5 run script exited with NO_RESTART. Dec 14 14:34:45 star04 cmcld[2048]: Examine the file /etc/cmcluster/pkg5/pkg5_run.log for more details.
Troubleshooting Your Cluster Troubleshooting Approaches Reviewing the System Multi-node Package Files If you are running Veritas Cluster Volume Manager (supported on some versions of HP-UX), and you have problems starting the cluster, check the log file for the system multi-node package. For Cluster Volume Manager (CVM) 3.5, the file is VxVM-CVM-pkg.log. For CVM 4.1 and later, the file is SG-CFS-pkg.log.
Troubleshooting Your Cluster Troubleshooting Approaches cmquerycl -v -C /etc/cmcluster/verify.ascii -n ftsys9 -n ftsys10 cmcheckconf -v -C /etc/cmcluster/verify.ascii The cmcheckconf command checks: • The network addresses and connections. • The cluster lock disk connectivity. • The validity of configuration parameters of the cluster and packages for: — The uniqueness of names. — The existence and permission of scripts. It doesn’t check: • The correct setup of the power circuits.
Troubleshooting Your Cluster Troubleshooting Approaches
Table 8-1 Data Displayed by the cmscancl Command (Continued)
Description / Source of Data
file systems: mount command
LVM configuration: /etc/lvmtab file
LVM physical volume group data: /etc/lvmpvg file
link level connectivity for all links: linkloop command
binary configuration file: cmviewconf command
Using the cmviewconf Command cmviewconf allows you to examine the binary cluster configuration file, even when the cluster is not running.
Troubleshooting Your Cluster Troubleshooting Approaches • cmscancl can be used to verify that primary and standby LANs are on the same bridged net. • cmviewcl -v shows the status of primary and standby LANs. Use these commands on all nodes.
Troubleshooting Your Cluster Solving Problems Solving Problems Problems with Serviceguard may be of several types. The following is a list of common categories of problem: • Serviceguard Command Hangs. • Cluster Re-formations. • System Administration Errors. • Package Control Script Hangs. • Problems with VxVM Disk Groups. • Package Movement Errors. • Node and Network Failures. • Quorum Server Problems.
Troubleshooting Your Cluster Solving Problems Name: ftsys9.cup.hp.com Address: 15.13.172.229 If the output of this command does not include the correct IP address of the node, then check your name resolution services further. Cluster Re-formations Cluster re-formations may occur from time to time due to current cluster conditions. Some of the causes are as follows: • local switch on an Ethernet LAN if the switch takes longer than the cluster NODE_TIMEOUT value.
Troubleshooting Your Cluster Solving Problems
You can use the following commands to check the status of your disks (a sample invocation follows the list):
• bdf - to see if your package's volume group is mounted.
• vgdisplay -v - to see if all volumes are present.
• lvdisplay -v - to see if the mirrors are synchronized.
• strings /etc/lvmtab - to ensure that the configuration is correct.
• ioscan -fnC disk - to see physical disks.
• diskinfo -v /dev/rdsk/cxtydz - to display information about a disk.
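For example, to check one package volume group and one of its logical volumes, you might enter the following; the names vg01 and lvol1 are illustrations only, so substitute your own:
# vgdisplay -v /dev/vg01
# lvdisplay -v /dev/vg01/lvol1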
Troubleshooting Your Cluster Solving Problems NOTE In an HP Serviceguard Storage Management Suite environment with CFS, use any form of the mount command (for example, mount -o cluster, dbed_chkptmount, or sfrac_chkptmount) other than cfsmount or cfsumount with caution. These non-CFS commands could cause conflicts with subsequent command operations on the file system or Serviceguard packages.
Troubleshooting Your Cluster Solving Problems
Next, deactivate the package volume groups. These are specified by the VG[] array entries in the package control script:
vgchange -a n
4. Finally, re-enable the package for switching.
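For example, assuming a package named pkg1 (the name is illustrative), the last step can be done as follows:
# cmmodpkg -e pkg1
This re-enables package switching, which was disabled when the package was halted.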
Troubleshooting Your Cluster Solving Problems 3. v w - cvm 4. f - cfs In an HP Serviceguard Storage Management Suite environment with CFS, use any form of the mount command (for example, mount -o cluster, dbed_chkptmount, or sfrac_chkptmount) other than cfsmount or cfsumount with caution. These non-CFS commands could cause conflicts with subsequent command operations on the file system or Serviceguard packages.
Troubleshooting Your Cluster Solving Problems When the package starts up on another node in the cluster, a series of messages is printed in the package log file. Follow the instructions in the messages to use the force import option (-C) to allow the current node to import the disk group. Then deport the disk group, after which it can be used again by the package.
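The sequence typically looks something like the following sketch; the disk group name dg_01 is only an illustration, and you should use the exact command given in the package log file messages:
# vxdg -tfC import dg_01
# vxdg deport dg_01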
Troubleshooting Your Cluster Solving Problems • HPMC. This is a High Priority Machine Check, a system panic caused by a hardware error. • TOC • Panics • Hangs • Power failures In the event of a TOC, a system dump is performed on the failed node and numerous messages are also displayed on the console. You can use the following commands to check the status of your network and subnets: • netstat -in - to display LAN status and check to see if the package IP is stacked on the LAN card.
Troubleshooting Your Cluster Solving Problems
Unable to set client version at quorum server 192.6.7.2:reply timed out
Probe of quorum server 192.6.7.2 timed out
These messages could be an indication of an intermittent network, or the default quorum server timeout may not be sufficient. You can set QS_TIMEOUT_EXTENSION to increase the timeout, or you can increase the heartbeat or node timeout value.
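For example, the quorum server entries in the cluster configuration file might look like the following sketch; the host name and the extension value (in microseconds) are illustrative assumptions, and the change must be re-applied with cmapplyconf:
QS_HOST qshost.example.com
QS_TIMEOUT_EXTENSION 2000000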
Serviceguard Commands A Serviceguard Commands The following is an alphabetical list of commands used for Serviceguard cluster configuration and maintenance. Manpages for these commands are available on your system after installation. NOTE Veritas Cluster Volume Manager (CVM) and Cluster File System (CFS) are supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability -> Serviceguard).
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cfsdgadm Description • Display the status of CFS disk groups. • Add shared disk groups to a Veritas Cluster File System CFS cluster configuration, or remove existing CFS disk groups from the configuration. Serviceguard automatically creates the multi-node package SG-CFS-DG-id# to regulate the disk groups. This package has a dependency on the SG-CFS-pkg created by cfscluster command.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmapplyconf Description Verify and apply Serviceguard cluster configuration and package configuration files. cmapplyconf verifies the cluster configuration and package configuration specified in the cluster_ascii_file and the associated pkg_ascii_file(s), creates or updates the binary configuration file, called cmclconfig, and distributes it to all nodes.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmapplyconf (continued) Description Run cmgetconf to get either the cluster configuration file or package configuration file whenever changes to the existing configuration are required. Note that cmapplyconf will verify and distribute cluster configuration or package files. It will not cause the cluster daemon to start or be removed from the cluster configuration.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmdeleteconf Description Delete either the cluster or the package configuration. cmdeleteconf deletes either the entire cluster configuration, including all its packages, or only the specified package configuration. If neither cluster_name nor package_name is specified, cmdeleteconf will delete the local cluster’s configuration and all its packages.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmhaltcl Description Halt a high availability cluster. cmhaltcl causes all nodes in a configured cluster to stop their cluster daemons, optionally halting all packages or applications in the process. This command will halt all the daemons on all currently running systems. If you want to halt the daemons on only a subset of nodes, use the cmhaltnode command instead. cmhaltnode Halt a node in a high availability cluster.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmhaltserv Description Halt a service from the high availability package halt script. This is not a command-line executable command; it runs only from within the package control script. cmhaltserv is used in the high availability package halt script to halt a service. If any part of the package is marked down, the package halt script is executed as part of the recovery process.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmmodnet Description Add or remove an address from a high availability cluster. cmmodnet is used to add or remove a relocatable package IP_address for the current network interface running the given subnet_name. cmmodnet can also be used to enable or disable a LAN_name currently configured in a cluster.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmquerycl Description Query cluster or node configuration information. cmquerycl searches all specified nodes for cluster configuration and Logical Volume Manager (LVM) information. Cluster configuration information includes network information such as LAN interface, IP addresses, bridged networks and possible heartbeat networks. LVM information includes volume group (VG) interconnection and file system mount point information.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmruncl Description Run a high availability cluster. cmruncl causes all nodes in a configured cluster or all nodes specified to start their cluster daemons and form a new cluster. This command should be run only when the cluster is not active on any of the configured nodes.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmrunserv Description Run a service from the high availability package run script. This is not a command-line executable command; it runs only from within the package control script. cmrunserv is used in the high availability package run script to run a service. If the service process dies, cmrunserv updates the status of the service to down.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmscancl Description Gather system configuration information from nodes with Serviceguard installed. cmscancl is a configuration report and diagnostic tool which gathers system software and hardware configuration information from a list of nodes, or from all the nodes in a cluster.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmstartres Description This command is run by package control scripts, and not by users! Starts resource monitoring on the local node for an EMS resource that is configured in a Serviceguard package. cmstartres starts resource monitoring for an EMS resource on the local node. This resource must be configured in the specified package_name.
Enterprise Cluster Master Toolkit B Enterprise Cluster Master Toolkit The Enterprise Cluster Master Toolkit (ECMT) provides a group of example scripts and package configuration files for creating Serviceguard packages for several major database and internet software products. Each toolkit contains a README file that explains how to customize the package for your needs. The ECMT can be installed on HP-UX 11i v1, 11i v2, or 11i v3.
Designing Highly Available Cluster Applications
C Designing Highly Available Cluster Applications
This appendix describes how to create or port applications for high availability, with emphasis on the following topics:
• Automating Application Operation
• Controlling the Speed of Application Failover
• Designing Applications to Run on Multiple Systems
• Restoring Client Connections
• Handling Application Failures
• Minimizing Planned Downtime
Designing for high availability means reducing the amount of planned and unplanned downtime that users will experience.
Designing Highly Available Cluster Applications Automating Application Operation Automating Application Operation Can the application be started and stopped automatically or does it require operator intervention? This section describes how to automate application operations to avoid the need for user intervention. One of the first rules of high availability is to avoid manual intervention.
Designing Highly Available Cluster Applications Automating Application Operation Define Application Startup and Shutdown Applications must be restartable without manual intervention. If the application requires a switch to be flipped on a piece of hardware, then automated restart is impossible. Procedures for application startup, shutdown and monitoring must be created so that the HA software can perform these functions automatically.
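As a minimal sketch of what such automated procedures can look like, the following wrapper script accepts start, stop, and monitor arguments; the application name, paths, and process check are illustrative assumptions, not part of Serviceguard itself:
#!/usr/bin/sh
# Illustrative application wrapper; the /opt/myapp paths and the
# process name "myapp_server" are assumptions for this example.
case "$1" in
start)
    /opt/myapp/bin/myapp_server -c /etc/opt/myapp/myapp.conf &
    ;;
stop)
    /opt/myapp/bin/myapp_shutdown
    ;;
monitor)
    # Exit non-zero if the application process is not running.
    ps -ef | grep myapp_server | grep -v grep > /dev/null
    ;;
*)
    echo "usage: $0 {start|stop|monitor}"
    exit 1
    ;;
esac
A script of this kind can then be invoked by the package control script or by a service monitor, so that no operator has to intervene when the application must be started, stopped, or checked.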
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Controlling the Speed of Application Failover What steps can be taken to ensure the fastest failover? If a failure does occur causing the application to be moved (failed over) to another node, there are many things the application can do to reduce the amount of time it takes to get the application back up and running.
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Use Raw Volumes If your application uses data, use raw volumes rather than filesystems. Raw volumes do not require an fsck of the filesystem, thus eliminating one of the potentially lengthy steps during a failover. Evaluate the Use of JFS If a file system must be used, a JFS offers significantly faster file system recovery as compared to an HFS. However, performance of the JFS may vary with the application.
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Keep Logs Small Some databases permit logs to be buffered in memory to increase online performance. Of course, when a failure occurs, any in-flight transaction will be lost. However, minimizing the size of this in-memory log will reduce the amount of completed transaction data that would be lost in case of failure.
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Another example is an application where a clerk is entering data about a new employee. Suppose this application requires that employee numbers be unique, and that after the name and number of the new employee is entered, a failure occurs.
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Design for Multiple Servers If you use multiple active servers, multiple service points can provide relatively transparent service to a client. However, this capability requires that the client be smart enough to have knowledge about the multiple servers and the priority for addressing them. It also requires access to the data of the failed server or replicated data.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems Designing Applications to Run on Multiple Systems If an application can be failed to a backup node, how will it work on that different system? The previous sections discussed methods to ensure that an application can be automatically restarted. This section will discuss some ways to ensure the application can run on multiple systems.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems Each application or package should be given a unique name as well as a relocatable IP address. Following this rule separates the application from the system on which it runs, thus removing the need for user knowledge of which system the application runs on. It also makes it easier to move the application among different systems in a cluster for load balancing or other reasons.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems Avoid Using SPU IDs or MAC Addresses Design the application so that it does not rely on the SPU ID or MAC (link-level) addresses. The SPU ID is a unique hardware ID contained in non-volatile memory, which cannot be changed. A MAC address (also known as a LANIC id) is a link-specific address associated with the LAN hardware.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems Applications should not reference official hostnames or IP addresses. The official hostname and corresponding IP address for the hostname refer to the primary LAN card and the stationary IP address for that card.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems develop alternate means of verifying where they are running. For example, an application might check a list of hostnames that have been provided in a configuration file. Bind to a Fixed Port When binding a socket, a port address can be specified or one can be assigned dynamically.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems For TCP stream sockets, the TCP level of the protocol stack resolves this problem for the client since it is a connection-based protocol. On the client, TCP ignores the stationary IP address and continues to use the previously bound relocatable IP address originally used by the client. With UDP datagram sockets, however, there is a problem.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems applications must move together. If the applications’ data stores are in separate volume groups, they can switch to different nodes in the event of a failover. The application data should be set up on different disk drives and if applicable, different mount points. The application should be designed to allow for different disks and separate mount points.
Designing Highly Available Cluster Applications Restoring Client Connections Restoring Client Connections How does a client reconnect to the server after a failure? It is important to write client applications to specifically differentiate between the loss of a connection to the server and other application-oriented errors that might be returned. The application should take special action in case of connection loss.
Designing Highly Available Cluster Applications Restoring Client Connections the retry to the current server should continue for the amount of time it takes to restart the server locally (a simple retry loop is sketched below). This will keep the client from having to switch to the second server in the event of an application failure.
• Use a transaction processing monitor or message queueing software to increase robustness.
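The retry behavior mentioned above can be as simple as the following sketch; the client command, retry count, and sleep interval are assumptions for illustration only:
#!/usr/bin/sh
# Retry the connection a fixed number of times before giving up.
RETRIES=12
i=0
while [ $i -lt $RETRIES ]
do
    if /opt/myapp/bin/myapp_client -connect appserver
    then
        exit 0          # connected successfully
    fi
    sleep 5             # wait before the next attempt
    i=`expr $i + 1`
done
exit 1                  # could not reconnect within the retry window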
Designing Highly Available Cluster Applications Handling Application Failures Handling Application Failures What happens if part or all of an application fails? All of the preceding sections have assumed the failure in question was not a failure of the application, but of another component of the cluster. This section deals specifically with application problems.
Designing Highly Available Cluster Applications Handling Application Failures ensure that the application is behaving correctly. If the application fails and it is not detected automatically, it might take hours for a user to determine the cause of the downtime and recover from it.
Designing Highly Available Cluster Applications Minimizing Planned Downtime Minimizing Planned Downtime Planned downtime (as opposed to unplanned downtime) is scheduled; examples include backups, system upgrades to new operating system revisions, or hardware replacements. For planned downtime, application designers should consider: • Reducing the time needed for application upgrades/patches.
Designing Highly Available Cluster Applications Minimizing Planned Downtime Provide for Rolling Upgrades Provide for a “rolling upgrade” in a client/server environment. For a system with many components, the typical scenario is to bring down the entire system, upgrade every node to the new version of the software, and then restart the application on all the affected nodes. For large systems, this could result in a long downtime. An alternative is to provide for a rolling upgrade.
Designing Highly Available Cluster Applications Minimizing Planned Downtime Providing Online Application Reconfiguration Most applications have some sort of configuration information that is read when the application is started. If the application must be halted and a new configuration file read in order to make a configuration change, downtime is incurred. To avoid this downtime, use configuration tools that interact with an application and make dynamic changes online.
Integrating HA Applications with Serviceguard D Integrating HA Applications with Serviceguard The following is a summary of the steps you should follow to integrate an application into the Serviceguard environment: 1. Read the rest of this book, including the chapters on cluster and package configuration, and the Appendix “Designing Highly Available Cluster Applications.” 2.
Integrating HA Applications with Serviceguard NOTE
• Can the application be installed cluster-wide?
• Does the application work with a cluster-wide file name space?
• Will the application run correctly with the data (file system) available on all nodes in the cluster? This includes being available on cluster nodes where the application is not currently running.
Integrating HA Applications with Serviceguard Checklist for Integrating HA Applications Checklist for Integrating HA Applications This section contains a checklist for integrating HA applications in both single and multiple systems. Defining Baseline Application Behavior on a Single System Define a baseline behavior for the application on a standalone system: 1. Install the application, database, and other required resources on one of the systems.
Integrating HA Applications with Serviceguard Checklist for Integrating HA Applications c. Install the appropriate executables. d. With the application not running on the first system, try to bring it up on the second system. You might use the script you created in the step above. Is there anything different that you must do? Does it run? e. Repeat this process until you can get the application to run on the second system. 2. Configure the Serviceguard cluster: a. Create the cluster configuration. b.
Integrating HA Applications with Serviceguard Checklist for Integrating HA Applications NOTE Appendix D CVM and CFS are supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability - > Serviceguard).
Integrating HA Applications with Serviceguard Checklist for Integrating HA Applications Testing the Cluster 1. Test the cluster: • Have clients connect. • Provide a normal system load. • Halt the package on the first node and move it to the second node: # cmhaltpkg pkg1 # cmrunpkg -n node2 pkg1 # cmmodpkg -e pkg1 • Move it back. # cmhaltpkg pkg1 # cmrunpkg -n node1 pkg1 # cmmodpkg -e pkg1 • Fail one of the systems. For example, turn off the power on node 1.
Software Upgrades E Software Upgrades There are three types of upgrade you can do: • rolling upgrade • non-rolling upgrade • migration with cold install Each of these is discussed below.
Software Upgrades Types of Upgrade Types of Upgrade Rolling Upgrade In a rolling upgrade, you upgrade the HP-UX operating system (if necessary) and the Serviceguard software one node at a time without bringing down your cluster. A rolling upgrade can also be done any time one system needs to be taken offline for hardware maintenance or patch installations. This method is the least disruptive, but your cluster must meet both general and release-specific requirements.
Software Upgrades Guidelines for Rolling Upgrade Guidelines for Rolling Upgrade You can normally do a rolling upgrade if: • You are not upgrading the nodes to a new version of HP-UX; or • You are upgrading to a new version of HP-UX, but using the update process (update-ux), rather than a cold install. update-ux supports many, but not all, upgrade paths. For more information, see the HP-UX Installation and Update Guide for the target version of HP-UX.
Software Upgrades Performing a Rolling Upgrade Performing a Rolling Upgrade Limitations of Rolling Upgrades The following limitations apply to rolling upgrades: • During a rolling upgrade, you should issue Serviceguard commands (other than cmrunnode and cmhaltnode) only on a node containing the latest revision of the software. Performing tasks on a node containing an earlier revision of the software will not work or will cause inconsistent results.
Software Upgrades Performing a Rolling Upgrade • Rolling upgrades are not intended as a means of using mixed releases of Serviceguard or HP-UX within the cluster. HP strongly recommends that you upgrade all cluster nodes as quickly as possible to the new release level. • You cannot delete Serviceguard software (via swremove) from a node while a rolling upgrade is in progress.
Software Upgrades Performing a Rolling Upgrade If the cluster fails before the rolling upgrade is complete (because of a catastrophic power failure, for example), you can restart the cluster by entering the cmruncl command from a node which has been upgraded to the latest version of the software. Keeping Kernels Consistent If you change kernel parameters as a part of doing an upgrade, be sure to change the parameters to the same values on all nodes that can run the same packages in case of failover.
Software Upgrades Example of a Rolling Upgrade Example of a Rolling Upgrade NOTE Warning messages may appear during a rolling upgrade while the node is determining what version of software is running. This is a normal occurrence and not a cause for concern. The following example shows a simple rolling upgrade on two nodes running one package each, as shown in Figure E-1.
Software Upgrades Example of a Rolling Upgrade This will cause pkg1 to be halted cleanly and moved to node 2. The Serviceguard daemon on node 1 is halted, and the result is shown in Figure E-2. Figure E-2 Running Cluster with Packages Moved to Node 2 Step 2. Upgrade node 1 to the next operating system release (“HP-UX (new)”), and install the next version of Serviceguard (“SG (new)”).
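A sketch of the installation step, assuming the new Serviceguard software has been copied to a local depot; the depot path and the software selection name are placeholders, not values from this manual:
# swinstall -s /var/tmp/sg_depot Serviceguard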
Software Upgrades Example of a Rolling Upgrade Step 3. When upgrading is finished, enter the following command on node 1 to restart the cluster on node 1. # cmrunnode -n node1 At this point, different versions of the Serviceguard daemon (cmcld) are running on the two nodes, as shown in Figure E-4. Figure E-4 Node 1 Rejoining the Cluster Step 4. Repeat the process on node 2. Halt the node, as follows: # cmhaltnode -f node2 This causes both packages to move to node 1.
Software Upgrades Example of a Rolling Upgrade Figure E-5 Running Cluster with Packages Moved to Node 1 Step 5. Move pkg2 back to its original node. Use the following commands: # cmhaltpkg pkg2 # cmrunpkg -n node2 pkg2 # cmmodpkg -e pkg2 The cmmodpkg command re-enables switching of the package, which was disabled by the cmhaltpkg command. The final running cluster is shown in Figure E-6.
Software Upgrades Example of a Rolling Upgrade Figure E-6 Running Cluster After Upgrades
Software Upgrades Guidelines for Non-Rolling Upgrade
Guidelines for Non-Rolling Upgrade
Do a non-rolling upgrade if:
• Your cluster does not meet the requirements for rolling upgrade as specified in the Release Notes for the target version of Serviceguard; or
• The limitations imposed by rolling upgrades make it impractical for you to do a rolling upgrade (see “Limitations of Rolling Upgrades” on page 466); or
• For some other reason you need or prefer to bring the cluster down before performing the upgrade.
Software Upgrades Performing a Non-Rolling Upgrade Performing a Non-Rolling Upgrade Limitations of Non-Rolling Upgrades The following limitations apply to non-rolling upgrades: • Binary configuration files may be incompatible between releases of Serviceguard. Do not manually copy configuration files between nodes. • You must halt the entire cluster before performing a non-rolling upgrade. Steps for Non-Rolling Upgrades Use the following steps for a non-rolling software upgrade: Step 1.
Software Upgrades Guidelines for Migrating a Cluster with Cold Install Guidelines for Migrating a Cluster with Cold Install There may be circumstances when you prefer to do a cold install of the HP-UX operating system rather than an upgrade. A cold install erases the existing operating system and data and then installs the new operating system and software; you must then restore the data. CAUTION The cold install process erases the existing software, operating system, and data.
Software Upgrades Guidelines for Migrating a Cluster with Cold Install See “Creating the Storage Infrastructure and Filesystems with LVM and VxVM” on page 221 for more information. 2. Halt the cluster applications, and then halt the cluster. 3. Do a cold install of the HP-UX operating system. For more information on the cold install process, see the HP-UX Installation and Update Guide for the target version of HP-UX: go to http://docs.hp.com.
Blank Planning Worksheets F Blank Planning Worksheets This appendix reprints blank versions of the planning worksheets described in the chapter “Planning and Documenting an HA Cluster.” You can duplicate any of these worksheets that you find useful and fill them in as a part of the planning process.
Blank Planning Worksheets Worksheet for Hardware Planning Worksheet for Hardware Planning HARDWARE WORKSHEET Page ___ of ____ =============================================================================== Node Information: Host Name _____________________ Series No _____________________ Memory Capacity ____________________ Number of I/O Slots ________________ =============================================================================== LAN Information: Name of Subnet _________ Name of IP Interface
Blank Planning Worksheets Power Supply Worksheet Power Supply Worksheet POWER SUPPLY WORKSHEET Page ___ of ____ =============================================================================== SPU Power: Host Name _____________________ Power Supply _______________________ Host Name _____________________ Power Supply _______________________ =============================================================================== Disk Power: Disk Unit __________________________ Power Supply _____________________
Blank Planning Worksheets Quorum Server Worksheet Quorum Server Worksheet Quorum Server Data: ============================================================================== QS Hostname: _________________IP Address: ______________________ ============================================================================== Quorum Services are Provided for: Cluster Name: ___________________________________________________________ Host Names ____________________________________________ Host Names _________________
Blank Planning Worksheets LVM Volume Group and Physical Volume Worksheet LVM Volume Group and Physical Volume Worksheet PHYSICAL VOLUME WORKSHEET Page ___ of ____ =============================================================================== Volume Group Name: ______________________________________________________ Physical Volume Name:_____________________________________________________ Physical Volume Name:_____________________________________________________ Physical Volume Name:_____________________
Blank Planning Worksheets LVM Volume Group and Physical Volume Worksheet Physical Volume Name: _____________________________________________________ 484 Appendix F
Blank Planning Worksheets VxVM Disk Group and Disk Worksheet VxVM Disk Group and Disk Worksheet DISK GROUP WORKSHEET Page ___ of ____ =========================================================================== Disk Group Name: __________________________________________________________ Physical Volume Name:______________________________________________________ Physical Volume Name:______________________________________________________ Physical Volume Name:____________________________________________________
Blank Planning Worksheets Cluster Configuration Worksheet Cluster Configuration Worksheet =============================================================================== Name and Nodes: =============================================================================== Cluster Name: __________________________ RAC Version: _______________ Node Names: _________________________________________________________ Volume Groups (for packages):________________________________________ ===================================
Blank Planning Worksheets Cluster Configuration Worksheet Autostart Delay: ___________ =============================================================================== Access Policies: User name: Host node: Role: =============================================================================== Appendix F 487
Blank Planning Worksheets Package Configuration Worksheet Package Configuration Worksheet Package Configuration File Data: ========================================================================== Package Name: __________________Package Type:______________ Primary Node: ____________________ First Failover Node:__________________ Additional Failover Nodes:__________________________________ Run Script Timeout: _____ Halt Script Timeout: _____________ Package AutoRun Enabled? ______ Local LAN Failover Allow
Blank Planning Worksheets Package Configuration Worksheet CVM Disk Groups [ignore CVM items if CVM is not being used]: cvm_vg___________cvm_dg_____________cvm_vg_______________ cvm_activation_cmd: ______________________________________________ VxVM Disk Groups: vxvm_dg_________vxvm_dg____________vxvm_dg_____________ vxvol_cmd ______________________________________________________ ________________________________________________________________________________ Logical Volumes and File Systems: fs_name_____
Blank Planning Worksheets Package Configuration Worksheet Package environment variable:________________________________________________ Package environment variable:________________________________________________ External pre-script:_________________________________________________________ External script:_____________________________________________________________ ================================================================================ NOTE 490 CVM (and CFS - Cluster File System) are supported
Blank Planning Worksheets Package Control Script Worksheet Package Control Script Worksheet =============================================================================== Name and Nodes: =============================================================================== Cluster Name: __________________________ RAC Version: _______________ Node Names: _________________________________________________________ Volume Groups (for packages):________________________________________ ===============================
Blank Planning Worksheets Package Control Script Worksheet First Lock Volume Group: | Physical Volume: | ________________ | Name on Node 1: ___________________ | Name on Node 2: ___________________ | Disk Unit No: _________ | Power Supply No: _________ =============================================================================== Cluster Lock LUN: Pathname on Node 1: ___________________ Pathname on Node 2: ___________________ Pathname on Node 3: ___________________ Pathname on Node 4: _________
Blank Planning Worksheets Package Control Script Worksheet NOTE Appendix F CVM (and CFS - Cluster File System) are supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability - > Serviceguard).
Migrating from LVM to VxVM Data Storage G Migrating from LVM to VxVM Data Storage This appendix describes how to migrate LVM volume groups to VxVM disk groups for use with the Veritas Volume Manager (VxVM), or with the Cluster Volume Manager (CVM) on systems that support it.
Migrating from LVM to VxVM Data Storage Loading VxVM Loading VxVM Before you can begin migrating data, you must install the Veritas Volume Manager software and all required VxVM licenses on all cluster nodes. This step requires each system to be rebooted, so it requires you to remove the node from the cluster before the installation, and restart the node after installation. This can be done as a part of a rolling upgrade procedure, described in Appendix E.
Migrating from LVM to VxVM Data Storage Migrating Volume Groups Migrating Volume Groups The following procedure shows how to do the migration of individual volume groups for packages that are configured to run on a given node. You should convert all the volume groups for a package at the same time. It is assumed that VxVM software and an appropriate version of HP-UX and Serviceguard have been installed on the node, and that the node has rebooted and rejoined the cluster.
Migrating from LVM to VxVM Data Storage Migrating Volume Groups utility is described along with its limitations and cautions in the Veritas Volume Manager Migration Guide for your version, available from http://www.docs.hp.com. If using the vxconvert(1M) utility, then skip the next step and go ahead to the following section. NOTE Remember that the cluster lock disk, if used, must be configured on an LVM volume group and physical volume.
Migrating from LVM to VxVM Data Storage Customizing Packages for VxVM Customizing Packages for VxVM After creating the VxVM disk group, you need to customize the Serviceguard package that will access the storage. Use the following procedure for a legacy package that will be used with Veritas Volume Manager (VxVM) disk groups. If you are using the Cluster Volume Manager (CVM), skip ahead to the next section.
Migrating from LVM to VxVM Data Storage Customizing Packages for VxVM
LV[0]="/dev/vx/dsk/dg01/lvol101"
LV[1]="/dev/vx/dsk/dg01/lvol102"
LV[2]="/dev/vx/dsk/dg02/lvol201"
LV[3]="/dev/vx/dsk/dg02/lvol202"
FS[0]="/mnt_dg0101"
FS[1]="/mnt_dg0102"
FS[2]="/mnt_dg0201"
FS[3]="/mnt_dg0202"
FS_MOUNT_OPT[0]="-o ro"
FS_MOUNT_OPT[1]="-o rw"
FS_MOUNT_OPT[2]="-o ro"
FS_MOUNT_OPT[3]="-o rw"
4.
Migrating from LVM to VxVM Data Storage Customizing Packages for CVM Customizing Packages for CVM NOTE CVM (and CFS - Cluster File System) are supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability -> Serviceguard). After creating the CVM disk group, you need to customize the Serviceguard package that will access the storage.
Migrating from LVM to VxVM Data Storage Customizing Packages for CVM • The LV[], FS[] and FS_MOUNT_OPT[] arrays are used the same as they are for LVM. LV[] defines the logical volumes, FS[] defines the mount points, and FS_MOUNT_OPT[] defines any mount options. For example, let's say we have two volumes defined in each of the two disk groups from above, lvol101 and lvol102, and lvol201 and lvol202. These are mounted on /mnt_dg0101 and /mnt_dg0102, and /mnt_dg0201 and /mnt_dg0202, respectively.
Migrating from LVM to VxVM Data Storage Customizing Packages for CVM
Then re-apply the package configuration:
# cmapplyconf -P PackageName.ascii
8. Test to make sure the disk group and data are intact.
9. Deport the disk group:
# vxdg deport DiskGroupName
10. Start the cluster, if it is not already running:
# cmruncl
This will activate the special CVM package.
11. When CVM starts up, it selects a master node, and this is the node from which you must issue the disk group configuration commands.
Migrating from LVM to VxVM Data Storage Removing LVM Volume Groups Removing LVM Volume Groups After testing the new VxVM disk groups, remove any LVM volume groups that are no longer wanted from the system using the standard LVM commands lvremove, pvremove, and vgremove. At a convenient time, you should also edit the cluster ASCII configuration file to remove the VOLUME_GROUP statements that refer to the LVM volume groups that are no longer used in the cluster.
IPv6 Network Support H IPv6 Network Support This appendix describes some of the characteristics of IPv6 network addresses.
IPv6 Network Support IPv6 Address Types IPv6 Address Types Several types of IPv6 addressing schemes are specified in RFC 2373 (IPv6 Addressing Architecture). IPv6 addresses are 128-bit identifiers for interfaces and sets of interfaces. There are various address formats for IPv6 defined by RFC 2373. IPv6 addresses are broadly classified into three types, which the following table explains: unicast, anycast, and multicast.
IPv6 Network Support IPv6 Address Types multiple groups of 16 bits of zeros. The “::” can appear only once in an address, and it can be used to compress the leading, trailing, or contiguous sixteen-bit zeroes in an address. Example: fec0:1:0:0:0:0:0:1234 can be represented as fec0:1::1234. • When dealing with a mixed environment of IPv4 and IPv6 nodes, there is an alternative form of IPv6 address that will be used. It is x:x:x:x:x:x:d.d.d.d, where the x values are the hexadecimal values of the six high-order 16-bit pieces of the address and the d values are the decimal values of the four low-order 8-bit pieces (standard IPv4 representation).
IPv6 Network Support IPv6 Address Types
Unicast Addresses
IPv6 unicast addresses are classified into different types: global aggregatable unicast addresses, site-local addresses, and link-local addresses. Typically a unicast address is logically divided as follows:
Table H-2
n bits          128-n bits
Subnet prefix   Interface ID
Interface identifiers in an IPv6 unicast address are used to identify the interfaces on a link. Interface identifiers are required to be unique on that link.
IPv6 Network Support IPv6 Address Types
IPv4 Mapped IPv6 Address
There is a special type of IPv6 address that holds an embedded IPv4 address. This address is used to represent the addresses of IPv4-only nodes as IPv6 addresses. These addresses are used especially by applications that support both IPv6 and IPv4. These addresses are called IPv4 Mapped IPv6 Addresses. The format of these addresses is as follows:
Table H-4
80 bits   16 bits   32 bits
zeros     FFFF      IPv4 address
Example: ::ffff:192.168.0.1
IPv6 Network Support IPv6 Address Types
Link-Local Addresses
Link-local addresses have the following format:
Table H-6
10 bits      54 bits   64 bits
1111111010   0         interface ID
Link-local addresses are supposed to be used for addressing nodes on a single link. Packets originating from or destined to a link-local address will not be forwarded by a router.
IPv6 Network Support IPv6 Address Types “FF” at the beginning of the address identifies the address as a multicast address. The “flgs” field is a set of 4 flags “000T”. The higher order 3 bits are reserved and must be zero. The last bit ‘T’ indicates whether it is permanently assigned or not. A value of zero indicates that it is permanently assigned; otherwise it is a temporary assignment. The “scop” field is a 4-bit field that is used to limit the scope of the multicast group.
IPv6 Network Support Network Configuration Restrictions Network Configuration Restrictions Serviceguard supports IPv6 for data links only. The heartbeat IP must still be IPv4, but the package IPs can be IPv4 or IPv6. To configure IPv6, the system should be set up in what is called a dual-stack configuration, which requires the IPv6 product bundle. The restrictions for supporting IPv6 in Serviceguard are listed below.
• The heartbeat IP address must be IPv4.
IPv6 Network Support Network Configuration Restrictions NOTE Even though link-local IP addresses are not supported in the Serviceguard cluster configuration, the primary link-local address on the Serviceguard primary interface will be switched over to the standby during a local switch. This is because of two requirements: First, the dual-stack (IPv4/IPv6) kernel requires that the primary IP address associated with an interface must always be a link-local address.
IPv6 Network Support IPv6 Relocatable Address and Duplicate Address Detection Feature IPv6 Relocatable Address and Duplicate Address Detection Feature The IPv6 networking stack has a new feature, Duplicate Address Detection (DAD), that was not previously available in IPv4. When an address is being added, the DAD detects a duplicate address that is already being used on the network.
IPv6 Network Support IPv6 Relocatable Address and Duplicate Address Detection Feature
# TRANSPORT_NAME[index]=ip6
# NDD_NAME[index]=ip6_nd_dad_solicit_count
# NDD_VALUE[index]=n
Where index is the next available integer value of the nddconf file, and n is a number: either 1 to turn the feature ON or 0 to turn it OFF.
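For example, a concrete entry in /etc/rc.config.d/nddconf that turns the feature off might look like the following sketch; the index value 2 is an assumption, so use the next free index in your own file:
TRANSPORT_NAME[2]=ip6
NDD_NAME[2]=ip6_nd_dad_solicit_count
NDD_VALUE[2]=0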
IPv6 Network Support Local Primary/Standby LAN Patterns Local Primary/Standby LAN Patterns The use of IPv6 allows a number of different patterns of failover among LAN cards configured in the cluster. This is true because each LAN card can support several IP addresses when a dual IPv4/IPv6 configuration is used. This section describes several ways in which local failover to a standby LAN can be configured.
IPv6 Network Support Example Configurations Example Configurations An example of a LAN configuration on a cluster node using both IPv4 and IPv6 addresses is shown below. Figure H-1 Example 1: IPv4 and IPv6 Addresses in Standby Configuration Following the loss of lan0 or lan2, lan1 can adopt either address, as shown below. Figure H-2 Example 1: IPv4 and IPv6 Addresses after Failover to Standby The same LAN card can be configured with both IPv4 and IPv6 addresses, as shown below.
IPv6 Network Support Example Configurations Figure H-3 Example 2: IPv4 and IPv6 Addresses in Standby Configuration This type of configuration allows failover of both addresses to the standby. This is shown below.
Maximum and Minimum Values for Cluster and Package Configuration Parameters I Maximum and Minimum Values for Cluster and Package Configuration Parameters Table I-1 shows the range of possible values for cluster configuration parameters.
Maximum and Minimum Values for Cluster and Package Configuration Parameters
Table I-2 shows the range of possible values for package configuration parameters.
Table I-2 Minimum and Maximum Values of Package Configuration Parameters
Package Parameter: Run Script Timeout
Minimum Value: 10 seconds
Maximum Value: 4294 seconds if a non-zero value is specified, or 0 (NO_TIMEOUT). This is a recommended value.
A Access Control Policies, 193, 204 Access Control Policy, 162 Access roles, 162 active node, 27 adding a package to a running cluster, 376 adding cluster nodes advance planning, 221 adding nodes to a running cluster, 337 adding packages on a running cluster, 310 additional package resource parameter in package configuration, 189, 190 additional package resources monitoring, 81 addressing, SCSI, 137 administration adding nodes to a ruuning cluster, 337 cluster and package states, 319 halting a package, 342
hardware planning, 139 C CFS Creating a storage infrastructure, 245 creating a storage infrastructure, 245 not supported on all HP-UX versions, 29 changes in cluster membership, 64 changes to cluster allowed while the cluster is running, 346 changes to packages allowed while the cluster is running, 379 changing the volume group configuration while the cluster is running, 359 checkpoints, 441 client connections restoring in applications, 450 cluster configuring with commands, 234 redundancy of components, 36
creating physical volumes, 223 parameter in cluster manager configuration, 161 cluster with high availability disk array figure, 46, 47 clusters active/standby type, 49 larger size, 49 cmapplyconf, 242, 373 cmassistd daemon, 55 cmcheckconf, 241, 308, 372 troubleshooting, 405 cmclconfd daemon, 55 cmcld daemon, 55 and node TOC, 56 and safety timer, 56 functions, 56 runtime priority, 57 cmclnodelist bootstrap file, 204 cmdeleteconf deleting a package configuration, 377 deleting the cluster configuration, 269 c
deleting a package configuration using cmdeleteconf, 377 deleting a package from a running cluster, 377 deleting nodes while the cluster is running, 352, 360 deleting the cluster configuration using cmdeleteconf, 269 dependencies configuring, 171 designing applications to run on multiple systems, 443 detecting failures in network manager, 100 device special files (DSFs) agile addressing, 111, 466 legacy, 112 migrating cluster lock disks to, 348 disk choosing for volume groups, 223 data, 41 interfaces, 41
F failback policy package configuration file parameter, 186 used by package manager, 78 FAILBACK_POLICY parameter in package configuration file, 186 used by package manager, 78 failover controlling the speed in applications, 438 defined, 27 failover behavior in packages, 82 failover package, 71 failover policy package configuration parameter, 186 used by package manager, 75 FAILOVER_POLICY parameter in package configuration file, 186 used by package manager, 75 failure kinds of responses, 125 network commun
host IP address, 136, 146 host name, 135 I/O bus addresses, 139 I/O slot numbers, 139 LAN information, 136 LAN interface name, 136, 146 LAN traffic type, 136 memory capacity, 135 number of I/O slots, 135 planning the configuration, 134 S800 series number, 135 SPU information, 135 subnet, 136, 146 worksheet, 140 heartbeat interval parameter in cluster manager configuration, 159 heartbeat messages, 27 defined, 62 heartbeat subnet address parameter in cluster manager configuration, 156 HEARTBEAT_INTERVAL and n
L LAN Critical Resource Analysis (CRA), 358 heartbeat, 62 interface name, 136, 146 planning information, 136 LAN CRA (Critical Resource Analysis), 358 LAN failure Serviceguard behavior, 36 LAN interfaces monitoring with network manager, 100 primary and secondary, 38 LAN planning host IP address, 136, 146 traffic type, 136 LANs, standby and safety timer, 57 larger clusters, 49 legacy DSFs defined, 112 legacy package, 363 link-level addresses, 445 LLT for CVM and CFS, 60 load balancing HP-UX and Veritas DMP,
monitoring LAN interfaces in network manager, 100 mount options in control script, 191, 192 in package configuration, 191 moving a package, 343 multi-node package, 71 multi-node package configuration, 315 multi-node packages configuring, 315 multipathing and Veritas DMP, 43 automatically configured, 43 native, 43 sources of information, 43 multiple systems designing applications for, 443 N name resolution services, 209 native mutipathing defined, 43 network adding and deleting package IP addresses, 99 fail
O olrad command removing a LAN or VLAN interface, 359 online hardware maintenance by means of in-line SCSI terminators, 396 OTS/9000 support, 519 outages insulating users from, 436 P package adding and deleting package IP addresses, 99 base modules, 276 basic concepts, 36 changes allowed while the cluster is running, 379 configuring legacy, 363 failure, 125 halting, 342 legacy, 363 local interface switching, 101 modular, 275 modular and legacy, 271 modules, 275 moving, 343 optional modules, 277 parameters,
package type parameter in package configuration, 183 Package types, 26 failover, 26 multi-node, 26 system multi-node, 26 package types, 26 package_configuration file cvm_activation_cmd, 190 PACKAGE_NAME parameter in package ASCII configuration file, 183 PACKAGE_TYPE parameter in package ASCII configuration file, 183 packages deciding where and when to run, 72, 73 managed by cmcld, 57 parameters for failover, 82 parameters for cluster manager initial configuration, 62 PATH, 197 performance variables in packa
parameter in cluster manager configuration, 155 quorum and cluster reformation, 126 quorum server and safety timer, 57 blank planning worksheet, 482 installing, 217 parameters in cluster manager configuration, 155 planning, 145 status and state, 324 use in re-forming a cluster, 69 worksheet, 146 R RAID for data protection, 42 raw volumes, 439 README for database toolkits, 433 reconfiguring a package while the cluster is running, 375 reconfiguring a package with the cluster offline, 376 reconfiguring a runni
and node TOC, 56 and syslog.
parameter in cluster manager configuration, 158 status cmviewcl, 318 multi-node packages, 319 of cluster and package, 319 package IP address, 402 system log file, 403 stopping a cluster, 339 storage management, 111 SUBNET array variable in package control script, 188, 192 in sample package control script, 369 parameter in package configuration, 187, 197 subnet hardware planning, 136, 146 parameter in package configuration, 187, 197 successor_halt_timeout parameter, 185, 284 supported disks in Serviceguar
VG in sample package control script, 369 vgcfgbackup and cluster lock data, 243 VGCHANGE in package control script, 369 vgextend creating a root mirror with, 212 vgimport using to set up volume groups on another node, 226 VLAN Critical Resource Analysis (CRA), 358 Volume, 111 volume group creating for a cluster, 223 creating physical volumes for clusters, 223 deactivating before export to another node, 225 for cluster lock, 66 planning, 148 relinquishing exclusive access via TOC, 126 setting up on another