Managing Serviceguard Thirteenth Edition Manufacturing Part Number: B3936-90105 February 2007
Legal Notices © Copyright 1995-2007 Hewlett-Packard Development Company, L.P. Confidential computer software. Valid license from HP required for possession, use, or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor’s standard commercial license. The information contained herein is subject to change without notice.
Contents

1. Serviceguard at a Glance
What is Serviceguard?
Failover
About VERITAS CFS and CVM
Using Serviceguard Manager

3. Understanding Serviceguard Software Components
Serviceguard Architecture  58
Serviceguard Daemons  58
How the Cluster Manager Works  65
Configuration of the Cluster

VLAN Configurations
Volume Managers for Data Storage
Types of Redundant Storage
About Device File Names (Device Special Files)
Examples of Mirrored Storage

Cluster Configuration Planning  155
Heartbeat Subnet and Re-formation Time  156
Cluster Lock Information  156
Cluster Configuration Parameters  157
Cluster Configuration Worksheet

Verifying the Cluster Configuration  227
Distributing the Binary Configuration File  229
Creating a Storage Infrastructure with VERITAS Cluster File System (CFS)  231
Preparing the Cluster and the System Multi-node Package  232
Creating the Disk Groups

Creating Failover Packages For Database Products
Customizing the Package Control Script
Optimizing for Large Numbers of Storage Units
Package Control Script Template File
Adding Customer Defined Functions to the Package Control Script

Deleting a Package from a Running Cluster
Resetting the Service Restart Counter
Allowable Package States During Reconfiguration
Responding to Cluster Events
Removing Serviceguard from a System

Reviewing the Package Control Script
Using the cmcheckconf Command
Using the cmscancl Command
Using the cmviewconf Command
Reviewing the LAN Configuration

Assign Unique Names to Applications
Use uname(2) With Care
Bind to a Fixed Port
Bind to Relocatable IP Addresses
Give Each Application its Own Volume Group

Step 2
Step 3
Step 4
Step 5

IPv4 and IPv6 Compatibility
Network Configuration Restrictions
IPv6 Relocatable Address and Duplicate Address Detection Feature
Local Primary/Standby LAN Patterns
Example Configurations
Tables

Table 1. Printing History  17
Table 3-1. Package Configuration Data  80
Table 3-2. Node Lists in Sample Cluster  83
Table 3-3. Package Failover Behavior  87
Printing History

Table 1  Printing History

Printing Date     Part Number    Edition
January 1995      B3936-90001    First
June 1995         B3936-90003    Second
December 1995     B3936-90005    Third
August 1997       B3936-90019    Fourth
January 1998      B3936-90024    Fifth
October 1998      B3936-90026    Sixth
December 2000     B3936-90045    Seventh
September 2001    B3936-90053    Eighth
March 2002        B3936-90065    Ninth
June 2003         B3936-90070    Tenth
June 2004         B3936-90076    Eleventh
June 2005         B3936-90076    Eleventh, First reprint
HP Printing Division: Infrastructure Solutions Division Hewlett-Packard Co. 19111 Pruneridge Ave.
Preface This thirteenth printing of the manual has been updated for Serviceguard Version A.11.17.01. This guide describes how to configure Serviceguard to run on HP 9000 or HP Integrity servers under the HP-UX operating system. The contents are as follows: • Chapter 1, “Serviceguard at a Glance,” describes a Serviceguard cluster and provides a roadmap for using this guide.
• Appendix C, “Designing Highly Available Cluster Applications,” gives guidelines for creating cluster-aware applications that provide optimal performance in a Serviceguard environment.
• Appendix D, “Integrating HA Applications with Serviceguard,” presents suggestions for integrating your existing applications with Serviceguard.
• Appendix E, “Software Upgrades,” shows how to move from one Serviceguard or HP-UX release to another without bringing down your applications.

Related Publications
— Managing HP Serviceguard for Linux, Sixth Edition, August 2006
• Documentation for your version of VERITAS storage products from http://www.docs.hp.com -> High Availability -> HP Serviceguard Storage Management Suite
• Before using VERITAS Volume Manager (VxVM) storage with Serviceguard, refer to the documents posted at http://docs.hp.com. From the heading Operating Environments, choose 11i v3, then scroll down to the section VERITAS Volume Manager and File System.
— Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters
• From http://www.docs.hp.com -> High Availability -> HP Serviceguard Extension for Faster Failover:
— HP Serviceguard Extension for Faster Failover, Version A.01.00, Release Notes
• From http://www.docs.hp.com -> High Availability -> Serviceguard Extension for SAP:
— Managing Serviceguard Extension for SAP
• From http://www.docs.hp.
1 Serviceguard at a Glance

This chapter introduces Serviceguard on HP-UX, and shows where to find different kinds of information in this book. The following topics are presented:
• What is Serviceguard?
• Using Serviceguard Manager
• A Roadmap for Configuring Clusters and Packages
If you are ready to start setting up Serviceguard clusters, skip ahead to Chapter 4, “Planning and Documenting an HA Cluster,” on page 135.
What is Serviceguard?
Serviceguard allows you to create high availability clusters of HP 9000 or HP Integrity servers. A high availability computer system allows application services to continue in spite of a hardware or software failure. Highly available systems protect users from software failures as well as from failure of a system processing unit (SPU), disk, or local area network (LAN) component.
In Figure 1-1, node 1 (one of two SPUs) is running failover package A, and node 2 is running package B. Each package has a separate group of disks associated with it, containing data needed by the package's applications, and a mirror copy of the data. Note that both nodes are physically connected to both groups of mirrored disks. In this example, however, only one node at a time may access the data for a given group of disks.
Figure 1-2 Typical Cluster After Failover
After this transfer, the failover package typically remains on the adoptive node as long as the adoptive node continues running. If you wish, however, you can configure the package to return to its primary node as soon as the primary node comes back online. Alternatively, you may manually transfer control of the package back to the primary node at the appropriate time.
• Mirrordisk/UX or VERITAS Volume Manager, which provide disk redundancy to eliminate single points of failure in the disk subsystem;
• Event Monitoring Service (EMS), which lets you monitor and detect failures that are not directly handled by Serviceguard;
• disk arrays, which use various RAID levels for data protection;
• HP-supported uninterruptible power supplies (UPS), such as HP PowerTrust, which eliminate failures related to power outages.
Serviceguard at a Glance Using Serviceguard Manager Using Serviceguard Manager Serviceguard Manager is the graphical user interface for Serviceguard. In Serviceguard A.11.17.01, Serviceguard Manager is available in both a new and an old form: • New: as a “plug-in” to the System Management Homepage (SMH). SMH is a web-based graphical user interface (GUI) that replaces SAM as the system administration GUI as of HP-UX 11i v3 (but you can still run the SAM terminal interface; see “Using SAM” on page 31).
Serviceguard at a Glance Using Serviceguard Manager Monitoring Clusters with Serviceguard Manager You can see all the clusters the server can reach, or you can list specific clusters. You can also see all the unused nodes on the subnet - that is, all the Serviceguard nodes that are not currently configured in a cluster.
Serviceguard at a Glance Using Serviceguard Manager Using the Serviceguard Management Application To open a saved “snapshot” cluster file, specify a filename with the .sgm extension; you must have view permission on the file and its directory. To see “live” clusters, from a management station, connect to a Serviceguard node’s Cluster Object Manager (COM) daemon. (The COM is automatically installed with Serviceguard.) This node becomes the session server.
Serviceguard at a Glance Using SAM Using SAM You can do many of the HP-UX system administration tasks described in this manual (that is, tasks, such as configuring disks and filesystems, that are not specifically Serviceguard tasks) by using SAM, the System Administration Manager. To launch SAM, enter /usr/sbin/sam on the command line. As of HP-UX 11i v3, SAM offers a Terminal User Interface (TUI) which also acts as a gateway to the web-based System Management Homepage (SMH).
Serviceguard at a Glance What are the Distributed Systems Administration Utilities? What are the Distributed Systems Administration Utilities? HP Distributed Systems Administration Utilities (DSAU) simplify the task of managing multiple systems, including Serviceguard clusters.
Serviceguard at a Glance A Roadmap for Configuring Clusters and Packages A Roadmap for Configuring Clusters and Packages This manual presents the tasks you need to perform in order to create a functioning HA cluster using Serviceguard. These tasks are shown in Figure 1-3. Figure 1-3 Tasks in Configuring a Serviceguard Cluster The tasks in Figure 1-3 are covered in step-by-step detail in chapters 4 through 7.
2 Understanding Serviceguard Hardware Configurations

This chapter gives a broad overview of how the Serviceguard hardware components work. The following topics are presented:
• Redundancy of Cluster Components
• Redundant Network Components
• Redundant Disk Storage
• Redundant Power Supplies
• Larger Clusters
Refer to the next chapter for information about Serviceguard software components.
Understanding Serviceguard Hardware Configurations Redundancy of Cluster Components Redundancy of Cluster Components In order to provide a high level of availability, a typical cluster uses redundant system components, for example two or more SPUs and two or more independent disks. This redundancy eliminates single points of failure. In general, the more redundancy, the greater your access to applications, data, and supportive services in the event of a failure.
Note that a package that does not access data from a disk on a shared bus can be configured to fail over to as many nodes as you have configured in the cluster (regardless of disk technology). For instance, if a package runs only local executables, it can be configured to fail over to all nodes in the cluster that have local copies of those executables, regardless of the type of disk connectivity.
Understanding Serviceguard Hardware Configurations Redundant Network Components Redundant Network Components To eliminate single points of failure for networking, each subnet accessed by a cluster node is required to have redundant network interfaces. Redundant cables are also needed to protect against cable failures. Each interface card is connected to a different cable, and the cables themselves are connected by a component such as a hub or a bridge.
Understanding Serviceguard Hardware Configurations Redundant Network Components Figure 2-1 Redundant LANs In the figure, a two-node Serviceguard cluster has one bridged net configured with both a primary and a standby LAN card for the data/heartbeat subnet (Subnet A). Another LAN card provides an optional dedicated heartbeat LAN. Note that the primary and standby LAN segments are connected by a hub to provide a redundant data/heartbeat subnet. Each node has its own IP address for this subnet.
NOTE You should verify that network traffic is not too high on the heartbeat/data LAN. If traffic is too high, this LAN might not perform adequately in transmitting heartbeats if the dedicated heartbeat LAN fails.
Providing Redundant FDDI Connections
FDDI is a high-speed fiber-optic interconnect medium.
Understanding Serviceguard Hardware Configurations Redundant Network Components Using Dual Attach FDDI Stations Another way of obtaining redundant FDDI connections is to configure dual attach stations on each node to create an FDDI ring, shown in Figure 2-3. An advantage of this configuration is that only one slot is used in the system card cage. In Figure 2-3, note that nodes 3 and 4 also use Ethernet to provide connectivity outside the cluster.
Understanding Serviceguard Hardware Configurations Redundant Network Components Replacement of Failed Network Cards Depending on the system configuration, it is possible to replace failed network cards while the cluster is running. The process is described under “Replacement of LAN Cards” in the chapter “Troubleshooting Your Cluster.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Redundant Disk Storage Each node in a cluster has its own root disk, but each node is also physically connected to several other disks in such a way that more than one node can obtain access to the data and programs associated with a package it is configured for.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage When planning and assigning SCSI bus priority, remember that one node can dominate a bus shared by multiple nodes, depending on what SCSI addresses are assigned to the controller for each node on the shared bus. All SCSI addresses, including the addresses of all interface cards, must be unique for all devices on a shared bus.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage another node until the failing node is halted. Mirroring the root disk can allow the system to continue normal operation when a root disk failure occurs, and help avoid this downtime. Disk Arrays using RAID Levels and Multiple Data Paths An alternate method of achieving protection for your data is to employ a disk array with hardware RAID levels that provide data redundancy, such as RAID Level 1 or RAID Level 5.
set up to trigger a package failover or to report disk failure events to Serviceguard, to another application, or by email. For more information, refer to the manual Using High Availability Monitors (B5736-90046), available at http://docs.hp.com -> High Availability.
Replacement of Failed Disk Mechanisms
Mirroring provides data protection, but after a disk failure, the failed disk must be replaced.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Figure 2-4 Mirrored Disks Connected for High Availability Figure 2-5 below shows a similar cluster with a disk array connected to each node on two I/O channels. See “About Multipathing” on page 45.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Figure 2-5 Cluster with High Availability Disk Array Details on logical volume configuration for Serviceguard are in the chapter “Building an HA Cluster Configuration.” Sample Fibre Channel Disk Configuration In Figure 2-6 below, the root disks are shown with simple mirroring, but the shared storage is now accessed via redundant Fibre Channel switches attached to a disk array.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Figure 2-6 Cluster with Fibre Channel Switched Disk Array This type of configuration uses native HP-UX or other multipathing software; see “About Multipathing” on page 45. Root Disk Limitations on Shared SCSI Buses The IODC firmware does not support two or more nodes booting from the same SCSI bus at the same time. For this reason, it is important not to attach more than one root disk per cluster to a single SCSI bus.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Figure 2-7 Root Disks on Different Shared Buses Note that if both nodes had their primary root disks connected to the same bus, you would have an unsupported configuration. You can put a mirror copy of Node B's root disk on the same SCSI bus as Node A's primary root disk, because three failures would have to occur for both systems to boot at the same time, which is an acceptable risk.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Figure 2-8 Primaries and Mirrors on Different Shared Buses Note that you cannot use a disk within a disk array as a root disk if the array is on a shared bus.
Understanding Serviceguard Hardware Configurations Redundant Power Supplies Redundant Power Supplies You can extend the availability of your hardware by providing battery backup to your nodes and disks. HP-supported uninterruptible power supplies (UPS), such as HP PowerTrust, can provide this protection from momentary power loss. Disks should be attached to power circuits in such a way that mirror copies are attached to different power sources.
Understanding Serviceguard Hardware Configurations Larger Clusters Larger Clusters You can create clusters of up to 16 nodes with Serviceguard. Clusters of up to 16 nodes may be built by connecting individual SPUs via Ethernet and using FDDI networking. The possibility of configuring a cluster consisting of 16 nodes does not mean that all types of cluster configuration behave in the same way in a 16-node configuration.
Figure 2-9 Eight-Node Active/Standby Cluster
Point-to-Point Connections to Storage Devices
Some storage devices allow point-to-point connection to a large number of host nodes without using a shared SCSI bus. An example is shown in Figure 2-10, a cluster consisting of eight nodes with a SCSI interconnect. The nodes access shared data on an XP or EMC disk array configured with 16 SCSI I/O ports. Of course, you must also provide access to dual power sources.
Understanding Serviceguard Hardware Configurations Larger Clusters Figure 2-10 Eight-Node Cluster with XP or EMC Disk Array Fibre Channel switched configurations also are supported using either an arbitrated loop or fabric login topology. For additional information about supported cluster configurations, refer to the HP Unix Servers Configuration Guide, available through your HP representative.
3 Understanding Serviceguard Software Components

This chapter gives a broad overview of how the Serviceguard software components work.
Understanding Serviceguard Software Components Serviceguard Architecture Serviceguard Architecture The following figure shows the main software components used by Serviceguard. This chapter discusses these components in some detail. NOTE VERITAS CFS may not yet be supported on the version of HP-UX you are running; see “About VERITAS CFS and CVM” on page 27.
Understanding Serviceguard Software Components Serviceguard Architecture • /usr/lbin/cmclconfd—Serviceguard Configuration Daemon • /usr/lbin/cmcld—Serviceguard Cluster Daemon • /usr/lbin/cmfileassistd—Serviceguard File Management daemon • /usr/lbin/cmlogd—Serviceguard Syslog Log Daemon • /usr/lbin/cmlvmd—Cluster Logical Volume Manager Daemon • /opt/cmom/lbin/cmomd—Cluster Object Manager Daemon • /usr/lbin/cmsnmpd—Cluster SNMP subagent (optionally running) • /usr/lbin/cmsrvassistd—Serviceguard
Understanding Serviceguard Software Components Serviceguard Architecture Then force inetd to re-read inetd.conf: /usr/sbin/inetd -c You can check that this did in fact disable Serviceguard by trying the following command: cmquerycl -n nodename where nodename is the name of the local system. If the command fails, you have successfully disabled Serviceguard. NOTE You should not disable Serviceguard on a system on which it is actually running.
Understanding Serviceguard Software Components Serviceguard Architecture not standby LANs are configured. (For further discussion, see “What Happens when a Node Times Out” on page 129. For advice on setting HEARTBEAT_INTERVAL and NODE_TIMEOUT, see “Cluster Configuration Parameters” on page 157.) The cmcld daemon also monitors the health of the cluster networks and performs local LAN failover. Finally, cmcld manages Serviceguard packages, determining where to run them and when to start them.
Understanding Serviceguard Software Components Serviceguard Architecture Clients send queries to the object manager and receive responses from it (this communication is done indirectly, through a Serviceguard API). The queries are decomposed into categories (of classes) which are serviced by various providers.
Understanding Serviceguard Software Components Serviceguard Architecture The quorum server, if used, runs on a system external to the cluster and is started by the system administrator, not by Serviceguard. It is normally started from /etc/inittab with the respawn option, which means that it automatically restarts if it fails or is killed.
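For example, a typical /etc/inittab entry for the quorum server looks something like the following; the log file location is illustrative, so check the Quorum Server release notes for the exact entry recommended for your version:

  qs:345:respawn:/usr/lbin/qs >> /var/adm/qs/qs.log 2>&1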
LLT provides kernel-to-kernel communications and monitors network communications for CFS.
• vxfend - When VERITAS CFS is deployed as part of the Serviceguard Storage Management Suite, the I/O fencing daemon vxfend is also included. It implements a quorum-type functionality for the VERITAS Cluster File System. vxfend is controlled by Serviceguard to synchronize quorum mechanisms.
Understanding Serviceguard Software Components How the Cluster Manager Works How the Cluster Manager Works The cluster manager is used to initialize a cluster, to monitor the health of the cluster, to recognize node failure if it should occur, and to regulate the re-formation of the cluster when a node joins or leaves the cluster. The cluster manager operates as a daemon process that runs on each node.
Understanding Serviceguard Software Components How the Cluster Manager Works (described further in this chapter, in “How the Package Manager Works” on page 74). Failover packages that were running on nodes that are no longer in the new cluster are transferred to their adoptive nodes. Note that if there is a transitory loss of heartbeat, the cluster may re-form with the same nodes as before.
Understanding Serviceguard Software Components How the Cluster Manager Works Manual Startup of Entire Cluster A manual startup forms a cluster out of all the nodes in the cluster configuration. Manual startup is normally done the first time you bring up the cluster, after cluster-wide maintenance or upgrade, or after reconfiguration. Before startup, the same binary cluster configuration file must exist on all nodes in the cluster.
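For example, assuming nodes named ftsys9 and ftsys10 (illustrative names), you could perform a manual startup of the entire cluster, or of a subset of nodes, with commands such as:

  cmruncl -v
  cmruncl -v -n ftsys9 -n ftsys10

The first form starts the cluster on all configured nodes; the second starts it only on the nodes named.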
Understanding Serviceguard Software Components How the Cluster Manager Works • A node halts because of a package failure. • A node halts because of a service failure. • Heavy network traffic prohibited the heartbeat signal from being received by the cluster. • The heartbeat network failed, and another network is not configured to carry heartbeat. Typically, re-formation results in a cluster with a different composition.
Understanding Serviceguard Software Components How the Cluster Manager Works possibility of two sub-clusters running at the same time. If the two sub-clusters are of unequal size, the sub-cluster with greater than 50% of the nodes will form the new cluster, and the cluster lock is not used. If you have a two-node cluster, you are required to configure a cluster lock.
Understanding Serviceguard Software Components How the Cluster Manager Works Figure 3-2 Lock Disk Operation Serviceguard periodically checks the health of the lock disk and writes messages to the syslog file when a lock disk fails the health check. This file should be monitored for early detection of lock disk problems. You can choose between two lock disk options—a single or dual lock disk—based on the kind of high availability configuration you are building.
Understanding Serviceguard Software Components How the Cluster Manager Works Dual Lock Disk If you are using disks that are internally mounted in the same cabinet as the cluster nodes, then a single lock disk would be a single point of failure in this type of cluster, since the loss of power to the node that has the lock disk in its cabinet would also render the cluster lock unavailable.
Understanding Serviceguard Software Components How the Cluster Manager Works area in memory for each cluster, and when a node obtains the cluster lock, this area is marked so that other nodes will recognize the lock as “taken.” If communications are lost between two equal-sized groups of nodes, the group that obtains the lock from the Quorum Server will take over the cluster and the other nodes will perform a TOC.
Understanding Serviceguard Software Components How the Cluster Manager Works three-node cluster is removed for maintenance, the cluster reforms as a two-node cluster. If a tie-breaking scenario later occurs due to a node or communication failure, the entire cluster will become unavailable. In a cluster with four or more nodes, you may not need a cluster lock since the chance of the cluster being split into two halves of equal size is very small.
Understanding Serviceguard Software Components How the Package Manager Works How the Package Manager Works Packages are the means by which Serviceguard starts and halts configured applications. A package is a collection of services, disk volumes and IP addresses that are managed by Serviceguard to ensure they are available. Each node in the cluster runs an instance of the package manager; the package manager residing on the cluster coordinator is known as the package coordinator.
Understanding Serviceguard Software Components How the Package Manager Works all versions of HP-UX; see “About VERITAS CFS and CVM” on page 27). This package is known as VxVM-CVM-pkg for VERITAS CVM Version 3.5 and called SG-CFS-pkg for VERITAS CVM Version 4.1. It runs on all nodes that are active in the cluster and provides cluster membership information to the volume manager software. This type of package is configured and used only when you employ CVM for storage management.
Understanding Serviceguard Software Components How the Package Manager Works Figure 3-4 Package Moving During Failover Configuring Failover Packages Each package is separately configured. You create a failover package by using Serviceguard Manager or by editing a package ASCII configuration file template. (Detailed instructions are given in “Configuring Packages and Their Services” on page 257). Then you use the cmapplyconf command to check and apply the package to the cluster configuration database.
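In outline, the command sequence for creating a failover package from the command line looks something like this; the package name, directory, and file names are illustrative:

  cmmakepkg -p /etc/cmcluster/pkg1/pkg1.config    # generate the package ASCII configuration template
  cmmakepkg -s /etc/cmcluster/pkg1/pkg1.cntl      # generate the package control script template
  # edit both files, then:
  cmcheckconf -P /etc/cmcluster/pkg1/pkg1.config  # verify the package configuration
  cmapplyconf -P /etc/cmcluster/pkg1/pkg1.config  # apply it to the cluster configuration database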
Understanding Serviceguard Software Components How the Package Manager Works restart the package on a new node in response to a failure. Once the cluster is running, the package switching attribute of each package can be temporarily set with the cmmodpkg command; at reboot, the configured value will be restored. The parameter is coded in the package ASCII configuration file: # The default for AUTO_RUN is YES.
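For example, you could temporarily change the switching attribute of a package named pkg1 as follows:

  cmmodpkg -e pkg1    # enable package switching for pkg1
  cmmodpkg -d pkg1    # disable package switching for pkg1

At the next reboot, the AUTO_RUN value configured in the package ASCII file is restored.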
Understanding Serviceguard Software Components How the Package Manager Works Figure 3-5 Before Package Switching Figure 3-6 shows the condition where Node 1 has failed and Package 1 has been transferred to Node 2. Package 1's IP address was transferred to Node 2 along with the package. Package 1 continues to be available and is now running on Node 2. Also note that Node 2 can now access both Package1’s disk and Package2’s disk.
Understanding Serviceguard Software Components How the Package Manager Works Figure 3-6 After Package Switching Failover Policy The Package Manager selects a node for a failover package to run on based on the priority list included in the package configuration file together with the FAILOVER_POLICY parameter, also in the configuration file.
Understanding Serviceguard Software Components How the Package Manager Works # This policy will select nodes in priority order from the list of # NODE_NAME entries specified below. # The alternative policy is MIN_PACKAGE_NODE. This policy will select # the node, from the list of NODE_NAME entries below, which is # running the least number of packages at the time of failover.
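Taken together, the policy-related entries in a package configuration file might read as follows; the node names are illustrative:

  NODE_NAME           node1
  NODE_NAME           node2
  FAILOVER_POLICY     MIN_PACKAGE_NODE
  FAILBACK_POLICY     MANUAL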
Understanding Serviceguard Software Components How the Package Manager Works Figure 3-7 Rotating Standby Configuration before Failover If a failure occurs, any package would fail over to the node containing fewest running packages, as in Figure 3-8, which shows a failure on node 2: Figure 3-8 Rotating Standby Configuration after Failover NOTE Using the MIN_PACKAGE_NODE policy, when node 2 is repaired and brought back into the cluster, it will then be running the fewest packages, and thus will become th
Understanding Serviceguard Software Components How the Package Manager Works Figure 3-9 CONFIGURED_NODE Policy Packages after Failover If you use CONFIGURED_NODE as the value for the failover policy, the package will start up on the highest priority node in the node list, assuming that the node is running as a member of the cluster. When a failover occurs, the package will move to the next highest priority node in the list that is available.
Understanding Serviceguard Software Components How the Package Manager Works Figure 3-10 Automatic Failback Configuration before Failover Table 3-2 Node Lists in Sample Cluster Package Name NODE_NAME List FAILOVER POLICY FAILBACK POLICY pkgA node1, node4 CONFIGURED_NODE AUTOMATIC pkgB node2, node4 CONFIGURED_NODE AUTOMATIC pkgC node3, node4 CONFIGURED_NODE AUTOMATIC Node1 panics, and after the cluster reforms, pkgA starts running on node4: Figure 3-11 Chapter 3 Automatic Failback Config
Understanding Serviceguard Software Components How the Package Manager Works After rebooting, node 1 rejoins the cluster. At that point, pkgA will be automatically stopped on node 4 and restarted on node 1. Figure 3-12 Automatic Failback Configuration After Restart of Node 1 NOTE Setting the FAILBACK_POLICY to AUTOMATIC can result in a package failback and application outage during a critical production period.
Understanding Serviceguard Software Components How the Package Manager Works In Serviceguard A.11.17 and later, you specify a package type parameter; the PACKAGE_TYPE for a traditional package is the default value, FAILOVER. Starting with the A.11.12 version of Serviceguard, the PKG_SWITCHING_ENABLED parameter was renamed AUTO_RUN. The NET_SWITCHING_ENABLED parameter was renamed to LOCAL_LAN_FAILOVER_ALLOWED.
Understanding Serviceguard Software Components How the Package Manager Works • Physical volume status • System load • Number of users • File system utilization • LAN health Once a monitor is configured as a package resource dependency, the monitor will notify the package manager if an event occurs showing that a resource is down. The package may then be failed over to an adoptive node.
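As a sketch, a resource dependency is declared in the package configuration file with entries like the following; the resource name and value syntax shown here are only an illustration, since the names available depend on which EMS monitors are installed:

  RESOURCE_NAME               /net/interfaces/lan/status/lan0
  RESOURCE_POLLING_INTERVAL   60
  RESOURCE_START              AUTOMATIC
  RESOURCE_UP_VALUE           = UP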
Understanding Serviceguard Software Components How the Package Manager Works Table 3-3 Package Failover Behavior Options in Serviceguard Manager Switching Behavior Parameters in ASCII File • NODE_FAIL_FAST_ENABLED set to NO. (Default) • SERVICE_FAIL_FAST_ENABLED set to NO for all services. (Default) • AUTO_RUN set to YES for the package. (Default) Failover Policy set to minimum package node. • FAILOVER_POLICY set to MIN_PACKAGE_NODE. • Failover Policy set to configured node.
Understanding Serviceguard Software Components How the Package Manager Works Table 3-3 Package Failover Behavior (Continued) Options in Serviceguard Manager Switching Behavior 88 All packages switch following a TOC (Transfer of Control, an immediate halt without a graceful shutdown) on the node when a specific service fails. An attempt is first made to reboot the system prior to the TOC. Halt scripts are not run. • Service Failfast set for a specific service. • Auto Run set for all packages.
Understanding Serviceguard Software Components How Package Control Scripts Work How Package Control Scripts Work Packages are the means by which Serviceguard starts and halts configured applications. Failover packages are also units of failover behavior in Serviceguard. A package is a collection of services, disk volumes and IP addresses that are managed by Serviceguard to ensure they are available. There can be a maximum of 150 packages per cluster and a total of 900 services per cluster.
Understanding Serviceguard Software Components How Package Control Scripts Work The CFS packages, however, are not created by performing cmapplyconf on package configuration files, but by a series of CFS-specific commands. Serviceguard determines most of their options; all user-determined options can be entered as parameters to the commands. (See the cfs* commands in Appendix A.) A failover package can be configured to have a dependency on a multi-node or system multi-node package.
Understanding Serviceguard Software Components How Package Control Scripts Work NOTE If you configure the package while the cluster is running, the package does not start up immediately after the cmapplyconf command completes. To start the package without halting and restarting the cluster, issue the cmrunpkg or cmmodpkg command. How does a failover package start up, and what is its behavior while it is running? Some of the many phases of package life are shown in Figure 3-13.
Understanding Serviceguard Software Components How Package Control Scripts Work 7. When the node fails Before the Control Script Starts First, a node is selected. This node must be in the package’s node list, it must conform to the package’s failover policy, and any resources required by the package must be available on the chosen node. One resource is the subnet that is monitored for the package. If the subnet is not available, the package cannot start on this node.
Understanding Serviceguard Software Components How Package Control Scripts Work Figure 3-14 Package Time Line for Run Script Execution At any step along the way, an error will result in the script exiting abnormally (with an exit code of 1). For example, if a package service is unable to be started, the control script will exit with an error. Also, if the run script execution is not complete before the time specified in the RUN_SCRIPT_TIMEOUT, the package manager will kill the script.
Understanding Serviceguard Software Components How Package Control Scripts Work Normal and Abnormal Exits from the Run Script Exit codes on leaving the run script determine what happens to the package next. A normal exit means the package startup was successful, but all other exits mean that the start operation did not complete successfully. • 0—normal exit. The package started normally, so all services are up on this node. • 1—abnormal exit, also known as NO_RESTART exit.
Understanding Serviceguard Software Components How Package Control Scripts Work NOTE If you set restarts and also set SERVICE_FAILFAST_ENABLED to YES, the failfast will take place after restart attempts have failed. It does not make sense to set SERVICE_RESTART to “-R” for a service and also set SERVICE_FAILFAST_ENABLED to YES.
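In the package control script, each service is defined by a set of array entries, for example (the monitoring command shown is hypothetical):

  SERVICE_NAME[0]="pkg1_monitor"
  SERVICE_CMD[0]="/etc/cmcluster/pkg1/monitor.sh"   # hypothetical service command
  SERVICE_RESTART[0]="-r 3"                         # restart up to 3 times; "" means no restart, "-R" means unlimited restarts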
Understanding Serviceguard Software Components How Package Control Scripts Work Package halting normally means that the package halt script executes (see the next section). However, if a failover package’s configuration has the SERVICE_FAILFAST_ENABLED flag set to yes for the service that fails, then the node will halt as soon as the failure is detected. If this flag is not set, the loss of a service will result in halting the package gracefully by running the halt script.
Understanding Serviceguard Software Components How Package Control Scripts Work During Halt Script Execution This section applies only to failover packages. Once the package manager has detected the failure of a service or package that a failover package depends on, or when the cmhaltpkg command has been issued for a particular failover package, then the package manager launches the halt script. That is, the failover package’s control script executes the ‘halt’ parameter.
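For example, to halt a package and move it to another node manually, a typical command sequence is the following; the package and node names are illustrative:

  cmhaltpkg pkg1            # runs the control script with the 'halt' parameter
  cmrunpkg -n node2 pkg1    # starts the package on the other node
  cmmodpkg -e pkg1          # re-enables package switching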
This log has the same name as the halt script and the extension .log. Normal halts are recorded in the log, together with error messages or warnings related to halting the package.
Normal and Abnormal Exits from the Halt Script
The package's ability to move to other nodes is affected by the exit conditions on leaving the halt script. The following are the possible exit codes:
• 0—normal exit.
Understanding Serviceguard Software Components How Package Control Scripts Work Table 3-4 Error Conditions and Package Movement for Failover Packages Package Error Condition Results Halt script runs after Error or Exit Package Allowed to Run on Primary Node after Error Node Failfast Enabled Service Failfast Enabled HP-UX Status on Primary after Error Service Failure NO YES TOC No N/A (TOC) Yes Service Failure YES NO Running Yes No Yes Service Failure NO NO Running Yes No Yes Run
Understanding Serviceguard Software Components How Package Control Scripts Work Table 3-4 Error Conditions and Package Movement for Failover Packages Package Error Condition Results HP-UX Status on Primary after Error Halt script runs after Error or Exit Package Allowed to Run on Primary Node after Error Package Allowed to Run on Alternate Node Node Failfast Enabled Service Failfast Enabled Halt Script Timeout YES Either Setting TOC N/A N/A (TOC) Yes, unless the timeout happened after the cmh
Understanding Serviceguard Software Components How Package Control Scripts Work Table 3-4 Error Conditions and Package Movement for Failover Packages Package Error Condition Results HP-UX Status on Primary after Error Halt script runs after Error or Exit Package Allowed to Run on Primary Node after Error Package Allowed to Run on Alternate Node Node Failfast Enabled Service Failfast Enabled Loss of Monitored Resource NO Either Setting Running Yes Yes, if the resource is not a deferred resource
Understanding Serviceguard Software Components How the Network Manager Works How the Network Manager Works The purpose of the network manager is to detect and recover from network card and cable failures so that network services remain highly available to clients. In practice, this means assigning IP addresses for each package to the primary LAN interface card on the node where the package is running and monitoring the health of all interfaces, switching them when necessary.
Understanding Serviceguard Software Components How the Network Manager Works Both stationary and relocatable IP addresses will switch to a standby LAN interface in the event of a LAN card failure. In addition, relocatable addresses (but not stationary addresses) can be taken over by an adoptive node if control of the package is transferred. This means that applications can access the package via its relocatable address without knowing which node the package currently resides on.
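In a legacy package control script, the relocatable addresses and their subnets are declared in arrays, for example (the addresses are illustrative):

  IP[0]="192.10.25.12"
  SUBNET[0]="192.10.25.0"

Serviceguard adds the relocatable address to the primary LAN interface on the configured subnet when the package starts, and removes it when the package halts.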
Understanding Serviceguard Software Components How the Network Manager Works Monitoring LAN Interfaces and Detecting Failure At regular intervals, Serviceguard polls all the network interface cards specified in the cluster configuration file. Network failures are detected within each single node in the following manner. One interface on the node is assigned to be the poller.
Understanding Serviceguard Software Components How the Network Manager Works This option is not suitable for all environments. Before choosing it, be sure these conditions are met: — All bridged nets in the cluster should have more than two interfaces each. — Each primary interface should have at least one standby interface, and it should be connected to a standby switch. — The primary switch should be directly connected to its standby.
Within the Ethernet family, the following local switching configurations are supported:
• 1000Base-SX and 1000Base-T
• 1000Base-T or 1000Base-SX and 100Base-T
On HP-UX 11i, however, Jumbo Frames can only be used when the 1000Base-T or 1000Base-SX cards are configured. The 100Base-T and 10Base-T cards do not support Jumbo Frames. Additionally, network interface cards running 1000Base-T or 1000Base-SX cannot do local failover to 10Base-T.
Understanding Serviceguard Software Components How the Network Manager Works Figure 3-16 Cluster Before Local Network Switching Node 1 and Node 2 are communicating over LAN segment 2. LAN segment 1 is a standby. In Figure 3-17, we see what would happen if the LAN segment 2 network interface card on Node 1 were to fail.
Understanding Serviceguard Software Components How the Network Manager Works Figure 3-17 Cluster After Local Network Switching As the standby interface takes over, IP addresses will be switched to the hardware path associated with the standby interface. The switch is transparent at the TCP/IP level. All applications continue to run on their original nodes. During this time, IP traffic on Node 1 will be delayed as the transfer occurs.
Understanding Serviceguard Software Components How the Network Manager Works Figure 3-18 Local Switching After Cable Failure Local network switching will work with a cluster containing one or more nodes. You may wish to design a single-node cluster in order to take advantage of this local network switching feature in situations where you need only one node and do not wish to set up a more complex cluster.
Understanding Serviceguard Software Components How the Network Manager Works Remote Switching A remote switch (that is, a package switch) involves moving packages and their associated IP addresses to a new system. The new system must already have the same subnetwork configured and working properly, otherwise the packages will not be started. With remote switching, TCP connections are lost. TCP applications must reconnect to regain connectivity; this is not handled automatically.
Understanding Serviceguard Software Components How the Network Manager Works recovery for environments which require high availability. Port aggregation capability is sometimes referred to as link aggregation or trunking. APA is also supported on dual-stack kernel. Once enabled, each link aggregate can be viewed as a single logical link of multiple physical ports with only one IP and MAC address.
Understanding Serviceguard Software Components How the Network Manager Works Both the Single and Dual ported LANs in the non-aggregated configuration have four LAN cards, each associated with a separate non-aggregated IP address and MAC address, and each with its own LAN name (lan0, lan1, lan2, lan3). When these ports are aggregated all four ports are associated with a single IP address and MAC address.
Understanding Serviceguard Software Components How the Network Manager Works failover of VLAN interfaces when failure is detected. Failure of a VLAN interface is typically the result of the failure of the underlying physical NIC port or aggregated (APA) ports. Configuration Restrictions HP-UX allows up to 1024 VLANs to be created from a physical NIC port.
Understanding Serviceguard Software Components How the Network Manager Works Additional Heartbeat Requirements VLAN technology allows great flexibility in network configuration. To maintain Serviceguard’s reliability and availability in such an environment, the heartbeat rules are tightened as follows when the cluster is using VLANs: 1. VLAN heartbeat networks must be configured on separate physical NICs or APA aggregates, to avoid single points of failure. 2.
Understanding Serviceguard Software Components Volume Managers for Data Storage Volume Managers for Data Storage A volume manager is a tool that lets you create units of disk storage known as storage groups. Storage groups contain logical volumes for use on single systems and in high availability clusters. In Serviceguard clusters, storage groups are activated by package control scripts.
Understanding Serviceguard Software Components Volume Managers for Data Storage For instructions on migrating a system to agile addressing, see the white paper Migrating from HP-UX 11i v2 to HP-UX 11i v3 at http://docs.hp.com. NOTE It is possible, though not a best practice, to use legacy DSFs (that is, DSFs using the older naming convention) on some nodes after migrating to agile addressing on others; this allows you to migrate different nodes at different times, if necessary.
Each of the two nodes also has two (non-shared) internal disks, which are used for the root file system, swap, etc. Each shared storage unit has three disks. The device file names of the three disks on one of the two storage units are c0t0d0, c0t1d0, and c0t2d0. On the other, they are c1t0d0, c1t1d0, and c1t2d0.
Understanding Serviceguard Software Components Volume Managers for Data Storage Figure 3-21 Mirrored Physical Disks Figure 3-22 shows the mirrors configured into LVM volume groups, shown in the figure as /dev/vgpkgA and /dev/vgpkgB. The volume groups are activated by Serviceguard packages for use by highly available applications.
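As a minimal sketch, using the device files from the example above and assuming Mirrordisk/UX is installed, the volume group and a mirrored logical volume could be created as follows; the size and minor number are illustrative:

  pvcreate -f /dev/rdsk/c0t0d0
  pvcreate -f /dev/rdsk/c1t0d0
  mkdir /dev/vgpkgA
  mknod /dev/vgpkgA/group c 64 0x010000                  # minor number must be unique among volume groups
  vgcreate /dev/vgpkgA /dev/dsk/c0t0d0 /dev/dsk/c1t0d0
  lvcreate -L 1024 -m 1 -n lvol1 /dev/vgpkgA             # -m 1 creates one mirror copy on the other disk

Detailed procedures are given in the chapter “Building an HA Cluster Configuration.”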
Understanding Serviceguard Software Components Volume Managers for Data Storage Examples of Storage on Disk Arrays Figure 3-23 shows an illustration of storage configured on a disk array. Physical disks are configured by an array utility program into logical units or LUNs which are then seen by the operating system. Figure 3-23 Physical Disks Combined into LUNs NOTE LUN definition is normally done using utility programs provided by the disk array manufacturer.
Understanding Serviceguard Software Components Volume Managers for Data Storage Figure 3-24 Multiple Paths to LUNs Finally, the multiple paths are configured into volume groups as shown in Figure 3-25.
Understanding Serviceguard Software Components Volume Managers for Data Storage Types of Volume Manager Serviceguard allows a choice of volume managers for data storage: • HP-UX Logical Volume Manager (LVM) and (optionally) Mirrordisk/UX • VERITAS Volume Manager for HP-UX (VxVM)—Base and add-on Products • VERITAS Cluster Volume Manager for HP-UX (CVM), if available (see “About VERITAS CFS and CVM” on page 27) Separate sections in Chapters 5 and 6 explain how to configure cluster storage using all of
Understanding Serviceguard Software Components Volume Managers for Data Storage VERITAS Volume Manager (VxVM) The Base VERITAS Volume Manager for HP-UX (Base-VXVM) is provided at no additional cost with HP-UX 11i. This includes basic volume manager features, including a Java-based GUI, known as VEA. It is possible to configure cluster storage for Serviceguard with only Base-VXVM. However, only a limited set of features is available.
Understanding Serviceguard Software Components Volume Managers for Data Storage VERITAS Cluster Volume Manager (CVM) NOTE CVM (and CFS - Cluster File System) are supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability - > Serviceguard). You may choose to configure cluster storage with the VERITAS Cluster Volume Manager (CVM) instead of the Volume Manager (VxVM).
Understanding Serviceguard Software Components Volume Managers for Data Storage CVM can be used in clusters that: • run applications that require fast disk group activation after package failover; • require storage activation on more than one node at a time, for example to perform a backup from one node while a package using the volume is active on another node.
1. dual (multiple) heartbeat networks
2. single heartbeat network with standby LAN card(s)
3. single heartbeat network with APA
CVM 3.5 supports only options 2 and 3. Options 1 and 2 are the minimum recommended configurations for CVM 4.1.
Comparison of Volume Managers
The following table summarizes the advantages and disadvantages of the volume managers.
Understanding Serviceguard Software Components Volume Managers for Data Storage Table 3-5 Pros and Cons of Volume Managers with Serviceguard Product Shared Logical Volume Manager (SLVM) Base-VxVM 126 Advantages • Provided free with SGeRAC for multi-node access to RAC data • Supports up to 16 nodes in shared read/write mode for each cluster Tradeoffs • Lacks the flexibility and extended features of some other volume managers.
Understanding Serviceguard Software Components Volume Managers for Data Storage Table 3-5 Pros and Cons of Volume Managers with Serviceguard Product VERITAS Volume Manager— Full VxVM product B9116AA (VxVM 3.5) B9116BA (VxVM 4.1) Chapter 3 Advantages Tradeoffs • Disk group configuration from any node. • Requires purchase of additional license • DMP for active/active storage devices. • Cannot be used for a cluster lock • Supports exclusive activation.
Understanding Serviceguard Software Components Volume Managers for Data Storage Table 3-5 Pros and Cons of Volume Managers with Serviceguard Product VERITAS Cluster Volume Manager— B9117AA (CVM 3.5) B9117BA (CVM 4.1) Advantages • Provides volume configuration propagation. • Disk groups must be configured on a master node • Supports cluster shareable disk groups. • • Package startup time is faster than with VxVM. CVM can only be used with up to 8 cluster nodes. CFS can be used with up to 4 nodes.
Understanding Serviceguard Software Components Responses to Failures Responses to Failures Serviceguard responds to different kinds of failures in specific ways. For most hardware failures, the response is not user-configurable, but for package and service failures, you can choose the system’s response, within limits.
Understanding Serviceguard Software Components Responses to Failures 2. If the node cannot get a quorum (if it cannot get the cluster lock) then 3. The node halts (TOC). Example Situation. Assume a two-node cluster, with Package1 running on SystemA and Package2 running on SystemB. Volume group vg01 is exclusively activated on SystemA; volume group vg02 is exclusively activated on SystemB. Package IP addresses are assigned to SystemA and SystemB respectively. Failure.
Understanding Serviceguard Software Components Responses to Failures For more information on cluster failover, see the white paper Optimizing Failover Time in a Serviceguard Environment at http://www.docs.hp.com->High Availability->Serviceguard->White Papers.
Understanding Serviceguard Software Components Responses to Failures Serviceguard does not respond directly to power failures, although a loss of power to an individual cluster component may appear to Serviceguard like the failure of that component, and will result in the appropriate switching behavior. Power protection is provided by HP-supported uninterruptible power supplies (UPS), such as HP PowerTrust.
Understanding Serviceguard Software Components Responses to Failures NOTE In a very few cases, Serviceguard will attempt to reboot the system before a TOC when this behavior is specified. If there is enough time to flush the buffers in the buffer cache, the reboot succeeds, and a TOC does not take place. Either way, the system will be guaranteed to come down within a predetermined number of seconds.
4 Planning and Documenting an HA Cluster

Building a Serviceguard cluster begins with a planning phase in which you gather and record information about all the hardware and software components of the configuration.
Planning and Documenting an HA Cluster General Planning General Planning A clear understanding of your high availability objectives will help you to define your hardware requirements and design your system. Use the following questions as a guide for general planning: 1. What applications must continue to be available in the event of a failure? 2. What system resources (processing power, networking, SPU, memory, disk space) are needed to support these applications? 3.
Planning and Documenting an HA Cluster General Planning additional disk hardware for shared data storage. If you intend to expand your cluster without the need to bring it down, careful planning of the initial configuration is required. Use the following guidelines: • Remember the rules for cluster locks when considering expansion. A one-node cluster does not require a cluster lock. A two-node cluster must have a cluster lock. In clusters larger than 3 nodes, a cluster lock is strongly recommended.
Planning and Documenting an HA Cluster Hardware Planning Hardware Planning Hardware planning requires examining the physical hardware itself. One useful procedure is to sketch the hardware configuration in a diagram that shows adapter cards and buses, cabling, disks and peripherals. A sample diagram for a two-node cluster is shown in Figure 4-1. NOTE Under agile addressing, the storage units in this example would have names such as disk1, disk2, disk3, etc.
Planning and Documenting an HA Cluster Hardware Planning Create a similar sketch for your own cluster, and record the information on the Hardware Worksheet. Indicate which device adapters occupy which slots, and determine the bus address for each adapter. Update the details as you do the cluster configuration (described in Chapter 5). Use one form for each SPU.
Planning and Documenting an HA Cluster Hardware Planning Serviceguard communication relies on the exchange of DLPI (Data Link Provider Interface) traffic at the data link layer and the UDP/TCP (User Datagram Protocol/Transmission Control Protocol) traffic at the Transport layer between cluster nodes. LAN Information While a minimum of one LAN interface per subnet is required, at least two LAN interfaces, one primary and one or more standby, are needed to eliminate single points of network failure.
Planning and Documenting an HA Cluster Hardware Planning When there is a primary and a standby network card, Serviceguard needs to determine when a card has failed, so it knows whether to fail traffic over to the other card. The configuration file specifies one of two ways to decide when the network interface card has failed: • INOUT • INONLY_OR_INOUT The default is INOUT. See “Monitoring LAN Interfaces and Detecting Failure” on page 104 for more information.
Planning and Documenting an HA Cluster Hardware Planning SCSI address must be uniquely set on the interface cards in all four systems, and must be high priority addresses.
Planning and Documenting an HA Cluster Hardware Planning Disk I/O Information This part of the worksheet lets you indicate where disk device adapters are installed. Enter the following items on the worksheet for each disk connected to each disk device adapter on the node: Bus Type Indicate the type of bus. Supported busses are Fibre Channel and SCSI. Slot Number Indicate the slot number in which the interface card is inserted in the backplane of the computer.
Planning and Documenting an HA Cluster Hardware Planning Hardware Configuration Worksheet The following worksheet will help you organize and record your specific cluster hardware configuration. This worksheet is an example; blank worksheets are in Appendix F. Make as many copies as you need. Complete the worksheet and keep it for future reference.
Planning and Documenting an HA Cluster Power Supply Planning Power Supply Planning There are two sources of power for your cluster which you will have to consider in your design: line power and uninterruptible power sources (UPS). Loss of a power circuit should not bring down the cluster. Frequently, servers, mass storage devices, and other hardware have two or three separate power supplies, so they can survive the loss of power to one or more power supplies or power circuits.
Planning and Documenting an HA Cluster Power Supply Planning Other Unit Enter the number of any other unit. Power Supply Enter the power supply unit number of the UPS to which the host or other device is connected. Be sure to follow UPS and cabinet power limits as well as SPU power limits. Power Supply Configuration Worksheet The following worksheet will help you organize and record your specific power supply configuration. This worksheet is an example; blank worksheets are in Appendix F.
Planning and Documenting an HA Cluster Power Supply Planning
Unit Name __________________________   Power Supply _____________________
Unit Name __________________________   Power Supply _____________________
Planning and Documenting an HA Cluster Quorum Server Planning Quorum Server Planning The quorum server (QS) provides tie-breaking services for clusters. The QS is described in “Use of the Quorum Server as the Cluster Lock” on page 71. A quorum server: NOTE • can be used with up to 50 clusters, not exceeding 100 nodes total. • can support a cluster with any supported number of nodes.
Planning and Documenting an HA Cluster Quorum Server Planning Enter the name (39 characters or fewer) of each cluster node that will be supported by this quorum server. These entries will be entered into qs_authfile on the system that is running the quorum server process. Quorum Server Worksheet The following worksheet will help you organize and record your specific quorum server hardware configuration. Blank worksheets are in Appendix F. Make as many copies as you need.
Planning and Documenting an HA Cluster LVM Planning LVM Planning You can create storage groups using the HP-UX Logical Volume Manager (LVM), or using VERITAS VxVM software (and CVM if available) as described in the next section. When designing your disk layout using LVM, you should consider the following: • The root disk should belong to its own volume group.
Planning and Documenting an HA Cluster LVM Planning LVM Worksheet The following worksheet will help you organize and record your specific physical disk configuration. This worksheet is an example; blank worksheets are in Appendix F. Make as many copies as you need. Fill out the worksheet and keep it for future reference. This worksheet only includes volume groups and physical volumes.
Planning and Documenting an HA Cluster LVM Planning
Physical Volume Name: _____________________________________________________
Physical Volume Name: _____________________________________________________
Physical Volume Name: _____________________________________________________
Physical Volume Name: _____________________________________________________
Physical Volume Name: _____________________________________________________
Planning and Documenting an HA Cluster CVM and VxVM Planning CVM and VxVM Planning You can create storage groups using the HP-UX Logical Volume Manager (LVM, described in the previous section), or using VERITAS VxVM and CVM software. NOTE CVM (and CFS - Cluster File System) are supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability -> Serviceguard).
Planning and Documenting an HA Cluster CVM and VxVM Planning • The cluster lock disk can only be configured with an LVM volume group. • VxVM disk group names should not be entered into the cluster configuration ASCII file. These names are not inserted into the cluster configuration ASCII file by cmquerycl. CVM and VxVM Worksheet The following worksheet will help you organize and record your specific physical disk configuration. This worksheet is an example; blank worksheets are in Appendix F.
Planning and Documenting an HA Cluster Cluster Configuration Planning Cluster Configuration Planning A cluster should be designed to provide the quickest possible recovery from failures. The actual time required to recover from a failure depends on several factors: • The length of the cluster heartbeat interval and node timeout. See the parameter descriptions for HEARTBEAT_INTERVAL and NODE_TIMEOUT under “Cluster Configuration Parameters” on page 157 for recommendations.
Planning and Documenting an HA Cluster Cluster Configuration Planning Heartbeat Subnet and Re-formation Time The speed of cluster re-formation is partially dependent on the type of heartbeat network that is used. If two or more heartbeat subnets are used, the one with the fastest failover time is used. Cluster Lock Information The purpose of the cluster lock is to ensure that only one new cluster is formed in the event that exactly half of the previously clustered nodes try to form a new cluster.
Planning and Documenting an HA Cluster Cluster Configuration Planning Cluster Configuration Parameters For the operation of the cluster manager, you need to define a set of cluster parameters. These are stored in the binary cluster configuration file, which is located on all nodes in the cluster. These parameters can be entered by editing the cluster configuration template file created by issuing the cmquerycl command, as described in the chapter “Building an HA Cluster Configuration.
Planning and Documenting an HA Cluster Cluster Configuration Planning The volume group containing the physical disk volume on which a cluster lock is written. Identifying a cluster lock volume group is essential in a two-node cluster. If you are creating two cluster locks, enter the volume group name or names for both locks. This parameter is only used when you employ a lock disk for tie-breaking services in the cluster. Use FIRST_CLUSTER_LOCK_VG for the first lock volume group.
Planning and Documenting an HA Cluster Cluster Configuration Planning The use of a private heartbeat network is not advisable if you plan to use Remote Procedure Call (RPC) protocols and services. RPC assumes that each network adapter device or I/O card is connected to a route-able network. An isolated or private heartbeat LAN is not route-able, and could cause an RPC request-reply, directed to that LAN, to risk time-out without being serviced.
Planning and Documenting an HA Cluster Cluster Configuration Planning Enter the physical volume name as it appears on both nodes in the cluster (the same physical volume may have a different name on each node). If you are creating two cluster locks, enter the physical volume names for both locks. The physical volume group identifier can contain up to 39 characters.
Planning and Documenting an HA Cluster Cluster Configuration Planning • The maximum recommended value is 30,000,000 microseconds (30 seconds). Remember that a cluster reformation may result in a system halt (TOC) on one of the cluster nodes. For further discussion, see “What Happens when a Node Times Out” on page 129. There are more complex cases that require you to make a trade-off between fewer failovers and faster failovers.
Planning and Documenting an HA Cluster Cluster Configuration Planning interface to make sure it can still send and receive information. Using the default is highly recommended. Changing this value can affect how quickly a network failure is detected. The minimum value is 1,000,000 (1 second). The maximum value recommended is 15 seconds, and the maximum value supported is 30 seconds. MAX_CONFIGURED_PACKAGES This parameter sets the maximum number of packages that can be configured in the cluster.
Planning and Documenting an HA Cluster Cluster Configuration Planning to http://www.docs.hp.com/ -> High Availability and choose Serviceguard Extension for Faster Failover. NETWORK_FAILURE_DETECTION The configuration file specifies one of two ways to decide when a network interface card has failed: • INOUT • INONLY_OR_INOUT The default is INOUT. See “Monitoring LAN Interfaces and Detecting Failure” on page 104 for more information.
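For illustration, these parameters might appear in the cluster configuration ASCII file as follows. The timing values shown repeat the defaults discussed above; MAX_CONFIGURED_PACKAGES is an arbitrary example value that you should set for your own cluster:

HEARTBEAT_INTERVAL           1000000
NODE_TIMEOUT                 2000000
NETWORK_POLLING_INTERVAL     2000000
NETWORK_FAILURE_DETECTION    INOUT
MAX_CONFIGURED_PACKAGES      10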
Planning and Documenting an HA Cluster Cluster Configuration Planning
===============================================================================
Cluster Lock Volume Groups and Volumes:
===============================================================================
First Lock Volume Group:      |  Physical Volume:
                              |
________________              |  Name on Node 1: ___________________
                              |
                              |  Name on Node 2: ___________________
                              |
                              |  Disk Unit No:    ________
                              |
                              |  Power Supply No: ________
===============================================================================
Planning and Documenting an HA Cluster Package Configuration Planning Package Configuration Planning Planning for packages involves assembling information about each group of highly available services. Some of this information is used in creating the package configuration file, and some is used for editing the package control script. NOTE LVM Volume groups that are to be activated by packages must also be defined as cluster aware in the cluster configuration file.
Planning and Documenting an HA Cluster Package Configuration Planning • If a package moves to an adoptive node, what effect will its presence have on performance? Create a list by package of volume groups, logical volumes, and file systems. Indicate which nodes need to have access to common file systems at different times. It is recommended that you use customized logical volume names that are different from the default logical volume names (lvol1, lvol2, etc.).
Planning and Documenting an HA Cluster Package Configuration Planning If you have a failover package in a cluster that uses the CVM or CFS, you configure system multi-node packages to handle the volume groups and file systems. CAUTION Serviceguard manages VERITAS processes, specifically gab and LLT, through system multi-node packages. As a result, the VERITAS administration commands such as gabconfig, llthosts, and lltconfig should only be used in the display mode, such as gabconfig -a.
Planning and Documenting an HA Cluster Package Configuration Planning 1. The failover package’s applications should not run on a node unless the mount point packages are already running. In the package’s configuration file, you fill out the dependency parameter to specify the requirement SG-CFS-MP-id# =UP on the SAME_NODE. 2. The mount point packages should not run unless the disk group packages are running. Create the mount point packages using the cfsmntadm and cfsmount commands.
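For example, the dependency parameters in the failover package’s configuration file might look like the following (the mount point package name SG-CFS-MP-1 is an example; use the name Serviceguard actually assigned on your cluster):

DEPENDENCY_NAME         SG-CFS-MP-1_dep
DEPENDENCY_CONDITION    SG-CFS-MP-1 = UP
DEPENDENCY_LOCATION     SAME_NODE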
Planning and Documenting an HA Cluster Package Configuration Planning NOTE The Disk Group (DG) and Mount Point (MP) multi-node packages (SG-CFS-DG_ID# and SG-CFS-MP_ID#) do not monitor the health of the disk group and mount point. They check that the application packages that depend on them have access to the disk groups and mount points. If the dependent application package loses access and cannot read and write to the disk, it will fail, but that will not cause the DG or MP multi-node package to fail.
Planning and Documenting an HA Cluster Package Configuration Planning
RESOURCE_NAME                 /net/interfaces/lan/status/lan0
RESOURCE_POLLING_INTERVAL     60
RESOURCE_START                DEFERRED
RESOURCE_UP_VALUE             = UP

RESOURCE_NAME                 /net/interfaces/lan/status/lan1
RESOURCE_POLLING_INTERVAL     60
RESOURCE_START                DEFERRED
RESOURCE_UP_VALUE             = UP

RESOURCE_NAME                 /net/interfaces/lan/status/lan2
RESOURCE_POLLING_INTERVAL     60
RESOURCE_START                AUTOMATIC
RESOURCE_UP_VALUE             = UP

In the package control script, specify only the deferred resources, using the DEFERRED_RESOURCE_NAME parameter.
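A minimal sketch of the corresponding control script entries for the two DEFERRED resources shown above:

DEFERRED_RESOURCE_NAME[0]="/net/interfaces/lan/status/lan0"
DEFERRED_RESOURCE_NAME[1]="/net/interfaces/lan/status/lan1"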
Planning and Documenting an HA Cluster Package Configuration Planning Table 3-3 on page 87 describes different types of failover behavior and how to set the parameters that determine each behavior. Package Configuration File Parameters Before generating the package configuration file, assemble the following information and enter it on the worksheet for each package: PACKAGE_NAME The name of the package. The package name must be unique in the cluster.
Planning and Documenting an HA Cluster Package Configuration Planning node as soon as the primary node is capable of running the package and, if MIN_PACKAGE_NODE has been selected as the Package Failover Policy, the primary node is now running fewer packages than the current node. NODE_NAME The names of primary and alternate nodes for the package. Enter a node name for each node on which the package can run. The order in which you specify the node names is important.
Planning and Documenting an HA Cluster Package Configuration Planning NODE_FAIL_FAST_ENABLED If this parameter is set to YES and one of the following events occurs, Serviceguard will issue a TOC (system reset) on the node where the control script fails: NOTE • A package subnet fails and no backup network is available • An EMS resource fails • The halt script does not exist • Serviceguard is unable to execute the halt script • The halt script or the run script times out If the package halt script
Planning and Documenting an HA Cluster Package Configuration Planning RUN_SCRIPT_TIMEOUT and HALT_SCRIPT_TIMEOUT If the script has not completed by the specified timeout value, Serviceguard will terminate the script. Enter a value in seconds. The default is 0, or no timeout. The minimum is 10 seconds, but the minimum HALT_SCRIPT_TIMEOUT value must be greater than the sum of all the SERVICE_HALT_TIMEOUT values.
Planning and Documenting an HA Cluster Package Configuration Planning (CFS resources are controlled by two multi-node packages, one for the disk group and one for the mount point.) CVM (and CFS - Cluster File System) are supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability - > Serviceguard). NOTE SERVICE_NAME Enter a unique name for each service.
Planning and Documenting an HA Cluster Package Configuration Planning
SUBNET Enter the IP subnets that are to be monitored for the package.
PACKAGE_TYPE The type of the package. This parameter indicates whether the package will run on one node at a time or on multiple nodes. Valid types are FAILOVER, MULTI_NODE, and SYSTEM_MULTI_NODE. Default is FAILOVER. You cannot create a user-defined package with a type of SYSTEM_MULTI_NODE or MULTI_NODE. They are only supported for specific purposes designed by HP.
Planning and Documenting an HA Cluster Package Configuration Planning Default setting is AUTOMATIC, which means that the resource starts at the time the node joins the cluster. The other possible setting is DEFERRED, which means that the package services will start up before the resource starts. If a resource is configured with DEFERRED startup, the name of the resource has to be added to the control script’s DEFERRED_RESOURCE_NAME parameter.
Planning and Documenting an HA Cluster Package Configuration Planning
system multi-node packages that HP supplies for use with VERITAS Cluster File System on systems that support it.
DEPENDENCY_NAME - A unique identifier for the dependency
DEPENDENCY_CONDITION - pkgname = UP
DEPENDENCY_LOCATION - SAME_NODE
Package Configuration Worksheet
Assemble your package configuration data in a separate worksheet for each package, as shown in the following example. (Blank worksheets are in Appendix F.
Planning and Documenting an HA Cluster Package Configuration Planning
_______________________________________________________________________
Access Policies:
User:___any_user______    From node:___ftsys9____   Role:__package_admin____
User:___lee ron admn__    From node:__ftsys10__     Role:__package_admin____
_______________________________________________________________________
[NOTE: the following sample values assume CFS:]
DEPENDENCY_NAME _________ SG-CFS-MP-1_dep___________
DEPENDENCY_CONDITION ____SG-CFS-MP-1 =
Planning and Documenting an HA Cluster Package Configuration Planning VxVM disk groups do not allow you to select specific activation commands. The VxVM disk group activation always uses the same command. NOTE Leave the default setting or change it following the directions in the control script template. VXVOL Controls the method of mirror recovery for mirrored VxVM volumes. Volume Groups This array parameter contains a list of the LVM volume groups that will be activated by the package.
Planning and Documenting an HA Cluster Package Configuration Planning file systems and deactivates each storage group. All storage groups must be accessible on each target node. (CVM disk groups, on systems that support CVM, must be accessible on all nodes in the cluster). For each file system (FS), you must identify a logical volume (LV). A logical volume can be built on an LVM volume group, a VERITAS VxVM disk group, or a VERITAS CVM disk group on systems that support CVM.
Planning and Documenting an HA Cluster Package Configuration Planning Specifies the number of concurrent fsck commands to allow during package startup. The default is 1. (See the package control script template for more information). CONCURRENT_MOUNT_OPERATIONS Specifies the number of mounts and umounts to allow during package startup or shutdown. The default is 1. (See the package control script template for more information).
Planning and Documenting an HA Cluster Package Configuration Planning
The service name must not contain any of the following illegal characters: space, slash (/), backslash (\), and asterisk (*). All other characters are legal. The service name can contain up to 39 characters.
Service Command For each service, enter a command to run the service (see also the Service Name and Service Restart Parameter descriptions, and the examples in the package control script template).
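In the control script itself, each service is defined by a set of parallel array entries; a minimal sketch using the sample service from the worksheet that follows:

SERVICE_NAME[0]="svc1"
SERVICE_CMD[0]="/usr/bin/MySvc -f"
SERVICE_RESTART[0]="-r 2"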
Planning and Documenting an HA Cluster Package Configuration Planning The package control script will clean up the environment and undo the operations in the event of an error. See “How Package Control Scripts Work” in Chapter 3 for more information. Control Script Worksheet Assemble your package control script data in a separate worksheet for each package, as in the following example. (Blank worksheets are in Appendix F.
Planning and Documenting an HA Cluster Package Configuration Planning
IP[1] ____________________   SUBNET ________________________
================================================================================
Service Name: __svc1____   Command: ___/usr/bin/MySvc -f__   Restart: _-r 2___
Service Name: _______      Command: _________                Restart: __
Deferred Resource Name __________________
Building an HA Cluster Configuration 5 Building an HA Cluster Configuration This chapter and the next take you through the configuration tasks required to set up a Serviceguard cluster. These procedures are carried out on one node, called the configuration node, and the resulting binary file is distributed by Serviceguard to all the nodes in the cluster. In the examples in this chapter, the configuration node is named ftsys9, and the sample target node is called ftsys10.
Building an HA Cluster Configuration Preparing Your Systems Preparing Your Systems Before configuring your cluster, ensure that all cluster nodes possess the appropriate security files, kernel configuration, and NTP (network time protocol) configuration. Understanding Where Files Are Located Serviceguard uses a special file, /etc/cmcluster.conf, to define the locations for configuration and log files within the HP-UX filesystem.
Building an HA Cluster Configuration Preparing Your Systems NOTE Do not edit the /etc/cmcluster.conf configuration file. Editing Security Files Serviceguard daemons grant access to commands by matching incoming hostname and username against the access control policies you define. Serviceguard nodes can communicate over any of the cluster’s shared networks, so all their primary addresses on each of those networks must be identified.
Building an HA Cluster Configuration Preparing Your Systems
15.145.162.150    bit.uksr.hp.com    bit

NOTE  Serviceguard recognizes only the hostname (the first element) in a fully qualified domain name (a name with four elements separated by periods, like those in the example above). This means, for example, that gryf.uksr.hp.com and gryf.cup.hp.com cannot be nodes in the same cluster, as they would both be treated as the same host gryf.
Building an HA Cluster Configuration Preparing Your Systems
If you need to disable identd
You can configure Serviceguard not to use identd.

CAUTION  This is not recommended. Consult the white paper Securing Serviceguard at http://docs.hp.com -> High Availability -> Serviceguard -> White Papers for more information.

If you must disable identd, you can do so by adding the -i option to the tcp hacl-cfg and hacl-probe commands in /etc/inetd.conf. For example:
1. Change the cmclconfd entry in /etc/inetd.conf.
Building an HA Cluster Configuration Preparing Your Systems Access Roles Serviceguard access control policies define what a user on a remote node can do on the local node. Serviceguard recognizes two levels of access, root and non-root: • Root Access: Users authorized for root access have total control over the configuration of the cluster and packages. These users have full operating-system-level root privileges for the node, the same privileges as the local root user.
Building an HA Cluster Configuration Preparing Your Systems NOTE When you upgrade a cluster from Version A.11.15 or earlier, entries in $SGCONF/cmclnodelist are automatically updated into Access Control Policies in the cluster configuration file. All non-root user-hostname pairs are assigned the role of Monitor (view only). Serviceguard uses different mechanisms for access control depending on whether the node is configured into a cluster or not.
Building an HA Cluster Configuration Preparing Your Systems
For example:
gryf   root    #cluster1, node1
gryf   user1   #cluster1, node1
sly    root    #cluster1, node2
sly    user1   #cluster1, node2
bit    root    #Administration/COM Server

Users with root access can use any cluster configuration commands. Users with non-root access are assigned the Monitor role, giving them read-only access to the node’s configuration.
Building an HA Cluster Configuration Preparing Your Systems
• USER_NAME can either be ANY_USER, or a maximum of 8 login names from the /etc/passwd file on USER_HOST. The names must be separated by spaces or tabs, for example:

  # Policy 1:
  USER_NAME   john fred patrick
  USER_HOST   bit
  USER_ROLE   PACKAGE_ADMIN

• USER_HOST is the node where USER_NAME will issue Serviceguard commands. If you are using the management-station version of Serviceguard Manager, USER_HOST is the COM server.
Building an HA Cluster Configuration Preparing Your Systems If this policy is defined in the cluster configuration file, it grants user john the PACKAGE_ADMIN role for any package on node bit. User john also has the MONITOR role for the entire cluster, because PACKAGE_ADMIN includes MONITOR. If the policy is defined in the package configuration file for PackageA, then user john on node bit has the PACKAGE_ADMIN role only for PackageA.
Building an HA Cluster Configuration Preparing Your Systems Plan the cluster’s roles and validate them as soon as possible. If your organization’s security policies allow it, you may find it easiest to create group logins. For example, you could create a MONITOR role for user operator1 from ANY_CLUSTER_NODE. Then you could give this login name and password to everyone who will need to monitor your clusters.
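For example, such a group login could be given cluster-wide read-only access with an entry like this in the cluster configuration file (operator1 is the hypothetical login name used above):

USER_NAME     operator1
USER_HOST     ANY_CLUSTER_NODE
USER_ROLE     MONITOR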
Building an HA Cluster Configuration Preparing Your Systems NOTE If a NIC fails, the affected node will be able to fail over to a standby LAN so long as the node is running in the cluster. But if a NIC that is used by Serviceguard fails when the affected node is not running in the cluster, Serviceguard will not be able to restart the node. (For instructions on replacing a failed NIC, see “Replacing LAN or Fibre Channel Cards” on page 354.) 1. Edit the /etc/hosts file on all nodes in the cluster.
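For example, the /etc/hosts entries for a two-node cluster might look like the following; the IP addresses shown are placeholders, and each heartbeat and stationary IP address used by the cluster should have a similar entry:

15.145.162.131    ftsys9.cup.hp.com     ftsys9
15.145.162.132    ftsys10.cup.hp.com    ftsys10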
Building an HA Cluster Configuration Preparing Your Systems
files [NOTFOUND=continue UNAVAIL=continue] dns [NOTFOUND=return UNAVAIL=return]
or
files [NOTFOUND=continue UNAVAIL=continue] nis [NOTFOUND=return UNAVAIL=return]

This step is critical, allowing the cluster nodes to resolve hostnames to IP addresses while DNS, NIS, or the primary LAN is down.
4.
Building an HA Cluster Configuration Preparing Your Systems 4. Mirror the boot, primary swap, and root logical volumes to the new bootable disk. Ensure that all devices in vg00, such as /usr, /swap, etc., are mirrored. NOTE The boot, root, and swap logical volumes must be done in exactly the following order to ensure that the boot volume occupies the first contiguous set of extents on the new disk, followed by the swap and the root.
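A sketch of the mirroring commands, in the required order and assuming the standard vg00 layout in which lvol1 is the boot volume, lvol2 is primary swap, and lvol3 is the root volume (the target disk c4t6d0 is an example only):

# lvextend -m 1 /dev/vg00/lvol1 /dev/dsk/c4t6d0
# lvextend -m 1 /dev/vg00/lvol2 /dev/dsk/c4t6d0
# lvextend -m 1 /dev/vg00/lvol3 /dev/dsk/c4t6d0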
Building an HA Cluster Configuration Preparing Your Systems Choosing Cluster Lock Disks The following guidelines apply if you are using a lock disk. The cluster lock disk is configured on a volume group that is physically connected to all cluster nodes. This volume group may also contain data that is used by packages. When you are using dual cluster lock disks, it is required that the default IO timeout values are used for the cluster lock physical volumes.
Building an HA Cluster Configuration Preparing Your Systems Ensuring Consistency of Kernel Configuration Make sure that the kernel configurations of all cluster nodes are consistent with the expected behavior of the cluster during failover. In particular, if you change any kernel parameters on one cluster node, they may also need to be changed on other cluster nodes that can run the same packages.
Building an HA Cluster Configuration Preparing Your Systems Serviceguard has also been tested with non-default values for these two network parameters: • ip6_nd_dad_solicit_count - This network parameter enables the Duplicate Address Detection feature for IPv6 address. For more information, see “IPv6 Relocatable Address and Duplicate Address Detection Feature” on page 468 of this manual.
Building an HA Cluster Configuration Preparing Your Systems If you are planning to add a node online, and a package will run on the new node, ensure that any existing cluster bound volume groups for the package have been imported to the new node. Also, ensure that the MAX_CONFIGURED_PACKAGES parameter is set high enough to accommodate the total number of packages you will be using.
Building an HA Cluster Configuration Setting up the Quorum Server Setting up the Quorum Server The quorum server software, which has to be running during cluster configuration, must be installed on a system other than the nodes on which your cluster will be running. NOTE It is recommended that the node on which the quorum server is running be in the same subnet as the clusters for which it is providing services. This will help prevent any network delays which could affect quorum server operation.
Building an HA Cluster Configuration Setting up the Quorum Server To allow access by all nodes, enter the plus character (+) on its own line. Running the Quorum Server The quorum server must be running during the following cluster operations: • when the cmquerycl command is issued. • when the cmapplyconf command is issued. • when there is a cluster re-formation. By default, quorum server run-time messages go to stdout and stderr.
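As an illustration, the authorization file on the quorum server system may contain either specific node names or the wildcard, and the quorum server is normally started (and respawned) from /etc/inittab. The file locations and inittab entry below reflect a typical default installation and should be checked against the Quorum Server release notes for your version:

# $SGCONF/qs_authfile (for example, /etc/cmcluster/qs_authfile):
gryf.uksr.hp.com
sly.uksr.hp.com

# /etc/inittab entry on the quorum server system:
qs:345:respawn:/usr/lbin/qs >> /var/adm/qs/qs.log 2>&1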
Building an HA Cluster Configuration Installing and Updating Serviceguard Installing and Updating Serviceguard For information about installing Serviceguard, see the Release Notes for your version at http://docs.hp.com -> High Availability -> Serviceguard -> Release Notes. For information about installing and updating HP-UX, see the HP-UX Installation and Update Guide for the version you need: go to http://docs.hp.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with LVM and VxVM Creating the Storage Infrastructure and Filesystems with LVM and VxVM In addition to configuring the cluster, you create the appropriate logical volume infrastructure to provide access to data from different nodes.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with LVM and VxVM The Event Monitoring Service HA Disk Monitor provides the capability to monitor the health of LVM disks. If you intend to use this monitor for your mirrored disks, you should configure them in physical volume groups. For more information, refer to the manual Using High Availability Monitors (http://docs.hp.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with LVM and VxVM In the following examples, we use /dev/rdsk/c1t2d0 and /dev/rdsk/c0t2d0, which happen to be the device names for the same disks on both ftsys9 and ftsys10. In the event that the device file names are different on the different nodes, make a careful note of the correspondences.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with LVM and VxVM # ls -l /dev/*/group 3. Create the volume group and add physical volumes to it with the following commands: # vgcreate -g bus0 /dev/vgdatabase /dev/dsk/c1t2d0 # vgextend -g bus1 /dev/vgdatabase /dev/dsk/c0t2d0 The first command creates the volume group and adds a physical volume to it in a physical volume group called bus0.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with LVM and VxVM Note the mount command uses the block device file for the logical volume. 4. Verify the configuration: # vgdisplay -v /dev/vgdatabase Distributing Volume Groups to Other Nodes After creating volume groups for cluster data, you must make them available to any cluster node that will need to activate the volume group. The cluster lock volume group must be made available to all nodes.
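The usual sequence (a sketch; the map file name is arbitrary) is to deactivate the volume group on the configuration node, export its definition in shared mode to a map file using the preview option, and copy the map file to the other node:

# vgchange -a n /dev/vgdatabase
# vgexport -p -s -m /tmp/vgdatabase.map /dev/vgdatabase
# rcp /tmp/vgdatabase.map ftsys10:/tmp/vgdatabase.map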
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with LVM and VxVM 4. Still on ftsys10, create a control file named group in the directory /dev/vgdatabase, as follows: # mknod /dev/vgdatabase/group c 64 0xhh0000 Use the same minor number as on ftsys9. Use the following command to display a list of existing volume groups: # ls -l /dev/*/group 5. Import the volume group data using the map file from node ftsys9. On node ftsys10, enter: # vgimport -s -m /tmp/vgdatabase.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with LVM and VxVM reflects the contents of all physical volume groups on that node. See the following section, “Making Physical Volume Group Files Consistent.” 7. Make sure that you have deactivated the volume group on ftsys9. Then enable the volume group on ftsys10: # vgchange -a y /dev/vgdatabase 8. Create a directory to mount the disk: # mkdir /mnt1 9.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with LVM and VxVM 3. If /etc/lvmpvg on ftsys10 contains entries for volume groups that do not appear in /etc/lvmpvg.new, then copy all physical volume group entries for that volume group to /etc/lvmpvg.new. 4. Adjust any physical volume names in /etc/lvmpvg.new to reflect their correct names on ftsys10. 5. On ftsys10, copy /etc/lvmpvg to /etc/lvmpvg.old to create a backup. Copy /etc/lvmvpg.new to /etc/lvmpvg on ftsys10.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with LVM and VxVM Initializing the Veritas Cluster Volume Manager 3.5 NOTE CVM (and CFS - Cluster File System) are supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability -> Serviceguard). With CVM 3.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with LVM and VxVM Initializing Disks for VxVM You need to initialize the physical disks that will be employed in VxVM disk groups.
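For example, a disk can be initialized for VxVM use with the vxdisksetup utility (the device name is a placeholder):

# /usr/lib/vxvm/bin/vxdisksetup -i c0t3d2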
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with LVM and VxVM
NAME        STATE       ID
rootdg      enabled     971995699.1025.node1
logdata     enabled     972078742.1084.node1

Creating Volumes
Use the vxassist command to create logical volumes. The following is an example:
# vxassist -g logdata make log_files 1024m
This command creates a 1024 MB volume named log_files in a disk group named logdata.
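Before the volume is mounted in the steps that follow, a VxFS file system is typically created on it; a minimal sketch using the raw device file for the volume:

# newfs -F vxfs /dev/vx/rdsk/logdata/log_files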
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with LVM and VxVM # mkdir /logs 3. Mount the volume: # mount /dev/vx/dsk/logdata/log_files /logs 4.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with LVM and VxVM Clearimport at System Reboot Time At system reboot time, the cmcluster RC script does a vxdisk clearimport on all disks formerly imported by the system, provided they have the noautoimport flag set, and provided they are not currently imported by another running node.
Building an HA Cluster Configuration Configuring the Cluster Configuring the Cluster This section describes how to define the basic cluster configuration. This must be done on a system that is not part of a Serviceguard cluster (that is, on which Serviceguard is installed but not configured). • To use Serviceguard Manager to configure a cluster, open the System Management Homepage (SMH) and choose Tools->Serviceguard Manager. See “Using Serviceguard Manager” on page 28 for more information.
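From the command line, the template is generated with cmquerycl; a typical invocation, using the node names and file name assumed throughout this chapter, is:

# cmquerycl -v -C /etc/cmcluster/clust1.config -n ftsys9 -n ftsys10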
Building an HA Cluster Configuration Configuring the Cluster
-w full lets you specify full network probing, in which actual connectivity is verified among all LAN interfaces on all nodes in the cluster. This is the default.
-w none skips network querying. If you have recently checked the networks, this option will save time.
For more details, see the cmquerycl(1m) man page. The example above creates an ASCII template file, by default /etc/cmcluster/clust1.config.
Building an HA Cluster Configuration Configuring the Cluster To display the failover times of disks, use the cmquerycl command, specifying all the nodes in the cluster. The output of the command lists the disks connected to each node together with the re-formation time associated with each. Do not include the node’s entire domain name; for example, specify ftsys9, not ftsys9.cup.hp.com.
Building an HA Cluster Configuration Configuring the Cluster Specifying a Quorum Server To specify a quorum server instead of a lock disk, use the -q option of the cmquerycl command, specifying a Quorum Server host server. Example: # cmquerycl -n ftsys9 -n ftsys10 -q qshost The cluster ASCII file that is generated in this case contains parameters for defining the quorum server. This portion of the file is shown below: # Quorum Server Parameters.
Building an HA Cluster Configuration Configuring the Cluster Identifying Heartbeat Subnets The cluster ASCII file includes entries for IP addresses on the heartbeat subnet. It is recommended that you use a dedicated heartbeat subnet, but it is possible to configure heartbeat on other subnets as well, including the data subnet. The heartbeat must be on an IPv4 subnet and must employ IPv4 addresses. IPv6 heartbeat is not supported. NOTE If you are using Version 3.
Building an HA Cluster Configuration Configuring the Cluster The default value of 2 seconds for NODE_TIMEOUT leads to a best case failover time of 30 seconds. If NODE_TIMEOUT is changed to 10 seconds, which means that the cluster manager waits 5 times longer to timeout a node, the failover time is increased by 5, to approximately 150 seconds. NODE_TIMEOUT must be at least 2*HEARTBEAT_INTERVAL. A good rule of thumb is to have at least two or three heartbeats within one NODE_TIMEOUT.
Building an HA Cluster Configuration Configuring the Cluster For more information, see “Access Roles” on page 192 and “Editing Security Files” on page 189. Adding Volume Groups Add any LVM volume groups you have configured to the ASCII cluster configuration file, with a separate VOLUME_GROUP parameter for each cluster-aware volume group that will be used in the cluster. These volume groups will be initialized with the cluster ID when the cmapplyconf command is used.
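For example, using the volume group created earlier in this chapter:

VOLUME_GROUP    /dev/vgdatabase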
Building an HA Cluster Configuration Configuring the Cluster • Existence and permission of scripts specified in the command line. • If all nodes specified are in the same heartbeat subnet. • If you specify the wrong configuration filename. • If all nodes can be accessed. • No more than one CLUSTER_NAME, HEARTBEAT_INTERVAL, and AUTO_START_TIMEOUT are specified. • The value for package run and halt script timeouts is less than 4294 seconds.
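To run these checks against the edited template, a typical command (using the file name from the earlier example) is:

# cmcheckconf -v -C /etc/cmcluster/clust1.config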
Building an HA Cluster Configuration Configuring the Cluster Distributing the Binary Configuration File After specifying all cluster parameters, you apply the configuration. This action distributes the binary configuration file to all the nodes in the cluster. We recommend doing this separately before you configure packages (described in the next chapter).
Building an HA Cluster Configuration Configuring the Cluster The cmapplyconf command creates a binary version of the cluster configuration file and distributes it to all nodes in the cluster. This action ensures that the contents of the file are consistent across all nodes. Note that the cmapplyconf command does not distribute the ASCII configuration file. NOTE The apply will not complete unless the cluster lock volume group is activated on exactly one node before applying.
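A sketch of the sequence when a lock disk is used; the lock volume group name vglock is an example, and as noted above it must be active on exactly one node when the configuration is applied:

# vgchange -a y /dev/vglock
# cmapplyconf -v -C /etc/cmcluster/clust1.config
# vgchange -a n /dev/vglock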
Building an HA Cluster Configuration Creating a Storage Infrastructure with VERITAS Cluster File System (CFS) Creating a Storage Infrastructure with VERITAS Cluster File System (CFS) NOTE CFS (and CVM - Cluster Volume Manager) are supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability -> Serviceguard).
Building an HA Cluster Configuration Creating a Storage Infrastructure with VERITAS Cluster File System (CFS) Preparing the Cluster and the System Multi-node Package 1. First, be sure the cluster is running: # cmviewcl 2. If it is not, start it: # cmruncl 3. If you have not initialized your disk groups, or if you have an old install that needs to be re-initialized, use the vxinstall command to initialize VxVM/CVM disk groups. See “Initializing the VERITAS Volume Manager” on page 244. 4.
Building an HA Cluster Configuration Creating a Storage Infrastructure with VERITAS Cluster File System (CFS)
# cfscluster config -t 900 -s
5. Verify the system multi-node package is running and CVM is up, using the cmviewcl or cfscluster command. Following is an example of using the cfscluster command.
Building an HA Cluster Configuration Creating a Storage Infrastructure with VERITAS Cluster File System (CFS)
# cfscluster status

  Node             :  ftsys9
  Cluster Manager  :  up
  CVM state        :  up (MASTER)
  MOUNT POINT      TYPE        SHARED VOLUME     DISK GROUP     STATUS

  Node             :  ftsys10
  Cluster Manager  :  up
  CVM state        :  up
  MOUNT POINT      TYPE        SHARED VOLUME     DISK GROUP     STATUS

NOTE  Because the CVM 4.
Building an HA Cluster Configuration Creating a Storage Infrastructure with VERITAS Cluster File System (CFS) NOTE If you want to create a cluster with CVM only - without CFS, stop here. Then, in your application package’s configuration file, add the dependency triplet, with DEPENDENCY_CONDITION set to SG-DG-pkg-id#=UP and LOCATION set to SAME_NODE. For more information about the DEPENDENCY parameter, see “Package Configuration File Parameters” on page 171. Creating the Disk Group Cluster Packages 1.
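The disk group is added to the cluster configuration with the cfsdgadm command, which creates and configures the corresponding SG-CFS-DG-id# multi-node package; a sketch for the logdata disk group used in these examples, activated in shared-write mode on all nodes (verify the exact syntax against the cfsdgadm man page for your release):

# cfsdgadm add logdata all=sw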
Building an HA Cluster Configuration Creating a Storage Infrastructure with VERITAS Cluster File System (CFS) 5. To view the package name that is monitoring a disk group, use the cfsdgadm show_package command: # cfsdgadm show_package logdata sg_cfs_dg-1 Creating Volumes 1. Make log_files volume on the logdata disk group: # vxassist -g logdata make log_files 1024m 2.
Building an HA Cluster Configuration Creating a Storage Infrastructure with VERITAS Cluster File System (CFS) non-cfs commands could cause conflicts with subsequent command operations on the file system or Serviceguard packages. Use of these other forms of mount will not create an appropriate multi-node package which means that the cluster packages are not aware of the file system changes. NOTE The disk group and mount point multi-node packages do not monitor the health of the disk group and mount point.
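For reference, the cluster mount shown in the status output below is created and mounted with the cfs commands; a sketch using the disk group and volume from these examples (all=rw requests read-write access on all nodes; check the cfsmntadm man page for the exact syntax on your release):

# cfsmntadm add logdata log_files /tmp/logdata/log_files all=rw
# cfsmount /tmp/logdata/log_files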
Building an HA Cluster Configuration Creating a Storage Infrastructure with VERITAS Cluster File System (CFS)
PACKAGE           STATUS       STATE        AUTO_RUN      SYSTEM
SG-CFS-pkg        up           running      enabled       yes
SG-CFS-DG-1       up           running      enabled       no
SG-CFS-MP-1       up           running      enabled       no

# ftsys9/etc/cmcluster/cfs> bdf
Filesystem                       kbytes    used     avail    %used   Mounted on
/dev/vx/dsk/logdata/log_files    1048576   17338    966793   2%      /tmp/logdata/log_files

# ftsys10/etc/cmcluster/cfs> bdf
Filesystem                       kbytes    used     avail    %used   Mounted on
/dev/vx/dsk/logdata/log_files    1048576   17338    966793   2%      /tmp/logdata/log_files
Building an HA Cluster Configuration Creating a Storage Infrastructure with VERITAS Cluster File System (CFS) Mount Point Packages for Storage Checkpoints The VERITAS File System provides a unique storage checkpoint facility which quickly creates a persistent image of a filesystem at an exact point in time. Storage checkpoints significantly reduce I/O overhead by identifying and maintaining only the filesystem blocks that have changed since the last storage checkpoint or backup.
Building an HA Cluster Configuration Creating a Storage Infrastructure with VERITAS Cluster File System (CFS)
CLUSTER           STATUS
cfs-cluster       up

NODE              STATUS       STATE
ftsys9            up           running
ftsys10           up           running

MULTI_NODE_PACKAGES

PACKAGE           STATUS       STATE        AUTO_RUN      SYSTEM
SG-CFS-pkg        up           running      enabled       yes
SG-CFS-DG-1       up           running      enabled       no
SG-CFS-MP-1       up           running      enabled       no
SG-CFS-CK-1       up           running      disabled      no

/tmp/check_logfiles now contains a point in time view of /tmp/logdata/log_files, and it is persistent.
Building an HA Cluster Configuration Creating a Storage Infrastructure with VERITAS Cluster File System (CFS) operations can be performed from that node. The snapshot of a cluster file system is accessible only on the node where it is created; the snapshot file system itself cannot be cluster mounted. For details on creating snapshots on cluster file systems, see the VERITAS Storage Foundation Cluster File System Installation and Administration Guide posted at http://docs.hp.com:.
Building an HA Cluster Configuration Creating a Storage Infrastructure with VERITAS Cluster File System (CFS)
# bdf
Filesystem             kbytes    used      avail     %used   Mounted on
/dev/vg00/lvol3        544768    352233    180547    66%     /
/dev/vg00/lvol1        307157    80196     196245    29%     /stand
/dev/vg00/lvol5        1101824   678426    397916    63%     /var
/dev/vg00/lvol7        2621440   1702848   861206    66%     /usr
/dev/vg00/lvol4        4096      707       3235      18%     /tmp
/dev/vg00/lvol6        2367488   1718101   608857    74%     /opt
/dev/vghome/varopt     4194304   258609    3689741   7%      /var/opt
/dev/vghome/home       2097152   17167     1949993   1%      /home
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with VERITAS Cluster Volume Manager (CVM) Creating the Storage Infrastructure and Filesystems with VERITAS Cluster Volume Manager (CVM) NOTE CVM (and CFS - Cluster File System) are supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability - > Serviceguard).
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with VERITAS Cluster Volume Manager (CVM) Separate procedures are given below for: • Initializing the Volume Manager • Preparing the Cluster for Use with CVM • Creating Disk Groups for Shared Storage • Creating File Systems with CVM For more information, including details about configuration of plexes (mirrors), multipathing, and RAID, refer to the HP-UX documentation for the VERITAS Volume Manager.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with VERITAS Cluster Volume Manager (CVM) Preparing the Cluster for Use with CVM In order to use the VERITAS Cluster Volume Manager (CVM), you need a cluster that is running with a Serviceguard-supplied CVM system multi-node package. This means that the cluster must already be configured and running before you create disk groups.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with VERITAS Cluster Volume Manager (CVM) • VERITAS CVM 3.5: # cmapplyconf -P /etc/cmcluster/cvm/VxVM-CVM-pkg.conf • VERITAS CVM 4.1: If you are not using VERITAS Cluster File System, use the cmapplyconf command. (If you are using CFS, you will set up CVM as part of the CFS components.): # cmapplyconf -P /etc/cmcluster/cfs/SG-CFS-pkg.conf Begin package verification ...
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with VERITAS Cluster Volume Manager (CVM) # vxdctl -c mode One node will identify itself as the master. Create disk groups from this node. Initializing Disks for CVM You need to initialize the physical disks that will be employed in CVM disk groups.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with VERITAS Cluster Volume Manager (CVM) # vxassist -g logdata make log_files 1024m This command creates a 1024 MB volume named log_files in a disk group named logdata. The volume can be referenced with the block device file /dev/vx/dsk/logdata/log_files or the raw (character) device file /dev/vx/rdsk/logdata/log_files.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with VERITAS Cluster Volume Manager (CVM) You also need to identify the CVM disk groups, filesystems, logical volumes, and mount options in the package control script. The package configuration process is described in detail in Chapter 6.
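A sketch of the corresponding control script entries for the disk group and file system created above; the variable names follow the standard package control script template and the /logs mount point used earlier, but should be verified against the template shipped with your Serviceguard version:

CVM_DG[0]="logdata"
LV[0]="/dev/vx/dsk/logdata/log_files"
FS[0]="/logs"
FS_MOUNT_OPT[0]="-o rw"
FS_TYPE[0]="vxfs"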
Building an HA Cluster Configuration Using DSAU during Configuration Using DSAU during Configuration As explained under “What are the Distributed Systems Administration Utilities?” on page 32, you can use DSAU to centralize and simplify configuration and monitoring tasks. See the Distributed Systems Administration Utilities User’s Guide posted at http://docs.hp.com.
Building an HA Cluster Configuration Managing the Running Cluster Managing the Running Cluster This section describes some approaches to routine management of the cluster. Additional tools and suggestions are found in Chapter 7, “Cluster and Package Maintenance.” Checking Cluster Operation with Serviceguard Manager You can check configuration and status information using Serviceguard Manager: from the System Management Homepage (SMH), choose Tools-> Serviceguard Manager.
Building an HA Cluster Configuration Managing the Running Cluster You can use these commands to test cluster operation, as in the following: 1. If the cluster is not already online, start it. From the Serviceguard Manager menu, choose Run Cluster. From the command line, use cmruncl -v. By default, cmruncl will check the networks. Serviceguard will probe the actual network configuration with the network information in the cluster configuration.
Building an HA Cluster Configuration Managing the Running Cluster Preventing Automatic Activation of LVM Volume Groups It is important to prevent LVM volume groups that are to be used in packages from being activated at system boot time by the /etc/lvmrc file. One way to ensure that this does not happen is to edit the /etc/lvmrc file on all nodes, setting AUTO_VG_ACTIVATE to 0, then including all the volume groups that are not cluster-bound in the custom_vg_activation function.
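For illustration, the relevant portion of /etc/lvmrc might be edited as follows; vglocal is a placeholder for a local, non cluster-bound volume group, and the exact contents of custom_vg_activation vary with your local configuration:

AUTO_VG_ACTIVATE=0

custom_vg_activation()
{
        # Activate and resynchronize only the local (non cluster-bound)
        # volume groups here; cluster-bound volume groups are activated
        # by package control scripts instead.
        /sbin/vgchange -a y /dev/vglocal
        parallel_vg_sync "/dev/vglocal"
        return 0
}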
Building an HA Cluster Configuration Managing the Running Cluster
To enable automatic cluster start, set the flag AUTOSTART_CMCLD to 1 in the /etc/rc.config.d/cmcluster file on each node in the cluster; the nodes will then join the cluster at boot time. Here is an example of the /etc/rc.config.d/cmcluster file:
#************************ CMCLUSTER ************************
# Highly Available Cluster configuration
#
# @(#) $Revision: 72.
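The flag itself appears near the end of that file; with automatic cluster startup enabled, the line reads:

AUTOSTART_CMCLD=1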
Building an HA Cluster Configuration Managing the Running Cluster Managing a Single-Node Cluster The number of nodes you will need for your Serviceguard cluster depends on the processing requirements of the applications you want to protect. You may want to configure a single-node cluster to take advantage of Serviceguard’s network failure protection. In a single-node cluster, a cluster lock is not required, since there is no other node in the cluster.
Building an HA Cluster Configuration Managing the Running Cluster Deleting the Cluster Configuration With root login, you can delete a cluster configuration from all cluster nodes by using Serviceguard Manager, or on the command line. The cmdeleteconf command prompts for a verification before deleting the files unless you use the -f option. You can only delete the configuration when the cluster is down.
Configuring Packages and Their Services 6 Configuring Packages and Their Services Serviceguard packages group together applications and the services and resources they depend on. The typical Serviceguard package is a failover package that starts on one node but can be moved (“failed over”) to another if necessary. See “What is Serviceguard?” on page 24, “How the Package Manager Works” on page 74, and “Package Configuration Planning” on page 165 for more information.
Configuring Packages and Their Services Creating the Package Configuration Creating the Package Configuration The package configuration process defines a set of application services that are run by the package manager when a package starts up on a node in the cluster. The configuration also includes a prioritized list of cluster nodes on which the package can run together with definitions of the acceptable types of failover allowed for the package.
Configuring Packages and Their Services Creating the Package Configuration Configuring System Multi-node Packages There are two system multi-node packages that regulate VERITAS CVM Cluster Volume Manager. These packages ship with the Serviceguard product. There are two versions of the package files: VxVM-CVM-pkg for CVM Version 3.5, and SG-CFS-pkg for CVM Version 4.1. NOTE CVM (and CFS - Cluster File System) are supported on some, but not all current releases of HP-UX.
Configuring Packages and Their Services Creating the Package Configuration The CFS admin commands are listed in Appendix A. Configuring Multi-node Packages There are two types of multi-node packages that work with the VERITAS cluster file system: SG-CFS-DG-id# for disk groups, which you configure with the cfsdgadm command, and SG-CFS-MP-id# for mount points, which you configure with the cfsmntadm command. Each package name will have a unique number, appended by Serviceguard at creation.
Configuring Packages and Their Services Creating the Package Configuration the dependent application package loses access and cannot read and write to the disk, it will fail; however that will not cause the DG or MP multi-node package to fail. NOTE Do not create or edit ASCII configuration files for the Serviceguard supplied packages VxVM-CVM-pkg, SG-CFS-pkg, SG-CFS-DG-id#, or SG-CFS-MP-id#. Create VxVM-CVM-pkg and SG-CFS-pkg by issuing the cmapplyconf command.
Configuring Packages and Their Services Creating the Package Configuration
2. Distribute the control script to all nodes.
3. Apply the configuration.
4. Run the package and ensure that it can be moved from node to node.
5. Halt the package.
6. Configure package IP addresses and application services in the control script.
7. Distribute the control script to all nodes.
8. Run the package and ensure that applications run as expected and that the package fails over correctly when services are disrupted.
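The template excerpt that follows is generated with the cmmakepkg command; a typical sequence (the directory and file names are examples) is:

# mkdir /etc/cmcluster/pkg1
# cmmakepkg -p /etc/cmcluster/pkg1/pkg1.config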
Configuring Packages and Their Services Creating the Package Configuration
# whether this package is to run as a FAILOVER, MULTI_NODE, or
# SYSTEM_MULTI_NODE package.
#
# FAILOVER package runs on one node at a time and if a failure
# occurs it can switch to an alternate node.
#
# MULTI_NODE package runs on multiple nodes at the same time and
# can be independently started and halted on
# individual nodes.
Configuring Packages and Their Services Creating the Package Configuration
#
# PACKAGE_TYPE
#
PACKAGE_TYPE                    FAILOVER

# Enter the failover policy for this package. This policy will be used
# to select an adoptive node whenever the package needs to be started.
# The default policy unless otherwise specified is CONFIGURED_NODE.
# This policy will select nodes in priority order from the list of
# NODE_NAME entries specified below.
#
# The alternative policy is MIN_PACKAGE_NODE.
Configuring Packages and Their Services Creating the Package Configuration
#
# Example : NODE_NAME *

NODE_NAME

# Enter the value for AUTO_RUN. Possible values are YES and NO.
# The default for AUTO_RUN is YES. When the cluster is started the
# package will be automatically started. In the event of a failure the
# package will be started on an adoptive node. Adjust as necessary.
#
# AUTO_RUN replaces obsolete PKG_SWITCHING_ENABLED.

AUTO_RUN                        YES
Configuring Packages and Their Services Creating the Package Configuration
# is NO_TIMEOUT. Adjust the timeouts as necessary to permit full
# execution of each script.
#
# Note: The HALT_SCRIPT_TIMEOUT should be greater than the sum of
# all SERVICE_HALT_TIMEOUT values specified for all services.
#
# The file where the output of the scripts is logged can be specified
# via the SCRIPT_LOG_FILE parameter. If not set, script output is sent
# to a file named by appending '.log' to the script path.
Configuring Packages and Their Services Creating the Package Configuration
#
# The syntax is: pkgname = UP, where
# pkgname is the name of a multi-node or system multi-node package.
#
# DEPENDENCY_LOCATION
# This describes where the condition must be satisfied.
# The only possible value for this attribute is SAME_NODE
#
# NOTE:
# Dependencies should be used only for a CFS cluster, or by
# applications specified by Hewlett-Packard.
Configuring Packages and Their Services Creating the Package Configuration # out the SIGKILL signal to the service to force its termination. # This timeout value should be large enough to allow all cleanup # processes associated with the service to complete. If the # SERVICE_HALT_TIMEOUT is not specified, a zero timeout will be # assumed, meaning the cluster software will not wait at all # before sending the SIGKILL signal to halt the service.
Configuring Packages and Their Services Creating the Package Configuration
# should specify all the DEFERRED resources in the package run script
# so that these DEFERRED resources will be started up from the package
# run script during package run time.
#
# RESOURCE_UP_VALUE requires an operator and a value. This defines
# the resource 'UP' condition. The operators are =, !=, >, <, >=,
# and <=, depending on the type of value.
Configuring Packages and Their Services Creating the Package Configuration
#
#RESOURCE_NAME
#RESOURCE_POLLING_INTERVAL
#RESOURCE_START
#RESOURCE_UP_VALUE
#
# Access Control Policy Parameters.
#
# Three entries set the access control policy for the package:
# First line must be USER_NAME, second USER_HOST, and third USER_ROLE.
# Enter a value after each.
#
# 1.
Configuring Packages and Their Services Creating the Package Configuration On systems that support VERITAS Cluster File System (CFS) and Cluster Volume Manager (CVM), do not create or edit ASCII configuration files for the Serviceguard supplied packages VxVM-CVM-pkg, SG-CFS-pkg, SG-CFS-DG-id#, or SG-CFS-MP-id#. Create VxVM-CVM-pkg and SG-CFS-pkg by issuing the cmapplyconf command. Create and modify SG-CFS-DG-id# and SG-CFS-MP-id# using the cfs commands listed in Appendix A.
Configuring Packages and Their Services Creating the Package Configuration • RUN_SCRIPT and HALT_SCRIPT. Specify the pathname of the package control script (described in the next section). No default is provided. TIMEOUT: For the run and halt scripts, enter the number of seconds Serviceguard should try to complete the script before it acknowledges failure. If you have specified SERVICE_HALT_TIMEOUT values for any services, the halt script timeout must be larger than all of those timeouts added together. SCRIPT_LOG_FILE. (optional).
Configuring Packages and Their Services Creating the Package Configuration — RESOURCE_UP_VALUE. Enter the value or values that determine when the resource is considered to be up. During monitoring, if a different value is found for the resource, the package will fail. — RESOURCE_START. The RESOURCE_START option is used to determine when Serviceguard should start up resource monitoring for EMS resources. The RESOURCE_START option can be set to either AUTOMATIC or DEFERRED.
Configuring Packages and Their Services Creating the Package Configuration The only role in the package configuration file is that of PACKAGE_ADMIN over the one configured package. Cluster-wide roles are defined in the cluster configuration file. There must be no conflict in roles. If there is, configuration will fail and you will get a message. It is a good idea, therefore, to look at the cluster configuration file (use the cmgetconf command) before creating any roles in the package’s file.
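For example, the following three entries (the user and host names are purely illustrative) would grant the PACKAGE_ADMIN role for this package to user jsmith when issuing commands from node ftsys9:
USER_NAME jsmith
USER_HOST ftsys9
USER_ROLE PACKAGE_ADMIN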
Configuring Packages and Their Services Creating the Package Control Script Creating the Package Control Script The package control script contains all the information necessary to run all the services in the package, monitor them during operation, react to a failure, and halt the package when necessary. You can use Serviceguard Manager, HP-UX commands, or a combination of both, to create or modify the package control script. Each package must have a separate control script, which must be executable.
Configuring Packages and Their Services Creating the Package Control Script Master Toolkit product (B5139DA). These files are found in /opt/cmcluster/toolkit/DB/. Separate toolkits are available for Oracle, Informix, and Sybase. In addition to the standard package control script, you use the special script that is provided for the database. To set up these scripts, follow the instructions that appear in the README file provided with each toolkit.
Configuring Packages and Their Services Creating the Package Control Script NOTE • Specify the filesystem mount retry and unmount count options. • If your package uses a large number of volume groups or disk groups or mounts a large number of file systems, consider increasing the number of concurrent vgchange, mount/umount, and fsck operations. The default of 1 is adequate for most packages. • Define IP subnet and IP address pairs for your package. IPv4 or IPv6 addresses may be used.
Configuring Packages and Their Services Creating the Package Control Script How Control Scripts Manage VxVM Disk Groups VxVM disk groups (other than those managed by CVM, on systems that support it) are outside the control of the Serviceguard cluster. The package control script uses standard VxVM commands to import and deport these disk groups. (For details on importing and deporting disk groups, refer to the discussion of the import and deport options in the vxdg man page.)
Configuring Packages and Their Services Creating the Package Control Script reboot. If a node in the cluster fails, the host ID is still written on each disk in the disk group. However, if the node is part of a Serviceguard cluster then on reboot the host ID will be cleared by the owning node from all disks which have the noautoimport flag set, even if the disk group is not under Serviceguard control.
Configuring Packages and Their Services Creating the Package Control Script NOTE This is a sample file that may not be identical to the file produced by the version of Serviceguard running on your system. Run cmmakepkg -s pathname to generate a package control script template on your local Serviceguard node; see "Using Serviceguard Commands to Configure a Package" on page 258 for more information.
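For example, a command of the following form (the path name is illustrative) writes the template to a file that you can then edit:
# cmmakepkg -s /etc/cmcluster/pkg1/pkg1.sh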
Configuring Packages and Their Services Creating the Package Control Script
# Uncomment the first line (VGCHANGE="vgchange -a e -q n"), and comment
# out the default, if you want to activate volume groups in exclusive
# mode and ignore the disk quorum requirement. Since the disk quorum
# ensures the integrity of the LVM configuration, it is normally not
# advisable to override the quorum.
Configuring Packages and Their Services Creating the Package Control Script # and comment out the default, if you want disk groups activated in the # shared read mode. # # Uncomment the third line # (CVM_ACTIVATION_CMD=”vxdg -g \$DiskGroup set activation=sharedwrite”), # and comment out the default, if you want disk groups activated in the # shared write mode.
Configuring Packages and Their Services Creating the Package Control Script # devices used by CFS (cluster file system). CFS resources are # controlled by the Disk Group and Mount Multi-node packages. # # VxVM DISK GROUPS # Specify which VxVM disk groups are used by this package. Uncomment # VXVM_DG[0]=”” and fill in the name of your first disk group. You must # begin with VXVM_DG[0], and increment the list in sequence.
Configuring Packages and Their Services Creating the Package Control Script
# and filesystem type for the file system. You must begin with
# LV[0], FS[0], FS_MOUNT_OPT[0], FS_UMOUNT_OPT[0], FS_FSCK_OPT[0],
# FS_TYPE[0] and increment the list in sequence.
#
# Note: The FS_TYPE parameter lets you specify the type of filesystem
# to be mounted. Specifying a particular FS_TYPE will improve package
# failover time.
Configuring Packages and Their Services Creating the Package Control Script
# Specify the number of mount retries for each filesystem.
# The default is 0. During startup, if a mount point is busy and
# FS_MOUNT_RETRY_COUNT is 0, package startup will fail and the script
# will exit with 1. If a mount point is busy and FS_MOUNT_RETRY_COUNT
# is greater than 0, the script will attempt to kill the user
# responsible for the busy mount point and then mount the file system.
Configuring Packages and Their Services Creating the Package Control Script # logfile. CONCURRENT_VGCHANGE_OPERATIONS=1 # # CONCURRENT FSCK OPERATIONS # Specify the number of concurrent fsck to allow during package startup. # Setting this value to an appropriate number may improve the performance # while checking a large number of file systems in the package. If the # specified value is less than 1, the script defaults it to 1 and proceeds # with a warning message in the package control script logfile.
Configuring Packages and Their Services Creating the Package Control Script # For example, if this package uses an IP of 192.10.25.12 and a subnet of # 192.10.25.0 enter: # IP[0]=192.10.25.12 # SUBNET[0]=192.10.25.0 # (netmask=255.255.255.0) # # Hint: Run “netstat -i” to see the available subnets in the Network field. # # For example, if this package uses an IPv6 IP of 2001::1/64 # The address prefix identifies the subnet as 2001::/64 which is an # available subnet.
Configuring Packages and Their Services Creating the Package Control Script # SERVICE_NAME[2]=pkg1c # SERVICE_CMD[2]=”/usr/sbin/ping” # SERVICE_RESTART[2]=”-R” # Will restart the service an infinite # number of times. # # Note: No environmental variables will be passed to the command, this # includes the PATH variable. Absolute path names are required for the # service command definition. Default shell is /usr/bin/sh.
Configuring Packages and Their Services Creating the Package Control Script The main function appears at the end of the script. Note that individual variables are optional; you should include only as many as you need for proper package operation. For example, if your package does not need to activate a volume group, omit the VG variables; if the package does not use services, omit the corresponding SERVICE_NAME, SERVICE_CMD, and SERVICE_RESTART variables; and so on.
Configuring Packages and Their Services Creating the Package Control Script
        echo 'Starting pkg1' >> /tmp/pkg1.datelog
        test_return 51
}

# This function is a place holder for customer defined functions.
# You should define all actions you want to happen here, before the service is
# halted.

function customer_defined_halt_cmds
{
# ADD customer defined halt commands.
        : # do nothing instruction, because a function must contain some command.
        date >> /tmp/pkg1.datelog
        echo 'Halting pkg1' >> /tmp/pkg1.
Configuring Packages and Their Services Creating the Package Control Script Support for Additional Products The package control script template provides exits for use with additional products, including MetroCluster with Continuous Access/CA, MetroCluster with EMC SRDF, and the HA NFS toolkit. Refer to the additional product’s documentation for details about how to create a package using the hooks that are provided in the control script.
Configuring Packages and Their Services Verifying the Package Configuration Verifying the Package Configuration Serviceguard checks the configuration you enter and reports any errors. In Serviceguard Manager, click Check to verify the package configuration you have done under any package configuration tab, or to check changes you have made to the control script. Click Apply to verify the package as a whole. See the local Help for more details.
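From the command line, you can run a similar check against the package configuration file before applying it; for example (the file name is illustrative):
# cmcheckconf -v -P /etc/cmcluster/pkg1/pkg1.config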
Configuring Packages and Their Services Distributing the Configuration Distributing the Configuration You can use Serviceguard Manager or HP-UX commands to distribute the binary cluster configuration file among the nodes of the cluster. DSAU (Distributed Systems Administration Utilities) can help you streamline your distribution, as explained under “What are the Distributed Systems Administration Utilities?” on page 32.
Configuring Packages and Their Services Distributing the Configuration • Activate the cluster lock volume group so that the lock disk can be initialized: # vgchange -a y /dev/vg01 • Generate the binary configuration file and distribute it across the nodes. # cmapplyconf -v -C /etc/cmcluster/cmcl.config -P \ /etc/cmcluster/pkg1/pkg1.config • If you are using a lock disk, deactivate the cluster lock volume group.
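For example, to deactivate the lock volume group used above:
# vgchange -a n /dev/vg01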
Cluster and Package Maintenance 7 Cluster and Package Maintenance This chapter describes how to see cluster configuration and status information, how to start and halt a cluster or an individual node, how to perform permanent reconfiguration, and how to start, halt, move, and modify packages during routine maintenance of the cluster.
Cluster and Package Maintenance Reviewing Cluster and Package Status Reviewing Cluster and Package Status You can check status using Serviceguard Manager or from a cluster node’s command line. Reviewing Cluster and Package Status with the cmviewcl Command Information about cluster status is stored in the status database, which is maintained on each individual node in the cluster.
Cluster and Package Maintenance Reviewing Cluster and Package Status Viewing multi-node Information On systems that support VERITAS Cluster File System (CFS), you can use cfs commands to see multi-node package configuration information, status, and dependencies in a CFS cluster; for example cfsdgadm show_package diskgroup, cfsmntadm show_package mountpoint, getconf -p mnpkg | grep DEPENDENCY. The cmviewcl -v command output lists dependencies throughout the cluster.
Cluster and Package Maintenance Reviewing Cluster and Package Status • Reforming. A node is in this state when the cluster is re-forming. The node is currently running the protocols which ensure that all nodes agree to the new membership of an active cluster. If agreement is reached, the status database is updated to reflect the new cluster membership. • Running. A node in this state has completed all required activity for the last re-formation and is operating normally. • Halted.
Cluster and Package Maintenance Reviewing Cluster and Package Status • Switching Enabled for a Node. For failover packages, enabled means that the package can switch to the referenced node. Disabled means that the package cannot switch to the specified node until the node is enabled for the package using the cmmodpkg command. Every failover package is marked Enabled or Disabled for each node that is either a primary or adoptive node for the package.
Cluster and Package Maintenance Reviewing Cluster and Package Status Failover and failback policies are displayed in the output of the cmviewcl -v command. Examples of Cluster and Package States The following sample output from the cmviewcl -v command shows status for the cluster in the sample configuration. Normal Running Status Everything is running normally; both nodes in the cluster are running, and the packages are in their primary locations.
Cluster and Package Maintenance Reviewing Cluster and Package Status
   STANDBY      up           32.1         lan1

    PACKAGE      STATUS       STATE        AUTO_RUN     NODE
    pkg2         up           running      enabled      ftsys10

      Policy_Parameters:
      POLICY_NAME     CONFIGURED_VALUE
      Failover        configured_node
      Failback        manual

      Script_Parameters:
      ITEM       STATUS   MAX_RESTARTS  RESTARTS  NAME
      Service    up       0             0         service2
      Subnet     up       0             0         15.13.168.

      Node_Switching_Parameters:
      NODE_TYPE    STATUS       SWITCHING    NAME
      Primary      up           enabled      ftsys10
      Alternate    up           enabled      ftsys9
Cluster and Package Maintenance Reviewing Cluster and Package Status CVM Package Status If the cluster is using the VERITAS Cluster Volume Manager (CVM), version 3.5, for disk storage, the system multi-node package VxVM-CVM-pkg must be running on all active nodes for applications to be able to access CVM disk groups. The system multi-node package is named SG-CFS-pkg if the cluster is using version 4.1 of the VERITAS Cluster Volume Manager.
Cluster and Package Maintenance Reviewing Cluster and Package Status
    NODE         STATUS       SWITCHING
    ftsys8       down         disabled

    NODE         STATUS       SWITCHING
    ftsys9       up           enabled

      Script_Parameters:
      ITEM       STATUS   MAX_RESTARTS  RESTARTS  NAME
      Service    up       0             0         VxVM-CVM-pkg.srv

    NODE         STATUS       SWITCHING
    ftsys10      up           enabled

      Script_Parameters:
      ITEM       STATUS   MAX_RESTARTS  RESTARTS  NAME
      Service    up       0             0         VxVM-CVM-pkg.
Cluster and Package Maintenance Reviewing Cluster and Package Status
      ITEM       STATUS   MAX_RESTARTS  RESTARTS  NAME
      Service    up       0             0         SG-CFS-vxconfigd
      Service    up       5             0         SG-CFS-sgcvmd
      Service    up       5             0         SG-CFS-vxfsckd
      Service    up       0             0         SG-CFS-cmvxd
      Service    up       0             0         SG-CFS-cmvxpingd

    NODE_NAME    STATUS       SWITCHING
    ftsys10      up           enabled

      Script_Parameters:
      ITEM       STATUS   MAX_RESTARTS  RESTARTS  NAME
      Service    up       0             0         SG-CFS-vxconfigd
      Service    up       5             0         SG-CFS-sgcvmd
      Service    up       5             0         SG-CFS-vxfsckd
      Service    up       0             0         SG-CFS-cmvxd
      Service    up       0             0         SG-CF
Cluster and Package Maintenance Reviewing Cluster and Package Status Node_Switching_Parameters: NODE_TYPE STATUS SWITCHING Primary up enabled Alternate up enabled NODE ftsys10 STATUS up Network_Parameters: INTERFACE STATUS PRIMARY up STANDBY up NAME ftsys9 ftsys10 (current) STATE running PATH 28.1 32.
Cluster and Package Maintenance Reviewing Cluster and Package Status Status After Moving the Package to Another Node After issuing the following command: # cmrunpkg -n ftsys9 pkg2 the output of the cmviewcl -v command is as follows: CLUSTER example NODE ftsys9 STATUS up STATUS up STATE running Network_Parameters: INTERFACE STATUS PRIMARY up STANDBY up PATH 56/36.
Cluster and Package Maintenance Reviewing Cluster and Package Status Failover Failback configured_node manual Script_Parameters: ITEM STATUS MAX_RESTARTS RESTARTS NAME Service up 0 0 service2.1 Subnet up 15.13.168.0 Node_Switching_Parameters: NODE_TYPE STATUS SWITCHING NAME Primary up enabled ftsys10 Alternate up enabled ftsys9 (current) NODE ftsys10 STATUS up STATE running Network_Parameters: INTERFACE STATUS PRIMARY up STANDBY up PATH 28.1 32.1 NAME lan0 lan1 Now pkg2 is running on node ftsys9.
Cluster and Package Maintenance Reviewing Cluster and Package Status
Status After Halting a Node After halting ftsys10, with the following command:
# cmhaltnode ftsys10
the output of cmviewcl is as follows on ftsys9:

CLUSTER        STATUS
example        up

NODE           STATUS       STATE
ftsys9         up           running

    PACKAGE        STATUS       STATE        AUTO_RUN     NODE
    pkg1           up           running      enabled      ftsys9
    pkg2           up           running      enabled      ftsys9

NODE           STATUS       STATE
ftsys10        down         halted

This output is seen on both ftsys9 and ftsys10.
Cluster and Package Maintenance Reviewing Cluster and Package Status
      Node_Switching_Parameters:
      NODE_TYPE    STATUS       SWITCHING    NAME
      Primary      up           enabled      manx
      Alternate    up           enabled      burmese
      Alternate    up           enabled      tabby
      Alternate    up           enabled      persian

Viewing Data on System multi-node packages The following example shows a cluster that includes system multi-node packages as well as standard Serviceguard packages.
Cluster and Package Maintenance Reviewing Cluster and Package Status NOTE CFS is supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability - > Serviceguard).
Cluster and Package Maintenance Reviewing Cluster and Package Status PACKAGE STATUS STATE AUTO_RUN SYSTEM SG-CFS-pkg up running enabled yes NODE_NAME STATUS SWITCHING soy up enabled Script_Parameters: ITEM STATUS MAX_RESTARTS RESTARTS NAME Service up 0 0 SG-CFS-vxconfigd Service up 5 0 SG-CFS-sgcvmd Service up 5 0 SG-CFS-vxfsckd Service up 0 0 SG-CFS-cmvxd Service up 0 0 SG-CFS-cmvxpingd NODE_NAME STATUS SWITCHING tofu up enabled Script_Parameters: ITEM STATUS MAX_RESTARTS RESTARTS NAME Service up 0 0 SG-CF
Cluster and Package Maintenance Reviewing Cluster and Package Status Status of CFS mount point packages To see the status of the mount point package, use the cfsmntadm display command. For example, for the mount point /tmp/logdata/log_files, enter: # cfsmntadm display -v /tmp/logdata/log_files Mount Point : /tmp/logdata/log_files Shared Volume : lvol1 Disk Group : logdata To see which package is monitoring a mount point, use the cfsmntadm show_package command.
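For example, for the mount point shown above:
# cfsmntadm show_package /tmp/logdata/log_files
The output should identify the SG-CFS-MP-id# package that is monitoring that mount point.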
Cluster and Package Maintenance Managing the Cluster and Nodes Managing the Cluster and Nodes Managing the cluster involves the following tasks: • Starting the Cluster When All Nodes are Down • Adding Previously Configured Nodes to a Running Cluster • Removing Nodes from Operation in a Running Cluster • Halting the Entire Cluster In Serviceguard A.11.
Cluster and Package Maintenance Managing the Cluster and Nodes Using Serviceguard Commands to Start the Cluster Use the cmruncl command to start the cluster when all cluster nodes are down. Particular command options can be used to start the cluster under specific circumstances.
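For example (options and node names are illustrative), you can start the cluster on all configured nodes, or on a subset of nodes:
# cmruncl -v
# cmruncl -v -n ftsys9 -n ftsys10
See the cmruncl (1m) manpage for the full list of options.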
Cluster and Package Maintenance Managing the Cluster and Nodes Using Serviceguard Commands to Add Previously Configured Nodes to a Running Cluster Use the cmrunnode command to join one or more nodes to an already running cluster. Any node you add must already be a part of the cluster configuration. The following example adds node ftsys8 to the cluster that was just started with only nodes ftsys9 and ftsys10.
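The command would take a form like this:
# cmrunnode ftsys8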
Cluster and Package Maintenance Managing the Cluster and Nodes NOTE HP recommends that you remove a node from participation in the cluster (by running cmhaltnode, or Halt Node in Serviceguard Manager) before running the HP-UX shutdown command, especially in cases in which a packaged application might have trouble during shutdown and not halt cleanly. Using Serviceguard Commands to Remove a Node from Participation in the Cluster Use the cmhaltnode command to halt one or more nodes in a cluster.
Cluster and Package Maintenance Managing the Cluster and Nodes This halts all nodes that are configured in the cluster. Automatically Restarting the Cluster You can configure your cluster to automatically restart after an event, such as a long-term power failure, which brought down all nodes in the cluster. This is done by setting AUTOSTART_CMCLD to 1 in the /etc/rc.config.d/cmcluster file.
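The entry in that file looks like this:
AUTOSTART_CMCLD=1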
Cluster and Package Maintenance Managing Packages and Services Managing Packages and Services Managing packages and services involves the following tasks: • Starting a Package • Halting a Package • Moving a Package (halt, then start) • Changing Package Switching Behavior In Serviceguard A.11.16 and later, these commands can be done by non-root users, according to access policies in the cluster’s configuration files.
Cluster and Package Maintenance Managing Packages and Services This starts up the package on ftsys9, then enables package switching. This sequence is necessary when a package has previously been halted on some node, since halting the package disables switching. On systems that support VERITAS Cluster File System and Cluster Volume Manager, use the cfs admin commands, listed in Appendix A, to start the special-purpose multi-node packages used with CVM and CFS.
Cluster and Package Maintenance Managing Packages and Services Using Serviceguard Commands to Halt a Package Use the cmhaltpkg command to halt a package, as follows: # cmhaltpkg pkg1 This halts pkg1. If pkg1 is a failover package, this also disables it from switching to another node. Before halting a package, it is a good idea to use the cmviewcl command to check for package dependencies. You cannot halt a package unless all packages that depend on it are down.
Cluster and Package Maintenance Managing Packages and Services Changing the Switching Behavior of Failover Packages There are two types of switching flags: • package switching is enabled (YES) or disabled (NO) for the package. • node switching is enabled (YES) or disabled (NO) on individual nodes. For failover packages, if package switching is NO the package cannot move to any other node. If node switching is NO, the package cannot move to that particular node.
Cluster and Package Maintenance Managing Packages and Services # cmmodpkg -d -n lptest3 pkg1 To permanently disable switching so that the next time the cluster restarts, the change you made in package switching is still in effect, you must change the AUTO_RUN flag in the package configuration file, then re-apply the configuration. (Any change made this way will take effect the next time the cluster is restarted.
Cluster and Package Maintenance Reconfiguring a Cluster Reconfiguring a Cluster You can reconfigure a cluster either when it is halted or while it is still running. Some operations can only be done when the cluster is halted. Table 7-1 shows the required cluster state for many kinds of changes.
Cluster and Package Maintenance Reconfiguring a Cluster Table 7-1 Types of Changes to Permanent Cluster Configuration Change to the Cluster Configuration Required Cluster State Change NETWORK_FAILURE_DETECTION parameter (see “Monitoring LAN Interfaces and Detecting Failure” on page 104) Cluster can be running. Change Access Control Policy (Serviceguard A.11.16 or later) Cluster and package can be running or halted.
Cluster and Package Maintenance Reconfiguring a Cluster To update the values of the FIRST_CLUSTER_LOCK_PV and SECOND_CLUSTER_LOCK_PV parameters without bringing down the cluster, proceed as follows: Step 1. Halt the node (cmhaltnode) on which you want to make the changes. Step 2. In the cluster configuration ASCII file, modify the values of FIRST_CLUSTER_LOCK_PV and SECOND_CLUSTER_LOCK_PV for this node. Step 3. Run cmcheckconf to check the configuration. Step 4. Run cmapplyconf to apply the configuration.
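For example, if the node being changed is ftsys9 and the edited cluster configuration file is /etc/cmcluster/cmcl.config (both names are illustrative), the sequence would look like this:
# cmhaltnode ftsys9
# cmcheckconf -C /etc/cmcluster/cmcl.config
# cmapplyconf -C /etc/cmcluster/cmcl.config
You can then restart the node with cmrunnode.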
Cluster and Package Maintenance Reconfiguring a Cluster • If you are replacing, rather than removing, the interface, you do not need to bring down the cluster, and you may be able to do the replacement online; see “Replacing LAN or Fibre Channel Cards” on page 354. Reconfiguring a Halted Cluster You can make a permanent change in cluster configuration when the cluster is halted.
Cluster and Package Maintenance Reconfiguring a Cluster Using Serviceguard Commands to Change MAX_CONFIGURED_ PACKAGES As of Serviceguard A.11.17, you can change MAX_CONFIGURED_PACKAGES while the cluster is running. The default is the maximum number allowed in the cluster. Use the cmgetconf command to obtain a current copy of the cluster's existing configuration. Example: # cmgetconf -c cluster_name clconfig.ascii Edit the clconfig.ascii file to include the desired value for MAX_CONFIGURED_PACKAGES.
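Then verify and apply the change with a sequence like the following:
# cmcheckconf -C clconfig.ascii
# cmapplyconf -C clconfig.ascii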
Cluster and Package Maintenance Reconfiguring a Cluster Changes to the package configuration are described in a later section. You can use Serviceguard Manager to add nodes to a running cluster, or use Serviceguard commands as shown below. Using Serviceguard Commands to Add Nodes to the Configuration While the Cluster is Running Use the following procedure to add a node with HP-UX commands.
Cluster and Package Maintenance Reconfiguring a Cluster NOTE If you add a node to a running cluster that uses VERITAS CVM disk groups (on systems that support CVM), the disk groups will be available for import when the node joins the cluster. To add a node to the cluster, it must already have connectivity to the disk devices for all CVM disk groups.
Cluster and Package Maintenance Reconfiguring a Cluster 1. Use the following command to store a current copy of the existing cluster configuration in a temporary file: # cmgetconf -c cluster1 temp.ascii 2. Specify the new set of nodes to be configured (omitting ftsys10) and generate a template of the new configuration: # cmquerycl -C clconfig.ascii -c cluster1 -n ftsys8 -n ftsys9 3. Edit the file clconfig.ascii to check the information about the nodes that remain in the cluster. 4.
Cluster and Package Maintenance Reconfiguring a Cluster NOTE If you are removing a volume group from the cluster configuration, make sure that you also modify or delete any package control script that activates and deactivates this volume group. In addition, you should use the LVM vgexport command on the removed volume group from each node that will no longer be using the volume group.
Cluster and Package Maintenance Reconfiguring a Cluster Create CVM disk groups from the CVM Master Node: • For CVM 3.5, and for CVM 4.1 without CFS, edit the configuration ASCII file of the package that uses CVM storage. Add the CVM storage group in a STORAGE_GROUP statement. Then issue the cmapplyconf command. • For CVM 4.1 with CFS, edit the configuration ASCII file of the package that uses CFS. Fill in the three-part DEPENDENCY parameter. Then issue the cmapplyconf command.
Cluster and Package Maintenance Reconfiguring a Package Reconfiguring a Package The process of reconfiguration of a package is somewhat like the basic configuration described in Chapter 6. Refer to that chapter for details on the configuration process. The cluster can be either halted or running during package reconfiguration. The types of changes that can be made and the times when they take effect depend on whether the package is running or not.
Cluster and Package Maintenance Reconfiguring a Package # cmcheckconf -v -P pkg1.ascii 5. Distribute your changes to all nodes: # cmapplyconf -v -P pkg1.ascii 6. Copy the package control script to all nodes that can run the package. Reconfiguring a Package on a Halted Cluster You can also make permanent changes in package configuration while the cluster is not running. Use the same steps as in “Reconfiguring a Package on a Running Cluster” on page 333.
Cluster and Package Maintenance Reconfiguring a Package Deleting a Package from a Running Cluster Serviceguard will not allow you to delete a package if any other package is dependent on it. To check for dependencies, use the cmviewcl -v -l package command. System multi-node packages cannot be deleted from a running cluster. You can use Serviceguard Manager to delete the package.
Cluster and Package Maintenance Reconfiguring a Package This disassociates the mount point from the cluster. When there is a single VG associated with the mount point, the disk group package will also be removed. 4. Remove the disk group package from the cluster. This disassociates the disk group from the cluster.
Cluster and Package Maintenance Reconfiguring a Package -R -s command, thus enabling the service in future restart situations to have the full number of restart attempts up to the configured SERVICE_RESTART count. Example: # cmmodpkg -R -s myservice pkg1 The current value of the restart counter may be seen in the output of the cmviewcl -v command. Allowable Package States During Reconfiguration All nodes in the cluster must be powered up and accessible when making configuration changes.
Cluster and Package Maintenance Reconfiguring a Package Table 7-2 Types of Changes to Packages (Continued) Change to the Package Required Package State Add a volume group Volume group may be configured into the cluster while the cluster is running. The package may be in any state, because the change is made in the control script. However, the package must be halted and restarted for the change to have an effect. Remove a volume group Package must not be running.
Cluster and Package Maintenance Reconfiguring a Package Table 7-2 Types of Changes to Packages (Continued) Change to the Package Required Package State Change the Package Failback Policy Package may be either running or halted. Change access policy Package may be either running or halted.
Cluster and Package Maintenance Responding to Cluster Events Responding to Cluster Events Serviceguard does not require much ongoing system administration intervention. As long as there are no failures, your cluster will be monitored and protected. In the event of a failure, those packages that you have designated to be transferred to another node will be transferred automatically.
Cluster and Package Maintenance Removing Serviceguard from a System Removing Serviceguard from a System If you wish to stop using Serviceguard on a node, use the swremove command to delete the software. If you issue the swremove command on a server that is still a member of a cluster, however, it will cause that cluster to halt, and the cluster to be deleted. To remove Serviceguard: 1. If the node is an active member of a cluster, halt the node first. 2.
Troubleshooting Your Cluster 8 Troubleshooting Your Cluster This chapter describes how to verify cluster operation, how to review cluster status, how to add and replace hardware, and how to solve some typical cluster problems.
Troubleshooting Your Cluster Testing Cluster Operation Testing Cluster Operation Once you have configured your Serviceguard cluster, you should verify that the various components of the cluster behave correctly in case of a failure. In this section, the following procedures test that the cluster responds properly in the event of a package failure, a node failure, or a LAN failure.
Troubleshooting Your Cluster Testing Cluster Operation 4. Move the package back to the primary node (see “Moving a Failover Package” on page 320). Testing the Cluster Manager To test that the cluster manager is operating correctly, perform the following steps for each node on the cluster: 1. Turn off the power to the node SPU. 2.
Troubleshooting Your Cluster Testing Cluster Operation 3. Verify that a local switch has taken place so that the Standby card is now the Primary card. In Serviceguard Manager, check the cluster properties. Or, on the command line, use the cmviewcl -v command. 4. Reconnect the LAN to the original Primary card, and verify its status. In Serviceguard Manager, check the cluster properties. Or, on the command line, use the cmviewcl -v command.
Troubleshooting Your Cluster Monitoring Hardware Monitoring Hardware Good standard practice in handling a high availability system includes careful fault monitoring so as to prevent failures if possible or at least to react to them swiftly when they occur.
Troubleshooting Your Cluster Monitoring Hardware action in case of a problem. For example, you could configure a disk monitor to report when a mirror was lost from a mirrored volume group being used in the cluster. Refer to the manual Using High Availability Monitors for additional information. Using EMS (Event Monitoring Service) Hardware Monitors A set of hardware monitors is available for monitoring and reporting on memory, CPU, and many other system values.
Troubleshooting Your Cluster Monitoring Hardware HP ISEE also reports error conditions directly to an HP Response Center, alerting support personnel to the potential problem. HP ISEE is available through various support contracts. For more information, contact your HP representative.
Troubleshooting Your Cluster Replacing Disks Replacing Disks The procedure for replacing a faulty disk mechanism depends on the type of disk configuration you are using. Separate descriptions are provided for replacing an array mechanism and a disk in a high availability enclosure. For more information, see the section Replacing a Bad Disk in the Logical Volume Management volume of the HP-UX System Administrator’s Guide, at http://docs.hp.
Troubleshooting Your Cluster Replacing Disks command to reassign the existing DSF to the new device. See the section Replacing a Bad Disk in the Logical Volume Management volume of the HP-UX System Administrator’s Guide, posted at http://docs.hp.com. 2. Identify the names of any logical volumes that have extents defined on the failed physical volume. 3.
Troubleshooting Your Cluster Replacing Disks NOTE Under agile addressing, the physical device in this example would have a name such as /dev/disk/disk1. See “About Device File Names (Device Special Files)” on page 115. Serviceguard checks the lock disk on an hourly basis. After the vgcfgrestore command, review the syslog file of an active cluster node for not more than one hour. Then look for a message showing that the lock disk is healthy again.
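If you need to restore the LVM configuration data to the replacement disk manually, the vgcfgrestore command takes a form like the following (the volume group and device file names are illustrative):
# vgcfgrestore -n /dev/vg_lock /dev/rdsk/c2t3d0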
Troubleshooting Your Cluster Replacing I/O Cards Replacing I/O Cards Replacing SCSI Host Bus Adapters After a SCSI Host Bus Adapter (HBA) card failure, you can replace the card using the following steps. Normally disconnecting any portion of the SCSI bus will leave the SCSI bus in an unterminated state, which will cause I/O errors for other nodes connected to that SCSI bus, so the cluster would need to be halted before disconnecting any portion of the SCSI bus.
Troubleshooting Your Cluster Replacing LAN or Fibre Channel Cards Replacing LAN or Fibre Channel Cards If a LAN or fibre channel card fails and the card has to be replaced, you can replace it on-line or off-line depending on the type of hardware and operating system you are running. It is not necessary to bring the cluster down to do this. Off-Line Replacement Follow these steps to replace an I/O card off-line. 1. Halt the node by using the cmhaltnode command. 2.
Troubleshooting Your Cluster Replacing LAN or Fibre Channel Cards After Replacing the Card After the on-line or off-line replacement of LAN cards has been done, Serviceguard will detect that the MAC address (LLA) of the card has changed from the value stored in the cluster binary configuration file, and it will notify the other nodes in the cluster of the new MAC address. The cluster will operate normally after this.
Troubleshooting Your Cluster Replacing a Failed Quorum Server System Replacing a Failed Quorum Server System When a quorum server fails or becomes unavailable to the clusters it is providing quorum services for, this will not cause a failure on any cluster. However, the loss of the quorum server does increase the vulnerability of the clusters in case there is an additional failure. Use the following procedure to replace a defective quorum server system.
Troubleshooting Your Cluster Replacing a Failed Quorum Server System The command will output an error message if the specified nodes cannot communicate with the quorum server.
Troubleshooting Your Cluster Troubleshooting Approaches Troubleshooting Approaches The following sections offer a few suggestions for troubleshooting by reviewing the state of the running system and by examining cluster status data, log files, and configuration files.
Troubleshooting Your Cluster Troubleshooting Approaches
IPv6:
Name     Mtu    Address/Prefix    Ipkts    Opkts
lan1*    1500   none              0        0
lo0      4136   ::1/128           10690    10690

Reviewing the System Log File Messages from the Cluster Manager and Package Manager are written to the system log file. The default location of the log file is /var/adm/syslog/syslog.log. Also, package-related messages are logged into the package log file. The package log file is located in the package directory, by default.
Troubleshooting Your Cluster Troubleshooting Approaches Dec 14 14:34:44 star04 cmcld[2048]: Request from node star04 to start package pkg5 on node star04. Dec 14 14:34:44 star04 cmcld[2048]: Executing '/etc/cmcluster/pkg5/pkg5_run start' for package pkg5. Dec 14 14:34:45 star04 LVM[2066]: vgchange -a n /dev/vg02 Dec 14 14:34:45 star04 cmcld[2048]: Package pkg5 run script exited with NO_RESTART. Dec 14 14:34:45 star04 cmcld[2048]: Examine the file /etc/cmcluster/pkg5/pkg5_run.log for more details.
Troubleshooting Your Cluster Troubleshooting Approaches Reviewing the System Multi-node Package Files If you are running VERITAS Cluster Volume Manager (supported on some versions of HP-UX), and you have problems starting the cluster, check the log file for the system multi-node package. For Cluster Volume Manager (CVM) 3.5, the file is VxVM-CVM-pkg.log. For CVM 4.1, the file is SG-CFS-pkg.log. Reviewing Configuration Files Review the following ASCII configuration files: • Cluster configuration file.
Troubleshooting Your Cluster Troubleshooting Approaches # cmquerycl -v -C /etc/cmcluster/verify.ascii -n ftsys9 -n ftsys10 # cmcheckconf -v -C /etc/cmcluster/verify.ascii The cmcheckconf command checks: • The network addresses and connections. • The cluster lock disk connectivity. • The validity of configuration parameters of the cluster and packages for: — The uniqueness of names. — The existence and permission of scripts. It doesn’t check: • The correct setup of the power circuits.
Troubleshooting Your Cluster Troubleshooting Approaches Table 8-1 Data Displayed by the cmscancl Command (Continued) Description Source of Data file systems mount command LVM configuration /etc/lvmtab file LVM physical volume group data /etc/lvmpvg file link level connectivity for all links linkloop command binary configuration file cmviewconf command Using the cmviewconf Command cmviewconf allows you to examine the binary cluster configuration file, even when the cluster is not running.
Troubleshooting Your Cluster Troubleshooting Approaches • cmscancl can be used to verify that primary and standby LANs are on the same bridged net. • cmviewcl -v shows the status of primary and standby LANs. Use these commands on all nodes.
Troubleshooting Your Cluster Solving Problems Solving Problems Problems with Serviceguard may be of several types. The following is a list of common categories of problem: • Serviceguard Command Hangs. • Cluster Re-formations. • System Administration Errors. • Package Control Script Hangs. • Problems with VxVM Disk Groups. • Package Movement Errors. • Node and Network Failures. • Quorum Server Problems.
Troubleshooting Your Cluster Solving Problems Name: ftsys9.cup.hp.com Address: 15.13.172.229 If the output of this command does not include the correct IP address of the node, then check your name resolution services further. Cluster Re-formations Cluster re-formations may occur from time to time due to current cluster conditions. Some of the causes are as follows: • local switch on an Ethernet LAN if the switch takes longer than the cluster NODE_TIMEOUT value.
Troubleshooting Your Cluster Solving Problems You can use the following commands to check the status of your disks: • bdf - to see if your package's volume group is mounted. • vgdisplay -v - to see if all volumes are present. • lvdisplay -v - to see if the mirrors are synchronized. • strings /etc/lvmtab - to ensure that the configuration is correct. • ioscan -fnC disk - to see physical disks. • diskinfo -v /dev/rdsk/cxtydz - to display information about a disk.
Troubleshooting Your Cluster Solving Problems NOTE Any form of the mount command (for example, mount -o cluster, dbed_chkptmount, or sfrac_chkptmount) other than cfsmount or cfsumount in a HP Serviceguard Storage Management Suite environment with CFS should be done with caution. These non-cfs commands could cause conflicts with subsequent command operations on the file system or Serviceguard packages.
Troubleshooting Your Cluster Solving Problems Next, deactivate the package volume groups. These are specified by the VG[] array entries in the package control script. # vgchange -a n 4. Finally, re-enable the package for switching.
Troubleshooting Your Cluster Solving Problems 2. b - vxfen 3. v w - cvm 4. f - cfs Any form of the mount command (for example, mount -o cluster, dbed_chkptmount, or sfrac_chkptmount) other than cfsmount or cfsumount in a HP Serviceguard Storage Management Suite environment with CFS should be done with caution. These non-cfs commands could cause conflicts with subsequent command operations on the file system or Serviceguard packages.
Troubleshooting Your Cluster Solving Problems This can happen if a package is running on a node which then fails before the package control script can deport the disk group. In these cases, the host name of the node that had failed is still written on the disk group header.
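In such a case, you can force an import to clear the old ownership and then deport the disk group; the commands take a form like the following (the disk group name is illustrative; see the vxdg (1m) manpage for the exact options on your system):
# vxdg -tfC import dg_01
# vxdg deport dg_01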
Troubleshooting Your Cluster Solving Problems The force import will clear the host name currently written on the disks in the disk group, after which you can deport the disk group without error so it can then be imported by a package running on a different node. CAUTION This force import procedure should only be used when you are certain the disk is not currently being accessed by another node. If you force import a disk that is already being accessed on another node, data corruption can result.
Troubleshooting Your Cluster Solving Problems In the event of a TOC, a system dump is performed on the failed node and numerous messages are also displayed on the console. You can use the following commands to check the status of your network and subnets: • netstat -in - to display LAN status and check to see if the package IP is stacked on the LAN card. • lanscan - to see if the LAN is on the primary interface or has switched to the standby interface. • arp -a - to check the arp tables.
Troubleshooting Your Cluster Solving Problems The following kind of message in a Serviceguard node’s syslog file indicates that the node did not receive a reply to its lock request on time. This could be because of a delay in communication between the node and the quorum server, or between the quorum server and other nodes in the cluster: Attempt to get lock /sg/cluser1 unsuccessful. Reason: request_timedout Messages The coordinator node in Serviceguard sometimes sends a request to the quorum server to set the lock state.
Serviceguard Commands A Serviceguard Commands The following is an alphabetical list of commands used for Serviceguard cluster configuration and maintenance. Manpages for these commands are available on your system after installation. NOTE VERITAS Cluster Volume Manager (CVM) and Cluster File System (CFS) are supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cfsdgadm Description • Display the status of CFS disk groups. • Add shared disk groups to a VERITAS Cluster File System CFS cluster configuration, or remove existing CFS disk groups from the configuration. Serviceguard automatically creates the multi-node package SG-CFS-DG-id# to regulate the disk groups. This package has a dependency on the SG-CFS-pkg created by cfscluster command.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmapplyconf Description Verify and apply Serviceguard cluster configuration and package configuration files. cmapplyconf verifies the cluster configuration and package configuration specified in the cluster_ascii_file and the associated pkg_ascii_file(s), creates or updates the binary configuration file, called cmclconfig, and distributes it to all nodes.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmapplyconf (continued) Description It is recommended that the user run the cmgetconf command to get either the cluster ASCII configuration file or package ASCII configuration file whenever changes to the existing configuration are required. Note that cmapplyconf will verify and distribute cluster configuration or package files. It will not cause the cluster daemon to start, nor will it remove the cluster configuration.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmdeleteconf Description Delete either the cluster or the package configuration. cmdeleteconf deletes either the entire cluster configuration, including all its packages, or only the specified package configuration. If neither cluster_name nor package_name is specified, cmdeleteconf will delete the local cluster’s configuration and all its packages.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmhaltcl Description Halt a high availability cluster. cmhaltcl causes all nodes in a configured cluster to stop their cluster daemons, optionally halting all packages or applications in the process. This command will halt all the daemons on all currently running systems. If the user only wants to shutdown a subset of daemons, the cmhaltnode command should be used instead. cmhaltnode Halt a node in a high availability cluster.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmhaltserv Description Halt a service from the high availability package halt script. This is not a command line executable command, it runs only from within the package control script. cmhaltserv is used in the high availability package halt script to halt a service. If any part of package is marked down, the package halt script is executed as part of the recovery process.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmmodnet Description Add or remove an address from a high availability cluster. cmmodnet is used in the high availability package control scripts to add or remove an IP_address from the current network interface running the given subnet_name. Extreme caution should be exercised when executing this command outside the context of the package control script.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmquerycl Description Query cluster or node configuration information. cmquerycl searches all specified nodes for cluster configuration and Logical Volume Manager (LVM) information. Cluster configuration information includes network information such as LAN interface, IP addresses, bridged networks and possible heartbeat networks. LVM information includes volume group (VG) interconnection and file system mount point information.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmrunnode Description Run a node in a high availability cluster. cmrunnode causes a node to start its cluster daemon to join the existing cluster. Starting a node will not cause any active packages to be moved to the new node. However, if a package is DOWN, has its switching enabled, and is able to run on the new node, that package will automatically run there.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmrunserv Description Run a service from the high availability package run script. This is not a command line executable command, it runs only from within the package control script. cmrunserv is used in the high availability package run script to run a service. If the service process dies, cmrunserv updates the status of the service to down.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmscancl Description Gather system configuration information from nodes with Serviceguard installed. cmscancl is a configuration report and diagnostic tool which gathers system software and hardware configuration information from a list of nodes, or from all the nodes in a cluster.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmstartres Description This command is run by package control scripts, and not by users! Starts resource monitoring on the local node for an EMS resource that is configured in a Serviceguard package. cmstartres starts resource monitoring for an EMS resource on the local node. This resource must be configured in the specified package_name.
Enterprise Cluster Master Toolkit B Enterprise Cluster Master Toolkit The Enterprise Cluster Master Toolkit (ECMT) provides a group of example scripts and package configuration files for creating Serviceguard packages for several major database and internet software products. Each toolkit contains a README file that explains how to customize the package for your needs. The ECMT can be installed on HP-UX 11i v1, 11i v2, or 11i v3.
Designing Highly Available Cluster Applications C Designing Highly Available Cluster Applications This appendix describes how to create or port applications for high availability, with emphasis on the following topics: • Automating Application Operation • Controlling the Speed of Application Failover • Designing Applications to Run on Multiple Systems • Restoring Client Connections • Handling Application Failures • Minimizing Planned Downtime Designing for high availability means reducing the
Designing Highly Available Cluster Applications Automating Application Operation Automating Application Operation Can the application be started and stopped automatically or does it require operator intervention? This section describes how to automate application operations to avoid the need for user intervention. One of the first rules of high availability is to avoid manual intervention.
Designing Highly Available Cluster Applications Automating Application Operation Define Application Startup and Shutdown Applications must be restartable without manual intervention. If the application requires a switch to be flipped on a piece of hardware, then automated restart is impossible. Procedures for application startup, shutdown and monitoring must be created so that the HA software can perform these functions automatically.
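As an illustration only (the paths, file names, and application command below are hypothetical), the startup and shutdown procedures might be captured in a small wrapper script that the package control script's customer defined functions can call; the key point is that no step requires an operator:

#!/usr/bin/sh
# myapp.sh -- hypothetical start/stop wrapper for an application.
# A package control script can call "myapp.sh start" and "myapp.sh stop"
# from its customer defined functions.
PIDFILE=/var/run/myapp.pid

case "$1" in
start)
        # Start the application in the background and record its process ID.
        /opt/myapp/bin/myapp_server &
        echo $! > $PIDFILE
        ;;
stop)
        # Stop the application using the recorded process ID.
        if [ -f $PIDFILE ]; then
                kill `cat $PIDFILE`
                rm -f $PIDFILE
        fi
        ;;
*)
        echo "usage: $0 {start|stop}"
        exit 1
        ;;
esac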
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Controlling the Speed of Application Failover What steps can be taken to ensure the fastest failover? If a failure does occur causing the application to be moved (failed over) to another node, there are many things the application can do to reduce the amount of time it takes to get the application back up and running.
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Use Raw Volumes If your application uses data, use raw volumes rather than filesystems. Raw volumes do not require an fsck of the filesystem, thus eliminating one of the potentially lengthy steps during a failover. Evaluate the Use of JFS If a file system must be used, a JFS offers significantly faster file system recovery as compared to an HFS. However, performance of the JFS may vary with the application.
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Keep Logs Small Some databases permit logs to be buffered in memory to increase online performance. Of course, when a failure occurs, any in-flight transaction will be lost. However, minimizing the size of this in-memory log will reduce the amount of completed transaction data that would be lost in case of failure.
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Another example is an application where a clerk is entering data about a new employee. Suppose this application requires that employee numbers be unique, and that after the name and number of the new employee is entered, a failure occurs.
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Design for Multiple Servers If you use multiple active servers, multiple service points can provide relatively transparent service to a client. However, this capability requires that the client be smart enough to have knowledge about the multiple servers and the priority for addressing them. It also requires access to the data of the failed server or replicated data.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems Designing Applications to Run on Multiple Systems If an application can be failed to a backup node, how will it work on that different system? The previous sections discussed methods to ensure that an application can be automatically restarted. This section will discuss some ways to ensure the application can run on multiple systems.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems Each application or package should be given a unique name as well as a relocatable IP address. Following this rule separates the application from the system on which it runs, thus removing the need for user knowledge of which system the application runs on. It also makes it easier to move the application among different systems in a cluster for load balancing or other reasons.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems Avoid Using SPU IDs or MAC Addresses Design the application so that it does not rely on the SPU ID or MAC (link-level) addresses. The SPU ID is a unique hardware ID contained in non-volatile memory, which cannot be changed. A MAC address (also known as a LANIC id) is a link-specific address associated with the LAN hardware.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems Applications should not reference official hostnames or IP addresses. The official hostname and corresponding IP address for the hostname refer to the primary LAN card and the stationary IP address for that card.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems develop alternate means of verifying where they are running. For example, an application might check a list of hostnames that have been provided in a configuration file. Bind to a Fixed Port When binding a socket, a port address can be specified or one can be assigned dynamically.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems For TCP stream sockets, the TCP level of the protocol stack resolves this problem for the client since it is a connection-based protocol. On the client, TCP ignores the stationary IP address and continues to use the previously bound relocatable IP address originally used by the client. With UDP datagram sockets, however, there is a problem.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems applications must move together. If the applications’ data stores are in separate volume groups, they can switch to different nodes in the event of a failover. The application data should be set up on different disk drives and if applicable, different mount points. The application should be designed to allow for different disks and separate mount points.
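As a sketch, two packages that must be able to fail over independently would each declare their own volume group and mount point in their respective control scripts; the volume group, logical volume, and mount point names below are illustrative:

# Control script for pkg_A (illustrative names)
VG[0]="vg_appA"
LV[0]="/dev/vg_appA/lvol1"
FS[0]="/appA_data"
FS_MOUNT_OPT[0]="-o rw"

# Control script for pkg_B (illustrative names)
VG[0]="vg_appB"
LV[0]="/dev/vg_appB/lvol1"
FS[0]="/appB_data"
FS_MOUNT_OPT[0]="-o rw"

Because neither package references the other's volume group or mount point, either one can be moved to another node without disturbing the other.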
Designing Highly Available Cluster Applications Restoring Client Connections Restoring Client Connections How does a client reconnect to the server after a failure? It is important to write client applications to specifically differentiate between the loss of a connection to the server and other application-oriented errors that might be returned. The application should take special action in case of connection loss.
Designing Highly Available Cluster Applications Restoring Client Connections the retry to the current server should continue for the amount of time it takes to restart the server locally. This will keep the client from having to switch to the second server in the event of an application failure. • Use a transaction processing monitor or message queueing software to increase robustness.
Designing Highly Available Cluster Applications Handling Application Failures Handling Application Failures What happens if part or all of an application fails? All of the preceding sections have assumed the failure in question was not a failure of the application, but of another component of the cluster. This section deals specifically with application problems.
Designing Highly Available Cluster Applications Handling Application Failures ensure that the application is behaving correctly. If the application fails and it is not detected automatically, it might take hours for a user to determine the cause of the downtime and recover from it.
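One common approach is to run a small monitor as a Serviceguard service so that the package manager notices an application failure immediately. The following is a simplified sketch; the service name, script path, process name, and polling interval are assumptions for illustration, and the SERVICE_* variables follow the package control script conventions shown elsewhere in this manual:

# In the package control script (illustrative values)
SERVICE_NAME[0]="appA_monitor"
SERVICE_CMD[0]="/etc/cmcluster/pkg_A/monitor_appA.sh"
SERVICE_RESTART[0]="-r 2"              # allow two local restarts before failover

# /etc/cmcluster/pkg_A/monitor_appA.sh (simplified sketch)
#!/usr/bin/sh
# Exit non-zero as soon as the application daemon disappears;
# Serviceguard then restarts the service or fails the package over.
while true
do
    ps -ef | grep -v grep | grep -q appA_daemon
    if [ $? -ne 0 ]
    then
        exit 1
    fi
    sleep 30
done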
Designing Highly Available Cluster Applications Minimizing Planned Downtime Minimizing Planned Downtime Planned downtime (as opposed to unplanned downtime) is scheduled; examples include backups, systems upgrades to new operating system revisions, or hardware replacements. For planned downtime, application designers should consider: • Reducing the time needed for application upgrades/patches.
Designing Highly Available Cluster Applications Minimizing Planned Downtime Provide for Rolling Upgrades Provide for a “rolling upgrade” in a client/server environment. For a system with many components, the typical scenario is to bring down the entire system, upgrade every node to the new version of the software, and then restart the application on all the affected nodes. For large systems, this could result in a long downtime. An alternative is to provide for a rolling upgrade.
Designing Highly Available Cluster Applications Minimizing Planned Downtime Providing Online Application Reconfiguration Most applications have some sort of configuration information that is read when the application is started. If the application must be halted and restarted in order to read a changed configuration file, every configuration change incurs downtime. To avoid this downtime, use configuration tools that interact with the running application and make dynamic changes online.
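For example, a daemon that is written to re-read its configuration file on receipt of a signal can accept a change with no downtime at all. The daemon name, PID file location, and choice of SIGHUP below are assumptions about the application, not Serviceguard requirements:

# Apply an updated configuration to a running application instance (sketch)
vi /etc/opt/appA/appA.conf            # edit the configuration file
kill -HUP `cat /var/run/appA.pid`     # the daemon re-reads its configuration online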
Integrating HA Applications with Serviceguard D Integrating HA Applications with Serviceguard The following is a summary of the steps you should follow to integrate an application into the Serviceguard environment: 1. Read the rest of this book, including the chapters on cluster and package configuration, and the Appendix “Designing Highly Available Cluster Applications.” 2.
Integrating HA Applications with Serviceguard NOTE
• Can the application be installed cluster-wide?
• Does the application work with a cluster-wide file name space?
• Will the application run correctly with the data (file system) available on all nodes in the cluster? This includes being available on cluster nodes where the application is not currently running.
Integrating HA Applications with Serviceguard Checklist for Integrating HA Applications Checklist for Integrating HA Applications This section contains a checklist for integrating HA applications in both single and multiple systems. Defining Baseline Application Behavior on a Single System Define a baseline behavior for the application on a standalone system: 1. Install the application, database, and other required resources on one of the systems.
Integrating HA Applications with Serviceguard Checklist for Integrating HA Applications c. Install the appropriate executables. d. With the application not running on the first system, try to bring it up on the second system. You might use the script you created in the step above. Is there anything different that you must do? Does it run? e. Repeat this process until you can get the application to run on the second system. 2. Configure the Serviceguard cluster: a. Create the cluster configuration. b.
Integrating HA Applications with Serviceguard Checklist for Integrating HA Applications NOTE CVM and CFS are supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability -> Serviceguard).
Integrating HA Applications with Serviceguard Checklist for Integrating HA Applications Testing the Cluster
1. Test the cluster:
• Have clients connect.
• Provide a normal system load.
• Halt the package on the first node and move it to the second node:
# cmhaltpkg pkg1
# cmrunpkg -n node2 pkg1
# cmmodpkg -e pkg1
• Move it back:
# cmhaltpkg pkg1
# cmrunpkg -n node1 pkg1
# cmmodpkg -e pkg1
• Fail one of the systems. For example, turn off the power on node 1.
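The package-move test above can be wrapped in a small script so that it is easy to repeat after every configuration change. This is only a sketch and assumes the two-node cluster and the node and package names used in this checklist:

#!/usr/bin/sh
# Exercise a package move between the two nodes and show the result (sketch)
PKG=pkg1
for NODE in node2 node1
do
    cmhaltpkg $PKG               # halt the package where it is currently running
    cmrunpkg -n $NODE $PKG       # start it on the other node
    cmmodpkg -e $PKG             # re-enable package switching
    cmviewcl -v | grep -i $PKG   # confirm that the package is up on $NODE
done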
Software Upgrades E Software Upgrades There are three types of upgrade you can do: • rolling upgrade • non-rolling upgrade • migration with cold install Each of these is discussed below.
Software Upgrades Types of Upgrade Types of Upgrade Rolling Upgrade In a rolling upgrade, you upgrade the HP-UX operating system (if necessary) and the Serviceguard software one node at a time without bringing down your cluster. A rolling upgrade can also be done any time one system needs to be taken offline for hardware maintenance or patch installations. This method is the least disruptive, but your cluster must meet both general and release-specific requirements.
Software Upgrades Guidelines for Rolling Upgrade Guidelines for Rolling Upgrade You can normally do a rolling upgrade if: • You are not upgrading the nodes to a new version of HP-UX; or • You are upgrading to a new version of HP-UX, but using the update process (update-ux), rather than a cold install. update-ux supports many, but not all, upgrade paths. For more information, see the HP-UX Installation and Upgrade Guide for the target version of HP-UX.
Software Upgrades Performing a Rolling Upgrade Performing a Rolling Upgrade Limitations of Rolling Upgrades The following limitations apply to rolling upgrades: • During a rolling upgrade, you should issue Serviceguard commands (other than cmrunnode and cmhaltnode) only on a node containing the latest revision of the software. Performing tasks on a node containing an earlier revision of the software will not work or will cause inconsistent results.
Software Upgrades Performing a Rolling Upgrade • Rolling upgrades are not intended as a means of using mixed releases of Serviceguard or HP-UX within the cluster. HP strongly recommends that you upgrade all cluster nodes as quickly as possible to the new release level. • You cannot delete Serviceguard software (via swremove) from a node while a rolling upgrade is in progress.
Software Upgrades Performing a Rolling Upgrade If the cluster fails before the rolling upgrade is complete (because of a catastrophic power failure, for example), you can restart the cluster by entering the cmruncl command from a node which has been upgraded to the latest version of the software. Keeping Kernels Consistent If you change kernel parameters as a part of doing an upgrade, be sure to change the parameters to the same values on all nodes that can run the same packages in case of failover.
Software Upgrades Example of a Rolling Upgrade Example of a Rolling Upgrade NOTE Warning messages may appear during a rolling upgrade while the node is determining what version of software is running. This is a normal occurrence and not a cause for concern. The following example shows a simple rolling upgrade on two nodes running one package each, as shown in Figure E-1.
Software Upgrades Example of a Rolling Upgrade This will cause pkg1 to be halted cleanly and moved to node 2. The Serviceguard daemon on node 1 is halted, and the result is shown in Figure E-2. Figure E-2 Running Cluster with Packages Moved to Node 2 Step 2. Upgrade node 1 to the next operating system release (“HP-UX (new)”), and install the next version of Serviceguard (“SG (new)”).
Software Upgrades Example of a Rolling Upgrade Step 3. When upgrading is finished, enter the following command on node 1 to restart the cluster on node 1.
# cmrunnode node1
At this point, different versions of the Serviceguard daemon (cmcld) are running on the two nodes, as shown in Figure E-4. Figure E-4 Node 1 Rejoining the Cluster Step 4. Repeat the process on node 2. Halt the node, as follows:
# cmhaltnode -f node2
This causes both packages to move to node 1.
Software Upgrades Example of a Rolling Upgrade Figure E-5 Running Cluster with Packages Moved to Node 1 Step 5. Move pkg2 back to its original node. Use the following commands:
# cmhaltpkg pkg2
# cmrunpkg -n node2 pkg2
# cmmodpkg -e pkg2
The cmmodpkg command re-enables switching of the package, which was disabled by the cmhaltpkg command. The final running cluster is shown in Figure E-6.
Software Upgrades Example of a Rolling Upgrade Figure E-6 Running Cluster After Upgrades
Software Upgrades Guidelines for Non-Rolling Upgrade Guidelines for Non-Rolling Upgrade Do a non-rolling upgrade if:
• Your cluster does not meet the requirements for rolling upgrade as specified in the Release Notes for the target version of Serviceguard; or
• The limitations imposed by rolling upgrades make it impractical for you to do a rolling upgrade (see “Limitations of Rolling Upgrades” on page 422); or
• For some other reason you need or prefer to bring the cluster down before performing the upgrade.
Software Upgrades Performing a Non-Rolling Upgrade Performing a Non-Rolling Upgrade Limitations of Non-Rolling Upgrades The following limitations apply to non-rolling upgrades:
• Binary configuration files may be incompatible between releases of Serviceguard. Do not manually copy configuration files between nodes.
• You must halt the entire cluster before performing a non-rolling upgrade.
Steps for Non-Rolling Upgrades Use the following steps for a non-rolling software upgrade (a condensed command sketch follows): Step 1. Halt all nodes in the cluster.
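A condensed sketch of the typical sequence follows; whether an HP-UX update is required, and the exact depot path and product name, depend on the target releases (the ones shown here are illustrative), so follow the Release Notes for the authoritative procedure:

# Non-rolling upgrade, condensed sketch (run the installation steps on every node)
cmhaltcl -f                           # halt the entire cluster, including any running packages
# Upgrade HP-UX on each node if required (for example with update-ux), then:
swinstall -s /depot/sg_new T1905CA    # install the new Serviceguard release (bundle name illustrative)
cmruncl                               # restart the cluster after all nodes have been upgraded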
Software Upgrades Guidelines for Migrating a Cluster with Cold Install Guidelines for Migrating a Cluster with Cold Install There may be circumstances when you prefer to do a cold install of the HP-UX operating system rather than an upgrade. A cold install erases the existing operating system and data and then installs the new operating system and software; you must then restore the data. CAUTION The cold install process erases the existing software, operating system, and data.
Software Upgrades Guidelines for Migrating a Cluster with Cold Install See “Creating the Storage Infrastructure and Filesystems with LVM and VxVM” on page 208 for more information. 2. Halt the cluster applications, and then halt the cluster. 3. Do a cold install of the HP-UX operating system. For more information on the cold install process, see the HP-UX Installation and Update Guide for the target version of HP-UX: go to http://docs.hp.
Blank Planning Worksheets F Blank Planning Worksheets This appendix reprints blank versions of the planning worksheets described in the chapter “Planning and Documenting an HA Cluster.” You can duplicate any of these worksheets that you find useful and fill them in as a part of the planning process.
Blank Planning Worksheets Worksheet for Hardware Planning Worksheet for Hardware Planning HARDWARE WORKSHEET Page ___ of ____ =============================================================================== Node Information: Host Name _____________________ Series No _____________________ Memory Capacity ____________________ Number of I/O Slots ________________ =============================================================================== LAN Information: Name of Subnet _________ Name of IP Interface
Blank Planning Worksheets Power Supply Worksheet Power Supply Worksheet POWER SUPPLY WORKSHEET Page ___ of ____ =============================================================================== SPU Power: Host Name _____________________ Power Supply _______________________ Host Name _____________________ Power Supply _______________________ =============================================================================== Disk Power: Disk Unit __________________________ Power Supply _____________________
Blank Planning Worksheets Quorum Server Worksheet Quorum Server Worksheet Quorum Server Data: ============================================================================== QS Hostname: _________________IP Address: ______________________ ============================================================================== Quorum Services are Provided for: Cluster Name: ___________________________________________________________ Host Names ____________________________________________ Host Names _________________
Blank Planning Worksheets LVM Volume Group and Physical Volume Worksheet LVM Volume Group and Physical Volume Worksheet PHYSICAL VOLUME WORKSHEET Page ___ of ____ =============================================================================== Volume Group Name: ______________________________________________________ Physical Volume Name:_____________________________________________________ Physical Volume Name:_____________________________________________________ Physical Volume Name:_____________________
Blank Planning Worksheets LVM Volume Group and Physical Volume Worksheet Physical Volume Name: _____________________________________________________
Blank Planning Worksheets VxVM Disk Group and Disk Worksheet VxVM Disk Group and Disk Worksheet DISK GROUP WORKSHEET Page ___ of ____ =========================================================================== Disk Group Name: __________________________________________________________ Physical Volume Name:______________________________________________________ Physical Volume Name:______________________________________________________ Physical Volume Name:____________________________________________________
Blank Planning Worksheets Cluster Configuration Worksheet Cluster Configuration Worksheet =============================================================================== Name and Nodes: =============================================================================== Cluster Name: __________________________ RAC Version: _______________ Node Names: _________________________________________________________ Volume Groups (for packages):________________________________________ ===================================
Blank Planning Worksheets Package Configuration Worksheet Package Configuration Worksheet ============================================================================= Package Configuration File Data: ============================================================================= Package Name: ____________________________ Failover Policy:___________________________ Failback Policy: ____________________________ Primary Node: ______________________________ First Failover Node:_________________________ Addition
Blank Planning Worksheets Package Configuration Worksheet NOTE CVM (and CFS - Cluster File System) are supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability -> Serviceguard).
Blank Planning Worksheets Package Control Script Worksheet Package Control Script Worksheet LVM Volume Groups: VG[0]_______________VG[1]________________VG[2]________________ VGCHANGE: ______________________________________________ CVM Disk Groups: CVM_DG[0]______________CVM_DG[1]_____________CVM_DG[2]_______________ CVM_ACTIVATION_CMD: ______________________________________________ VxVM Disk Groups: VXVM_DG[0]_____________VXVM_DG[1]____________VXVM_DG[2]_____________ =======================================
Blank Planning Worksheets Package Control Script Worksheet Deferred Resources: Deferred Resource Name __________________ NOTE CVM (and CFS - Cluster File System) are supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability -> Serviceguard).
Migrating from LVM to VxVM Data Storage G Migrating from LVM to VxVM Data Storage This appendix describes how to migrate LVM volume groups to VxVM disk groups for use with the VERITAS Volume Manager (VxVM), or with the Cluster Volume Manager (CVM) on systems that support it. Topics are as follows:
• Loading VxVM
• Migrating Volume Groups
• Customizing Packages for VxVM
• Customizing Packages for CVM 3.5 and 4.1
• Removing LVM Volume Groups
Migrating from LVM to VxVM Data Storage Loading VxVM Loading VxVM Before you can begin migrating data, you must install the VERITAS Volume Manager software and all required VxVM licenses on all cluster nodes. This step requires each system to be rebooted, so it requires you to remove the node from the cluster before the installation, and restart the node after installation. This can be done as a part of a rolling upgrade procedure, described in Appendix E.
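A per-node sketch of that sequence, assuming the VxVM software is delivered as an SD-UX depot (the depot path and bundle name are illustrative):

# Install VxVM on one node at a time (illustrative sketch)
cmhaltnode -f node1                   # halt the node's packages and remove it from the running cluster
swinstall -s /depot/vxvm B9116AA      # install the VERITAS Volume Manager bundle (name illustrative)
shutdown -r -y 0                      # reboot the node, as the installation requires
cmrunnode node1                       # after the reboot, rejoin the cluster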
Migrating from LVM to VxVM Data Storage Migrating Volume Groups Migrating Volume Groups The following procedure shows how to do the migration of individual volume groups for packages that are configured to run on a given node. It is recommended to convert all the volume groups for a package at the same time. It is assumed that VxVM software and an appropriate level of HP-UX and Serviceguard have been installed on the node, and that the node has rebooted and rejoined the cluster.
Migrating from LVM to VxVM Data Storage Migrating Volume Groups As an alternative to defining the VxVM disk groups on a new set of disks, it is possible to convert existing LVM volume groups into VxVM disk groups in place using the vxvmconvert(1M) utility. This utility is described along with its limitations and cautions in the VERITAS Volume Manager Migration Guide for your version, available from http://www.docs.hp.com.
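When the data is moved to a new set of disks rather than converted in place, the outline is: back up the LVM data, build the VxVM disk group, volumes, and filesystems, and then restore. A minimal sketch follows; the disk name, volume size, backup method, and the path to vxdisksetup are assumptions that vary by configuration and VxVM release:

# Migrate one volume group's data to a new VxVM disk group (sketch)
vgchange -a n /dev/vg01                      # deactivate the LVM volume group after backing up its data
/usr/lib/vxvm/bin/vxdisksetup -i c1t2d0      # initialize a disk for VxVM use
vxdg init dg01 dg01_disk1=c1t2d0             # create the disk group
vxassist -g dg01 make lvol101 2g             # create a volume in the disk group
newfs -F vxfs /dev/vx/rdsk/dg01/lvol101      # build a VxFS filesystem on the new volume
mount /dev/vx/dsk/dg01/lvol101 /mnt_dg0101   # mount it and restore the backed-up data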
Migrating from LVM to VxVM Data Storage Customizing Packages for VxVM Customizing Packages for VxVM After creating the VxVM disk group, you need to customize the Serviceguard package that will access the storage. Use the following procedure for disk groups that will be used with the VERITAS Volume Manager (VxVM). If you are using the Cluster Volume Manager (CVM), skip ahead to the next section. 1. Rename the old package control script as follows: # mv Package.ctl Package.ctl.bak 2.
Migrating from LVM to VxVM Data Storage Customizing Packages for VxVM
LV[0]="/dev/vx/dsk/dg01/lvol101"
LV[1]="/dev/vx/dsk/dg01/lvol102"
LV[2]="/dev/vx/dsk/dg02/lvol201"
LV[3]="/dev/vx/dsk/dg02/lvol202"
FS[0]="/mnt_dg0101"
FS[1]="/mnt_dg0102"
FS[2]="/mnt_dg0201"
FS[3]="/mnt_dg0202"
FS_MOUNT_OPT[0]="-o ro"
FS_MOUNT_OPT[1]="-o rw"
FS_MOUNT_OPT[2]="-o ro"
FS_MOUNT_OPT[3]="-o rw"
4.
Migrating from LVM to VxVM Data Storage Customizing Packages for CVM 3.5 and 4.1 Customizing Packages for CVM 3.5 and 4.1 NOTE CVM (and CFS - Cluster File System) are supported on some, but not all current releases of HP-UX. Check the latest Release Notes for your version of Serviceguard for up-to-date information (http://www.docs.hp.com -> High Availability -> Serviceguard). After creating the CVM disk group, you need to customize the Serviceguard package that will access the storage.
Migrating from LVM to VxVM Data Storage Customizing Packages for CVM 3.5 and 4.1 For example, let's say we have two volumes defined in each of the two disk groups from above: lvol101 and lvol102, and lvol201 and lvol202. These are mounted on /mnt_dg0101 and /mnt_dg0102, and /mnt_dg0201 and /mnt_dg0202, respectively. /mnt_dg0101 and /mnt_dg0201 are both mounted read-only.
Migrating from LVM to VxVM Data Storage Customizing Packages for CVM 3.5 and 4.1
9. Deport the disk group:
# vxdg deport DiskGroupName
10. Start the cluster, if it is not already running:
# cmruncl
This will activate the special CVM package.
11. When CVM starts up, it selects a master node, and this is the node from which you must issue the disk group configuration commands.
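As an orientation sketch only (the disk group name comes from the earlier example; the numbered steps in this appendix remain the authoritative procedure), finding the master and working with the shared disk group from there might look like this:

vxdctl -c mode               # reports the cluster mode and whether this node is the CVM master or a slave
# On the master node:
vxdg -s import dg01          # import the disk group in shared (cluster-wide) mode
vxvol -g dg01 startall       # start the volumes in the disk group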
Migrating from LVM to VxVM Data Storage Removing LVM Volume Groups Removing LVM Volume Groups After testing the new VxVM disk groups, remove any LVM volume groups that are no longer wanted from the system using the standard LVM commands lvremove, pvremove, and vgremove. At a convenient time, you should also edit the cluster ASCII configuration file to remove the VOLUME_GROUP statements that refer to the LVM volume groups that are no longer used in the cluster.
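A sketch of the cleanup for one retired volume group; the volume group, logical volume, and device names are taken from the illustrative examples in this appendix:

# Remove a retired LVM volume group (sketch)
lvremove /dev/vg01/lvol1              # repeat for every logical volume in the group
vgremove /dev/vg01                    # remove the volume group definition
                                      # (for a multi-disk group, vgreduce it to one disk first)
pvremove /dev/rdsk/c0t1d0             # clear the LVM header from each disk that was in the group
# Then delete the corresponding VOLUME_GROUP entries from the cluster ASCII
# configuration file and re-apply it with cmcheckconf and cmapplyconf.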
IPv6 Network Support H IPv6 Network Support This appendix describes some of the characteristics of IPv6 network addresses.
IPv6 Network Support IPv6 Address Types IPv6 Address Types Several types of IPv6 addressing schemes are specified in RFC 2373 (IPv6 Addressing Architecture). IPv6 addresses are 128-bit identifiers for interfaces and sets of interfaces, and RFC 2373 defines various address formats. IPv6 addresses are broadly classified into three types: unicast, anycast, and multicast. The following table explains each type.
IPv6 Network Support IPv6 Address Types multiple groups of 16 bits of zeros. The “::” can appear only once in an address, and it can be used to compress the leading, trailing, or contiguous sixteen-bit zeroes in an address. Example: fec0:1:0:0:0:0:0:1234 can be represented as fec0:1::1234. • When dealing with a mixed environment of IPv4 and IPv6 nodes, an alternative form of IPv6 address is used. It is x:x:x:x:x:x:d.d.d.d, where the x's are the hexadecimal values of the six high-order 16-bit pieces of the address and the d's are the decimal values of the four low-order octets (an embedded IPv4 address).
IPv6 Network Support IPv6 Address Types Unicast Addresses IPv6 unicast addresses are classified into different types: global aggregatable unicast addresses, site-local addresses, and link-local addresses. Typically a unicast address is logically divided as follows:
Table H-2
Subnet prefix (n bits) | Interface ID (128-n bits)
Interface identifiers in an IPv6 unicast address are used to identify the interfaces on a link. Interface identifiers are required to be unique on that link.
IPv6 Network Support IPv6 Address Types IPv4 Mapped IPv6 Address There is a special type of IPv6 address that holds an embedded IPv4 address. This address is used to represent the addresses of IPv4-only nodes as IPv6 addresses, and it is used especially by applications that support both IPv6 and IPv4. These addresses are called IPv4 Mapped IPv6 Addresses. The format of these addresses is as follows:
Table H-4
zeros (80 bits) | FFFF (16 bits) | IPv4 address (32 bits)
Example: ::ffff:192.168.0.
IPv6 Network Support IPv6 Address Types Link-Local Addresses Link-local addresses have the following format:
Table H-6
1111111010 (10 bits) | 0 (54 bits) | interface ID (64 bits)
Link-local addresses are intended for addressing nodes on a single link. Packets originating from or destined to a link-local address will not be forwarded by a router.
IPv6 Network Support IPv6 Address Types “FF” at the beginning of the address identifies the address as a multicast address. The “flgs” field is a set of 4 flags “000T”. The higher order 3 bits are reserved and must be zero. The last bit ‘T’ indicates whether it is permanently assigned or not. A value of zero indicates that it is permanently assigned otherwise it is a temporary assignment. The “scop” field is a 4-bit field which is used to limit the scope of the multicast group.
IPv6 Network Support Network Configuration Restrictions Network Configuration Restrictions Serviceguard supports IPv6 for data links only. The heartbeat IP must still be IPv4, but the package IPs can be IPv4 or IPv6. To configure IPv6, the system should be set up in what is called a dual-stack configuration, which requires the IPv6 product bundle. The restrictions for supporting IPv6 in Serviceguard are listed below.
• The heartbeat IP address must be IPv4.
IPv6 Network Support Network Configuration Restrictions NOTE Even though link-local IP addresses are not supported in the Serviceguard cluster configuration, the primary link-local address on the Serviceguard primary interface will be switched over to the standby during a local switch. This is because of two requirements: First, the dual stack (IPv4/IPv6) kernel requires that the primary IP address associated with an interface must always be a link-local address.
IPv6 Network Support IPv6 Relocatable Address and Duplicate Address Detection Feature IPv6 Relocatable Address and Duplicate Address Detection Feature The IPv6 networking stack has a new feature, Duplicate Address Detection (DAD), that was not previously available in IPv4. When an address is being added, the DAD detects a duplicate address that is already being used on the network.
IPv6 Network Support IPv6 Relocatable Address and Duplicate Address Detection Feature
# TRANSPORT_NAME[index]=ip6
# NDD_NAME[index]=ip6_nd_dad_solicit_count
# NDD_VALUE[index]=n
Where index is the next available integer value of the nddconf file, and n is a number: either 1 to turn the feature ON or 0 to turn it OFF.
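A filled-in sketch, assuming index 2 is the next unused slot in the local nddconf file (the entries are shown uncommented here; match the convention already used for active entries in your file). The ndd command that applies the same setting to the running system is also shown; verify its syntax against the ndd(1M) manpage for your release:

# /etc/rc.config.d/nddconf additions to turn DAD off (index assumed)
TRANSPORT_NAME[2]=ip6
NDD_NAME[2]=ip6_nd_dad_solicit_count
NDD_VALUE[2]=0

# To apply the change without waiting for the next boot (verify against ndd(1M)):
ndd -set /dev/ip6 ip6_nd_dad_solicit_count 0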
IPv6 Network Support Local Primary/Standby LAN Patterns Local Primary/Standby LAN Patterns The use of IPv6 allows a number of different patterns of failover among LAN cards configured in the cluster. This is true because each LAN card can support several IP addresses when a dual IPv4/IPv6 configuration is used. This section describes several ways in which local failover to a standby LAN can be configured.
IPv6 Network Support Example Configurations Example Configurations An example of a LAN configuration on a cluster node using both IPv4 and IPv6 addresses is shown below. Figure H-1 Example 1: IPv4 and IPv6 Addresses in Standby Configuration Following the loss of lan0 or lan2, lan1 can adopt either address, as shown below. Figure H-2 Example 1: IPv4 and IPv6 Addresses after Failover to Standby The same LAN card can be configured with both IPv4 and IPv6 addresses, as shown below.
IPv6 Network Support Example Configurations Figure H-3 Example 2: IPv4 and IPv6 Addresses in Standby Configuration This type of configuration allows failover of both addresses to the standby. This is shown below.
Maximum and Minimum Values for Cluster and Package Configuration Parameters I Maximum and Minimum Values for Cluster and Package Configuration Parameters Table I-1 shows the range of possible values for cluster configuration parameters.
Maximum and Minimum Values for Cluster and Package Configuration Parameters Table I-2 shows the range of possible values for package configuration parameters.
Table I-2 Minimum and Maximum Values of Package Configuration Parameters
Package Parameter: Run Script Timeout
Minimum Value: 10 seconds
Maximum Value: 4294 seconds if a non-zero value is specified; 0 (NO_TIMEOUT). This is a recommended value.
A Access Control Policies, 177, 192 Access Control Policy, 162 Access roles, 162 active node, 25 adding a package to a running cluster, 334 adding cluster nodes advance planning, 203 adding nodes to a running cluster, 314 adding packages on a running cluster, 274 additional package resource parameter in package configuration, 176 additional package resources monitoring, 85 addressing, SCSI, 141 administration adding nodes to a ruuning cluster, 314 cluster and package states, 297 halting a package, 319 halti
logical volume infrastructure, 208 verifying the cluster configuration, 227 VxVM infrastructure, 215 bus type hardware planning, 143 C CFS Creating a storage infrastructure, 231 creating a storage infrastructure, 231 not supported on all HP-UX versions, 27 changes in cluster membership, 67 changes to cluster allowed while the cluster is running, 323 changes to packages allowed while the cluster is running, 337 changing the volume group configuration while the cluster is running, 330 checkpoints, 397 client
scenario, 129 cluster re-formation time, 156 cluster startup manual, 67 cluster volume group creating physical volumes, 210 parameter in cluster manager configuration, 162 cluster with high availability disk array figure, 48, 49 clusters active/standby type, 53 larger size, 53 cmapplyconf, 229, 293 cmassistd daemon, 59 cmcheckconf, 227, 292 troubleshooting, 361 cmclconfd daemon, 59 cmcld daemon, 59 and LAN failover, 61 and node TOC, 60 and safety timer, 60 functions, 60 runtime priority, 61 cmclnodelist boo
D data disks, 43 data congestion, 66 databases toolkits, 389 deactivating volume groups, 212 deciding when and where to run packages, 76 deferred resource name, 183 deleting a package configuration using cmdeleteconf, 335 deleting a package from a running cluster, 335 deleting nodes while the cluster is running, 329, 331 deleting the cluster configuration using cmdeleteconf, 256 dependencies configuring, 177 designing applications to run on multiple systems, 399 detecting failures in network manager, 104
in troubleshooting, 347, 348 event monitoring service using, 85 exclusive access relinquishing via TOC, 130 expanding the cluster planning ahead, 136 expansion planning for, 170 F failback policy package configuration file parameter, 171 used by package manager, 82 FAILBACK_POLICY parameter in package configuration file, 171 used by package manager, 82 failover controlling the speed in applications, 394 defined, 25 failover behavior in packages, 86 failover package, 74 failover packages configuring, 261 fai
parameter in package configuration, 173 HALT_SCRIPT_TIMEOUT (halt script timeout) in sample ASCII package configuration file, 262 parameter in package configuration, 173 halting a cluster, 316 halting a package, 319 halting the entire cluster, 316 handling application failures, 408 hardware blank planning worksheet, 436 monitoring, 347 hardware failures response to, 131 hardware for OPS on HP-UX power supplies, 52 hardware planning Disk I/O Bus Type, 143 disk I/O information for shared disks, 143 host IP a
hardware planning, 140, 148, 149 portable, 102 releasing via TOC, 130 reviewing for packages, 358 switching, 77, 78, 110 J JFS, 395 K kernel hang, and TOC, 129 safety timer, 60, 61 kernel consistency in cluster configuration, 193, 202 kernel interrupts and possible TOC, 161 L LAN Critical Resource Analysis (CRA), 325 heartbeat, 65 interface name, 140, 148 planning information, 140 LAN CRA (Critical Resource Analysis), 325 LAN failover managed by cmcld, 61 LAN failure Serviceguard behavior, 36 LAN interface
hardware planning, 139 memory requirements lockable memory for Serviceguard, 136 minimizing planned down time, 410 mirror copies of data protection against disk failure, 25 MirrorDisk/UX, 44 mirrored disks connected for high availability figure, 47 mirroring disks, 44 mirroring disks, 44 mkboot creating a root mirror with, 199 monitor cluster with Serviceguard commands, 251 monitor clusters with Serviceguard Manager, 251 monitored non-heartbeat subnet parameter in cluster manager configuration, 159 monitore
halt (TOC), 129, 130 in Serviceguard cluster, 24 IP addresses, 102 timeout and TOC example, 130 node types active, 25 primary, 25 NODE_FAIL_FAST_ENABLED effect of setting, 132 in sample ASCII package configuration file, 262 parameter in package configuration, 173 NODE_FAILFAST_ENABLED parameter, 276 NODE_NAME in sample ASCII package configuration file, 262 parameter in cluster manager configuration, 157, 158 NODE_TIMEOUT and HEARTBEAT_INTERVAL, 129 and node TOC, 129 and safety timer, 60, 61 NODE_TIMEOUT
package coordinator defined, 65 package failfast parameter in package configuration, 173 package failover behavior, 86 package failures responses, 132 package IP address defined, 102 package IP addresses, 102 defined, 102 reviewing, 358 package manager blank planning worksheet, 444, 446 testing, 344 package name parameter in package configuration, 171 package switching behavior changing, 321 package type parameter in package configuration, 176 Package types, 24 failover, 24 multi-node, 24 system multi-node,
worksheet, 146 power supplies blank planning worksheet, 436 power supply and cluster lock, 52 blank planning worksheet, 437 UPS for OPS on HP-UX, 52 Predictive monitoring, 348 primary disks and mirrors on different buses figure, 51 primary LAN interfaces defined, 38 primary network interface, 38 primary node, 25 pvcreate creating a root mirror with, 199 PVG-strict mirroring creating volume groups with, 210 Q qs daemon, 59 QS_HOST parameter in cluster manager configuration, 157 QS_POLLING_INTERVAL parameter
disks, 43 responses to cluster events, 340 to package and service failures, 132 responses to failures, 129 responses to hardware failures, 131 restart automatic restart of cluster, 67 following failure, 133 SERVICE_RESTART variable in package control script, 183 restartable transactions, 396 restarting the cluster automatically, 317 restoring client connections in applications, 406 retry count, 181 rhosts file for security, 189 rolling software upgrades, 419 example, 425 steps, 422 rolling upgrade limitati
in sample ASCII package configuration file, 262 in sample package control script, 277 parameter in package configuration, 174, 175 SERVICE_RESTART array variable in package control script, 183 in sample package control script, 277 Serviceguard install, 207 introduction, 24 Serviceguard at a glance, 23 Serviceguard behavior after monitored resource failure, 36 Serviceguard behavior in LAN failure, 36 Serviceguard behavior in software failure, 36 Serviceguard commands to configure a package, 258 ServiceGuar
cluster manager, 345 network manager, 345 package manager, 344 testing cluster operation, 344 time protocol (NTP) for clusters, 202 timeout node, 129 TOC and NODE_TIMEOUT, 129 and package availability, 130 and safety timer, 161 and the safety timer, 60 causes and scenarios, 129 defined, 60 when a node fails, 129 toolkits for databases, 389 traffic type LAN hardware planning, 140 troubleshooting approaches, 358 monitoring hardware, 347 replacing disks, 350 reviewing control scripts, 361 reviewing package IP
CVM, 123 LVM, 121 migrating from LVM to VxVM, 449 VxVM, 122 VOLUME_GROUP parameter in cluster manager configuration, 162 vxfend for CVM and CFS, 64 VxM-CVM-pkg system multi-node package, 167 VxVM, 121, 122 creating a storage infrastructure, 215 migrating from LVM to VxVM, 449 planning, 153 VXVM_DG in package control script, 277 VxVM-CVM package, 75 VxVM-CVM-pkg, 245 W What is Serviceguard?, 24 worksheet cluster configuration, 163 hardware configuration, 144 package configuration data, 178 package control s