Managing Serviceguard Twelfth Edition Manufacturing Part Number: B3936-90100 Reprinted March 2006
Legal Notices © Copyright 1995-2006 Hewlett-Packard Development Company, L.P. Confidential computer software. Valid license from HP required for possession, use, or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor’s standard commercial license. The information contained herein is subject to change without notice.
Contents

1. Serviceguard at a Glance
What is Serviceguard?
Failover
Using Serviceguard Manager
Monitoring with Serviceguard Manager
Serviceguard Architecture  58
Serviceguard Daemons  58
How the Cluster Manager Works  64
Configuration of the Cluster
Volume Managers for Data Storage
Types of Redundant Storage
Examples of Mirrored Storage
Examples of Storage on Disk Arrays
Types of Volume Manager
Cluster Lock Information
Cluster Configuration Parameters
Cluster Configuration Worksheet
Package Configuration Planning
Logical Volume and File System Planning
Creating a Storage Infrastructure with VERITAS Cluster File System (CFS)  240
Preparing the Cluster and the System Multi-node Package  240
Creating the Disk Groups  242
Creating the Disk Group Cluster Packages  243
Creating Volumes
Configuring Multi-node Packages
Configuring Failover Packages
Creating the Package Control Script
Using Serviceguard Manager to Create the Package Control Script
Using Commands to Create the Package Control Script
Reconfiguring a Running Cluster
Reconfiguring a Package
Reconfiguring a Package on a Halted Cluster
Reconfiguring a Package on a Running Cluster
Adding a Package to a Running Cluster
Reviewing Object Manager Log Files
Reviewing Serviceguard Manager Log Files
Reviewing the System Multi-node Package Files
Reviewing Configuration Files
Reviewing the Package Control Script
Design for Replicated Data Sites
Designing Applications to Run on Multiple Systems
Avoid Node-Specific Information
Avoid Using SPU IDs or MAC Addresses
Assign Unique Names to Applications
F. Blank Planning Worksheets
Worksheet for Hardware Planning
Power Supply Worksheet
Quorum Server Worksheet
LVM Volume Group and Physical Volume Worksheet
Tables

Table 1. Printing History  15
Table 3-1. Package Configuration Data  78
Table 3-2. Node Lists in Sample Cluster  81
Table 3-3. Package Failover Behavior  85
Table 3-4.
Printing History

Table 1  Printing History

  Printing Date                    Part Number    Edition
  January 1995                     B3936-90001    First
  June 1995                        B3936-90003    Second
  December 1995                    B3936-90005    Third
  August 1997                      B3936-90019    Fourth
  January 1998                     B3936-90024    Fifth
  October 1998                     B3936-90026    Sixth
  December 2000                    B3936-90045    Seventh
  September 2001                   B3936-90053    Eighth
  March 2002                       B3936-90065    Ninth
  June 2003                        B3936-90070    Tenth
  June 2004                        B3936-90076    Eleventh
  June 2004 (Reprint June 2005)    B3936-90076    Eleventh, Second
  March 2006                       B3936-90100    Twelfth
The printing date changes when a new edition is printed. (Minor corrections and updates which are incorporated at reprint do not cause the date to change.) The part number is revised when extensive technical changes are incorporated. New editions of this manual will incorporate all material updated since the previous edition. HP Printing Division: Infrastructure Solutions Division Hewlett-Packard Co. 19111 Pruneridge Ave.
Preface The twelfth printing of this manual is updated for Serviceguard Version A.11.17. This guide describes how to configure Serviceguard to run on HP 9000 or HP Integrity servers under the HP-UX operating system. The contents are as follows: • Chapter 1, “Serviceguard at a Glance,” describes a Serviceguard cluster and provides a roadmap for using this guide. • Chapter 2, “Understanding Serviceguard Hardware Configurations,” provides a general view of the hardware configurations used by Serviceguard.
• Appendix C, “Designing Highly Available Cluster Applications,” gives guidelines for creating cluster-aware applications that provide optimal performance in a Serviceguard environment.
• Appendix D, “Integrating HA Applications with Serviceguard,” presents suggestions for integrating your existing applications with Serviceguard.
• Appendix E, “Rolling Software Upgrades,” shows how to move from one Serviceguard or HP-UX release to another without bringing down your applications.

Related Publications
• From http://www.docs.hp.com -> High Availability -> HP Serviceguard Storage Management Suite:
  — HP Serviceguard Storage Management Suite Version A.01.00 Release Notes (T2771-90028), March 2006
  — Storage Foundation CFS 4.1 HP Serviceguard Storage Management Suite Extracts (T2771-90009), December 2005
• Before attempting to use VxVM storage with Serviceguard, please refer to the documents posted at http://docs.hp.com. From the heading By OS Release, choose 11i v2.
  — Designing Disaster Tolerant High Availability Clusters, December 2005
• From http://www.docs.hp.com -> High Availability -> HP Serviceguard Extension for Faster Failover:
  — HP Serviceguard Extension for Faster Failover, Version A.01.00, Release Notes (T2389-90001), June 2004
• From http://www.docs.hp.com -> High Availability -> Serviceguard Extension for SAP:
  — Managing Serviceguard Extension for SAP, March 2006 (T2803-90002)
• From http://www.docs.hp.
Serviceguard at a Glance 1 Serviceguard at a Glance This chapter introduces Serviceguard on HP-UX, and shows where to find different kinds of information in this book. The following topics are presented: • What is Serviceguard? • Using Serviceguard Manager • A Roadmap for Configuring Clusters and Packages If you are ready to start setting up Serviceguard clusters, skip ahead to Chapter 4, “Planning and Documenting an HA Cluster,” on page 131.
Serviceguard at a Glance What is Serviceguard? What is Serviceguard? Serviceguard allows you to create high availability clusters of HP 9000 or HP Integrity servers. A high availability computer system allows application services to continue in spite of a hardware or software failure. Highly available systems protect users from software failures as well as from failure of a system processing unit (SPU), disk, or local area network (LAN) component.
Serviceguard at a Glance What is Serviceguard? In Figure 1-1, node 1 (one of two SPU's) is running failover package A, and node 2 is running package B. Each package has a separate group of disks associated with it, containing data needed by the package's applications, and a mirror copy of the data. Note that both nodes are physically connected to both groups of mirrored disks. In this example, however, only one node at a time may access the data for a given group of disks.
Serviceguard at a Glance What is Serviceguard? Figure 1-2 Typical Cluster After Failover
After this transfer, the failover package typically remains on the adoptive node as long as the adoptive node continues running. If you wish, however, you can configure the package to return to its primary node as soon as the primary node comes back online. Alternatively, you may manually transfer control of the package back to the primary node at the appropriate time.
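For example, you could move a package back by hand with Serviceguard commands such as the following sketch (the package and node names are illustrative, not taken from this configuration):

   # Halt the package on the adoptive node, restart it on the primary
   # node, and then re-enable switching for the package.
   cmhaltpkg pkg1
   cmrunpkg -n node1 pkg1
   cmmodpkg -e pkg1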
Serviceguard at a Glance What is Serviceguard? Serviceguard; disk arrays, which use various RAID levels for data protection; and HP-supported uninterruptible power supplies (UPS), such as HP PowerTrust, which eliminates failures related to power outage. These products are highly recommended along with Serviceguard to provide the greatest degree of availability.
Serviceguard at a Glance Using Serviceguard Manager
Using Serviceguard Manager
Serviceguard Manager is the graphical user interface for Serviceguard. The Serviceguard Manager management station can be an HP-UX, Linux, or Windows system. From there, you can monitor, administer, and configure Serviceguard clusters on HP-UX or on Linux.
• Monitor: You can see information about Serviceguard objects on your subnets. The objects are represented in a hierarchical tree and in a graphical map.
Serviceguard at a Glance Using Serviceguard Manager Figure 1-3 Monitoring with Serviceguard Manager Administering with Serviceguard Manager You can also administer clusters, nodes, and packages if you have the appropriate access permissions (Serviceguard A.11.14 and A.11.15) or access control policies (Serviceguard A.11.16 and A.11.
Figure 1-4  Serviceguard Manager Package Administration
Serviceguard at a Glance Using Serviceguard Manager Configuring with Serviceguard Manager With Serviceguard version A.11.16, you can also configure clusters and packages. Both the server node and the target cluster must have Serviceguard version A.11.16 or later installed, and you must have root (UID=0) login to the cluster nodes.
Serviceguard at a Glance Using Serviceguard Manager
Serviceguard Manager Help
To see online help, click on the “Help” menu item at the top of the screen. The following help topics under “Using Serviceguard Manager” are especially valuable for new users of the interface:
• “Menu and Toolbar Commands”
• “Navigating Serviceguard Manager”
• “Map Legend”
How Serviceguard Manager Works
You run Serviceguard Manager from a Unix, Linux, or PC.
Serviceguard at a Glance Using Serviceguard Manager Serviceguard Security Patch installed; the Security must also be enabled on that node. Also, refer to “Editing Security Files” on page 190 for access requirements. To connect, you need to specify a valid username and password from the session server’s /etc/passwd file. List the cluster or clusters you want to see. Click “unused nodes” to see nodes that are not currently configured into a cluster, but do have Serviceguard installed.
Serviceguard at a Glance What are the Distributed Systems Administration Utilities? What are the Distributed Systems Administration Utilities? HP Distributed Systems Administration Utilities (DSAU) improve both Serviceguard cluster management and multisystem management. The tools provide: • Configuration synchronization • Log consolidation • Command fan-out With configuration synchronization, you specify a specific server as your configuration master; all your other systems are defined as clients.
Serviceguard at a Glance A Roadmap for Configuring Clusters and Packages A Roadmap for Configuring Clusters and Packages This manual presents the tasks you need to perform in order to create a functioning HA cluster using Serviceguard. These tasks are shown in Figure 1-6. Figure 1-6 Tasks in Configuring a Serviceguard Cluster The tasks in Figure 1-6 are covered in step-by-step detail in chapters 4 through 7.
Understanding Serviceguard Hardware Configurations 2 Understanding Serviceguard Hardware Configurations This chapter gives a broad overview of how the Serviceguard hardware components work. The following topics are presented: • Redundancy of Cluster Components • Redundant Network Components • Redundant Disk Storage • Redundant Power Supplies • Larger Clusters Refer to the next chapter for information about Serviceguard software components.
Understanding Serviceguard Hardware Configurations Redundancy of Cluster Components Redundancy of Cluster Components In order to provide a high level of availability, a typical cluster uses redundant system components, for example two or more SPUs and two or more independent disks. This redundancy eliminates single points of failure. In general, the more redundancy, the greater your access to applications, data, and supportive services in the event of a failure.
Understanding Serviceguard Hardware Configurations Redundancy of Cluster Components Note that a package that does not access data from a disk on a shared bus can be configured to fail over to as many nodes as you have configured in the cluster (regardless of disk technology). For instance, if a package only runs local executables, it can be configured to fail over to all nodes in the cluster that have local copies of those executables, regardless of the type of disk connectivity.
Understanding Serviceguard Hardware Configurations Redundant Network Components Redundant Network Components To eliminate single points of failure for networking, each subnet accessed by a cluster node is required to have redundant network interfaces. Redundant cables are also needed to protect against cable failures. Each interface card is connected to a different cable, and the cables themselves are connected by a component such as a hub or a bridge.
Understanding Serviceguard Hardware Configurations Redundant Network Components Figure 2-1 Redundant LANs In the figure, a two-node Serviceguard cluster has one bridged net configured with both a primary and a standby LAN card for the data/heartbeat subnet (Subnet A). Another LAN card provides an optional dedicated heartbeat LAN. Note that the primary and standby LAN segments are connected by a hub to provide a redundant data/heartbeat subnet. Each node has its own IP address for this subnet.
Understanding Serviceguard Hardware Configurations Redundant Network Components
NOTE You should verify that network traffic is not too high on the heartbeat/data LAN. If traffic is too high, this LAN might not perform adequately in transmitting heartbeats if the dedicated heartbeat LAN fails.
Providing Redundant FDDI Connections
FDDI is a high-speed fiber-optic interconnect medium.
Understanding Serviceguard Hardware Configurations Redundant Network Components Using Dual Attach FDDI Stations Another way of obtaining redundant FDDI connections is to configure dual attach stations on each node to create an FDDI ring, shown in Figure 2-3. An advantage of this configuration is that only one slot is used in the system card cage. In Figure 2-3, note that nodes 3 and 4 also use Ethernet to provide connectivity outside the cluster.
Understanding Serviceguard Hardware Configurations Redundant Network Components NOTE The use of a serial (RS232) heartbeat line is supported only in a two-node cluster configuration. A serial heartbeat line is required in a two-node cluster that has only one heartbeat LAN. If you have at least two heartbeat LANs, or one heartbeat LAN and one standby LAN, a serial (RS232) heartbeat should not be used.
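As an illustration only, heartbeat networks, standby interfaces, and a serial heartbeat line are declared per node in the cluster ASCII configuration file; the node name, device files, and IP address below are examples:

   NODE_NAME ftsys9
     NETWORK_INTERFACE lan0
     # data/heartbeat subnet IP address for this node
     HEARTBEAT_IP 192.168.1.9
     # standby interface: no IP address is assigned
     NETWORK_INTERFACE lan1
     # RS232 heartbeat line (two-node clusters only)
     SERIAL_DEVICE_FILE /dev/tty0p0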
Understanding Serviceguard Hardware Configurations Redundant Network Components Replacement of Failed Network Cards Depending on the system configuration, it is possible to replace failed network cards while the cluster is running. The process is described under “Replacement of LAN Cards” in the chapter “Troubleshooting Your Cluster.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Redundant Disk Storage Each node in a cluster has its own root disk, but each node is also physically connected to several other disks in such a way that more than one node can obtain access to the data and programs associated with a package it is configured for. This access is provided by a Storage Manager, such as Logical Volume Manager (LVM), VERITAS Volume Manager (VxVM), or VERITAS Cluster Volume Manager (CVM).
Understanding Serviceguard Hardware Configurations Redundant Disk Storage shared bus. All SCSI addresses, including the addresses of all interface cards, must be unique for all devices on a shared bus. See the manual Configuring HP-UX for Peripherals for information on SCSI bus addressing and priority.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Disk Arrays using RAID Levels and Multiple Data Paths An alternate method of achieving protection for your data is to employ a disk array with hardware RAID levels that provide data redundancy, such as RAID Level 1 or RAID Level 5. The array provides data redundancy for the disks. This protection needs to be combined with the use of redundant host bus interfaces (SCSI or Fibre Channel) between each node and the array.
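For illustration, both forms of protection, mirroring of individual disks and redundant paths to a disk array, are set up with standard LVM commands; the volume group and device file names below are examples only:

   # Disk array accessed through two host bus adapters: create the
   # volume group on the primary path, then add the alternate path;
   # LVM records the second path as a PV link.
   mkdir /dev/vgpkgA
   mknod /dev/vgpkgA/group c 64 0x010000
   pvcreate -f /dev/rdsk/c0t1d0
   vgcreate /dev/vgpkgA /dev/dsk/c0t1d0
   vgextend /dev/vgpkgA /dev/dsk/c1t1d0    # alternate link to the same LUN

   # Individual (non-array) disks: mirror across two disks on
   # different buses with MirrorDisk/UX.
   pvcreate -f /dev/rdsk/c2t2d0
   pvcreate -f /dev/rdsk/c3t2d0
   mkdir /dev/vgpkgB
   mknod /dev/vgpkgB/group c 64 0x020000
   vgcreate /dev/vgpkgB /dev/dsk/c2t2d0 /dev/dsk/c3t2d0
   lvcreate -L 1024 -m 1 -n lvol1 /dev/vgpkgB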
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Replacement of Failed I/O Cards Depending on the system configuration, it is possible to replace failed disk I/O cards while the system remains online. The process is described under “Replacing I/O Cards” in the chapter “Troubleshooting Your Cluster.” Sample SCSI Disk Configurations Figure 2-5 shows a two node cluster. Each node has one root disk which is mirrored and one package for which it is the primary node.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Figure 2-5 Mirrored Disks Connected for High Availability Figure 2-6 below shows a similar cluster with a disk array connected to each node on two I/O channels.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Figure 2-6 Cluster with High Availability Disk Array Details on logical volume configuration for Serviceguard, including PV Links, are given in the chapter “Building an HA Cluster Configuration.” Sample Fibre Channel Disk Configuration In Figure 2-7 below, the root disks are shown with simple mirroring, but the shared storage is now accessed via redundant Fibre Channel switches attached to a disk array.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Figure 2-7 Cluster with Fibre Channel Switched Disk Array This type of configuration also uses PV links or other multipath software such as VERITAS Dynamic Multipath (DMP) or EMC PowerPath. Root Disk Limitations on Shared SCSI Buses The IODC firmware does not support two or more nodes booting from the same SCSI bus at the same time.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Figure 2-8 Root Disks on Different Shared Buses Note that if both nodes had their primary root disks connected to the same bus, you would have an unsupported configuration. You can put a mirror copy of Node B's root disk on the same SCSI bus as Node A's primary root disk, because three failures would have to occur for both systems to boot at the same time, which is an acceptable risk.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Figure 2-9 Primaries and Mirrors on Different Shared Buses Note that you cannot use a disk within a disk array as a root disk if the array is on a shared bus.
Understanding Serviceguard Hardware Configurations Redundant Power Supplies Redundant Power Supplies You can extend the availability of your hardware by providing battery backup to your nodes and disks. HP-supported uninterruptible power supplies (UPS), such as HP PowerTrust, can provide this protection from momentary power loss. Disks should be attached to power circuits in such a way that mirror copies are attached to different power sources.
Understanding Serviceguard Hardware Configurations Larger Clusters Larger Clusters You can create clusters of up to 16 nodes with Serviceguard. Clusters of up to 16 nodes may be built by connecting individual SPUs via Ethernet and using FDDI networking. The possibility of configuring a cluster consisting of 16 nodes does not mean that all types of cluster configuration behave in the same way in a 16-node configuration.
Understanding Serviceguard Hardware Configurations Larger Clusters Figure 2-10 Eight-Node Active/Standby Cluster Point to Point Connections to Storage Devices Some storage devices allow point-to-point connection to a large number of host nodes without using a shared SCSI bus. An example is shown in Figure 2-11, a cluster consisting of eight nodes with a SCSI interconnect. The nodes access shared data on an XP or EMC disk array configured with 16 SCSI I/O ports.
Understanding Serviceguard Hardware Configurations Larger Clusters Figure 2-11 Eight-Node Cluster with XP or EMC Disk Array Fibre Channel switched configurations also are supported using either an arbitrated loop or fabric login topology. For additional information about supported cluster configurations, refer to the HP Unix Servers Configuration Guide, available through your HP representative.
Understanding Serviceguard Software Components 3 Understanding Serviceguard Software Components This chapter gives a broad overview of how the Serviceguard software components work.
Understanding Serviceguard Software Components Serviceguard Architecture Serviceguard Architecture The following figure shows the main software components used by Serviceguard. This chapter discusses these components in some detail.
Understanding Serviceguard Software Components Serviceguard Architecture
• /usr/lbin/cmlogd—Serviceguard Syslog Log Daemon
• /usr/lbin/cmlvmd—Cluster Logical Volume Manager Daemon
• /opt/cmom/lbin/cmomd—Cluster Object Manager Daemon
• /usr/lbin/cmsnmpd—Cluster SNMP subagent (optionally running)
• /usr/lbin/cmsrvassistd—Serviceguard Service Assistant Daemon
• /usr/lbin/qs—Serviceguard Quorum Server Daemon
• /usr/lbin/cmnetassistd—Network Sensor Assistant Daemon
Understanding Serviceguard Software Components Serviceguard Architecture update the kernel timer, indicating a kernel hang. Before a TOC due to the expiration of the safety timer, messages will be written to /var/adm/syslog/syslog.log and the kernel’s message buffer. The cmcld daemon also detects the health of the networks on the system and performs local lan failover. Finally, this daemon handles the management of Serviceguard packages, determining where to run them and when to start them.
Understanding Serviceguard Software Components Serviceguard Architecture to the object manager and receive responses from it. This daemon may not be running on your system; it is used only by clients of the object manager. cmomd accepts connections from clients, and examines queries. The queries are decomposed into categories (of classes) which are serviced by various providers.
Understanding Serviceguard Software Components Serviceguard Architecture All members of the cluster initiate and maintain a connection to the quorum server. If the quorum server dies, the Serviceguard nodes will detect this and then periodically try to reconnect to the quorum server until it comes back up. If there is a cluster reconfiguration while the quorum server is down and there is a partition in the cluster that requires tie-breaking, the reconfiguration will fail.
Understanding Serviceguard Software Components Serviceguard Architecture
• vxfend—When VERITAS CFS is deployed as part of the Serviceguard Storage Management Suite, the I/O fencing daemon vxfend is also included. It implements a quorum-type functionality for the VERITAS Cluster File System. vxfend is controlled by Serviceguard to synchronize quorum mechanisms.
Understanding Serviceguard Software Components How the Cluster Manager Works How the Cluster Manager Works The cluster manager is used to initialize a cluster, to monitor the health of the cluster, to recognize node failure if it should occur, and to regulate the re-formation of the cluster when a node joins or leaves the cluster. The cluster manager operates as a daemon process that runs on each node.
Understanding Serviceguard Software Components How the Cluster Manager Works cluster, that information is passed to the package coordinator (described further in this chapter, in “How the Package Manager Works” on page 73). Failover packages that were running on nodes that are no longer in the new cluster are transferred to their adoptive nodes. Note that if there is a transitory loss of heartbeat, the cluster may re-form with the same nodes as before.
Understanding Serviceguard Software Components How the Cluster Manager Works Manual Startup of Entire Cluster A manual startup forms a cluster out of all the nodes in the cluster configuration. Manual startup is normally done the first time you bring up the cluster, after cluster-wide maintenance or upgrade, or after reconfiguration. Before startup, the same binary cluster configuration file must exist on all nodes in the cluster.
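For example (node names are illustrative), a manual startup uses the cmruncl command, and automatic startup at boot time is controlled by a flag in /etc/rc.config.d/cmcluster on each node:

   # Start the cluster on all configured nodes, with verbose output
   cmruncl -v

   # Or start the cluster on a subset of the configured nodes
   cmruncl -v -n ftsys9 -n ftsys10

   # In /etc/rc.config.d/cmcluster, enable automatic cluster startup:
   AUTOSTART_CMCLD=1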
Understanding Serviceguard Software Components How the Cluster Manager Works • A node halts because of a package failure. • A node halts because of a service failure. • Heavy network traffic prohibited the heartbeat signal from being received by the cluster. • The heartbeat network failed, and another network is not configured to carry heartbeat. Typically, re-formation results in a cluster with a different composition.
Understanding Serviceguard Software Components How the Cluster Manager Works possibility of two sub-clusters running at the same time. If the two sub-clusters are of unequal size, the sub-cluster with greater than 50% of the nodes will form the new cluster, and the cluster lock is not used. If you have a two-node cluster, you are required to configure a cluster lock.
Understanding Serviceguard Software Components How the Cluster Manager Works Figure 3-2 Lock Disk Operation Serviceguard periodically checks the health of the lock disk and writes messages to the syslog file when a lock disk fails the health check. This file should be monitored for early detection of lock disk problems. You can choose between two lock disk options—a single or dual lock disk—based on the kind of high availability configuration you are building.
Understanding Serviceguard Software Components How the Cluster Manager Works Dual Lock Disk If you are using disks that are internally mounted in the same cabinet as the cluster nodes, then a single lock disk would be a single point of failure in this type of cluster, since the loss of power to the node that has the lock disk in its cabinet would also render the cluster lock unavailable.
Understanding Serviceguard Software Components How the Cluster Manager Works area in memory for each cluster, and when a node obtains the cluster lock, this area is marked so that other nodes will recognize the lock as “taken.” If communications are lost between two equal-sized groups of nodes, the group that obtains the lock from the Quorum Server will take over the cluster and the other nodes will perform a TOC.
Understanding Serviceguard Software Components How the Cluster Manager Works three-node cluster is removed for maintenance, the cluster reforms as a two-node cluster. If a tie-breaking scenario later occurs due to a node or communication failure, the entire cluster will become unavailable. In a cluster with four or more nodes, you may not need a cluster lock since the chance of the cluster being split into two halves of equal size is very small.
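Where a Quorum Server is used as the tie-breaker, it is identified in the cluster ASCII configuration file. The entries below are a sketch; the host name and timing values are examples only:

   # Quorum server host and polling interval (in microseconds);
   # QS_TIMEOUT_EXTENSION is optional
   QS_HOST qs-server.example.com
   QS_POLLING_INTERVAL 300000000
   QS_TIMEOUT_EXTENSION 2000000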
Understanding Serviceguard Software Components How the Package Manager Works How the Package Manager Works Packages are the means by which Serviceguard starts and halts configured applications. A package is a collection of services, disk volumes and IP addresses that are managed by Serviceguard to ensure they are available. Each node in the cluster runs an instance of the package manager; the package manager residing on the cluster coordinator is known as the package coordinator.
Understanding Serviceguard Software Components How the Package Manager Works known as VxVM-CVM-pkg for VERITAS CVM Version 3.5 and called SG-CFS-pkg for VERITAS CVM Version 4.1. It runs on all nodes that are active in the cluster and provides cluster membership information to the volume manager software. This type of package is configured and used only when you employ CVM for storage management.
Understanding Serviceguard Software Components How the Package Manager Works Configuring Failover Packages Each package is separately configured. You create a failover package by using Serviceguard Manager or by editing a package ASCII configuration file template. (Detailed instructions are given in “Configuring Packages and Their Services” on page 269). Then you use the cmapplyconf command to check and apply the package to the cluster configuration database.
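The command sequence is sketched below; the directory, file, and package names are examples only:

   # Generate a package configuration template, then edit it
   cmmakepkg -p /etc/cmcluster/pkg1/pkg1.conf

   # Verify the edited file, then apply it to the cluster
   # configuration database
   cmcheckconf -P /etc/cmcluster/pkg1/pkg1.conf
   cmapplyconf -P /etc/cmcluster/pkg1/pkg1.conf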
Understanding Serviceguard Software Components How the Package Manager Works A package switch involves moving failover packages and their associated IP addresses to a new system. The new system must already have the same subnet configured and working properly, otherwise the packages will not be started. With package failovers, TCP connections are lost. TCP applications must reconnect to regain connectivity; this is not handled automatically.
Understanding Serviceguard Software Components How the Package Manager Works Figure 3-6 shows the condition where Node 1 has failed and Package 1 has been transferred to Node 2. Package 1's IP address was transferred to Node 2 along with the package. Package 1 continues to be available and is now running on Node 2. Also note that Node 2 can now access both Package1’s disk and Package2’s disk.
Understanding Serviceguard Software Components How the Package Manager Works # Enter the failover policy for this package. This policy will be used # to select an adoptive node whenever the package needs to be started. # The default policy unless otherwise specified is CONFIGURED_NODE. # This policy will select nodes in priority order from the list of # NODE_NAME entries specified below. # The alternative policy is MIN_PACKAGE_NODE.
Understanding Serviceguard Software Components How the Package Manager Works
Table 3-1  Package Configuration Data (Continued)
  Package Name: pkgC
  NODE_NAME List: node3, node4, node1, node2
  FAILOVER_POLICY: MIN_PACKAGE_NODE
When the cluster starts, each package starts as shown in Figure 3-7.
Understanding Serviceguard Software Components How the Package Manager Works NOTE Using the MIN_PACKAGE_NODE policy, when node 2 is repaired and brought back into the cluster, it will then be running the fewest packages, and thus will become the new standby node.
Understanding Serviceguard Software Components How the Package Manager Works # # # # # of running the package. Default is MANUAL which means no attempt will be made to move the package back to it primary node when it is running on an alternate node. The alternate policy is AUTOMATIC which means the package will be moved back to its primary node whenever the primary node is capable of running the package.
Understanding Serviceguard Software Components How the Package Manager Works Figure 3-11 Automatic Failback Configuration After Failover After rebooting, node 1 rejoins the cluster. At that point, pkgA will be automatically stopped on node 4 and restarted on node 1. Figure 3-12 Automatic Failback Configuration After Restart of Node 1 NOTE Setting the FAILBACK_POLICY to AUTOMATIC can result in a package failback and application outage during a critical production period.
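As an illustration of the scenario shown in Figures 3-11 and 3-12, the relevant entries in pkgA's ASCII configuration file would look something like this sketch:

   PACKAGE_NAME      pkgA
   # primary node first, then the adoptive node
   NODE_NAME         node1
   NODE_NAME         node4
   FAILOVER_POLICY   CONFIGURED_NODE
   FAILBACK_POLICY   AUTOMATIC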
Understanding Serviceguard Software Components How the Package Manager Works On Combining Failover and Failback Policies Combining a FAILOVER_POLICY of MIN_PACKAGE_NODE with a FAILBACK_POLICY of AUTOMATIC can result in a package’s running on a node where you did not expect it to run, since the node running the fewest packages will probably not be the same host every time a failover occurs.
Understanding Serviceguard Software Components How the Package Manager Works the resource is available before starting the package. In addition, the package manager can fail the package to another node or take other action if the resource becomes unavailable after the package starts. You can specify a registered resource for a package by selecting it from the list of available resources displayed in the Serviceguard Manager Configuring Packages.
Understanding Serviceguard Software Components How the Package Manager Works Choosing Package Failover Behavior To determine failover behavior, you can define a package failover policy that governs which nodes will automatically start up a package that is not running. In addition, you can define a failback policy that determines whether a package will be automatically returned to its primary node when that is possible.
Understanding Serviceguard Software Components How the Package Manager Works
Table 3-3  Package Failover Behavior (Continued)
  Switching Behavior: Package is automatically halted and restarted on its primary node if the primary node is available and the package is running on a non-primary node.
    Options in Serviceguard Manager: Failback policy set to Automatic.
    Parameters in ASCII File: FAILBACK_POLICY set to AUTOMATIC.
  Switching Behavior: All packages switch following a TOC on the node when any service fails. An attempt is first made to reboot the system prior to the TOC.
    Options in Serviceguard Manager: Service Failfast set to Enabled for all services.
    Parameters in ASCII File: SERVICE_FAIL_FAST_ENABLED set to YES for all services; AUTO_RUN set to YES for all packages.
Understanding Serviceguard Software Components How Package Control Scripts Work How Package Control Scripts Work Packages are the means by which Serviceguard starts and halts configured applications. Failover packages are also units of failover behavior in Serviceguard. A package is a collection of services, disk volumes and IP addresses that are managed by Serviceguard to ensure they are available. There can be a maximum of 150 packages per cluster and a total of 900 services per cluster.
Understanding Serviceguard Software Components How Package Control Scripts Work The CFS packages, however, are not created by performing cmapplyconf on package configuration files, but by a series of CFS-specific commands. Serviceguard determines most of their options; all user-determined options can be entered as parameters to the commands. (See the cfs admin commands in Appendix A.) A failover package can be configured to have a dependency on a multi-node or system multi-node package.
Understanding Serviceguard Software Components How Package Control Scripts Work NOTE If you configure the package while the cluster is running, the package does not start up immediately after the cmapplyconf command completes. To start the package without halting and restarting the cluster, issue the cmrunpkg or cmmodpkg command. How does a failover package start up, and what is its behavior while it is running? Some of the many phases of package life are shown in Figure 3-13.
Understanding Serviceguard Software Components How Package Control Scripts Work 7. When the node fails Before the Control Script Starts First, a node is selected. This node must be in the package’s node list, it must conform to the package’s failover policy, and any resources required by the package must be available on the chosen node. One resource is the subnet that is monitored for the package. If the subnet is not available, the package cannot start on this node.
Understanding Serviceguard Software Components How Package Control Scripts Work Figure 3-14 Package Time Line for Run Script Execution At any step along the way, an error will result in the script exiting abnormally (with an exit code of 1). For example, if a package service is unable to be started, the control script will exit with an error. Also, if the run script execution is not complete before the time specified in the RUN_SCRIPT_TIMEOUT, the package manager will kill the script.
Understanding Serviceguard Software Components How Package Control Scripts Work Normal and Abnormal Exits from the Run Script Exit codes on leaving the run script determine what happens to the package next. A normal exit means the package startup was successful, but all other exits mean that the start operation did not complete successfully. • 0—normal exit. The package started normally, so all services are up on this node. • 1—abnormal exit, also known as NO_RESTART exit.
Understanding Serviceguard Software Components How Package Control Scripts Work
NOTE If you set restarts and also set SERVICE_FAIL_FAST_ENABLED to YES, the failfast will take place after restart attempts have failed. It does not make sense to set SERVICE_RESTART to “-R” for a service and also set SERVICE_FAIL_FAST_ENABLED to YES.
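In the package control script, each service and its restart behavior are defined in arrays such as the following sketch (the service name, command, and restart count are examples only):

   SERVICE_NAME[0]="pkg1_service"
   SERVICE_CMD[0]="/usr/local/bin/myapp -f /etc/myapp.conf"
   SERVICE_RESTART[0]="-r 3"    # restart up to 3 times; "-R" means unlimited restarts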
Understanding Serviceguard Software Components How Package Control Scripts Work
Package halting normally means that the package halt script executes (see the next section). However, if a failover package’s configuration has the SERVICE_FAIL_FAST_ENABLED flag set to YES for the service that fails, then the node will halt as soon as the failure is detected. If this flag is not set, the loss of a service will result in halting the package gracefully by running the halt script.
Understanding Serviceguard Software Components How Package Control Scripts Work
During Halt Script Execution
This section applies only to failover packages. Once the package manager has detected the failure of a service or package that a failover package depends on, or when the cmhaltpkg command has been issued for a particular failover package, then the package manager launches the halt script. That is, the failover package’s control script executes the ‘halt’ parameter.
Understanding Serviceguard Software Components How Package Control Scripts Work
This log has the same name as the halt script and the extension .log. Normal starts are recorded in the log, together with error messages or warnings related to halting the package.
Normal and Abnormal Exits from the Halt Script
The package’s ability to move to other nodes is affected by the exit conditions on leaving the halt script. The following are the possible exit codes:
• 0—normal exit.
Understanding Serviceguard Software Components How Package Control Scripts Work
Table 3-4  Error Conditions and Package Movement for Failover Packages
(Columns: Package Error Condition | Node Failfast Enabled | Service Failfast Enabled | HP-UX Status on Primary after Error | Halt script runs after Error or Exit | Package Allowed to Run on Primary Node after Error | Package Allowed to Run on Alternate Node)
  Service Failure | NO | YES | TOC | No | N/A (TOC) | Yes
  Service Failure | YES | NO | Running | Yes | No | Yes
  Service Failure | NO | NO | Running | Yes | No | Yes
  Halt Script Timeout | YES | Either Setting | TOC | N/A | N/A (TOC) | Yes, unless the timeout happened after the cmh
  Loss of Monitored Resource | NO | Either Setting | Running | Yes | Yes, if the resource is not a deferred resource
Understanding Serviceguard Software Components How the Network Manager Works How the Network Manager Works The purpose of the network manager is to detect and recover from network card and cable failures so that network services remain highly available to clients. In practice, this means assigning IP addresses for each package to the primary LAN interface card on the node where the package is running and monitoring the health of all interfaces, switching them when necessary.
Understanding Serviceguard Software Components How the Network Manager Works Both stationary and relocatable IP addresses will switch to a standby LAN interface in the event of a LAN card failure. In addition, relocatable addresses (but not stationary addresses) can be taken over by an adoptive node if control of the package is transferred. This means that applications can access the package via its relocatable address without knowing which node the package currently resides on.
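For illustration, a relocatable address and its subnet are declared in the failover package's control script; the addresses below are examples only:

   IP[0]="192.10.25.12"       # relocatable IP address for the package
   SUBNET[0]="192.10.25.0"    # monitored subnet on which the address is added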
Understanding Serviceguard Software Components How the Network Manager Works Monitoring LAN Interfaces and Detecting Failure At regular intervals, Serviceguard polls all the network interface cards specified in the cluster configuration file. Network failures are detected within each single node in the following manner. One interface on the node is assigned to be the poller.
Understanding Serviceguard Software Components How the Network Manager Works — Each primary interface should have at least one standby interface, and it should be connected to a standby switch. — The primary switch should be directly connected to its standby. — There should be no single point of failure anywhere on all bridged nets.
Understanding Serviceguard Software Components How the Network Manager Works During the transfer, IP packets will be lost, but TCP (Transmission Control Protocol) will retransmit the packets. In the case of UDP (User Datagram Protocol), the packets will not be retransmitted automatically by the protocol. However, since UDP is an unreliable service, UDP applications should be prepared to handle the case of lost network packets and recover appropriately.
Understanding Serviceguard Software Components How the Network Manager Works Figure 3-17 Cluster After Local Network Switching As the standby interface takes over, IP addresses will be switched to the hardware path associated with the standby interface. The switch is transparent at the TCP/IP level. All applications continue to run on their original nodes. During this time, IP traffic on Node 1 will be delayed as the transfer occurs.
Understanding Serviceguard Software Components How the Network Manager Works Figure 3-18 Local Switching After Cable Failure Local network switching will work with a cluster containing one or more nodes. You may wish to design a single-node cluster in order to take advantages of this local network switching feature in situations where you need only one node and do not wish to set up a more complex cluster.
Understanding Serviceguard Software Components How the Network Manager Works Remote Switching A remote switch (that is, a package switch) involves moving packages and their associated IP addresses to a new system. The new system must already have the same subnetwork configured and working properly, otherwise the packages will not be started. With remote switching, TCP connections are lost. TCP applications must reconnect to regain connectivity; this is not handled automatically.
Understanding Serviceguard Software Components How the Network Manager Works recovery for environments which require high availability. Port aggregation capability is sometimes referred to as link aggregation or trunking. APA is also supported on dual-stack kernel. Once enabled, each link aggregate can be viewed as a single logical link of multiple physical ports with only one IP and MAC address.
Understanding Serviceguard Software Components How the Network Manager Works Both the Single and Dual ported LANs in the non-aggregated configuration have four LAN cards, each associated with a separate non-aggregated IP address and MAC address, and each with its own LAN name (lan0, lan1, lan2, lan3). When these ports are aggregated all four ports are associated with a single IP address and MAC address.
Understanding Serviceguard Software Components How the Network Manager Works interfaces to form their own subnets. Please refer to the document Using HP-UX VLAN (T1453-90001) for more details on how to configure VLAN interfaces. Support for HP-UX VLAN The support of VLAN is similar to other link technologies. VLAN interfaces can be used as heartbeat as well as data networks in the cluster.
Understanding Serviceguard Software Components How the Network Manager Works • VLAN configurations are only supported on HP-UX 11i. • Only port-based and IP subnet-based VLANs are supported. Protocol-based VLAN will not be supported because Serviceguard does not support any transport protocols other than TCP/IP. • Each VLAN interface must be assigned an IP address in a unique subnet in order to operate properly unless it is used as a standby of a primary VLAN interface.
Understanding Serviceguard Software Components Volume Managers for Data Storage Volume Managers for Data Storage A volume manager is a tool that lets you create units of disk storage known as storage groups. Storage groups contain logical volumes for use on single systems and in high availability clusters. In Serviceguard clusters, storage groups are activated by package control scripts.
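For example, an LVM storage group, its logical volume, and its file system are named in the package control script with entries like the following sketch (the names shown are illustrative):

   VGCHANGE="vgchange -a e"         # activate the volume group exclusively
   VG[0]="vgpkgA"
   LV[0]="/dev/vgpkgA/lvol1"
   FS[0]="/pkgA_mnt"
   FS_TYPE[0]="vxfs"
   FS_MOUNT_OPT[0]="-o rw"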
Understanding Serviceguard Software Components Volume Managers for Data Storage Figure 3-20 Physical Disks Within Shared Storage Units Figure 3-21 shows the individual disks combined in a multiple disk mirrored configuration. Figure 3-21 Mirrored Physical Disks Figure 3-22 shows the mirrors configured into LVM volume groups, shown in the figure as /dev/vgpkgA and /dev/vgpkgB. The volume groups are activated by Serviceguard packages for use by highly available applications.
Understanding Serviceguard Software Components Volume Managers for Data Storage Figure 3-22 Multiple Devices Configured in Volume Groups Examples of Storage on Disk Arrays Figure 3-23 shows an illustration of storage configured on a disk array. Physical disks are configured by an array utility program into logical units or LUNs which are then seen by the operating system.
Understanding Serviceguard Software Components Volume Managers for Data Storage NOTE LUN definition is normally done using utility programs provided by the disk array manufacturer. Since arrays vary considerably, you should refer to the documentation that accompanies your storage unit. Figure 3-24 shows LUNs configured with multiple paths (links) to provide redundant pathways to the data.
Understanding Serviceguard Software Components Volume Managers for Data Storage Figure 3-25 Multiple Paths in Volume Groups Types of Volume Manager Serviceguard allows a choice of volume managers for data storage: • HP-UX Logical Volume Manager (LVM) and (optionally) Mirrordisk/UX • VERITAS Volume Manager for HP-UX (VxVM)—Base and add-on Products • VERITAS Cluster Volume Manager for HP-UX (CVM) Separate sections in Chapters 5 and 6 explain how to configure cluster storage using all of these volume m
Understanding Serviceguard Software Components Volume Managers for Data Storage NOTE The HP-UX Logical Volume Manager is described in Managing Systems and Workgroups. A complete description of VERITAS volume management products is available in the VERITAS Volume Manager for HP-UX Release Notes. HP-UX Logical Volume Manager (LVM) Logical Volume Manager (LVM) is the default storage management product on HP-UX. Included with the operating system, LVM is available on all cluster nodes.
Understanding Serviceguard Software Components Volume Managers for Data Storage • require a fast cluster startup time. • do not require shared storage group activation. (required with CFS) • do not have all nodes cabled to all disks. (required with CFS) • need to use software RAID mirroring or striped mirroring. • have multiple heartbeat subnets configured. Propagation of Disk Groups in VxVM With VxVM, a disk group can be created on any node, with the cluster up or not.
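As a sketch only (the disk group name is an example), a VxVM disk group created on one node can be made known to another node with commands such as these:

   # On the node where the disk group was created:
   vxdg deport dg_pkgA

   # On another cluster node that will run the package:
   vxdctl enable          # rescan so the node sees the VxVM disks
   vxdg import dg_pkgA
   vxdg deport dg_pkgA    # leave it deported; the package control script
                          # imports it when the package starts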
Understanding Serviceguard Software Components Volume Managers for Data Storage Cluster information is provided via a special system multi-node package, which runs on all nodes in the cluster. The cluster must be up and must be running this package in order to configure VxVM disk groups for use with CVM. The VERITAS CVM package for version 3.5 is named VxVM-CVM-pkg; the package for CVM version 4.1 is named SG-CFS-pkg. CVM allows you to activate storage on more than one node at a time.
Understanding Serviceguard Software Components Volume Managers for Data Storage Cluster Startup Time with CVM With CVM, all shared disk groups (DGs) are imported when the system multi-node’s control script starts up CVM. Depending on the number of DGs, the number of nodes and the configuration of these (number of disks, volumes, etc.) this can take some time (current timeout value for this package is 3 minutes but for larger configurations this may have to be increased).
Understanding Serviceguard Software Components Volume Managers for Data Storage Table 3-5 Pros and Cons of Volume Managers with Serviceguard Product Logical Volume Manager (LVM) Mirrordisk/UX 122 Advantages • Software is provided with all versions of HP-UX. • Provides up to 3-way mirroring using optional mirror disk UX software. • Supports up to 16 nodes in a Serviceguard cluster • Supports using PV links for multipl.
Understanding Serviceguard Software Components Volume Managers for Data Storage Table 3-5 Pros and Cons of Volume Managers with Serviceguard Product Shared Logical Volume Manager (SLVM) Base-VxVM Chapter 3 Advantages • Provided free with SGeRAC for multi-node access to RAC data • Supports up to 16 nodes in shared read/write mode for each cluster Tradeoffs • Lacks the flexibility and extended features of some other volume managers. • Limited mirroring support.
Understanding Serviceguard Software Components Volume Managers for Data Storage Table 3-5 Pros and Cons of Volume Managers with Serviceguard Product VERITAS Volume Manager— Full VxVM product B9116AA (VxVM 3.5) B9116BA (VxVM 4.1) 124 Advantages Tradeoffs • Disk group configuration from any node. • Requires purchase of additional license. • DMP for active/active storage devices. • Cannot be used for a cluster lock. • Supports up to 16 nodes. • • Supports exclusive activation.
Understanding Serviceguard Software Components Volume Managers for Data Storage Table 3-5 Pros and Cons of Volume Managers with Serviceguard Product VERITAS Cluster Volume Manager— B9117AA (CVM 3.5) B9117BA (CVM 4.1) Chapter 3 Advantages • Provides volume configuration propagation. • Supports cluster shareable disk groups. • Package startup time is faster than with VxVM. • Supports shared activation. • Supports exclusive activation.
Understanding Serviceguard Software Components Responses to Failures Responses to Failures Serviceguard responds to different kinds of failures in specific ways. For most hardware failures, the response is not user-configurable, but for package and service failures, you can choose the system’s response, within limits.
Understanding Serviceguard Software Components Responses to Failures Responses to Hardware Failures If a serious system problem occurs, such as a system panic or physical disruption of the SPU's circuits, Serviceguard recognizes a node failure and transfers the failover packages currently running on that node to an adoptive node elsewhere in the cluster. (System multi-node and multi-node packages do not failover.
Understanding Serviceguard Software Components Responses to Failures Responses to Package and Service Failures In the default case, the failure of a failover package or of a service within a package causes the failover package to shut down by running the control script with the 'stop' parameter, and then restarting the package on an alternate node. A package will fail if it is configured to have a dependency on another package, and the dependency package fails.
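These responses are governed by parameters in the package ASCII configuration file, for example (a sketch; the service name and timeout are illustrative, and the fail fast values shown are the defaults):

   # NO (the default) for both fail fast parameters means a failure
   # halts only the package, not the node; the halt timeout (seconds)
   # is an example value
   SERVICE_NAME                service1
   SERVICE_FAIL_FAST_ENABLED   NO
   SERVICE_HALT_TIMEOUT        300
   NODE_FAIL_FAST_ENABLED      NO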
Understanding Serviceguard Software Components Responses to Failures Network Communication Failure An important element in the cluster is the health of the network itself. As it continuously monitors the cluster, each node listens for heartbeat messages from the other nodes confirming that all nodes are able to communicate with each other.
Planning and Documenting an HA Cluster 4 Planning and Documenting an HA Cluster Building a Serviceguard cluster begins with a planning phase in which you gather and record information about all the hardware and software components of the configuration. Planning starts with a simple list of hardware and network components. As the installation and configuration continue, the list is extended and refined.
Planning and Documenting an HA Cluster • Hardware Planning • Power Supply Planning • Quorum Server Planning • LVM Planning • CVM and VxVM Planning • Cluster Configuration Planning • Package Configuration Planning The description of each planning step in this chapter is demonstrated with an example worksheet. A complete set of blank worksheets are in Appendix F, “Blank Planning Worksheets,” on page 443.
Planning and Documenting an HA Cluster General Planning General Planning A clear understanding of your high availability objectives will quickly help you to define your hardware requirements and design your system. Use the following questions as a guide for general planning: 1. What applications must continue to be available in the event of a failure? 2. What system resources (processing power, networking, SPU, memory, disk space) are needed to support these applications? 3.
Planning and Documenting an HA Cluster General Planning additional disk hardware for shared data storage. If you intend to expand your cluster without the need to bring it down, careful planning of the initial configuration is required. Use the following guidelines: • Remember the rules for cluster locks when considering expansion. A one-node cluster does not require a cluster lock. A two-node cluster must have a cluster lock. In clusters larger than 3 nodes, a cluster lock is strongly recommended.
Planning and Documenting an HA Cluster Hardware Planning Hardware Planning Hardware planning requires examining the physical hardware itself. One useful procedure is to sketch the hardware configuration in a diagram that shows adapter cards and buses, cabling, disks and peripherals. A sample diagram for a two-node cluster is shown in Figure 4-1. Figure 4-1 Sample Cluster Configuration Create a similar sketch for your own cluster, and record the information on the Hardware Worksheet.
Planning and Documenting an HA Cluster Hardware Planning • Network Information • Disk I/O Information SPU Information SPU information includes the basic characteristics of the systems you are using in the cluster. Different models of computers can be mixed in the same cluster. This configuration model also applies to HP Integrity servers. HP-UX workstations are not supported for Serviceguard.
Planning and Documenting an HA Cluster Hardware Planning LAN Information While a minimum of one LAN interface per subnet is required, at least two LAN interfaces, one primary and one or more standby, are needed to eliminate single points of network failure. It is recommended that you configure heartbeats on all subnets, including those to be used for client data. On the worksheet, enter the following for each LAN interface: Subnet Name Enter the IP address mask for the subnet.
Planning and Documenting an HA Cluster Hardware Planning begin to attempt a failover when network traffic is not noticed for a time. (Serviceguard calculates the time depending on the type of LAN card.) The configuration file specifies one of two ways to decide when the network interface card has failed: • INOUT - The default method will count packets sent by polling, and declare a card down only when the count stops incrementing for both the inbound and the outbound packets.
Planning and Documenting an HA Cluster Hardware Planning RS232 Information If you plan to configure a serial line (RS232), you need to determine the serial device file that corresponds with the serial port on each node. 1. If you are using a MUX panel, make a note of the system slot number that corresponds to the MUX and also note the port number that appears next to the selected port on the panel. 2. On each node, use ioscan -fnC tty to display hardware addresses and device file names.
Planning and Documenting an HA Cluster Hardware Planning Setting SCSI Addresses for the Largest Expected Cluster Size SCSI standards define priority according to SCSI address. To prevent controller starvation on the SPU, the SCSI interface cards must be configured at the highest priorities. Therefore, when configuring a highly available cluster, you should give nodes the highest priority SCSI addresses, and give disks addresses of lesser priority.
Planning and Documenting an HA Cluster Hardware Planning NOTE When a boot/root disk is configured with a low-priority address on a shared SCSI bus, a system panic can occur if there is a timeout on accessing the boot/root device. This can happen in a cluster when many nodes and many disks are configured on the same bus.
Planning and Documenting an HA Cluster Hardware Planning • • • • lvdisplay -v lvlnboot -v vxdg list (VxVM and CVM) vxprint (VxVM and CVM) These are standard HP-UX commands. See their man pages for information of specific usage. The commands should be issued from all nodes after installing the hardware and rebooting the system. The information will be useful when doing storage group and cluster configuration.
Planning and Documenting an HA Cluster Hardware Planning

RS232 Device File ___/dev/tty0p0__        Second Node Name ____ftsys10__________
=============================================================================
Disk I/O Information for Shared Disks:

Bus Type _SCSI_   Slot Number _4__   Address _16_   Disk Device File __c0t1d0_
Bus Type _SCSI_   Slot Number _6_    Address _24_   Disk Device File __c0t2d0_
Bus Type ______   Slot Number ___    Address ____   Disk Device File _________

Attach a printout of the output from the
Planning and Documenting an HA Cluster Power Supply Planning Power Supply Planning There are two sources of power for your cluster which you will have to consider in your design: line power and uninterruptible power sources (UPS). Loss of a power circuit should not bring down the cluster. Frequently, servers, mass storage devices, and other hardware have two or three separate power supplies, so they can survive the loss of power to one or more power supplies or power circuits.
Planning and Documenting an HA Cluster Power Supply Planning Other Unit Enter the number of any other unit. Power Supply Enter the power supply unit number of the UPS to which the host or other device is connected. Be sure to follow UPS and cabinet power limits as well as SPU power limits. Power Supply Configuration Worksheet The following worksheet will help you organize and record your specific power supply configuration.
Planning and Documenting an HA Cluster Power Supply Planning

Other Power:

Unit Name __________________________    Power Supply _____________________

Unit Name __________________________    Power Supply _____________________
Planning and Documenting an HA Cluster Quorum Server Planning Quorum Server Planning The quorum server (QS) provides tie-breaking services for clusters. The QS is described in “Use of the Quorum Server as the Cluster Lock” on page 70. A quorum server:

• can be used with up to 50 clusters, not exceeding 100 nodes total.
• can support a cluster with any supported number of nodes.
Planning and Documenting an HA Cluster Quorum Server Planning Enter the names (31 bytes or less) of all cluster nodes that will be supported by this quorum server. These entries will be entered into qs_authfile on the system that is running the quorum server process. Quorum Server Worksheet The following worksheet will help you organize and record your specific quorum server hardware configuration. Blank worksheets are in Appendix F, “Blank Planning Worksheets,” on page 443.
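For illustration, a qs_authfile that authorizes the two example nodes used elsewhere in this manual might contain the following entries (one host name per line is assumed; substitute your own node names):

   ftsys9
   ftsys10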
Planning and Documenting an HA Cluster LVM Planning LVM Planning You can create storage groups using the HP-UX Logical Volume Manager (LVM), or using VERITAS VxVM and CVM software, which are described in the next section. When designing your disk layout using LVM, you should consider the following: • The root disk should belong to its own volume group.
Planning and Documenting an HA Cluster LVM Planning LVM Worksheet The following worksheet will help you organize and record your specific physical disk configuration. This worksheet is an example; blank worksheets are in Appendix F, “Blank Planning Worksheets,” on page 443. Make as many copies as you need. Fill out the worksheet and keep it for future reference. This worksheet only includes volume groups and physical volumes.
Planning and Documenting an HA Cluster LVM Planning Physical Volume Name: _____________________________________________________ Physical Volume Name: _____________________________________________________ Chapter 4 151
Planning and Documenting an HA Cluster CVM and VxVM Planning CVM and VxVM Planning You can create storage groups using the HP-UX Logical Volume Manager (LVM, described in the previous section), or using VERITAS VxVM and CVM software. When designing a storage configuration using CVM or VxVM disk groups, consider the following: • You must create a rootdg disk group on each cluster node that will be using VxVM storage. This is not the same as the HP-UX root disk, if an LVM volume group is used.
Planning and Documenting an HA Cluster CVM and VxVM Planning CVM and VxVM Worksheet The following worksheet will help you organize and record your specific physical disk configuration. This worksheet is an example; blank worksheets are in Appendix F, “Blank Planning Worksheets,” on page 443. Make as many copies as you need. Fill out the worksheet and keep it for future reference. This worksheet includes volume groups and physical volumes.
Planning and Documenting an HA Cluster Cluster Configuration Planning Cluster Configuration Planning A cluster should be designed to provide the quickest possible recovery from failures. The actual time required to recover from a failure depends on several factors: • The length of the cluster heartbeat interval and node timeout. They should each be set as short as practical, but not shorter than 1000000 (one second) and 2000000 (two seconds), respectively.
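For example, the corresponding timing entries in the cluster configuration ASCII file might look like the following (the 8-second NODE_TIMEOUT is illustrative only, not a recommendation for every installation):

   HEARTBEAT_INTERVAL      1000000
   NODE_TIMEOUT            8000000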
Planning and Documenting an HA Cluster Cluster Configuration Planning Heartbeat Subnet and Re-formation Time The speed of cluster re-formation is partially dependent on the type of heartbeat network that is used. If two or more heartbeat subnets are used, the one with the fastest failover time is used. Cluster Lock Information The purpose of the cluster lock is to ensure that only one new cluster is formed in the event that exactly half of the previously clustered nodes try to form a new cluster.
Planning and Documenting an HA Cluster Cluster Configuration Planning Cluster Configuration Parameters For the operation of the cluster manager, you need to define a set of cluster parameters. These are stored in the binary cluster configuration file, which is located on all nodes in the cluster. These parameters can be entered by editing the cluster configuration template file created by issuing the cmquerycl command, as described in the chapter “Building an HA Cluster Configuration.
Planning and Documenting an HA Cluster Cluster Configuration Planning The volume group containing the physical disk volume on which a cluster lock is written. Identifying a cluster lock volume group is essential in a two-node cluster. If you are creating two cluster locks, enter the volume group name or names for both locks. This parameter is only used when you employ a lock disk for tie-breaking services in the cluster. Use FIRST_CLUSTER_LOCK_VG for the first lock volume group.
Planning and Documenting an HA Cluster Cluster Configuration Planning The use of a private heartbeat network is not advisable if you plan to use Remote Procedure Call (RPC) protocols and services. RPC assumes that each network adapter device or I/O card is connected to a route-able network. An isolated or private heartbeat LAN is not route-able, and could cause an RPC request-reply, directed to that LAN, to risk time-out without being serviced.
Planning and Documenting an HA Cluster Cluster Configuration Planning Enter the physical volume name as it appears on both nodes in the cluster (the same physical volume may have a different name on each node). If you are creating two cluster locks, enter the physical volume names for both locks. The physical volume group identifier can contain up to 39 characters. SERIAL_DEVICE_FILE The name of the device file that corresponds to serial (RS232) port that you have chosen on each node.
Planning and Documenting an HA Cluster Cluster Configuration Planning 30,000,000 in the ASCII file, or 30 seconds in Serviceguard Manager. The default setting yields the fastest cluster reformations. However, the use of the default value increases the potential for spurious reformations due to momentary system hangs or network load spikes. For a significant portion of installations, a setting of 5,000,000 to 8,000,000 (5 to 8 seconds) is more appropriate.
Planning and Documenting an HA Cluster Cluster Configuration Planning MAX_CONFIGURED_PACKAGES This parameter sets the maximum number of packages that can be configured in the cluster. The minimum value is 0, and the maximum value is 150. The default value for Serviceguard A.11.17 is 150, and you can change it without halting the cluster. VOLUME_GROUP The name of an LVM volume group whose disks are attached to at least two nodes in the cluster. Such disks are considered cluster aware.
Planning and Documenting an HA Cluster Cluster Configuration Planning begin to attempt a failover when network traffic is not noticed for a time. (Serviceguard calculates the time depending on the type of LAN card.) The configuration file specifies one of two ways to decide when the network interface card has failed: • INOUT - The default method will count inbound and outbound failures separately, and declare a card down only when both have reached a critical level.
Planning and Documenting an HA Cluster Cluster Configuration Planning

Quorum Server Timeout Extension: _______________ microseconds
===============================================================================
Subnets:
===============================================================================
Heartbeat Subnet: ___15.13.168.0______

Monitored Non-heartbeat Subnet: _____15.12.172.
Planning and Documenting an HA Cluster Package Configuration Planning Package Configuration Planning Planning for packages involves assembling information about each group of highly available services. Some of this information is used in creating the package configuration file, and some is used for editing the package control script. NOTE LVM Volume groups that are to be activated by packages must also be defined as cluster aware in the cluster configuration file.
Planning and Documenting an HA Cluster Package Configuration Planning • If a package moves to an adoptive node, what effect will its presence have on performance? Create a list by package of volume groups, logical volumes, and file systems. Indicate which nodes need to have access to common file systems at different times. It is recommended that you use customized logical volume names that are different from the default logical volume names (lvol1, lvol2, etc.).
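For example, instead of accepting the default lvol1, you could create a logical volume with a descriptive name when building the volume group (the size and names shown are illustrative):

# lvcreate -L 120 -n lvoldatabase /dev/vgdatabase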
Planning and Documenting an HA Cluster Package Configuration Planning CAUTION Serviceguard manages VERITAS processes, specifically gab and LLT, through system multi-node packages. As a result, the VERITAS administration commands such as gabconfig, llthosts, and lltconfig should only be used in the display mode, such as gabconfig -a. You could crash nodes or the entire cluster if you use VERITAS commands such as the gab* or llt* commands to configure these components or affect their runtime behavior.
Planning and Documenting an HA Cluster Package Configuration Planning In the package’s configuration file, you fill out the dependency parameter to specify the requirement SG-CFS-MP-id# =UP on the SAME_NODE. 2. The mount point packages should not run unless the disk group packages are running. Create the mount point packages using the cfsmntadm and cfsmount commands. Serviceguard names the mount point packages SG-CFS-MP-id#, automatically incrementing their ID numbers.
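For illustration, an application package that requires the first mount point package could declare the dependency in its ASCII configuration file with entries such as the following (the dependency name is arbitrary, and the three parameter names assume the standard dependency syntax of this release):

   DEPENDENCY_NAME         mp1_dep
   DEPENDENCY_CONDITION    SG-CFS-MP-1=UP
   DEPENDENCY_LOCATION     SAME_NODE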
Planning and Documenting an HA Cluster Package Configuration Planning NOTE Please note that the diskgroup and mount point multi-node packages (SG-CFS-DG_ID# and SG-CFS-MP_ID#) do not monitor the health of the disk group and mount point. They check that the application packages that depend on them have access to the disk groups and mount points.
Planning and Documenting an HA Cluster Package Configuration Planning

RESOURCE_NAME                   /net/interfaces/lan/status/lan0
RESOURCE_POLLING_INTERVAL       60
RESOURCE_START                  DEFERRED
RESOURCE_UP_VALUE               = UP

RESOURCE_NAME                   /net/interfaces/lan/status/lan1
RESOURCE_POLLING_INTERVAL       60
RESOURCE_START                  DEFERRED
RESOURCE_UP_VALUE               = UP

RESOURCE_NAME                   /net/interfaces/lan/status/lan2
RESOURCE_POLLING_INTERVAL       60
RESOURCE_START                  AUTOMATIC
RESOURCE_UP_VALUE               = UP

In the package control script, specify only the deferred resources,
Planning and Documenting an HA Cluster Package Configuration Planning Table 3-3 on page 85 describes different types of failover behavior and how to set the parameters that determine each behavior. Package Configuration File Parameters Prior to generation of the package configuration file, assemble the following package configuration data. The parameter names given below are the names that appear in Serviceguard Manager.
Planning and Documenting an HA Cluster Package Configuration Planning running the package. In the ASCII package configuration file, this parameter is known as FAILBACK_POLICY. Default is MANUAL, which means no attempt will be made to move the package back to its primary node when it is running on an alternate node. (This is the same behavior as in previous versions of Serviceguard.
Planning and Documenting an HA Cluster Package Configuration Planning When a package starts up, the Package Switching flag is set to match the AUTO_RUN setting, but this flag can be changed temporarily with the cmmodpkg command while the package is running. Default is AUTO_RUN YES. Check the box to enable auto-run in Serviceguard Manager; enter AUTO_RUN YES or NO in the ASCII file. Local LAN failover Enter Enabled or Disabled.
Planning and Documenting an HA Cluster Package Configuration Planning However, if the package halt script fails with “exit 1”, Serviceguard does not halt the node, but sets NO_RESTART for the package, which causes package switching (AUTO_RUN) to be disabled, thereby preventing the package from starting on any adoptive node. Controlscript pathname Serviceguard Manager will automatically create and maintain a control script if you use Guided Mode, the Serviceguard Manager default.
Planning and Documenting an HA Cluster Package Configuration Planning Run script timeout and Halt script timeout If the script has not completed by the specified timeout value, Serviceguard will terminate the script. In the ASCII configuration file, these parameters are RUN_SCRIPT_TIMEOUT and HALT_SCRIPT_TIMEOUT. Enter a value in seconds. The default is 0, or no timeout. The minimum is 10 seconds, but the minimum HALT_SCRIPT_TIMEOUT value must be greater than the sum of all the Service Halt Timeout values.
Planning and Documenting an HA Cluster Package Configuration Planning NOTE Do not use STORAGE_GROUP parameters to reference CVM disk groups in a cluster using the CFS file system. CFS resources are controlled by two multi-node packages, one for the disk group and one for the mount point. NOTE Use the parameter for CVM storage only. Do not enter CVM disk groups that are used in a CFS cluster.
Planning and Documenting an HA Cluster Package Configuration Planning Service halt timeout In the event of a service halt, Serviceguard will first send out a SIGTERM signal to terminate the service. If the process is not terminated, Serviceguard will wait for the specified timeout before sending out the SIGKILL signal to force process termination. In the ASCII package configuration file, this parameter is SERVICE_HALT_TIMEOUT.
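In the ASCII package configuration file, the service entries described above appear together; for example (the service name and values are illustrative):

   SERVICE_NAME                    pkg1_service
   SERVICE_FAIL_FAST_ENABLED       NO
   SERVICE_HALT_TIMEOUT            300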
Planning and Documenting an HA Cluster Package Configuration Planning Serviceguard Manager in the EMS (Event Monitoring Service) tab’s Browse button (“Available EMS resources”), or obtain it from the documentation supplied with the resource monitor. A maximum of 60 resources may be defined per cluster. Note also the limit on Resource Up Values described below. Maximum length of the resource name string is 1024 characters. Resource polling interval The frequency of monitoring a configured package resource.
Planning and Documenting an HA Cluster Package Configuration Planning RESOURCE_UP_VALUE. The Resource Up Value appears on the “Description of selected EMS resources” list provided in Serviceguard Manager’s EMS Browse button, or you can obtain it from the documentation supplied with the resource monitor. You can configure a total of 15 Resource Up Values per package. For example, if there is only one resource in the package, then a maximum of 15 Resource Up Values can be defined.
Planning and Documenting an HA Cluster Package Configuration Planning Package Configuration Worksheet Assemble your package configuration data in a separate worksheet for each package, as shown in the following example. This worksheet is an example; blank worksheets are in Appendix F, “Blank Planning Worksheets,” on page 443.
Planning and Documenting an HA Cluster Package Configuration Planning : Package Control Script Variables The control script that accompanies each package must also be edited to assign values to a set of variables. The following variables can be set: PATH Specify the path to be used by the script. VGCHANGE Specifies the method of activation for LVM volume groups. Leave the default (VGCHANGE=“vgchange -a e”) if you want volume groups activated in exclusive mode.
Planning and Documenting an HA Cluster Package Configuration Planning VxVM disk groups do not allow you to select specific activation commands. The VxVM disk group activation always uses the same command. NOTE VXVOL Controls the method of mirror recovery for mirrored VxVM volumes. Use the default VXVOL=“vxvol -g \$DiskGroup startall” if you want the package control script to wait until recovery has been completed.
Planning and Documenting an HA Cluster Package Configuration Planning These array parameters are entered together as triplets into array variables. Each triplet specifies a logical volume, a file system, and a mount option string for a file system used by the package. In the package control script file, these variables are the arrays LV, FS, and FS_MOUNT_OPT. On starting the package, the script may activate one or more storage groups, and it may mount logical volumes onto file systems.
Planning and Documenting an HA Cluster Package Configuration Planning value less than 1 is specified, the script defaults the variable to 1 and writes a warning message in the package control script log file. CONCURRENT_FSCK_OPERATIONS Specifies the number of concurrent fsck commands to allow during package startup. The default is 1. Setting this variable to a higher value may improve performance when checking a large number of file systems.
Planning and Documenting an HA Cluster Package Configuration Planning Enter a unique name for each specific service within the package. All services are monitored by Serviceguard. You may specify up to 30 services per package. Each name must be unique within the cluster. The service name is the name used by cmrunserv and cmhaltserv inside the package control script. It must be the same as the name specified for the service in the package ASCII configuration file.
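In the package control script, the same service name is then associated with the command that runs the service; a minimal sketch (the command path and restart value are illustrative):

   SERVICE_NAME[0]="pkg1_service"
   SERVICE_CMD[0]="/usr/local/bin/monitor_app"
   SERVICE_RESTART[0]="-r 2"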
Planning and Documenting an HA Cluster Package Configuration Planning For each deferred resource specified in the package configuration ASCII file, you must enter the resource name in this array in the control script. The name should be spelled exactly as it is spelled in the RESOURCE_NAME parameter in the package ASCII configuration file. In the package control script file, enter the value into the array known as DEFERRED_RESOURCE_NAME.
Planning and Documenting an HA Cluster Package Configuration Planning

VXVM_DG[0]___/dev/vx/dg01____VXVM_DG[1]____________VXVM_DG[2]_____________
================================================================================
Logical Volumes and File Systems:

LV[0]___/dev/vg01/lvol1____FS[0]____/mnt1___________FS_MOUNT_OPT[0]_________
LV[1]______________________FS[1]____________________FS_MOUNT_OPT[1]_________
LV[2]______________________FS[2]____________________FS_MOUNT_OPT[2]_________

FS Umount Count: ___
Building an HA Cluster Configuration 5 Building an HA Cluster Configuration This chapter and the next take you through the configuration tasks required to set up a Serviceguard cluster. These procedures are carried out on one node, called the configuration node, and the resulting binary file is distributed by Serviceguard to all the nodes in the cluster. In the examples in this chapter, the configuration node is named ftsys9, and the sample target node is called ftsys10.
Building an HA Cluster Configuration If you are using Serviceguard commands to configure the cluster and packages, use the man pages for each command to get information about syntax and usage.
Building an HA Cluster Configuration Preparing Your Systems Preparing Your Systems Before configuring your cluster, ensure that all cluster nodes possess the appropriate security files, kernel configuration, and NTP (network time protocol) configuration. Understanding Where Files Are Located Serviceguard uses a special file, /etc/cmcluster.conf, to define the locations for configuration and log files within the HP-UX filesystem.
Building an HA Cluster Configuration Preparing Your Systems NOTE Do not edit the /etc/cmcluster.conf configuration file. Editing Security Files Serviceguard daemons grant access to commands by matching incoming hostname and username against defined access control policies. To understand how to properly configure these policies, administrators need to understand how Serviceguard handles hostnames, IP addresses, usernames and the relevant configuration files.
Building an HA Cluster Configuration Preparing Your Systems

15.145.162.131    gryf.uksr.hp.com    gryf
10.8.0.131        gryf.uksr.hp.com    gryf
10.8.1.131        gryf.uksr.hp.com    gryf

15.145.162.132    sly.uksr.hp.com     sly
10.8.0.132        sly.uksr.hp.com     sly
10.8.1.132        sly.uksr.hp.com     sly

15.145.162.150    bit.uksr.hp.com     bit

NOTE: If you use a fully qualified domain name (FQDN), Serviceguard will only recognize the hostname portion. For example, two nodes gryf.uksr.hp.com and gryf.cup.hp.
Building an HA Cluster Configuration Preparing Your Systems For NIS, enter (one line): hosts: files [NOTFOUND=continue UNAVAIL=continue] nis [NOTFOUND=return UNAVAIL=return] Username Validation Serviceguard relies on the ident service of the client node to verify the username of the incoming network connection. If the Serviceguard daemon is unable to connect to the client's ident daemon, permission will be denied. Root on a node is defined as any user who has the UID of 0.
Building an HA Cluster Configuration Preparing Your Systems Access Roles Serviceguard has two levels of access, root and non-root: • Root Access: Users who have been authorized for root access have total control over the configuration of the cluster and packages. • Non-root Access: Non-root users can be assigned one of four roles: — Monitor: These users have read-only access to the cluster and its packages.
Building an HA Cluster Configuration Preparing Your Systems Setting access control policies uses different mechanisms depending on the state of the node. Nodes not configured into a cluster use different security configurations than nodes in a cluster. The following two sections discuss how to configure these access control policies. Setting Controls for an Unconfigured Node Serviceguard access control policies define what a remote node can do to the local node.
Building an HA Cluster Configuration Preparing Your Systems Using the cmclnodelist File The cmclnodelist file is not created by default in new installations. If administrators wish to create this "bootstrap" file they should add a comment such as the following: ########################################################### # Do Not Edit This File # This is only a temporary file to bootstrap an unconfigured # node with Serviceguard version A.11.
Building an HA Cluster Configuration Preparing Your Systems Using Equivalent Hosts For installations that wish to use hostsequiv, the primary IP addresses or hostnames for each node in the cluster need to be authorized. For more information on using hostsequiv, see man hosts.equiv(4) or the HP-UX guide, Managing Systems and Workgroups, posted at http://docs.hp.com.
Building an HA Cluster Configuration Preparing Your Systems — MONITOR — FULL_ADMIN — PACKAGE_ADMIN MONITOR and FULL_ADMIN can only be set in the cluster configuration file and they apply to the entire cluster. PACKAGE_ADMIN can be set in the cluster or a package configuration file. If it is set in the cluster configuration file, PACKAGE_ADMIN applies to all configured packages. If it is set in a package configuration file, PACKAGE_ADMIN applies to that package only.
Building an HA Cluster Configuration Preparing Your Systems

# Policy 3:
USER_NAME ANY_USER
USER_HOST ANY_SERVICEGUARD_NODE
USER_ROLE MONITOR

In the above example, the configuration would fail because user john is assigned two roles. Policy 2 is redundant because PACKAGE_ADMIN already includes the role of MONITOR. Policy 3 does not conflict with any other policies, even though the wildcard ANY_USER includes the individual user john. Plan the cluster’s roles and validate them as soon as possible.
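For illustration, a valid combination would assign user john only one role (the host name is taken from the earlier /etc/hosts example):

   USER_NAME john
   USER_HOST gryf
   USER_ROLE PACKAGE_ADMIN

   USER_NAME ANY_USER
   USER_HOST ANY_SERVICEGUARD_NODE
   USER_ROLE MONITOR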
Building an HA Cluster Configuration Preparing Your Systems To do this, add one of the following lines in the /etc/nsswitch.conf file:

• for DNS, enter (one line):
  hosts: files [NOTFOUND=continue UNAVAIL=continue] dns [NOTFOUND=return UNAVAIL=return]

• for NIS, enter (one line):
  hosts: files [NOTFOUND=continue UNAVAIL=continue] nis [NOTFOUND=return UNAVAIL=return]

A workaround for the problem that still retains the ability to use conventional name lookup is to configure the /etc/nsswitch.
Building an HA Cluster Configuration Preparing Your Systems NOTE For each cluster node, the public network IP address must be the first address listed. This enables other applications to talk to other nodes on public networks. 2. Edit or create the /etc/nsswitch.
Building an HA Cluster Configuration Preparing Your Systems 2. Add this disk to the current root volume group. # vgextend /dev/vg00 /dev/dsk/c4t6d0 3. Make the new disk a boot disk. # mkboot -l /dev/rdsk/c4t6d0 4. Mirror the boot, primary swap, and root logical volumes to the new bootable disk. Ensure that all devices in vg00, such as /usr, /swap, etc., are mirrored.
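Step 4 is typically carried out with MirrorDisk/UX lvextend commands; a minimal sketch for the usual vg00 boot, swap, and root logical volumes (lvol1, lvol2, lvol3 here are the conventional layout and may differ on your system):

# lvextend -m 1 /dev/vg00/lvol1 /dev/dsk/c4t6d0
# lvextend -m 1 /dev/vg00/lvol2 /dev/dsk/c4t6d0
# lvextend -m 1 /dev/vg00/lvol3 /dev/dsk/c4t6d0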
Building an HA Cluster Configuration Preparing Your Systems

Root: lvol3     on:     /dev/dsk/c4t6d0
                        /dev/dsk/c4t5d0
Swap: lvol2     on:     /dev/dsk/c4t6d0
                        /dev/dsk/c4t5d0
Dump: lvol2     on:     /dev/dsk/c4t6d0
                        /dev/dsk/c4t6d0, 0

Choosing Cluster Lock Disks The following guidelines apply if you are using a lock disk. The cluster lock disk is configured on a volume group that is physically connected to all cluster nodes. This volume group may also contain data that is used by packages.
Building an HA Cluster Configuration Preparing Your Systems NOTE You must use the vgcfgbackup and vgcfgrestore commands to back up and restore the lock volume group configuration data regardless of how you create the lock volume group. Ensuring Consistency of Kernel Configuration Make sure that the kernel configurations of all cluster nodes are consistent with the expected behavior of the cluster during failover.
Building an HA Cluster Configuration Preparing Your Systems Third-party applications that are running in a Serviceguard environment may require tuning of network and kernel parameters: • ndd is the network tuning utility. For more information, see the man page for ndd(1M) • kmtune is the system tuning utility. For more information, see the man page for kmtune(1M).
Building an HA Cluster Configuration Preparing Your Systems Preparing for Changes in Cluster Size If you intend to add additional nodes to the cluster online, while it is running, ensure that they are connected to the same heartbeat subnets and to the same lock disks as the other cluster nodes. In selecting a cluster lock configuration, be careful to anticipate any potential need for additional cluster nodes.
Building an HA Cluster Configuration Setting up the Quorum Server Setting up the Quorum Server The quorum server software, which has to be running during cluster configuration, must be installed on a system other than the nodes on which your cluster will be running. NOTE It is recommended that the node on which the quorum server is running be in the same subnet as the clusters for which it is providing services. This will help prevent any network delays which could affect quorum server operation.
Building an HA Cluster Configuration Setting up the Quorum Server Running the Quorum Server The quorum server must be running during the following cluster operations: • when the cmquerycl command is issued. • when the cmapplyconf command is issued. • when there is a cluster re-formation. By default, quorum server run-time messages go to stdout and stderr. It is suggested that you create a directory /var/adm/qs, then redirect stdout and stderr to a file in this directory, for example, /var/adm/qs/qs.
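One common way to keep the quorum server running and to capture its output is an /etc/inittab entry on the quorum server system; a sketch, assuming the quorum server binary is installed as /usr/lbin/qs and the log file is /var/adm/qs/qs.log (check the Quorum Server release notes for the exact path on your system):

   qs:345:respawn:/usr/lbin/qs >> /var/adm/qs/qs.log 2>&1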
Building an HA Cluster Configuration Installing Serviceguard Installing Serviceguard Installing Serviceguard includes updating the software via Software Distributor. It is assumed that you have already installed HP-UX. Use the following steps for each node: 1. Mount the distribution media in the tape drive or CD ROM reader. 2. Run Software Distributor, using the swinstall command. 3. Specify the correct input device. 4. Choose the following bundle from the displayed list: T1905BA Serviceguard 5.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with LVM and VxVM Creating the Storage Infrastructure and Filesystems with LVM and VxVM In addition to configuring the cluster, you create the appropriate logical volume infrastructure to provide access to data from different nodes.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with LVM and VxVM Creating Volume Groups for Mirrored Individual Data Disks The procedure described in this section uses physical volume groups for mirroring of individual disks to ensure that each logical volume is mirrored to a disk on a different I/O bus. This kind of arrangement is known as PVG-strict mirroring.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with LVM and VxVM Selecting Disks for the Volume Group Obtain a list of the disks on both nodes and identify which device files are used for the same disk on both.
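The volume group creation commands that the next paragraph describes are not reproduced in this extract; a minimal sketch, assuming one disk on each I/O bus and illustrative device names (the minor number in the mknod command must be unique among volume groups on the node):

# pvcreate -f /dev/rdsk/c1t2d0
# pvcreate -f /dev/rdsk/c0t2d0
# mkdir /dev/vgdatabase
# mknod /dev/vgdatabase/group c 64 0x010000
# vgcreate -g bus0 /dev/vgdatabase /dev/dsk/c1t2d0
# vgextend -g bus1 /dev/vgdatabase /dev/dsk/c0t2d0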
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with LVM and VxVM The first command creates the volume group and adds a physical volume to it in a physical volume group called bus0. The second command adds the second drive to the volume group, locating it in a different physical volume group named bus1. The use of physical volume groups allows the use of PVG-strict mirroring of disks and PV links. 4. Repeat this procedure for additional volume groups.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with LVM and VxVM Creating Volume Groups for Disk Arrays Using PV Links If you are configuring volume groups that use mass storage on HP's HA disk arrays, you should use redundant I/O channels from each node, connecting them to separate ports on the array. Then you can define alternate links (also called PV links) to the LUNs or logical disks you have defined on the array.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with LVM and VxVM Use the following steps to configure a volume group for this logical disk: 1. First, set up the group directory for vgdatabase: # mkdir /dev/vgdatabase 2.
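The remaining steps on this page are truncated in this extract; a sketch of the usual sequence for creating the group file and building the volume group with an alternate (PV) link, using illustrative device names and minor number:

# mknod /dev/vgdatabase/group c 64 0x010000
# pvcreate -f /dev/rdsk/c0t15d0
# vgcreate /dev/vgdatabase /dev/dsk/c0t15d0
# vgextend /dev/vgdatabase /dev/dsk/c1t15d0

Here the vgextend command adds the second path to the same LUN, which LVM records as the alternate link.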
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with LVM and VxVM Distributing Volume Groups to Other Nodes After creating volume groups for cluster data, you must make them available to any cluster node that will need to activate the volume group. The cluster lock volume group must be made available to all nodes. Deactivating the Volume Group At the time you create the volume group, it is active on the configuration node (ftsys9, for example).
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with LVM and VxVM 5. Import the volume group data using the map file from node ftsys9. On node ftsys10, enter: # vgimport -s -m /tmp/vgdatabase.map /dev/vgdatabase Note that the disk device names on ftsys10 may be different from their names on ftsys9. You should check to ensure that the physical volume names are correct throughout the cluster.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with LVM and VxVM # mount /dev/vgdatabase/lvol1 /mnt1 10. Unmount the volume group on ftsys10: # umount /mnt1 11. Deactivate the volume group on ftsys10: # vgchange -a n /dev/vgdatabase Making Physical Volume Group Files Consistent Skip ahead to the next section if you do not use physical volume groups for mirrored individual disks in your disk configuration.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with LVM and VxVM Creating Additional Volume Groups The foregoing sections show in general how to create volume groups and logical volumes for use with Serviceguard. Repeat the procedure for as many volume groups as you need to create, substituting other volume group names, logical volume names, and physical volume names. Pay close attention to the disk device names.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with LVM and VxVM IMPORTANT The rootdg for the VERITAS Cluster Volume Manager 3.5 is not the same as the HP-UX root disk if an LVM volume group is used for the HP-UX root disk filesystem. Note also that rootdg cannot be used for shared storage. However, rootdg can be used for other local filesystems (e.g., /export/home), so it need not be wasted. (CVM 4.1 does not have this restriction.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with LVM and VxVM NOTE These commands make the disk and its data unusable by LVM, and allow it to be initialized by VxVM. (The commands should only be used if you have previously used the disk with LVM and do not want to save the data on it.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with LVM and VxVM

# vxprint -g logdata

The output of this command is shown in the following example:

TY   NAME         ASSOC     KSTATE     LENGTH    PLOFFS   STATE    TUTIL0   PUTIL0
v    logdata      fsgen     ENABLED    1024000            ACTIVE
pl   logdata-01   system    ENABLED    1024000            ACTIVE

NOTE: The specific commands for creating mirrored and multi-path storage using VxVM are described in the VERITAS Volume Manager Reference Guide.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with LVM and VxVM where <DiskGroupName> is the name of the disk group that will be activated by the control script.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with LVM and VxVM Note that the clearimport is done for disks previously imported with noautoimport set on any system that has Serviceguard installed, whether it is configured in a cluster or not.
Building an HA Cluster Configuration Configuring the Cluster Configuring the Cluster This section describes how to define the basic cluster configuration. To do this in Serviceguard Manager, the graphical user interface, read the next section. If you want to use Serviceguard commands, skip ahead to the section entitled “Using Serviceguard Commands to Configure the Cluster.” Using Serviceguard Manager to Configure the Cluster Create a session on Serviceguard Manager.
Building an HA Cluster Configuration Configuring the Cluster Using Serviceguard Commands to Configure the Cluster Use the cmquerycl command to specify a set of nodes to be included in the cluster and to generate a template for the cluster configuration file. Node names must be 31 bytes or less. Here is an example of the command: # cmquerycl -v -C /etc/cmcluster/clust1.config -n ftsys9 -n ftsys10 The example creates an ASCII template file in the default cluster configuration directory, /etc/cmcluster.
Building an HA Cluster Configuration Configuring the Cluster Cluster Configuration Template File The following is an example of an ASCII configuration file generated with the cmquerycl command using the -w full option:

# **********************************************************************
# ********* HIGH AVAILABILITY CLUSTER CONFIGURATION FILE ***************
# ***** For complete details about cluster parameters and how to *******
# ***** set them, consult the Serviceguard manual.
Building an HA Cluster Configuration Configuring the Cluster

# The default quorum server timeout is calculated from the Serviceguard
# cluster parameters, including NODE_TIMEOUT and HEARTBEAT_INTERVAL.
# If you are experiencing quorum server timeouts, you can adjust these
# parameters, or you can include the QS_TIMEOUT_EXTENSION parameter.
Building an HA Cluster Configuration Configuring the Cluster

# Warning: There are no standby network interfaces for lan0.

NODE_NAME               lodi
  NETWORK_INTERFACE     lan0
    HEARTBEAT_IP        15.13.168.94

# List of serial device file names
# For example:
# SERIAL_DEVICE_FILE    /dev/tty0p0

# Warning: There are no standby network interfaces for lan0.

# Cluster Timing Parameters (microseconds).

# The NODE_TIMEOUT parameter defaults to 2000000 (2 seconds).
Building an HA Cluster Configuration Configuring the Cluster

# To enable Failover Optimization, set FAILOVER_OPTIMIZATION to
# TWO_NODE. The default is NONE.
# FAILOVER_OPTIMIZATION

FAILOVER_OPTIMIZATION           NONE

# Configuration/Reconfiguration Timing Parameters (microseconds).

AUTO_START_TIMEOUT              600000000
NETWORK_POLLING_INTERVAL        2000000

# Network Monitor Configuration Parameters.
# The NETWORK_FAILURE_DETECTION parameter determines how LAN card failures are
# detected.
Building an HA Cluster Configuration Configuring the Cluster

#                  in the cluster
#   * FULL_ADMIN:  MONITOR and PACKAGE_ADMIN plus the administrative
#                  commands for the cluster.
#
# Access control policy does not set a role for configuration
# capability. To configure, a user must log on to one of the
# cluster’s nodes as root (UID=0). Access control policy cannot
# limit root users’ access.
Building an HA Cluster Configuration Configuring the Cluster

# OPS_VOLUME_GROUP        /dev/vgdatabase
# OPS_VOLUME_GROUP        /dev/vg02

The man page for the cmquerycl command lists the definitions of all the parameters that appear in this file. Many are also described in the “Planning” chapter. Modify your /etc/cmcluster/clust1.config file to your requirements, using the data on the cluster worksheet. In the file, keywords are separated from definitions by white space.
Building an HA Cluster Configuration Configuring the Cluster clear the cluster ID from the volume group. After you are done, do not forget to run vgchange -c y to re-write the cluster ID back to the volume group. NOTE You should not configure a second lock volume group or physical volume unless your configuration specifically requires it. See the discussion “Dual Cluster Lock” in the section “Cluster Lock” in Chapter 3.
Building an HA Cluster Configuration Configuring the Cluster

# For example, to configure a quorum server running on node
# "qshost" with 120 seconds for the QS_POLLING_INTERVAL and to
# add 2 seconds to the system assigned value for the quorum server
# timeout, enter:
#
# QS_HOST                 qshost
# QS_POLLING_INTERVAL     120000000
# QS_TIMEOUT_EXTENSION    2000000

Enter the QS_HOST, QS_POLLING_INTERVAL and, if desired, a QS_TIMEOUT_EXTENSION.
Building an HA Cluster Configuration Configuring the Cluster NOTE Remember to tune HP-UX kernel parameters on each node to ensure that they are set high enough for the largest number of packages that will ever run concurrently on that node. Modifying Cluster Timing Parameters The cmquerycl command supplies default cluster timing parameters for HEARTBEAT_INTERVAL and NODE_TIMEOUT. Changing these parameters will directly impact the cluster’s reformation and failover times.
Building an HA Cluster Configuration Configuring the Cluster Access Control Policies Beginning with Serviceguard Version A.11.16, Access Control Policies allow non-root user to use common administrative commands. Non-root users of Serviceguard Manager, the graphical user interface, need to have a configured access policy to view and to administer Serviceguard clusters, packages and packages. In new configurations, it is a good idea to immediately configure at least one monitor access policy.
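For example, a minimal monitor policy in the cluster configuration ASCII file looks like this, granting read-only visibility to any user on any Serviceguard node:

   USER_NAME ANY_USER
   USER_HOST ANY_SERVICEGUARD_NODE
   USER_ROLE MONITOR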
Building an HA Cluster Configuration Configuring the Cluster If you have edited an ASCII cluster configuration file using the command line, use the following command to verify the content of the file:

# cmcheckconf -k -v -C /etc/cmcluster/clust1.config

Both methods check the following:

• Network addresses and connections.
• Cluster lock connectivity (if you are configuring a lock disk).
• Validity of configuration parameters for the cluster and packages.
• Uniqueness of names.
Building an HA Cluster Configuration Configuring the Cluster • VOLUME_GROUP entries are not currently marked as cluster-aware. • There is only one heartbeat subnet configured if you are using CVM 3.5 disk storage. If the cluster is online, the check also verifies that all the conditions for the specific change in configuration have been met. NOTE Using the -k option means that cmcheckconf only checks disk connectivity to the LVM disks that are identified in the ASCII file.
Building an HA Cluster Configuration Configuring the Cluster or # cmapplyconf -k -v -C /etc/cmcluster/clust1.ascii Using the -k option means that cmapplyconf only checks disk connectivity to the LVM disks that are identified in the ASCII file. Omitting the -k option (the default behavior) means that cmapplyconf tests the connectivity of all LVM disks on all nodes. Using -k can result in significantly faster operation of the command. NOTE • Deactivate the cluster lock volume group.
Building an HA Cluster Configuration Configuring the Cluster NOTE You must use the vgcfgbackup command to store a copy of the cluster lock disk's configuration data whether you created the volume group using SAM or using HP-UX commands. If the cluster lock disk ever needs to be replaced while the cluster is running, you must use the vgcfgrestore command to restore lock information to the replacement disk.
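A sketch of the backup and (if ever needed) restore commands, assuming the lock volume group is /dev/vglock and the replacement disk is /dev/rdsk/c1t2d0 (both names are illustrative):

# vgcfgbackup /dev/vglock
# vgcfgrestore -n /dev/vglock /dev/rdsk/c1t2d0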
Building an HA Cluster Configuration Creating a Storage Infrastructure with VERITAS Cluster File System (CFS) Creating a Storage Infrastructure with VERITAS Cluster File System (CFS) In addition to configuring the cluster, you create the appropriate logical volume infrastructure to provide access to data from different nodes. This is done with Logical Volume Manager (LVM), VERITAS Volume Manager (VxVM), or VERITAS Cluster Volume Manager (CVM).
Building an HA Cluster Configuration Creating a Storage Infrastructure with VERITAS Cluster File System (CFS) 3. If you have not initialized your disk groups, or if you have an old install that needs to be re-initialized, use the vxinstall command to initialize VxVM/CVM disk groups. See “Initializing the VERITAS Volume Manager” on page 252. 4. The VERITAS cluster volumes are managed by a Serviceguard-supplied system multi-node package which runs on all nodes at once, and cannot failover. In CVM 4.
Building an HA Cluster Configuration Creating a Storage Infrastructure with VERITAS Cluster File System (CFS)

# cfscluster status

  Node             :  ftsys9
  Cluster Manager  :  up
  CVM state        :  up (MASTER)
  MOUNT POINT    TYPE    SHARED VOLUME    DISK GROUP    STATUS

  Node             :  ftsys10
  Cluster Manager  :  up
  CVM state        :  up
  MOUNT POINT    TYPE    SHARED VOLUME    DISK GROUP    STATUS

NOTE: Because the CVM 4.
Building an HA Cluster Configuration Creating a Storage Infrastructure with VERITAS Cluster File System (CFS) NOTE If you want to create a cluster with CVM only - without CFS, stop here. Then, in your application package’s configuration file, add the dependency triplet, with DEPENDENCY_CONDITION set to SG-DG-pkg-id#=UP and LOCATION set to SAME_NODE. For more information about the DEPENDENCY parameter, see “Package Configuration File Parameters” on page 170. Creating the Disk Group Cluster Packages 1.
Building an HA Cluster Configuration Creating a Storage Infrastructure with VERITAS Cluster File System (CFS) 5. To view the package name that is monitoring a disk group, use the cfsdgadm show_package command: # cfsdgadm show_package logdata sg_cfs_dg-1 Creating Volumes 1. Make log_files volume on the logdata disk group: # vxassist -g logdata make log_files 1024m 2.
Building an HA Cluster Configuration Creating a Storage Infrastructure with VERITAS Cluster File System (CFS) non-cfs commands could cause conflicts with subsequent command operations on the file system or Serviceguard packages. Use of these other forms of mount will not create an appropriate multi-node package which means that the cluster packages are not aware of the file system changes.
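A sketch of the usual sequence for creating the cluster file system and its mount point package with the cfs commands, continuing the logdata/log_files example (the mount point and mount options are illustrative):

# newfs -F vxfs /dev/vx/rdsk/logdata/log_files
# cfsmntadm add logdata log_files /tmp/logdata/log_files all=rw
# cfsmount /tmp/logdata/log_files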
Building an HA Cluster Configuration Creating a Storage Infrastructure with VERITAS Cluster File System (CFS)

MULTI_NODE_PACKAGES

  PACKAGE        STATUS    STATE      AUTO_RUN    SYSTEM
  SG-CFS-pkg     up        running    enabled     yes
  SG-CFS-DG-1    up        running    enabled     no
  SG-CFS-MP-1    up        running    enabled     no

# ftsys9/etc/cmcluster/cfs> bdf
Filesystem                      kbytes    used     avail     %used   Mounted on
/dev/vx/dsk/logdata/log_files   10485     17338    966793    2%      tmp/logdata/log_files

# ftsys10/etc/cmcluster/cfs> bdf
Filesystem                      kbytes    used     avail     %used
/dev/vx/dsk/logd
Building an HA Cluster Configuration Creating a Storage Infrastructure with VERITAS Cluster File System (CFS) Mount Point Packages for Storage Checkpoints The VERITAS File System provides a unique storage checkpoint facility which quickly creates a persistent image of a filesystem at an exact point in time. Storage checkpoints significantly reduce I/O overhead by identifying and maintaining only the filesystem blocks that have changed since the last storage checkpoint or backup.
Building an HA Cluster Configuration Creating a Storage Infrastructure with VERITAS Cluster File System (CFS)

CLUSTER        STATUS
cfs-cluster    up

  NODE         STATUS    STATE
  ftsys9       up        running
  ftsys10      up        running

MULTI_NODE_PACKAGES

  PACKAGE        STATUS    STATE      AUTO_RUN    SYSTEM
  SG-CFS-pkg     up        running    enabled     yes
  SG-CFS-DG-1    up        running    enabled     no
  SG-CFS-MP-1    up        running    enabled     no
  SG-CFS-CK-1    up        running    disabled    no

/tmp/check_logfiles now contains a point in time view of /tmp/logdata/log_files, and it is persistent.
Building an HA Cluster Configuration Creating a Storage Infrastructure with VERITAS Cluster File System (CFS) operations can be performed from that node. The snapshot of a cluster file system is accessible only on the node where it is created; the snapshot file system itself cannot be cluster mounted. For details on creating snapshots on cluster file systems, see the VERITAS Storage Foundation Cluster File System Installation and Administration Guide posted at http://docs.hp.com:.
Building an HA Cluster Configuration Creating a Storage Infrastructure with VERITAS Cluster File System (CFS)

# bdf
Filesystem             kbytes     used       avail      %used   Mounted on
/dev/vg00/lvol3        544768     352233     180547     66%     /
/dev/vg00/lvol1        307157     80196      196245     29%     /stand
/dev/vg00/lvol5        1101824    678426     397916     63%     /var
/dev/vg00/lvol7        2621440    1702848    861206     66%     /usr
/dev/vg00/lvol4        4096       707        3235       18%     /tmp
/dev/vg00/lvol6        2367488    1718101    608857     74%     /opt
/dev/vghome/varopt     4194304    258609     3689741    7%      /var/opt
/dev/vghome/home       2097152    17167      1949993    1%      /home
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with VERITAS Cluster Volume Manager (CVM) Creating the Storage Infrastructure and Filesystems with VERITAS Cluster Volume Manager (CVM) In addition to configuring the cluster, you create the appropriate logical volume infrastructure to provide access to data from different nodes.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with VERITAS Cluster Volume Manager (CVM) For more information, including details about configuration of plexes (mirrors), multipathing, and RAID, refer to the HP-UX documentation for the VERITAS Volume Manager. See the documents for HP Serviceguard Storage Management Suite posted at http://docs.hp.com.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with VERITAS Cluster Volume Manager (CVM) on the map and tree, like failover packages. These clusterwide packages’ properties have a special tab in the cluster properties, and their admin menu is available when you select their cluster. NOTE Cluster configuration is described in the previous section, “Configuring the Cluster” on page 224. Check the heartbeat configuration. The CVM 3.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with VERITAS Cluster Volume Manager (CVM)

Begin package verification ...
Modify the package configuration ([y]/n)?  Y
Completed the cluster update

You can confirm this using the cmviewcl command. This output shows the results of the CVM 3.5 command above.
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with VERITAS Cluster Volume Manager (CVM) Initializing Disks for CVM You need to initialize the physical disks that will be employed in CVM disk groups. If a physical disk has been previously used with LVM, you should use the pvremove command to delete the LVM header data from all the disks in the volume group (this is not necessary if you have not previously used the disk with LVM).
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with VERITAS Cluster Volume Manager (CVM) Verify the configuration with the following command: # vxdg list Mirror Detachment Policies with CVM The default CVM disk mirror detachment policy is global, which means that as soon as one node cannot see a specific mirror copy (plex), all nodes cannot see it as well.
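If your configuration calls for the local detachment policy instead, the policy can typically be changed on a per disk group basis with vxedit; a sketch, using the logdata disk group (verify the exact attribute name against your VxVM documentation):

# vxedit set diskdetpolicy=local logdata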
Building an HA Cluster Configuration Creating the Storage Infrastructure and Filesystems with VERITAS Cluster Volume Manager (CVM) # mount /dev/vx/dsk/logdata/log_files /logs 4. Check to make sure the filesystem is present, then unmount it: # umount /logs 5. Use the following command to deactivate the disk group: # vxdg -g logdata set activation=off 6. After creating units of storage with VxVM commands, you need to specify the CVM disk groups in each package configuration file.
Building an HA Cluster Configuration Using DSAU during Configuration Using DSAU during Configuration HP Distributed Systems Administration Utilities (DSAU) improve both Serviceguard cluster management and multisystem management.
Building an HA Cluster Configuration Using DSAU during Configuration • Edit files • Execute shell commands • Disable use of a specific file • Signal processes • Check for processes • Clean up directories • Check for symbolic links in system files • Maintain symbolic links Template configuration-description files are provided with DSAU that include examples of the common files that need to be synchronized in a Serviceguard cluster.
Building an HA Cluster Configuration Using DSAU during Configuration The DSAU utilities can use remsh or ssh as a transport. When using ssh, the DSAU csshsetup utility makes it easy to distribute ssh keys across all members of the cluster. DSAU also includes output filtering tools that help to consolidate the command output returned from each member and easily see places where members are returning identical information.
Building an HA Cluster Configuration Managing the Running Cluster Managing the Running Cluster This section describes some approaches to routine management of the cluster. Additional tools and suggestions are found in Chapter 7, “Cluster and Package Maintenance.” Checking Cluster Operation with Serviceguard Manager Serviceguard Manager lets you see all the nodes and packages within a cluster and displays their current status. Refer to the section on “Using Serviceguard Manager” in Chapter 7.
Building an HA Cluster Configuration Managing the Running Cluster • cmviewcl checks status of the cluster and many of its components. A non-root user with the role of Monitor can run this command from a cluster node or see status information in Serviceguard Manager. • cfscluster status gives information about a cluster configured with CFS, the VERITAS Cluster File System; cfsdgadm display gives information about the cluster’s disk groups. • cmrunnode is used to start Serviceguard on a node.
Building an HA Cluster Configuration Managing the Running Cluster • Halt the node. In Serviceguard Manager menu use Halt Node. On the command line, use the cmhaltnode command. • Check the cluster membership on the map or tree to verify that the node has left the cluster. In Serviceguard Manager, open the map or tree or Cluster Properties. On the command line, use the cmviewcl command. • Start the node. In Serviceguard Manager use the Run Node command. On the command line, use the cmrunnode command.
Building an HA Cluster Configuration Managing the Running Cluster Setting up Autostart Features Automatic startup is the process in which each node individually joins a cluster; Serviceguard provides a startup script to control the startup process. Automatic cluster start is the preferred way to start a cluster. No action is required by the system administrator. There are three cases: • The cluster is not running on any node, all cluster nodes must be reachable, and all must be attempting to start up.
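Automatic startup is controlled by the AUTOSTART_CMCLD flag in the /etc/rc.config.d/cmcluster file on each node; setting it as follows causes the node to attempt to join the cluster at boot time:

   AUTOSTART_CMCLD=1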
Building an HA Cluster Configuration Managing the Running Cluster NOTE The /sbin/init.d/cmcluster file may call files that Serviceguard stored in the directories: /etc/cmcluster/rc (HP-UX) and ${SGCONF}/rc (Linux). The directory is for Serviceguard use only! Do not move, delete, modify, or add files to this directory. Changing the System Message You may find it useful to modify the system's login message to include a statement such as the following: This system is a node in a high availability cluster.
Building an HA Cluster Configuration Managing the Running Cluster Single-Node Operation Single-node operation occurs in a single-node cluster or in a multi-node cluster, following a situation where all but one node has failed, or where you have shut down all but one node, which will probably have applications running. As long as the Serviceguard daemon cmcld is active, other nodes can re-join the cluster at a later time.
Building an HA Cluster Configuration Managing the Running Cluster Although the cluster must be halted, all nodes in the cluster should be powered up and accessible before you use the cmdeleteconf command. If a node is powered down, power it up and boot. If a node is inaccessible, you will see a list of inaccessible nodes together with the following message: It is recommended that you do not proceed with the configuration operation unless you are sure these nodes are permanently unavailable.
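For example, removing a halted cluster's configuration might look like the following sketch (cluster1 is an illustrative name; verify the cmdeleteconf options against the man page on your system):

# cmhaltcl -f -v
# cmdeleteconf -c cluster1

As with package deletion, the command prompts for verification unless you use the -f option.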
Configuring Packages and Their Services 6 Configuring Packages and Their Services In addition to configuring the cluster, you need to identify the applications and services that you wish to group into packages.
Configuring Packages and Their Services Creating the Package Configuration Creating the Package Configuration The package configuration process defines a set of application services that are run by the package manager when a package starts up on a node in the cluster. The configuration also includes a prioritized list of cluster nodes on which the package can run together with definitions of the acceptable types of failover allowed for the package.
Configuring Packages and Their Services Creating the Package Configuration Package in Stages” on page 274, because many steps do not have to be done in sequence. The control script can be created automatically if you want. 4. Verifying the Package Configuration: Click the Check button. If you don’t see the log window, open one from the View menu. 5. Distributing the Configuration: Click Apply.
Configuring Packages and Their Services Creating the Package Configuration The CFS system multi-node package performs these functions: • Maintain VERITAS configuration files /etc/llttab, /etc/llthosts, /etc/gabtab • Launch required services: cmvxd, cmvxpingd, vxfsckd • Start/halt VERITAS processes in the proper order: llt, gab, vxfen, odm, cvm, cfs CAUTION Serviceguard manages VERITAS processes, specifically gab and LLT, through system multi-node packages; do not start, halt, or reconfigure these processes outside of Serviceguard.
Configuring Packages and Their Services Creating the Package Configuration CAUTION Once you create the disk group and mount point packages, it is critical that you administer these packages with the cfs commands, including cfsdgadm, cfsmntadm, cfsmount, and cfsumount. Non-cfs commands (such as mount -o cluster) could cause conflicts with subsequent command operations on the file system or Serviceguard packages.
Configuring Packages and Their Services Creating the Package Configuration # mkdir /etc/cmcluster/pkg1 You can use any directory names you wish. 2. Next, generate a package configuration template for the package: # cmmakepkg -p /etc/cmcluster/pkg1/pkg1.config You can use any file names you wish for the ASCII templates. 3.
Configuring Packages and Their Services Creating the Package Configuration #********************************************************************** # ****** HIGH AVAILABILITY PACKAGE CONFIGURATION FILE (template)******* #********************************************************************** # ******* Note: This file MUST be edited before it can be used.******** # * For complete details about package parameters and how to set them,* # * consult the Serviceguard Extension for RAC manuals.
Configuring Packages and Their Services Creating the Package Configuration # FAILOVER_POLICY # FAILBACK_POLICY # # Since an IP address can not be assigned to more than one node at a time, relocatable IP addresses can not be assigned in the package control script for MULTI_NODE packages. If volume groups are used in a MULTI_NODE package, they must be activated in a shared mode and data integrity is left to the application.
Configuring Packages and Their Services Creating the Package Configuration # Enter the names of the nodes configured for this package. Repeat this line as necessary for additional adoptive nodes. # NOTE: The order is relevant. Put the second Adoptive Node after the first one. # Example: NODE_NAME original_node # NODE_NAME adoptive_node # If all nodes in the cluster are to be specified and order is not important, "NODE_NAME *" may be specified.
Configuring Packages and Their Services Creating the Package Configuration NODE_FAIL_FAST_ENABLED NO # Enter the complete path for the run and halt scripts. In most cases # the run script and halt script specified here will be the same # script,the package control script generated by the cmmakepkg command. # This control script handles the run(ning) and halt(ing) of the # package. # # Enter the timeout, specified in seconds, for the run and halt # scripts.
Configuring Packages and Their Services Creating the Package Configuration # STORAGE_GROUP dg03 # STORAGE_GROUP dg04 # # # Enter the names of the dependency condition for this package. # Dependencies are used to describe the relationship between packages # To define a dependency, all three attributes are required. # # DEPENDENCY_NAME must have a unique identifier for the dependency. # # DEPENDENCY_CONDITION # This is an expression describing what must be true for # the dependency to be satisfied.
Configuring Packages and Their Services Creating the Package Configuration # NO. If set to YES, in the event of a service failure, the # cluster software will halt the node on which the service is # running. If SERVICE_FAIL_FAST_ENABLED is not specified, the # default will be NO. # # SERVICE_HALT_TIMEOUT is represented as a number of seconds.
Configuring Packages and Their Services Creating the Package Configuration # one or more RESOURCE_UP_VALUE lines are required. The RESOURCE_POLLING_INTERVAL and the RESOURCE_START are optional. The RESOURCE_POLLING_INTERVAL indicates how often, in seconds, the resource is to be monitored. It will be defaulted to 60 seconds if RESOURCE_POLLING_INTERVAL is not specified.
Configuring Packages and Their Services Creating the Package Configuration # defined by repeating the entire RESOURCE_NAME block. # # Example : RESOURCE_NAME /net/interfaces/lan/status/lan0 # RESOURCE_POLLING_INTERVAL 120 # RESOURCE_START AUTOMATIC # RESOURCE_UP_VALUE = RUNNING # RESOURCE_UP_VALUE = ONLINE # # Means that the value of resource /net/interfaces/lan/status/lan0 # will be checked every 120 seconds, and is considered to # be 'up' when its value is "RUNNING" or "ONLINE".
Configuring Packages and Their Services Creating the Package Configuration • PACKAGE_TYPE. The traditional Serviceguard package is the FAILOVER package. It runs on one node at a time. If there is a failure on the node where the package is running, Serviceguard can fail the work over to another node. MULTI_NODE packages can run on several nodes at the same time. SYSTEM_MULTI_NODE packages run on all the nodes in the cluster at the same time.
Configuring Packages and Their Services Creating the Package Configuration • LOCAL_LAN_FAILOVER_ALLOWED. Enter YES to permit switching of the package IP address to a standby LAN, or NO to keep the package address from switching locally. (Must be NO for multi-node and system multi-node packages.) • NODE_FAIL_FAST_ENABLED. If you enter YES, a node will be halted with a TOC any time a package fails on that node. This prevents Serviceguard from repeatedly trying (and failing) to start a package on that node.
Configuring Packages and Their Services Creating the Package Configuration • To configure monitoring within the package for a registered resource, enter values for the following parameters. — RESOURCE_NAME. Enter the name of a registered resource that is to be monitored by Serviceguard. — RESOURCE_POLLING_INTERVAL. Enter the time between attempts to assure that the resource is healthy. — RESOURCE_UP_VALUE. Enter the value or values that determine when the resource is considered to be up.
Configuring Packages and Their Services Creating the Package Configuration RESOURCE_START AUTOMATIC RESOURCE_UP_VALUE = UP • ACCESS_CONTROL_POLICY is available beginning with Serviceguard version A.11.16. With it, you can allow non-root users to monitor the cluster and to administer packages. Configuration still requires root. The only role in the package configuration file is that of PACKAGE_ADMIN over the one configured package. Cluster-wide roles are defined in the cluster configuration file.
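For illustration only, the edited parameter entries in a failover package ASCII file might look like the following sketch (node names, script path, and the access-control entry are examples; take the exact parameter names and defaults from the template generated by cmmakepkg on your system):

PACKAGE_NAME                 pkg1
PACKAGE_TYPE                 FAILOVER
FAILOVER_POLICY              CONFIGURED_NODE
FAILBACK_POLICY              MANUAL
NODE_NAME                    ftsys9
NODE_NAME                    ftsys10
AUTO_RUN                     YES
LOCAL_LAN_FAILOVER_ALLOWED   YES
NODE_FAIL_FAST_ENABLED       NO
RUN_SCRIPT                   /etc/cmcluster/pkg1/pkg1.sh
HALT_SCRIPT                  /etc/cmcluster/pkg1/pkg1.sh
SERVICE_NAME                 service1
SUBNET                       15.13.168.0
RESOURCE_NAME                /net/interfaces/lan/status/lan0
RESOURCE_POLLING_INTERVAL    120
RESOURCE_START               AUTOMATIC
RESOURCE_UP_VALUE            = RUNNING
USER_NAME                    pkgadmin
USER_HOST                    ftsys9
USER_ROLE                    PACKAGE_ADMIN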
Configuring Packages and Their Services Creating the Package Configuration Adding or Removing Packages on a Running Cluster You can add or remove packages while the cluster is running, subject to the value of MAX_CONFIGURED_PACKAGES in the cluster configuration file. To add or remove packages online, refer to Chapter 7, “Cluster and Package Maintenance,” on page 309.
Configuring Packages and Their Services Creating the Package Control Script Creating the Package Control Script The package control script contains all the information necessary to run all the services in the package, monitor them during operation, react to a failure, and halt the package when necessary. You can use Serviceguard Manager (in Guided Mode) to create the control script as it creates your package configuration. You can also use HP-UX commands to create or modify the package control script.
Configuring Packages and Their Services Creating the Package Control Script When you create the CFS/CVM 4.1 multi-node or system multi-node package, Serviceguard automatically creates their control scripts. It is highly recommended that you never edit the configuration or control script files for these packages, although Serviceguard does not forbid it. Create and modify the information using cfs admin commands only. For failover packages, create the control script by editing the control script (pkg_name.
Configuring Packages and Their Services Creating the Package Control Script • If you are using CVM, enter the names of disk groups to be activated using the CVM_DG[] array parameters, and select the appropriate storage activation command, CVM_ACTIVATION_CMD. Do not use the VG[] or VXVM_DG[] parameters for CVM disk groups. • If you are using VxVM disk groups without CVM, enter the names of VxVM disk groups that will be imported using the VXVM_DG[] array parameters. Enter one disk group per array element.
Configuring Packages and Their Services Creating the Package Control Script • When the command started by cmrunserv exits, Serviceguard determines that a failure has occurred and takes appropriate action, which may include transferring the package to an adoptive node. • If a run command is a shell script that runs some other command and then exits, Serviceguard will consider this normal exit as a failure.
Configuring Packages and Their Services Creating the Package Control Script CAUTION Although Serviceguard uses the -C option within the package control script framework, this option should not normally be used from the command line. The “Troubleshooting” section shows some situations where you might need to use -C from the command line.
Configuring Packages and Their Services Creating the Package Control Script • CONCURRENT_MOUNT_AND_UMOUNT_OPERATIONS—defines a number of parallel mount operations during package startup and unmount operations during package shutdown. You can use the -s option with FSCK_OPT and FS_UMOUNT_OPT parameters for environments that use a large number of filesystems. The -s option allows mount/umounts and fscks to be done in parallel. (With the standard 11iv1 (11.
Configuring Packages and Their Services Creating the Package Control Script # # Uncomment the second line (VGCHANGE="vgchange -a e -q n -s"), and comment # out the default, if you want the mirror resynchronization to occur in parallel with # the package startup. # # Uncomment the third line (VGCHANGE="vgchange -a y") if you wish to # use non-exclusive activation mode. Single node cluster configurations # must use non-exclusive activation.
Configuring Packages and Their Services Creating the Package Control Script # VG[0]=vg01 # VG[1]=vg02 # # The volume group activation method is defined above. The filesystems # associated with these volume groups are specified below. # #VG[0]=”” # # CVM DISK GROUPS # Specify which cvm disk groups are used by this package. Uncomment # CVM_DG[0]=”” and fill in the name of your first disk group. You must # begin with CVM_DG[0], and increment the list in sequence.
Configuring Packages and Their Services Creating the Package Control Script # # NOTE: When VxVM is initialized it will store the hostname of the # local node in its volboot file in a variable called ‘hostid’. # The MC Serviceguard package control scripts use both the values of # the hostname(1m) command and the VxVM hostid. As a result # the VxVM hostid should always match the value of the # hostname(1m) command.
Configuring Packages and Their Services Creating the Package Control Script # # LV[1]=/dev/vg01/lvol2; FS[1]=/pkg01b; FS_MOUNT_OPT[1]="-o rw" # FS_UMOUNT_OPT[1]=""; FS_FSCK_OPT[1]=""; FS_TYPE[1]="vxfs" # #LV[0]=""; FS[0]=""; FS_MOUNT_OPT[0]=""; FS_UMOUNT_OPT[0]=""; FS_FSCK_OPT[0]="" #FS_TYPE[0]="" # # VOLUME RECOVERY # # When mirrored VxVM volumes are started during the package control # bring up, if recovery is required the default behavior is for # the package control script to wait until recovery has been completed.
Configuring Packages and Their Services Creating the Package Control Script # for the system resources available on your cluster nodes. Some examples # of system resources that can affect the optimum number of concurrent # operations are: number of CPUs, amount of available memory, the kernel # configuration for nfile and nproc. In some cases, if you set the number # of concurrent operations too high, the package may not be able to start # or to halt.
Configuring Packages and Their Services Creating the Package Control Script CONCURRENT_MOUNT_AND_UMOUNT_OPERATIONS=1 # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # Example: If a package uses 50 JFS filesystems, pkg01aa through pkg01bx, which are mounted on the 50 logical volumes lvol1..
Configuring Packages and Their Services Creating the Package Control Script # (netmask=ffff:ffff:ffff:ffff::) # # Hint: Run “netstat -i” to see the available IPv6 subnets by looking # at the address prefixes # IP/Subnet address pairs for each IP address you want to add to a subnet # interface card. Must be set in pairs, even for IP addresses on the same # subnet. # #IP[0]=”” #SUBNET[0]=”” # SERVICE NAMES AND COMMANDS.
Configuring Packages and Their Services Creating the Package Control Script # DTC manager information for each DTC. # Example: DTC[0]=dtc_20 #DTC_NAME[0]= # #HA_NFS_SCRIPT_EXTENSION # If the package uses HA NFS, this variable can be used to alter the # name of the HA NFS script. If not set, the name of this script is # assumed to be "ha_nfs.sh". If set, the "sh" portion of the default # script name is replaced by the value of this variable. So if # HA_NFS_SCRIPT_EXTENSION is set to "package1.
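As a sketch only, the customized portion of a control script might end up containing entries like the following (the volume group, mount point, relocatable address, and application command are examples; take the exact variable names from the template generated on your system):

VG[0]="vg01"
LV[0]="/dev/vg01/lvol1"; FS[0]="/pkg1a"; FS_MOUNT_OPT[0]="-o rw"
FS_UMOUNT_OPT[0]=""; FS_FSCK_OPT[0]=""; FS_TYPE[0]="vxfs"
IP[0]="15.13.168.50"; SUBNET[0]="15.13.168.0"
SERVICE_NAME[0]="service1"
SERVICE_CMD[0]="/usr/local/bin/app_server"    # must run in the foreground
SERVICE_RESTART[0]="-r 2"

Note that the command in SERVICE_CMD[] should run in the foreground; as described earlier in this section, a wrapper script that starts the application in the background and then exits is treated by Serviceguard as a service failure.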
Configuring Packages and Their Services Creating the Package Control Script Adding Customer Defined Functions to the Package Control Script You can add additional shell commands to the package control script to be executed whenever the package starts or stops. Simply enter these commands in the CUSTOMER DEFINED FUNCTIONS area of the script. This gives you the ability to further customize the control script.
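A minimal sketch of such an addition follows. The run and halt functions are typically named customer_defined_run_cmds and customer_defined_halt_cmds in the generated control script (check your script), the log file name here is an example only, and any lines the template already places inside these functions should be left in place:

function customer_defined_run_cmds
{
        # Record package startup time for local auditing
        /usr/bin/date >> /tmp/pkg1_timestamps.log
}

function customer_defined_halt_cmds
{
        # Record package halt time for local auditing
        /usr/bin/date >> /tmp/pkg1_timestamps.log
}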
Configuring Packages and Their Services Creating the Package Control Script Adding Serviceguard Commands in Customer Defined Functions You can add Serviceguard commands (such as cmmodpkg) in the Customer Defined Functions section of a package control script. However, these commands must not interact with the package itself. If you do run Serviceguard commands from the package control script, they must be run as background processes, otherwise the package will hang.
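For example, a line like the following in one of the customer defined functions disables switching for another package without blocking the running control script (pkg2 is an illustrative package name):

# Run in the background so the control script does not block
cmmodpkg -d pkg2 &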
Configuring Packages and Their Services Verifying the Package Configuration Verifying the Package Configuration Serviceguard automatically checks the configuration you enter and reports any errors. If Serviceguard Manager created the file, click the Check button or the Apply button. If you have edited an ASCII package configuration file, use the following command to verify the content of the file: # cmcheckconf -v -P /etc/cmcluster/pkg1/pkg1.config Errors are displayed on the standard output.
Configuring Packages and Their Services Distributing the Configuration Distributing the Configuration You can use Serviceguard Manager or HP-UX commands to distribute the binary cluster configuration file among the nodes of the cluster. DSAU (Distributed Systems Administration Utilities) can help you streamline your distribution. Distributing the Configuration And Control Script with Serviceguard Manager When you have finished creating a package in Serviceguard Manager, click the Apply button.
Configuring Packages and Their Services Distributing the Configuration • Activate the cluster lock volume group so that the lock disk can be initialized: # vgchange -a y /dev/vg01 • Generate the binary configuration file and distribute it across the nodes. # cmapplyconf -v -C /etc/cmcluster/cmcl.config -P \ /etc/cmcluster/pkg1/pkg1.config • If you are using a lock disk, deactivate the cluster lock volume group.
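For example, using the lock volume group shown above:

# vgchange -a n /dev/vg01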
Configuring Packages and Their Services Distributing the Configuration With command fan-out, you can send the same command from one designated system to all the systems in your Serviceguard cluster. This eliminates both visiting all systems in the configuration and many manual operations. For additional information on using DSAU, refer to the Managing Systems and Workgroups manual, posted at http://docs.hp.com.
Cluster and Package Maintenance 7 Cluster and Package Maintenance This chapter describes how to see cluster configuration and status information, how to start and halt a cluster or an individual node, how to perform permanent reconfiguration, and how to start, halt, move, and modify packages during routine maintenance of the cluster.
Cluster and Package Maintenance Reviewing Cluster and Package Status Reviewing Cluster and Package Status You can check status using Serviceguard Manager or on a cluster node’s command line. Reviewing Cluster and Package Status with Serviceguard Manager Serviceguard Manager shows status several ways. Figure 7-1 Reviewing Status: Serviceguard Manager Map • On the map, cluster object icons have borders to show problems. To the right of the icons, a badge gives information about the type of problem.
Cluster and Package Maintenance Reviewing Cluster and Package Status • There are more details in the cluster, node, and package property sheets (Figure 7-2). Cluster multi-node packages’ properties are contained in the cluster properties.
Cluster and Package Maintenance Reviewing Cluster and Package Status Reviewing Cluster and Package States with the cmviewcl Command Information about cluster status is stored in the status database, which is maintained on each individual node in the cluster. You can display information contained in this database by issuing the cmviewcl command: # cmviewcl -v You can issue the cmviewcl command with non-root access. To allow access, clusters with Serviceguard version A.11.
Cluster and Package Maintenance Reviewing Cluster and Package Status The cmviewcl -v command output lists dependencies throughout the cluster. For a specific package’s dependencies, use the -p pkgname option. Types of Cluster and Package States A cluster or its component nodes may be in several different states at different points in time. The following sections describe many of the common conditions the cluster or package may be in. Cluster Status The status of a cluster may be one of the following: • Up.
Cluster and Package Maintenance Reviewing Cluster and Package Status • Unknown. A node never sees itself in this state. Other nodes assign a node this state if it has never been an active cluster member. Package Status and State The status of a package can be one of the following: • Up. The package control script is active. • Down. The package control script is not active. • Unknown. Serviceguard cannot determine the status at this time.
Cluster and Package Maintenance Reviewing Cluster and Package Status • Uninitialized. The service is included in the cluster configuration, but it was not started with a run command in the control script. • Unknown. Network Status The network interfaces have only status, as follows: • Up. • Down. • Unknown. Serviceguard cannot determine whether the interface is up or down. This can happen when the cluster is down. A standby interface has this status.
Cluster and Package Maintenance Reviewing Cluster and Package Status Examples of Cluster and Package States The following sample output from the cmviewcl -v command shows status for the cluster in the sample configuration. Normal Running Status Everything is running normally; both nodes in the cluster are running, and the packages are in their primary locations. CLUSTER example NODE ftsys9 STATUS up STATUS up STATE running Network_Parameters: INTERFACE STATUS PRIMARY up STANDBY up PATH 56/36.
Cluster and Package Maintenance Reviewing Cluster and Package Status pkg2 up running enabled ftsys10 Policy_Parameters: POLICY_NAME CONFIGURED_VALUE Failover configured_node Failback manual Script_Parameters: ITEM STATUS Service up Subnet up MAX_RESTARTS 0 0 Node_Switching_Parameters: NODE_TYPE STATUS SWITCHING Primary up enabled Alternate up enabled RESTARTS 0 0 NAME ftsys10 ftsys9 NAME service2 15.13.168.
Cluster and Package Maintenance Reviewing Cluster and Package Status applications to be able to access CVM disk groups. The system multi-node package is named SG-CFS-pkg if the cluster is using version 4.1 of the VERITAS Cluster Volume Manager.
Cluster and Package Maintenance Reviewing Cluster and Package Status Script_Parameters: ITEM STATUS Service up MAX_RESTARTS 0 RESTARTS 0 NAME VxVM-CVM-pkg.srv CFS Package Status If the cluster is using the VERITAS Cluster File System, the system multi-node package SG-CFS-pkg must be running on all active nodes, and the multi-node packages for disk group and mount point must also be running on at least one of their configured nodes.
Cluster and Package Maintenance Reviewing Cluster and Package Status Status After Halting a Package After halting the failover package pkg2 with the cmhaltpkg command, the output of cmviewcl-v is as follows: CLUSTER example NODE ftsys9 STATUS up STATUS up STATE running Network_Parameters: INTERFACE STATUS PRIMARY up STANDBY up PATH 56/36.
Cluster and Package Maintenance Reviewing Cluster and Package Status pkg2 down halted disabled unowned Policy_Parameters: POLICY_NAME CONFIGURED_VALUE Failover configured_node Failback manual Script_Parameters: ITEM STATUS Resource up Subnet up Resource up Subnet up NODE_NAME ftsys9 ftsys9 ftsys10 ftsys10 Node_Switching_Parameters: NODE_TYPE STATUS SWITCHING Primary up enabled Alternate up enabled NAME /example/float 15.13.168.0 /example/float 15.13.168.
Cluster and Package Maintenance Reviewing Cluster and Package Status PACKAGE pkg1 STATUS up STATE running AUTO_RUN enabled NODE ftsys9 Policy_Parameters: POLICY_NAME CONFIGURED_VALUE Failover configured_node Failback manual Script_Parameters: ITEM STATUS Service up Subnet up Resource up MAX_RESTARTS 0 Node_Switching_Parameters: NODE_TYPE STATUS SWITCHING Primary up enabled Alternate up enabled PACKAGE pkg2 STATUS up STATE running RESTARTS 0 NAME ftsys9 ftsys10 AUTO_RUN disabled NAME service1
Cluster and Package Maintenance Reviewing Cluster and Package Status PRIMARY STANDBY up up 28.1 32.1 lan0 lan1 Now pkg2 is running on node ftsys9. Note that it is still disabled from switching.
Cluster and Package Maintenance Reviewing Cluster and Package Status NODE ftsys10 STATUS down STATE halted This output is seen on both ftsys9 and ftsys10. Viewing RS232 Status If you are using a serial (RS232) line as a heartbeat connection, you will see a list of configured RS232 device files in the output of the cmviewcl -v command.
Cluster and Package Maintenance Reviewing Cluster and Package Status /dev/tty0p0 NODE STATUS ftsys10 down Network_Parameters: INTERFACE STATUS PRIMARY up Serial_Heartbeat: DEVICE_FILE_NAME /dev/tty0p0 unknown STATE running PATH 28.
Cluster and Package Maintenance Reviewing Cluster and Package Status Policy_Parameters: POLICY_NAME CONFIGURED_VALUE Failover min_package_node Failback automatic Script_Parameters: ITEM STATUS Resource up Subnet up Resource up Subnet up Resource up Subnet up Resource up Subnet up NODE_NAME manx manx burmese burmese tabby tabby persian persian NAME /resource/random 192.8.15.0 /resource/random 192.8.15.0 /resource/random 192.8.15.0 /resource/random 192.8.15.
Cluster and Package Maintenance Reviewing Cluster and Package Status SYSTEM_MULTI_NODE_PACKAGES: PACKAGE VxVM-CVM-pkg STATUS up STATE running Checking Status with Cluster File System If the cluster is using the cluster file system, you can check status with the cfscluster command, as shown in the example below: # cfscluster status Node : ftsys9 Cluster Manager : up CVM state : up (MASTER) MOUNT POINT TYPE SHARED VOLUME DISK GROUP STATUS /var/opt/sgtest/ tmp/mnt/dev/vx/dsk/ vg_for_cvm1_dd5/1vol1 reg
Cluster and Package Maintenance Reviewing Cluster and Package Status #cmviewcl -v -p SG-CFS-pkg MULTI_NODE_PACKAGES PACKAGE STATUS STATE AUTO_RUN SYSTEM SG-CFS-pkg up running enabled yes NODE_NAME STATUS SWITCHING soy up enabled Script_Parameters: ITEM STATUS MAX_RESTARTS RESTARTS NAME Service up 0 0 SG-CFS-vxconfigd Service up 5 0 SG-CFS-sgcvmd Service up 5 0 SG-CFS-vxfsckd Service up 0 0 SG-CFS-cmvxd Service up 0 0 SG-CFS-cmvxpingd NODE_NAME STATUS SWITCHING tofu up enabled Script_Parameters: ITEM STATUS
Cluster and Package Maintenance Reviewing Cluster and Package Status To see which package is monitoring a disk group, use the cfsdgadm show_package command. For example, for the diskgroup logdata, enter: # cfsdgadm show_package logdata SG-CFS-DG-1 Status of CFS mount point packages To see the status of the mount point package, use the cfsmntadm display command.
Cluster and Package Maintenance Managing the Cluster and Nodes Managing the Cluster and Nodes Managing the cluster involves the following tasks: • Starting the Cluster When All Nodes are Down • Adding Previously Configured Nodes to a Running Cluster • Removing Nodes from Operation in a Running Cluster • Halting the Entire Cluster In Serviceguard A.11.
Cluster and Package Maintenance Managing the Cluster and Nodes Using Serviceguard Manager to Start the Cluster Select the cluster icon, then right-click to display the action menu. Select “Run cluster .” The progress window shows messages as the action takes place. This will include messages for starting each node and package. Click OK on the progress window when the operation is complete.
Cluster and Package Maintenance Managing the Cluster and Nodes Adding Previously Configured Nodes to a Running Cluster You can use Serviceguard Manager or the Serviceguard command line to bring a configured node up within a running cluster. Using Serviceguard Manager to Add a Configured Node to the Running Cluster Select the node icon, then right-click to display the action menu. Select “Run node .” The progress window shows messages as the action takes place.
Cluster and Package Maintenance Managing the Cluster and Nodes Using Serviceguard Manager to Remove a Node from the Cluster Select the node icon, then right-click to display the action menu. Select “Halt node ” The progress window shows messages as the action takes place. This will include moving any packages on the node to adoptive nodes, if appropriate. Click OK on the progress window when the operation is complete.
Cluster and Package Maintenance Managing the Cluster and Nodes Halting the Entire Cluster You can use Serviceguard Manager, or Serviceguard commands to halt a running cluster. Using Serviceguard Manager to Halt the Cluster Select the cluster, then right-click to display the action menu. Select “Halt cluster .” The progress window shows messages as the action takes place. This will include messages for halting each package and node.
Cluster and Package Maintenance Managing Packages and Services Managing Packages and Services Managing packages and services involves the following tasks: • Starting a Package • Halting a Package • Moving a Package (halt, then start) • Changing Package Switching Behavior In Serviceguard A.11.16 and later, these commands can be done by non-root users, according to access policies in the cluster’s configuration files.
Cluster and Package Maintenance Managing Packages and Services The progress window shows messages as the action takes place. This will include messages for starting the package. The cluster must be running in order to start a package. Using Serviceguard Commands to Start a Package Use the cmrunpkg command to run the package on a particular node, then use the cmmodpkg command to enable switching for the package.
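For example, to run pkg1 on node ftsys9 and then re-enable package switching (package and node names as used elsewhere in this chapter):

# cmrunpkg -n ftsys9 pkg1
# cmmodpkg -e pkg1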
Cluster and Package Maintenance Managing Packages and Services You cannot halt a package unless all the packages that depend on it are down. If you try, Serviceguard will send a message telling why it cannot complete the operation. If this happens, you can repeat the halt command, this time including the dependency package(s); Serviceguard will halt all the listed packages in the correct order.
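For example, to halt pkg1 on its own, and then again together with a dependent package (the names are illustrative, and the multiple-package form assumes your Serviceguard version accepts more than one package name on the command line, as the text above describes):

# cmhaltpkg pkg1
# cmhaltpkg pkg1_dep pkg1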
Cluster and Package Maintenance Managing Packages and Services Using Serviceguard Manager to Move a Failover Package The package must be running to start the operation. It is a good idea to check properties to be sure that the package’s dependencies can be met on the new node. You can select the package on the map or tree and drag it with your mouse to another cluster node. Or, select the icon of the package you wish to move, and right-click to display the action list. Select “Move package to node.
Cluster and Package Maintenance Managing Packages and Services Changing the Switching Behavior of Failover Packages There are two types of switching flags: • package switching is enabled (YES) or disabled (NO) for the package. • node switching is enabled (YES) or disabled (NO) on individual nodes. For failover packages, if package switching is NO the package cannot move to any other node. If node switching is NO, the package cannot move to that particular node.
Cluster and Package Maintenance Managing Packages and Services Changing Package Switching with Serviceguard Commands You can change package switching behavior either temporarily or permanently using Serviceguard commands. To temporarily disable switching to other nodes for a running package, use the cmmodpkg command.
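For example, with a package named pkg1 and a node named ftsys9 (illustrative names):

# cmmodpkg -d pkg1

temporarily disables switching for the package; re-enable it with:

# cmmodpkg -e pkg1

To control whether the package may run on a particular node, add the -n option:

# cmmodpkg -d -n ftsys9 pkg1
# cmmodpkg -e -n ftsys9 pkg1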
Cluster and Package Maintenance Reconfiguring a Cluster Reconfiguring a Cluster You can reconfigure a cluster either when it is halted or while it is still running. Some operations can only be done when the cluster is halted. Table 7-1 shows the required cluster state for many kinds of changes. Table 7-1 Types of Changes to Permanent Cluster Configuration Change to the Cluster Configuration Chapter 7 Required Cluster State Add a new node All cluster nodes must be running.
Cluster and Package Maintenance Reconfiguring a Cluster Table 7-1 Types of Changes to Permanent Cluster Configuration Change to the Cluster Configuration Failover Optimization to enable or disable Faster Failover product Required Cluster State Cluster must not be running. Reconfiguring a Halted Cluster You can make a permanent change in cluster configuration when the cluster is halted.
Cluster and Package Maintenance Reconfiguring a Cluster In Serviceguard A.11.17, you can change MAX_CONFIGURED_PACKAGES while the cluster is running. The default in A.11.17 is that MAX_CONFIGURED_PACKAGES is the maximum number allowed in the cluster. Using Serviceguard Commands to Change MAX_CONFIGURED_ PACKAGES In Serviceguard A.11.17, you can change MAX_CONFIGURED_PACKAGES while the cluster is running. The default in A.11.17 is that MAX_CONFIGURED_PACKAGES is the maximum number allowed in the cluster.
Cluster and Package Maintenance Reconfiguring a Cluster • The only configuration change allowed while a node is unreachable (for example, completely disconnected from the network) is to delete the unreachable node from the cluster configuration. If there are also packages that depend upon that node, the package configuration must also be modified to delete the node. This all must be done in one configuration request (cmapplyconf command).
Cluster and Package Maintenance Reconfiguring a Cluster 5. Apply the changes to the configuration and send the new binary configuration file to all cluster nodes: # cmapplyconf -C clconfig.ascii Use cmrunnode to start the new node, and, if desired, set the AUTOSTART_CMCLD parameter to 1 in the /etc/rc.config.d/cmcluster file to enable the new node to join the cluster automatically each time it reboots.
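For example, if the new node is named ftsys8 (an illustrative name):

# cmrunnode -v ftsys8

Then, on ftsys8, set the following in /etc/rc.config.d/cmcluster if you want the node to rejoin the cluster automatically at boot:

AUTOSTART_CMCLD=1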
Cluster and Package Maintenance Reconfiguring a Cluster NOTE If you want to remove a node from the cluster, issue the cmapplyconf command from another node in the same cluster. If you try to issue the command on the node you want removed, you will get an error message. 1. Use the following command to store a current copy of the existing cluster configuration in a temporary file: # cmgetconf -c cluster1 temp.ascii 2.
Cluster and Package Maintenance Reconfiguring a Cluster Using Serviceguard Manager to Change the LVM Configuration While the Cluster is Running Select the cluster on the tree or map. Choose Configuring Serviceguard from the Actions menu. (You need root permission on the cluster.) On the Logical Volumes tab highlight the node to add or remove, and click Add or Delete. Then click Apply. After Refresh, check the cluster’s Properties to confirm the change.
Cluster and Package Maintenance Reconfiguring a Cluster Changing the VxVM or CVM Storage Configuration You can add VxVM disk groups to the cluster configuration while the cluster is running. To add new CVM disk groups, the cluster must be running. Create CVM disk groups from the CVM Master Node: • For CVM 3.5, and for CVM 4.1 without CFS, edit the configuration ASCII file of the package that uses CVM storage. Add the CVM storage group in a STORAGE_GROUP statement. Then issue the cmapplyconf command.
Cluster and Package Maintenance Reconfiguring a Package Reconfiguring a Package The process of reconfiguration of a package is somewhat like the basic configuration described in Chapter 6. Refer to that chapter for details on the configuration process. The cluster can be either halted or running during package reconfiguration. The types of changes that can be made and the times when they take effect depend on whether the package is running or not.
Cluster and Package Maintenance Reconfiguring a Package • Copy the modified control script to all nodes that can run the package. (Done automatically in Serviceguard Manager as part of Apply.) • Use the Serviceguard Manager Run Cluster command, or enter cmruncl on the command line to start the cluster on all nodes or on a subset of nodes, as desired. The package will start up as nodes come online.
Cluster and Package Maintenance Reconfiguring a Package Adding a Package to a Running Cluster You can create a new package and add it to the cluster configuration while the cluster is up and while other packages are running. The number of packages you can add is subject to the value of Maximum Configured Packages in the cluster configuration file.
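For example, the following sketch adds a new package pkg2 online (directory and file names are examples; verify the cmmakepkg -s option, used here to generate a control script template, against your version):

# mkdir /etc/cmcluster/pkg2
# cmmakepkg -p /etc/cmcluster/pkg2/pkg2.config
# cmmakepkg -s /etc/cmcluster/pkg2/pkg2.sh

Edit both files and copy the control script to all nodes that can run the package, then verify and apply the configuration:

# cmcheckconf -v -P /etc/cmcluster/pkg2/pkg2.config
# cmapplyconf -v -P /etc/cmcluster/pkg2/pkg2.config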
Cluster and Package Maintenance Reconfiguring a Package Use the cmdeleteconf command to delete a package from all cluster nodes; the command can be run only when the package is down, though the cluster may be up. This removes the package information from the binary configuration file on all the nodes in the cluster. The following example halts the failover package mypkg and removes the package configuration from the cluster: # cmhaltpkg mypkg # cmdeleteconf -p mypkg The command prompts for a verification before deleting the files unless you use the -f option.
Cluster and Package Maintenance Reconfiguring a Package the file system or Serviceguard packages. Use of these other forms of mount will not create an appropriate multi-node package which means that the cluster packages are not aware of the file system changes. Resetting the Service Restart Counter The service restart counter is the number of times a package service has been automatically restarted.
Cluster and Package Maintenance Reconfiguring a Package Refer to Table 7-2 to determine whether or not the package may be running while you implement a particular kind of change. Note that for all of the following cases the cluster may be running, and also packages other than the one being reconfigured may be running. Table 7-2 Types of Changes to Packages Change to the Package 354 Required Package State Add a new package Other packages may be in any state.
Cluster and Package Maintenance Reconfiguring a Package Table 7-2 Types of Changes to Packages (Continued) Change to the Package Chapter 7 Required Package State Change halt script contents It is recommended that the package be halted. If the halt script for the package is modified while the package is running, timing may cause problems. Script timeouts Package may be either running or halted. Service timeouts Package must not be running. Service failfast Package must not be running.
Cluster and Package Maintenance Reconfiguring a Package Table 7-2 Types of Changes to Packages (Continued) Change to the Package Adding CFS Packages Required Package State To add an SG-CFS-DG-id# disk group package, the SG-CFS-pkg Cluster File System package must be up and running. To add an SG-MP-id# mount point package to a node, the SG-DG-id# disk group package must be up and running on that node.
Cluster and Package Maintenance Responding to Cluster Events Responding to Cluster Events Serviceguard does not require much ongoing system administration intervention. As long as there are no failures, your cluster will be monitored and protected. In the event of a failure, those packages that you have designated to be transferred to another node will be transferred automatically.
Cluster and Package Maintenance Removing Serviceguard from a System Removing Serviceguard from a System If you wish to remove a node from Serviceguard use, use the swremove command to delete the software. If you issue the swremove command on a server that is still a member of a cluster, however, it will cause that cluster to halt, and the cluster configuration to be deleted. To remove Serviceguard: 1. If the node is an active member of a cluster, halt the node first. 2.
Troubleshooting Your Cluster 8 Troubleshooting Your Cluster This chapter describes how to verify cluster operation, how to review cluster status, how to add and replace hardware, and how to solve some typical cluster problems.
Troubleshooting Your Cluster Testing Cluster Operation Testing Cluster Operation Once you have configured your Serviceguard cluster, you should verify that the various components of the cluster behave correctly in case of a failure. In this section, the following procedures test that the cluster responds properly in the event of a package failure, a node failure, or a LAN failure.
Troubleshooting Your Cluster Testing Cluster Operation 4. Move the package back to the primary node using Serviceguard Manager: Select the package. From the Actions menu, choose Administering Serviceguard -> Move Package. Testing the Cluster Manager To test that the cluster manager is operating correctly, perform the following steps for each node on the cluster: 1. Turn off the power to the node SPU. 2.
Troubleshooting Your Cluster Testing Cluster Operation 2. Disconnect the LAN connection from the Primary card. 3. Verify that a local switch has taken place so that the Standby card is now the Primary card. In Serviceguard Manager, check the cluster properties. Or, on the command line, use the cmviewcl -v command. 4. Reconnect the LAN to the original Primary card, and verify its status. In Serviceguard Manager, check the cluster properties. Or, on the command line, use the cmviewcl -v command.
Troubleshooting Your Cluster Monitoring Hardware Monitoring Hardware Good standard practice in handling a high availability system includes careful fault monitoring so as to prevent failures if possible or at least to react to them swiftly when they occur.
Troubleshooting Your Cluster Monitoring Hardware action in case of a problem. For example, you could configure a disk monitor to report when a mirror was lost from a mirrored volume group being used in the cluster. Refer to the manual Using HA Monitors for additional information. Using EMS (Event Monitoring Service) Hardware Monitors A set of hardware monitors is available for monitoring and reporting on memory, CPU, and many other system values.
Troubleshooting Your Cluster Monitoring Hardware HP ISEE also reports error conditions directly to an HP Response Center, alerting support personnel to the potential problem. HP ISEE is available through various support contracts. For more information, contact your HP representative.
Troubleshooting Your Cluster Replacing Disks Replacing Disks The procedure for replacing a faulty disk mechanism depends on the type of disk configuration you are using. Separate descriptions are provided for replacing an array mechanism and a disk in a high availability enclosure. For more information, see When Good Disks Go Bad (5991-1236), posted at http://docs.hp.
Troubleshooting Your Cluster Replacing Disks 5. On the node from which you issued the lvreduce command, issue the following command to restore the volume group configuration data to the newly inserted disk: # vgcfgrestore -n /dev/vg_sg01 /dev/dsk/c2t3d0 6. Issue the following command to extend the logical volume to the newly inserted disk: # lvextend -m 1 /dev/vg_sg01 /dev/dsk/c2t3d0 7. Finally, use the lvsync command for each logical volume that has extents on the failed physical volume.
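For example, for one logical volume in the volume group used above (lvol1 is illustrative; repeat for each logical volume that had extents on the replaced disk):

# lvsync /dev/vg_sg01/lvol1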
Troubleshooting Your Cluster Replacement of I/O Cards Replacement of I/O Cards Replacement of SCSI host bus adapters After a SCSI Host Bus Adapter (HBA) card failure, you can replace the card using the following steps. Normally disconnecting any portion of the SCSI bus will leave the SCSI bus in an unterminated state, which will cause I/O errors for other nodes connected to that SCSI bus, so the cluster would need to be halted before disconnecting any portion of the SCSI bus.
Troubleshooting Your Cluster Replacement of LAN or Fibre Channel Cards Replacement of LAN or Fibre Channel Cards If you have a LAN or fibre channel card failure, which requires the LAN card to be replaced, you can replace it on-line or off-line depending on the type of hardware and operating system you are running. It is not necessary to bring the cluster down to do this. Off-Line Replacement The following steps show how to replace an I/O card off-line. These steps apply to both HP-UX 11.0 and 11i: 1.
Troubleshooting Your Cluster Replacement of LAN or Fibre Channel Cards NOTE After replacing a Fibre Channel I/O card, it may be necessary to reconfigure the SAN to use the World Wide Name (WWN) of the new Fibre Channel card if Fabric Zoning or other SAN security requiring WWN is used.
Troubleshooting Your Cluster Replacing a Failed Quorum Server System Replacing a Failed Quorum Server System When a quorum server fails or becomes unavailable to the clusters it is providing quorum services for, this will not cause a failure on any cluster. However, the loss of the quorum server does increase the vulnerability of the clusters in case there is an additional failure. Use the following procedure to replace a defective quorum server system.
Troubleshooting Your Cluster Replacing a Failed Quorum Server System The command will output an error message if the specified nodes cannot communicate with the quorum server.
Troubleshooting Your Cluster Troubleshooting Approaches Troubleshooting Approaches The following sections offer a few suggestions for troubleshooting by reviewing the state of the running system and by examining cluster status data, log files, and configuration files.
Troubleshooting Your Cluster Troubleshooting Approaches IPv6:
Name   Mtu    Address/Prefix   Ipkts   Opkts
lan1*  1500   none             0       0
lo0    4136   ::1/128          10690   10690
Reviewing the System Log File Messages from the Cluster Manager and Package Manager are written to the system log file. The default location of the log file is /var/adm/syslog/syslog.log. Also, package-related messages are logged into the package log file. The package log file is located in the package directory, by default.
Troubleshooting Your Cluster Troubleshooting Approaches Dec 14 14:34:44 star04 CM-CMD[2054]: cmrunpkg -v pkg5 Dec 14 14:34:44 star04 cmcld[2048]: Request from node star04 to start package pkg5 on node star04. Dec 14 14:34:44 star04 cmcld[2048]: Executing '/etc/cmcluster/pkg5/pkg5_run start' for package pkg5. Dec 14 14:34:45 star04 LVM[2066]: vgchange -a n /dev/vg02 Dec 14 14:34:45 star04 cmcld[2048]: Package pkg5 run script exited with NO_RESTART.
Troubleshooting Your Cluster Troubleshooting Approaches Reviewing Serviceguard Manager Log Files Serviceguard Manager maintains a log file of user activity. This file is stored in the HP-UX directory /var/opt/sgmgr or the Windows directory X:\Program Files\Hewlett-Packard\Serviceguard Manager\log (where X refers to the drive on which you have installed Serviceguard Manager). You can review these messages using the cmreadlog command, as in the following HP-UX example: # cmreadlog /var/opt/sgmgr/929917sgmgr.
Troubleshooting Your Cluster Troubleshooting Approaches Information about the starting and halting of each package is found in the package’s control script log. This log provides the history of the operation of the package control script. By default, it is found at /etc/cmcluster/package_name/control_script.log; but another location may have been specified in the package configuration file’s SCRIPT_LOG_FILE parameter. This log documents all package run and halt activities.
Troubleshooting Your Cluster Troubleshooting Approaches node-specific items for all nodes in the cluster. cmscancl actually runs several different HP-UX commands on all nodes and gathers the output into a report on the node where you run the command. To run the cmscancl command, the root user on the cluster nodes must have the .rhosts file configured to allow the command to complete successfully. Without that, the command can only collect information on the local node, rather than all cluster nodes.
Troubleshooting Your Cluster Troubleshooting Approaches • lanscan can also be used to examine the LAN configuration. This command lists the MAC addresses and status of all LAN interface cards on the node. • arp -a can be used to check the arp tables. • landiag is useful to display, diagnose, and reset LAN card information. • linkloop verifies the communication between LAN cards at MAC address levels.
Troubleshooting Your Cluster Solving Problems Solving Problems Problems with Serviceguard may be of several types. The following is a list of common categories of problem: • Serviceguard Command Hangs. • Cluster Re-formations. • System Administration Errors. • Package Control Script Hangs. • Problems with VxVM Disk Groups. • Package Movement Errors. • Node and Network Failures. • Quorum Server Problems.
Troubleshooting Your Cluster Solving Problems Name: ftsys9.cup.hp.com Address: 15.13.172.229 If the output of this command does not include the correct IP address of the node, then check your name resolution services further. Cluster Re-formations Cluster re-formations may occur from time to time due to current cluster conditions. Some of the causes are as follows: • local switch on an Ethernet LAN if the switch takes longer than the cluster NODE_TIMEOUT value.
Troubleshooting Your Cluster Solving Problems You can use the following commands to check the status of your disks: • bdf - to see if your package's volume group is mounted. • vgdisplay -v - to see if all volumes are present. • lvdisplay -v - to see if the mirrors are synchronized. • strings /etc/lvmtab - to ensure that the configuration is correct. • ioscan -fnC disk - to see physical disks. • diskinfo -v /dev/rdsk/cxtydz - to display information about a disk.
Troubleshooting Your Cluster Solving Problems NOTE Any form of the mount command (for example, mount -o cluster, dbed_chkptmount, or sfrac_chkptmount) other than cfsmount or cfsumount in a HP Serviceguard Storage Management Suite environment with CFS should be done with caution. These non-cfs commands could cause conflicts with subsequent command operations on the file system or Serviceguard packages.
Troubleshooting Your Cluster Solving Problems Next, deactivate the package volume groups. These are specified by the VG[] array entries in the package control script. # vgchange -a n 4. Finally, re-enable the package for switching.
Troubleshooting Your Cluster Solving Problems 2. b - vxfen 3. v w - cvm 4. f - cfs Any form of the mount command (for example, mount -o cluster, dbed_chkptmount, or sfrac_chkptmount) other than cfsmount or cfsumount in a HP Serviceguard Storage Management Suite environment with CFS should be done with caution. These non-cfs commands could cause conflicts with subsequent command operations on the file system or Serviceguard packages.
Troubleshooting Your Cluster Solving Problems execute the following command: vxdg deport dg_01 Once dg_01 has been deported from ftsys9, this package may be restarted via either cmmodpkg(1M) or cmrunpkg(1M). In the event that ftsys9 is either powered off or unable to boot, then dg_01 must be force imported. ******************* WARNING************************** The use of force import can lead to data corruption if ftsys9 is still running and has dg_01 imported.
Troubleshooting Your Cluster Solving Problems CAUTION This force import procedure should only be used when you are certain the disk is not currently being accessed by another node. If you force import a disk that is already being accessed on another node, data corruption can result. Package Movement Errors These errors are similar to the system administration errors except they are caused specifically by errors in the package control script.
Troubleshooting Your Cluster Solving Problems • netstat -in - to display LAN status and check to see if the package IP is stacked on the LAN card. • lanscan - to see if the LAN is on the primary interface or has switched to the standby interface. • arp -a - to check the arp tables. • lanadmin - to display, test, and reset the LAN cards. Since your cluster is unique, there are no cookbook solutions to all possible problems.
Troubleshooting Your Cluster Solving Problems Attempt to get lock /sg/cluser1 unsuccessful. Reason: request_timedout Messages The coordinator node in Serviceguard sometimes sends a request to the quorum server to set the lock state. (This is different from a request to obtain the lock in tie-breaking.
Serviceguard Commands A Serviceguard Commands The following is an alphabetical list of commands used for Serviceguard cluster configuration and maintenance. Man pages for these commands are available on your system after installation. Table A-1 Serviceguard Commands Command cfscluster Description • Configure or unconfigure SG-CFS-pkg, the system multi-node package used for clusters that use the VERITAS Cluster File System. • Start or stop the CVM package for the CFS.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cfsmntadm Description Add, delete, modify, or set policy on mounted filesystems in a VERITAS Cluster File System (CFS) cluster. Requires selected HP Serviceguard Storage Management Suite Bundle. cfsmount cfsumount Mount or unmount a VERITAS Cluster File System. The cmgetpkgenv command, below, displays status. Requires selected HP Serviceguard Storage Management Suite Bundle.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmapplyconf Description Verify and apply Serviceguard cluster configuration and package configuration files. cmapplyconf verifies the cluster configuration and package configuration specified in the cluster_ascii_file and the associated pkg_ascii_file(s), creates or updates the binary configuration file, called cmclconfig, and distributes it to all nodes.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmapplyconf (continued) Description It is recommended that the user run the cmgetconf command to get either the cluster ASCII configuration file or package ASCII configuration file whenever changes to the existing configuration are required. Note that cmapplyconf will verify and distribute cluster configuration or package files. It will not cause the cluster daemon to start or removed from the cluster configuration.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmdeleteconf Description Delete either the cluster or the package configuration. cmdeleteconf deletes either the entire cluster configuration, including all its packages, or only the specified package configuration. If neither cluster_name nor package_name is specified, cmdeleteconf will delete the local cluster’s configuration and all its packages.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmhaltcl Description Halt a high availability cluster. cmhaltcl causes all nodes in a configured cluster to stop their cluster daemons, optionally halting all packages or applications in the process. This command will halt all the daemons on all currently running systems. If the user only wants to shutdown a subset of daemons, the cmhaltnode command should be used instead. cmhaltnode Halt a node in a high availability cluster.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmhaltserv Description Halt a service from the high availability package halt script. This is not a command line executable command; it runs only from within the package control script. cmhaltserv is used in the high availability package halt script to halt a service. If any part of the package is marked down, the package halt script is executed as part of the recovery process.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmmodnet Description Add or remove an address from a high availability cluster. cmmodnet is used in the high availability package control scripts to add or remove an IP_address from the current network interface running the given subnet_name. Extreme caution should be exercised when executing this command outside the context of the package control script.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmquerycl Description Query cluster or node configuration information. cmquerycl searches all specified nodes for cluster configuration and Logical Volume Manager (LVM) information. Cluster configuration information includes network information such as LAN interface, IP addresses, bridged networks and possible heartbeat networks. LVM information includes volume group (VG) interconnection and file system mount point information.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmrunnode Description Run a node in a high availability cluster. cmrunnode causes a node to start its cluster daemon to join the existing cluster. Starting a node will not cause any active packages to be moved to the new node. However, if a package is DOWN, has its switching enabled, and is able to run on the new node, that package will automatically run there. cmrunpkg Run a high availability package.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmrunserv Description Run a service from the high availability package run script. This is not a command line executable command, it runs only from within the package control script. cmrunserv is used in the high availability package run script to run a service. If the service process dies, cmrunserv updates the status of the service to down.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmscancl Description Gather system configuration information from nodes with Serviceguard installed. cmscancl is a configuration report and diagnostic tool which gathers system software and hardware configuration information from a list of nodes, or from all the nodes in a cluster.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmstartres Description This command is run by packge control scripts, and not by users! Starts resource monitoring on the local node for an EMS resource that is configured in a Serviceguard package. cmstartres starts resource monitoring for an EMS resource on the local node. This resource must be configured in the specified package_name.
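As a brief illustration of how a few of these commands are typically invoked from the command line (the cluster, node, package, and file names below are examples only; see the man pages for the complete option lists):

# cmquerycl -v -C /etc/cmcluster/clust1.config -n ftsys9 -n ftsys10
# cmrunnode ftsys9
# cmdeleteconf -p pkg1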
Enterprise Cluster Master Toolkit B Enterprise Cluster Master Toolkit The Enterprise Cluster Master Toolkit (ECMT) provides a group of example scripts and package configuration files for creating Serviceguard packages for several major database and internet software products. Each toolkit contains a README file that explains how to customize the package for your needs. The ECMT can be installed on HP-UX 11i v1 (HP Product Number B5139EA) or 11i v2 (HP Product Number T1909BA).
Designing Highly Available Cluster Applications C Designing Highly Available Cluster Applications This appendix describes how to create or port applications for high availability, with emphasis on the following topics: • Automating Application Operation • Controlling the Speed of Application Failover • Designing Applications to Run on Multiple Systems • Restoring Client Connections • Handling Application Failures • Minimizing Planned Downtime Designing for high availability means reducing the
Designing Highly Available Cluster Applications Automating Application Operation Automating Application Operation Can the application be started and stopped automatically or does it require operator intervention? This section describes how to automate application operations to avoid the need for user intervention. One of the first rules of high availability is to avoid manual intervention.
Designing Highly Available Cluster Applications Automating Application Operation Define Application Startup and Shutdown Applications must be restartable without manual intervention. If the application requires a switch to be flipped on a piece of hardware, then automated restart is impossible. Procedures for application startup, shutdown and monitoring must be created so that the HA software can perform these functions automatically.
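As an illustration of what this looks like in practice, the package control script provides customer_defined_run_cmds and customer_defined_halt_cmds functions in which the application's own start and stop commands can be placed so that no operator action is needed. This is a minimal sketch; the /opt/myapp paths are hypothetical placeholders for your application's own scripts:

function customer_defined_run_cmds
{
    # Start the application with no operator intervention
    /opt/myapp/bin/start_myapp
}

function customer_defined_halt_cmds
{
    # Stop the application cleanly so it can be restarted on another node
    /opt/myapp/bin/stop_myapp
}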
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Controlling the Speed of Application Failover What steps can be taken to ensure the fastest failover? If a failure does occur causing the application to be moved (failed over) to another node, there are many things the application can do to reduce the amount of time it takes to get the application back up and running.
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Use Raw Volumes If your application stores data on disk, use raw volumes rather than filesystems. Raw volumes do not require an fsck of the filesystem, thus eliminating one of the potentially lengthy steps during a failover. Evaluate the Use of JFS If a file system must be used, a JFS offers significantly faster file system recovery as compared to an HFS. However, performance of the JFS may vary with the application.
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Keep Logs Small Some databases permit logs to be buffered in memory to increase online performance. Of course, when a failure occurs, any in-flight transaction will be lost. However, minimizing the size of this in-memory log will reduce the amount of completed transaction data that would be lost in case of failure.
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Another example is an application where a clerk is entering data about a new employee. Suppose this application requires that employee numbers be unique, and that after the name and number of the new employee is entered, a failure occurs.
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Design for Multiple Servers If you use multiple active servers, multiple service points can provide relatively transparent service to a client. However, this capability requires that the client be smart enough to have knowledge about the multiple servers and the priority for addressing them. It also requires access to the data of the failed server or replicated data.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems Designing Applications to Run on Multiple Systems If an application can be failed over to a backup node, how will it work on that different system? The previous sections discussed methods to ensure that an application can be automatically restarted. This section discusses some ways to ensure the application can run on multiple systems.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems Each application or package should be given a unique name as well as a relocatable IP address. Following this rule separates the application from the system on which it runs, thus removing the need for user knowledge of which system the application runs on. It also makes it easier to move the application among different systems in a cluster for load balancing or other reasons.
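In the package control script, the relocatable address and its subnet are what tie the application's identity to the package rather than to a node. A minimal sketch (the address and subnet values are illustrative only):

IP[0]="192.10.25.12"
SUBNET[0]="192.10.25.0"

When the package starts on any node, Serviceguard adds this relocatable address to an interface on the specified subnet, so clients always reach the application by the same name and address regardless of which node is currently running it.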
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems Avoid Using SPU IDs or MAC Addresses Design the application so that it does not rely on the SPU ID or MAC (link-level) addresses. The SPU ID is a unique hardware ID contained in non-volatile memory, which cannot be changed. A MAC address (also known as a LANIC id) is a link-specific address associated with the LAN hardware.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems Applications should not reference official hostnames or IP addresses. The official hostname and corresponding IP address for the hostname refer to the primary LAN card and the stationary IP address for that card.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems develop alternate means of verifying where they are running. For example, an application might check a list of hostnames that have been provided in a configuration file. Bind to a Fixed Port When binding a socket, a port address can be specified or one can be assigned dynamically.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems For TCP stream sockets, the TCP level of the protocol stack resolves this problem for the client since it is a connection-based protocol. On the client, TCP ignores the stationary IP address and continues to use the previously bound relocatable IP address originally used by the client. With UDP datagram sockets, however, there is a problem.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems applications must move together. If the applications’ data stores are in separate volume groups, they can switch to different nodes in the event of a failover. The application data should be set up on different disk drives and if applicable, different mount points. The application should be designed to allow for different disks and separate mount points.
Designing Highly Available Cluster Applications Restoring Client Connections Restoring Client Connections How does a client reconnect to the server after a failure? It is important to write client applications to specifically differentiate between the loss of a connection to the server and other application-oriented errors that might be returned. The application should take special action in case of connection loss.
Designing Highly Available Cluster Applications Restoring Client Connections the retry to the current server should continue for the amount of time it takes to restart the server locally. This will keep the client from having to switch to the second server in the event of an application failure. • Use a transaction processing monitor or message queueing software to increase robustness.
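A rough sketch of the retry logic described above is shown below; shell is used here only for illustration, a real client would implement the same loop in its own language, and the check command, server name, and timings are hypothetical:

RETRY_INTERVAL=5        # seconds between reconnection attempts
LOCAL_RESTART_TIME=120  # approximate time for the server to restart locally
elapsed=0
until /opt/myapp/bin/check_server appsrv1      # hypothetical connectivity test
do
    sleep $RETRY_INTERVAL
    elapsed=$((elapsed + RETRY_INTERVAL))
    if [ $elapsed -ge $LOCAL_RESTART_TIME ]
    then
        echo "Primary server not back yet; switching to alternate server"
        break
    fi
done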
Designing Highly Available Cluster Applications Handling Application Failures Handling Application Failures What happens if part or all of an application fails? All of the preceding sections have assumed the failure in question was not a failure of the application, but of another component of the cluster. This section deals specifically with application problems.
Designing Highly Available Cluster Applications Handling Application Failures ensure that the application is behaving correctly. If the application fails and it is not detected automatically, it might take hours for a user to determine the cause of the downtime and recover from it.
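One common way to detect application failure automatically is to configure a monitor script as a Serviceguard service, so that the package manager can restart the service or fail the package over when the monitor exits. A sketch of the relevant control-script entries follows; the service name and script path are hypothetical:

SERVICE_NAME[0]="myapp_monitor"
SERVICE_CMD[0]="/etc/cmcluster/pkg1/monitor_myapp.sh"
SERVICE_RESTART[0]="-r 2"

Here the monitor script would loop, checking the health of the application processes, and exit when a problem is found; the SERVICE_RESTART value allows a limited number of local restarts before the package fails over.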
Designing Highly Available Cluster Applications Minimizing Planned Downtime Minimizing Planned Downtime Planned downtime (as opposed to unplanned downtime) is scheduled; examples include backups, system upgrades to new operating system revisions, or hardware replacements. For planned downtime, application designers should consider: • Reducing the time needed for application upgrades/patches.
Designing Highly Available Cluster Applications Minimizing Planned Downtime Provide for Rolling Upgrades Provide for a “rolling upgrade” in a client/server environment. For a system with many components, the typical scenario is to bring down the entire system, upgrade every node to the new version of the software, and then restart the application on all the affected nodes. For large systems, this could result in a long downtime. An alternative is to provide for a rolling upgrade.
Designing Highly Available Cluster Applications Minimizing Planned Downtime Providing Online Application Reconfiguration Most applications have some sort of configuration information that is read when the application is started. If the application must be halted and a new configuration file read in order to make a configuration change, downtime is incurred. To avoid this downtime, use configuration tools that interact with the application and make dynamic changes online.
Integrating HA Applications with Serviceguard D Integrating HA Applications with Serviceguard The following is a summary of the steps you should follow to integrate an application into the Serviceguard environment: 1. Read the rest of this book, including the chapters on cluster and package configuration, and the Appendix “Designing Highly Available Cluster Applications.” 2.
Integrating HA Applications with Serviceguard • Can the application be installed cluster-wide? • Does the application work with a cluster-wide file name space? • Will the application run correctly with the data (file system) available on all nodes in the cluster? This includes being available on cluster nodes where the application is not currently running.
Integrating HA Applications with Serviceguard Checklist for Integrating HA Applications Checklist for Integrating HA Applications This section contains a checklist for integrating HA applications in both single and multiple systems. Defining Baseline Application Behavior on a Single System Define a baseline behavior for the application on a standalone system: 1. Install the application, database, and other required resources on one of the systems.
Integrating HA Applications with Serviceguard Checklist for Integrating HA Applications c. Install the appropriate executables. d. With the application not running on the first system, try to bring it up on the second system. You might use the script you created in the step above. Is there anything different that you must do? Does it run? e. Repeat this process until you can get the application to run on the second system. 2. Configure the Serviceguard cluster: a. Create the cluster configuration. b.
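On the command line, the cluster configuration step generally corresponds to a sequence along the following lines. This is a hedged sketch; the configuration file, node, and package names are examples only:

# cmquerycl -v -C /etc/cmcluster/cluster1.config -n node1 -n node2
   (edit the generated file, then verify and apply it)
# cmcheckconf -C /etc/cmcluster/cluster1.config
# cmapplyconf -C /etc/cmcluster/cluster1.config
# cmruncl

The package configuration and control script are then created and distributed, for example with cmmakepkg followed by cmcheckconf -P and cmapplyconf -P.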
Integrating HA Applications with Serviceguard Checklist for Integrating HA Applications Testing the Cluster 1. Test the cluster: • Have clients connect. • Provide a normal system load. • Halt the package on the first node and move it to the second node: # cmhaltpkg pkg1 # cmrunpkg -n node2 pkg1 # cmmodpkg -e pkg1 • Move it back. # cmhaltpkg pkg1 # cmrunpkg -n node1 pkg1 # cmmodpkg -e pkg1 • Fail one of the systems. For example, turn off the power on node 1.
Rolling Software Upgrades E Rolling Software Upgrades You can upgrade the HP-UX operating system and the Serviceguard software one node at a time without bringing down your clusters. This process can also be used any time one system needs to be taken offline for hardware maintenance or patch installations. Until the process of upgrade is complete on all nodes, you cannot change the cluster configuration files, and you will not be able to use any of the features of the new Serviceguard release.
Rolling Software Upgrades Steps for Rolling Upgrades Steps for Rolling Upgrades Use the following steps: 1. Halt the node you wish to upgrade. This will cause the node's packages to start up on an adoptive node. In Serviceguard Manager, select the node; from the Actions menu, choose Administering Serviceguard, Halt node. Or, on the Serviceguard command line, issue the cmhaltnode command. 2. Edit the /etc/rc.config.d/cmcluster file to include the following line: AUTOSTART_CMCLD=0 3.
Rolling Software Upgrades Steps for Rolling Upgrades Keeping Kernels Consistent If you change kernel parameters as a part of doing a rolling upgrade, be sure to change the parameters similarly on all nodes that can run the same packages in a failover scenario. Migrating cmclnodelist entries to A.11.16 or A.11.17 The cmclnodelist file is deleted when you upgrade to Serviceguard Version A.11.16 or A.11.17. The information in it is migrated to the new Access Control Policy form.
Rolling Software Upgrades Example of Rolling Upgrade Example of Rolling Upgrade While you are performing a rolling upgrade, warning messages may appear while the node is determining what version of software is running. This is a normal occurrence and not a cause for concern. The following example shows a simple rolling upgrade on two nodes running one package each, as shown in Figure E-1. (This and the following figures show the starting point of the upgrade as Serviceguard 10.10 and HP-UX 10.
Rolling Software Upgrades Example of Rolling Upgrade Figure E-2 Running Cluster with Packages Moved to Node 2 Step 2. Upgrade node 1 to the next operating system release (in this example, HP-UX 11.00), and install the next version of Serviceguard (11.13), as shown in Figure E-3. Figure E-3 Node 1 Upgraded to HP-UX 11.00 Step 3. When upgrading is finished, enter the following command on node 1 to restart the cluster on node 1.
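The command referred to in step 3 is presumably cmrunnode, for example:

# cmrunnode node1

which starts the cluster daemon on node 1 so that the node rejoins the cluster, as shown in Figure E-4.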
Rolling Software Upgrades Example of Rolling Upgrade Figure E-4 Node 1 Rejoining the Cluster Step 4. Repeat the process on node 2. Halt the node, as follows: # cmhaltnode -f node2 This causes both packages to move to node 1. Then upgrade node 2 to HP-UX 11.00 and Serviceguard 11.13. Figure E-5 Running Cluster with Packages Moved to Node 1 Step 5. Move PKG2 back to its original node.
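The command sequence for step 5 presumably mirrors the sequence used earlier when moving packages, for example:

# cmhaltpkg pkg2
# cmrunpkg -n node2 pkg2
# cmmodpkg -e pkg2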
Rolling Software Upgrades Example of Rolling Upgrade The cmmodpkg command re-enables switching of the package, which is disabled by the cmhaltpkg command. The final running cluster is shown in Figure E-6.
Rolling Software Upgrades Limitations of Rolling Upgrades Limitations of Rolling Upgrades The following limitations apply to rolling upgrades: • During a rolling upgrade, you should issue Serviceguard commands (other than cmrunnode and cmhaltnode) only on a node containing the latest revision of the software. Performing tasks on a node containing an earlier revision of the software will not work or will cause inconsistent results.
Blank Planning Worksheets F Blank Planning Worksheets This appendix reprints blank versions of the planning worksheets described in the chapter “Planning and Documenting an HA Cluster.” You can duplicate any of these worksheets that you find useful and fill them in as a part of the planning process.
Blank Planning Worksheets Worksheet for Hardware Planning Worksheet for Hardware Planning HARDWARE WORKSHEET Page ___ of ____ =============================================================================== Node Information: Host Name _____________________ Series No _____________________ Memory Capacity ____________________ Number of I/O Slots ________________ =============================================================================== LAN Information: Name of Subnet _________ Name of IP Interface
Blank Planning Worksheets Worksheet for Hardware Planning Attach a printout of the output from ioscan -f and lssf /dev/*dsk/*s2 after installing disk hardware and rebooting the system. Mark this printout to indicate which physical volume group each disk belongs to.
Blank Planning Worksheets Power Supply Worksheet Power Supply Worksheet POWER SUPPLY WORKSHEET Page ___ of ____ =============================================================================== SPU Power: Host Name _____________________ Power Supply _______________________ Host Name _____________________ Power Supply _______________________ =============================================================================== Disk Power: Disk Unit __________________________ Power Supply _____________________
Blank Planning Worksheets Quorum Server Worksheet Quorum Server Worksheet Quorum Server Data: ============================================================================== QS Hostname: _________________IP Address: ______________________ ============================================================================== Quorum Services are Provided for: Cluster Name: ___________________________________________________________ Host Names ____________________________________________ Host Names _________________
Blank Planning Worksheets LVM Volume Group and Physical Volume Worksheet LVM Volume Group and Physical Volume Worksheet PHYSICAL VOLUME WORKSHEET Page ___ of ____ =============================================================================== Volume Group Name: ______________________________________________________ PV Link 1 PV Link2 Physical Volume Name:_____________________________________________________ Physical Volume Name:_____________________________________________________ Physical Volume Name:__
Blank Planning Worksheets LVM Volume Group and Physical Volume Worksheet Physical Volume Name: _____________________________________________________ Physical Volume Name: _____________________________________________________ Physical Volume Name: _____________________________________________________ Appendix F 449
Blank Planning Worksheets VxVM Disk Group and Disk Worksheet VxVM Disk Group and Disk Worksheet DISK GROUP WORKSHEET Page ___ of ____ =========================================================================== Disk Group Name: __________________________________________________________ Physical Volume Name:______________________________________________________ Physical Volume Name:______________________________________________________ Physical Volume Name:____________________________________________________
Blank Planning Worksheets Cluster Configuration Worksheet Cluster Configuration Worksheet =============================================================================== Name and Nodes: =============================================================================== Cluster Name: __________________________ RAC Version: _______________ Node Names: _________________________________________________________ Volume Groups (for packages):________________________________________ ===================================
Blank Planning Worksheets Package Configuration Worksheet Package Configuration Worksheet ============================================================================= Package Configuration File Data: ============================================================================= Package Name: ____________________________ Failover Policy:___________________________ Failback Policy: ____________________________ Primary Node: ______________________________ First Failover Node:_________________________ Addition
Blank Planning Worksheets Package Control Script Worksheet Package Control Script Worksheet LVM Volume Groups: VG[0]_______________VG[1]________________VG[2]________________ VGCHANGE: ______________________________________________ CVM Disk Groups: CVM_DG[0]______________CVM_DG[1]_____________CVM_DG[2]_______________ CVM_ACTIVATION_CMD: ______________________________________________ VxVM Disk Groups: VXVM_DG[0]_____________VXVM_DG[1]____________VXVM_DG[2]_____________ =======================================
Blank Planning Worksheets Package Control Script Worksheet Deferred Resources: Deferred Resource Name __________________
Migrating from LVM to VxVM Data Storage G Migrating from LVM to VxVM Data Storage This appendix describes how to migrate LVM volume groups to VxVM disk groups for use with the VERITAS Volume Manager (VxVM) or with the Cluster Volume Manager (CVM). Topics are as follows: • Loading VxVM • Migrating Volume Groups • Customizing Packages for VxVM • Customizing Packages for CVM 3.5 and 4.1 • Removing LVM Volume Groups
Migrating from LVM to VxVM Data Storage Loading VxVM Loading VxVM Before you can begin migrating data, you must install the VERITAS Volume Manager software and all required VxVM licenses on all cluster nodes. This step requires each system to be rebooted, so you must remove the node from the cluster before the installation and restart the node after installation. This can be done as a part of a rolling upgrade procedure, described in Appendix E.
Migrating from LVM to VxVM Data Storage Migrating Volume Groups Migrating Volume Groups The following procedure shows how to migrate individual volume groups for packages that are configured to run on a given node. It is recommended to convert all the volume groups for a package at the same time. It is assumed that VxVM software and an appropriate level of HP-UX and Serviceguard have been installed on the node, and that the node has rebooted and rejoined the cluster.
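In outline, the migration typically involves halting the package, backing up the data, replacing the LVM volume group with a VxVM disk group, and restoring the data. The following is a hedged sketch of such a sequence; the package, volume group, disk group, volume, and device names are examples only:

# cmhaltpkg pkg1                                 (halt the package that uses the volume group)
# vgchange -a y /dev/vg01                        (activate the volume group and back up its data)
# vgchange -a n /dev/vg01                        (deactivate it when the backup is complete)
# /usr/lib/vxvm/bin/vxdisksetup -i c4t5d0        (initialize a disk for VxVM use)
# vxdg init dg01 dg01_01=c4t5d0                  (create the VxVM disk group)
# vxassist -g dg01 make lvol101 1024m            (create a volume, then a file system, and restore the data)

See the VERITAS Volume Manager documentation for the exact commands appropriate to your storage layout.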
Migrating from LVM to VxVM Data Storage Migrating Volume Groups As an alternative to defining the VxVM disk groups on a new set of disks, it is possible to convert existing LVM volume groups into VxVM disk groups in place using the vxvmconvert(1M) utility. This utility is described along with its limitations and cautions in the VERITAS Volume Manager Release Notes, available from http://www.docs.hp.com. If you use the vxvmconvert(1M) utility, skip the next step and go ahead to the following section.
Migrating from LVM to VxVM Data Storage Customizing Packages for VxVM Customizing Packages for VxVM After creating the VxVM disk group, you need to customize the Serviceguard package that will access the storage. Use the following procedure for disk groups that will be used with the VERITAS Volume Manager (VxVM). If you are using the Cluster Volume Manager (CVM), skip ahead to the next section. 1. Rename the old package control script as follows: # mv Package.ctl Package.ctl.bak 2.
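A new control script would typically then be generated and edited to reference the VxVM storage. A hedged sketch (the disk group names are illustrative):

# cmmakepkg -s Package.ctl

In the new script, list the VxVM disk groups in the VXVM_DG array instead of LVM volume groups in VG, for example:

VXVM_DG[0]="dg01"
VXVM_DG[1]="dg02"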
Migrating from LVM to VxVM Data Storage Customizing Packages for VxVM

LV[0]="/dev/vx/dsk/dg01/lvol101"
LV[1]="/dev/vx/dsk/dg01/lvol102"
LV[2]="/dev/vx/dsk/dg02/lvol201"
LV[3]="/dev/vx/dsk/dg02/lvol202"

FS[0]="/mnt_dg0101"
FS[1]="/mnt_dg0102"
FS[2]="/mnt_dg0201"
FS[3]="/mnt_dg0202"

FS_MOUNT_OPT[0]="-o ro"
FS_MOUNT_OPT[1]="-o rw"
FS_MOUNT_OPT[2]="-o ro"
FS_MOUNT_OPT[3]="-o rw"

4.
Migrating from LVM to VxVM Data Storage Customizing Packages for CVM 3.5 and 4.1 Customizing Packages for CVM 3.5 and 4.1 After creating the CVM disk group, you need to customize the Serviceguard package that will access the storage. Use the following procedure if you will be using the disk groups with the Cluster Volume Manager (CVM). If you are using the VERITAS Volume Manager (VxVM), use the procedure in the previous section. 1. Rename the old package control script as follows: # mv Package.ctl Package.
Migrating from LVM to VxVM Data Storage Customizing Packages for CVM 3.5 and 4.1

LV[0]="/dev/vx/dsk/dg01/lvol101"
LV[1]="/dev/vx/dsk/dg01/lvol102"
LV[2]="/dev/vx/dsk/dg02/lvol201"
LV[3]="/dev/vx/dsk/dg02/lvol202"

FS[0]="/mnt_dg0101"
FS[1]="/mnt_dg0102"
FS[2]="/mnt_dg0201"
FS[3]="/mnt_dg0202"

FS_MOUNT_OPT[0]="-o ro"
FS_MOUNT_OPT[1]="-o rw"
FS_MOUNT_OPT[2]="-o ro"
FS_MOUNT_OPT[3]="-o rw"

4.
Migrating from LVM to VxVM Data Storage Customizing Packages for CVM 3.5 and 4.1 11. When CVM starts up, it selects a master node, and this is the node from which you must issue the disk group configuration commands. To determine the master node, issue the following command from each node in the cluster: # vxdctl -c mode One node will identify itself as the master. 12.
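From the master node, the shared disk group configuration would typically then continue with commands such as the following (disk group, volume, and device names are examples only):

# vxdg -s init dg01 c4t5d0
# vxassist -g dg01 make lvol101 1024m

The -s option creates the disk group as a shared (cluster-wide) disk group, which is what CVM requires.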
Migrating from LVM to VxVM Data Storage Removing LVM Volume Groups Removing LVM Volume Groups After testing the new VxVM disk groups, remove any LVM volume groups that are no longer wanted from the system using the standard LVM commands lvremove, pvremove, and vgremove. At a convenient time, you should also edit the cluster ASCII configuration file to remove the VOLUME_GROUP statements that refer to the LVM volume groups that are no longer used in the cluster.
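For example, the cleanup of an unused volume group might look like this; the volume group, logical volume, and device names are illustrative, and if the volume group spans several disks, vgreduce may also be needed:

# lvremove /dev/vg01/lvol1          (repeat for each logical volume in the group)
# vgremove /dev/vg01
# pvremove /dev/rdsk/c2t3d0         (clears the LVM record from each physical disk)

After this, remove the corresponding VOLUME_GROUP line from the cluster ASCII configuration file and re-apply the configuration with cmapplyconf.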
IPv6 Network Support H IPv6 Network Support This appendix describes some of the characteristics of IPv6 network addresses.
IPv6 Network Support IPv6 Address Types IPv6 Address Types Several types of IPv6 addressing schemes are specified in RFC 2373 (IPv6 Addressing Architecture). IPv6 addresses are 128-bit identifiers for interfaces and sets of interfaces, and RFC 2373 defines various address formats. IPv6 addresses are broadly classified into three types, explained in the following table: unicast, anycast, and multicast.
IPv6 Network Support IPv6 Address Types multiple groups of 16 bits of zeros. The "::" can appear only once in an address and can be used to compress leading, trailing, or contiguous sixteen-bit groups of zeros in an address. Example: fec0:1:0:0:0:0:0:1234 can be represented as fec0:1::1234. • When dealing with a mixed environment of IPv4 and IPv6 nodes, an alternative form of IPv6 address is used: x:x:x:x:x:x:d.d.d.d, where the "d.d.d.d" part is a standard dotted-decimal IPv4 address.
IPv6 Network Support IPv6 Address Types Unicast Addresses IPv6 unicast addresses are classified into different types: global aggregatable unicast addresses, site-local addresses, and link-local addresses. Typically a unicast address is logically divided as follows:

Table H-2
    Subnet prefix    n bits
    Interface ID     128-n bits

Interface identifiers in an IPv6 unicast address are used to identify the interfaces on a link. Interface identifiers are required to be unique on that link.
IPv6 Network Support IPv6 Address Types IPv4 Mapped IPv6 Address There is a special type of IPv6 address that holds an embedded IPv4 address. This address is used to represent the addresses of IPv4-only nodes as IPv6 addresses. These addresses are used especially by applications that support both IPv6 and IPv4. These addresses are called IPv4 Mapped IPv6 Addresses. The format of these addresses is as follows:

Table H-4
    80 bits    zeros
    16 bits    FFFF
    32 bits    IPv4 address

Example: ::ffff:192.168.0.
IPv6 Network Support IPv6 Address Types Link-Local Addresses Link-local addresses have the following format:

Table H-6
    10 bits    1111111010
    54 bits    0
    64 bits    interface ID

Link-local addresses are intended for addressing nodes on a single link. Packets originating from or destined to a link-local address will not be forwarded by a router.
IPv6 Network Support IPv6 Address Types “FF” at the beginning of the address identifies the address as a multicast address. The “flgs” field is a set of 4 flags “000T”. The higher order 3 bits are reserved and must be zero. The last bit ‘T’ indicates whether it is permanently assigned or not. A value of zero indicates that it is permanently assigned otherwise it is a temporary assignment. The “scop” field is a 4-bit field which is used to limit the scope of the multicast group.
IPv6 Network Support Network Configuration Restrictions Network Configuration Restrictions Serviceguard now supports IPv6 for data links only. The heartbeat IP must still be IPv4, but the package IPs can be IPv4 or IPv6. To configure IPv6, the system should be set up in what is called a dual-stack configuration, which requires the IPv6 product bundle (IPv6NCF11i B.11.11.0109.5C) to be installed.
IPv6 Network Support Network Configuration Restrictions NOTE: Even though link-local IP addresses are not supported in the Serviceguard cluster configuration, the primary link-local address on the Serviceguard primary interface will be switched over to the standby during a local switch. This is because of two requirements: First, the dual stack (IPv4/IPv6) kernel requires that the primary IP address associated with an interface must always be a link-local address.
IPv6 Network Support IPv6 Relocatable Address and Duplicate Address Detection Feature IPv6 Relocatable Address and Duplicate Address Detection Feature The IPv6 networking stack has a new feature, Duplicate Address Detection (DAD), that was not previously available in IPv4. When an address is being added, the DAD detects a duplicate address that is already being used on the network.
IPv6 Network Support IPv6 Relocatable Address and Duplicate Address Detection Feature

# TRANSPORT_NAME[index]=ip6
# NDD_NAME[index]=ip6_nd_dad_solicit_count
# NDD_VALUE[index]=n

Where index is the next available integer value of the nddconf file, and n is a number: either 1 to turn the feature ON or 0 to turn it OFF.
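In addition to editing nddconf, which takes effect at boot, the same tunable can presumably be changed on a running system with the ndd command; this is an assumption based on standard HP-UX network tuning practice rather than something stated here:

# ndd -set /dev/ip6 ip6_nd_dad_solicit_count 0

Setting the value back to 1 re-enables Duplicate Address Detection.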
IPv6 Network Support Local Primary/Standby LAN Patterns Local Primary/Standby LAN Patterns The use of IPv6 allows a number of different patterns of failover among LAN cards configured in the cluster. This is true because each LAN card can support several IP addresses when a dual IPv4/IPv6 configuration is used. This section describes several ways in which local failover to a standby LAN can be configured.
IPv6 Network Support Example Configurations Example Configurations An example of a LAN configuration on a cluster node using both IPv4 and IPv6 addresses is shown below. Figure H-1 Example 1: IPv4 and IPv6 Addresses in Standby Configuration Following the loss of lan0 or lan2, lan1 can adopt either address, as shown below. Figure H-2 Example 1: IPv4 and IPv6 Addresses after Failover to Standby The same LAN card can be configured with both IPv4 and IPv6 addresses, as shown below.
IPv6 Network Support Example Configurations Figure H-3 Example 2: IPv4 and IPv6 Addresses in Standby Configuration This type of configuration allows failover of both addresses to the standby. This is shown below.
A Access Control Policies, 178, 193 Access Control Policy, 161 Access roles, 161 active node, 23 adding a package to a running cluster, 351 adding cluster nodes advance planning, 205 adding nodes to a running cluster, 332 adding nodes while the cluster is running, 344 adding packages on a running cluster, 287 additional package resource parameter in package configuration, 176, 177 additional package resources monitoring, 83 addressing, SCSI, 140 administration adding nodes to a ruuning cluster, 332 cluste
logical volume infrastructure, 209 verifying the cluster configuration, 235 VxVM infrastructure, 218 bus type hardware planning, 141 C CFS Creating a storage infrastructure, 240 creating a storage infrastructure, 240 changes in cluster membership, 66 changes to cluster allowed while the cluster is running, 341 changes to packages allowed while the cluster is running, 354 changing the volume group configuration while the cluster is running, 347 checkpoints, 413 client connections restoring in applications, 4
parameter in cluster manager configuration, 161 cluster with high availability disk array figure, 49, 50 CLUSTER_NAME (cluster name) in sample configuration file, 226 clusters active/standby type, 54 larger size, 54 cmapplyconf , 237, 305 cmassistd daemon, 59 cmcheckconf, 236, 304 troubleshooting, 377 cmclconfd daemon, 58 , 59 cmcld daemon, 58, 59 cmclnodelist bootstrap file, 193 cmdeleteconf deleting a package configuration, 351 deleting the cluster configuration, 266 cmfileassistd daemon, 58, 60 cmlogd da
data congestion, 65 databases toolkits, 405 deactivating volume groups, 215 deciding when and where to run packages, 75 deferred resource name, 184, 185 deleting a package configuration using cmdeleteconf, 351 deleting a package from a running cluster, 351 deleting nodes while the cluster is running, 345, 348 deleting the cluster configuration using cmdeleteconf, 266 dependencies configuring, 178 designing applications to run on multiple systems, 415 detecting failures in network manager, 103 disk choosin
F failback policy package configuration file parameter, 170 used by package manager, 80 FAILBACK_POLICY parameter in package configuration file, 170 used by package manager, 80 failover controlling the speed in applications, 410 defined, 23 failover behavior in packages, 85 failover package, 73 failover packages configuring, 273 failover policy package configuration parameter, 170 used by package manager, 77 FAILOVER_POLICY parameter in package configuration file, 170 used by package manager, 77 failure kin
parameter in package configuration, 173 halting a cluster, 334 halting a package, 336 halting the entire cluster, 334 handling application failures, 424 hardware blank planning worksheet, 444 monitoring, 363 hardware failures response to, 127 hardware for OPS on HP-UX power supplies, 53 hardware planning Disk I/O Bus Type, 141 disk I/O information for shared disks, 141 host IP address, 137, 147, 148 host name, 136 I/O bus addresses, 141 I/O slot numbers, 141 LAN information, 136 LAN interface name, 137, 147
switching, 76, 77, 108 J JFS, 411 K kernel consistency in cluster configuration, 194, 195, 203 L LAN heartbeat, 64 interface name, 137, 147 planning information, 136 LAN failure Serviceguard behavior, 36 LAN interfaces monitoring with network manager, 103 primary and secondary, 38 LAN planning host IP address, 137, 147, 148 traffic type, 137 larger clusters, 54 link-level addresses, 417 LLT for CVM and CFS, 62 load sharing with IP addresses, 102 local switching, 104 parameter in package configuration, 172 L
parameter in cluster manager configuration, 158 monitored resource failure Serviceguard behavior, 36 monitoring hardware , 363 monitoring LAN interfaces in network manager, 103 Monitoring, cluster, 260 mount options in control script, 181 moving a package, 337 multi-node package, 73 multi-node package configuration, 272 multi-node packages configuring, 272 multiple systems designing applications for, 415 N name resolution services, 198 network adding and deleting package IP addresses, 102 failure, 105 load
NTP time protocol for clusters, 203 O online hardware maintenance by means of in-line SCSI terminators, 367 OPS startup and shutdown instances, 289 optimizing packages for large numbers of storage units, 292 outages insulating users from , 408 P package adding and deleting package IP addresses, 102 basic concepts, 36 changes allowed while the cluster is running, 354 halting, 336 local interface switching, 104 moving, 337 reconfiguring while the cluster is running, 350 reconfiguring with the cluster offlin
parameter in package ASCII configuration file, 170 PACKAGE_TYPE parameter in package ASCII configuration file, 176 packages deciding where and when to run, 75 launching OPS instances, 289 parameter AUTO_RUN, 289 NODE_FAILFAST_ENABLED, 289 parameters for failover, 85 parameters for cluster manager initial configuration, 64 PATH, 180 performance optimizing packages for large numbers of storage units, 292 performance variables in package control script, 182, 183 physical volume for cluster lock, 68 parameter i
quorum server blank planning worksheet, 447 installing, 206 parameters in cluster manager configuration, 156 planning, 147 status and state, 317 use in re-forming a cluster, 70 worksheet, 148 R RAID for data protection, 45 raw volumes, 411 README for database toolkits, 405 reconfiguring a package while the cluster is running, 350 reconfiguring a package with the cluster offline, 349 reconfiguring a running cluster, 343 reconfiguring the entire cluster, 342 reconfiguring the lock volume group, 342 recovery t
RS232 connection for heartbeats, 139 RS232 heartbeat line, configuring, 139 RS232 serial heartbeat line, 42 RS232 status, viewing, 324 RUN_SCRIPT in sample ASCII package configuration file, 274 parameter in package configuration, 173 RUN_SCRIPT_TIMEOUT in sample ASCII package configuration file, 274 RUN_SCRIPT_TIMEOUT (run script timeout) parameter in package configuration, 173 running cluster adding or removing packages, 287 S SAM using to configure packages, 270 sample cluster configuration figure, 135
SG-CFS-pkg system multi-node package, 166 SGCONF, 189 shared disks planning, 141 shutdown and startup defined for applications, 409 single cluster lock choosing, 69 single point of failure avoiding, 22 single-node operation, 265 size of cluster preparing for changes, 205 SMN package, 73 SNA applications, 421 software failure Serviceguard behavior, 36 software planning CVM and VxVM, 152 LVM, 149 solving problems, 380 SPU information planning, 136 standby LAN interfaces defined, 38 standby network interface,
typical cluster after failover figure, 24 typical cluster configuration figure, 21 setting up on another node with LVM Commands, 215 worksheet, 150, 153 volume group and physical volume planning, U uname(2), 418 unmount count, 182 UPS in power planning, 144 power supply for OPS on HP-UX, 53 use of the cluster lock, 68, 70 USER_HOST, 161 USER_NAME, 161 USER_ROLE, 161 Volume groups in control script, 181 volume managers, 113 comparison, 121 CVM, 119 LVM, 118 migrating from LVM to VxVM, 457 VxVM, 118 VOLUME