Managing Serviceguard Version A.11.
Legal Notices © Copyright 1995-2005 Hewlett-Packard Development Company, L.P. Confidential computer software. Valid license from HP required for possession, use, or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. government under vendor’s standard commercial license. The information contained in this document is subject to change without notice.
Contents
1. Serviceguard at a Glance
   What is Serviceguard?
   Failover
   Using Serviceguard Manager
   Monitoring with Serviceguard Manager
Serviceguard Daemons
How the Cluster Manager Works
Configuration of the Cluster
Heartbeat Messages
Manual Startup of Entire Cluster
Stationary and Relocatable IP Addresses  96
Adding and Deleting Relocatable IP Addresses  97
Monitoring LAN Interfaces and Detecting Failure  97
Automatic Port Aggregation  103
VLAN Configurations
LVM Worksheet
CVM and VxVM Planning
CVM and VxVM Worksheet
Cluster Configuration Planning
Heartbeat Subnet and Re-formation Time
Creating Additional Volume Groups
Creating a Storage Infrastructure with VxVM
Initializing the VERITAS Volume Manager
Converting Disks from LVM to VxVM
Initializing Disks for VxVM
Creating the Package Configuration  244
Using Serviceguard Manager to Configure a Package  244
Using Serviceguard Commands to Configure a Package  245
Adding or Removing Packages on a Running Cluster  255
Writing the Package Control Script
Reconfiguring a Cluster
Reconfiguring a Halted Cluster
Reconfiguring a Running Cluster
Reconfiguring a Package
Reviewing Package IP Addresses
Reviewing the System Log File
Reviewing Object Manager Log Files
Reviewing Serviceguard Manager Log Files
Reviewing Configuration Files
Design for Replicated Data Sites
Designing Applications to Run on Multiple Systems
Avoid Node-Specific Information
Avoid Using SPU IDs or MAC Addresses
Assign Unique Names to Applications
F. Blank Planning Worksheets
   Worksheet for Hardware Planning
   Power Supply Worksheet
   Quorum Server Worksheet
   LVM Volume Group and Physical Volume Worksheet
Tables
Table 1  17
Table 3-1. Package Configuration Data  76
Table 3-2. Node Lists in Sample Cluster  79
Table 3-3. Package Failover Behavior  83
Table 3-4. Error Conditions and Package Movement
Figures
Figure 1-1. Typical Cluster Configuration  23
Figure 1-2. Typical Cluster After Failover  25
Figure 1-3. Monitoring with Serviceguard Manager  28
Figure 1-4. Serviceguard Manager Package Administration  29
Figure 1-5. Configuring with Serviceguard Manager
Figure 3-17. Cluster After Local Network Switching  101
Figure 3-18. Local Switching After Cable Failure  102
Figure 3-19. Aggregated Networking Ports  104
Figure 3-20. Physical Disks Within Shared Storage Units  109
Figure 3-21. Mirrored Physical Disks
Printing History

Table 1
Printing Date                  Part Number   Edition
January 1995                   B3936-90001   First
June 1995                      B3936-90003   Second
December 1995                  B3936-90005   Third
August 1997                    B3936-90019   Fourth
January 1998                   B3936-90024   Fifth
October 1998                   B3936-90026   Sixth
December 2000                  B3936-90045   Seventh
September 2001                 B3936-90053   Eighth
March 2002                     B3936-90065   Ninth
June 2003                      B3936-90070   Tenth
June 2004                      B3936-90076   Eleventh
June 2004 (Reprint June 2005)  B3936-90076   Eleventh, Second printing
HP Printing Division: Infrastructure Solutions Division Hewlett-Packard Co. 19111 Pruneridge Ave.
Preface This second printing adds new information to the first printing of the eleventh edition, primarily in “Editing Security Files” on page 182. This guide describes how to configure Serviceguard to run on HP 9000 Series 800 or HP Integrity servers under the HP-UX operating system. The contents are as follows: • Chapter 1, “Serviceguard at a Glance,” describes a Serviceguard cluster and provides a roadmap for using this guide.
• Appendix C, “Designing Highly Available Cluster Applications,” gives guidelines for creating cluster-aware applications that provide optimal performance in a Serviceguard environment. • Appendix D, “Integrating HA Applications with Serviceguard,” presents suggestions for integrating your existing applications with Serviceguard. • Appendix E, “Rolling Software Upgrades,” shows how to move from one Serviceguard or HP-UX release to another without bringing down your applications.
Related Publications
— Enterprise Cluster Master Toolkit Version B.02.21 Release Notes (HP-UX 11iv2) December 2004 (T1909-90024) — Using High Availability Monitors (B5736-90042) — Using the Event Monitoring Service (B7609-90022) — Managing Highly Available NFS (B5140-90017) — HP Serviceguard Quorum Server Release Notes (B8467-90026) From http://www.docs.hp.
Problem Reporting If you have any problems with the software or documentation, please contact your local Hewlett-Packard Sales Office or Customer Service Center.
1 Serviceguard at a Glance This chapter introduces Serviceguard on HP-UX, and shows where to find different kinds of information in this book. The following topics are presented: • What is Serviceguard? • Using Serviceguard Manager • A Roadmap for Configuring Clusters and Packages If you are ready to start setting up Serviceguard clusters, skip ahead to Chapter 4, “Planning and Documenting an HA Cluster.
Serviceguard at a Glance What is Serviceguard? What is Serviceguard? Serviceguard allows you to create high availability clusters of HP 9000 or HP Integrity servers. A high availability computer system allows application services to continue in spite of a hardware or software failure. Highly available systems protect users from software failures as well as from failure of a system processing unit (SPU), disk, or local area network (LAN) component.
Serviceguard at a Glance What is Serviceguard? node which are central to the operation of the cluster. TCP/IP services also are used for other types of inter-node communication. (The heartbeat is explained in more detail in the chapter “Understanding Serviceguard Software.”) Failover Under normal conditions, a fully operating Serviceguard cluster simply monitors the health of the cluster's components while the packages are running on individual nodes.
Serviceguard at a Glance What is Serviceguard? After this transfer, the package typically remains on the adoptive node as long as the adoptive node continues running. If you wish, however, you can configure the package to return to its primary node as soon as the primary node comes back online. Alternatively, you may manually transfer control of the package back to the primary node at the appropriate time. Figure 1-2 does not show the power connections to the cluster, but these are important as well.
Serviceguard at a Glance Using Serviceguard Manager Using Serviceguard Manager Serviceguard Manager is the graphical user interface for Serviceguard. The Serviceguard Manager management station can be HP-UX, Linux, and Windows systems. From there, you can monitor, administer, and configure Serviceguard clusters on HP-UX or on Linux. • Monitor: You can see information about Serviceguard objects on your subnets. The objects are represented in a hierarchal tree, in a graphical map.
Serviceguard at a Glance Using Serviceguard Manager Figure 1-3 Monitoring with Serviceguard Manager Administering with Serviceguard Manager You can also administer clusters, nodes, and packages if you have the appropriate access permissions (Serviceguard A.11.14 and A.11.15) or access control policies (Serviceguard A.11.
Serviceguard at a Glance Using Serviceguard Manager Figure 1-4 Serviceguard Manager Package Administration Configuring with Serviceguard Manager With Serviceguard version A.11.16, you can also configure clusters and packages. Both the server node and the target cluster must have Serviceguard version A.11.16 installed, and you must have root (UID=0) login to the cluster nodes.
Serviceguard at a Glance Using Serviceguard Manager Figure 1-5 Configuring with Serviceguard Manager Serviceguard Manager Help To see online help, click on the “Help” menu item at the top of the screen.
Serviceguard at a Glance Using Serviceguard Manager • “Menu and Toolbar Commands” • “Navigating Serviceguard Manager” • “Map Legend” How Serviceguard Manager Works To start Serviceguard Manager on a Unix or Linux management station, type the sgmgr command. You can enter the options on the command line, or in a dialog box after the interface opens. For command syntax and options, check online Help -> Troubleshooting, or enter man sgmgr on the command line.
Serviceguard at a Glance Using Serviceguard Manager • In Serviceguard version A.11.16 clusters, the cluster configuration file, or a package configuration file, must have an Access Control Policy with this triplet: the intended user, the COM server hostname, and a role of at least Monitor. • In earlier versions of Serviceguard, the /etc/cmcluster/cmclnodelist/ file must have this pair listed: COM host_node, and user root.
A Roadmap for Configuring Clusters and Packages This manual presents the tasks you need to perform in order to create a functioning HA cluster using Serviceguard. These tasks are shown in Figure 1-6. Figure 1-6 Tasks in Configuring a Serviceguard Cluster The tasks in Figure 1-6 are covered in step-by-step detail in chapters 4 through 7.
2 Understanding Serviceguard Hardware Configurations This chapter gives a broad overview of how the Serviceguard hardware components work. The following topics are presented: • Redundancy of Cluster Components • Redundant Network Components • Redundant Disk Storage • Redundant Power Supplies • Larger Clusters Refer to the next chapter for information about Serviceguard software components.
Understanding Serviceguard Hardware Configurations Redundancy of Cluster Components Redundancy of Cluster Components In order to provide a high level of availability, a typical cluster uses redundant system components, for example two or more SPUs and two or more independent disks. This redundancy eliminates single points of failure. In general, the more redundancy, the greater your access to applications, data, and supportive services in the event of a failure.
Understanding Serviceguard Hardware Configurations Redundancy of Cluster Components Note that a package that does not access data from a disk on a shared bus can be configured to fail over to as many nodes as you have configured in the cluster (regardless of disk technology). For instance, if a package only runs local executables, it can be configured to failover to all nodes in the cluster that have local copies of those executables, regardless of the type of disk connectivity.
Understanding Serviceguard Hardware Configurations Redundant Network Components Redundant Network Components To eliminate single points of failure for networking, each subnet accessed by a cluster node is required to have redundant network interfaces. Redundant cables are also needed to protect against cable failures. Each interface card is connected to a different cable, and the cables themselves are connected by a component such as a hub or a bridge.
Understanding Serviceguard Hardware Configurations Redundant Network Components Figure 2-1 Redundant LANs In the figure, a two-node Serviceguard cluster has one bridged net configured with both a primary and a standby LAN card for the data/heartbeat subnet (Subnet A). Another LAN card provides an optional dedicated heartbeat LAN. Note that the primary and standby LAN segments are connected by a hub to provide a redundant data/heartbeat subnet. Each node has its own IP address for this subnet.
Understanding Serviceguard Hardware Configurations Redundant Network Components NOTE You should verify that network traffic is not too high on the heartbeat/data LAN. If traffic is too high, this LAN might not perform adequately in transmitting heartbeats if the dedicated heartbeat LAN fails. Providing Redundant FDDI Connections FDDI is a high speed fiber optic interconnect medium.
Understanding Serviceguard Hardware Configurations Redundant Network Components Using Dual Attach FDDI Stations Another way of obtaining redundant FDDI connections is to configure dual attach stations on each node to create an FDDI ring, shown in Figure 2-3. An advantage of this configuration is that only one slot is used in the system card cage. In Figure 2-3, note that nodes 3 and 4 also use Ethernet to provide connectivity outside the cluster.
Understanding Serviceguard Hardware Configurations Redundant Network Components NOTE The use of a serial (RS232) heartbeat line is supported only in a two-node cluster configuration. A serial heartbeat line is required in a two-node cluster that has only one heartbeat LAN. If you have at least two heartbeat LANs, or one heartbeat LAN and one standby LAN, a serial (RS232) heartbeat should not be used.
Understanding Serviceguard Hardware Configurations Redundant Network Components Replacement of Failed Network Cards Depending on the system configuration, it is possible to replace failed network cards while the cluster is running. The process is described under “Replacement of LAN Cards” in the chapter “Troubleshooting Your Cluster.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Redundant Disk Storage Each node in a cluster has its own root disk, but each node is also physically connected to several other disks in such a way that more than one node can obtain access to the data and programs associated with a package it is configured for. This access is provided by a Storage Manager, such as Logical Volume Manager (LVM), VERITAS Volume Manager (VxVM), or VERITAS Cluster Volume Manager (CVM).
Understanding Serviceguard Hardware Configurations Redundant Disk Storage shared bus. All SCSI addresses, including the addresses of all interface cards, must be unique for all devices on a shared bus. See the manual Configuring HP-UX for Peripherals for information on SCSI bus addressing and priority.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Disk Arrays using RAID Levels and Multiple Data Paths An alternate method of achieving protection for your data is to employ a disk array with hardware RAID levels that provide data redundancy, such as RAID Level 1 or RAID Level 5. The array provides data redundancy for the disks. This protection needs to be combined with the use of redundant host bus interfaces (SCSI or Fibre Channel) between each node and the array.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Replacement of Failed I/O Cards Depending on the system configuration, it is possible to replace failed disk I/O cards while the system remains online. The process is described under “Replacing I/O Cards” in the chapter “Troubleshooting Your Cluster.” Sample SCSI Disk Configurations Figure 2-5 shows a two node cluster. Each node has one root disk which is mirrored and one package for which it is the primary node.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Figure 2-5 Mirrored Disks Connected for High Availability Figure 2-6 below shows a similar cluster with a disk array connected to each node on two I/O channels.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Figure 2-6 Cluster with High Availability Disk Array Details on logical volume configuration for Serviceguard, including PV Links, are given in the chapter “Building an HA Cluster Configuration.” Sample Fibre Channel Disk Configuration In Figure 2-7 below, the root disks are shown with simple mirroring, but the shared storage is now accessed via redundant Fibre Channel switches attached to a disk array.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Figure 2-7 Cluster with Fibre Channel Switched Disk Array This type of configuration also uses PV links or other multipath software such as VERITAS Dynamic Multipath (DMP) or EMC PowerPath. Root Disk Limitations on Shared SCSI Buses The IODC firmware does not support two or more nodes booting from the same SCSI bus at the same time.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Figure 2-8 Root Disks on Different Shared Buses Note that if both nodes had their primary root disks connected to the same bus, you would have an unsupported configuration. You can put a mirror copy of Node B's root disk on the same SCSI bus as Node A's primary root disk, because three failures would have to occur for both systems to boot at the same time, which is an acceptable risk.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Figure 2-9 Primaries and Mirrors on Different Shared Buses Note that you cannot use a disk within a disk array as a root disk if the array is on a shared bus.
Understanding Serviceguard Hardware Configurations Redundant Power Supplies Redundant Power Supplies You can extend the availability of your hardware by providing battery backup to your nodes and disks. HP-supported uninterruptible power supplies (UPS), such as HP PowerTrust, can provide this protection from momentary power loss. Disks should be attached to power circuits in such a way that mirror copies are attached to different power sources.
Larger Clusters You can create clusters of up to 16 nodes with Serviceguard. Clusters of up to 16 nodes may be built by connecting individual SPUs via Ethernet, and you can configure up to 8 systems as a Serviceguard cluster using FDDI networking. The possibility of configuring a cluster consisting of 16 nodes does not mean that all types of cluster configuration behave in the same way in a 16-node configuration.
Understanding Serviceguard Hardware Configurations Larger Clusters Figure 2-10 Eight-Node Active/Standby Cluster Point to Point Connections to Storage Devices Some storage devices allow point-to-point connection to a large number of host nodes without using a shared SCSI bus. An example is shown in Figure 2-11, a cluster consisting of eight nodes with a Fibre Channel interconnect. (Client connection is provided through Ethernet.
Understanding Serviceguard Hardware Configurations Larger Clusters Figure 2-11 Eight-Node Cluster with XP or EMC Disk Array Fibre Channel switched configurations also are supported using either an arbitrated loop or fabric login topology. For additional information about supported cluster configurations, refer to the HP Unix Servers Configuration Guide, available through your HP representative.
3 Understanding Serviceguard Software Components This chapter gives a broad overview of how the Serviceguard software components work.
Understanding Serviceguard Software Components Serviceguard Architecture Serviceguard Architecture The following figure shows the main software components used by Serviceguard. This chapter discusses these components in some detail. Figure 3-1 Serviceguard Software Components Serviceguard Daemons There are nine daemon processes associated with Serviceguard.
Understanding Serviceguard Software Components Serviceguard Architecture • /usr/lbin/cmsrvassistd—Serviceguard Service Assistant Daemon • /usr/lbin/cmtaped—Serviceguard Shared Tape Daemon • /usr/lbin/qs—Serviceguard Quorum Server Daemon Each of these daemons logs to the /var/adm/syslog/syslog.log file except for the quorum server, which logs to the standard output (it is suggested you redirect output to a file named /var/adm/qs/qs.log) and /opt/cmom/lbin/cmomd, which logs to /var/opt/cmom/cmomd.log.
Understanding Serviceguard Software Components Serviceguard Architecture important that user processes should have a lower priority than 20, otherwise they might prevent Serviceguard from updating the kernel safety timer, thus causing the node to undergo a TOC. Syslog Log Daemon: cmlogd cmlogd is used by cmcld to write messages to syslog. Any message written to syslog by cmcld is written through cmlogd. This is to prevent any delays in writing to syslog from impacting the timing of cmcld.
Understanding Serviceguard Software Components Serviceguard Architecture This will only be running if the /etc/rc.config.d/cmsnmpagt file has been edited to autostart this subagent. For proper execution, the cmsnmpd has to start before the Serviceguard cluster comes up. Service Assistant Daemon: cmsrvassistd This daemon forks and execs any script or processes as required by the cluster daemon, cmcld.
Understanding Serviceguard Software Components How the Cluster Manager Works How the Cluster Manager Works The cluster manager is used to initialize a cluster, to monitor the health of the cluster, to recognize node failure if it should occur, and to regulate the re-formation of the cluster when a node joins or leaves the cluster. The cluster manager operates as a daemon process that runs on each node.
Understanding Serviceguard Software Components How the Cluster Manager Works Packages which were running on nodes that are no longer in the new cluster are transferred to their adoptive nodes. Note that if there is a transitory loss of heartbeat, the cluster may re-form with the same nodes as before. In such cases, packages do not halt or switch, though the application may experience a slight performance impact during the re-formation.
Understanding Serviceguard Software Components How the Cluster Manager Works Manual Startup of Entire Cluster A manual startup forms a cluster out of all the nodes in the cluster configuration. Manual startup is normally done the first time you bring up the cluster, after cluster-wide maintenance or upgrade, or after reconfiguration. Before startup, the same binary cluster configuration file must exist on all nodes in the cluster.
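For example, manual startup of a configured cluster is done with the cmruncl command; the node names in this sketch (ftsys9 and ftsys10) are simply the sample names used elsewhere in this guide:

   # Start the cluster on all nodes defined in the cluster configuration
   cmruncl -v

   # Or start it on a specific subset of the configured nodes (sample node names)
   cmruncl -v -n ftsys9 -n ftsys10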
Understanding Serviceguard Software Components How the Cluster Manager Works • A node halts because of a package failure. • A node halts because of a service failure. • Heavy network traffic prohibited the heartbeat signal from being received by the cluster. • The heartbeat network failed, and another network is not configured to carry heartbeat. Typically, re-formation results in a cluster with a different composition.
Understanding Serviceguard Software Components How the Cluster Manager Works possibility of two sub-clusters running at the same time. If the two sub-clusters are of unequal size, the sub-cluster with greater than 50% of the nodes will form the new cluster, and the cluster lock is not used. If you have a two-node cluster, you are required to configure a cluster lock.
Understanding Serviceguard Software Components How the Cluster Manager Works Figure 3-2 Lock Disk Operation Serviceguard periodically checks the health of the lock disk and writes messages to the syslog file when a lock disk fails the health check. This file should be monitored for early detection of lock disk problems. You can choose between two lock disk options—a single or dual lock disk—based on the kind of high availability configuration you are building.
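As an illustrative sketch, a single lock disk is declared in the cluster ASCII configuration file with entries along the following lines; the volume group and device file names here are placeholders only, and the excerpt omits the other node-level entries:

   FIRST_CLUSTER_LOCK_VG   /dev/vglock
   NODE_NAME               ftsys9
     FIRST_CLUSTER_LOCK_PV /dev/dsk/c1t2d0
   NODE_NAME               ftsys10
     FIRST_CLUSTER_LOCK_PV /dev/dsk/c1t2d0

A dual lock disk adds corresponding SECOND_CLUSTER_LOCK_VG and SECOND_CLUSTER_LOCK_PV entries.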
Understanding Serviceguard Software Components How the Cluster Manager Works Dual Lock Disk If you are using disks that are internally mounted in the same cabinet as the cluster nodes, then a single lock disk would be a single point of failure in this type of cluster, since the loss of power to the node that has the lock disk in its cabinet would also render the cluster lock unavailable.
Understanding Serviceguard Software Components How the Cluster Manager Works lock, this area is marked so that other nodes will recognize the lock as “taken.” If communications are lost between two equal-sized groups of nodes, the group that obtains the lock from the Quorum Server will take over the cluster and the other nodes will perform a TOC. Without a cluster lock, a failure of either group of nodes will cause the other group, and therefore the cluster, to halt.
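By contrast with a lock disk, a quorum server is specified in the cluster ASCII configuration file with quorum server parameters rather than lock volume group entries; as a rough sketch (the host name and timing values below are placeholders, not recommendations):

   QS_HOST                qs-host.mydomain.com     # placeholder host name
   QS_POLLING_INTERVAL    300000000
   # Optional grace period added to the quorum server timeout:
   # QS_TIMEOUT_EXTENSION 2000000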
Understanding Serviceguard Software Components How the Cluster Manager Works In a cluster with four or more nodes, you may not need a cluster lock since the chance of the cluster being split into two halves of equal size is very small. However, be sure to configure your cluster to prevent the failure of exactly half the nodes at one time.
Understanding Serviceguard Software Components How the Package Manager Works How the Package Manager Works Each node in the cluster runs an instance of the package manager; the package manager residing on the cluster coordinator is known as the package coordinator. The package coordinator does the following: • Decides when and where to run, halt or move packages. The package manager on all nodes does the following: • Executes the user-defined control script to run and halt packages and package services.
Understanding Serviceguard Software Components How the Package Manager Works Figure 3-4 Package Moving During Failover Configuring Packages Each package is separately configured. You create a package by using Serviceguard Manager or by editing a package ASCII configuration file (detailed instructions are given in Chapter 6). Then you use the cmapplyconf command to check and apply the package to the cluster configuration database.
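For example, after editing the package ASCII file you would typically verify it and then apply it to the cluster; the directory and file names here are hypothetical:

   # Verify the package configuration file
   cmcheckconf -v -P /etc/cmcluster/pkg1/pkg1.config

   # Check and apply it, updating the binary cluster configuration on all nodes
   cmapplyconf -v -P /etc/cmcluster/pkg1/pkg1.config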
Understanding Serviceguard Software Components How the Package Manager Works The parameter is coded in the package ASCII configuration file: # The default for AUTO_RUN is YES. In the event of a # failure, this permits the cluster software to transfer the package # to an adoptive node. Adjust as necessary. AUTO_RUN YES A package switch involves moving packages and their associated IP addresses to a new system.
Understanding Serviceguard Software Components How the Package Manager Works Figure 3-5 Before Package Switching Figure 3-6 shows the condition where Node 1 has failed and Package 1 has been transferred to Node 2. Package 1's IP address was transferred to Node 2 along with the package. Package 1 continues to be available and is now running on Node 2. Also note that Node 2 can now access both Package A’s disk and Package B’s disk.
Understanding Serviceguard Software Components How the Package Manager Works Figure 3-6 After Package Switching Failover Policy The Package Manager selects a node for a package to run on based on the priority list included in the package configuration file together with the FAILOVER_POLICY parameter, also coded in the file or set with Serviceguard Manager.
Understanding Serviceguard Software Components How the Package Manager Works CONFIGURED_NODE. # This policy will select nodes in priority order from the list of # NODE_NAME entries specified below. # The alternative policy is MIN_PACKAGE_NODE. This policy will select # the node, from the list of NODE_NAME entries below, which is # running the least number of packages at the time of failover.
Understanding Serviceguard Software Components How the Package Manager Works Figure 3-7 Rotating Standby Configuration before Failover If a failure occurs, any package would fail over to the node containing fewest running packages, as in Figure 3-8, which shows a failure on node 2: Figure 3-8 Rotating Standby Configuration after Failover NOTE Using the MIN_PACKAGE_NODE policy, when node 2 is repaired and brought back into the cluster, it will then be running the fewest packages, and thus will become th
Understanding Serviceguard Software Components How the Package Manager Works Figure 3-9 CONFIGURED_NODE Policy Packages after Failover If you use CONFIGURED_NODE as the value for the failover policy, the package will start up on the highest priority node in the node list, assuming that the node is running as a member of the cluster. When a failover occurs, the package will move to the next highest priority node in the list that is available.
Understanding Serviceguard Software Components How the Package Manager Works # means the package will be moved back to its primary node whenever the # primary node is capable of running the package.
Understanding Serviceguard Software Components How the Package Manager Works Figure 3-11 Automatic Failback Configuration After Failover After rebooting, node 1 rejoins the cluster. At that point, pkgA will be automatically stopped on node 4 and restarted on node 1. Figure 3-12 Automatic Failback Configuration After Restart of Node 1 NOTE Setting the FAILBACK_POLICY to AUTOMATIC can result in a package failback and application outage during a critical production period.
Understanding Serviceguard Software Components How the Package Manager Works On Combining Failover and Failback Policies Combining a FAILOVER_POLICY of MIN_PACKAGE_NODE with a FAILBACK_POLICY of AUTOMATIC can result in a package’s running on a node where you did not expect it to run, since the node running the fewest packages will probably not be the same host every time a failover occurs.
Understanding Serviceguard Software Components How the Package Manager Works You can specify a registered resource for a package by selecting it from the list of available resources displayed in the Serviceguard Manager Configuring Packages. The size of the list displayed by Serviceguard Manager depends on which resource monitors have been registered on your system. Alternatively, you can obtain information about registered resources on your system by using the command /opt/resmon/bin/resls.
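For instance, resls takes the name of a resource class and lists what is available below it; the resource path shown here is only an example, and the classes present depend on which monitors are installed on your system:

   /opt/resmon/bin/resls /     # list the top level of the EMS resource hierarchy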
Understanding Serviceguard Software Components How the Package Manager Works The following table describes different types of failover behavior and the settings in Serviceguard Manager or in the ASCII package configuration file that determine each behavior. Table 3-3 Package Failover Behavior Options in Serviceguard Manager Switching Behavior Package switches normally after detection of service, network, or EMS failure. Halt script runs before switch takes place.
Understanding Serviceguard Software Components How the Package Manager Works Table 3-3 Package Failover Behavior (Continued) Options in Serviceguard Manager Switching Behavior 84 If desired, package must be manually returned to its primary node if it is running on a non-primary node. • Failback policy set to Manual. (Default) • Failover policy set to Configured Node.
Understanding Serviceguard Software Components How Package Control Scripts Work How Package Control Scripts Work Packages are the means by which Serviceguard starts and halts configured applications. Packages are also units of failover behavior in Serviceguard. A package is a collection of services, disk volumes and IP addresses that are managed by Serviceguard to ensure they are available. There can be a maximum of 150 packages per cluster and a total of 900 services per cluster.
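In the package control script these elements appear as arrays of shell variables; the following is a minimal sketch with hypothetical names and values, not a complete script:

   # Volume group, logical volume and file system used by the package (sample names)
   VG[0]="vgpkgA"
   LV[0]="/dev/vgpkgA/lvol1"
   FS[0]="/pkgA/data"

   # Relocatable package IP address and the subnet it belongs to (placeholder values)
   IP[0]="192.10.25.12"
   SUBNET[0]="192.10.25.0"

   # A monitored service and its restart behavior (hypothetical service)
   SERVICE_NAME[0]="pkgA_service"
   SERVICE_CMD[0]="/usr/local/bin/pkgA_daemon"
   SERVICE_RESTART[0]="-r 2"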
Understanding Serviceguard Software Components How Package Control Scripts Work How does the package start up, and what is its behavior while it is running? Some of the many phases of package life are shown in Figure 3-13. Figure 3-13 Package Time Line Showing Important Events The following are the most important moments in a package’s life: 1. Before the control script starts 2. During run script execution 3. While services are running 4. When a service, subnet, or monitored resource fails 5.
Understanding Serviceguard Software Components How Package Control Scripts Work the package cannot start on this node. Another type of resource is a dependency on a monitored external resource. If monitoring shows a value for a configured resource that is outside the permitted range, the package cannot start. Once a node is selected, a check is then done to make sure the node allows the package to start on it. Then services are started up for a package by the control script on the selected node.
Understanding Serviceguard Software Components How Package Control Scripts Work At any step along the way, an error will result in the script exiting abnormally (with an exit code of 1). For example, if a package service is unable to be started, the control script will exit with an error. Also, if the run script execution is not complete before the time specified in the RUN_SCRIPT_TIMEOUT, the package manager will kill the script.
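In the package ASCII configuration file, the control script and its timeouts are declared with entries such as the following; the path is hypothetical and NO_TIMEOUT is simply the template default:

   RUN_SCRIPT           /etc/cmcluster/pkg1/pkg1.cntl
   RUN_SCRIPT_TIMEOUT   NO_TIMEOUT
   HALT_SCRIPT          /etc/cmcluster/pkg1/pkg1.cntl
   HALT_SCRIPT_TIMEOUT  NO_TIMEOUT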
Understanding Serviceguard Software Components How Package Control Scripts Work an error, but starting the package on another node might succeed. A package with a RESTART exit is disabled from running on the local node, but can still run on other nodes. • Timeout—Another type of exit occurs when the RUN_SCRIPT_TIMEOUT is exceeded. In this scenario, the package is killed and disabled globally. It is not disabled on the current node, however.
Understanding Serviceguard Software Components How Package Control Scripts Work card. If a service fails but the RESTART parameter for that service is set to a value greater than 0, the service will restart, up to the configured number of restarts, without halting the package. If there is a configured EMS resource dependency and there is a trigger that causes an event, the package will be halted.
Understanding Serviceguard Software Components How Package Control Scripts Work When a Package is Halted with a Command The Serviceguard cmhaltpkg command has the effect of executing the package halt script, which halts the services that are running for a specific package. This provides a graceful shutdown of the package that is followed by disabling automatic package startup (AUTO_RUN).
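For example, a manual move of a package from one node to another, using hypothetical package and node names, might look like this:

   cmhaltpkg pkg1              # halt the package gracefully (disables AUTO_RUN)
   cmrunpkg -n ftsys10 pkg1    # start the package on the other node
   cmmodpkg -e pkg1            # re-enable automatic package switching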
Understanding Serviceguard Software Components How Package Control Scripts Work Figure 3-15 Package Time Line for Halt Script Execution At any step along the way, an error will result in the script exiting abnormally (with an exit code of 1). Also, if the halt script execution is not complete before the time specified in the HALT_SCRIPT_TIMEOUT, the package manager will kill the script. During halt script execution, messages are written to a log file in the same directory as the halt script.
Understanding Serviceguard Software Components How Package Control Scripts Work Package Control Script Error and Exit Conditions Table 3-4 shows the possible combinations of error condition, failfast setting and package movement for failover packages.
Understanding Serviceguard Software Components How Package Control Scripts Work Table 3-4 Error Conditions and Package Movement (Continued) Package Error Condition Results HP-UX Status on Primary after Error Halt script runs after Error or Exit Package Allowed to Run on Primary Node after Error Package Allowed to Run on Alternate Node Node Failfast Enabled Service Failfast Enabled Halt Script Exit 1 NO Either Setting Running N/A Yes No Halt Script Timeout YES Either Setting TOC N/A N/A
Understanding Serviceguard Software Components How Package Control Scripts Work Table 3-4 Error Conditions and Package Movement (Continued) Package Error Condition Error or Exit Code Loss of Monitored Resource Chapter 3 Node Failfast Enabled Service Failfast Enabled NO Either Setting Results HP-UX Status on Primary after Error Halt script runs after Error or Exit Running Yes Package Allowed to Run on Primary Node after Error Yes, if the resource is not a deferred resource.
Understanding Serviceguard Software Components How the Network Manager Works How the Network Manager Works The purpose of the network manager is to detect and recover from network card and cable failures so that network services remain highly available to clients. In practice, this means assigning IP addresses for each package to the primary LAN interface card on the node where the package is running and monitoring the health of all interfaces, switching them when necessary.
Understanding Serviceguard Software Components How the Network Manager Works adoptive node if control of the package is transferred. This means that applications can access the package via its relocatable address without knowing which node the package currently resides on. Types of IP Addresses Both IPv4 and IPv6 address types are supported in Serviceguard. IPv4 addresses are the traditional addresses of the form “n.n.n.n” where ‘n’ is a decimal digit between 0 and 255.
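Relocatable addresses are added to and removed from an interface by the package control script by means of the cmmodnet command; as a sketch with placeholder addresses:

   cmmodnet -a -i 192.10.25.12 192.10.25.0    # add a relocatable IP address to the subnet
   cmmodnet -r -i 192.10.25.12 192.10.25.0    # remove it again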
Understanding Serviceguard Software Components How the Network Manager Works node is assigned to be the poller. The poller will poll the other primary and standby interfaces in the same bridged net on that node to see whether they are still healthy. Normally, the poller is a standby interface; if there are no standby interfaces in a bridged net, the primary interface is assigned the polling task. (Bridged nets are explained in the section on “Redundant Network Components” in Chapter 2.
Understanding Serviceguard Software Components How the Network Manager Works Local Switching A local network switch involves the detection of a local network interface failure and a failover to the local backup LAN card. The backup LAN card must not have any IP addresses configured. In the case of local network switch, TCP/IP connections are not lost for Ethernet, but IEEE 802.3 connections will be lost.
Understanding Serviceguard Software Components How the Network Manager Works Figure 3-16 shows two nodes connected in one bridged net. LAN segments 1 and 2 are connected by a hub. Figure 3-16 Cluster Before Local Network Switching Node 1 and Node 2 are communicating over LAN segment 2. LAN segment 1 is a standby. In Figure 3-17, we see what would happen if the LAN segment 2 network interface card on Node 1 were to fail.
Understanding Serviceguard Software Components How the Network Manager Works Figure 3-17 Cluster After Local Network Switching As the standby interface takes over, IP addresses will be switched to the hardware path associated with the standby interface. The switch is transparent at the TCP/IP level. All applications continue to run on their original nodes. During this time, IP traffic on Node 1 will be delayed as the transfer occurs.
Understanding Serviceguard Software Components How the Network Manager Works Figure 3-18 Local Switching After Cable Failure Local network switching will work with a cluster containing one or more nodes. You may wish to design a single-node cluster in order to take advantages of this local network switching feature in situations where you need only one node and do not wish to set up a more complex cluster.
Understanding Serviceguard Software Components How the Network Manager Works connections are lost. TCP applications must reconnect to regain connectivity; this is not handled automatically. Note that if the package is dependent on multiple subnetworks, all subnetworks must be available on the target node before the package will be started. Note that remote switching is supported only between LANs of the same type.
Understanding Serviceguard Software Components How the Network Manager Works Once enabled, each link aggregate can be viewed as a single logical link of multiple physical ports with only one IP and MAC address. HP-APA can aggregate up to four physical ports into one link aggregate; the number of link aggregates allowed per system is 50. Empty link aggregates will have zero MAC addresses. You can aggregate the ports within a multi-ported networking card (cards with up to four ports are currently available).
Understanding Serviceguard Software Components How the Network Manager Works this example, the aggregated ports are collectively known as lan900, the name by which the aggregate is known on HP-UX 11i (on HP-UX 11.0, the aggregates would begin with lan100).
Understanding Serviceguard Software Components How the Network Manager Works Support for HP-UX VLAN VLAN is supported with Serviceguard starting from A.11.14 on HP-UX 11i. The support of VLAN is similar to other link technologies. VLAN interfaces can be used as heartbeat as well as data networks in the cluster. The Network Manager will monitor the health of VLAN interfaces configured in the cluster, and perform local and remote failover of VLAN interfaces when failure is detected.
Understanding Serviceguard Software Components How the Network Manager Works • Only port-based and IP subnet-based VLANs are supported. Protocol-based VLAN will not be supported because Serviceguard does not support any transport protocols other than TCP/IP. • Each VLAN interface must be assigned an IP address in a unique subnet in order to operate properly unless it is used as a standby of a primary VLAN interface. • VLAN interfaces over APA aggregates are not supported.
Understanding Serviceguard Software Components Volume Managers for Data Storage Volume Managers for Data Storage A volume manager is a tool that lets you create units of disk storage known as storage groups. Storage groups contain logical volumes for use on single systems and in high availability clusters. In Serviceguard clusters, storage groups are activated by package control scripts.
Understanding Serviceguard Software Components Volume Managers for Data Storage Figure 3-20 Physical Disks Within Shared Storage Units Figure 3-21 shows the individual disks combined in a multiple disk mirrored configuration. Figure 3-21 Mirrored Physical Disks Figure 3-22 shows the mirrors configured into LVM volume groups, shown in the figure as /dev/vgpkgA and /dev/vgpkgB. The volume groups are activated by Serviceguard packages for use by highly available applications.
Understanding Serviceguard Software Components Volume Managers for Data Storage Figure 3-22 Multiple Devices Configured in Volume Groups Examples of Storage on Disk Arrays Figure 3-23 shows an illustration of storage configured on a disk array. Physical disks are configured by an array utility program into logical units or LUNs which are then seen by the operating system.
Understanding Serviceguard Software Components Volume Managers for Data Storage NOTE LUN definition is normally done using utility programs provided by the disk array manufacturer. Since arrays vary considerably, you should refer to the documentation that accompanies your storage unit. Figure 3-24 shows LUNs configured with multiple paths (links) to provide redundant pathways to the data.
Understanding Serviceguard Software Components Volume Managers for Data Storage Figure 3-25 Multiple Paths in Volume Groups Types of Volume Manager Serviceguard allows a choice of volume managers for data storage: • HP-UX Logical Volume Manager (LVM) and (optionally) MirrorDisk/UX • VERITAS Volume Manager for HP-UX (VxVM)—Base and Add-on Products • VERITAS Cluster Volume Manager for HP-UX (CVM) Separate sections in Chapters 5 and 6 explain how to configure cluster storage using all of these volume m
Understanding Serviceguard Software Components Volume Managers for Data Storage NOTE The HP-UX Logical Volume Manager is described in Managing Systems and Workgroups. A complete description of VERITAS volume management products is available in the VERITAS Volume Manager for HP-UX Release Notes. HP-UX Logical Volume Manager (LVM) Logical Volume Manager (LVM) is the legacy storage management product on HP-UX. Included with the operating system, LVM is available on all cluster nodes.
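As a brief sketch of the LVM steps that typically lie behind a package's storage (device file names, minor number, and sizes are placeholders; the full procedure is given in the chapter "Building an HA Cluster Configuration"):

   pvcreate /dev/rdsk/c1t2d0                 # placeholder disk device file
   mkdir /dev/vgpkgA
   mknod /dev/vgpkgA/group c 64 0x010000
   vgcreate /dev/vgpkgA /dev/dsk/c1t2d0
   lvcreate -L 500 -n lvol1 /dev/vgpkgA
   vgchange -a e vgpkgA                      # exclusive activation, as the package control script does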
Understanding Serviceguard Software Components Volume Managers for Data Storage VxVM can be used in clusters that: • are of any size, up to 16 nodes. • require a fast cluster startup time. • do not require shared storage group activation. • do not have all nodes cabled to all disks. • need to use RAID 5 or striped mirroring. • have multiple heartbeat subnets configured. Propagation of Disk Groups in VxVM With VxVM, a disk group can be created on any node, with the cluster up or not.
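As a rough sketch (disk group, disk, and volume names are hypothetical, and the disk is assumed to have already been initialized for VxVM), a disk group intended for package use can be created on one node and deported so that the package control script can import it on whichever node runs the package:

   vxdg init dg01 c1t2d0                 # hypothetical disk group and disk names
   vxassist -g dg01 make lvol1 1024m
   vxdg deport dg01
   # At package startup the control script imports it along these lines:
   # vxdg -tfC import dg01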
Understanding Serviceguard Software Components Volume Managers for Data Storage system multi-node package known as VxVM-CVM-pkg, which runs on all nodes in the cluster. The cluster must be up and must be running this package in order to configure VxVM disk groups for use with CVM. CVM allows you to activate storage on one node at a time, or you can perform write activation on one node and read activation on another node at the same time (for example, allowing backups).
Understanding Serviceguard Software Components Volume Managers for Data Storage Single Heartbeat Subnet Required with CVM It is normally recommended that you configure all subnets that interconnect cluster nodes as heartbeat networks, since this increases protection against multiple faults at no additional cost. However, if you will be using the VERITAS Cluster Volume Manager (CVM), only a single heartbeat subnet is allowed.
Understanding Serviceguard Software Components Volume Managers for Data Storage Table 3-5 Pros and Cons of Volume Managers with Serviceguard Product Logical Volume Manager (LVM) Base-VxVM Chapter 3 Pros • Legacy system is robust and familiar to HP-UX users • Existing packages do not need to be changed • Supports up to 16 nodes per cluster • Supports use of PV links for multiple data paths • Supports exclusive activation as well as read-only activation from multiple nodes • Can be used to con
Understanding Serviceguard Software Components Volume Managers for Data Storage Table 3-5 Pros and Cons of Volume Managers with Serviceguard Product VERITAS Volume Manager— B9116AA (VxVM) VERITAS Cluster Volume Manager— B9117AA (CVM) 118 Pros • Disk group configuration from any node • Enhanced set of volume management features, including software mirroring, RAID 0/1, RAID 5, and dynamic multi-pathing for active/active storage devices • Cluster startup time is faster than with CVM.
Understanding Serviceguard Software Components Responses to Failures Responses to Failures Serviceguard responds to different kinds of failures in specific ways. For most hardware failures, the response is not user-configurable, but for package and service failures, you can choose the system’s response, within limits.
Understanding Serviceguard Software Components Responses to Failures Responses to Hardware Failures If a serious system problem occurs, such as a system panic or physical disruption of the SPU's circuits, Serviceguard recognizes a node failure and transfers the packages currently running on that node to an adoptive node elsewhere in the cluster. The new location for each package is determined by that package's configuration file, which lists primary and alternate nodes for the package.
Understanding Serviceguard Software Components Responses to Failures If you wish, you can modify this default behavior by specifying that the node should crash (TOC) before the transfer takes place. (In a very few cases, Serviceguard will attempt to reboot the system prior to a TOC when this behavior is specified.) If there is enough time to flush the buffers in the buffer cache, the reboot is successful, and a TOC does not take place.
4 Planning and Documenting an HA Cluster Building a Serviceguard cluster begins with a planning phase in which you gather and record information about all the hardware and software components of the configuration. Planning starts with a simple list of hardware and network components. As the installation and configuration continue, the list is extended and refined.
Planning and Documenting an HA Cluster • CVM and VxVM Planning • Cluster Configuration Planning • Package Configuration Planning The description of each planning step in this chapter is accompanied by a worksheet on which you can optionally record the parameters and other data relevant for successful setup and maintenance. As you go through each step, record all the important details of the configuration so as to document your production system.
Planning and Documenting an HA Cluster General Planning General Planning A clear understanding of your high availability objectives will quickly help you to define your hardware requirements and design your system. Use the following questions as a guide for general planning: 1. What applications must continue to be available in the event of a failure? 2. What system resources (processing power, networking, SPU, memory, disk space) are needed to support these applications? 3.
Planning and Documenting an HA Cluster General Planning Planning for Expansion When you first set up the cluster, you indicate a set of nodes and define a group of packages for the initial configuration. At a later time, you may wish to add additional nodes and packages, or you may wish to use additional disk hardware for shared data storage. If you intend to expand your cluster without the need to bring it down, careful planning of the initial configuration is required.
Planning and Documenting an HA Cluster Hardware Planning Hardware Planning Hardware planning requires examining the physical hardware itself. One useful procedure is to sketch the hardware configuration in a diagram that shows adapter cards and buses, cabling, disks and peripherals. A sample diagram for a two-node cluster is shown in Figure 4-1. Figure 4-1 Sample Cluster Configuration Create a similar sketch for your own cluster, and record the information on the Hardware Worksheet.
Planning and Documenting an HA Cluster Hardware Planning • Network Information • Disk I/O Information SPU Information SPU information includes the basic characteristics of the systems you are using in the cluster. Different models of computers can be mixed in the same cluster. This configuration model also applies to HP Integrity servers. The series 700 workstations are not supported for Serviceguard.
Planning and Documenting an HA Cluster Hardware Planning LAN Information While a minimum of one LAN interface per subnet is required, at least two LAN interfaces, one primary and one or more standby, are needed to eliminate single points of network failure. It is recommended that you configure heartbeats on all subnets, including those to be used for client data. On the worksheet, enter the following for each LAN interface: Subnet Name Enter the IP address mask for the subnet.
Planning and Documenting an HA Cluster Hardware Planning attempt a failover when network traffic is not noticed for a time. (Serviceguard calculates the time depending on the type of LAN card.) The configuration file specifies one of two ways to decide when the network interface card has failed: • INOUT - The default method will count packets sent by polling, and declare a card down only if the count stops incrementing for both the inbound and the outbound packets.
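In the cluster ASCII configuration file this choice is recorded as a network failure detection parameter; assuming the default behavior, the entry looks roughly like the following (check the parameter name and values against the template generated by cmquerycl for your release):

   NETWORK_FAILURE_DETECTION   INOUT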
Planning and Documenting an HA Cluster Hardware Planning RS232 Information If you plan to configure a serial line (RS232), you need to determine the serial device file that corresponds with the serial port on each node. 1. If you are using a MUX panel, make a note of the system slot number that corresponds to the MUX and also note the port number that appears next to the selected port on the panel. 2. On each node, use ioscan -fnC tty to display hardware addresses and device file names.
Planning and Documenting an HA Cluster Hardware Planning Setting SCSI Addresses for the Largest Expected Cluster Size SCSI standards define priority according to SCSI address. To prevent controller starvation on the SPU, the SCSI interface cards must be configured at the highest priorities. Therefore, when configuring a highly available cluster, you should give nodes the highest priority SCSI addresses, and give disks addresses of lesser priority.
Planning and Documenting an HA Cluster Hardware Planning NOTE When a boot/root disk is configured with a low-priority address on a shared SCSI bus, a system panic can occur if there is a timeout on accessing the boot/root device. This can happen in a cluster when many nodes and many disks are configured on the same bus.
Planning and Documenting an HA Cluster Hardware Planning • • • • • vgdisplay -v lvdisplay -v lvlnboot -v vxdg list (VxVM and CVM) vxprint (VxVM and CVM) These are standard HP-UX commands. See their man pages for information of specific usage. The commands should be issued from all nodes after installing the hardware and rebooting the system. The information will be useful when doing storage group and cluster configuration.
Planning and Documenting an HA Cluster Hardware Planning Second Node Name ____ftsys10__________ ============================================================================= Disk I/O Information for Shared Disks: Bus Type _SCSI_ Slot Number _4__ Address _16_ Disk Device File __c0t1d0_ Bus Type _SCSI_ Slot Number _6_ Address _24_ Disk Device File __c0t2d0_ Bus Type ______ Slot Number ___ Address ____ Disk Device File _________ Attach a printout of the output from the ioscan -fnC disk command after inst
Planning and Documenting an HA Cluster Power Supply Planning Power Supply Planning There are two sources of power for your cluster which you will have to consider in your design: line power and uninterruptible power sources (UPS). Loss of a power circuit should not bring down the cluster. Frequently, servers, mass storage devices, and other hardware have two or three separate power supplies, so they can survive the loss of power to one or more power supplies or power circuits.
Planning and Documenting an HA Cluster Power Supply Planning Other Unit Enter the number of any other unit. Power Supply Enter the power supply unit number of the UPS to which the host or other device is connected. Be sure to follow UPS and cabinet power limits as well as SPU power limits. Power Supply Configuration Worksheet The following worksheet will help you organize and record your specific power supply configuration. Make as many copies as you need.
Planning and Documenting an HA Cluster Power Supply Planning 138 Unit Name __________________________ Power Supply _____________________ Unit Name __________________________ Power Supply _____________________ Chapter 4
Planning and Documenting an HA Cluster Quorum Server Planning Quorum Server Planning The quorum server (QS) provides tie-breaking services for clusters. The QS is described in chapter 3 under “Use of the Quorum Server as the Cluster Lock.” A quorum server: NOTE • can be used with up to 50 clusters, not exceeding 100 nodes total. • can support a cluster with any supported number of nodes.
Planning and Documenting an HA Cluster Quorum Server Planning Enter the names (31 bytes or less) of all cluster nodes that will be supported by this quorum server. These entries will be entered into qs_authfile on the system that is running the quorum server process. Quorum Server Worksheet The following worksheet will help you organize and record your specific quorum server hardware configuration. Make as many copies as you need. Fill out the worksheet and keep it for future reference.
Planning and Documenting an HA Cluster LVM Planning LVM Planning You can create storage groups using the HP-UX Logical Volume Manager (LVM), or using VERITAS VxVM and CVM software, which are described in the next section. When designing your disk layout using LVM, you should consider the following: • The root disk should belong to its own volume group.
Planning and Documenting an HA Cluster LVM Planning LVM Worksheet The following worksheet will help you organize and record your specific physical disk configuration. Make as many copies as you need. Fill out the worksheet and keep it for future reference. This worksheet only includes volume groups and physical volumes.
Planning and Documenting an HA Cluster LVM Planning ______________/dev/dsk/c6t2d0________________________ Physical Volume Name: _____________________________________________________ Physical Volume Name: _____________________________________________________ Physical Volume Name: _____________________________________________________ Physical Volume Name: _____________________________________________________ Physical Volume Name: _____________________________________________________ Chapter 4 143
Planning and Documenting an HA Cluster CVM and VxVM Planning CVM and VxVM Planning You can create storage groups using the HP-UX Logical Volume Manager (LVM, described in the previous section), or using VERITAS VxVM and CVM software. When designing a storage configuration using CVM or VxVM disk groups, consider the following: • You must create a rootdg disk group on each cluster node that will be using VxVM storage. This is not the same as the HP-UX root disk, if an LVM volume group is used.
Planning and Documenting an HA Cluster CVM and VxVM Planning includes volume groups and physical volumes. The Package Configuration worksheet (presented later in this chapter) contains more space for recording information about the logical volumes and file systems that are part of each volume group.
Planning and Documenting an HA Cluster Cluster Configuration Planning Cluster Configuration Planning A cluster should be designed to provide the quickest possible recovery from failures. The actual time required to recover from a failure depends on several factors: • The length of the cluster heartbeat interval and node timeout. They should each be set as short as practical, but no shorter than 1,000,000 microseconds (one second) for the heartbeat interval and 2,000,000 microseconds (two seconds) for the node timeout.
Planning and Documenting an HA Cluster Cluster Configuration Planning Heartbeat Subnet and Re-formation Time The speed of cluster re-formation is partially dependent on the type of heartbeat network that is used. Ethernet results in a slower failover time than the other types. If two or more heartbeat subnets are used, the one with the fastest failover time is used.
Planning and Documenting an HA Cluster Cluster Configuration Planning Cluster Lock Disks and Planning for Expansion You can add additional cluster nodes after the cluster is up and running, but doing so without bringing down the cluster requires you to follow some rules. Recall that a cluster with more than 4 nodes may not have a lock disk. Thus, if you plan to add enough nodes to bring the total to more than 4, you should use a quorum server.
Planning and Documenting an HA Cluster Cluster Configuration Planning The quorum server timeout is the time during which the quorum server is not communicating with the cluster. After this time, the cluster will mark the quorum server DOWN. This time is calculated based on Serviceguard parameters, but you can increase it by adding an additional number of microseconds as an extension. The QS_TIMEOUT_EXTENSION is an optional parameter.
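For reference, these quorum server parameters appear in the cluster ASCII configuration file in the following form; the host name and values shown here are examples only, and match the sample entries used later in the configuration chapter:

QS_HOST qshost
QS_POLLING_INTERVAL 120000000
QS_TIMEOUT_EXTENSION 2000000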
Planning and Documenting an HA Cluster Cluster Configuration Planning IP notation indicating the subnet that will carry the cluster heartbeat. Note that heartbeat IP addresses must be on the same subnet on each node. A heartbeat IP address can only be an IPv4 address. If you will be using VERITAS CVM disk groups for storage, you can only use a single heartbeat subnet. In this case, the heartbeat should be configured with standby LANs or as a group of aggregated ports.
Planning and Documenting an HA Cluster Cluster Configuration Planning the first physical lock volume and SECOND_CLUSTER_LOCK_PV for the second physical lock volume. If there is a second physical lock volume, the parameter SECOND_CLUSTER_LOCK_PV is included in the file on a separate line. These parameters are only used when you employ a lock disk for tie-breaking services in the cluster.
Planning and Documenting an HA Cluster Cluster Configuration Planning The time after which a node may decide that the other node has become unavailable and initiate cluster reformation. This parameter is entered in microseconds. Default value is 2,000,000 microseconds in the ASCII file. Minimum is 2 * (Heartbeat Interval). The maximum recommended value for this parameter is 30,000,000 in the ASCII file, or 30 seconds in Serviceguard Manager. The default setting yields the fastest cluster reformations.
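As a point of reference, the two timing parameters discussed above appear in the cluster ASCII file as follows; the values shown are the defaults (one-second heartbeat, two-second node timeout), expressed in microseconds:

HEARTBEAT_INTERVAL 1000000
NODE_TIMEOUT 2000000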
Planning and Documenting an HA Cluster Cluster Configuration Planning interface to make sure it can still send and receive information. Changing this value can affect how quickly a network failure is detected. The minimum value is 1,000,000 (1 second). The maximum value recommended is 15 seconds, and the maximum value supported is 30 seconds. MAX_CONFIGURED_PACKAGES This parameter sets the maximum number of packages that can be configured in the cluster.
Planning and Documenting an HA Cluster Cluster Configuration Planning Specify three things for each policy: USER_NAME, USER_HOST, and USER_ROLE. For Serviceguard Manager, USER_HOST must be the name of the Session node. Policies set in the configuration file of a cluster and its packages must not be conflicting or redundant. For more information, see “Editing Security Files” on page 182.
Planning and Documenting an HA Cluster Cluster Configuration Planning The suitability of an option depends mainly on your network configuration. To determine whether the new INONLY_OR_INOUT option is best for your network configuration, see the white paper “Inbound Failure Detection Enhancement” at http://docs.hp.com/hpux/ha -> Serviceguard White Papers. Cluster Configuration Worksheet The following worksheet will help you to organize and record your cluster configuration.
Planning and Documenting an HA Cluster Cluster Configuration Planning | Power Supply No: ________ =========================================================================== Timing Parameters: =============================================================================== Heartbeat Interval: _1 sec_ =============================================================================== Node Timeout: _2 sec_ =============================================================================== Network Polling Interval: _2
Planning and Documenting an HA Cluster Package Configuration Planning Package Configuration Planning Planning for packages involves assembling information about each group of highly available services. Some of this information is used in creating the package configuration file, and some is used for editing the package control script. NOTE LVM Volume groups that are to be activated by packages must also be defined as cluster aware in the cluster configuration file.
Planning and Documenting an HA Cluster Package Configuration Planning • If a package moves to an adoptive node, what effect will its presence have on performance? Create a list by package of volume groups, logical volumes, and file systems. Indicate which nodes need to have access to common filesystems at different times. It is recommended that you use customized logical volume names that are different from the default logical volume names (lvol1, lvol2, etc.).
Planning and Documenting an HA Cluster Package Configuration Planning Parameters for Configuring EMS Resources Serviceguard provides a set of parameters for configuring EMS resources. These are RESOURCE_NAME, RESOURCE_POLLING_INTERVAL, RESOURCE_START, and RESOURCE_UP_VALUE. Enter these parameters to the package configuration file for each resource the package will be dependent on.
Planning and Documenting an HA Cluster Package Configuration Planning
RESOURCE_START AUTOMATIC
RESOURCE_UP_VALUE = UP
In the package control script, specify only the deferred resources, using the DEFERRED_RESOURCE_NAME parameter: DEFERRED_RESOURCE_NAME[0]="/net/interfaces/lan/status/lan0" DEFERRED_RESOURCE_NAME[1]="/net/interfaces/lan/status/lan1" Planning for Expansion You can add packages to a running cluster. This process is described in the chapter “Cluster and Package Administration.
Planning and Documenting an HA Cluster Package Configuration Planning the ASCII cluster configuration file appear at the end of each entry. The following parameters must be identified and entered on the worksheet for each package: Package name The name of the package. The package name must be unique in the cluster. It is used to start, stop, modify, and view the package. In the ASCII package configuration file, this parameter is PACKAGE_NAME.
Planning and Documenting an HA Cluster Package Configuration Planning The alternate policy is MIN_PACKAGE_NODE, which creates a list of packages running on each node that can run this package, and selects the node that is running the fewest packages. Failback policy The policy used to determine what action the package manager should take if the package is not running on its primary node and its primary node is capable of running the package.
Planning and Documenting an HA Cluster Package Configuration Planning In the ASCII package configuration file, the AUTO_RUN parameter was formerly known as PKG_SWITCHING_ENABLED. The parameter determines how a package may start up. If Automatic Switching is enabled (AUTO_RUN set to YES), the package will start up automatically on an eligible node if one is available, and will be able to fail over automatically to another node.
Planning and Documenting an HA Cluster Package Configuration Planning If NODE_FAIL_FAST_ENABLED is set to YES, the node where the package is running will be halted if one of the following failures occurs: • A package subnet fails and no backup network is available • An EMS resource fails • The halt script does not exist • Serviceguard is unable to execute the halt script • The halt script or the run script times out However, if the package halt script fails with “exit 1”, Serviceguard does not hal
Planning and Documenting an HA Cluster Package Configuration Planning RUN_SCRIPT and the HALT_SCRIPT in the ASCII file. When the package starts, its run script is executed and passed the parameter ‘start’; similarly, at package halt time, the halt script is executed and passed the parameter ‘stop’. If you choose to write separate package run and halt scripts, be sure to include identical configuration information (such as node names, IP addresses, etc.) in both scripts.
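For a package that uses a single control script for both operations, the corresponding entries in the ASCII package configuration file might look like the following sketch (the path name is illustrative):

RUN_SCRIPT /etc/cmcluster/pkg1/control.sh
RUN_SCRIPT_TIMEOUT NO_TIMEOUT
HALT_SCRIPT /etc/cmcluster/pkg1/control.sh
HALT_SCRIPT_TIMEOUT NO_TIMEOUT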
Planning and Documenting an HA Cluster Package Configuration Planning VxVM disk groups are imported at package run time and exported at package halt time. If you are using a large number of VxVM disks in your package, the timeout must be high enough to allow all of them to finish the import or export. NOTE CVM diskgroups This parameter is used for CVM disk groups. Enter the names of all the CVM disk groups the package will use.
Planning and Documenting an HA Cluster Package Configuration Planning running with a TOC. (An attempt is first made to reboot the node prior to the TOC.) The default is Disabled. In the ASCII package configuration file, this parameter is SERVICE_FAIL_FAST_ENABLED, and possible values are YES and NO. The default is NO. Define one SERVICE_FAIL_FAST_ENABLED entry for each service. Service halt timeout In the event of a service halt, Serviceguard will first send out a SIGTERM signal to terminate the service.
Planning and Documenting an HA Cluster Package Configuration Planning Serviceguard Manager in the EMS tab’s Browse button (“Available EMS resources”), or obtain it from the documentation supplied with the resource monitor. A maximum of 60 resources may be defined per cluster. Note also the limit on Resource Up Values described below. Maximum length of the resource name string is 1024 characters. Resource polling interval The frequency of monitoring a configured package resource.
Planning and Documenting an HA Cluster Package Configuration Planning The criteria for judging whether an additional package resource has failed. In the ASCII package configuration file, this parameter is called RESOURCE_UP_VALUE. The Resource Up Value appears on the “Description of selected EMS resources” list provided in Serviceguard Manager’s EMS Browse button, or you can obtain it from the documentation supplied with the resource monitor.
Planning and Documenting an HA Cluster Package Configuration Planning Additional Failover Nodes:__________________________________ Package Run Script: __/etc/cmcluster/pkg1/control.sh__Timeout: _NO_TIMEOUT_ Package Halt Script: __/etc/cmcluster/pkg1/control.
Planning and Documenting an HA Cluster Package Configuration Planning Use VGCHANGE=“vgchange -a y” if you wish to use non-exclusive activation mode. Single node cluster configurations must use non-exclusive activation. CVM_ACTIVATION_CMD Specifies the command for activation of VERITAS CVM disk groups. Use the default CVM_ACTIVATION_CMD=“vxdg -g \$DiskGroup set activation=exclusivewrite” if you want disk groups activated in exclusive write mode.
Planning and Documenting an HA Cluster Package Configuration Planning This array parameter contains a list of the LVM volume groups that will be activated by the package. Enter each VG on a separate line. CVM Disk Groups This array parameter contains a list of the VERITAS CVM disk groups that will be used by the package. Enter each disk group on a separate line. VxVM Disk Groups This array parameter contains a list of the VERITAS VxVM disk groups that will be activated by the package.
Planning and Documenting an HA Cluster Package Configuration Planning The number of mount retries for each filesystem. The default is 0. During startup, if a mount point is busy and FS_MOUNT_RETRY_COUNT is 0, package startup will fail and the script will exit with 1. If a mount point is busy and FS_MOUNT_RETRY_COUNT is greater than 0, the script will attempt to kill the user process responsible for the busy mount point for the number of times specified in FS_MOUNT_RETRY_COUNT.
Planning and Documenting an HA Cluster Package Configuration Planning Specifies the number of concurrent mounts and umounts to allow during package startup or shutdown. The default is 1. Setting this variable to a higher value may improve performance while mounting or unmounting a large number of file systems. If a value less than 1 is specified, the script defaults the variable to 1 and writes a warning message in the package control script log file.
Planning and Documenting an HA Cluster Package Configuration Planning The service name must not contain any of the following illegal characters: space, slash (/), backslash (\), and asterisk (*). All other characters are legal. The service name can contain up to 39 characters. Service Command For each named service, enter a Service Command. This command will be executed through the control script by means of the cmrunserv command.
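The service definition is completed in the package control script. As a sketch only (the service name, monitor script, and restart count shown here are hypothetical examples):

SERVICE_NAME[0]="pkg1_service"
SERVICE_CMD[0]="/etc/cmcluster/pkg1/monitor.sh"
SERVICE_RESTART[0]="-r 2"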
Planning and Documenting an HA Cluster Package Configuration Planning DTC Manager Data Enter a DTC Name for each DTC. For information on using a DTC with Serviceguard, see the chapter entitled “Configuring DTC Manager for Operation with Serviceguard” in the manual Using the HP DTC Manager/UX. The package control script will clean up the environment and undo the operations in the event of an error. Refer to the section on “How Package Control Scripts Work” in Chapter 3 for more information.
Planning and Documenting an HA Cluster Package Configuration Planning CONCURRENT MOUNT/UMOUNT OPERATIONS: ______________________________________ CONCURRENT FSCK OPERATIONS: ______________________________________________ =============================================================================== Network Information: IP[0] ____15.13.171.14____ SUBNET ____15.13.
Building an HA Cluster Configuration 5 Building an HA Cluster Configuration This chapter and the next take you through the configuration tasks required to set up a Serviceguard cluster. These procedures are carried out on one node, called the configuration node, and the resulting binary file is distributed by Serviceguard to all the nodes in the cluster. In the examples in this chapter, the configuration node is named ftsys9, and the sample target node is called ftsys10.
Building an HA Cluster Configuration Preparing Your Systems Preparing Your Systems Before configuring your cluster, ensure that all cluster nodes possess the appropriate security files, kernel configuration and NTP (network time protocol) configuration. Understanding Where Files Are Located Serviceguard uses a special file, /etc/cmcluster.conf, to define the locations for configuration and log files within the HP-UX filesystem.
Building an HA Cluster Configuration Preparing Your Systems NOTE Chapter 5 Do not edit the /etc/cmcluster.conf configuration file.
Building an HA Cluster Configuration Preparing Your Systems Editing Security Files Serviceguard daemons grant access to commands by matching incoming hostname and username against defined access control policies. To understand how to properly configure these policies, administrators need to understand how Serviceguard handles hostnames, IP addresses, usernames and the relevant configuration files. For redundancy, Serviceguard utilizes all available IPv4 networks for communication.
Building an HA Cluster Configuration Preparing Your Systems 10.8.1.132 sly.uksr.hp.com sly 15.145.162.150 bit.uksr.hp.com bit NOTE If you use a fully qualified domain name (FQDN), Serviceguard will only recognize the hostname portion. For example, two nodes gryf.uksr.hp.com and gryf.cup.hp.com could not be in the same cluster, as they would both be treated as the same host gryf. Serviceguard also supports domain name aliases.
Building an HA Cluster Configuration Preparing Your Systems Username Validation Serviceguard relies on the ident service of the client node to verify the username of the incoming network connection. If the Serviceguard daemon is unable to connect to the client's ident daemon, permission will be denied. Root on a node is defined as any user who has the UID of 0.
Building an HA Cluster Configuration Preparing Your Systems Access Roles Serviceguard has two levels of access: • Root Access: Users who have been authorized for root access have total control over the configuration of the cluster and packages. • Non-root Access: Non-root users can be assigned one of four roles: — Monitor: These users have read-only access to the cluster and its packages. Command line users can issue these commands: cmviewcl, cmquerycl, cmgetconf, and cmviewconf.
Building an HA Cluster Configuration Preparing Your Systems Setting access control policies uses different mechanisms depending on the state of the node. Nodes not configured into a cluster use different security configurations than nodes in a cluster. The following two sections discuss how to configure these access control policies. Setting Controls for an Unconfigured Node Serviceguard access control policies define what a remote node can do to the local node.
Building an HA Cluster Configuration Preparing Your Systems Using the cmclnodelist File The cmclnodelist file is not created by default in new installations. If administrators wish to create this "bootstrap" file they should add a comment such as the following: ########################################################### # Do Not Edit This File # This is only a temporary file to bootstrap an unconfigured # node with Serviceguard version A.11.
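Below the comment block, each line of cmclnodelist simply pairs a host with a user who is allowed to configure this node. A sketch, using example host and user names:

gryf.uksr.hp.com root
sly.uksr.hp.com root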
Building an HA Cluster Configuration Preparing Your Systems Using Equivalent Hosts For installations that wish to use hosts.equiv, the primary IP addresses or hostnames for each node in the cluster need to be authorized. For more information on using hosts.equiv, see man hosts.equiv(4) or the HP-UX guide, “Managing Systems and Workgroups”. Though hosts.equiv allows defining any user on any node as equivalent to root, Serviceguard will not grant root access to any user who is not root on the remote node.
Building an HA Cluster Configuration Preparing Your Systems A workaround for the problem that still retains the ability to use conventional name lookup is to configure the /etc/nsswitch.conf file to search the /etc/hosts file when other lookup strategies are not working. In case name services are not available, Serviceguard commands will then use the /etc/hosts file on the local system to do name resolution.
Building an HA Cluster Configuration Preparing Your Systems If a line beginning with the string “hosts:” already exists, then make sure that the text immediately to the right of this string is: files [NOTFOUND=continue] dns This step is critical so that the nodes in the cluster can still resolve hostnames to IP addresses while DNS is down or if the primary LAN is down. 3.
Building an HA Cluster Configuration Preparing Your Systems NOTE The boot, root, and swap logical volumes must be done in exactly the following order to ensure that the boot volume occupies the first contiguous set of extents on the new disk, followed by the swap and the root.
Building an HA Cluster Configuration Preparing Your Systems Choosing Cluster Lock Disks The following guidelines apply if you are using a lock disk. The cluster lock disk is configured on a volume group that is physically connected to all cluster nodes. This volume group may also contain data that is used by packages. When you are using dual cluster lock disks, it is required that the default IO timeout values are used for the cluster lock physical volumes.
Building an HA Cluster Configuration Preparing Your Systems Ensuring Consistency of Kernel Configuration Make sure that the kernel configurations of all cluster nodes are consistent with the expected behavior of the cluster during failover. In particular, if you change any kernel parameters on one cluster node, they may also need to be changed on other cluster nodes that can run the same packages.
Building an HA Cluster Configuration Preparing Your Systems Serviceguard has also been tested with non-default values for these two network parameters: • ip6_nd_dad_solicit_count - This network parameter enables the Duplicate Address Detection feature for IPv6 addresses. For more information, see “IPv6 Relocatable Address and Duplicate Address Detection Feature” on page 431 of this manual.
Building an HA Cluster Configuration Preparing Your Systems nodes may not use a lock disk, but a two-node cluster must use a cluster lock. Thus, if you will eventually need five nodes, you should build an initial configuration that uses a quorum server. If you intend to remove a node from the cluster configuration while the cluster is running, ensure that the resulting cluster configuration will still conform to the rules for cluster locks described above.
Building an HA Cluster Configuration Setting up the Quorum Server Setting up the Quorum Server The quorum server software, which has to be running during cluster configuration, must be installed on a system other than the nodes on which your cluster will be running. NOTE It is recommended that the node on which the quorum server is running be in the same subnet as the clusters for which it is providing services. This will help prevent any network delays which could affect quorum server operation.
Building an HA Cluster Configuration Setting up the Quorum Server Running the Quorum Server The quorum server must be running during the following cluster operations: • when the cmquerycl command is issued. • when the cmapplyconf command is issued. • when there is a cluster re-formation. By default, quorum server run-time messages go to stdout and stderr. It is suggested that you create a directory /var/adm/qs, then redirect stdout and stderr to a file in this directory, for example, /var/adm/qs/qs.
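One common way to keep the quorum server running and to capture its output is a respawn entry in /etc/inittab on the quorum server system. The following is a sketch only; verify the binary path and choose your own log file name, as described in the Quorum Server release notes for your version:

qs:345:respawn:/usr/lbin/qs >> /var/adm/qs/qs.log 2>&1

After editing /etc/inittab, run init q so that init re-reads the file and starts the quorum server.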
Building an HA Cluster Configuration Installing Serviceguard Installing Serviceguard Installing Serviceguard includes updating the software via Software Distributor. It is assumed that you have already installed HP-UX. Use the following steps for each node: 1. Mount the distribution media in the tape drive or CD ROM reader. 2. Run Software Distributor, using the swinstall command. 3. Specify the correct input device. 4. Choose the following bundle from the displayed list: B3935DA Serviceguard 5.
Building an HA Cluster Configuration Creating a Storage Infrastructure with LVM Creating a Storage Infrastructure with LVM In addition to configuring the cluster, you create the appropriate logical volume infrastructure to provide access to data from different nodes. This is done with Logical Volume Manager (LVM), VERITAS Cluster Volume Manager (CVM), or VERITAS Volume Manager (VxVM). You can also use a mixture of volume types, depending on your needs.
Building an HA Cluster Configuration Creating a Storage Infrastructure with LVM Using SAM to Create Volume Groups and Logical Volumes You can use SAM to prepare the volume group and logical volume structure needed for HA packages. In SAM, choose the “Disks and File Systems Area.” Then use the following procedure for each volume group and file system you are using with the package: 1. Select the Volume Groups subarea. 2. From the Actions menu, choose Create or Extend. 3.
Building an HA Cluster Configuration Creating a Storage Infrastructure with LVM Creating Physical Volumes On the configuration node (ftsys9), use the pvcreate command to define disks as physical volumes. This only needs to be done on the configuration node.
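The device file names depend on your hardware; as an illustration only, the two mirror disks used for the vgdatabase examples that follow might be defined like this:

# pvcreate -f /dev/rdsk/c1t2d0
# pvcreate -f /dev/rdsk/c0t2d0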
Building an HA Cluster Configuration Creating a Storage Infrastructure with LVM Creating Logical Volumes Use the following command to create logical volumes (the example is for /dev/vgdatabase): # lvcreate -L 120 -m 1 -s g /dev/vgdatabase This command creates a 120 MB mirrored volume named lvol1. The name is supplied by default, since no name is specified in the command. The -s g option means that mirroring is PVG-strict, that is, the mirror copies of data will be in different physical volume groups.
Building an HA Cluster Configuration Creating a Storage Infrastructure with LVM Creating Volume Groups for Disk Arrays Using PV Links If you are configuring volume groups that use mass storage on HP's HA disk arrays, you should use redundant I/O channels from each node, connecting them to separate ports on the array. Then you can define alternate links (also called PV links) to the LUNs or logical disks you have defined on the array.
Building an HA Cluster Configuration Creating a Storage Infrastructure with LVM /dev/dsk/c0t15d0 /dev/dsk/c1t3d0 Use the following steps to configure a volume group for this logical disk: 1. First, set up the group directory for vgdatabase: # mkdir /dev/vgdatabase 2.
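The numbered steps continue with creating the group special file and then building the volume group over both paths to the array; the following is a sketch, in which the minor number 0x010000 and the device file names are illustrative:

# mknod /dev/vgdatabase/group c 64 0x010000
# vgcreate /dev/vgdatabase /dev/dsk/c0t15d0
# vgextend /dev/vgdatabase /dev/dsk/c1t3d0

Creating the volume group with the first device file and then extending it with the second path to the same LUN establishes /dev/dsk/c1t3d0 as the alternate link (PV link).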
Building an HA Cluster Configuration Creating a Storage Infrastructure with LVM Distributing Volume Groups to Other Nodes After creating volume groups for cluster data, you must make them available to any cluster node that will need to activate the volume group. The cluster lock volume group must be made available to all nodes. Deactivating the Volume Group At the time you create the volume group, it is active on the configuration node (ftsys9, for example).
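Before distributing the volume group, deactivate it and create a map file that carries the volume group name to the other nodes. A sketch, using the vgdatabase example from this chapter:

# vgchange -a n /dev/vgdatabase
# vgexport -p -s -m /tmp/vgdatabase.map /dev/vgdatabase

The -p (preview) option writes the map file without removing the volume group from ftsys9; copy /tmp/vgdatabase.map to ftsys10 before running the vgimport command shown in step 5 below.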
Building an HA Cluster Configuration Creating a Storage Infrastructure with LVM # ls -l /dev/*/group 5. Import the volume group data using the map file from node ftsys9. On node ftsys10, enter: # vgimport -s -m /tmp/vgdatabase.map /dev/vgdatabase Note that the disk device names on ftsys10 may be different from their names on ftsys9. You should check to ensure that the physical volume names are correct throughout the cluster.
Building an HA Cluster Configuration Creating a Storage Infrastructure with LVM 9. Mount and verify the volume group on ftsys10: # mount /dev/vgdatabase/lvol1 /mnt1 10. Unmount the volume group on ftsys10: # umount /mnt1 11. Deactivate the volume group on ftsys10: # vgchange -a n /dev/vgdatabase Making Physical Volume Group Files Consistent Skip ahead to the next section if you do not use physical volume groups for mirrored individual disks in your disk configuration.
Building an HA Cluster Configuration Creating a Storage Infrastructure with LVM Creating Additional Volume Groups The foregoing sections show in general how to create volume groups and logical volumes for use with Serviceguard. Repeat the procedure for as many volume groups as you need to create, substituting other volume group names, logical volume names, and physical volume names. Pay close attention to the disk device names.
Building an HA Cluster Configuration Creating a Storage Infrastructure with VxVM Creating a Storage Infrastructure with VxVM In addition to configuring the cluster, you create the appropriate logical volume infrastructure to provide access to data from different nodes. This is done with Logical Volume Manager (LVM), VERITAS Volume Manager (VxVM), or VERITAS Cluster Volume Manager (CVM). You can also use a mixture of volume types, depending on your needs.
Building an HA Cluster Configuration Creating a Storage Infrastructure with VxVM Note that you should create a root disk group only once on each node. Converting Disks from LVM to VxVM You can use the vxvmconvert(1m) utility to convert LVM volume groups into VxVM disk groups. Before you can do this, the volume group must be deactivated, which means that any package that uses the volume group must be halted. Follow the conversion procedures outlined in the VERITAS Volume Manager Migration Guide.
Building an HA Cluster Configuration Creating a Storage Infrastructure with VxVM # pvremove /dev/rdsk/c0t3d2 # pvcreate -f /dev/rdsk/c0t3d2 Then, use the vxdiskadm program to initialize multiple disks for VxVM, or use the vxdisksetup command to initialize one disk at a time, as in the following example: # /usr/lib/vxvm/bin/vxdisksetup -i c0t3d2 Creating Disk Groups Use vxdiskadm, or use the vxdg command, to create disk groups, as in the following example: # vxdg init logdata c0t3d2 Verify the configuration
Building an HA Cluster Configuration Creating a Storage Infrastructure with VxVM
v  logdata     fsgen    ENABLED  1024000  ACTIVE
pl logdata-01  system   ENABLED  1024000  ACTIVE
NOTE The specific commands for creating mirrored and multi-path storage using VxVM are described in the VERITAS Volume Manager Reference Guide. Creating File Systems If your installation uses file systems, create them next. Use the following commands to create a file system for mounting on the logical volume just created: 1.
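A sketch of those numbered steps, assuming a VxFS file system on a volume named log_files in the logdata disk group with a mount point of /logs (all three names are illustrative):

# newfs -F vxfs /dev/vx/rdsk/logdata/log_files
# mkdir /logs
# mount /dev/vx/dsk/logdata/log_files /logs
# umount /logs

The file system is unmounted again because, in normal operation, the package control script rather than the administrator mounts and unmounts package file systems.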
Building an HA Cluster Configuration Creating a Storage Infrastructure with VxVM # vxdctl enable Re-Importing Disk Groups After deporting disk groups, they are not available for use on the node until they are imported again either by a package control script or with a vxdg import command.
Building an HA Cluster Configuration Configuring the Cluster Configuring the Cluster This section describes how to define the basic cluster configuration. To do this in Serviceguard Manager, the graphical user interface, read the next section. If you want to use Serviceguard commands, skip ahead to the section entitled “Using Serviceguard Commands to Configure the Cluster.” Using Serviceguard Manager to Configure the Cluster Create a session on Serviceguard Manager.
Building an HA Cluster Configuration Configuring the Cluster Using Serviceguard Commands to Configure the Cluster Use the cmquerycl command to specify a set of nodes to be included in the cluster and to generate a template for the cluster configuration file. Node names must be 31 bytes or less. Here is an example of the command: # cmquerycl -v -C /etc/cmcluster/clust1.config -n ftsys9 -n ftsys10 The example creates an ASCII template file in the default cluster configuration directory, /etc/cmcluster.
Building an HA Cluster Configuration Configuring the Cluster Cluster Configuration Template File The following is an example of an ASCII configuration file generated with the cmquerycl command using the -w full option: # # # # # # # ********************************************************************** ********* HIGH AVAILABILITY CLUSTER CONFIGURATION FILE *************** ***** For complete details about cluster parameters and how to ******* ***** set them, consult the Serviceguard manual.
Building an HA Cluster Configuration Configuring the Cluster # # # # # # # # # # # # # # # # # # # # # # # # The default quorum server timeout is calculated from the Serviceguard cluster parameters, including NODE_TIMEOUT and HEARTBEAT_INTERVAL. If you are experiencing quorum server timeouts, you can adjust these parameters, or you can include the QS_TIMEOUT_EXTENSION parameter.
Building an HA Cluster Configuration Configuring the Cluster # Warning: There are no standby network interfaces for lan0. NODE_NAME NETWORK_INTERFACE HEARTBEAT_IP # List of serial device # For example: # SERIAL_DEVICE_FILE lodi lan0 15.13.168.94 file names /dev/tty0p0 # Warning: There are no standby network interfaces for lan0. # Cluster Timing Parameters (microseconds). # # # # # # # # # The NODE_TIMEOUT parameter defaults to 2000000 (2 seconds).
Building an HA Cluster Configuration Configuring the Cluster # # # # # To enable Failover Optimization, set FAILOVER_OPTIMIZATION to TWO_NODE. The default is NONE. FAILOVER_OPTIMIZATION FAILOVER_OPTIMIZATION NONE # Configuration/Reconfiguration Timing Parameters (microseconds). AUTO_START_TIMEOUT 600000000 NETWORK_POLLING_INTERVAL 2000000 # Network Monitor Configuration Parameters. # The NETWORK_FAILURE_DETECTION parameter determines how LAN card failures are detec ted.
Building an HA Cluster Configuration Configuring the Cluster # # # # # # # # # # # # # # # # # # # # # # # # # # # # * MONITOR: read-only capabilities for the cluster and packages * PACKAGE_ADMIN: MONITOR, plus administrative commands for packages in the cluster * FULL_ADMIN: MONITOR and PACKAGE_ADMIN plus the administrative commands for the cluster. Access control policy does not set a role for configuration capability. To configure, a user must log on to one of the cluster’s nodes as root (UID=0).
Building an HA Cluster Configuration Configuring the Cluster # # # # is also still supported for compatibility with earlier versions.) For example: OPS_VOLUME_GROUP /dev/vgdatabase OPS_VOLUME_GROUP /dev/vg02 The man page for the cmquerycl command lists the definitions of all the parameters that appear in this file. Many are also described in the “Planning” chapter. Modify your /etc/cmcluster/clust1.config file to your requirements, using the data on the cluster worksheet.
Building an HA Cluster Configuration Configuring the Cluster NOTE You should not configure a second lock volume group or physical volume unless your configuration specifically requires it. See the discussion “Dual Cluster Lock” in the section “Cluster Lock” in Chapter 3.
Building an HA Cluster Configuration Configuring the Cluster server # timeout, enter: # # QS_HOST qshost # QS_POLLING_INTERVAL 120000000 # QS_TIMEOUT_EXTENSION 2000000 Enter the QS_HOST, QS_POLLING_INTERVAL and, if desired, a QS_TIMEOUT_EXTENSION. Identifying Heartbeat Subnets The cluster ASCII file includes entries for IP addresses on the heartbeat subnet.
Building an HA Cluster Configuration Configuring the Cluster NOTE Remember to tune HP-UX kernel parameters on each node to ensure that they are set high enough for the largest number of packages that will ever run concurrently on that node. Modifying Cluster Timing Parameters The cmquerycl command supplies default cluster timing parameters for HEARTBEAT_INTERVAL and NODE_TIMEOUT. Changing these parameters will directly impact the cluster’s reformation and failover times.
Building an HA Cluster Configuration Configuring the Cluster Access Control Policies New in Serviceguard Version 11.16, Access Control Policies allow non-root users to use common administrative commands. Non-root users of Serviceguard Manager, the graphical user interface, need to have a configured access policy to view and to administer Serviceguard clusters and their packages. In new configurations, it is a good idea to immediately configure at least one monitor access policy.
Building an HA Cluster Configuration Configuring the Cluster Both methods check the following: 226 • Network addresses and connections. • Cluster lock connectivity (if you are configuring a lock disk). • Validity of configuration parameters for the cluster and packages. • Uniqueness of names. • Existence and permission of scripts specified in the command line. • If all nodes specified are in the same heartbeat subnet. • If you specify the wrong configuration filename.
Building an HA Cluster Configuration Configuring the Cluster If the cluster is online, the check also verifies that all the conditions for the specific change in configuration have been met. NOTE Using the -k option means that cmcheckconf only checks disk connectivity to the LVM disks that are identified in the ASCII file. Omitting the -k option (the default behavior) means that cmcheckconf tests the connectivity of all LVM disks on all nodes.
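For example, to verify the cluster ASCII file created earlier while restricting the disk checks to the LVM disks named in that file, you might enter (adjust the file name to your own configuration):

# cmcheckconf -k -v -C /etc/cmcluster/clust1.config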
Building an HA Cluster Configuration Configuring the Cluster Using the -k option means that cmapplyconf only checks disk connectivity to the LVM disks that are identified in the ASCII file. Omitting the -k option (the default behavior) means that cmapplyconf tests the connectivity of all LVM disks on all nodes. Using -k can result in significantly faster operation of the command. NOTE • Deactivate the cluster lock volume group.
Building an HA Cluster Configuration Configuring the Cluster NOTE You must use the vgcfgbackup command to store a copy of the cluster lock disk's configuration data whether you created the volume group using SAM or using HP-UX commands. If the cluster lock disk ever needs to be replaced while the cluster is running, you must use the vgcfgrestore command to restore lock information to the replacement disk.
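A sketch of both operations, in which /dev/vg01 and /dev/rdsk/c0t1d0 are placeholders for your own lock volume group and replacement disk:

# vgcfgbackup /dev/vg01
# vgcfgrestore -n /dev/vg01 /dev/rdsk/c0t1d0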
Building an HA Cluster Configuration Creating a Storage Infrastructure with CVM Creating a Storage Infrastructure with CVM In addition to configuring the cluster, you create the appropriate logical volume infrastructure to provide access to data from different nodes. This is done with Logical Volume Manager (LVM), VERITAS Volume Manager (VxVM), or VERITAS Cluster Volume Manager (CVM). You can also use a mixture of volume types, depending on your needs.
Building an HA Cluster Configuration Creating a Storage Infrastructure with CVM # vxinstall This displays a menu-driven program that steps you through the VxVM/CVM initialization sequence. From the main menu, choose the “Custom” option, and specify the disk you wish to include in rootdg. IMPORTANT The rootdg in the VERITAS Volume Manager is not the same as the HP-UX root disk if an LVM volume group is used for the HP-UX root file system (/). Note also that rootdg cannot be used for shared storage.
Building an HA Cluster Configuration Creating a Storage Infrastructure with CVM WARNING The VxVM-CVM-pkg.conf file should never be edited. After this command completes successfully, you can create disk groups for shared use as described in the following sections. The cluster is now running with a special system multi-node package named VxVM-CVM-pkg, which is on all nodes.
Building an HA Cluster Configuration Creating a Storage Infrastructure with CVM Initializing Disks for CVM You need to initialize the physical disks that will be employed in CVM disk groups. If a physical disk has been previously used with LVM, you should use the pvremove command to delete the LVM header data from all the disks in the volume group (this is not necessary if you have not previously used the disk with LVM).
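The commands parallel those shown earlier for VxVM, with the addition of the -s (shared) option when the disk group is created; run the vxdg -s init command on the CVM master node. A sketch, using example disk and disk group names:

# /usr/lib/vxvm/bin/vxdisksetup -i c4t3d2
# vxdg -s init logdata c4t3d2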
Building an HA Cluster Configuration Creating a Storage Infrastructure with CVM Verify the configuration with the following command: # vxdg list Mirror Detachment Policies with CVM The default CVM disk mirror detachment policy is ‘global’, which means that as soon as one node loses access to a specific mirror copy (plex), that plex is detached on all nodes.
Building an HA Cluster Configuration Creating a Storage Infrastructure with CVM Adding Disk Groups to the Package Configuration After creating units of storage with VxVM commands, you need to specify the CVM disk groups in each package configuration ASCII file. Use one DISK_GROUP parameter for each disk group the package will use. You also need to identify the CVM disk groups, file systems, logical volumes, and mount options in the package control script.
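As an illustration (the disk group, volume, and mount point names are examples only), the corresponding control script entries might look like this, following the array conventions shown in the control script template in the next chapter:

CVM_DG[0]="logdata"
LV[0]="/dev/vx/dsk/logdata/log_files"
FS[0]="/logs"
FS_MOUNT_OPT[0]="-o rw"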
Building an HA Cluster Configuration Managing the Running Cluster Managing the Running Cluster This section describes some approaches to routine management of the cluster. Additional tools and suggestions are found in Chapter 7, “Cluster and Package Maintenance.” Checking Cluster Operation with Serviceguard Manager Serviceguard Manager lets you see all the nodes and packages within a cluster and displays their current status. Refer to the section on “Using Serviceguard Manager” in Chapter 7.
Building an HA Cluster Configuration Managing the Running Cluster • cmrunnode is used to start a node. A non-root user with the role of Full Admin, can run this command from a cluster node or through Serviceguard Manager. • cmhaltnode is used to manually stop a running node. (This command is also used by shutdown(1m).) A non-root with the role of Full Admin can run this command from a cluster node or through Serviceguard Manager. • cmruncl is used to manually start a stopped cluster.
Building an HA Cluster Configuration Managing the Running Cluster • Check the cluster membership on the map or tree to verify that the node has left the cluster. In Serviceguard Manager, open the map or tree or Cluster Properties. On the command line, use the cmviewcl command. • Start the node. In Serviceguard Manager use the Run Node command. On the command line, use the cmrunnode command.
Building an HA Cluster Configuration Managing the Running Cluster Setting up Autostart Features Automatic startup is the process in which each node individually joins a cluster; Serviceguard provides a startup script to control the startup process. Automatic cluster start is the preferred way to start a cluster. No action is required by the system administrator. There are three cases: • The cluster is not running on any node, all cluster nodes must be reachable, and all must be attempting to start up.
Building an HA Cluster Configuration Managing the Running Cluster This system is a node in a high availability cluster. Halting this system may cause applications and services to start up on another node in the cluster. You might wish to include a list of all cluster nodes in this message, together with additional cluster-specific information. The /etc/issue and /etc/motd files may be customized to include cluster-related information.
Building an HA Cluster Configuration Managing the Running Cluster However, you should not try to restart Serviceguard, since data corruption might occur if the node were to attempt to start up a new instance of the application that is still running on the node. Instead of restarting the cluster, choose an appropriate time to shutdown and reboot the node, which will allow the applications to shut down and then permit Serviceguard to restart the cluster after rebooting.
Configuring Packages and Their Services 6 Configuring Packages and Their Services In addition to configuring the cluster, you need to identify the applications and services that you wish to group into packages.
Configuring Packages and Their Services Creating the Package Configuration Creating the Package Configuration The package configuration process defines a set of application services that are run by the package manager when a package starts up on a node in the cluster. The configuration also includes a prioritized list of cluster nodes on which the package can run together with definitions of the acceptable types of failover allowed for the package.
Configuring Packages and Their Services Creating the Package Configuration If you want, you can configure your control script yourself. This may be necessary if you do not use a standard control script. However, once you edit a control script yourself, Serviceguard Manager will never be able to see or modify it again. If you choose to edit the control script, you must also distribute it yourself. See “Configuring in Stages” on page 245.
Configuring Packages and Their Services Creating the Package Configuration 7. Distribute the control script to all nodes. 8. Run the package and ensure that applications run as expected and that the package fails over correctly when services are disrupted. Package Configuration Template File The following is a sample package configuration file template customized for a typical package. Use the information on the Package Configuration worksheet to complete the file.
Configuring Packages and Their Services Creating the Package Configuration # Since SYSTEM_MULTI_NODE packages run on multiple nodes at # one time, following parameters are ignored: # # FAILOVER_POLICY # FAILBACK_POLICY # Since an IP address can not be assigned to more than node at a # time, relocatable IP addresses can not be assigned in the # package control script for multiple node packages.
Configuring Packages and Their Services Creating the Package Configuration # # # # # # # # # # # # # Enter the names of the nodes configured for this package. this line as necessary for additional adoptive nodes. NOTE: Repeat The order is relevant. Put the second Adoptive Node after the first one. Example : NODE_NAME NODE_NAME original_node adoptive_node If all nodes in the cluster are to be specified and order is not important, "NODE_NAME *" may be specified.
Configuring Packages and Their Services Creating the Package Configuration # # # # # # # # # # # Enter the complete path for the run and halt scripts. In most cases the run script and halt script specified here will be the same script, the package control script generated by the cmmakepkg command. This control script handles the run(ning) and halt(ing) of the package. Enter the timeout, specified in seconds, for the run and halt scripts.
Configuring Packages and Their Services Creating the Package Configuration # default will be NO. # # SERVICE_HALT_TIMEOUT is represented as a number of seconds. # This timeout is used to determine the length of time (in # seconds) the cluster software will wait for the service to # halt before a SIGKILL signal is sent to force the termination # of the service. In the event of a service halt, the cluster # software will first send a SIGTERM signal to terminate the # service.
Configuring Packages and Their Services Creating the Package Configuration # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # The RESOURCE_POLLING_INTERVAL indicates how often, in seconds, the resource is to be monitored. It will be defaulted to 60 seconds if RESOURCE_POLLING_INTERVAL is not specified. The RESOURCE_START option can be set to either AUTOMATIC or DEFERRED. The default setting for RESOURCE_START is AUTOMATIC.
Configuring Packages and Their Services Creating the Package Configuration # RESOURCE_POLLING_INTERVAL 120 # RESOURCE_START AUTOMATIC # RESOURCE_UP_VALUE = RUNNING # RESOURCE_UP_VALUE = ONLINE # # Means that the value of resource /net/interfaces/lan/status/lan0 # will be checked every 120 seconds, and is considered to # be ’up’ when its value is "RUNNING" or "ONLINE". ## Uncomment the following lines to specify Package Resource Dependencies.
Configuring Packages and Their Services Creating the Package Configuration • FAILBACK_POLICY. Enter either MANUAL or AUTOMATIC. • NODE_NAME. Enter the name of each node in the cluster on a separate line. • AUTO_RUN. Enter YES to allow the package to start on the first available node, or NO to keep the package from automatic startup. • LOCAL_LAN_FAILOVER_ALLOWED. Enter YES to permit switching of the package IP address to a standby LAN, or NO to keep the package from switching locally.
Configuring Packages and Their Services Creating the Package Configuration • RESOURCE_UP_VALUE. Enter the value or values that determine when the resource is considered to be up. During monitoring, if a different value is found for the resource, the package will fail. • RESOURCE_START. The RESOURCE_START option is used to determine when Serviceguard should start up resource monitoring for EMS resources. The RESOURCE_START option can be set to either AUTOMATIC or DEFERRED.
Configuring Packages and Their Services Creating the Package Configuration There must be no redundancy or conflict in roles. If there is, configuration will fail and you will get a message. It is a good idea, therefore, to look at the cluster configuration file (cmgetconf) before creating any roles in the package’s file. If a role is configured in the cluster, do not configure a role for the same username/hostnode in the package. The exception is wildcards.
Configuring Packages and Their Services Writing the Package Control Script Writing the Package Control Script The package control script contains all the information necessary to run all the services in the package, monitor them during operation, react to a failure, and halt the package when necessary. You can use Serviceguard Manager (in Guided Mode) to create the control script as it creates your package configuration. You can also use HP-UX commands to create or modify the package control script.
Configuring Packages and Their Services Writing the Package Control Script # cmmakepkg -s /etc/cmcluster/pkg1/pkg1.sh You may customize the script, as described in the section “Customizing the Package Control Script.” Creating Packages For Database Products To coordinate the startup and shutdown of database software with cluster node startup and shutdown, you can use the database template files provided with the separately purchasable Enterprise Cluster Master Toolkit product (B5139DA).
Configuring Packages and Their Services Writing the Package Control Script NOTE • Add the names of logical volumes and file systems that will be mounted on them. • Select the appropriate options for the storage activation command (not applicable for basic VxVM disk groups), and also include options for mounting filesystems, if desired. • Specify the filesystem mount retry and unmount count options.
Configuring Packages and Their Services Writing the Package Control Script How Control Scripts Manage VxVM Disk Groups VxVM disk groups outside CVM are outside the control of the Serviceguard cluster. The package control script uses standard VxVM commands to import and deport these disk groups. (For details on importing and deporting disk groups, refer to the discussion of the import and deport options in the vxdg man page.
Configuring Packages and Their Services Writing the Package Control Script disk in the disk group. However, if the node is part of a Serviceguard cluster then on reboot the host ID will be cleared by the owning node from all disks which have the noautoimport flag set, even if the disk group is not under Serviceguard control. This allows all cluster nodes, which have access to the disk group, to be able to import the disks as part of cluster operation.
Configuring Packages and Their Services Writing the Package Control Script # # # # # # # ********************************************************************** * * * HIGH AVAILABILITY PACKAGE CONTROL SCRIPT (template) * * * * Note: This file MUST be edited before it can be used. * * * ********************************************************************** # # # # # The PACKAGE and NODE environment variables are set by Serviceguard at the time the control script is executed.
Configuring Packages and Their Services Writing the Package Control Script # Leave the default # (CVM_ACTIVATION_CMD=”vxdg -g \$DiskGroup set activation=exclusivewrite”) # if you want disk groups activated in the exclusive write mode. # # Uncomment the first line # (CVM_ACTIVATION_CMD=”vxdg -g \$DiskGroup set activation=readonly”), # and comment out the default, if you want disk groups activated in # the readonly mode.
Configuring Packages and Their Services Writing the Package Control Script # # The cvm disk group activation method is defined above. The filesystems # associated with these volume groups are specified below in the CVM_* # variables. # #CVM_DG[0]=”” # VxVM DISK GROUPS # Specify which VxVM disk groups are used by this package. Uncomment # VXVM_DG[0]=”” and fill in the name of your first disk group. You must # begin with VXVM_DG[0], and increment the list in sequence.
Configuring Packages and Their Services Writing the Package Control Script # Specify the filesystems which are used by this package. Uncomment # LV[0]=””; FS[0]=””; FS_MOUNT_OPT[0]=””; FS_UMOUNT_OPT[0]=””; FS_FSCK_OPT[0]=”” # FS_TYPE[0]=”” and fill in the name of your first logical volume, # filesystem, mount, umount and fsck options and filesystem type # for the file system.
Configuring Packages and Their Services Writing the Package Control Script # Specify the number of unmount attempts for each filesystem during package # shutdown. The default is set to 1. FS_UMOUNT_COUNT=1 # # # # # # # # # # # FILESYSTEM MOUNT RETRY COUNT. Specify the number of mount retrys for each filesystem. The default is 0. During startup, if a mount point is busy and FS_MOUNT_RETRY_COUNT is 0, package startup will fail and the script will exit with 1.
Configuring Packages and Their Services Writing the Package Control Script # deactivations to allow during package startup or shutdown. # Setting this value to an appropriate number may improve the performance # while activating or deactivating a large number of volume groups in the # package. If the specified value is less than 1, the script defaults it # to 1 and proceeds with a warning message in the package control script # logfile.
Configuring Packages and Their Services Writing the Package Control Script # # LV[1]=/dev/vg01/lvol2; FS[1]=/pkg01ab; FS_MOUNT_OPT[1]=”-o rw” # FS_UMOUNT_OPT[1]=”-s”; FS_FSCK_OPT[1]=”-s”; FS_TYPE[0]=”vxfs” # : : : # : : : # : : : # LV[49]=/dev/vg01/lvol50; FS[49]=/pkg01bx; FS_MOUNT_OPT[49]=”-o rw” # FS_UMOUNT_OPT[49]=”-s”; FS_FSCK_OPT[49]=”-s”; FS_TYPE[0]=”vxfs” # # IP ADDRESSES # Specify the IP and Subnet address pairs which are used by this package.
Configuring Packages and Their Services Writing the Package Control Script # SERVICE NAMES AND COMMANDS. # Specify the service name, command, and restart parameters which are # used by this package. Uncomment SERVICE_NAME[0]=””, SERVICE_CMD[0]=””, # SERVICE_RESTART[0]=”” and fill in the name of the first service, command, # and restart parameters. You must begin with SERVICE_NAME[0],SERVICE_CMD[0], # and SERVICE_RESTART[0] and increment the list in sequence.
# HA_NFS_SCRIPT_EXTENSION is set to "package1.sh", for example, the name
# of the HA NFS script becomes "ha_nfs.package1.sh".  In any case, the
# HA NFS script must be placed in the same directory as the package
# control script.  This allows multiple packages to be run out of the
# same directory, as needed by SGeSAP.
An example of this portion of the script is shown below, with the date and echo commands included to log starts and halts of the package to a special file.

# START OF CUSTOMER DEFINED FUNCTIONS

# This function is a place holder for customer defined functions.
# You should define all actions you want to happen here, before the service is
# started.  You can create as many functions as you need.
Configuring Packages and Their Services Writing the Package Control Script time, Pkg1 tries to cmmodpkg Pkg2. However, that cmmodpkg command has to wait for Pkg2 startup to complete. Pkg2 tries to cmmodpkg Pkg1, but Pkg2 has to wait for Pkg1 startup to complete, thereby causing a command loop. To avoid this situation, it is a good idea to always specify a RUN_SCRIPT_TIMEOUT and a HALT_SCRIPT_TIMEOUT for all packages, especially packages that use Serviceguard commands in their control scripts.
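For example, the relevant entries in a package ASCII configuration file might look like the following. This is a sketch only: the control script path follows the naming used elsewhere in this chapter, and the 600-second values are illustrative — choose timeouts comfortably longer than your control script normally needs, rather than leaving them unset (NO_TIMEOUT).

RUN_SCRIPT              /etc/cmcluster/pkg1/pkg1.cntl
RUN_SCRIPT_TIMEOUT      600
HALT_SCRIPT             /etc/cmcluster/pkg1/pkg1.cntl
HALT_SCRIPT_TIMEOUT     600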
Configuring Packages and Their Services Verifying the Package Configuration Verifying the Package Configuration Serviceguard automatically checks the configuration you enter and reports any errors. If Serviceguard Manager created the file, click the Check button or the Apply button. If you have edited an ASCII package configuration file, use the following command to verify the content of the file: # cmcheckconf -v -P /etc/cmcluster/pkg1/pkg1.config Errors are displayed on the standard output.
Distributing the Configuration

You can use Serviceguard Manager or HP-UX commands to distribute the binary cluster configuration file among the nodes of the cluster.

Distributing the Configuration File and Control Script with Serviceguard Manager

When you have finished creating a package in Serviceguard Manager, click the Apply button. The binary configuration file will be created and automatically distributed to the cluster nodes.
Configuring Packages and Their Services Distributing the Configuration # vgchange -a y /dev/vg01 • Generate the binary configuration file and distribute it across the nodes. # cmapplyconf -v -C /etc/cmcluster/cmcl.config -P \ /etc/cmcluster/pkg1/pkg1.config • If you are using a lock disk, deactivate the cluster lock volume group. # vgchange -a n /dev/vg01 The cmapplyconf command creates a binary version of the cluster configuration file and distributes it to all nodes in the cluster.
Cluster and Package Maintenance 7 Cluster and Package Maintenance This chapter describes how to see cluster configuration and status information, how to start and halt a cluster or an individual node, how to perform permanent reconfiguration, and how to start, halt, move, and modify packages during routine maintenance of the cluster.
Reviewing Cluster and Package Status

You can check status using Serviceguard Manager or on a cluster node’s command line.

Reviewing Cluster and Package Status with Serviceguard Manager

Serviceguard Manager shows status several ways.

Figure 7-1 Reviewing Status: Serviceguard Manager Map

• On the map, cluster object icons have borders to show problems. To the right of the icons, a badge gives information about the type of problem.
• There are more details in the cluster, node, and package property sheets.

Figure 7-2 Reviewing Status: Serviceguard Manager Property Sheet

Reviewing Cluster and Package States with the cmviewcl Command

Information about cluster status is stored in the status database, which is maintained on each individual node in the cluster.
# cmviewcl -v

You can issue the cmviewcl command with non-root access: in clusters with Serviceguard version 11.16 or later, create a Monitor role in the cluster configuration file; in earlier versions, add a non-root user and node pair to the cmclnodelist file.
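For example, a Monitor role is granted with an access control policy entry such as the following in the cluster ASCII configuration file. The user and node names here are placeholders; see the commented template generated by cmquerycl for the exact parameter names and allowed values in your release.

USER_NAME     operator1
USER_HOST     ftsys9
USER_ROLE     MONITOR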
Cluster and Package Maintenance Reviewing Cluster and Package Status Node Status and State The status of a node is either up (active as a member of the cluster) or down (inactive in the cluster), depending on whether its cluster daemon is running or not. Note that a node might be down from the cluster perspective, but still up and running HP-UX. A node may also be in one of the following states: • Failed. A node never sees itself in this state.
Cluster and Package Maintenance Reviewing Cluster and Package Status • Switching Enabled for a Node. Enabled means that the package can switch to the referenced node. Disabled means that the package cannot switch to the specified node until the node is enabled for the package using the cmmodpkg command. Every package is marked Enabled or Disabled for each node that is either a primary or adoptive node for the package. Service Status Services have only status, as follows: • Up.
• MIN_PACKAGE_NODE. The package fails over to the node in the cluster with the fewest running packages on it.

Packages can also be configured with one of two values for the FAILBACK_POLICY parameter:

• AUTOMATIC. With this setting, a package, following a failover, returns to its primary node when the primary node becomes available again.

• MANUAL. With this setting, a package remains on its adoptive node after a failover until you move it back to the primary node yourself.
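Both policies are set in the package ASCII configuration file. A minimal sketch, using the package and node names from this chapter's examples:

PACKAGE_NAME        pkg1
FAILOVER_POLICY     CONFIGURED_NODE
FAILBACK_POLICY     MANUAL
NODE_NAME           ftsys9
NODE_NAME           ftsys10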
   Subnet     up            0            0          15.13.168.0

    Node_Switching_Parameters:
    NODE_TYPE    STATUS       SWITCHING    NAME
    Primary      up           enabled      ftsys9       (current)
    Alternate    up           enabled      ftsys10

NODE         STATUS       STATE
ftsys10      up           running

  Network_Parameters:
  INTERFACE    STATUS       PATH
  PRIMARY      up           28.1
  STANDBY      up           32.
CLUSTER      STATUS
example      up

  NODE         STATUS       STATE
  ftsys9       up           running

  Quorum Server Status:
  NAME           STATUS
  lp-qs          up
...
Cluster and Package Maintenance Reviewing Cluster and Package Status SYSTEM_MULTI_NODE_PACKAGES: PACKAGE STATUS VxVM-CVM-pkg up STATE running NODE ftsys7 STATUS down STATE halted NODE ftsys8 STATUS down STATE halted NODE STATUS ftsys9 up Script_Parameters: ITEM STATUS Service up VxVM-CVM-pkg.srv NODE STATUS ftsys10 up Script_Parameters: ITEM STATUS Service up VxVM-CVM-pkg.
   Failback             manual

   Script_Parameters:
   ITEM       STATUS   MAX_RESTARTS  RESTARTS   NAME
   Service    up       0             0          service1
   Subnet     up       0             0          15.13.168.0
   Resource   up                                /example/float

    Node_Switching_Parameters:
    NODE_TYPE    STATUS       SWITCHING    NAME
    Primary      up           enabled      ftsys9       (current)
    Alternate    up           enabled      ftsys10

NODE         STATUS       STATE
ftsys10      up           running

  Network_Parameters:
  INTERFACE    STATUS       PATH
  PRIMARY      up           28.1
  STANDBY      up           32.
NODE_TYPE    STATUS       SWITCHING    NAME
Primary      up           enabled      ftsys10
Alternate    up           enabled      ftsys9

Pkg2 now has the status “down”, and it is shown as in the unowned state, with package switching disabled. Resource “/example/float,” which is configured as a dependency of pkg2, is down on one node. Note that switching is enabled for both nodes, however.
   Resource   up                                /example/float

    Node_Switching_Parameters:
    NODE_TYPE    STATUS       SWITCHING    NAME
    Primary      up           enabled      ftsys9       (current)
    Alternate    up           enabled      ftsys10

PACKAGE      STATUS       STATE        AUTO_RUN     NODE
pkg2         up           running      disabled     ftsys9

   Policy_Parameters:
   POLICY_NAME     CONFIGURED_VALUE
   Failover        configured_node
   Failback        manual

   Script_Parameters:
   ITEM       STATUS   NAME            RESTARTS
   Service    up       service2.1      0
   Subnet     up       15.13.168.0
# cmmodpkg -e pkg2

The output of the cmviewcl command is now as follows:

CLUSTER      STATUS
example      up

  NODE         STATUS       STATE
  ftsys9       up           running

  PACKAGE      STATUS       STATE        AUTO_RUN     NODE
  pkg1         up           running      enabled      ftsys9
  pkg2         up           running      enabled      ftsys9

  NODE         STATUS       STATE
  ftsys10      up           running

Both packages are now running on ftsys9 and pkg2 is enabled for switching. Ftsys10 is running the daemon and no packages are running on ftsys10.
Cluster and Package Maintenance Reviewing Cluster and Package Status CLUSTER example NODE ftsys9 STATUS up STATUS up Network_Parameters: INTERFACE STATUS PRIMARY up Serial_Heartbeat: DEVICE_FILE_NAME /dev/tty0p0 STATE running PATH 56/36.1 STATUS up NAME lan0 CONNECTED_TO: ftsys10 /dev/tty0 p0 NODE ftsys10 STATUS up Network_Parameters: INTERFACE STATUS PRIMARY up Serial_Heartbeat: DEVICE_FILE_NAME /dev/tty0p0 /dev/tty0p0 STATE running PATH 28.
Cluster and Package Maintenance Reviewing Cluster and Package Status Serial_Heartbeat: DEVICE_FILE_NAME /dev/tty0p0 STATUS unknown CONNECTED_TO: ftsys9 /dev/tty0p0 The following shows status when the serial line is not working: CLUSTER example NODE ftsys9 STATUS up STATUS up Network_Parameters: INTERFACE STATUS PRIMARY up Serial_Heartbeat: DEVICE_FILE_NAME /dev/tty0p0 NODE STATUS ftsys10 up Network_Parameters: INTERFACE STATUS PRIMARY up Serial_Heartbeat: DEVICE_FILE_NAME /dev/tty0p0 /dev/tty0p0 STATE
   ITEM       STATUS   NODE_NAME    NAME
   Resource   up       manx         /resource/random
   Subnet     up       manx         192.8.15.0
   Resource   up       burmese      /resource/random
   Subnet     up       burmese      192.8.15.0
   Resource   up       tabby        /resource/random
   Subnet     up       tabby        192.8.15.0
   Resource   up       persian      /resource/random
   Subnet     up       persian      192.8.15.0
Managing the Cluster and Nodes

Managing the cluster involves the following tasks:

• Starting the Cluster When All Nodes are Down
• Adding Previously Configured Nodes to a Running Cluster
• Removing Nodes from Operation in a Running Cluster
• Halting the Entire Cluster

In Serviceguard 11.16 and later, these tasks can be performed by non-root users, according to access policies in the cluster’s configuration files.
Cluster and Package Maintenance Managing the Cluster and Nodes Using Serviceguard Manager to Start the Cluster Select the cluster icon, then right-click to display the action menu. Select “Run cluster .” The progress window shows messages as the action takes place. This will include messages for starting each node and package. Click OK on the progress window when the operation is complete.
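From the command line, the equivalent operation is cmruncl. For example, to start the cluster on all configured nodes:

# cmruncl -v

To start the cluster on only a subset of the configured nodes, name them explicitly (the node names below are the ones used in this chapter's examples):

# cmruncl -v -n ftsys9 -n ftsys10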
Cluster and Package Maintenance Managing the Cluster and Nodes Adding Previously Configured Nodes to a Running Cluster You can use Serviceguard Manager or Serviceguard commands to bring a configured node up within a running cluster. Using Serviceguard Manager to Add a Configured Node to the Cluster Select the node icon, then right-click to display the action menu. Select “Run node .” The progress window shows messages as the action takes place.
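From the command line, a previously configured node is brought into the running cluster with cmrunnode; for example (using a node name from this chapter's examples):

# cmrunnode -v ftsys10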
Cluster and Package Maintenance Managing the Cluster and Nodes Using Serviceguard Manager to Remove a Node from the Cluster Select the node icon, then right-click to display the action menu. Select “Halt node ” The progress window shows messages as the action takes place. This will include moving any packages on the node to adoptive nodes, if appropriate. Click OK on the progress window when the operation is complete.
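From the command line, a node is removed from active cluster operation with cmhaltnode. The -f option also halts any packages running on the node, allowing them to switch to adoptive nodes where switching is enabled. For example:

# cmhaltnode -f -v ftsys9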
Cluster and Package Maintenance Managing the Cluster and Nodes Using Serviceguard Manager to Halt the Cluster Select the cluster, then right-click to display the action menu. Select “Halt cluster .” The progress window shows messages as the action takes place. This will include messages for halting each package and node. Click OK on the progress window when the operation is complete. Using Serviceguard Commands to Halt a Cluster The cmhaltcl command can be used to halt the entire cluster.
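For example, to halt the entire cluster, halting any packages that are running in the process:

# cmhaltcl -f -v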
Cluster and Package Maintenance Managing Packages and Services Managing Packages and Services Managing packages and services involves the following tasks: • Starting a Package • Halting a Package • Moving a Package (halt, then start) • Changing Package Switching Behavior In Serviceguard 11.16 and later, these commands can be done by non-root users, according to access policies in the cluster’s configuration files.
Cluster and Package Maintenance Managing Packages and Services Using Serviceguard Commands to Start a Package Use the cmrunpkg command to run the package on a particular node, then use the cmmodpkg command to enable switching for the package. Example: # cmrunpkg -n ftsys9 pkg1 # cmmodpkg -e pkg1 This starts up the package on ftsys9, then enables package switching. This sequence is necessary when a package has previously been halted on some node, since halting the package disables switching.
Cluster and Package Maintenance Managing Packages and Services Using Serviceguard Manager to Move a Package The package must be running to start the operation. You can select the package on the map or tree and drag it with your mouse to another cluster node. Or, select the icon of the package you wish to halt, and right-click to display the action list. Select “Move package to node.” Or, select the package and go to the toolbar menu and choose Actions -> Administering.
Cluster and Package Maintenance Managing Packages and Services the package first starts in the cluster. The initial setting for node switching is to allow switching to all nodes that are configured to run the package. Both node switching and package switching can be changed dynamically as the cluster is running. Changing Package Switching with Serviceguard Manager To change package switching or node switching in Serviceguard Manager, select the package on the tree or map.
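From the command line, the same changes are made with cmmodpkg; for example (package and node names as in this chapter's examples):

# cmmodpkg -d pkg1
# cmmodpkg -e pkg1
# cmmodpkg -e -n ftsys10 pkg1

The first command disables package switching for pkg1, the second re-enables it, and the third enables node switching for pkg1 on node ftsys10.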
Cluster and Package Maintenance Managing Packages and Services See the subsequent section “Reconfiguring a Package on a Running Cluster” for detailed instructions on reconfiguration.
Cluster and Package Maintenance Reconfiguring a Cluster Reconfiguring a Cluster You can reconfigure a cluster either when it is halted or while it is still running. Some operations can only be done when the cluster is halted. Table 7-1 shows the required cluster state for many kinds of changes. Table 7-1 Types of Changes to Permanent Cluster Configuration Change to the Cluster Configuration 302 Required Cluster State Add a new node All cluster nodes must be running.
Cluster and Package Maintenance Reconfiguring a Cluster Table 7-1 Types of Changes to Permanent Cluster Configuration Change to the Cluster Configuration Failover Optimization to enable or disable Faster Failover product Required Cluster State Cluster must not be running. Reconfiguring a Halted Cluster You can make a permanent change in cluster configuration when the cluster is halted.
Cluster and Package Maintenance Reconfiguring a Cluster When halted, select the cluster in the map or tree. From the Actions menu, select Configuring Serviceguard. When the Configuring Cluster window opens, click the Parameters tab. Enter the new number. Click Apply. Close the configuration window. (After refresh, check the cluster’s Properties to see the change.) Using Serviceguard Commands to Change MAX_CONFIGURED_PACKAGES The cluster must be halted.
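A sketch of the command sequence, following the cmgetconf / cmcheckconf / cmapplyconf pattern used elsewhere in this chapter (the ASCII file name is arbitrary):

# cmhaltcl -f
# cmgetconf -c clconfig.ascii
    ...edit clconfig.ascii and change the MAX_CONFIGURED_PACKAGES value...
# cmcheckconf -C clconfig.ascii
# cmapplyconf -C clconfig.ascii
# cmruncl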
Cluster and Package Maintenance Reconfiguring a Cluster packages that depend upon that node, the package configuration must also be modified to delete the node. This all must be done in one configuration request (cmapplyconf command). Changes to the package configuration are described in a later section. The following sections describe how to perform dynamic reconfiguration tasks using Serviceguard Manager or Serviceguard commands.
Cluster and Package Maintenance Reconfiguring a Cluster Use cmrunnode to start the new node, and, if desired, set the AUTOSTART_CMCLD parameter to 1 in the /etc/rc.config.d/cmcluster file to enable the new node to join the cluster automatically each time it reboots. NOTE If you add a node to a running cluster that uses CVM disk groups, the disk groups will be available for import when the node joins the cluster.
Cluster and Package Maintenance Reconfiguring a Cluster 3. Edit the file clconfig.ascii to check the information about the nodes that remain in the cluster. 4. Verify the new configuration: # cmcheckconf -C clconfig.ascii 5. Apply the changes to the configuration and send the new binary configuration file to all cluster nodes: # cmapplyconf -C clconfig.
Cluster and Package Maintenance Reconfiguring a Cluster Using Serviceguard Commands to Change the LVM Configuration While the Cluster is Running Use the cmgetconf command to obtain a current copy of the cluster's existing configuration. Example: # cmgetconf -c clconfig.ascii Edit the file clconfig.ascii to add or delete volume groups. Then use the cmcheckconf command to verify the new configuration.
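For example, after editing the file (using the same file name as in the cmgetconf example above):

# cmcheckconf -C clconfig.ascii
# cmapplyconf -C clconfig.ascii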
Cluster and Package Maintenance Reconfiguring a Cluster Create CVM disk groups from this node. Open the configuration ASCII file of the package that uses the CVM storage; add the CVM storage group in a STORAGE_GROUP statement. Then issue the cmapplyconf command. Similarly, you can delete VxVM or CVM disk groups provided they are not being used by a cluster node at the time.
Cluster and Package Maintenance Reconfiguring a Package Reconfiguring a Package The process of reconfiguration of a package is somewhat like the basic configuration described in Chapter 6. Refer to that chapter for details on the configuration process. The cluster can be either halted or running during package reconfiguration. The types of changes that can be made and the times when they take effect depend on whether the package is running or not.
Cluster and Package Maintenance Reconfiguring a Package Reconfiguring a Package on a Running Cluster You can reconfigure a package while the cluster is running, and in some cases you can reconfigure the package while the package itself is running. Only certain changes may be made while the package is running. To modify the package in Serviceguard Manager, select it and then choose Configuring Serviceguard from the Actions menu. When the configuration window opens, choose options as described in Chapter 6.
Cluster and Package Maintenance Reconfiguring a Package distributing the configuration with HP-UX commands. For example, to use HP-UX commands to verify the configuration of newly created pkg1 on a running cluster: # cmcheckconf -P /etc/cmcluster/pkg1/pkg1conf.ascii Use an HP-UX command like the following to distribute the new package configuration to all nodes in the cluster: # cmapplyconf -P /etc/cmcluster/pkg1/pkg1conf.
Cluster and Package Maintenance Reconfiguring a Package NOTE The maximum number of allowable restarts for a given service is set in the package control script parameter SERVICE_RESTART[]. This parameter is not the same as the restart counter, which is maintained separately by the package manager. When a package service successfully restarts after several attempts, the package manager does not automatically reset the restart count.
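You can reset the counter manually while the package is running; in recent Serviceguard releases this is typically done with the -R option of cmmodpkg (confirm the exact syntax in the cmmodpkg (1m) manpage for your release). For example, assuming the service is named service1 in package pkg1:

# cmmodpkg -R -s service1 pkg1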
Cluster and Package Maintenance Reconfiguring a Package Table 7-2 Types of Changes to Packages (Continued) Change to the Package 314 Required Package State Remove a subnet Package must not be running. Add a resource Package must not be running. Remove a resource Package must not be running. Add a volume group Volume group may be configured into the cluster while the cluster is running. The package may be in any state, because the change is made in the control script.
Cluster and Package Maintenance Reconfiguring a Package Table 7-2 Types of Changes to Packages (Continued) Change to the Package Chapter 7 Required Package State Change the order of nodes where a package may run Package may be either running or halted. Change the Package Failover Policy Package may be either running or halted. Change the Package Failback Policy Package may be either running or halted. Change access policy Package may be either running or halted.
Cluster and Package Maintenance Responding to Cluster Events Responding to Cluster Events Serviceguard does not require much ongoing system administration intervention. As long as there are no failures, your cluster will be monitored and protected. In the event of a failure, those packages that you have designated to be transferred to another node will be transferred automatically.
Cluster and Package Maintenance Removing Serviceguard from a System Removing Serviceguard from a System If you wish to remove a node from Serviceguard use, use the swremove command to delete the software. Note the following: Chapter 7 • The cluster should not be running on the node from which you will be deleting Serviceguard. • The node from which you are deleting Serviceguard should not be in the cluster configuration.
Cluster and Package Maintenance Removing Serviceguard from a System 318 Chapter 7
Troubleshooting Your Cluster 8 Troubleshooting Your Cluster This chapter describes how to verify cluster operation, how to review cluster status, how to add and replace hardware, and how to solve some typical cluster problems.
Troubleshooting Your Cluster Testing Cluster Operation Testing Cluster Operation Once you have configured your Serviceguard cluster, you should verify that the various components of the cluster behave correctly in case of a failure. In this section, the following procedures test that the cluster responds properly in the event of a package failure, a node failure, or a LAN failure.
Troubleshooting Your Cluster Testing Cluster Operation 4. Move the package back to the primary node using Serviceguard: Select the package. From the Actions menu, choose Administering Serviceguard -> Move Package. Testing the Cluster Manager To test that the cluster manager is operating correctly, perform the following steps for each node on the cluster: 1. Turn off the power to the node SPU. 2.
Troubleshooting Your Cluster Testing Cluster Operation 2. Disconnect the LAN connection from the Primary card. (Be careful not to break the subnet if you are using ThinLAN cables.) 3. Verify that a local switch has taken place so that the Standby card is now the Primary card. In Serviceguard Manager, check the cluster properties. Or, on the command line, use the cmviewcl -v command. 4. Reconnect the LAN to the original Primary card, and verify its status.
Troubleshooting Your Cluster Monitoring Hardware Monitoring Hardware Good standard practice in handling a high availability system includes careful fault monitoring so as to prevent failures if possible or at least to react to them swiftly when they occur.
Troubleshooting Your Cluster Monitoring Hardware Refer to the EMS Hardware Monitors User’s Guide (B6191-90020) for additional information. Hardware Monitors and Persistence Requests When hardware monitors are disabled using the monconfig tool, associated hardware monitor persistent requests are removed from the persistence files. When hardware monitoring is re-enabled, the monitor requests that were initialized using the monconfig tool are re-created.
Troubleshooting Your Cluster Replacing Disks Replacing Disks The procedure for replacing a faulty disk mechanism depends on the type of disk configuration you are using. Separate descriptions are provided for replacing an array mechanism and a disk in a high availability enclosure. Replacing a Faulty Array Mechanism With any HA disk array configured in RAID 1 or RAID 5, refer to the array’s documentation for instruction on how to replace a faulty mechanism.
Troubleshooting Your Cluster Replacing Disks # vgcfgrestore -n /dev/vg_sg01 /dev/dsk/c2t3d0 6. Issue the following command to extend the logical volume to the newly inserted disk: # lvextend -m 1 /dev/vg_sg01 /dev/dsk/c2t3d0 7. Finally, use the lvsync command for each logical volume that has extents on the failed physical volume. This synchronizes the extents of the new disk with the extents of the other mirror.
Troubleshooting Your Cluster Replacing Disks ILT cable is connected to an HBA, then termination must be disabled on that HBA. Disabling the termination is done on the HBA by removing the termination resistor packs, setting the appropriate DIP switches on the HBA, or by programmatically disabling termination, depending on the HBA being used. (Consult the documentation for the HBA to see which method works for a particular HBA.
Troubleshooting Your Cluster Replacing Disks Figure 8-1 F/W SCSI-2 Buses with In-line Terminators The use of in-line SCSI terminators allows you to do hardware maintenance on a given node by temporarily moving its packages to another node and then halting the original node while its hardware is serviced. Following the replacement, the packages can be moved back to the original node.
Troubleshooting Your Cluster Replacing Disks 5. Replace or upgrade hardware on the node, as needed. 6. Halt all nodes connected to the shared SCSI bus, and power them down. 7. Reconnect the node to the in-line terminator cable or Y cable if necessary. 8. Power-on the nodes in the cluster. If AUTOSTART_CMCLD is set to 1 in the /etc/rc.config.d/cmcluster file, the nodes will automatically start the cluster and the packages. 9.
Troubleshooting Your Cluster Replacement of I/O Cards Replacement of I/O Cards After an I/O card failure, you can replace the card using the following steps. It is not necessary to bring the cluster down to do this if you are using SCSI inline terminators or Y cables at each node. 1. Halt the node. In Serviceguard Manager, select the node; from the Actions menu, choose Administering Serviceguard -> Halt node. Or, from the Serviceguard command line, use the cmhaltnode command.
Troubleshooting Your Cluster Replacement of LAN Cards Replacement of LAN Cards If you have a LAN card failure, which requires the LAN card to be replaced, you can replace it on-line or off-line depending on the type of hardware and operating system you are running. It is not necessary to bring the cluster down to do this. Off-Line Replacement The following steps show how to replace a LAN card off-line. These steps apply to both HP-UX 11.0 and 11i: 1. Halt the node by using the cmhaltnode command. 2.
Troubleshooting Your Cluster Replacement of LAN Cards After Replacing the Card After the on-line or off-line replacement of LAN cards has been done, Serviceguard will detect that the MAC address (LLA) of the card has changed from the value stored in the cluster binary configuration file, and it will notify the other nodes in the cluster of the new MAC address. The cluster will operate normally after this.
Troubleshooting Your Cluster Replacing a Failed Quorum Server System Replacing a Failed Quorum Server System When a quorum server fails or becomes unavailable to the clusters it is providing quorum services for, this will not cause a failure on any cluster. However, the loss of the quorum server does increase the vulnerability of the clusters in case there is an additional failure. Use the following procedure to replace a defective quorum server system.
Troubleshooting Your Cluster Replacing a Failed Quorum Server System The command will output an error message if the specified nodes cannot communicate with the quorum server.
Troubleshooting Your Cluster Troubleshooting Approaches Troubleshooting Approaches The following sections offer a few suggestions for troubleshooting by reviewing the state of the running system and by examining cluster status data, log files, and configuration files.
Troubleshooting Your Cluster Troubleshooting Approaches lan0 305189 lan0 305189 lan1* 41716 1500 15.13.168 15.13.171.23 959269 1500 15.13.168 15.13.171.20 959269 1500 none none 418623 Reviewing the System Log File Messages from the Cluster Manager and Package Manager are written to the system log file. The default location of the log file is /var/adm/syslog/syslog.log. You can use a text editor, such as vi, or the more command to view the log file for historical information on your cluster.
Dec 14 17:33:53 star04 cmlvmd[2049]: Clvmd initialized successfully.
Dec 14 14:34:44 star04 CM-CMD[2054]: cmrunpkg -v pkg5
Dec 14 14:34:44 star04 cmcld[2048]: Request from node star04 to start package pkg5 on node star04.
Dec 14 14:34:44 star04 cmcld[2048]: Executing '/etc/cmcluster/pkg5/pkg5_run start' for package pkg5.
Troubleshooting Your Cluster Troubleshooting Approaches Reviewing Object Manager Log Files The Serviceguard Object Manager daemon cmomd logs messages to the file /var/opt/cmom/cmomd.log. You can review these messages using the cmreadlog command, as follows: # cmreadlog /var/opt/cmom/cmomd.log Messages from cmomd include information about the processes that request data from the Object Manager, including type of data, timestamp, etc.
Troubleshooting Your Cluster Troubleshooting Approaches script appears in the package configuration file, and ensure that all services named in the package configuration file also appear in the package control script. Information about the starting and halting of each package is found in the package’s control script log. This log provides the history of the operation of the package control script. It is found at /etc/cmcluster/package_name/control_script.log.
Troubleshooting Your Cluster Troubleshooting Approaches node-specific items for all nodes in the cluster. cmscancl actually runs several different HP-UX commands on all nodes and gathers the output into a report on the node where you run the command.
Troubleshooting Your Cluster Troubleshooting Approaches • landiag is useful to display, diagnose, and reset LAN card information. • linkloop verifies the communication between LAN cards at MAC address levels. For example, if you enter # linkloop -i4 0x08000993AB72 you should see displayed the following message: Link Connectivity to LAN station: 0x08000993AB72 OK • cmscancl can be used to verify that primary and standby LANs are on the same bridged net.
Troubleshooting Your Cluster Solving Problems Solving Problems Problems with Serviceguard may be of several types. The following is a list of common categories of problem: • Serviceguard Command Hangs. • Cluster Re-formations. • System Administration Errors. • Package Control Script Hangs. • Problems with VxVM Disk Groups. • Package Movement Errors. • Node and Network Failures. • Quorum Server Problems.
Troubleshooting Your Cluster Solving Problems Cluster Re-formations Cluster re-formations may occur from time to time due to current cluster conditions. Some of the causes are as follows: • local switch on an Ethernet LAN if the switch takes longer than the cluster NODE_TIMEOUT value. To prevent this problem, you can increase the cluster NODE_TIMEOUT value, or you can use a different LAN type. • excessive network traffic on heartbeat LANs.
Troubleshooting Your Cluster Solving Problems • ioscan -fnC disk - to see physical disks. • diskinfo -v /dev/rdsk/cxtydz - to display information about a disk. • lssf /dev/dsk/*d0 - to check LV and paths. • vxdg list - to list VERITAS disk groups. • vxprint - to show VERITAS disk group details.
Troubleshooting Your Cluster Solving Problems 2. Ensure that package IP addresses are removed from the system. This step is accomplished via the cmmodnet(1m) command. First determine which package IP addresses are installed by inspecting the output resulting from running the command netstat -in.
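A sketch of this step is shown below. The address and subnet are placeholders; substitute the relocatable IP address and subnet actually configured for the package, as shown by netstat -in and in the package control script.

# netstat -in
# cmmodnet -r -i 15.13.171.14 15.13.168.0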
Troubleshooting Your Cluster Solving Problems control script cannot be trusted to perform cleanup actions correctly, thus the script is terminated and the package administrator is given the opportunity to assess what cleanup steps must be taken. If you want the package to switch automatically in the event of a control script timeout, set the NODE_FAIL_FAST_ENABLED parameter to YES. (If you are using Serviceguard Manager, set Package Failfast to Enabled.
Troubleshooting Your Cluster Solving Problems Once dg_01 has been deported from ftsys9, this package may be restarted via either cmmodpkg(1M) or cmrunpkg(1M). In the event that ftsys9 is either powered off or unable to boot, then dg_01 must be force imported. ******************* WARNING************************** The use of force import can lead to data corruption if ftsys9 is still running and has dg_01 imported.
Troubleshooting Your Cluster Solving Problems Package Movement Errors These errors are similar to the system administration errors except they are caused specifically by errors in the package control script. The best way to prevent these errors is to test your package control script before putting your high availability application on line. Adding a “set -x” statement in the second line of your control script will give you details on where your script may be failing.
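For example, the first lines of the control script might look like the following while you are debugging (a sketch only; leave the interpreter line already present in the generated script as it is):

#!/usr/bin/sh
set -x        # trace each command to the control script log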
Troubleshooting Your Cluster Solving Problems Since your cluster is unique, there are no cookbook solutions to all possible problems. But if you apply these checks and commands and work your way through the log files, you will be successful in identifying and solving problems. Troubleshooting Quorum Server Authorization File Problems The following kind of message in a Serviceguard node’s syslog file or in the output of cmviewcl -v may indicate an authorization problem: Access denied to quorum server 192.6.
Troubleshooting Your Cluster Solving Problems Messages The coordinator node in Serviceguard sometimes sends a request to the quorum server to set the lock state. (This is different from a request to obtain the lock in tie-breaking.
Serviceguard Commands A Serviceguard Commands The following is an alphabetical list of commands used for ServiceGuard cluster configuration and maintenance. Man pages for these commands are available on your system after installation. Table A-1 MC/ServiceGuard Commands Command cmapplyconf Description Verify and apply ServiceGuard cluster configuration and package configuration files.
Serviceguard Commands Table A-1 MC/ServiceGuard Commands (Continued) Command cmapplyconf (continued) Description It is recommended that the user run the cmgetconf command to get either the cluster ASCII configuration file or package ASCII configuration file whenever changes to the existing configuration are required. Note that cmapplyconf will verify and distribute cluster configuration or package files. It will not cause the cluster daemon to start or removed from the cluster configuration.
Serviceguard Commands Table A-1 MC/ServiceGuard Commands (Continued) Command cmdeleteconf Description Delete either the cluster or the package configuration. cmdeleteconf deletes either the entire cluster configuration, including all its packages, or only the specified package configuration. If neither cluster_name nor package_name is specified, cmdeleteconf will delete the local cluster’s configuration and all its packages.
Serviceguard Commands Table A-1 MC/ServiceGuard Commands (Continued) Command cmhaltcl Description Halt a high availability cluster. cmhaltcl causes all nodes in a configured cluster to stop their cluster daemons, optionally halting all packages or applications in the process. This command will halt all the daemons on all currently running systems. If the user only wants to shutdown a subset of daemons, the cmhaltnode command should be used instead. cmhaltnode Halt a node in a high availability cluster.
Serviceguard Commands Table A-1 MC/ServiceGuard Commands (Continued) Command cmhaltserv Description Halt a service from the high availability package halt script. This is not a command line executable command, it runs only from within the package control script. cmhaltserv is used in the high availability package halt script to halt a service. If any part of package is marked down, the package halt script is executed as part of the recovery process.
Serviceguard Commands Table A-1 MC/ServiceGuard Commands (Continued) Command cmmodnet Description Add or remove an address from a high availability cluster. cmmodnet is used in the high availability package control scripts to add or remove an IP_address from the current network interface running the given subnet_name. Extreme caution should be exercised when executing this command outside the context of the package control script.
Serviceguard Commands Table A-1 MC/ServiceGuard Commands (Continued) Command cmquerycl Description Query cluster or node configuration information. cmquerycl searches all specified nodes for cluster configuration and Logical Volume Manager (LVM) information. Cluster configuration information includes network information such as LAN interface, IP addresses, bridged networks and possible heartbeat networks. LVM information includes volume group (VG) interconnection and file system mount point information.
Serviceguard Commands Table A-1 MC/ServiceGuard Commands (Continued) Command cmrunnode Description Run a node in a high availability cluster. cmrunnode causes a node to start its cluster daemon to join the existing cluster Starting a node will not cause any active packages to be moved to the new node. However, if a package is DOWN, has its switching enabled, and is able to run on the new node, that package will automatically run there. cmrunpkg Run a high availability package.
Serviceguard Commands Table A-1 MC/ServiceGuard Commands (Continued) Command cmrunserv Description Run a service from the high availability package run script. This is not a command line executable command, it runs only from within the package control script. cmrunserv is used in the high availability package run script to run a service. If the service process dies, cmrunserv updates the status of the service to down.
Serviceguard Commands Table A-1 MC/ServiceGuard Commands (Continued) Command cmscancl Description Gather system configuration information from nodes with ServiceGuard installed. cmscancl is a configuration report and diagnostic tool which gathers system software and hardware configuration information from a list of nodes, or from all the nodes in a cluster.
Serviceguard Commands Table A-1 MC/ServiceGuard Commands (Continued) Command cmstartres Description Starts resource monitoring on the local node for an EMS resource that is configured in a ServiceGuard package. cmstartres starts resource monitoring for an EMS resource on the local node. This resource must be configured in the specified package_name. cmstopres Stops resource monitoring on the local node for an EMS resource that is configured in a ServiceGuard package.
Serviceguard Commands 362 Appendix A
Enterprise Cluster Master Toolkit B Enterprise Cluster Master Toolkit The Enterprise Cluster Master Toolkit (ECMT) provides a group of example scripts and package configuration files for creating Serviceguard packages for several major database and internet software products. Each toolkit contains a README file that explains how to customize the package for your needs. The ECMT can be installed on HP-UX 11i v1 (HP Product Number B5139EA) or 11i v2 (HP Product Number T1909BA).
Enterprise Cluster Master Toolkit 364 Appendix B
Designing Highly Available Cluster Applications C Designing Highly Available Cluster Applications This appendix describes how to create or port applications for high availability, with emphasis on the following topics: • Automating Application Operation • Controlling the Speed of Application Failover • Designing Applications to Run on Multiple Systems • Restoring Client Connections • Handling Application Failures • Minimizing Planned Downtime Designing for high availability means reducing the
Designing Highly Available Cluster Applications Automating Application Operation Automating Application Operation Can the application be started and stopped automatically or does it require operator intervention? This section describes how to automate application operations to avoid the need for user intervention. One of the first rules of high availability is to avoid manual intervention.
Designing Highly Available Cluster Applications Automating Application Operation Define Application Startup and Shutdown Applications must be restartable without manual intervention. If the application requires a switch to be flipped on a piece of hardware, then automated restart is impossible. Procedures for application startup, shutdown and monitoring must be created so that the HA software can perform these functions automatically.
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Controlling the Speed of Application Failover What steps can be taken to ensure the fastest failover? If a failure does occur causing the application to be moved (failed over) to another node, there are many things the application can do to reduce the amount of time it takes to get the application back up and running.
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Use Raw Volumes If your application uses data, use raw volumes rather than filesystems. Raw volumes do not require an fsck of the filesystem, thus eliminating one of the potentially lengthy steps during a failover. Evaluate the Use of JFS If a file system must be used, a JFS offers significantly faster file system recovery as compared to an HFS. However, performance of the JFS may vary with the application.
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Keep Logs Small Some databases permit logs to be buffered in memory to increase online performance. Of course, when a failure occurs, any in-flight transaction will be lost. However, minimizing the size of this in-memory log will reduce the amount of completed transaction data that would be lost in case of failure.
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Another example is an application where a clerk is entering data about a new employee. Suppose this application requires that employee numbers be unique, and that after the name and number of the new employee is entered, a failure occurs.
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Design for Multiple Servers If you use multiple active servers, multiple service points can provide relatively transparent service to a client. However, this capability requires that the client be smart enough to have knowledge about the multiple servers and the priority for addressing them. It also requires access to the data of the failed server or replicated data.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems Designing Applications to Run on Multiple Systems If an application can be failed to a backup node, how will it work on that different system? The previous sections discussed methods to ensure that an application can be automatically restarted. This section will discuss some ways to ensure the application can run on multiple systems.
Each application or package should be given a unique name as well as a relocatable IP address. Following this rule separates the application from the system on which it runs, thus removing the need for user knowledge of which system the application runs on. It also makes it easier to move the application among different systems in a cluster for load balancing or other reasons.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems Avoid Using SPU IDs or MAC Addresses Design the application so that it does not rely on the SPU ID or MAC (link-level) addresses. The SPU ID is a unique hardware ID contained in non-volatile memory, which cannot be changed. A MAC address (also known as a LANIC id) is a link-specific address associated with the LAN hardware.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems Applications should not reference official hostnames or IP addresses. The official hostname and corresponding IP address for the hostname refer to the primary LAN card and the stationary IP address for that card.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems develop alternate means of verifying where they are running. For example, an application might check a list of hostnames that have been provided in a configuration file. Bind to a Fixed Port When binding a socket, a port address can be specified or one can be assigned dynamically.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems For TCP stream sockets, the TCP level of the protocol stack resolves this problem for the client since it is a connection-based protocol. On the client, TCP ignores the stationary IP address and continues to use the previously bound relocatable IP address originally used by the client. With UDP datagram sockets, however, there is a problem.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems must move together. If the applications’ data stores are in separate volume groups, they can switch to different nodes in the event of a failover. The application data should be set up on different disk drives and if applicable, different mount points. The application should be designed to allow for different disks and separate mount points. If possible, the application should not assume a specific mount point.
Designing Highly Available Cluster Applications Restoring Client Connections Restoring Client Connections How does a client reconnect to the server after a failure? It is important to write client applications to specifically differentiate between the loss of a connection to the server and other application-oriented errors that might be returned. The application should take special action in case of connection loss.
Designing Highly Available Cluster Applications Restoring Client Connections the retry to the current server should continue for the amount of time it takes to restart the server locally. This will keep the client from having to switch to the second server in the event of a application failure. • Use a transaction processing monitor or message queueing software to increase robustness.
Designing Highly Available Cluster Applications Handling Application Failures Handling Application Failures What happens if part or all of an application fails? All of the preceding sections have assumed the failure in question was not a failure of the application, but of another component of the cluster. This section deals specifically with application problems.
Designing Highly Available Cluster Applications Handling Application Failures ensure that the application is behaving correctly. If the application fails and it is not detected automatically, it might take hours for a user to determine the cause of the downtime and recover from it.
Designing Highly Available Cluster Applications Minimizing Planned Downtime Minimizing Planned Downtime Planned downtime (as opposed to unplanned downtime) is scheduled; examples include backups, systems upgrades to new operating system revisions, or hardware replacements. For planned downtime, application designers should consider: • Reducing the time needed for application upgrades/patches.
Designing Highly Available Cluster Applications Minimizing Planned Downtime Provide for Rolling Upgrades Provide for a “rolling upgrade” in a client/server environment. For a system with many components, the typical scenario is to bring down the entire system, upgrade every node to the new version of the software, and then restart the application on all the affected nodes. For large systems, this could result in a long downtime. An alternative is to provide for a rolling upgrade.
Providing Online Application Reconfiguration

Most applications have some sort of configuration information that is read when the application is started. If the application must be halted and a new configuration file read in order to make a configuration change, downtime is incurred. To avoid this downtime, use configuration tools that interact with an application and make dynamic changes online.
Integrating HA Applications with Serviceguard D Integrating HA Applications with Serviceguard The following is a summary of the steps you should follow to integrate an application into the Serviceguard environment: 1. Read the rest of this book, including the chapters on cluster and package configuration, and the appendix “Designing Highly Available Cluster Applications.” 2.
Integrating HA Applications with Serviceguard Checklist for Integrating HA Applications Checklist for Integrating HA Applications This section contains a checklist for integrating HA applications in both single and multiple systems. Defining Baseline Application Behavior on a Single System 1. Define a baseline behavior for the application on a standalone system: • Install the application, database, and other required resources on one of the systems.
Integrating HA Applications with Serviceguard Checklist for Integrating HA Applications • Create the LVM infrastructure on the second system. • Add the appropriate users to the system. • Install the appropriate executables. • With the application not running on the first system, try to bring it up on the second system. You might use the script you created in the step above.
Integrating HA Applications with Serviceguard Checklist for Integrating HA Applications • Repeat failover from node 2 back to node 1. 2. Be sure to test all combinations of application load during the testing. Repeat the failover processes under different application states such as heavy user load versus no user load, batch jobs vs online transactions, etc. 3. Record timelines of the amount of time spent during the failover for each application state.
Rolling Software Upgrades E Rolling Software Upgrades You can upgrade the HP-UX operating system and the Serviceguard software one node at a time without bringing down your clusters. This process can also be used any time one system needs to be taken offline for hardware maintenance or patch installations. Until the process of upgrade is complete on all nodes, you cannot change the cluster configuration files, and you will not be able to use any of the features of the new Serviceguard release.
Rolling Software Upgrades Steps for Rolling Upgrades Steps for Rolling Upgrades Use the following steps: 1. Halt the node you wish to upgrade. This will cause the node's packages to start up on an adoptive node. In Serviceguard Manager, select the node; from the Actions menu, choose Administering Serviceguard, Halt node. Or, on the Serviceguard command line, issue the cmhaltnode command. 2. Edit the /etc/rc.config.d/cmcluster file to include the following line: AUTOSTART_CMCLD = 0 3.
Rolling Software Upgrades Steps for Rolling Upgrades Keeping Kernels Consistent If you change kernel parameters as a part of doing a rolling upgrade, be sure to change the parameters similarly on all nodes that can run the same packages in a failover scenario. Migrating cmclnodelist entries to A.11.16 The cmclnodelist file is deleted when you upgrade to Serviceguard Version A.11.16. The information in it is migrated to the new Access Control Policy form.
Rolling Software Upgrades Example of Rolling Upgrade Example of Rolling Upgrade While you are performing a rolling upgrade warning messages may appear while the node is determining what version of software is running. This is a normal occurrence and not a cause for concern. The following example shows a simple rolling upgrade on two nodes running one package each, as shown in Figure E-1. (This and the following figures show the starting point of the upgrade as Serviceguard 10.10 and HP-UX 10.
Rolling Software Upgrades Example of Rolling Upgrade Figure E-2 Running Cluster with Packages Moved to Node 2 Step 2. Upgrade node 1 to the next operating system release (in this example, HP-UX 11.00), and install the next version of Serviceguard (11.13), as shown in Figure E-3. Figure E-3 Node 1 Upgraded to HP-UX 11.00 Step 3. When upgrading is finished, enter the following command on node 1 to restart the cluster on node 1.
Rolling Software Upgrades Example of Rolling Upgrade Figure E-4 Node 1 Rejoining the Cluster Step 4. Repeat the process on node 2. Halt the node, as follows: # cmhaltnode -f node2 This causes both packages to move to node 1. Then upgrade node 2 to HP-UX 11.00 and Serviceguard 11.13. Figure E-5 Running Cluster with Packages Moved to Node 1 Step 5. Move PKG2 back to its original node.
Rolling Software Upgrades Example of Rolling Upgrade The cmmodpkg command re-enables switching of the package, which is disabled by the cmhaltpkg command. The final running cluster is shown in Figure E-6.
Rolling Software Upgrades Limitations of Rolling Upgrades Limitations of Rolling Upgrades The following limitations apply to rolling upgrades: 398 • During rolling upgrade, you should issue Serviceguard commands (other than cmrunnode and cmhaltnode) only on a node containing the latest revision of the software. Performing tasks on a node containing an earlier revision of the software will not work or will cause inconsistent results.
Blank Planning Worksheets F Blank Planning Worksheets This appendix reprints blank versions of the planning worksheets described in the chapter “Planning and Documenting an HA Cluster.” You can duplicate any of these worksheets that you find useful and fill them in as a part of the planning process.
Blank Planning Worksheets Worksheet for Hardware Planning Worksheet for Hardware Planning HARDWARE WORKSHEET Page ___ of ____ =============================================================================== Node Information: Host Name _____________________ Series No _____________________ Memory Capacity ____________________ Number of I/O Slots ________________ =============================================================================== LAN Information: Name of Subnet _________ Name of IP Interface
Blank Planning Worksheets Worksheet for Hardware Planning Attach a printout of the output from ioscan -f and lssf /dev/*dsk/*s2 after installing disk hardware and rebooting the system. Mark this printout to indicate which physical volume group each disk belongs to. .
Blank Planning Worksheets Power Supply Worksheet Power Supply Worksheet POWER SUPPLY WORKSHEET Page ___ of ____ =============================================================================== SPU Power: Host Name _____________________ Power Supply _______________________ Host Name _____________________ Power Supply _______________________ =============================================================================== Disk Power: Disk Unit __________________________ Power Supply _____________________
Blank Planning Worksheets Quorum Server Worksheet Quorum Server Worksheet Quorum Server Data: ============================================================================== QS Hostname: _________________IP Address: ______________________ ============================================================================== Quorum Services are Provided for: Cluster Name: ___________________________________________________________ Host Names ____________________________________________ Host Names _________________
Blank Planning Worksheets LVM Volume Group and Physical Volume Worksheet LVM Volume Group and Physical Volume Worksheet PHYSICAL VOLUME WORKSHEET Page ___ of ____ =============================================================================== Volume Group Name: ______________________________________________________ PV Link 1 PV Link2 Physical Volume Name:_____________________________________________________ Physical Volume Name:_____________________________________________________ Physical Volume Name:__
Blank Planning Worksheets LVM Volume Group and Physical Volume Worksheet Physical Volume Name: _____________________________________________________ Physical Volume Name: _____________________________________________________ Physical Volume Name: _____________________________________________________ Appendix F 405
Blank Planning Worksheets VxVM Disk Group and Disk Worksheet VxVM Disk Group and Disk Worksheet DISK GROUP WORKSHEET Page ___ of ____ ============================================================== ============= Disk Group Name: __________________________________________________________ Physical Volume Name:______________________________________________________ Physical Volume Name:______________________________________________________ Physical Volume Name:___________________________________________________
Physical Volume Name: _____________________________________________________

Physical Volume Name: _____________________________________________________

Physical Volume Name: _____________________________________________________

Physical Volume Name: _____________________________________________________

Physical Volume Name: _____________________________________________________

Physical Volume Name: _________________________________________________
Cluster Configuration Worksheet

===============================================================================
Name and Nodes:
===============================================================================
Cluster Name: __________________________      OPS Version: _______________

Node Names: _________________________________________________________

Volume Groups (for packages): ________________________________________
===============================================================================
Host node:                                   Role:
===============================================================================
Package Configuration Worksheet

=============================================================================
Package Configuration File Data:
=============================================================================
Package Name: ____________________________

Failover Policy: ___________________________

Failback Policy: ____________________________

Primary Node: ______________________________

First Failover Node: _________________________

Addition
Package Control Script Worksheet

LVM Volume Groups:
     VG[0]_______________  VG[1]________________  VG[2]________________
     VGCHANGE: ______________________________________________

CVM Disk Groups:
     CVM_DG[0]______________  CVM_DG[1]_____________  CVM_DG[2]_______________
     CVM_ACTIVATION_CMD: ______________________________________________

VxVM Disk Groups:
     VXVM_DG[0]_____________  VXVM_DG[1]____________  VXVM_DG[2]_____________
===============================================================================
Deferred Resources:
     Deferred Resource Name __________________
G Migrating from LVM to VxVM Data Storage
This appendix describes how to migrate LVM volume groups to VxVM disk groups for use with the VERITAS Volume Manager (VxVM) or with the Cluster Volume Manager (CVM).
Loading VxVM
Before you can begin migrating data, you must install the VERITAS Volume Manager software and all required VxVM licenses on all cluster nodes. Because this step requires each system to be rebooted, remove the node from the cluster before the installation and restart the node after the installation is complete. This can be done as a part of a rolling upgrade procedure, described in Appendix E.
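As a rough sketch of that sequence on one node (the node name, depot path, and software selection below are placeholders; the actual bundle name depends on the VxVM release you are installing):
# cmhaltnode -f node1
# swinstall -s /depot/path VxVM_bundle
After the installation completes and the node has rebooted:
# cmrunnode node1
The -f option to cmhaltnode halts any packages running on the node; packages with switching enabled can then fail over to an adoptive node during the installation.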
Migrating Volume Groups
The following procedure shows how to migrate individual volume groups for packages that are configured to run on a given node. It is recommended to convert all the volume groups for a package at the same time. It is assumed that VxVM software and an appropriate level of HP-UX and Serviceguard have been installed on the node, and that the node has rebooted and rejoined the cluster.
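Purely as a hedged illustration of the kind of preliminary commands such a migration typically involves (the package name, volume group name, and backup method are placeholders, not part of the documented procedure):
# cmhaltpkg pkg1
# vgchange -a y /dev/vg01
Back up the data in each logical volume of the group using your normal backup tools, then deactivate the volume group:
# vgchange -a n /dev/vg01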
As an alternative to defining the VxVM disk groups on a new set of disks, it is possible to convert existing LVM volume groups into VxVM disk groups in place using the vxvmconvert(1M) utility. This utility is described, along with its limitations and cautions, in the VERITAS Volume Manager Release Notes, available from http://www.docs.hp.com. If you are using the vxvmconvert(1M) utility, skip the next step and proceed to the following section.
Customizing Packages for VxVM
After creating the VxVM disk group, you need to customize the Serviceguard package that will access the storage. Use the following procedure for disk groups that will be used with the VERITAS Volume Manager (VxVM). If you are using the Cluster Volume Manager (CVM), skip ahead to the next section.
1. Rename the old package control script as follows:
# mv Package.ctl Package.ctl.bak
2.
FS[0]="/mnt_dg0101"
FS[1]="/mnt_dg0102"
FS[2]="/mnt_dg0201"
FS[3]="/mnt_dg0202"
FS_MOUNT_OPT[0]="-o ro"
FS_MOUNT_OPT[1]="-o rw"
FS_MOUNT_OPT[2]="-o ro"
FS_MOUNT_OPT[3]="-o rw"
4. Be sure to copy from the old script any user-specific code that may have been added, including environment variables and customer defined functions.
5. Distribute the new package control scripts to all nodes in the cluster.
6.
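For reference, the FS and FS_MOUNT_OPT entries above are normally accompanied in the same control script by VxVM disk group and logical volume entries. The following is a hypothetical example only; the disk group and volume names are placeholders for the names you actually created:
VXVM_DG[0]="dg01"
VXVM_DG[1]="dg02"
LV[0]="/dev/vx/dsk/dg01/lvol101"
LV[1]="/dev/vx/dsk/dg01/lvol102"
LV[2]="/dev/vx/dsk/dg02/lvol201"
LV[3]="/dev/vx/dsk/dg02/lvol202"
Each LV entry points at the VxVM block device file for the volume that is mounted on the corresponding FS entry.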
Customizing Packages for CVM
After creating the VxVM disk group, you need to customize the Serviceguard package that will access the storage. Use the following procedure if you will be using the disk groups with the Cluster Volume Manager (CVM). If you are using the VERITAS Volume Manager (VxVM), use the procedure in the previous section.
1. Rename the old package control script as follows:
# mv Package.ctl Package.ctl.bak
2.
FS[0]="/mnt_dg0101"
FS[1]="/mnt_dg0102"
FS[2]="/mnt_dg0201"
FS[3]="/mnt_dg0202"
FS_MOUNT_OPT[0]="-o ro"
FS_MOUNT_OPT[1]="-o rw"
FS_MOUNT_OPT[2]="-o ro"
FS_MOUNT_OPT[3]="-o rw"
4. Be sure to copy from the old script any user-specific code that may have been added, including environment variables and customer defined functions.
5.
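For reference, in a CVM package control script the FS and FS_MOUNT_OPT entries above are normally accompanied by CVM disk group entries rather than VXVM_DG entries. The following is a hypothetical example only; the disk group names are placeholders, and the activation command shown is just one possible setting:
CVM_DG[0]="dg01"
CVM_DG[1]="dg02"
CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=exclusivewrite"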
11. When CVM starts up, it selects a master node, and this is the node from which you must issue the disk group configuration commands. To determine the master node, issue the following command from each node in the cluster:
# vxdctl -c mode
One node will identify itself as the master.
12. Make the disk group visible to the other nodes in the cluster by issuing the following command on the master node:
# vxdg -s import DiskGroupName
13.
Removing LVM Volume Groups
After testing the new VxVM disk groups, remove from the system any LVM volume groups that are no longer needed, using the standard LVM commands lvremove, pvremove, and vgremove. At a convenient time, you should also edit the cluster ASCII configuration file to remove the VOLUME_GROUP statements that refer to the LVM volume groups that are no longer used in the cluster.
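As a hedged example of the cleanup for one retired volume group (all names and paths are placeholders, and the preconditions for each LVM command are described on its manpage):
# lvremove /dev/vg_old/lvol1       (repeat for each logical volume in the group)
# vgremove /dev/vg_old
# pvremove /dev/rdsk/c1t2d0        (repeat for each disk that belonged to the group)
After deleting the corresponding VOLUME_GROUP line from the cluster ASCII file, re-apply the cluster configuration, for example:
# cmapplyconf -C clusterconfig.ascii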
H IPv6 Network Support
This appendix describes some of the characteristics of IPv6 network addresses.
IPv6 Address Types
Several types of IPv6 addressing schemes are specified in RFC 2373 (IPv6 Addressing Architecture). IPv6 addresses are 128-bit identifiers for interfaces and sets of interfaces, and RFC 2373 defines several address formats. IPv6 addresses are broadly classified into three types, explained in the following table: unicast, anycast, and multicast.
multiple groups of 16 bits of zeros. The "::" can appear only once in an address, and it can be used to compress the leading, trailing, or contiguous sixteen-bit groups of zeros in an address. Example: fec0:1:0:0:0:0:0:1234 can be represented as fec0:1::1234.
• When dealing with a mixed environment of IPv4 and IPv6 nodes, an alternative form of IPv6 address is used. It is x:x:x:x:x:x:d.d.d.d
Unicast Addresses
IPv6 unicast addresses are classified into different types: global aggregatable unicast addresses, site-local addresses, and link-local addresses. Typically, a unicast address is logically divided as follows:

Table H-2

     n bits            128-n bits
     Subnet prefix     Interface ID

Interface identifiers in an IPv6 unicast address are used to identify the interfaces on a link. Interface identifiers are required to be unique on that link.
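For example, taking the site-local address fec0:1::1234 used earlier and assuming a 64-bit subnet prefix, the subnet prefix is fec0:0001:0000:0000 (the first 64 bits) and the interface ID is 0000:0000:0000:1234 (the remaining 64 bits).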
IPv4 Mapped IPv6 Address
There is a special type of IPv6 address that holds an embedded IPv4 address. This address is used to represent the addresses of IPv4-only nodes as IPv6 addresses, and it is used especially by applications that support both IPv6 and IPv4. These addresses are called IPv4 mapped IPv6 addresses. The format of these addresses is as follows:

Table H-4

     80 bits     16 bits     32 bits
     zeros       FFFF        IPv4 address

Example: ::ffff:192.168.0.
Link-Local Addresses
Link-local addresses have the following format:

Table H-6

     10 bits         54 bits     64 bits
     1111111010      0           interface ID

Link-local addresses are intended for addressing nodes on a single link. Packets originating from or destined to a link-local address will not be forwarded by a router.
“FF” at the beginning of the address identifies the address as a multicast address. The “flgs” field is a set of 4 flags, “000T”. The higher-order 3 bits are reserved and must be zero. The last bit, ‘T’, indicates whether the address is permanently assigned: a value of zero indicates a permanent assignment; otherwise it is a temporary assignment. The “scop” field is a 4-bit field that is used to limit the scope of the multicast group.
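As a worked example, in the well-known all-nodes multicast address ff02::1 the leading “FF” identifies a multicast address, the flgs field is 0 (a permanently assigned address), and the scop field is 2 (link-local scope), so the address reaches only the nodes on the local link.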
Network Configuration Restrictions
Serviceguard now supports IPv6 for data links only. The heartbeat IP must still be IPv4, but the package IPs can be IPv4 or IPv6. To configure IPv6, the system should be set up in what is called a dual-stack configuration, which requires the IPv6 product bundle (IPv6NCF11i B.11.11.0109.5C) to be installed.
NOTE
Even though link-local IP addresses are not supported in the Serviceguard cluster configuration, the primary link-local address on the Serviceguard primary interface will be switched over to the standby during a local switch. This is because of two requirements: first, the dual-stack (IPv4/IPv6) kernel requires that the primary IP address associated with an interface must always be a link-local address.
If the result is 1, the feature is turned on. If the result is 0, the feature is turned off. To temporarily change the state of DAD on your computer, use the ndd -set command to change the kernel parameter:
# ndd -set /dev/ip6 ip6_nd_dad_solicit_count n
where n is a number: either 1 to turn the feature on, or 0 to turn it off.
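For instance, the current value can be read back with the corresponding get form of the command; this assumes the same /dev/ip6 device and parameter name used above:
# ndd -get /dev/ip6 ip6_nd_dad_solicit_count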
Local Primary/Standby LAN Patterns
The use of IPv6 allows a number of different patterns of failover among LAN cards configured in the cluster. This is possible because each LAN card can support several IP addresses when a dual IPv4/IPv6 configuration is used. This section describes several ways in which local failover to a standby LAN can be configured.
Example Configurations
An example of a LAN configuration on a cluster node using both IPv4 and IPv6 addresses is shown below.

Figure H-1  Example 1: IPv4 and IPv6 Addresses in Standby Configuration

Following the loss of lan0 or lan2, lan1 can adopt either address, as shown below.

Figure H-2  Example 1: IPv4 and IPv6 Addresses after Failover to Standby

The same LAN card can be configured with both IPv4 and IPv6 addresses, as shown below.
Figure H-3  Example 2: IPv4 and IPv6 Addresses in Standby Configuration

This type of configuration allows failover of both addresses to the standby, as shown below.
Duplicate Address Detection Feature
The IPv6 networking stack has a new feature, Duplicate Address Detection (DAD), that was not previously available in IPv4. When an address is being added, DAD detects whether a duplicate address is already in use on the network. It sends out a multicast message to the network neighborhood and requires at least one second to listen for responses from other nodes.
# NDD_VALUE[index]=n
where index is the next available integer value in the nddconf file, and n is a number: either 1 to turn the feature ON or 0 to turn it OFF.
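As an illustrative sketch only, a complete entry that turns DAD off at boot might look like the following. The index value 2 is simply an example of a next-available slot, and the TRANSPORT_NAME and NDD_NAME lines are shown on the assumption that the file follows the usual /etc/rc.config.d/nddconf layout:
TRANSPORT_NAME[2]=ip6
NDD_NAME[2]=ip6_nd_dad_solicit_count
NDD_VALUE[2]=0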
A Access Control Policies, 169 Access Control Policy, 153 Access roles, 153 active node, 25 adding a package to a running cluster, 311 adding cluster nodes advance planning, 194 adding nodes to a running cluster, 294 adding nodes while the cluster is running, 305 adding packages on a running cluster, 255 additional package resource parameter in package configuration, 167, 168 additional package resources monitoring, 81 addressing, SCSI, 132 administration adding nodes to a ruuning cluster, 294 cluster and
verifying the cluster configuration, 225 VxVM infrastructure, 209 bus type hardware planning, 133 C changes in cluster membership, 64 changes to cluster allowed while the cluster is running, 302 changes to packages allowed while the cluster is running, 313 changing the volume group configuration while the cluster is running, 307 checkpoints, 371 client connections restoring in applications, 380 cluster configuring with commands, 215 MC/ServiceGuard, 24 redundancy of components, 36 typical configuration, 23
in sample configuration file, 216 clusters active/standby type, 54 larger size, 54 cmapplyconf, 227, 273 cmcheckconf, 226, 272 troubleshooting, 339 cmdeleteconf deleting a package configuration, 312 deleting the cluster configuration, 241 cmmodnet assigning IP addresses in control scripts, 96 cmquerycl troubleshooting, 339 CONCURRENT_DISKGROUP_OPERATIO NS parameter in package control script, 173 CONCURRENT_FSCK_OPERATIONS parameter in package control script, 173 CONCURRENT_MOUNT_OPERATIONS parameter in pack
protection through mirroring, 24 disk group planning, 144 disk group and disk planning, 144 disk I/O hardware planning, 133 disk layout planning, 141 disk logical units hardware planning, 133 disk management, 108 disk monitor, 46 disk monitor (EMS), 82 disk storage creating the infrastructure with CVM, 230 disks in MC/ServiceGuard, 44 replacing, 325 supported types in MC/ServiceGuard, 44 disks, mirroring, 45 distributing the cluster and package configuration, 272, 273 DNS services, 188 down time minimizing
MC/ServiceGuard software components, 58 mirrored disks connected for high availability, 48 node 1 rejoining the cluster, 396 node 1 upgraded to HP-UX 10.
heartbeat messages, 24 defined, 62 heartbeat subnet address parameter in cluster manager configuration, 149 HEARTBEAT_INTERVAL in sample configuration file, 216 HEARTBEAT_INTERVAL (heartbeat timeout) parameter in cluster manager configuration, 151 HEARTBEAT_IP in sample configuration file, 216 parameter in cluster manager configuration, 149 high availability, 24 HA cluster defined, 36 objectives in planning, 125 host IP address hardware planning, 129, 139, 140 host name hardware planning, 128 how the cluste
cluster locks and power supplies, 53 use of the cluster lock disk, 66 use of the quorum server, 68 lock disk 4 or more nodes, 67 lock volume group identifying in configuration file, 221 planning, 147 lock volume group, reconfiguring, 303 logical volumes blank planning worksheet, 407 creating for a cluster, 202, 211, 233 creating the infrastructure, 199, 209 planning, 141 worksheet, 142, 144 lssf using to obtain a list of disks, 200 LV in sample package control script, 258 lvextend creating a root mirror wit
failure, 100 load sharing with IP addresses, 97 local interface switching, 99 local switching, 100 redundancy, 38, 42 remote system switching, 102 network communication failure, 121 network components in MC/ServiceGuard, 38 Network Failure Detection parameter, 97 network manager adding and deleting package IP addresses, 97 main functions, 96 monitoring LAN interfaces, 97 testing, 321 network planning subnet, 129, 139 network polling interval (NETWORK_POLLING_INTERVAL) parameter in cluster manager configura
reconfiguring with the cluster offline, 310 remote switching, 102 starting, 297 toolkits for databases, 363 package administration, 297 solving problems, 342 package administration access, 169 package and cluster maintenance, 275 package configuration additional package resource parameter, 167, 168 automatic switching parameter, 162 control script pathname parameter, 162 distributing the configuration file, 272, 273 failback policy parameter, 162 failover policy parameter, 161 in SAM, 244 local switching p
disk I/O information, 133 for expansion, 160 hardware configuration, 127 high availability objectives, 125 LAN information, 128 overview, 123 package configuration, 157 power, 136 quorum server, 139 SCSI addresses, 132 SPU information, 128 volume groups and physical volumes, 141 worksheets, 134 worksheets for physical volume planning, 404 planning and documenting an HA cluster, 123 planning for cluster expansion, 126 planning worksheets blanks, 399 point of failure in networking, 38, 42 point to point con
for heartbeat, 24 re-formation of cluster, 64 re-formation time, 147 relocatable IP address defined, 96 relocatable IP addresses in MC/ServiceGuard packages, 96 remote switching, 102 removing MC/ServiceGuard from a system, 317 removing nodes from operation in a running cluster, 294 removing packages on a running cluster, 255 replacing disks, 325 Resource Name parameter in package configuration, 167, 168 resource polling interval parameter in package configuration, 168 resource up interval parameter in pac
using for heartbeats, 131 SERIAL_DEVICE_FILE(RS232) parameter in cluster manager configuration, 151 service administration, 297 service command variable in package control script, 175, 176 service configuration step by step, 243 service fail fast parameter in package configuration, 166 service failures responses, 120 service halt timeout parameter in package configuration, 167 service name, 174 parameter in package configuration, 166 variable in package control script, 174 service restart parameter variable
subnet hardware planning, 129, 139 parameter in package configuration, 167 supported disks in MC/ServiceGuard, 44 supported networks in MC/ServiceGuard, 38 switching ARP messages after switching, 103 local interface switching, 99 remote system switching, 102 switching IP addresses, 73, 74, 103 system log file troubleshooting, 336 system message changing for clusters, 239 system multi-node package, 71 used with CVM, 231 T tasks in MC/ServiceGuard configuration figure, 33 template ASCII cluster configuration
VxVM, 113 VOLUME_GROUP in sample configuration file, 216 parameter in cluster manager configuration, 153 VxVM, 112, 113 creating a storage infrastructure, 209 migrating from LVM to VxVM, 413 planning, 144 VXVM_DG in package control script, 258 VxVM-CVM package, 71 VxVM-CVM-pkg, 231 W What is MC/ServiceGuard?, 24 worksheet cluster configuration, 155 hardware configuration, 134 package configuration data, 169 package control script, 176 power supply configuration, 137 quorum server configuration, 140 use in p