Managing Serviceguard Fifteenth Edition Manufacturing Part Number: B3936-90135 Reprinted May 2008
Legal Notices © Copyright 1995-2007 Hewlett-Packard Development Company, L.P. Confidential computer software. Valid license from HP required for possession, use, or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor’s standard commercial license. The information contained herein is subject to change without notice.
Printing History

Table 1  Printing History

Printing Date     Part Number    Edition
January 1995      B3936-90001    First
June 1995         B3936-90003    Second
December 1995     B3936-90005    Third
August 1997       B3936-90019    Fourth
January 1998      B3936-90024    Fifth
October 1998      B3936-90026    Sixth
December 2000     B3936-90045    Seventh
September 2001    B3936-90053    Eighth
March 2002        B3936-90065    Ninth
June 2003         B3936-90070    Tenth
June 2004         B3936-90076    Eleventh
June 2005         B3936-90076    Eleventh, First reprint
October 2
May 2008          B3936-90135    Fifteenth, First Reprint

The last printing date and part number indicate the current edition, which applies to Serviceguard version A.11.18. See the latest edition of the Release Notes for a summary of changes in that release. HP Printing Division: Infrastructure Solutions Division Hewlett-Packard Co. 19111 Pruneridge Ave.
Preface This fifteenth edition of the manual applies to Serviceguard Version A.11.18. Earlier versions are available at http://www.docs.hp.com -> High Availability -> Serviceguard. This guide describes how to configure Serviceguard to run on HP 9000 or HP Integrity servers under the HP-UX operating system. The contents are as follows: • Chapter 1, “Serviceguard at a Glance,” describes a Serviceguard cluster and provides a roadmap for using this guide.
Related Publications • Appendix C, “Designing Highly Available Cluster Applications,” gives guidelines for creating cluster-aware applications that provide optimal performance in a Serviceguard environment. • Appendix D, “Integrating HA Applications with Serviceguard,” presents suggestions for integrating your existing applications with Serviceguard. • Appendix E, “Software Upgrades,” shows how to move from one Serviceguard or HP-UX release to another without bringing down your applications.
— Managing HP Serviceguard for Linux • Documentation for your version of Veritas storage products from http://www.docs.hp.com -> High Availability -> HP Serviceguard Storage Management Suite — For Veritas Volume Manager (VxVM) storage with Serviceguard, go to http://docs.hp.com. From the heading Operating Environments, choose 11i v3. Then, scroll down to the section Veritas Volume Manager and File System.
• From http://www.docs.hp.com -> High Availability -> Continentalcluster: — Understanding and Designing Serviceguard Disaster Tolerant Architectures — Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters • From http://www.docs.hp.com -> High Availability -> HP Serviceguard Extension for Faster Failover, the latest edition of: — HP Serviceguard Extension for Faster Failover, Version A.01.00, Release Notes • From http://www.docs.hp.
• From http://www.docs.hp.com -> Network and Systems Management -> System Administration: — Distributed Systems Administration Utilities Release Notes — Distributed Systems Administration Utilities User’s Guide • From http://www.docs.hp.
Serviceguard at a Glance 1 Serviceguard at a Glance This chapter introduces Serviceguard on HP-UX, and shows where to find different kinds of information in this book. The following topics are presented: • What is Serviceguard? • Using Serviceguard Manager • A Roadmap for Configuring Clusters and Packages If you are ready to start setting up Serviceguard clusters, skip ahead to Chapter 4, “Planning and Documenting an HA Cluster,” on page 131.
Serviceguard at a Glance What is Serviceguard? What is Serviceguard? Serviceguard allows you to create high availability clusters of HP 9000 or HP Integrity servers (or a mixture of both; see the release notes for your version for details and restrictions). A high availability computer system allows application services to continue in spite of a hardware or software failure.
Serviceguard at a Glance What is Serviceguard? A multi-node package can be configured to run on one or more cluster nodes. It is considered UP as long as it is running on any of its configured nodes. In Figure 1-1, node 1 (one of two SPUs) is running failover package A, and node 2 is running package B. Each package has a separate group of disks associated with it, containing data needed by the package's applications, and a mirror copy of the data.
Serviceguard at a Glance What is Serviceguard? Figure 1-2 Typical Cluster After Failover After this transfer, the failover package typically remains on the adoptive node as long as the adoptive node continues running. If you wish, however, you can configure the package to return to its primary node as soon as the primary node comes back online. Alternatively, you may manually transfer control of the package back to the primary node at the appropriate time.
Serviceguard at a Glance What is Serviceguard? • Mirrordisk/UX or Veritas Volume Manager, which provide disk redundancy to eliminate single points of failure in the disk subsystem; • Event Monitoring Service (EMS), which lets you monitor and detect failures that are not directly handled by Serviceguard; • disk arrays, which use various RAID levels for data protection; • HP-supported uninterruptible power supplies (UPS), such as HP PowerTrust, which eliminates failures related to power outage.
Serviceguard at a Glance Using Serviceguard Manager Using Serviceguard Manager Serviceguard Manager is the graphical user interface for Serviceguard. It is available as a “plug-in” to the System Management Homepage (SMH). SMH is a web-based graphical user interface (GUI) that replaces SAM as the system administration GUI as of HP-UX 11i v3 (but you can still run the SAM terminal interface; see “Using SAM” on page 32).
Serviceguard at a Glance Using Serviceguard Manager Administering Clusters with Serviceguard Manager Serviceguard Manager allows you to administer clusters, nodes, and packages if access control policies permit: • Cluster: halt, run • Cluster nodes: halt, run • Package: halt, run, move from one node to another, reset node- and package-switching flags Configuring Clusters with Serviceguard Manager You can configure clusters and legacy packages in Serviceguard Manager; modular packages must be configured
Serviceguard at a Glance Using SAM Using SAM You can use SAM, the System Administration Manager, to do many of the HP-UX system administration tasks described in this manual (that is, tasks, such as configuring disks and filesystems, that are not specifically Serviceguard tasks). To launch SAM, enter /usr/sbin/sam on the command line. As of HP-UX 11i v3, SAM offers a Terminal User Interface (TUI) which also acts as a gateway to the web-based System Management Homepage (SMH).
Serviceguard at a Glance What are the Distributed Systems Administration Utilities? What are the Distributed Systems Administration Utilities? HP Distributed Systems Administration Utilities (DSAU) simplify the task of managing multiple systems, including Serviceguard clusters.
Serviceguard at a Glance A Roadmap for Configuring Clusters and Packages A Roadmap for Configuring Clusters and Packages This manual presents the tasks you need to perform in order to create a functioning HA cluster using Serviceguard. These tasks are shown in Figure 1-3. Figure 1-3 Tasks in Configuring a Serviceguard Cluster The tasks in Figure 1-3 are covered in step-by-step detail in chapters 4 through 7. HP recommends you gather all the data that is needed for configuration before you start.
Understanding Serviceguard Hardware Configurations 2 Understanding Serviceguard Hardware Configurations This chapter gives a broad overview of how the Serviceguard hardware components work. The following topics are presented: • Redundancy of Cluster Components • Redundant Network Components • Redundant Disk Storage • Redundant Power Supplies • Larger Clusters Refer to the next chapter for information about Serviceguard software components.
Understanding Serviceguard Hardware Configurations Redundancy of Cluster Components Redundancy of Cluster Components In order to provide a high level of availability, a typical cluster uses redundant system components, for example two or more SPUs and two or more independent disks. This redundancy eliminates single points of failure. In general, the more redundancy, the greater your access to applications, data, and supportive services in the event of a failure.
Understanding Serviceguard Hardware Configurations Redundancy of Cluster Components Note that a package that does not access data from a disk on a shared bus can be configured to fail over to as many nodes as you have configured in the cluster (regardless of disk technology). For instance, if a package only runs local executables, it can be configured to fail over to all nodes in the cluster that have local copies of those executables, regardless of the type of disk connectivity.
Understanding Serviceguard Hardware Configurations Redundant Network Components Redundant Network Components To eliminate single points of failure for networking, each subnet accessed by a cluster node is required to have redundant network interfaces. Redundant cables are also needed to protect against cable failures. Each interface card is connected to a different cable, and the cables themselves are connected by a component such as a hub or a bridge.
Understanding Serviceguard Hardware Configurations Redundant Network Components Rules and Restrictions • A single subnet cannot be configured on different network interfaces (NICs) on the same node. • For IPv4 subnets, Serviceguard does not support different subnets on the same LAN interface. — For IPv6, Serviceguard supports up to two subnets per LAN interface (site-local and global).
Understanding Serviceguard Hardware Configurations Redundant Network Components Redundant Ethernet Configuration The use of redundant network components is shown in Figure 2-1, which is an Ethernet configuration. Figure 2-1 Redundant LANs In the figure, a two-node Serviceguard cluster has one bridged net configured with both a primary and a standby LAN card for the data/heartbeat subnet (subnetA). Another LAN card provides an optional dedicated heartbeat LAN.
Understanding Serviceguard Hardware Configurations Redundant Network Components NOTE You should verify that network traffic is not too heavy on the heartbeat/data LAN. If traffic is too heavy, this LAN might not perform adequately in transmitting heartbeats if the dedicated heartbeat LAN fails.
Understanding Serviceguard Hardware Configurations Redundant Network Components Restrictions The following restrictions apply: • All nodes in the cluster must belong to the same network domain (that is, the domain portion of the fully-qualified domain name must be the same). • The nodes must be fully connected at the IP level. • A minimum of two heartbeat paths must be configured for each cluster node. • There must be less than 200 milliseconds of latency in the heartbeat network.
Understanding Serviceguard Hardware Configurations Redundant Network Components • If a monitored_subnet is configured for PARTIAL monitored_subnet_access in a package’s configuration file, it must be configured on at least one of the nodes on the node_name list for that package. Conversely, if all of the subnets that are being monitored for this package are configured for PARTIAL access, each node on the node_name list must have at least one of these subnets configured.
Understanding Serviceguard Hardware Configurations Redundant Network Components Replacing Failed Network Cards Depending on the system configuration, it is possible to replace failed network cards while the cluster is running. The process is described under “Replacement of LAN Cards” in the chapter “Troubleshooting Your Cluster.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Redundant Disk Storage Each node in a cluster has its own root disk, but each node is also physically connected to several other disks in such a way that more than one node can obtain access to the data and programs associated with a package it is configured for. This access is provided by a Storage Manager, such as Logical Volume Manager (LVM), or Veritas Volume Manager (VxVM) (or Veritas Cluster Volume Manager (CVM).
Understanding Serviceguard Hardware Configurations Redundant Disk Storage When planning and assigning SCSI bus priority, remember that one node can dominate a bus shared by multiple nodes, depending on what SCSI addresses are assigned to the controller for each node on the shared bus. All SCSI addresses, including the addresses of all interface cards, must be unique for all devices on a shared bus.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage another node until the failing node is halted. Mirroring the root disk allows the system to continue normal operation when a root disk failure occurs. Disk Arrays using RAID Levels and Multiple Data Paths An alternate method of achieving protection for your data is to employ a disk array with hardware RAID levels that provide data redundancy, such as RAID Level 1 or RAID Level 5. The array provides data redundancy for the disks.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage set up to trigger a package failover or to report disk failure events to Serviceguard, to another application, or by email. For more information, refer to the manual Using High Availability Monitors (B5736-90074), available at http://docs.hp.com -> High Availability -> Event Monitoring Service and HA Monitors -> Installation and User’s Guide.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Replacing Failed I/O Cards Depending on the system configuration, it is possible to replace failed disk I/O cards while the system remains online. The process is described under “Replacing I/O Cards” in the chapter “Troubleshooting Your Cluster.” Sample SCSI Disk Configurations Figure 2-2 shows a two node cluster. Each node has one root disk which is mirrored and one package for which it is the primary node.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Figure 2-2 Mirrored Disks Connected for High Availability Figure 2-3 below shows a similar cluster with a disk array connected to each node on two I/O channels. See “About Multipathing” on page 47.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Figure 2-3 Cluster with High Availability Disk Array Details on logical volume configuration for Serviceguard are in the chapter “Building an HA Cluster Configuration.
Understanding Serviceguard Hardware Configurations Redundant Disk Storage Sample Fibre Channel Disk Configuration In Figure 2-4, the root disks are shown with simple mirroring, but the shared storage is now accessed via redundant Fibre Channel switches attached to a disk array. The cabling is set up so that each node is attached to both switches, and both switches are attached to the disk array with redundant links.
Understanding Serviceguard Hardware Configurations Redundant Power Supplies Redundant Power Supplies You can extend the availability of your hardware by providing battery backup to your nodes and disks. HP-supported uninterruptible power supplies (UPS), such as HP PowerTrust, can provide this protection from momentary power loss. Disks should be attached to power circuits in such a way that mirror copies are attached to different power sources.
Understanding Serviceguard Hardware Configurations Larger Clusters Larger Clusters You can create clusters of up to 16 nodes with Serviceguard. Clusters of up to 16 nodes may be built by connecting individual SPUs via Ethernet. The possibility of configuring a cluster consisting of 16 nodes does not mean that all types of cluster configuration behave in the same way in a 16-node configuration.
Understanding Serviceguard Hardware Configurations Larger Clusters Figure 2-5 Eight-Node Active/Standby Cluster Point to Point Connections to Storage Devices Some storage devices allow point-to-point connection to a large number of host nodes without using a shared SCSI bus. An example is shown in Figure 2-6, a cluster consisting of eight nodes with a SCSI interconnect. The nodes access shared data on an XP or EMC disk array configured with 16 SCSI I/O ports.
Understanding Serviceguard Hardware Configurations Larger Clusters Figure 2-6 Eight-Node Cluster with XP or EMC Disk Array Fibre Channel switched configurations also are supported using either an arbitrated loop or fabric login topology. For additional information about supported cluster configurations, refer to the HP Unix Servers Configuration Guide, available through your HP representative.
Understanding Serviceguard Software Components 3 Understanding Serviceguard Software Components This chapter gives a broad overview of how the Serviceguard software components work.
Understanding Serviceguard Software Components Serviceguard Architecture Serviceguard Architecture The following figure shows the main software components used by Serviceguard. This chapter discusses these components in some detail. NOTE Veritas CFS may not yet be supported on the version of HP-UX you are running; see “About Veritas CFS and CVM from Symantec” on page 29.
Understanding Serviceguard Software Components Serviceguard Architecture Serviceguard Daemons Serviceguard uses the following daemons: • /usr/lbin/cmclconfd—Serviceguard Configuration Daemon • /usr/lbin/cmcld—Serviceguard Cluster Daemon • /usr/lbin/cmfileassistd—Serviceguard File Management daemon • /usr/lbin/cmlogd—Serviceguard Syslog Log Daemon • /usr/lbin/cmlvmd—Cluster Logical Volume Manager Daemon • /opt/cmom/lbin/cmomd—Cluster Object Manager Daemon • /usr/lbin/cmsnmpd—Cluster SNMP subage
Understanding Serviceguard Software Components Serviceguard Architecture Cluster Daemon: cmcld This daemon determines cluster membership by sending heartbeat messages to cmcld daemons on other nodes in the Serviceguard cluster. It runs at a real time priority and is locked in memory. The cmcld daemon sets a safety timer in the kernel which is used to detect kernel hangs.
Understanding Serviceguard Software Components Serviceguard Architecture Syslog Log Daemon: cmlogd cmlogd is used by cmcld to write messages to syslog. Any message written to syslog by cmcld it written through cmlogd. This is to prevent any delays in writing to syslog from impacting the timing of cmcld. Cluster Logical Volume Manager Daemon: cmlvmd This daemon is responsible for keeping track of all the volume group(s) that have been made cluster aware.
Understanding Serviceguard Software Components Serviceguard Architecture You must also edit /etc/rc.config.d/cmsnmpagt to auto-start cmsnmpd. Configure cmsnmpd to start before the Serviceguard cluster comes up. For more information, see the cmsnmpd (1m) manpage. Service Assistant Daemon: cmsrvassistd This daemon forks and execs any script or processes as required by the cluster daemon, cmcld.
Understanding Serviceguard Software Components Serviceguard Architecture Network Manager Daemon: cmnetd This daemon monitors the health of cluster networks, and performs local LAN failover. It also handles the addition and deletion of relocatable package IP addresses for both IPv4 and IPv6. Lock LUN Daemon: cmldisklockd If a lock LUN is being used, cmdisklockd runs on each node in the cluster and is started by cmcld when the node joins the cluster.
Understanding Serviceguard Software Components Serviceguard Architecture 64 • vxfend - When Veritas CFS is deployed as part of the Serviceguard Storage Management Suite, the I/O fencing daemon vxfend is also included. It implements a quorum-type functionality for the Veritas Cluster File System. vxfend is controlled by Serviceguard to synchronize quorum mechanisms.
Understanding Serviceguard Software Components How the Cluster Manager Works How the Cluster Manager Works The cluster manager is used to initialize a cluster, to monitor the health of the cluster, to recognize node failure if it should occur, and to regulate the re-formation of the cluster when a node joins or leaves the cluster. The cluster manager operates as a daemon process that runs on each node.
Understanding Serviceguard Software Components How the Cluster Manager Works (described further in this chapter, in “How the Package Manager Works” on page 74). Failover packages that were running on nodes that are no longer in the new cluster are transferred to their adoptive nodes. Note that if there is a transitory loss of heartbeat, the cluster may re-form with the same nodes as before.
Understanding Serviceguard Software Components How the Cluster Manager Works Each node sends its heartbeat message at a rate specified by the cluster heartbeat interval. The cluster heartbeat interval is set in the cluster configuration file, which you create as a part of cluster configuration, described fully in Chapter 5, “Building an HA Cluster Configuration,” on page 197. Manual Startup of Entire Cluster A manual startup forms a cluster out of all the nodes in the cluster configuration.
Understanding Serviceguard Software Components How the Cluster Manager Works • An SPU or network failure was detected on an active node. • An inactive node wants to join the cluster. The cluster manager daemon has been started on that node. • A node has been added to or deleted from the cluster configuration. • The system administrator halted a node. • A node halts because of a package failure. • A node halts because of a service failure.
Understanding Serviceguard Software Components How the Cluster Manager Works The cluster lock is used as a tie-breaker only for situations in which a running cluster fails and, as Serviceguard attempts to form a new cluster, the cluster is split into two sub-clusters of equal size. Each sub-cluster will attempt to acquire the cluster lock. The sub-cluster which gets the cluster lock will form the new cluster, preventing the possibility of two sub-clusters running at the same time.
Understanding Serviceguard Software Components How the Cluster Manager Works When a node obtains the cluster lock, this area is marked so that other nodes will recognize the lock as “taken.” The operation of the lock disk or lock LUN is shown in Figure 3-2. Figure 3-2 Lock Disk or Lock LUN Operation Serviceguard periodically checks the health of the lock disk or LUN and writes messages to the syslog file if the device fails the health check.
Understanding Serviceguard Software Components How the Cluster Manager Works Single Lock Disk or LUN A single lock disk or lock LUN should be configured on a power circuit separate from that of any node in the cluster. For example, using three power circuits for a two-node cluster is highly recommended, with a separately powered disk or LUN for the cluster lock. In two-node clusters, this single lock device must not share a power circuit with either node, and a lock disk must be an external disk.
Understanding Serviceguard Software Components How the Cluster Manager Works a single lock disk. Thus, the only recommended usage of the dual cluster lock is when the single cluster lock cannot be isolated at the time of a failure from exactly one half of the cluster nodes. If one of the dual lock disks fails, Serviceguard will detect this when it carries out periodic checking, and it will write a message to the syslog file.
Understanding Serviceguard Software Components How the Cluster Manager Works Figure 3-3 Quorum Server Operation The quorum server runs on a separate system, and can provide quorum services for multiple clusters. No Cluster Lock Normally, you should not configure a cluster of three or fewer nodes without a cluster lock. In two-node clusters, a cluster lock is required.
Understanding Serviceguard Software Components How the Package Manager Works How the Package Manager Works Packages are the means by which Serviceguard starts and halts configured applications. A package is a collection of services, disk volumes and IP addresses that are managed by Serviceguard to ensure they are available. Each node in the cluster runs an instance of the package manager; the package manager residing on the cluster coordinator is known as the package coordinator.
Understanding Serviceguard Software Components How the Package Manager Works Failover Packages A failover package starts up on an appropriate node (see node_name on page 288) when the cluster starts. A package failover takes place when the package coordinator initiates the start of a package on a new node. A package failover involves both halting the existing package (in the case of a service, network, or resource failure), and starting the new instance of the package.
Understanding Serviceguard Software Components How the Package Manager Works Deciding When and Where to Run and Halt Failover Packages The package configuration file assigns a name to the package and includes a list of the nodes on which the package can run. Failover packages list the nodes in order of priority (i.e., the first node in the list is the highest priority node). In addition, failover packages’ files contain three parameters that determine failover behavior.
Understanding Serviceguard Software Components How the Package Manager Works started. (In a cross-subnet configuration, all the monitored subnets that are specified for this package, and configured on the target node, must be up.) If the package has a dependency on a resource or another package, the dependency must be met on the target node before the package can start. The switching of relocatable IP addresses on a single subnet is shown in Figure 3-5 and Figure 3-6.
Understanding Serviceguard Software Components How the Package Manager Works NOTE For design and configuration information about site-aware disaster-tolerant clusters (which span subnets), see the documents listed under “Cross-Subnet Configurations” on page 41.
Understanding Serviceguard Software Components How the Package Manager Works If you use configured_node as the value for the failover policy, the package will start up on the highest priority node available in the node list. When a failover occurs, the package will move to the next highest priority node in the list that is available. If you use min_package_node as the value for the failover policy, the package will start up on the node that is currently running the fewest other packages.
Understanding Serviceguard Software Components How the Package Manager Works Figure 3-7 Rotating Standby Configuration before Failover If a failure occurs, any package would fail over to the node containing fewest running packages, as in Figure 3-8, which shows a failure on node 2: Figure 3-8 Rotating Standby Configuration after Failover NOTE Using the min_package_node policy, when node 2 is repaired and brought back into the cluster, it will then be running the fewest packages, and thus will become th
Understanding Serviceguard Software Components How the Package Manager Works Figure 3-9 CONFIGURED_NODE Policy Packages after Failover If you use configured_node as the failover policy, the package will start up on the highest priority node in the node list, assuming that the node is running as a member of the cluster. When a failover occurs, the package will move to the next highest priority node in the list that is available.
Understanding Serviceguard Software Components How the Package Manager Works Figure 3-10 Automatic Failback Configuration before Failover Table 3-2 Node Lists in Sample Cluster Package Name NODE_NAME List FAILOVER POLICY FAILBACK POLICY pkgA node1, node4 CONFIGURED_NODE AUTOMATIC pkgB node2, node4 CONFIGURED_NODE AUTOMATIC pkgC node3, node4 CONFIGURED_NODE AUTOMATIC Node1 panics, and after the cluster reforms, pkgA starts running on node4: Figure 3-11 82 Automatic Failback Configuration
Understanding Serviceguard Software Components How the Package Manager Works After rebooting, node 1 rejoins the cluster. At that point, pkgA will be automatically stopped on node 4 and restarted on node 1. Figure 3-12 Automatic Failback Configuration After Restart of Node 1 NOTE Setting the failback_policy to automatic can result in a package failback and application outage during a critical production period.
Understanding Serviceguard Software Components How the Package Manager Works Using the Event Monitoring Service Basic package resources include cluster nodes, LAN interfaces, and services, which are the individual processes within an application. All of these are monitored by Serviceguard directly. In addition, you can use the Event Monitoring Service registry through which add-on monitors can be configured.
Understanding Serviceguard Software Components How the Package Manager Works Once a monitor is configured as a package resource dependency, the monitor will notify the package manager if an event occurs showing that a resource is down. The package may then be failed over to an adoptive node. The EMS HA Monitors can also be used to report monitored events to a target application such as OpenView IT/Operations for graphical display or for operator notification.
Understanding Serviceguard Software Components How Packages Run How Packages Run Packages are the means by which Serviceguard starts and halts configured applications. Failover packages are also units of failover behavior in Serviceguard. A package is a collection of services, disk volumes and IP addresses that are managed by Serviceguard to ensure they are available. There can be a maximum of 150 packages per cluster and a total of 900 services per cluster.
Understanding Serviceguard Software Components How Packages Run package, that node switching is disabled for the package on particular nodes, or that the package has a dependency that is not being met. When a package has failed on one node and is enabled to switch to another node, it will start up automatically in a new location where its dependencies are met. This process is known as package switching, or remote switching.
Understanding Serviceguard Software Components How Packages Run Figure 3-13 Legacy Package Time Line Showing Important Events The following are the most important moments in a package’s life: 1. Before the control script starts. (For modular packages, this is the master control script.) 2. During run script execution. (For modular packages, during control script execution to start the package.) 3. While services are running 4.
Understanding Serviceguard Software Components How Packages Run Before the Control Script Starts First, a node is selected. This node must be in the package’s node list, it must conform to the package’s failover policy, and any resources required by the package must be available on the chosen node. One resource is the subnet that is monitored for the package. If the subnet is not available, the package cannot start on this node.
Understanding Serviceguard Software Components How Packages Run Figure 3-14 Package Time Line (Legacy Package) At any step along the way, an error will result in the script exiting abnormally (with an exit code of 1). For example, if a package service is unable to be started, the control script will exit with an error. NOTE This diagram is specific to legacy packages. Modular packages also run external scripts and “pre-scripts” as explained above.
Understanding Serviceguard Software Components How Packages Run NOTE After the package run script has finished its work, it exits, which means that the script is no longer executing once the package is running normally. After the script exits, the PIDs of the services started by the script are monitored by the package manager directly.
Understanding Serviceguard Software Components How Packages Run Service Startup with cmrunserv Within the package control script, the cmrunserv command starts up the individual services. This command is executed once for each service that is coded in the file. You can configure a number of restarts for each service. The cmrunserv command passes this number to the package manager, which will restart the service the appropriate number of times if the service should fail.
Understanding Serviceguard Software Components How Packages Run During normal operation, while all services are running, you can see the status of the services in the “Script Parameters” section of the output of the cmviewcl command.
Understanding Serviceguard Software Components How Packages Run You cannot halt a multi-node or system multi-node package unless all packages that have a configured dependency on it are down. Use cmviewcl to check the status of dependents. For example, if pkg1 and pkg2 depend on PKGa, both pkg1 and pkg2 must be halted before you can halt PKGa. NOTE If you use cmhaltpkg command with the -n option, the package is halted only if it is running on that node.
Understanding Serviceguard Software Components How Packages Run Figure 3-15 Legacy Package Time Line for Halt Script Execution At any step along the way, an error will result in the script exiting abnormally (with an exit code of 1). Also, if the halt script execution is not complete before the time specified in the HALT_SCRIPT_TIMEOUT, the package manager will kill the script. During halt script execution, messages are written to a log file.
Understanding Serviceguard Software Components How Packages Run • 1—abnormal exit, also known as no_restart exit. The package did not halt normally. Services are killed, and the package is disabled globally. It is not disabled on the current node, however. • Timeout—Another type of exit occurs when the halt_script_timeout is exceeded. In this scenario, the package is killed and disabled globally. It is not disabled on the current node, however.
Understanding Serviceguard Software Components How Packages Run Table 3-3 Error Conditions and Package Movement for Failover Packages Package Error Condition Results Halt script runs after Error or Exit Package Allowed to Run on Primary Node after Error Node Failfast Enabled Service Failfast Enabled HP-UX Status on Primary after Error Run Script Exit 2 YES Either Setting system reset No N/A (system reset) Yes Run Script Exit 2 NO Either Setting Running No No Yes Run Script Timeout YES
Understanding Serviceguard Software Components How Packages Run Table 3-3 Error Conditions and Package Movement for Failover Packages Package Error Condition Results Halt script runs after Error or Exit Package Allowed to Run on Primary Node after Error Node Failfast Enabled Service Failfast Enabled HP-UX Status on Primary after Error Service Failure Either Setting NO Running Yes No Yes Loss of Network YES Either Setting system reset No N/A (system reset) Yes Loss of Network NO Eithe
Understanding Serviceguard Software Components How the Network Manager Works How the Network Manager Works The purpose of the network manager is to detect and recover from network card failures so that network services remain highly available to clients. In practice, this means assigning IP addresses for each package to the primary LAN interface card on the node where the package is running and monitoring the health of all interfaces, switching them when necessary.
Understanding Serviceguard Software Components How the Network Manager Works The IP addresses associated with a package are called relocatable IP addresses (also known as package IP addresses or floating IP addresses) because the addresses can actually move from one cluster node to another on the same subnet. You can use up to 200 relocatable IP addresses in a cluster, spread over as many as 150 packages. This can be a combination of IPv4 and IPv6 addresses.
Understanding Serviceguard Software Components How the Network Manager Works various combinations) can be defined as stationary IPs in a cluster. Both IPv4 and IPv6 addresses also can be used as relocatable (package) IP addresses. Adding and Deleting Relocatable IP Addresses When a package is started, a relocatable IP address can be added to a specified IP subnet. When the package is stopped, the relocatable IP address is deleted from the specified subnet.
Understanding Serviceguard Software Components How the Network Manager Works Whenever a LAN driver reports an error, Serviceguard immediately declares that the card is bad and performs a local switch, if applicable. For example, when the card fails to send, Serviceguard will immediately receive an error notification and it will mark the card as down. Serviceguard Network Manager also looks at the numerical counts of packets sent and received on an interface to determine if a card is having a problem.
Understanding Serviceguard Software Components How the Network Manager Works NOTE You can change the value of the NETWORK_FAILURE_DETECTION parameter while the cluster is up and running. Local Switching A local network switch involves the detection of a local network interface failure and a failover to the local backup LAN card (also known as the standby LAN card). The backup LAN card must not have any IP addresses configured.
Understanding Serviceguard Software Components How the Network Manager Works During the transfer, IP packets will be lost, but TCP (Transmission Control Protocol) will retransmit the packets. In the case of UDP (User Datagram Protocol), the packets will not be retransmitted automatically by the protocol. However, since UDP is an unreliable service, UDP applications should be prepared to handle the case of lost network packets and recover appropriately.
Understanding Serviceguard Software Components How the Network Manager Works Figure 3-17 Cluster After Local Network Switching As the standby interface takes over, IP addresses will be switched to the hardware path associated with the standby interface. The switch is transparent at the TCP/IP level. All applications continue to run on their original nodes. During this time, IP traffic on Node 1 will be delayed as the transfer occurs.
Understanding Serviceguard Software Components How the Network Manager Works Figure 3-18 Local Switching After Cable Failure Local network switching will work with a cluster containing one or more nodes. You may wish to design a single-node cluster in order to take advantage of this local network switching feature in situations where you need only one node and do not wish to set up a more complex cluster.
Understanding Serviceguard Software Components How the Network Manager Works Remote Switching A remote switch (that is, a package switch) involves moving packages to a new system. In the most common configuration, in which all nodes are on the same subnet(s), the package IP (relocatable IP; see “Stationary and Relocatable IP Addresses” on page 99) moves as well, and the new system must already have the subnet configured and working properly, otherwise the packages will not be started.
Understanding Serviceguard Software Components How the Network Manager Works change. Currently, the ARP messages are sent at the time the IP address is added to the new system. An ARP message is sent in the form of an ARP request. The sender and receiver protocol address fields of the ARP request message are both set to the same floating IP address. This ensures that nodes receiving the message will not send replies.
Understanding Serviceguard Software Components How the Network Manager Works Figure 3-19 Aggregated Networking Ports Both the Single and Dual ported LANs in the non-aggregated configuration have four LAN cards, each associated with a separate non-aggregated IP address and MAC address, and each with its own LAN name (lan0, lan1, lan2, lan3). When these ports are aggregated all four ports are associated with a single IP address and MAC address.
Understanding Serviceguard Software Components How the Network Manager Works For information about implementing APA with Serviceguard, see the latest version of the HP Auto Port Aggregation (APA) Support Guide and other APA documents posted at docs.hp.com in the IO Cards and Networking Software collection. VLAN Configurations Virtual LAN configuration using HP-UX VLAN software is supported in Serviceguard clusters.
Understanding Serviceguard Software Components How the Network Manager Works • A maximum of 30 network interfaces per node is supported. The interfaces can be physical NIC ports, VLAN interfaces, APA aggregates, or any combination of these. • Local failover of VLANs must be onto the same link types. For example, you must fail over from VLAN-over-Ethernet to VLAN-over-Ethernet. • The primary and standby VLANs must have same VLAN ID (or tag ID).
Understanding Serviceguard Software Components Volume Managers for Data Storage Volume Managers for Data Storage A volume manager is a tool that lets you create units of disk storage known as storage groups. Storage groups contain logical volumes for use on single systems and in high availability clusters. In Serviceguard clusters, storage groups are activated by package control scripts.
Understanding Serviceguard Software Components Volume Managers for Data Storage For instructions on migrating a system to agile addressing, see the white paper Migrating from HP-UX 11i v2 to HP-UX 11i v3 at http://docs.hp.com. NOTE It is possible, though not a best practice, to use legacy DSFs (that is, DSFs using the older naming convention) on some nodes after migrating to agile addressing on others; this allows you to migrate different nodes at different times, if necessary.
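If you need to see how legacy and agile (persistent) DSFs correspond on an HP-UX 11i v3 node, one convenient check is the ioscan mapping option; the device name in the second command is only an example:

    ioscan -m dsf                       # map persistent DSFs to their legacy equivalents
    ioscan -m dsf /dev/rdisk/disk14     # mapping for a single device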
Understanding Serviceguard Software Components Volume Managers for Data Storage Examples of Mirrored Storage Figure 3-20 shows an illustration of mirrored storage using HA storage racks. In the example, node1 and node2 are cabled in a parallel configuration, each with redundant paths to two shared storage devices. Each of two nodes also has two (non-shared) internal disks which are used for the root file system, swap etc.
Understanding Serviceguard Software Components Volume Managers for Data Storage Figure 3-21 Mirrored Physical Disks Figure 3-22 shows the mirrors configured into LVM volume groups, shown in the figure as /dev/vgpkgA and /dev/vgpkgB. The volume groups are activated by Serviceguard packages for use by highly available applications.
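As a sketch only, a mirrored volume group such as /dev/vgpkgA could be created from one disk in each storage unit with standard LVM and Mirrordisk/UX commands; the device file names, minor number, and size below are examples, not values from this configuration:

    pvcreate -f /dev/rdsk/c1t2d0                  # disk in the first storage unit
    pvcreate -f /dev/rdsk/c2t2d0                  # its mirror in the second storage unit
    mkdir /dev/vgpkgA
    mknod /dev/vgpkgA/group c 64 0x010000         # group file; minor number must be unique on the node
    vgcreate /dev/vgpkgA /dev/dsk/c1t2d0 /dev/dsk/c2t2d0
    lvcreate -L 1024 -m 1 -n lvol1 /dev/vgpkgA    # 1 GB logical volume with one mirror copy (requires Mirrordisk/UX)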
Understanding Serviceguard Software Components Volume Managers for Data Storage Examples of Storage on Disk Arrays Figure 3-23 shows an illustration of storage configured on a disk array. Physical disks are configured by an array utility program into logical units or LUNs which are then seen by the operating system. Figure 3-23 Physical Disks Combined into LUNs NOTE LUN definition is normally done using utility programs provided by the disk array manufacturer.
Understanding Serviceguard Software Components Volume Managers for Data Storage Figure 3-24 Multiple Paths to LUNs Finally, the multiple paths are configured into volume groups as shown in Figure 3-25.
Understanding Serviceguard Software Components Volume Managers for Data Storage Types of Volume Manager Serviceguard allows a choice of volume managers for data storage: • HP-UX Logical Volume Manager (LVM) and (optionally) Mirrordisk/UX • Veritas Volume Manager for HP-UX (VxVM)—Base and add-on Products • Veritas Cluster Volume Manager for HP-UX Separate sections in Chapters 5 and 6 explain how to configure cluster storage using all of these volume managers.
Understanding Serviceguard Software Components Volume Managers for Data Storage Veritas Volume Manager (VxVM) The Base Veritas Volume Manager for HP-UX (Base-VxVM) is provided at no additional cost with HP-UX 11i. This includes basic volume manager features, including a Java-based GUI, known as VEA. It is possible to configure cluster storage for Serviceguard with only Base-VXVM. However, only a limited set of features is available.
Understanding Serviceguard Software Components Volume Managers for Data Storage Veritas Cluster Volume Manager (CVM) NOTE Check the Serviceguard, SGeRAC, and SMS Compatibility and Feature Matrix and the latest Release Notes for your version of Serviceguard for up-to-date information on CVM support: http://www.docs.hp.com -> High Availability - > Serviceguard. You may choose to configure cluster storage with the Veritas Cluster Volume Manager (CVM) instead of the Volume Manager (VxVM).
Understanding Serviceguard Software Components Volume Managers for Data Storage CVM can be used in clusters that: • run applications that require fast disk group activation after package failover; • require storage activation on more than one node at a time, for example to perform a backup from one node while a package using the volume is active on another node.
Understanding Serviceguard Software Components Volume Managers for Data Storage Heartbeat configurations are configured differently depending on whether you are using CVM 3.5, or 4.1 and later. You can create redundancy in the following ways: 1) dual (multiple) heartbeat networks 2) single heartbeat network with standby LAN card(s) 3) single heartbeat network with APA CVM 3.5 supports only options 2 and 3. Options 1 and 2 are the minimum recommended configurations for CVM 4.1 and later.
Understanding Serviceguard Software Components Volume Managers for Data Storage Table 3-4 Pros and Cons of Volume Managers with Serviceguard Product Mirrordisk/UX Shared Logical Volume Manager (SLVM) Base-VxVM Chapter 3 Advantages • Software mirroring • Lower cost solution • Provided free with SGeRAC for multi-node access to RAC data • Supports up to 16 nodes in shared read/write mode for each cluster Tradeoffs • Lacks extended features of other volume managers • Lacks the flexibility and e
Understanding Serviceguard Software Components Volume Managers for Data Storage Table 3-4 Pros and Cons of Volume Managers with Serviceguard Product Veritas Volume Manager— Full VxVM product B9116AA (VxVM 3.5) B9116BA (VxVM 4.1) B9116CA (VxVM 5.0) 124 Advantages Tradeoffs • Disk group configuration from any node. • Requires purchase of additional license • DMP for active/active storage devices. • Cannot be used for a cluster lock • Supports exclusive activation.
Understanding Serviceguard Software Components Volume Managers for Data Storage Table 3-4 Pros and Cons of Volume Managers with Serviceguard Product Veritas Cluster Volume Manager— B9117AA (CVM 3.5) B9117BA (CVM 4.1) B9117CA (CVM 5.0) Advantages • Provides volume configuration propagation. • Disk groups must be configured on a master node • Supports cluster shareable disk groups. • • Package startup time is faster than with VxVM. CVM can only be used with up to 8 cluster nodes.
Understanding Serviceguard Software Components Responses to Failures Responses to Failures Serviceguard responds to different kinds of failures in specific ways. For most hardware failures, the response is not user-configurable, but for package and service failures, you can choose the system’s response, within limits.
Understanding Serviceguard Software Components Responses to Failures 2. If the node cannot get a quorum (if it cannot get the cluster lock) then 3. The node halts (system reset). Example Situation. Assume a two-node cluster, with Package1 running on SystemA and Package2 running on SystemB. Volume group vg01 is exclusively activated on SystemA; volume group vg02 is exclusively activated on SystemB. Package IP addresses are assigned to SystemA and SystemB respectively. Failure.
Understanding Serviceguard Software Components Responses to Failures For more information on cluster failover, see the white paper Optimizing Failover Time in a Serviceguard Environment at http://www.docs.hp.com -> High Availability -> Serviceguard -> White Papers.
Understanding Serviceguard Software Components Responses to Failures Serviceguard does not respond directly to power failures, although a loss of power to an individual cluster component may appear to Serviceguard like the failure of that component, and will result in the appropriate switching behavior. Power protection is provided by HP-supported uninterruptible power supplies (UPS), such as HP PowerTrust.
Understanding Serviceguard Software Components Responses to Failures NOTE In a very few cases, Serviceguard will attempt to reboot the system before a system reset when this behavior is specified. If there is enough time to flush the buffers in the buffer cache, the reboot succeeds, and a system reset does not take place. Either way, the system will be guaranteed to come down within a predetermined number of seconds.
Planning and Documenting an HA Cluster 4 Planning and Documenting an HA Cluster Building a Serviceguard cluster begins with a planning phase in which you gather and record information about all the hardware and software components of the configuration.
Planning and Documenting an HA Cluster General Planning General Planning A clear understanding of your high availability objectives will help you to define your hardware requirements and design your system. Use the following questions as a guide for general planning: 1. What applications must continue to be available in the event of a failure? 2. What system resources (processing power, networking, SPU, memory, disk space) are needed to support these applications? 3.
Planning and Documenting an HA Cluster General Planning additional disk hardware for shared data storage. If you intend to expand your cluster without the need to bring it down, careful planning of the initial configuration is required. Use the following guidelines: • Remember the rules for cluster locks when considering expansion. A one-node cluster does not require a cluster lock. A two-node cluster must have a cluster lock. In clusters larger than 3 nodes, a cluster lock is strongly recommended.
Planning and Documenting an HA Cluster Hardware Planning Hardware Planning Hardware planning requires examining the physical hardware itself. One useful procedure is to sketch the hardware configuration in a diagram that shows adapter cards and buses, cabling, disks and peripherals. A sample diagram for a two-node cluster is shown in Figure 4-1. NOTE Under agile addressing, the storage units in this example would have names such as disk1, disk2, disk3, etc.
Planning and Documenting an HA Cluster Hardware Planning Create a similar sketch for your own cluster, and record the information on the Hardware Worksheet (see “Hardware Configuration Worksheet” on page 141). Indicate which device adapters occupy which slots, and determine the bus address for each adapter. Update the details as you do the cluster configuration (described in Chapter 5). Use one form for each SPU.
Planning and Documenting an HA Cluster Hardware Planning Network Information Serviceguard monitors LAN interfaces. NOTE Serviceguard supports communication across routers between nodes in the same cluster; for more information, see the documents listed under “Cross-Subnet Configurations” on page 41.
Planning and Documenting an HA Cluster Hardware Planning IP Address Enter this node’s host IP address(es), to be used on this interface. If the interface is a standby and does not have an IP address, enter 'Standby.' An IPv4 address is four decimal numbers separated by periods, in this form: nnn.nnn.nnn.nnn. An IPv6 address is eight groups of hexadecimal digits separated by colons, in this form: xxx:xxx:xxx:xxx:xxx:xxx:xxx:xxx.
Planning and Documenting an HA Cluster Hardware Planning Setting SCSI Addresses for the Largest Expected Cluster Size SCSI standards define priority according to SCSI address. To prevent controller starvation on the SPU, the SCSI interface cards must be configured at the highest priorities. Therefore, when configuring a highly available cluster, you should give nodes the highest priority SCSI addresses, and give disks addresses of lesser priority.
Planning and Documenting an HA Cluster Hardware Planning NOTE When a boot/root disk is configured with a low-priority address on a shared SCSI bus, a system panic can occur if there is a timeout on accessing the boot/root device. This can happen in a cluster when many nodes and many disks are configured on the same bus.
Planning and Documenting an HA Cluster Hardware Planning
• diskinfo
• ioscan -fnC disk or ioscan -fnNC disk
• lssf /dev/*dsk/*
• bdf
• mount
• swapinfo
• vgdisplay -v
• lvdisplay -v
• lvlnboot -v
• vxdg list (VxVM and CVM)
• vxprint (VxVM and CVM)
These are standard HP-UX commands. See their man pages for complete information about usage. The commands should be issued from all nodes after installing the hardware and rebooting the system.
Planning and Documenting an HA Cluster Hardware Planning Hardware Configuration Worksheet The following worksheet will help you organize and record your specific cluster hardware configuration. This worksheet is an example; blank worksheets are in Appendix F. Make as many copies as you need.
Planning and Documenting an HA Cluster Power Supply Planning Power Supply Planning There are two sources of power for your cluster which you will have to consider in your design: line power and uninterruptible power sources (UPS). Loss of a power circuit should not bring down the cluster. Frequently, servers, mass storage devices, and other hardware have two or three separate power supplies, so they can survive the loss of power to one or more power supplies or power circuits.
Planning and Documenting an HA Cluster Power Supply Planning Other Unit Enter the number of any other unit. Power Supply Enter the power supply unit number of the UPS to which the host or other device is connected. Be sure to follow UPS and cabinet power limits as well as SPU power limits.
Planning and Documenting an HA Cluster Power Supply Planning Power Supply Configuration Worksheet The following worksheet will help you organize and record your specific power supply configuration. This worksheet is an example; blank worksheets are in Appendix F. Make as many copies as you need.
Planning and Documenting an HA Cluster Cluster Lock Planning Cluster Lock Planning The purpose of the cluster lock is to ensure that only one new cluster is formed in the event that exactly half of the previously clustered nodes try to form a new cluster. It is critical that only one new cluster is formed and that it alone has access to the disks specified in its packages. You can specify an LVM lock disk, a lock LUN, or a quorum server as the cluster lock.
Planning and Documenting an HA Cluster Cluster Lock Planning Planning for Expansion Bear in mind that a cluster with more than 4 nodes cannot use a lock disk or lock LUN. Thus, if you plan to add enough nodes to bring the total to more than 4, you should use a quorum server. Using a Quorum Server The operation of Quorum Server is described under “Use of the Quorum Server as the Cluster Lock” on page 72. See also “Cluster Lock” on page 68.
Planning and Documenting an HA Cluster Cluster Lock Planning Quorum Server Worksheet You may find it useful to use the Quorum Server Worksheet that follows to identify a quorum server for use with one or more clusters. You may also want to enter quorum server host and timing parameters on the Cluster Configuration Worksheet. Blank worksheets are in Appendix F. On the QS worksheet, enter the following: Quorum Server Host Enter the host name (and alternate address, if any) for the quorum server.
Planning and Documenting an HA Cluster Cluster Lock Planning
Quorum Server Data:
==============================================================================
QS Hostname: __________ IP Address: _______________ IP Address: _______________
==============================================================================
Quorum Services are Provided for:
Cluster Name: ___________________________________________________________
Host Names ____________________________________________
Host Names ___________________
Planning and Documenting an HA Cluster LVM Planning LVM Planning You can create storage groups using the HP-UX Logical Volume Manager (LVM), or using Veritas VxVM and CVM software as described in the next section. When designing your disk layout using LVM, you should consider the following: • The root disk should belong to its own volume group.
Planning and Documenting an HA Cluster LVM Planning IMPORTANT LVM2 volume groups are not supported for cluster lock disks. If you plan to use the EMS HA Disk Monitor, refer to the section on “Rules for Using EMS Disk Monitor with Serviceguard” in the manual Using High Availability Monitors (B5736-90074) at http://docs.hp.com.
Planning and Documenting an HA Cluster LVM Planning
=============================================================================
Volume Group Name: __________/dev/vg01__________________________________
Name of First Physical Volume Group: _______bus0___________________________
Physical Volume Name: ____________/dev/dsk/c1t2d0__________________________
Physical Volume Name: ____________/dev/dsk/c2t2d0__________________________
Physical Volume Name: ____________/dev/dsk/c3t2d0_______________________
Planning and Documenting an HA Cluster CVM and VxVM Planning CVM and VxVM Planning You can create storage groups using the HP-UX Logical Volume Manager (LVM, described in the previous section), or using Veritas VxVM and CVM software. NOTE Check the Serviceguard, SGeRAC, and SMS Compatibility and Feature Matrix and the latest Release Notes for your version of Serviceguard for up-to-date information on CVM support: http://www.docs.hp.com -> High Availability -> Serviceguard.
Planning and Documenting an HA Cluster CVM and VxVM Planning
• A cluster lock disk must be configured into an LVM volume group; you cannot use a VxVM or CVM disk group. (See “Cluster Lock Planning” on page 145 for information about cluster lock options.)
• VxVM disk group names should not be entered into the cluster configuration file. These names are not inserted into the cluster configuration file by cmquerycl.
Planning and Documenting an HA Cluster CVM and VxVM Planning CVM and VxVM Worksheet The following worksheet will help you organize and record your specific physical disk configuration. This worksheet is an example; blank worksheets are in Appendix F. Make as many copies as you need. Fill out the worksheet and keep it for future reference. This worksheet includes volume groups and physical volumes. If you are using the cluster file system, begin planning your multi-node packages here.
Planning and Documenting an HA Cluster Cluster Configuration Planning Cluster Configuration Planning A cluster should be designed to provide the quickest possible recovery from failures. The actual time required to recover from a failure depends on several factors: • The length of the cluster heartbeat interval and node timeout. See the parameter descriptions for HEARTBEAT_INTERVAL and NODE_TIMEOUT under “Cluster Configuration Parameters” on page 156 for recommendations.
Planning and Documenting an HA Cluster Cluster Configuration Planning Heartbeat Subnet and Re-formation Time The speed of cluster re-formation is partially dependent on the type of heartbeat network that is used. If two or more heartbeat subnets are used, the one with the fastest failover time is used. NOTE For heartbeat requirements, see the discussion of the HEARTBEAT_IP parameter later in this chapter. Cluster Configuration Parameters You need to define a set of cluster parameters.
Planning and Documenting an HA Cluster Cluster Configuration Planning All other characters are legal. The cluster name can contain up to 39 characters (bytes). Make sure that the cluster name is unique within the subnets configured on the cluster nodes; under some circumstances Serviceguard may not be able to detect a duplicate name and unexpected problems may result.
Planning and Documenting an HA Cluster Cluster Configuration Planning QS_ADDR An alternate fully-qualified hostname or IP address for the quorum server. This parameter is used only if you use a quorum server and want to specify an address on an alternate subnet by which it can be reached. For more information, see “Using a Quorum Server” on page 146 and “Specifying a Quorum Server” on page 232. This parameter cannot be changed while the cluster is running; see the QS_HOST discussion above for details.
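For illustration, a quorum-server section of the cluster configuration file might contain entries such as the following; the hostname, alternate address, and timing values shown here are examples only, not recommendations:
QS_HOST qshost.cup.hp.com
QS_ADDR 192.168.1.17
QS_POLLING_INTERVAL 300000000
QS_TIMEOUT_EXTENSION 2000000
QS_ADDR is needed only if the quorum server can also be reached on an alternate subnet, as described above.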
Planning and Documenting an HA Cluster Cluster Configuration Planning Lock volume groups must also be defined in VOLUME_GROUP parameters in the cluster configuration file. NOTE SITE_NAME The name of a site to which nodes (see NODE_NAME) belong. Can be used only in a site-aware extended-distance cluster, which requires additional software; see the documents listed under “Cross-Subnet Configurations” on page 41 for more information. You can define multiple SITE_NAMEs.
Planning and Documenting an HA Cluster Cluster Configuration Planning example, ftsys9 must appear in exactly that form in the cluster configuration and package configuration files, and as ftsys9.cup.hp.com in the DNS database). SITE The name of a site (defined by SITE_NAME) to which the node identified by the preceding NODE_NAME entry belongs.
Planning and Documenting an HA Cluster Cluster Configuration Planning
NOTE Heartbeat configuration requirements: A minimum Serviceguard configuration on HP-UX 11i v2 or 11i v3 needs two network interface cards for the heartbeat in all cases, using one of the following configurations:
• Two heartbeat subnets; or
• One heartbeat subnet with a standby; or
• One heartbeat subnet using APA with two physical ports in hot standby mode or LAN monitor mode.
Planning and Documenting an HA Cluster Cluster Configuration Planning separately to the heartbeat subnet on another node (that is, each heartbeat path must be physically separate). See “Cross-Subnet Configurations” on page 41. NOTE Because Veritas Cluster File System from Symantec (CFS) requires link-level traffic communication (LLT) among the nodes, Serviceguard cannot be configured in cross-subnet configurations with CFS alone.
Planning and Documenting an HA Cluster Cluster Configuration Planning The use of a private heartbeat network is not advisable if you plan to use Remote Procedure Call (RPC) protocols and services. RPC assumes that each network adapter device or I/O card is connected to a route-able network. An isolated or private heartbeat LAN is not route-able, and could cause an RPC request-reply, directed to that LAN, to risk time-out without being serviced.
Planning and Documenting an HA Cluster Cluster Configuration Planning For information about changing the configuration online, see “Changing the Cluster Networking Configuration while the Cluster Is Running” on page 367. CLUSTER_LOCK_LUN The path on this node for the LUN used for the cluster lock. Used only if a lock LUN is used for tie-breaking services. Enter the path as it appears on each node in the cluster (the same physical device may have a different name on each node).
Planning and Documenting an HA Cluster Cluster Configuration Planning HEARTBEAT_INTERVAL The normal interval, in microseconds, between the transmission of heartbeat messages from each node to the cluster coordinator. Default value is 1,000,000 microseconds; setting the parameter to a value less than the default is not recommended. The default should be used where possible. The maximum recommended value is 15 seconds and the maximum value supported is 30 seconds or half the NODE_TIMEOUT.
Planning and Documenting an HA Cluster Cluster Configuration Planning There are more complex cases that require you to make a trade-off between fewer failovers and faster failovers. For example, a network event such as a broadcast storm may cause kernel interrupts to be turned off on some or all nodes while the packets are being processed, preventing the nodes from sending and processing heartbeat messages. This in turn could prevent the kernel’s safety timer from being reset, causing a system reset.
Planning and Documenting an HA Cluster Cluster Configuration Planning MAX_CONFIGURED_PACKAGES This parameter sets the maximum number of packages that can be configured in the cluster. The minimum value is 0, and the maximum value is 150. The default value for Serviceguard is 150, and you can change it without halting the cluster. VOLUME_GROUP The name of an LVM volume group whose disks are attached to at least two nodes in the cluster. Such disks are considered cluster-aware.
Planning and Documenting an HA Cluster Cluster Configuration Planning Cluster Configuration: Next Step When you are ready to configure the cluster, proceed to “Configuring the Cluster” on page 228. If you find it useful to record your configuration ahead of time, use the worksheet in Appendix F.
Planning and Documenting an HA Cluster Package Configuration Planning Package Configuration Planning Planning for packages involves assembling information about each group of highly available services. NOTE As of Serviceguard A.11.18, there is a new and simpler way to configure packages.
Planning and Documenting an HA Cluster Package Configuration Planning Logical Volume and File System Planning NOTE LVM Volume groups that are to be activated by packages must also be defined as cluster-aware in the cluster configuration file. See “Cluster Configuration Planning” on page 155. Disk groups (for Veritas volume managers) that are to be activated by packages must be defined in the package configuration file, described below.
Planning and Documenting an HA Cluster Package Configuration Planning HP recommends that you use customized logical volume names that are different from the default logical volume names (lvol1, lvol2, etc.). Choosing logical volume names that represent the high availability applications that they are associated with (for example, lvoldatabase) will simplify cluster administration.
Planning and Documenting an HA Cluster Package Configuration Planning CAUTION Serviceguard manages Veritas processes, specifically gab and LLT, through system multi-node packages. As a result, the Veritas administration commands such as gabconfig, llthosts, and lltconfig should only be used in display mode, for example gabconfig -a. You could crash nodes or the entire cluster if you use Veritas commands such as the gab* or llt* commands to configure these components or affect their runtime behavior.
Planning and Documenting an HA Cluster Package Configuration Planning You create a chain of package dependencies for application failover packages and the non-failover packages: 1. The failover package’s applications should not run on a node unless the mount point packages are already running. In the package’s configuration file, you fill out the dependency parameter to specify the requirement SG-CFS-MP-id# =UP on the SAME_NODE. 2.
Planning and Documenting an HA Cluster Package Configuration Planning other forms of mount will not create an appropriate multi-node package which means that the cluster packages are not aware of the file system changes. NOTE The Disk Group (DG) and Mount Point (MP) multi-node packages (SG-CFS-DG_ID# and SG-CFS-MP_ID#) do not monitor the health of the disk group and mount point. They check that the application packages that depend on them have access to the disk groups and mount points.
Planning and Documenting an HA Cluster Package Configuration Planning The following table describes different types of failover behavior and the settings in the package configuration file that determine each behavior. See “Package Parameter Explanations” on page 287 for more information.
Planning and Documenting an HA Cluster Package Configuration Planning Table 4-2 Package Failover Behavior (Continued) Switching Behavior All packages switch following a system reset (an immediate halt without a graceful shutdown) on the node when a specific service fails. Halt scripts are not run. All packages switch following a system reset on the node when any service fails. An attempt is first made to reboot the system prior to the system reset.
Planning and Documenting an HA Cluster Package Configuration Planning Serviceguard provides a set of parameters for configuring EMS (Event Monitoring Service) resources. These are resource_name, resource_polling_interval, resource_start, and resource_up_value. Configure each of these parameters in the package configuration file for each resource the package will be dependent on. The resource_start parameter determines when Serviceguard starts up resource monitoring for EMS resources.
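As a sketch of how these parameters fit together in the package configuration file (the resource name and values below are hypothetical examples, not recommendations):
resource_name /vg/vgdatabase/pv_summary
resource_polling_interval 60
resource_start AUTOMATIC
resource_up_value = UP
With entries like these, Serviceguard starts monitoring the resource automatically, polls it every 60 seconds, and treats it as available only while it reports a value of UP.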
Planning and Documenting an HA Cluster Package Configuration Planning If a resource is configured to be AUTOMATIC in a legacy configuration file, you do not need to define DEFERRED_RESOURCE_NAME in the package control script. About Package Dependencies Starting in Serviceguard A.11.17, a package can have dependencies on other packages, meaning the package will not start on a node unless the packages it depends on are running on that node. In Serviceguard A.11.
Planning and Documenting an HA Cluster Package Configuration Planning Rules Assume that we want to make pkg1 depend on pkg2. NOTE pkg1 can depend on more than one other package, and pkg2 can depend on another package or packages; we are assuming only two packages in order to make the rules as clear as possible. • pkg1 will not start on any node unless pkg2 is running on that node.
Planning and Documenting an HA Cluster Package Configuration Planning • A package cannot depend on itself, directly or indirectly. That is, not only must pkg1 not specify itself in the dependency_condition (see page 294), but pkg1 must not specify a dependency on pkg2 if pkg2 depends on pkg1, or if pkg2 depends on pkg3 which depends on pkg1, etc.
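As an illustration, a dependency of pkg1 on pkg2 is expressed in pkg1’s configuration file with entries such as the following (a sketch; the dependency name is arbitrary):
dependency_name pkg2_dep
dependency_condition pkg2 = UP
dependency_location same_node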
Planning and Documenting an HA Cluster Package Configuration Planning NOTE This applies only when the packages are automatically started (package switching enabled); cmrunpkg will never force a package to halt. Keep in mind that you do not have to set priority, even when one or more packages depend on another. The default value, no_priority, may often result in the behavior you want.
Planning and Documenting an HA Cluster Package Configuration Planning If pkg1 depends on pkg2, and pkg1’s priority is lower than or equal to pkg2’s, pkg2’s node order dominates. Assuming pkg2’s node order is node1, node2, node3, then: • On startup: — pkg2 will start on node1, or node2 if node1 is not available or does not at present meet all of its dependencies, etc.
Planning and Documenting an HA Cluster Package Configuration Planning — if pkg2 has failed back to node1 and node1 does not meet all of pkg1’s dependencies, pkg1 will halt. If pkg1 depends on pkg2, and pkg1’s priority is higher than pkg2’s, pkg1’s node order dominates. Assuming pkg1’s node order is node1, node2, node3, then: • On startup: — pkg1 will select node1 to start on. — pkg2 will start on node1, provided it can run there (no matter where node1 appears on pkg2’s node_name list).
Planning and Documenting an HA Cluster Package Configuration Planning But you also need to weigh the relative importance of the packages. If pkg2 runs a database that is central to your business, you probably want it to run undisturbed, no matter what happens to application packages that depend on it. In this case, the database package should have the highest priority. Note that, if no priorities are set, the dragging rules favor a package that is depended on over a package that depends on it.
Planning and Documenting an HA Cluster Package Configuration Planning About External Scripts The package configuration template for modular scripts explicitly provides for external scripts. These replace the CUSTOMER DEFINED FUNCTIONS in legacy scripts, and can be run either: • On package startup and shutdown, as essentially the first and last functions the package performs.
Planning and Documenting an HA Cluster Package Configuration Planning sg_source_pkg_env(), provides access to all the parameters configured for this package, including package-specific environment variables configured via the pev_ parameter (see page 307). For more information, see the template in $SGCONF/examples/external_script.template. A sample script follows. It assumes there is another script called monitor.sh, which will be configured as a Serviceguard service to monitor some application.
Planning and Documenting an HA Cluster Package Configuration Planning
typeset -i ret=0
typeset -i i=0
typeset -i found=0

# check PEV_ attribute is configured and within limits
if [[ -z $PEV_MONITORING_INTERVAL ]]
then
    sg_log 0 "ERROR: PEV_MONITORING_INTERVAL attribute not configured!"
    ret=1
elif (( PEV_MONITORING_INTERVAL < 1 ))
then
    sg_log 0 "ERROR: PEV_MONITORING_INTERVAL value ($PEV_MONITORING_INTERVAL) not within legal limits!"
    ret=1
fi

# check monitoring service we are expecting for this package is conf
Planning and Documenting an HA Cluster Package Configuration Planning
    sg_log 0 "PEV_MONITORING_INTERVAL for $SG_PACKAGE_NAME is $PEV_MONITORING_INTERVAL"
    return 0
}

function stop_command
{
    sg_log 5 "stop_command"

    # log current PEV_MONITORING_INTERVAL value, PEV_ attribute can be changed
    # while the package is running
    sg_log 0 "PEV_MONITORING_INTERVAL for $SG_PACKAGE_NAME is $PEV_MONITORING_INTERVAL"
    return 0
}

typeset -i exit_val=0

case ${1} in
start)
    start_command $*
    exit_val=$?
    ;;
stop)
    stop_command $*
    exit_val=$?
Planning and Documenting an HA Cluster Package Configuration Planning Using Serviceguard Commands in an External Script You can use Serviceguard commands (such as cmmodpkg) in an external script run from a package. These commands must not interact with that package itself (that is, the package that runs the external script) but can interact with other packages. Be careful how you code these interactions: if a Serviceguard command interacts with another package, you must avoid command loops.
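For example, an external script belonging to pkg1 might enable switching for a different package, but never for pkg1 itself; a minimal sketch (pkg2 and ftsys10 are hypothetical names):
# Allow pkg2 to fail over, and to run on node ftsys10
cmmodpkg -e pkg2
cmmodpkg -e -n ftsys10 pkg2
If pkg2’s own scripts in turn run Serviceguard commands against pkg1, the two packages can block each other waiting on one another; avoid such circular interactions.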
Planning and Documenting an HA Cluster Package Configuration Planning You can add custom code to the package to interrogate this variable, determine why the package halted, and take appropriate action.
Planning and Documenting an HA Cluster Package Configuration Planning About Cross-Subnet Failover It is possible to configure a cluster that spans subnets joined by a router, with some nodes using one subnet and some another. This is known as a cross-subnet configuration (see “Cross-Subnet Configurations” on page 41). In this context, you can configure packages to fail over from a node on one subnet to a node on another.
Planning and Documenting an HA Cluster Package Configuration Planning — As in other cluster configurations, a package will not start on a node unless the subnets configured on that node, and specified in the package configuration file as monitored subnets, are up.
Planning and Documenting an HA Cluster Package Configuration Planning Configuring node_name First you need to make sure that pkg1 will fail over to a node on another subnet only if it has to. For example, if it is running on NodeA and needs to fail over, you want it to try NodeB, on the same subnet, before incurring the cross-subnet overhead of failing over to NodeC or NodeD.
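Assuming NodeA is pkg1’s primary node and NodeB is on the same subnet, the node_name entries in pkg1’s configuration file would list the nodes in that order, for example:
node_name nodeA
node_name nodeB
node_name nodeC
node_name nodeD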
Planning and Documenting an HA Cluster Package Configuration Planning Configuring ip_subnet_node Now you need to specify which subnet is configured on which nodes. In our example, you would do this by means of entries such as the following in the package configuration file:
ip_subnet 15.244.65.0
ip_subnet_node nodeA
ip_subnet_node nodeB
ip_address 15.244.65.82
ip_address 15.244.65.83
ip_subnet 15.244.56.0
ip_subnet_node nodeC
ip_subnet_node nodeD
ip_address 15.244.56.100
ip_address 15.244.56.
Planning and Documenting an HA Cluster Planning for Changes in Cluster Size Planning for Changes in Cluster Size If you intend to add additional nodes to the cluster online (while it is running) ensure that they are connected to the same heartbeat subnets and to the same lock disks as the other cluster nodes. In selecting a cluster lock configuration, be careful to anticipate any potential need for additional cluster nodes.
Building an HA Cluster Configuration 5 Building an HA Cluster Configuration This chapter and the next take you through the configuration tasks required to set up a Serviceguard cluster. These procedures are carried out on one node, called the configuration node, and the resulting binary file is distributed by Serviceguard to all the nodes in the cluster. In the examples in this chapter, the configuration node is named ftsys9, and the sample target node is called ftsys10.
Building an HA Cluster Configuration Preparing Your Systems Preparing Your Systems This section describes the tasks that should be done on the prospective cluster nodes before you actually configure the cluster.
Building an HA Cluster Configuration Preparing Your Systems Installing and Updating Serviceguard For information about installing Serviceguard, see the Release Notes for your version at http://docs.hp.com -> High Availability -> Serviceguard -> Release Notes. For information about installing and updating HP-UX, see the HP-UX Installation and Update Guide for the version you need: go to http://docs.hp.com.
Building an HA Cluster Configuration Preparing Your Systems NOTE If these variables are not defined on your system, then source the file /etc/cmcluster.conf in your login profile for user root. For example, you can add this line to root’s .profile file: . /etc/cmcluster.conf Throughout this book, system filenames are usually given with one of these location prefixes. Thus, references to $SGCONF/filename can be resolved by supplying the definition of the prefix that is found in this file.
Building an HA Cluster Configuration Preparing Your Systems You may want to add a comment such as the following at the top of the file:
###########################################################
# Do not edit this file!
# Serviceguard uses this file only to authorize access to an
# unconfigured node. Once the node is configured,
# Serviceguard will not consult this file.
Building an HA Cluster Configuration Preparing Your Systems Ensuring that the Root User on Another Node Is Recognized The HP-UX root user on any cluster node can configure the cluster. This requires that Serviceguard on one node be able to recognize the root user on another. Serviceguard uses the identd daemon to verify user names, and, in the case of a root user, verification succeeds only if identd returns the username root.
Building an HA Cluster Configuration Preparing Your Systems Configuring Name Resolution Serviceguard uses the name resolution services built in to HP-UX. Serviceguard nodes can communicate over any of the cluster’s shared networks, so the network resolution service you are using (such as DNS, NIS, or LDAP) must be able to resolve each of their primary addresses on each of those networks to the primary hostname of the node in question.
Building an HA Cluster Configuration Preparing Your Systems NOTE Serviceguard recognizes only the hostname (the first element) in a fully qualified domain name (a name with four elements separated by periods, like those in the example above). This means, for example, that gryf.uksr.hp.com and gryf.cup.hp.com cannot be nodes in the same cluster, as Serviceguard would see them as the same host gryf.
Building an HA Cluster Configuration Preparing Your Systems The procedure that follows shows how to create a robust name-resolution configuration that will allow cluster nodes to continue communicating with one another if a name service fails. If a standby LAN is configured, this approach also allows the cluster to continue to function fully (including commands such as cmrunnode and cmruncl) after the primary LAN has failed.
Building an HA Cluster Configuration Preparing Your Systems Step 1. Edit the /etc/hosts file on all nodes in the cluster. Add name resolution for all heartbeat IP addresses, and other IP addresses from all the cluster nodes; see “Configuring Name Resolution” on page 203 for discussion and examples. NOTE For each cluster node, the public-network IP address must be the first address listed. This enables other applications to talk to other nodes on public networks. Step 2.
Building an HA Cluster Configuration Preparing Your Systems This step is critical, allowing the cluster nodes to resolve hostnames to IP addresses while DNS, NIS, or the primary LAN is down. Step 4. Create a $SGCONF/cmclnodelist file on all nodes that you intend to configure into the cluster, and allow access by all cluster nodes. See “Allowing Root Access to an Unconfigured Node” on page 200.
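As an illustration of the entries Step 1 calls for, an /etc/hosts fragment for a two-node cluster might look like the following; the addresses are hypothetical, and the public-network address for each node is listed first:
15.145.162.131   ftsys9.cup.hp.com    ftsys9
10.8.0.131       ftsys9.cup.hp.com    ftsys9
15.145.162.132   ftsys10.cup.hp.com   ftsys10
10.8.0.132       ftsys10.cup.hp.com   ftsys10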
Building an HA Cluster Configuration Preparing Your Systems Adjust these parameters with care. If you experience problems, return the parameters to their default values. When contacting HP support for any issues regarding Serviceguard and networking, please be sure to mention any parameters that were changed from the defaults. Third-party applications that are running in a Serviceguard environment may require tuning of network and kernel parameters: • ndd is the network tuning utility.
Building an HA Cluster Configuration Preparing Your Systems Creating Mirrors of Root Logical Volumes HP strongly recommends that you use mirrored root volumes on all cluster nodes. The following procedure assumes that you are using separate boot and root volumes; you create a mirror of the boot volume (/dev/vg00/lvol1), primary swap (/dev/vg00/lvol2), and root volume (/dev/vg00/lvol3).
Building an HA Cluster Configuration Preparing Your Systems
lvextend -m 1 /dev/vg00/lvol2 /dev/dsk/c4t6d0
The following is an example of mirroring the root logical volume:
lvextend -m 1 /dev/vg00/lvol3 /dev/dsk/c4t6d0
5. Update the boot information contained in the BDRA for the mirror copies of boot, root and primary swap.
/usr/sbin/lvlnboot -b /dev/vg00/lvol1
/usr/sbin/lvlnboot -s /dev/vg00/lvol2
/usr/sbin/lvlnboot -r /dev/vg00/lvol3
6. Verify that the mirrors were properly created.
Building an HA Cluster Configuration Preparing Your Systems pvdisplay The I/O Timeout value should be displayed as “default.” To set the IO Timeout back to the default value, run the command: pvchange -t 0 The use of a dual cluster lock is only allowed with certain specific configurations of hardware. Refer to the discussion in Chapter 3 on “Dual Cluster Lock.” For instructions on setting up a lock disk, see “Specifying a Lock Disk” on page 229.
Building an HA Cluster Configuration Preparing Your Systems This means that if you use an existing lock disk, the existing lock information will be lost, and if you use a LUN that was previously used as a lock LUN for a Linux cluster, that lock information will also be lost. • A lock LUN cannot also be used in an LVM physical volume or VxVM or CVM disk group. • A lock LUN cannot be shared by more than one cluster. • A lock LUN cannot be used in a dual-lock configuration.
Building an HA Cluster Configuration Preparing Your Systems Step 1. Use a text editor to create a file that contains the partition information. You need to create at least three partitions, for example:
3
EFI 100MB
HPUX 1MB
HPUX 100%
This defines:
• A 100 MB EFI (Extensible Firmware Interface) partition (this is required)
• A 1 MB partition that can be used for the lock LUN
• A third partition that consumes the remainder of the disk and can be used for whatever purpose you like.
Step 2.
Building an HA Cluster Configuration Preparing Your Systems Use the command insf -e on each node. This will create device files corresponding to the three partitions, though the names themselves may differ from node to node depending on each node’s I/O configuration. Step 5. Define the lock LUN; see “Defining the Lock LUN”. Defining the Lock LUN Use cmquerycl -L to create a cluster configuration file that defines the lock LUN.
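If the lock LUN has the same device file name on every node, a command of the following form generates a cluster configuration file with the lock LUN defined (the device path is a hypothetical example); when the name differs from node to node, specify -L after each -n entry instead:
cmquerycl -C $SGCONF/clust1.config -L /dev/dsk/c0t1d1 -n ftsys9 -n ftsys10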
Building an HA Cluster Configuration Preparing Your Systems Creating the Storage Infrastructure and Filesystems with LVM and VxVM In addition to configuring the cluster, you create the appropriate logical volume infrastructure to provide access to data from different nodes. This is done several ways: • for Logical Volume Manager, see “Creating a Storage Infrastructure with LVM” on page 215. Do this before you configure the cluster if you use a lock disk; otherwise it can be done before or after.
Building an HA Cluster Configuration Preparing Your Systems The Event Monitoring Service HA Disk Monitor provides the capability to monitor the health of LVM disks. If you intend to use this monitor for your mirrored disks, you should configure them in physical volume groups. For more information, refer to the manual Using High Availability Monitors (http://docs.hp.com -> High Availability -> Event Monitoring Service and HA Monitors -> Installation and User’s Guide).
Building an HA Cluster Configuration Preparing Your Systems In the following examples, we use /dev/rdsk/c1t2d0 and /dev/rdsk/c0t2d0, which happen to be the device names for the same disks on both ftsys9 and ftsys10. In the event that the device file names are different on the different nodes, make a careful note of the correspondences. NOTE Under agile addressing, the physical devices in these examples would have names such as /dev/rdisk/disk1 and /dev/rdisk/disk2.
Building an HA Cluster Configuration Preparing Your Systems 3. Create the volume group and add physical volumes to it with the following commands:
vgcreate -g bus0 /dev/vgdatabase /dev/dsk/c1t2d0
vgextend -g bus1 /dev/vgdatabase /dev/dsk/c0t2d0
CAUTION Volume groups used by Serviceguard must have names no longer than 35 characters (that is, the name that follows /dev/, in this example vgdatabase, must be at most 35 characters long).
Building an HA Cluster Configuration Preparing Your Systems mkdir /mnt1 3. Mount the disk to verify your work: mount /dev/vgdatabase/lvol1 /mnt1 Note the mount command uses the block device file for the logical volume. 4. Verify the configuration: vgdisplay -v /dev/vgdatabase Distributing Volume Groups to Other Nodes After creating volume groups for cluster data, you must make them available to any cluster node that will need to activate the volume group.
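Before the volume group can be imported on another node, it is typically deactivated on the configuration node and exported to a map file that is then copied to the other nodes. A minimal sketch, assuming the map file name /tmp/vgdatabase.map:
vgchange -a n /dev/vgdatabase
vgexport -p -s -m /tmp/vgdatabase.map /dev/vgdatabase
rcp /tmp/vgdatabase.map ftsys10:/tmp/vgdatabase.map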
Building an HA Cluster Configuration Preparing Your Systems 3. On ftsys10, create the volume group directory: mkdir /dev/vgdatabase 4. Still on ftsys10, create a control file named group in the directory /dev/vgdatabase, as follows: mknod /dev/vgdatabase/group c 64 0xhh0000 Use the same minor number as on ftsys9. Use the following command to display a list of existing volume groups: ls -l /dev/*/group 5. Import the volume group data using the map file from node ftsys9.
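Assuming the map file was copied to /tmp/vgdatabase.map as in the sketch above, the import command on ftsys10 would look like this:
vgimport -s -m /tmp/vgdatabase.map /dev/vgdatabase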
Building an HA Cluster Configuration Preparing Your Systems reflects the contents of all physical volume groups on that node. See the following section, “Making Physical Volume Group Files Consistent.” 7. Make sure that you have deactivated the volume group on ftsys9. Then enable the volume group on ftsys10: vgchange -a y /dev/vgdatabase 8. Create a directory to mount the disk: mkdir /mnt1 9. Mount and verify the volume group on ftsys10: mount /dev/vgdatabase/lvol1 /mnt1 10.
Building an HA Cluster Configuration Preparing Your Systems 3. If /etc/lvmpvg on ftsys10 contains entries for volume groups that do not appear in /etc/lvmpvg.new, then copy all physical volume group entries for that volume group to /etc/lvmpvg.new. 4. Adjust any physical volume names in /etc/lvmpvg.new to reflect their correct names on ftsys10. 5. On ftsys10, copy /etc/lvmpvg to /etc/lvmpvg.old to create a backup. Copy /etc/lvmvpg.new to /etc/lvmpvg on ftsys10.
Building an HA Cluster Configuration Preparing Your Systems Initializing the Veritas Cluster Volume Manager 3.5 NOTE Check the Serviceguard, SGeRAC, and SMS Compatibility and Feature Matrix and the latest Release Notes for your version of Serviceguard for up-to-date information about support for CVM (and CFS - Cluster File System): http://www.docs.hp.com -> High Availability -> Serviceguard). If you are using CVM 3.
Building an HA Cluster Configuration Preparing Your Systems Initializing Disks for VxVM You need to initialize the physical disks that will be employed in VxVM disk groups.
Building an HA Cluster Configuration Preparing Your Systems
NAME      STATE      ID
rootdg    enabled    971995699.1025.node1
logdata   enabled    972078742.1084.node1
Creating Volumes Use the vxassist command to create logical volumes. The following is an example:
vxassist -g logdata make log_files 1024m
This command creates a 1024 MB volume named log_files in a disk group named logdata.
Building an HA Cluster Configuration Preparing Your Systems mkdir /logs 3. Mount the volume: mount /dev/vx/dsk/logdata/log_files /logs 4.
Building an HA Cluster Configuration Preparing Your Systems Clearimport at System Reboot Time At system reboot time, the cmcluster RC script does a vxdisk clearimport on all disks formerly imported by the system, provided they have the noautoimport flag set, and provided they are not currently imported by another running node. The clearimport clears the host ID on the disk group, to allow any node that is connected to the disk group to import it when the package moves from one node to another.
Building an HA Cluster Configuration Configuring the Cluster Configuring the Cluster This section describes how to define the basic cluster configuration. This must be done on a system that is not part of a Serviceguard cluster (that is, on which Serviceguard is installed but not configured). NOTE You can use Serviceguard Manager to configure a cluster: open the System Management Homepage (SMH) and choose Tools-> Serviceguard Manager. See “Using Serviceguard Manager” on page 30 for more information.
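To generate the cluster configuration template discussed below, you run cmquerycl on the configuration node, naming every node that is to be in the cluster; for example:
cmquerycl -v -C /etc/cmcluster/clust1.config -n ftsys9 -n ftsys10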
Building an HA Cluster Configuration Configuring the Cluster The cmquerycl(1m) manpage further explains the parameters that appear in the template file. Many are also described in the “Planning” chapter. Modify your /etc/cmcluster/clust1.config file as needed. cmquerycl Options Speeding up the Process In a larger or more complex cluster with many nodes, networks or disks, the cmquerycl command may take several minutes to complete.
Building an HA Cluster Configuration Configuring the Cluster To create a lock disk, enter the lock disk information following the cluster name. The lock disk must be in an LVM volume group that is accessible to all the nodes in the cluster.
Building an HA Cluster Configuration Configuring the Cluster where the /dev/volume-group is the name of the second volume group and block-special-file is the physical volume name of a lock disk in the chosen volume group.
Building an HA Cluster Configuration Configuring the Cluster Specifying a Quorum Server A cluster lock disk, lock LUN, or quorum server is required for two-node clusters. To obtain a cluster configuration file that includes Quorum Server parameters, use the -q option of the cmquerycl command, specifying a Quorum Server host, for example (all on one line):
cmquerycl -q <QS_Host> -n ftsys9 -n ftsys10 -C <ClusterName>.config
Enter the QS_HOST, QS_POLLING_INTERVAL and optionally a QS_TIMEOUT_EXTENSION.
Building an HA Cluster Configuration Configuring the Cluster Obtaining Cross-Subnet Information As of Serviceguard A.11.18 it is possible to configure multiple subnets, joined by a router, both for the cluster heartbeat and for data, with some nodes using one subnet and some another. See “Cross-Subnet Configurations” on page 41 for rules and definitions. You must use the -w full option to cmquerycl to discover the available subnets.
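For example, a command like the following produces output of the kind shown below (the node names are those used in this example):
cmquerycl -w full -n nodeA -n nodeB -n nodeC -n nodeD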
Building an HA Cluster Configuration Configuring the Cluster
                 lan4 (nodeD)
5                lan1 (nodeC)
                 lan1 (nodeD)
6                lan2 (nodeC)
                 lan2 (nodeD)

IP subnets:

IPv4:

15.13.164.0
15.13.172.0
15.13.165.0
15.13.182.0
15.244.65.0
15.244.56.

3ffe:2222::/64   lan3 (nodeC)
                 lan3 (nodeD)

Possible Heartbeat IPs:

15.13.164.0      15.13.164.1 (nodeA)
                 15.13.164.2 (nodeB)
15.13.172.0      15.13.172.158 (nodeC)
                 15.13.172.159 (nodeD)
15.13.165.0      15.13.165.1 (nodeA)
                 15.13.165.2 (nodeB)
15.13.182.0      15.13.182.158 (nodeC)
                 15.13.182.159 (nodeD)

Route connectivity (full probing performed):

1                15.13.164.0
                 15.13.172.0
2                15.13.165.0
                 15.13.182.0
3                15.244.65.0
4                15.244.56.
Building an HA Cluster Configuration Configuring the Cluster IMPORTANT Note that in this example subnet 15.244.65.0, used by NodeA and NodeB, is not routed to 15.244.56.0, used by NodeC and NodeD. But subnets 15.13.164.0 and 15.13.165.0, used by NodeA and NodeB, are routed respectively to subnets 15.13.172.0 and 15.13.182.0, used by NodeC and NodeD. At least one such routing among all the nodes must exist for cmquerycl to succeed.
Building an HA Cluster Configuration Configuring the Cluster Specifying Maximum Number of Configured Packages This specifies the most packages that can be configured in the cluster. The parameter value must be equal to or greater than the number of packages currently configured in the cluster. The count includes all types of packages: failover, multi-node, and system multi-node. As of Serviceguard A.11.17, the default is 150, which is the maximum allowable number of packages in a cluster.
Building an HA Cluster Configuration Configuring the Cluster Optimization Serviceguard Extension for Faster Failover (SGeFF) is a separately purchased product. If it is installed, the configuration file will display the parameter to enable it. SGeFF reduces the time it takes Serviceguard to process a failover. It cannot, however, change the time it takes for packages and applications to gracefully shut down and restart.
Building an HA Cluster Configuration Configuring the Cluster Controlling Access to the Cluster Serviceguard access-control policies define cluster users’ administrative or monitoring capabilities. A Note about Terminology Although you will also sometimes see the term role-based access (RBA) in the output of Serviceguard commands, the preferred set of terms, always used in this manual, is as follows: • Access-control policies - the set of rules defining user access to the cluster.
Building an HA Cluster Configuration Configuring the Cluster Figure 5-1 Access Roles
Building an HA Cluster Configuration Configuring the Cluster Levels of Access Serviceguard recognizes two levels of access, root and non-root: • Root access: Full capabilities; only role allowed to configure the cluster. As Figure 5-1 shows, users with root access have complete control over the configuration of the cluster and its packages. This is the only role allowed to use the cmcheckconf, cmapplyconf, cmdeleteconf, and cmmodnet -a commands.
Building an HA Cluster Configuration Configuring the Cluster — (single-package) Package Admin: Allowed to perform package administration for a specified package, and use cluster and package view commands. These users can run and halt a specified package, and change its switching behavior, but cannot configure or create packages. This is the only access role defined in the package configuration file; the others are defined in the cluster configuration file.
Building an HA Cluster Configuration Configuring the Cluster Setting up Access-Control Policies The HP-UX root user on each cluster node is automatically granted the Serviceguard root access role on all nodes. (See “Configuring Root-Level Access” on page 200 for more information.) Access-control policies define non-root roles for other cluster users. NOTE For more information and advice, see the white paper Securing Serviceguard at http://docs.hp.com -> High Availability -> Serviceguard -> White Papers.
Building an HA Cluster Configuration Configuring the Cluster The commands must be issued on USER_HOST but can take effect on other nodes; for example patrick can use bit’s command line to start a package on gryf. NOTE Choose one of these three values for USER_HOST: — ANY_SERVICEGUARD_NODE - any node on which Serviceguard is configured, and which is on a subnet with which nodes in this cluster can communicate (as reported by cmquerycl -w full).
Building an HA Cluster Configuration Configuring the Cluster set in the cluster configuration file, PACKAGE_ADMIN applies to all configured packages; if it is set in a package configuration file, it applies to that package only. These roles are not exclusive; for example, you can configure more than one PACKAGE_ADMIN for the same package. NOTE You do not have to halt the cluster or package to configure or modify access control policies.
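For illustration, an access-control policy is defined by a triplet of entries such as the following (the user name is hypothetical); this grants john the Package Admin role when he issues commands from node bit:
USER_NAME john
USER_HOST bit
USER_ROLE PACKAGE_ADMIN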
Building an HA Cluster Configuration Configuring the Cluster IMPORTANT Wildcards do not degrade higher-level roles that have been granted to individual members of the class specified by the wildcard.
Building an HA Cluster Configuration Configuring the Cluster Package versus Cluster Roles Package configuration will fail if there is any conflict in roles between the package configuration and the cluster configuration, so it is a good idea to have the cluster configuration file in front of you when you create roles for a package; use cmgetconf to get a listing of the cluster configuration file.
Building an HA Cluster Configuration Configuring the Cluster Verifying the Cluster Configuration If you have edited a cluster configuration file using the command line, use the following command to verify the content of the file:
cmcheckconf -k -v -C /etc/cmcluster/clust1.config
The following items are checked:
• Network addresses and connections.
• Cluster lock connectivity (if you are configuring a lock disk).
• Validity of configuration parameters for the cluster and packages.
Building an HA Cluster Configuration Configuring the Cluster If the cluster is online, the check also verifies that all the conditions for the specific change in configuration have been met. NOTE Using the -k option means that cmcheckconf only checks disk connectivity to the LVM disks that are identified in the ASCII file. Omitting the -k option (the default behavior) means that cmcheckconf tests the connectivity of all LVM disks on all nodes.
Building an HA Cluster Configuration Configuring the Cluster Using the -k option means that cmapplyconf only checks disk connectivity to the LVM disks that are identified in the ASCII file. Omitting the -k option (the default behavior) means that cmapplyconf tests the connectivity of all LVM disks on all nodes. Using -k can result in significantly faster operation of the command. NOTE • Deactivate the cluster lock volume group.
Building an HA Cluster Configuration Configuring the Cluster NOTE You must use the vgcfgbackup command to store a copy of the cluster lock disk's configuration data whether you created the volume group using the System Management Homepage (SMH), SAM, or HP-UX commands. If the cluster lock disk ever needs to be replaced while the cluster is running, you must use the vgcfgrestore command to restore lock information to the replacement disk.
Building an HA Cluster Configuration Configuring the Cluster many of the same commands, but the processes are in a slightly different order. Another difference is that when you use CFS, Serviceguard creates packages to manage the disk groups and mount points, so you do not activate CFS disk groups or CFS mount points in your application packages. Refer to the Serviceguard man pages for more information about the commands cfscluster, cfsdgadm, cfsmntadm, cfsmount, cfsumount, and cmgetpkgenv.
Building an HA Cluster Configuration Configuring the Cluster NOTE Do not edit system multi-node package configuration files, such as VxVM-CVM-pkg.conf and SG-CFS-pkg.conf. Create and modify configuration using the cfs admin commands listed in Appendix A. Activate the SG-CFS-pkg and start up CVM with the cfscluster command; this creates SG-CFS-pkg, and also starts it.
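A typical invocation looks like the following (a sketch; the timeout value shown is only an example):
cfscluster config -t 900 -s
The -s option starts CVM as part of the same operation; you can instead run cfscluster start later.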
Building an HA Cluster Configuration Configuring the Cluster Creating the Disk Groups Initialize the disk group from the master node. 1. Find the master node using vxdctl or cfscluster status 2. Initialize a new disk group, or import an existing disk group, in shared mode, using the vxdg command. • For a new disk group use the init option: vxdg -s init logdata c4t0d6 • For an existing disk group, use the import option: vxdg -C -s import logdata 3. Verify the disk group.
Building an HA Cluster Configuration Configuring the Cluster
Node Name : ftsys9 (MASTER)
DISK GROUP        ACTIVATION MODE
logdata           off (sw)

Node Name : ftsys10
DISK GROUP        ACTIVATION MODE
logdata           off (sw)

3. Activate the disk group and start up the package:
cfsdgadm activate logdata
4.
Building an HA Cluster Configuration Configuring the Cluster Creating a File System and Mount Point Package CAUTION Nested mounts are not supported: do not use a directory in a CFS file system as a mount point for a local file system or another cluster file system. For other restrictions, see “Unsupported Features” in the “Technical Overview” section of the VERITAS Storage Foundation™ Cluster File System 4.1 HP Serviceguard Storage Management Suite Extracts at http://docs.hp.com.
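The general sequence for this step is to create a VxFS file system on the shared volume and then let the cfs commands generate and start the mount point package; a minimal sketch using the disk group and volume from the earlier examples (the mount options shown are examples only):
newfs -F vxfs /dev/vx/rdsk/logdata/log_files
cfsmntadm add logdata log_files /tmp/logdata/log_files all=rw
cfsmount /tmp/logdata/log_files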
Building an HA Cluster Configuration Configuring the Cluster NOTE The disk group and mount point multi-node packages do not monitor the health of the disk group and mount point. They check that the packages that depend on them have access to the disk groups and mount points. If the dependent application package loses access and cannot read and write to the disk, it will fail; however that will not cause the DG or MP multi-node package to fail. 3. Verify with cmviewcl or cfsmntadm display.
Building an HA Cluster Configuration Configuring the Cluster
Filesystem                      kbytes   used    avail    %used   Mounted on
/dev/vx/dsk/logdata/log_files   10485    17338   966793   2%      /tmp/logdata/log_files

ftsys10/etc/cmcluster/cfs> bdf
Filesystem                      kbytes   used    avail    %used   Mounted on
/dev/vx/dsk/logdata/log_files   10485    17338   966793   2%      /tmp/logdata/log_files

6. To view the package name that is monitoring a mount point, use the cfsmntadm show_package command:
cfsmntadm show_package /tmp/logdata/log_files
SG-CFS-MP-1
7.
Building an HA Cluster Configuration Configuring the Cluster For more information about the technique, see the Veritas File System Administrator’s Guide appropriate to your version of CFS, posted at http://docs.hp.com. The following example illustrates how to create a storage checkpoint of the /cfs/mnt2 filesystem. Start with a cluster-mounted file system. 1. Create a checkpoint of /tmp/logdata/log_files named check2.
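The checkpoint itself is created with the VxFS fsckptadm command; a sketch (verify the exact options against the fsckptadm manpage for your VxFS version):
fsckptadm create check2 /tmp/logdata/log_files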
Building an HA Cluster Configuration Configuring the Cluster
SG-CFS-DG-1    up    running    enabled     no
SG-CFS-MP-1    up    running    enabled     no
SG-CFS-CK-1    up    running    disabled    no

/tmp/check_logfiles now contains a point in time view of /tmp/logdata/log_files, and it is persistent.
Building an HA Cluster Configuration Configuring the Cluster
vxdg init dg1 c4t1d0
vxassist -g dg1 make vol1 100m
vxvol -g dg1 startall
2. Associate it with the cluster.
cfsmntadm add snapshot dev=/dev/vx/dsk/dg1/vol1 \
/tmp/logdata/log_files /local/snap1 ftsys9=ro
Package name SG-CFS-SN-1 was generated to control the resource. Mount point /local/snap1 was associated to the cluster.
Building an HA Cluster Configuration Configuring the Cluster
                       102400   1765   94353   2%   /tmp/logdata/log_files
/dev/vx/dsk/dg1/vol1   102400   1765   94346   2%   /local/snap1
Creating the Storage Infrastructure with Veritas Cluster Volume Manager (CVM) NOTE Check the Serviceguard, SGeRAC, and SMS Compatibility and Feature Matrix and the latest Release Notes for your version of Serviceguard for up-to-date information on support for CVM (and CFS - Cluster File System): http://www.docs.hp.com.
Building an HA Cluster Configuration Configuring the Cluster For more information, including details about configuration of plexes (mirrors), multipathing, and RAID, refer to the HP-UX documentation for the Veritas Volume Manager. See the documents for HP Serviceguard Storage Management Suite posted at http://docs.hp.com. Initializing the Veritas Volume Manager If you are about to create disk groups for the first time, you need to initialize the Volume Manager.
Building an HA Cluster Configuration Configuring the Cluster NOTE Cluster configuration is described in the previous section, “Configuring the Cluster” on page 228. Check the heartbeat configuration. The CVM 3.5 heartbeat requirement is different from version 4.1 and later: • CVM 3.5 requires that you can configure only one heartbeat subnet. • CVM 4.1 and later versions require that the cluster have either multiple heartbeats or a single heartbeat with a standby.
Building an HA Cluster Configuration Configuring the Cluster You can confirm this using the cmviewcl command. This output shows results of the CVM 3.5 command above.
CLUSTER      STATUS
example      up

NODE         STATUS    STATE
ftsys9       up        running
ftsys10      up        running

MULTI_NODE_PACKAGES:

PACKAGE        STATUS   STATE     AUTO_RUN   SYSTEM
VxVM-CVM-pkg   up       running   enabled    yes

NOTE Do not edit system multi-node package configuration files, such as VxVM-CVM-pkg.conf and SG-CFS-pkg.conf.
Building an HA Cluster Configuration Configuring the Cluster To initialize a disk for CVM, log on to the master node, then use the vxdiskadm program to initialize multiple disks, or use the vxdisksetup command to initialize one disk at a time, as in the following example: /usr/lib/vxvm/bin/vxdisksetup -i c4t3d4 Creating Disk Groups Use the following steps to create disk groups. Step 1. Use the vxdg command to create disk groups.
Building an HA Cluster Configuration Configuring the Cluster Mirror Detachment Policies with CVM The default CVM disk mirror detachment policy is global, which means that as soon as one node cannot see a specific mirror copy (plex), all nodes cannot see it as well. The alternate policy is local, which means that if one node cannot see a specific mirror copy, then CVM will deactivate access to the volume for that node only.
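The policy can be changed on a per-disk-group basis with the vxedit command; for example, to select the local policy for the disk group used in these examples:
vxedit set diskdetpolicy=local logdata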
Building an HA Cluster Configuration Managing the Running Cluster Managing the Running Cluster This section describes some approaches to routine management of the cluster. Additional tools and suggestions are found in Chapter 7, “Cluster and Package Maintenance.” Checking Cluster Operation with Serviceguard Manager You can check configuration and status information using Serviceguard Manager: from the System Management Homepage (SMH), choose Tools-> Serviceguard Manager.
Building an HA Cluster Configuration Managing the Running Cluster You can use these commands to test cluster operation, as in the following: 1. If the cluster is not already running, start it. From the Serviceguard Manager menu, choose Run Cluster. From the command line, use cmruncl -v. By default, cmruncl will check the networks. Serviceguard will probe the actual network configuration with the network information in the cluster configuration.
Building an HA Cluster Configuration Managing the Running Cluster Preventing Automatic Activation of LVM Volume Groups It is important to prevent LVM volume groups that are to be used in packages from being activated at system boot time by the /etc/lvmrc file. One way to ensure that this does not happen is to edit the /etc/lvmrc file on all nodes, setting AUTO_VG_ACTIVATE to 0, then including all the volume groups that are not cluster-bound in the custom_vg_activation function.
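A sketch of the relevant parts of /etc/lvmrc follows; the volume group names are hypothetical, parallel_vg_sync is the helper function provided in the default /etc/lvmrc, and only volume groups that are not managed by Serviceguard packages should appear here:
AUTO_VG_ACTIVATE=0

custom_vg_activation()
{
    # Activate only local (non-cluster) volume groups at boot time
    parallel_vg_sync "/dev/vglocal1 /dev/vglocal2"
    return 0
}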
Building an HA Cluster Configuration Managing the Running Cluster To enable automatic cluster start, set the flag AUTOSTART_CMCLD to 1 in the /etc/rc.config.d/cmcluster file on each node in the cluster; the nodes will then join the cluster at boot time. Here is an example of the /etc/rc.config.d/cmcluster file:
#************************ CMCLUSTER ************************
# Highly Available Cluster configuration
#
# @(#) $Revision: 72.
Building an HA Cluster Configuration Managing the Running Cluster Managing a Single-Node Cluster The number of nodes you will need for your Serviceguard cluster depends on the processing requirements of the applications you want to protect. You may want to configure a single-node cluster to take advantage of Serviceguard’s network failure protection. In a single-node cluster, a cluster lock is not required, since there is no other node in the cluster.
Building an HA Cluster Configuration Managing the Running Cluster Disabling identd Ignore this section unless you have a particular need to disable identd. You can configure Serviceguard not to use identd. CAUTION This is not recommended. Disabling identd removes an important security layer from Serviceguard. See the white paper Securing Serviceguard at http://docs.hp.com -> High Availability -> Serviceguard -> White Papers for more information.
Building an HA Cluster Configuration Managing the Running Cluster Although the cluster must be halted, all nodes in the cluster should be powered up and accessible before you use the cmdeleteconf command. If a node is powered down, power it up and boot. If a node is inaccessible, you will see a list of inaccessible nodes together with the following message: It is recommended that you do not proceed with the configuration operation unless you are sure these nodes are permanently unavailable.
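For example, assuming the cluster is named example, a command of the following form deletes the cluster configuration; the -f option suppresses the verification prompt:

cmdeleteconf -f -c example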
Configuring Packages and Their Services 6 Configuring Packages and Their Services Serviceguard packages group together applications and the services and resources they depend on. The typical Serviceguard package is a failover package that starts on one node but can be moved (“failed over”) to another if necessary. See “What is Serviceguard?” on page 26, “How the Package Manager Works” on page 74, and “Package Configuration Planning” on page 169 for more information.
Configuring Packages and Their Services NOTE This is a new process for configuring packages, as of Serviceguard A.11.18. This manual refers to packages created by this method as modular packages, and assumes that you will use it to create new packages; it is simpler and more efficient than the older method, allowing you to build packages from smaller modules, and eliminating the separate package control script and the need to distribute it manually. Packages created using Serviceguard A.11.
Configuring Packages and Their Services Choosing Package Modules Choosing Package Modules IMPORTANT Before you start, you need to do the package-planning tasks described under “Package Configuration Planning” on page 169. To choose the right package modules, you need to decide the following things about the package you are creating: • What type of package it is; see “Types of Package: Failover, Multi-Node, System Multi-Node” on page 277.
Configuring Packages and Their Services Choosing Package Modules The Veritas Cluster File System (CFS) system multi-node packages are examples of multi-node packages; but support for multi-node packages is no longer restricted to CVM/CFS; you can create a multi-node package for any purpose. IMPORTANT But if the package uses volume groups, they must be activated in shared mode: vgchange -a s, which is available only if the SGeRAC add-on product is installed.
Configuring Packages and Their Services Choosing Package Modules NOTE On systems that support CFS, you configure the CFS system multi-node package by means of the cfscluster command, not by editing a package configuration file. See “Configuring Veritas System Multi-node Packages” on page 325.
Configuring Packages and Their Services Choosing Package Modules • If a multi-node package is halted via cmhaltpkg, package switching is not disabled. This means that the halted package will start to run on a rebooted node, if it is configured to run on that node and its dependencies are met.
Configuring Packages and Their Services Choosing Package Modules NOTE If you are going to create a complex package that contains many modules, you may want to skip the process of selecting modules, and simply create a configuration file that contains all the modules: cmmakepkg -m sg/all $SGCONF/sg-all (The output will be written to $SGCONF/sg-all.) Base Package Modules At least one base module (or default or all, which include the base module) must be specified on the cmmakepkg command line.
Configuring Packages and Their Services Choosing Package Modules Table 6-1 Base Modules (Continued) Module Name Parameters (page) Comments multi_node package_name (287) * module_name (288) * module_version (288) * package_type (288) node_name (288) auto_run (289) node_fail_fast_enabled (289) run_script_timeout (290) halt_script_timeout (290) successor_halt_timeout (291) * script_log_file (291) operation_sequence (291) * log_level (292) priority (293) * Base module.
Configuring Packages and Their Services Choosing Package Modules its equivalent) has moved from the package control script to the package configuration file for modular packages. See the “Package Parameter Explanations” on page 287 for more information. Table 6-2 Module Name Optional Modules Parameters (page) Comments dependency dependency_name (293) * dependency_condition (294) dependency_location (294) Add to a base module to create a package that depends on one or more other packages.
Configuring Packages and Their Services Choosing Package Modules Table 6-2 Optional Modules (Continued)

Module Name: volume_group
Parameters (page):
  concurrent_vgchange_operations (300) (S)
  enable_threaded_vgchange (301) *
  vgchange_cmd (301) * (S)
  cvm_activation_cmd (302) (S)
  vxvol_cmd (302) * (S)
  vg (303) (S)
  cvm_dg (303) (S)
  vxvm_dg (303) (S)
  vxvm_dg_retry (303) (S)
  deactivation_retry_count (304) (S)
  kill_processes_accessing_raw_devices (304) (S)
Comments: Add to a base module if the package needs to
Configuring Packages and Their Services Choosing Package Modules Table 6-2 Module Name Optional Modules (Continued) Parameters (page) Comments external_pre external_pre_script (307) * Add to a base module to specify additional programs to be run before volume groups and disk groups are activated while the package is starting and after they are deactivated while the package is halting.
Configuring Packages and Their Services Choosing Package Modules Table 6-2 Optional Modules (Continued)

Module Name: default
Parameters (page): (all parameters)
Comments: A symbolic link to the all module; used if a base module is not specified on the cmmakepkg command line; see “cmmakepkg Examples” on page 311.

NOTE The default form for parameter names in the modular package configuration file is lower case; for legacy packages the default is upper case.
Configuring Packages and Their Services Choosing Package Modules Package Parameter Explanations Brief descriptions of the package configuration parameters follow. NOTE For more information, see the comments in the editable configuration file output by the cmmakepkg command, and the cmmakepkg manpage.
Configuring Packages and Their Services Choosing Package Modules module_name The module name (for example, failover, service, etc.) Do not change it. Used in the form of a relative path (for example sg/failover) as a parameter to cmmakepkg to specify modules to be used in configuring the package. (The files reside in the $SGCONF/modules directory; see “Learning Where Serviceguard Files Are Kept” on page 199 for an explanation of Serviceguard directories.) New for modular packages.
Configuring Packages and Their Services Choosing Package Modules IMPORTANT See “Cluster Configuration Parameters” on page 156 for important information about node names. See “About Cross-Subnet Failover” on page 191 for considerations when configuring cross-subnet packages, which are further explained under “Cross-Subnet Configurations” on page 41. auto_run Can be set to yes or no. The default is yes.
Configuring Packages and Their Services Choosing Package Modules NOTE If the package halt function fails with “exit 1”, Serviceguard does not halt the node, but sets no_restart for the package, which disables package switching (auto_run), thereby preventing the package from starting on any adoptive node. Setting node_fail_fast_enabled to yes prevents Serviceguard from repeatedly trying (and failing) to start the package on the same node.
Configuring Packages and Their Services Choosing Package Modules If the package’s halt process does not complete in the time specified by halt_script_timeout, Serviceguard will terminate the package and prevent it from switching to another node. In this case, if node_fail_fast_enabled (see page 289) is set to yes, the node will be halted (HP-UX system reset).
Configuring Packages and Their Services Choosing Package Modules log_level Determines the amount of information printed to stdout when the package is validated, and to the script_log_file (see page 291) when the package is started and halted. Valid values are 0 through 5, but you should normally use only the first two (0 or 1); the remainder (2 through 5) are intended for use by HP Support.
Configuring Packages and Their Services Choosing Package Modules This parameter can be set for failover packages only. If this package will depend on another package or vice versa, see also “About Package Dependencies” on page 178. priority Assigns a priority to a failover package whose failover_policy (see page 292) is configured_node. Valid values are 1 through 3000, or no_priority. The default is no_priority. See also the dependency_ parameter descriptions, starting on page 293.
Configuring Packages and Their Services Choosing Package Modules dependency_name pkg2dep dependency_condition pkg2 = UP dependency_location same_node For more information about package dependencies, see “About Package Dependencies” on page 178. dependency_condition The condition that must be met for this dependency to be satisfied. As of Serviceguard A.11.18, the only condition that can be set is that another package must be running.
Configuring Packages and Their Services Choosing Package Modules If you specify a subnet as a monitored_subnet the package will not run on any node not reachable via that subnet. This normally means that if the subnet is not up, the package will not run. (For cross-subnet configurations, in which a subnet may be configured on some nodes and not on others, see monitored_subnet_access below, ip_subnet_node on page 296, and “About Cross-Subnet Failover” on page 191.
Configuring Packages and Their Services Choosing Package Modules For each subnet used, specify the subnet address on one line and, on the following lines, the relocatable IP addresses that the package uses on that subnet. These will be configured when the package starts and unconfigured when it halts. For example, if this package uses subnet 192.10.25.0 and the relocatable IP addresses 192.10.25.12 and 192.10.25.13, enter:

ip_subnet     192.10.25.0
ip_address    192.10.25.12
ip_address    192.10.25.13
Configuring Packages and Their Services Choosing Package Modules For more information about relocatable IP addresses, see “Stationary and Relocatable IP Addresses” on page 99. This parameter can be set for failover packages only. service_name A service is a program or function which Serviceguard monitors as long as the package is up. service_name identifies this function and is used by the cmrunserv and cmhaltserv commands. You can configure a maximum of 30 services per package and 900 services per cluster.
Configuring Packages and Their Services Choosing Package Modules An absolute pathname is required; neither the PATH variable nor any other environment variable is passed to the command. The default shell is /usr/bin/sh. NOTE Be careful when defining service run commands. Each run command is executed in the following way: • The cmrunserv command executes the run command. • Serviceguard monitors the process ID (PID) of the process the run command creates.
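As an illustration, a single monitored service might be defined in a modular package configuration file as follows; the service name and script path are placeholders:

service_name                 pkg1_monitor
service_cmd                  "/etc/cmcluster/pkg1/monitor.sh"
service_restart              none
service_fail_fast_enabled    no
service_halt_timeout         300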
Configuring Packages and Their Services Choosing Package Modules service_halt_timeout The length of time, in seconds, Serviceguard will wait for the service to halt before forcing termination of the service’s process. The maximum value is 4294. The value should be large enough to allow any cleanup required by the service to complete. If no value is specified, a zero timeout will be assumed, meaning that Serviceguard will not wait any time before terminating the process.
Configuring Packages and Their Services Choosing Package Modules Requires an operator and a value. Values can be string or numeric. The legal operators are =, !=, >, <, >=, or <=, depending on the type of value. If the type is string, then only = and != are valid. If the string contains white space, it must be enclosed in quotes. String values are case-sensitive. The maximum length of the entire resource_up_value string is 1024 characters. You can configure a total of 15 resource_up_values per package.
Configuring Packages and Their Services Choosing Package Modules NOTE If you set concurrent_vgchange_operations to a value greater than 1, you may see messages such as this in the package log file: Cannot lock “/etc/lvmconf//lvm_lock” still trying...” This is an informational message that can be safely ignored. enable_threaded_vgchange Indicates whether multi-threaded activation of volume groups (vgchange -T) is enabled. New for modular packages. Available on HP-UX 11i v3 only.
Configuring Packages and Their Services Choosing Package Modules configuration file, “LVM Planning” on page 149, and “Creating the Storage Infrastructure and Filesystems with LVM and VxVM” on page 215. IMPORTANT Volume groups for multi-node and system multi-node packages must be activated in shared mode: vgchange -a s, which is only available if the add-on product Serviceguard Extension for Real Application Cluster (SGeRAC) is installed.
Configuring Packages and Their Services Choosing Package Modules If recovery is found to be necessary during package startup, by default the script will pause until the recovery is complete. To change this behavior, comment out the line vxvol_cmd "vxvol -g \${DiskGroup} startall" in the configuration file, and uncomment the line vxvol_cmd "vxvol -g \${DiskGroup} -o bg startall" This allows package startup to continue while mirror re-synchronization is in progress.
Configuring Packages and Their Services Choosing Package Modules IMPORTANT vxdisk scandisks can take a long time in the case of a large IO subsystem. deactivation_retry_count Specifies how many times the package shutdown script will repeat an attempt to deactivate a volume group (LVM) or disk group (VxVM, CVM) before failing. Legal value is zero or any greater number. Default is zero.
Configuring Packages and Their Services Choosing Package Modules concurrent_fsck_operations The number of concurrent fsck operations allowed on file systems being mounted during package startup. Legal value is any number greater than zero. The default is 1. If the package needs to run fsck on a large number of filesystems, you can improve performance by carefully tuning this parameter during testing (increase it a little at time and monitor performance each time).
Configuring Packages and Their Services Choosing Package Modules fs_name This parameter, in conjunction with fs_directory, fs_type, fs_mount_opt, fs_umount_opt, and fs_fsck_opt, specifies a filesystem that is to be mounted by the package. Replaces LV, which is still supported in the package control script for legacy packages. fs_name must specify the block devicefile for a logical volume. Filesystems are mounted in the order specified in this file, and unmounted in the reverse order.
Configuring Packages and Their Services Choosing Package Modules fs_fsck_opt The fsck options for the file system specified by fs_name. Using the -s (safe performance mode) option of fsck will improve startup performance if the package uses a large number of file systems. This parameter is in the package control script for legacy packages. See the fsck (1m) manpage for more information.
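For example, a typical group of filesystem entries in a modular package configuration file might look like the following; the logical volume and mount point are placeholders:

fs_name         /dev/vg01/lvol1
fs_directory    /appdata
fs_type         "vxfs"
fs_mount_opt    "-o rw"
fs_umount_opt   "-s"
fs_fsck_opt     "-s"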
Configuring Packages and Their Services Choosing Package Modules If more than one external_script is specified, the scripts will be executed on package startup in the order they are entered into this file, and in the reverse order during package shutdown. See “About External Scripts” on page 185, and the comments in the configuration file, for more information and examples. user_name Specifies the name of a user who has permission to administer this package.
Configuring Packages and Their Services Choosing Package Modules Additional Parameters Used Only by Legacy Packages IMPORTANT The following parameters are used only by legacy packages. Do not try to use them in modular packages. See “Configuring a Legacy Package” on page 377 for more information. PATH Specifies the path to be used by the script. SUBNET Specifies the IP subnets that are to be monitored for the package.
Configuring Packages and Their Services Choosing Package Modules In most cases, though, HP recommends that you use the same script for both run and halt instructions. (When the package starts, the script is passed the parameter start; when it halts, it is passed the parameter stop.) DEFERRED_RESOURCE_NAME Add DEFERRED_RESOURCE_NAME to a legacy package control script for any resource that has a RESOURCE_START setting of DEFERRED.
Configuring Packages and Their Services Generating the Package Configuration File Generating the Package Configuration File When you have chosen the configuration modules your package needs (see “Choosing Package Modules” on page 277), you are ready to generate a package configuration file that contains those modules. This file will consist of a base module (usually failover, multi-node or system multi-node) plus the modules that contain the additional parameters you have decided to include.
Configuring Packages and Their Services Generating the Package Configuration File
• To generate a configuration file that contains all the optional modules:
  cmmakepkg $SGCONF/pkg1/pkg1.conf
• To create a generic failover package (that could be applied without editing):
  cmmakepkg -n pkg1 -m sg/failover $SGCONF/pkg1/pkg1.
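You can also combine just the modules you need; for example, a command of the following form generates a failover template that includes only the service and filesystem modules in addition to the base module:

cmmakepkg -m sg/failover -m sg/service -m sg/filesystem $SGCONF/pkg1/pkg1.conf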
Configuring Packages and Their Services Editing the Configuration File Editing the Configuration File When you have generated the configuration file that contains the modules your package needs (see “Generating the Package Configuration File” on page 311), you need to edit the file to set the package parameters to the values that will make the package function as you intend.
Configuring Packages and Their Services Editing the Configuration File Use the following bullet points as a checklist, referring to the “Package Parameter Explanations” on page 287, and the comments in the configuration file itself, for detailed specifications for each parameter. NOTE Optional parameters are commented out in the configuration file (with a # at the beginning of the line).
Configuring Packages and Their Services Editing the Configuration File • run_script_timeout and halt_script_timeout. Enter the number of seconds Serviceguard should wait for package startup and shutdown, respectively, to complete; or leave the default, no_timeout; see page 290. • successor_halt_timeout. Used if other packages depend on this package; see “About Package Dependencies” on page 178. • script_log_file. See page 291. • log_level. See log_level on page 292. • failover_policy.
Configuring Packages and Their Services Editing the Configuration File In a cross-subnet configuration, configure the additional monitored_subnet_access parameter for each monitored_subnet as necessary; see “About Cross-Subnet Failover” on page 191 for more information. • If this is a Serviceguard Extension for Oracle RAC (SGeRAC) installation, you can use the cluster_interconnect_subnet parameter (see page 295). • If your package will use relocatable IP addresses, enter the ip_subnet and ip_address.
Configuring Packages and Their Services Editing the Configuration File options in the FILESYSTEMS portion of the configuration file to specify the options for mounting and unmounting the filesystems. Do not use the vxvm_dg or cvm_dg parameters for LVM volume groups. Enter each volume group on a separate line, for example: vg vg01 vg vg02 • If you are using CVM, use the cvm_dg parameters to specify the names of the disk groups to be activated, and select the appropriate cvm_activation_cmd.
Configuring Packages and Their Services Editing the Configuration File — concurrent_fsck_operations (see page 305) — concurrent_mount_and_umount_operations (see page 305) You can also use the fsck_opt and fs_umount_opt parameters to specify the -s option of the fsck and mount/umount commands (see page 306). • You can use the pev_ parameter to specify a variable to be passed to external scripts. Make sure the variable name begins with the upper-case or lower-case letters pev and an underscore (_).
Configuring Packages and Their Services Editing the Configuration File • Configure the Access Control Policy for up to eight specific users or any_user. The only user role you can configure in the package configuration file is package_admin for the package in question. Cluster-wide roles are defined in the cluster configuration file. See “Setting up Access-Control Policies” on page 243 for more information.
Configuring Packages and Their Services Verifying and Applying the Package Configuration Verifying and Applying the Package Configuration Serviceguard checks the configuration you enter and reports any errors. Use a command such as the following to verify the content of the package configuration file you have created, for example: cmcheckconf -v -P $SGCONF/pkg1/pkg1.config Errors are displayed on the standard output.
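When cmcheckconf reports no errors, a command such as the following applies the package configuration and distributes the binary configuration file to all nodes:

cmapplyconf -v -P $SGCONF/pkg1/pkg1.config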
Configuring Packages and Their Services Verifying and Applying the Package Configuration NOTE For modular packages, you now need to distribute any external scripts identified by the external_pre_script and external_script parameters. But if you are accustomed to configuring legacy packages, note that you do not have to create a separate package control script for a modular package, or distribute it manually. (You do still have to do this for legacy packages; see “Configuring a Legacy Package” on page 377.
Configuring Packages and Their Services Adding the Package to the Cluster Adding the Package to the Cluster You can add the new package to the cluster while the cluster is running, subject to the value of MAX_CONFIGURED_PACKAGES in the cluster configuration file. See “Adding a Package to a Running Cluster” on page 395.
Configuring Packages and Their Services How Control Scripts Manage VxVM Disk Groups How Control Scripts Manage VxVM Disk Groups VxVM disk groups (other than those managed by CVM) are outside the control of the Serviceguard cluster. The package control script uses standard VxVM commands to import and deport these disk groups. (For details on importing and deporting disk groups, refer to the discussion of the import and deport options in the vxdg man page.
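The forced import performed by the package control script typically takes the following form, using the disk group dg_01 discussed below:

vxdg -tfC import dg_01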
Configuring Packages and Their Services How Control Scripts Manage VxVM Disk Groups This command takes over ownership of all the disks in disk group dg_01, even though the disk currently has a different host ID written on it. The command writes the current node’s host ID on all disks in disk group dg_01 and sets the noautoimport flag for the disks. This flag prevents a disk group from being automatically re-imported by a node following a reboot.
Configuring Packages and Their Services Configuring Veritas System Multi-node Packages Configuring Veritas System Multi-node Packages There are two system multi-node packages that regulate Veritas Cluster Volume Manager (CVM) and Cluster File System (CFS). These packages ship with the Serviceguard product. There are two versions of the package files: VxVM-CVM-pkg for CVM Version 3.5, and SG-CFS-pkg for CFS/CVM Version 4.1 and later.
Configuring Packages and Their Services Configuring Veritas System Multi-node Packages NOTE Do not create or modify these packages by editing a configuration file. Never edit their control script files. The CFS admin commands are listed in Appendix A.
Configuring Packages and Their Services Configuring Veritas Multi-node Packages Configuring Veritas Multi-node Packages There are two types of multi-node packages that work with the Veritas Cluster File System (CFS): SG-CFS-DG-id# for disk groups, which you configure with the cfsdgadm command; and SG-CFS-MP-id# for mount points, which you configure with the cfsmntadm command. Each package name will have a unique number, appended by Serviceguard at creation.
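As a sketch (the disk group, volume, and mount point names are placeholders), the commands to create and mount such packages might look like this:

cfsdgadm add logdata all=sw
cfsmntadm add logdata lvol1 /mnt/logdata all=rw
cfsmount /mnt/logdata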
Configuring Packages and Their Services Configuring Veritas Multi-node Packages NOTE Do not edit configuration files for the Serviceguard-supplied packages VxVM-CVM-pkg, SG-CFS-pkg, SG-CFS-DG-id#, or SG-CFS-MP-id#. Create VxVM-CVM-pkg and SG-CFS-pkg by means of the cmapplyconf command. Create and modify SG-CFS-DG-id# and SG-CFS-MP-id# using the cfs* commands listed in Appendix A, “Serviceguard Commands,” on page 439.
Cluster and Package Maintenance 7 Cluster and Package Maintenance This chapter describes how to see cluster configuration and status information, how to start and halt a cluster or an individual node, how to perform permanent reconfiguration, and how to start, halt, move, and modify packages during routine maintenance of the cluster.
Cluster and Package Maintenance Reviewing Cluster and Package Status Reviewing Cluster and Package Status You can check status using Serviceguard Manager or from a cluster node’s command line. Reviewing Cluster and Package Status with the cmviewcl Command Information about cluster status is stored in the status database, which is maintained on each individual node in the cluster.
Cluster and Package Maintenance Reviewing Cluster and Package Status cmviewcl -r A.11.16 (See the cmviewcl (1m) manpage for the supported release formats.) The formatting options let you choose a style: the tabulated format is designed for viewing; the line format is designed for scripting, and is easily parsed. See the manpage for a detailed description of other cmviewcl options. Viewing Dependencies The cmviewcl -v command output lists dependencies throughout the cluster.
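For example, to display one package, including its dependencies, in the easily parsed line format, you might run:

cmviewcl -v -f line -p pkg1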
Cluster and Package Maintenance Reviewing Cluster and Package Status Node Status and State The status of a node is either up (active as a member of the cluster) or down (inactive in the cluster), depending on whether its cluster daemon is running or not. Note that a node might be down from the cluster perspective, but still up and running HP-UX. A node may also be in one of the following states: • Failed.
Cluster and Package Maintenance Reviewing Cluster and Package Status (successors) to halt. The parameter description for successor_halt_timeout (see page 291) provides more information. • failing - The package is halting because it, or a package it depends on, has failed. • fail_wait - The package is waiting to be halted because the package or a package it depends on has failed, but must wait for a package it depends on to halt before it can halt.
Cluster and Package Maintenance Reviewing Cluster and Package Status • fail_wait - The package is waiting to be halted because the package or a package it depends on has failed, but must wait for a package it depends on to halt before it can halt. • failed - The package is down and failed. • relocate_wait - The package’s halt script has completed or Serviceguard is still trying to place the package. • unknown - Serviceguard could not determine the state at the time cmviewcl was run.
Cluster and Package Maintenance Reviewing Cluster and Package Status • Up. The service is being monitored. • Down. The service is not running. It may not have started, or have halted or failed. • Unknown. Network Status The network interfaces have only status, as follows: • Up. • Down. • Unknown. Serviceguard cannot determine whether the interface is up or down. A standby interface has this status.
Cluster and Package Maintenance Reviewing Cluster and Package Status ftsys9 up running Network_Parameters: INTERFACE STATUS PRIMARY up STANDBY up PATH 56/36.
Cluster and Package Maintenance Reviewing Cluster and Package Status

Node_Switching_Parameters:
NODE_TYPE    STATUS       SWITCHING    NAME
Primary      up           enabled      ftsys10
Alternate    up           enabled      ftsys9     (current)

NOTE The Script_Parameters section of the PACKAGE output of cmviewcl shows the Subnet status only for the node that the package is running on.
Cluster and Package Maintenance Reviewing Cluster and Package Status NOTE Check the Serviceguard, SGeRAC, and SMS Compatibility and Feature Matrix and the latest Release Notes for your version of Serviceguard for up-to-date information about support for CVM and CFS (http://www.docs.hp.com -> High Availability -> Serviceguard).
Cluster and Package Maintenance Reviewing Cluster and Package Status ITEM Service STATUS up NODE STATUS ftsys10 up Script_Parameters: ITEM STATUS Service up MAX_RESTARTS 0 RESTARTS 0 NAME VxVM-CVM-pkg.srv SWITCHING enabled MAX_RESTARTS 0 RESTARTS 0 NAME VxVM-CVM-pkg.
Cluster and Package Maintenance Reviewing Cluster and Package Status Service Service up up NODE_NAME ftsys10 0 0 STATUS up 0 0 SG-CFS-cmvxd SG-CFS-cmvxpingd SWITCHING enabled Script_Parameters: ITEM STATUS MAX_RESTARTS Service up 0 Service up 5 Service up 5 Service up 0 Service up 0 RESTARTS 0 0 0 0 0 NAME SG-CFS-vxconfigd SG-CFS-sgcvmd SG-CFS-vxfsckd SG-CFS-cmvxd SG-CFS-cmvxpingd Status After Halting a Package After we halt the failover package pkg2 with the cmhaltpkg command, the output of cmvie
Cluster and Package Maintenance Reviewing Cluster and Package Status Alternate NODE ftsys10 up STATUS up Network_Parameters: INTERFACE STATUS PRIMARY up STANDBY up enabled ftsys10 STATE running PATH 28.1 32.
Cluster and Package Maintenance Reviewing Cluster and Package Status and then run cmviewcl -v, we’ll see: CLUSTER example NODE ftsys9 STATUS up STATUS up STATE running Network_Parameters: INTERFACE STATUS PRIMARY up STANDBY up PATH 56/36.
Cluster and Package Maintenance Reviewing Cluster and Package Status ITEM STATUS MAX_RESTARTS RESTARTS NAME Service up 0 0 service2.1 Subnet up 15.13.168.0 Node_Switching_Parameters: NODE_TYPE STATUS SWITCHING NAME Primary up enabled ftsys10 Alternate up enabled ftsys9 (current) NODE ftsys10 STATUS up STATE running Network_Parameters: INTERFACE STATUS PRIMARY up STANDBY up PATH 28.1 32.1 NAME lan0 lan1 Now pkg2 is running on node ftsys9. Note that switching is still disabled.
Cluster and Package Maintenance Reviewing Cluster and Package Status we’ll see the following output from cmviewcl:

CLUSTER        STATUS
example        up

  NODE         STATUS       STATE
  ftsys9       up           running

    PACKAGE      STATUS      STATE       AUTO_RUN     NODE
    pkg1         up          running     enabled      ftsys9
    pkg2         up          running     enabled      ftsys9

  NODE         STATUS       STATE
  ftsys10      down         halted

This output can be seen on both ftsys9 and ftsys10.
Cluster and Package Maintenance Reviewing Cluster and Package Status Primary Alternate Alternate Alternate up up up up enabled enabled enabled enabled manx burmese tabby persian Viewing Information about System Multi-Node Packages The following example shows a cluster that includes system multi-node packages as well as failover packages. The system multi-node packages are running on all nodes in the cluster, whereas the standard packages run on only one node at a time.
Cluster and Package Maintenance Reviewing Cluster and Package Status
#cfscluster status

Node             :  ftsys9
Cluster Manager  :  up
CVM state        :  up (MASTER)

MOUNT POINT                            TYPE      SHARED VOLUME   DISK GROUP       STATUS
/var/opt/sgtest/tmp/mnt/dev/vx/dsk/
vg_for_cvm1_dd5/lvol1                  regular   lvol1           vg_for_cvm_dd5   MOUNTED
/var/opt/sgtest/tmp/mnt/dev/vx/dsk/
vg_for_cvm1_dd5/lvol4                  regular   lvol4           vg_for_cvm_dd5   MOUNTED

Node             :  ftsys8
Cluster Manager  :  up
CVM state        :  up

MOUNT POINT                            TYPE
/var/opt/sgtest/tmp/mnt/dev/vx/dsk/
vg_fo
Cluster and Package Maintenance Reviewing Cluster and Package Status Status of the Packages with a Cluster File System Installed You can use cmviewcl to see the status of the package and the cluster file system on all nodes, as shown in the example below: cmviewcl -v -p SG-CFS-pkg MULTI_NODE_PACKAGES PACKAGE STATUS STATE AUTO_RUN SYSTEM SG-CFS-pkg up running enabled yes NODE_NAME STATUS SWITCHING soy up enabled Script_Parameters: ITEM STATUS MAX_RESTARTS RESTARTS NAME Service up 0 0 SG-CFS-vxconfigd Service
Cluster and Package Maintenance Reviewing Cluster and Package Status

NODE NAME       ACTIVATION MODE
ftsys9          sw (sw)
  MOUNT POINT       SHARED VOLUME       TYPE
ftsys10         sw (sw)
  MOUNT POINT       SHARED VOLUME       TYPE
...

To see which package is monitoring a disk group, use the cfsdgadm show_package command. For example, for the disk group logdata, enter:

cfsdgadm show_package logdata
SG-CFS-DG-1

Status of CFS Mount Point Packages To see the status of the mount point package, use the cfsmntadm display command.
Cluster and Package Maintenance Managing the Cluster and Nodes Managing the Cluster and Nodes Managing the cluster involves the following tasks: • Starting the Cluster When All Nodes are Down • Adding Previously Configured Nodes to a Running Cluster • Removing Nodes from Operation in a Running Cluster • Halting the Entire Cluster In Serviceguard A.11.16 and later, these tasks can be performed by non-root users with the appropriate privileges.
Cluster and Package Maintenance Managing the Cluster and Nodes Starting the Cluster When all Nodes are Down You can use Serviceguard Manager, or Serviceguard commands as shown below, to start the cluster. Using Serviceguard Commands to Start the Cluster Use the cmruncl command to start the cluster when all cluster nodes are down. Particular command options can be used to start the cluster under specific circumstances.
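For example, to start the cluster on only two of its configured nodes, with verbose output:

cmruncl -v -n ftsys9 -n ftsys10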
Cluster and Package Maintenance Managing the Cluster and Nodes Adding Previously Configured Nodes to a Running Cluster You can use Serviceguard Manager, or Serviceguard commands as shown below, to bring a configured node up within a running cluster. Using Serviceguard Commands to Add Previously Configured Nodes to a Running Cluster Use the cmrunnode command to join one or more nodes to an already running cluster. Any node you add must already be a part of the cluster configuration.
Cluster and Package Maintenance Managing the Cluster and Nodes To return a node to the cluster, use cmrunnode. NOTE HP recommends that you remove a node from participation in the cluster (by running cmhaltnode as shown below, or Halt Node in Serviceguard Manager) before running the HP-UX shutdown command, especially in cases in which a packaged application might have trouble during shutdown and not halt cleanly. Use cmhaltnode to halt one or more nodes in a cluster.
Cluster and Package Maintenance Managing the Cluster and Nodes Automatically Restarting the Cluster You can configure your cluster to automatically restart after an event, such as a long-term power failure, which brought down all nodes in the cluster. This is done by setting AUTOSTART_CMCLD to 1 in the /etc/rc.config.d/cmcluster file.
Cluster and Package Maintenance Managing Packages and Services Managing Packages and Services Managing packages and services involves the following tasks: • Starting a Package • Halting a Package • Moving a Package (halt, then start) • Changing Package Switching Behavior Non-root users with the appropriate privileges can perform these tasks. See “Controlling Access to the Cluster” on page 239 for information about configuring access.
Cluster and Package Maintenance Managing Packages and Services You cannot start a package unless all the packages that it depends on are running. If you try, you’ll see a Serviceguard message telling you why the operation failed, and the package will not start. If this happens, you can repeat the run command, this time including the package(s) this package depends on; Serviceguard will start all the packages in the correct order.
Cluster and Package Maintenance Managing Packages and Services System multi-node packages run on all cluster nodes simultaneously; halting these packages stops them running on all nodes. A multi-node package can run on several nodes simultaneously; you can halt it on all the nodes it is running on, or you can specify individual nodes. Halting a Package that Has Dependencies Before halting a package, it is a good idea to use the cmviewcl command to check for package dependencies.
Cluster and Package Maintenance Managing Packages and Services Using Serviceguard Commands to Move a Running Failover Package Before you move a failover package to a new node, it is a good idea to run cmviewcl -v -l package and look at dependencies. If the package has dependencies, be sure they can be met on the new node. To move the package, first halt it where it is running using the cmhaltpkg command. This action not only halts the package, but also disables package switching.
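For example, to move pkg2 to ftsys9 and then re-enable switching for the package:

cmhaltpkg pkg2
cmrunpkg -n ftsys9 pkg2
cmmodpkg -e pkg2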
Cluster and Package Maintenance Managing Packages and Services Changing Package Switching with Serviceguard Commands You can change package switching behavior either temporarily or permanently using Serviceguard commands. To temporarily disable switching to other nodes for a running package, use the cmmodpkg command.
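For example, the following commands disable and then re-enable switching for pkg1:

cmmodpkg -d pkg1
cmmodpkg -e pkg1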
Cluster and Package Maintenance Reconfiguring a Cluster Reconfiguring a Cluster You can reconfigure a cluster either when it is halted or while it is still running. Some operations can only be done when the cluster is halted. Table 7-1 shows the required cluster state for many kinds of changes. Table 7-1 Types of Changes to the Cluster Configuration Change to the Cluster Configuration Chapter 7 Required Cluster State Add a new node All systems configured as members of this cluster must be running.
Cluster and Package Maintenance Reconfiguring a Cluster Table 7-1 Types of Changes to the Cluster Configuration (Continued) Change to the Cluster Configuration Delete NICs and their IP addresses, if any, from the cluster configuration Required Cluster State Cluster can be running. “Changing the Cluster Networking Configuration while the Cluster Is Running” on page 367. If removing the NIC from the system, see “Removing a LAN or VLAN Interface from a Node” on page 372.
Cluster and Package Maintenance Reconfiguring a Cluster Table 7-1 Types of Changes to the Cluster Configuration (Continued) Change to the Cluster Configuration Failover Optimization to enable or disable Faster Failover product NOTE Required Cluster State Cluster must not be running. If you are using CVM or CFS, you cannot change HEARTBEAT_INTERVAL, NODE_TIMEOUT, or AUTO_START_TIMEOUT while the cluster is running.
Cluster and Package Maintenance Reconfiguring a Cluster To update the values of the FIRST_CLUSTER_LOCK_PV and SECOND_CLUSTER_LOCK_PV parameters without bringing down the cluster, proceed as follows: Step 1. Halt the node (cmhaltnode) on which you want to make the changes. Step 2. In the cluster configuration file, modify the values of FIRST_CLUSTER_LOCK_PV and SECOND_CLUSTER_LOCK_PV for this node. Step 3. Run cmcheckconf to check the configuration. Step 4. Run cmapplyconf to apply the configuration. Step 5.
Cluster and Package Maintenance Reconfiguring a Cluster Step 3. Run cmcheckconf to check the configuration. Step 4. Run cmapplyconf to apply the configuration. For information about replacing the physical device, see “Replacing a Lock LUN” on page 415. Reconfiguring a Halted Cluster You can make a permanent change in the cluster configuration when the cluster is halted.
Cluster and Package Maintenance Reconfiguring a Cluster • You cannot delete an active volume group from the cluster configuration. You must halt any package that uses the volume group and ensure that the volume is inactive before deleting it. • The only configuration change allowed while a node is unreachable (for example, completely disconnected from the network) is to delete the unreachable node from the cluster configuration.
Cluster and Package Maintenance Reconfiguring a Cluster Use cmrunnode to start the new node, and, if you so decide, set the AUTOSTART_CMCLD parameter to 1 in the /etc/rc.config.d/cmcluster file to enable the new node to join the cluster automatically each time it reboots. NOTE Before you can add a node to a running cluster that uses Veritas CVM (on systems that support it), the node must already be connected to the disk devices for all CVM disk groups.
Cluster and Package Maintenance Reconfiguring a Cluster cmquerycl -C clconfig.ascii -c cluster1 -n ftsys8 -n ftsys9 Step 3. Edit the file clconfig.ascii to check the information about the nodes that remain in the cluster. Step 4. Halt the node you are going to remove (ftsys10 in this example): cmhaltnode -f -v ftsys10 Step 5. Verify the new configuration: cmcheckconf -C clconfig.ascii Step 6.
Cluster and Package Maintenance Reconfiguring a Cluster Changing the Cluster Networking Configuration while the Cluster Is Running What You Can Do Online operations you can perform include: • Add a network interface with its HEARTBEAT_IP or STATIONARY_IP. • Add a standby interface. • Delete a network interface with its HEARTBEAT_IP or STATIONARY_IP. • Delete a standby interface. • Change the designation of an existing interface from HEARTBEAT_IP to STATIONARY_IP, or vice versa.
Cluster and Package Maintenance Reconfiguring a Cluster • You cannot change the designation of an interface from STATIONARY_IP to HEARTBEAT_IP unless the subnet is common to all nodes. Remember that the HEARTBEAT_IP must be an IPv4 address, and must be on the same subnet on all nodes (except in cross-subnet configurations; see “Cross-Subnet Configurations” on page 41).
Cluster and Package Maintenance Reconfiguring a Cluster Example: Adding a Heartbeat LAN Suppose that a subnet 15.13.170.0 is shared by nodes ftsys9 and ftsys10 in a two-node cluster cluster1, and you want to add it to the cluster configuration as a heartbeat subnet. Proceed as follows. Step 1. Run cmquerycl to get a cluster configuration template file that includes networking information for interfaces that are available to be added to the cluster configuration: cmquerycl -c cluster1 -C clconfig.
Cluster and Package Maintenance Reconfiguring a Cluster NODE_NAME NETWORK_INTERFACE HEARTBEAT_IP NETWORK_INTERFACE HEARTBEAT_IP NETWORK_INTERFACE ftsys9 lan1 192.3.17.18 lan0 15.13.170.18 lan3 # Possible standby Network Interfaces for lan1, lan0: lan2. NODE_NAME NETWORK_INTERFACE HEARTBEAT_IP NETWORK_INTERFACE HEARTBEAT_IP NETWORK_INTERFACE ftsys10 lan1 192.3.17.19 lan0 15.13.170.19 lan3 # Possible standby Network Interfaces for lan0, lan1: lan2 Step 3.
Cluster and Package Maintenance Reconfiguring a Cluster Example: Deleting a Subnet Used by a Package In this example, we are deleting subnet 15.13.170.0 (lan0). This will also mean deleting lan3, which is a standby for lan0 and not shared by any other primary LAN. Proceed as follows. Step 1. Halt any package that uses this subnet and delete the corresponding networking information (monitored_subnet, ip_subnet, ip_address; see page 294).
Cluster and Package Maintenance Reconfiguring a Cluster NODE_NAME NETWORK_INTERFACE HEARTBEAT_IP ftsys9 lan1 192.3.17.18 # NETWORK_INTERFACE lan0 # STATIONARY_IP 15.13.170.18 # NETWORK_INTERFACE lan3 # Possible standby Network Interfaces for lan1, lan0: lan2. NODE_NAME NETWORK_INTERFACE HEARTBEAT_IP # NETWORK_INTERFACE # STATIONARY_IP # NETWORK_INTERFACE ftsys10 lan1 192.3.17.19 lan0 15.13.170.19 lan3 # Possible standby Network Interfaces for lan0, lan1: lan2 Step 4.
Cluster and Package Maintenance Reconfiguring a Cluster Step 1. If you are not sure whether or not a physical interface (NIC) is part of the cluster configuration, run olrad -C with the affected I/O slot ID as argument. If the NIC is part of the cluster configuration, you’ll see a warning message telling you to remove it from the configuration before you proceed. See the olrad(1M) manpage for more information about olrad. Step 2.
Cluster and Package Maintenance Reconfiguring a Cluster 1. Use the cmgetconf command to store a copy of the cluster's existing cluster configuration in a temporary file. For example: cmgetconf clconfig.ascii 2. Edit the file clconfig.ascii to add or delete volume groups. 3. Use the cmcheckconf command to verify the new configuration. 4. Use the cmapplyconf command to apply the changes to the configuration and distribute the new binary configuration file to all cluster nodes.
Cluster and Package Maintenance Reconfiguring a Cluster • For CVM 4.1 and later with CFS, edit the configuration file of the package that uses CFS. Configure the three dependency_ parameters. Then run the cmapplyconf command. Similarly, you can delete VxVM or CVM disk groups provided they are not being used by a cluster node at the time.
Cluster and Package Maintenance Reconfiguring a Cluster Changing MAX_CONFIGURED_PACKAGES As of Serviceguard A.11.17, you can change MAX_CONFIGURED_PACKAGES while the cluster is running. The default for MAX_CONFIGURED_PACKAGES is the maximum number allowed in the cluster. You can use Serviceguard Manager to change MAX_CONFIGURED_PACKAGES, or Serviceguard commands as shown below. Use cmgetconf to obtain a current copy of the cluster's existing configuration; for example: cmgetconf -c clconfig.
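A sketch of the full sequence, assuming the cluster is named cluster1, looks like this:

cmgetconf -c cluster1 clconfig.ascii
# edit MAX_CONFIGURED_PACKAGES in clconfig.ascii, then:
cmcheckconf -C clconfig.ascii
cmapplyconf -C clconfig.ascii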
Cluster and Package Maintenance Configuring a Legacy Package Configuring a Legacy Package IMPORTANT You can still create a new legacy package. If you are using a Serviceguard Toolkit such as Serviceguard NFS Toolkit, consult the documentation for that product. Otherwise, use this section to maintain and re-work existing legacy packages rather than to create new ones.
Cluster and Package Maintenance Configuring a Legacy Package You can create a legacy package and its control script in Serviceguard Manager; use the Help for detailed instructions. Otherwise, use the following procedure to create a legacy package. NOTE For instructions on creating Veritas special-purpose system multi-node and multi-node packages, see “Configuring Veritas System Multi-node Packages” on page 325 and “Configuring Veritas Multi-node Packages” on page 327. Step 1.
Cluster and Package Maintenance Configuring a Legacy Package Configuring a Package in Stages It is a good idea to configure failover packages in stages, as follows: 1. Configure volume groups and mount points only. 2. Distribute the control script to all nodes. 3. Apply the configuration. 4. Run the package and ensure that it can be moved from node to node. 5. Halt the package. 6. Configure package IP addresses and application services in the control script. 7. Distribute the control script to all nodes. 8.
Cluster and Package Maintenance Configuring a Legacy Package Editing the Package Configuration File Edit the file you generated with cmmakepkg. Use the bullet points that follow as a checklist. NOTE HP strongly recommends that you never edit the package configuration file of a CVM/CFS multi-node or system multi-node package, although Serviceguard does not prohibit it. Create VxVM-CVM-pkg and SG-CFS-pkg by issuing the cmapplyconf command.
Cluster and Package Maintenance Configuring a Legacy Package • AUTO_RUN. Configure the package to start up automatically or manually; see auto_run on page 289. • LOCAL_LAN_FAILOVER_ALLOWED. Enter the policy for local_lan_failover_allowed (see page 294). • NODE_FAIL_FAST_ENABLED. Enter the policy for node_fail_fast_enabled (see page 289). • RUN_SCRIPT and HALT_SCRIPT. Specify the pathname of the package control script (described in the next section). No default is provided.
Cluster and Package Maintenance Configuring a Legacy Package • If your package runs services, enter the SERVICE_NAME (see service_name on page 297) and values for SERVICE_FAIL_FAST_ENABLED (see service_fail_fast_enabled on page 298) and SERVICE_HALT_TIMEOUT (see service_halt_timeout on page 299). Enter a group of these three for each service. Note that the rules for valid SERVICE_NAMEs are more restrictive as of A.11.18.
Cluster and Package Maintenance Configuring a Legacy Package Creating the Package Control Script For legacy packages, the package control script contains all the information necessary to run all the services in the package, monitor them during operation, react to a failure, and halt the package when necessary. You can use Serviceguard Manager, HP-UX commands, or a combination of both, to create or modify the package control script. Each package must have a separate control script, which must be executable.
Cluster and Package Maintenance Configuring a Legacy Package • Update the PATH statement to reflect any required paths needed to start your services. • If you are using LVM, enter the names of volume groups to be activated using the VG[] array parameters, and select the appropriate options for the storage activation command, including options for mounting and unmounting filesystems, if desired. Do not use the VXVM_DG[] or CVM_DG[] parameters for LVM volume groups.
Cluster and Package Maintenance Configuring a Legacy Package For more information about services, see the discussion of the service_ parameters that starts on page 297. • Specify whether or not to kill processes accessing raw devices; see the comments in the file under RAW DEVICES for more information. Adding Customer Defined Functions to the Package Control Script You can add additional shell commands to the package control script to be executed whenever the package starts or stops.
Cluster and Package Maintenance Configuring a Legacy Package # START OF CUSTOMER DEFINED FUNCTIONS # This function is a place holder for customer defined functions. # You should define all actions you want to happen here, before the service is # started. You can create as many functions as you need. function customer_defined_run_cmds { # ADD customer defined run commands. : # do nothing instruction, because a function must contain some command. date >> /tmp/pkg1.datelog echo 'Starting pkg1' >> /tmp/pkg1.
Cluster and Package Maintenance Configuring a Legacy Package Adding Serviceguard Commands in Customer Defined Functions You can add Serviceguard commands (such as cmmodpkg) in the Customer Defined Functions section of a package control script. These commands must not interact with the package itself. If a Serviceguard command interacts with another package, be careful to avoid command loops. For instance, a command loop might occur under the following circumstances.
Cluster and Package Maintenance Configuring a Legacy Package Verifying the Package Configuration Serviceguard checks the configuration you create and reports any errors. For legacy packages, you can do this in Serviceguard Manager: click Check to verify the package configuration you have done under any package configuration tab, or to check changes you have made to the control script. Click Apply to verify the package as a whole. See the local Help for more details.
Cluster and Package Maintenance Configuring a Legacy Package Distributing the Configuration And Control Script with Serviceguard Manager When you have finished creating a legacy package in Serviceguard Manager, click Apply Configuration. If the package control script has no errors, it is converted to a binary file and distributed to the cluster nodes.
Cluster and Package Maintenance Configuring a Legacy Package cmapplyconf -v -C /etc/cmcluster/cmcl.config -P \ /etc/cmcluster/pkg1/pkg1.config • If you are using a lock disk, deactivate the cluster lock volume group. vgchange -a n /dev/vg01 The cmapplyconf command creates a binary version of the cluster configuration file and distributes it to all nodes in the cluster. This action ensures that the contents of the file are consistent across all nodes.
Cluster and Package Maintenance Configuring a Legacy Package Assuming nodeA is pkg1’s primary node (where it normally starts), create node_name entries in the package configuration file as follows:

node_name    nodeA
node_name    nodeB
node_name    nodeC
node_name    nodeD

Configuring monitored_subnet_access
In order to monitor subnet 15.244.65.0 or 15.244.56.0, you would configure monitored_subnet and monitored_subnet_access in pkg1’s package configuration file as follows:

monitored_subnet    15.244.65.
Cluster and Package Maintenance Configuring a Legacy Package In our example, you would create two copies of pkg1’s package control script, add entries to customize it for subnet 15.244.65.0 or 15.244.56.0, and copy one of the resulting scripts to each node, as follows.

Control-script entries for nodeA and nodeB

IP[0] = 15.244.65.82
SUBNET[0] = 15.244.65.0
IP[1] = 15.244.65.83
SUBNET[1] = 15.244.65.0

Control-script entries for nodeC and nodeD

IP[0] = 15.244.56.100
SUBNET[0] = 15.244.56.0
IP[1] = 15.244.56.
Cluster and Package Maintenance Reconfiguring a Package Reconfiguring a Package You reconfigure a package in much the same way as you originally configured it; for modular packages, see Chapter 6, “Configuring Packages and Their Services,” on page 275; for older packages, see “Configuring a Legacy Package” on page 377. The cluster can be either halted or running during package reconfiguration.
Cluster and Package Maintenance Reconfiguring a Package Reconfiguring a Package on a Running Cluster You can reconfigure a package while the cluster is running, and in some cases you can reconfigure the package while the package itself is running. You can do this in Serviceguard Manager (for legacy packages), or use Serviceguard commands. To modify the package with Serviceguard commands, use the following procedure (pkg1 is used as an example): 1.
Cluster and Package Maintenance Reconfiguring a Package Reconfiguring a Package on a Halted Cluster You can also make permanent changes in package configuration while the cluster is not running. Use the same steps as in “Reconfiguring a Package on a Running Cluster” on page 394. Adding a Package to a Running Cluster You can create a new package and add it to the cluster configuration while the cluster is up and while other packages are running.
Cluster and Package Maintenance Reconfiguring a Package The following example halts the failover package mypkg and removes the package configuration from the cluster: cmhaltpkg mypkg cmdeleteconf -p mypkg The command prompts for a verification before deleting the files unless you use the -f option. The directory /etc/cmcluster/mypkg is not deleted by this command. On systems that support CFS, you can remove nodes from a multi-node package configuration using the cfs commands listed in Appendix A.
Cluster and Package Maintenance Reconfiguring a Package 4. Remove the disk group package from the cluster. This disassociates the disk group from the cluster. cfsdgadm delete Resetting the Service Restart Counter The service restart counter is the number of times a package service has been automatically restarted. This value is used to determine when the package service has exceeded its maximum number of allowable automatic restarts.
Cluster and Package Maintenance Reconfiguring a Package NOTE All the nodes in the cluster must be powered up and accessible when you make package configuration changes. Table 7-2 Types of Changes to Packages Change to the Package 398 Required Package State Add a new package Other packages can be in any state. Delete a package Package must not be running. You cannot delete a package if another package has a dependency on it. Change package type Package must not be running.
Cluster and Package Maintenance Reconfiguring a Package Table 7-2 Types of Changes to Packages (Continued) Change to the Package Chapter 7 Required Package State Add a subnet Package must not be running. Subnet must already be configured into the cluster. Remove a subnet Package must not be running. Add an IP address Package must not be running. Remove an IP address Package must not be running.
Cluster and Package Maintenance Reconfiguring a Package Table 7-2 Types of Changes to Packages (Continued) Change to the Package Required Package State Change package auto_run Package can be either running or halted. See “Choosing Switching and Failover Behavior” on page 174. Add or delete a configured dependency Both packages can be either running or halted with one exception: If a running package adds a package dependency, the package it is to depend on must already be running on the same node(s).
Cluster and Package Maintenance Responding to Cluster Events Responding to Cluster Events Serviceguard does not require much ongoing system administration intervention. As long as there are no failures, your cluster will be monitored and protected. In the event of a failure, those packages that you have designated to be transferred to another node will be transferred automatically.
Cluster and Package Maintenance Single-Node Operation Single-Node Operation In a multi-node cluster, you could have a situation in which all but one node has failed, or you have shut down all but one node, leaving your cluster in single-node operation. This remaining node will probably have applications running on it. As long as the Serviceguard daemon cmcld is active, other nodes can rejoin the cluster.
Cluster and Package Maintenance Disabling Serviceguard
Disabling Serviceguard
If for some reason you want to disable Serviceguard on a system, you can do so by commenting out the following entries in /etc/inetd.conf:
hacl-cfg dgram udp wait root /usr/lbin/cmclconfd cmclconfd -p
hacl-cfg stream tcp nowait root /usr/lbin/cmclconfd cmclconfd -c
Then force inetd to re-read inetd.conf.
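For example, after commenting out the entries you can force inetd to re-read its configuration file without a reboot (see the inetd (1m) manpage):
# inetd -c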
Cluster and Package Maintenance Removing Serviceguard from a System Removing Serviceguard from a System To remove Serviceguard from a node, use the swremove command. CAUTION Remove the node from the cluster first. If you run the swremove command on a server that is still a member of a cluster, it will cause that cluster to halt, and the cluster configuration to be deleted. To remove Serviceguard: 1. If the node is an active member of a cluster, halt the node. 2.
Troubleshooting Your Cluster 8 Troubleshooting Your Cluster This chapter describes how to verify cluster operation, how to review cluster status, how to add and replace hardware, and how to solve some typical cluster problems.
Troubleshooting Your Cluster Testing Cluster Operation Testing Cluster Operation Once you have configured your Serviceguard cluster, you should verify that the various components of the cluster behave correctly in case of a failure. In this section, the following procedures test that the cluster responds properly in the event of a package failure, a node failure, or a LAN failure.
Troubleshooting Your Cluster Testing Cluster Operation 4. Move the package back to the primary node (see “Moving a Failover Package” on page 356). Testing the Cluster Manager To test that the cluster manager is operating correctly, perform the following steps for each node on the cluster: 1. Turn off the power to the node SPU. 2.
Troubleshooting Your Cluster Testing Cluster Operation 3. Verify that a local switch has taken place so that the Standby card is now the Primary card. In Serviceguard Manager, check the cluster properties. On the command line, use cmviewcl -v. 4. Reconnect the LAN to the original Primary card, and verify its status. In Serviceguard Manager, check the cluster properties. On the command line, use cmviewcl -v.
Troubleshooting Your Cluster Monitoring Hardware Monitoring Hardware Good standard practice in handling a high availability system includes careful fault monitoring so as to prevent failures if possible or at least to react to them swiftly when they occur.
Troubleshooting Your Cluster Monitoring Hardware action in case of a problem. For example, you could configure a disk monitor to report when a mirror was lost from a mirrored volume group being used in the cluster. Refer to the manual Using High Availability Monitors (http://docs.hp.com -> High Availability -> Event Monitoring Service and HA Monitors -> Installation and User’s Guide) for additional information.
Troubleshooting Your Cluster Monitoring Hardware based on statistics for devices that are experiencing specific non-fatal errors over time. In a Serviceguard cluster, HP ISEE should be run on all nodes. HP ISEE also reports error conditions directly to an HP Response Center, alerting support personnel to the potential problem. HP ISEE is available through various support contracts. For more information, contact your HP representative.
Troubleshooting Your Cluster Replacing Disks Replacing Disks The procedure for replacing a faulty disk mechanism depends on the type of disk configuration you are using. Separate descriptions are provided for replacing an array mechanism and a disk in a high availability enclosure. For more information, see the section Replacing a Bad Disk in the Logical Volume Management volume of the HP-UX System Administrator’s Guide, at http://docs.hp.
Troubleshooting Your Cluster Replacing Disks new device. See the section Replacing a Bad Disk in the Logical Volume Management volume of the HP-UX System Administrator’s Guide, posted at http://docs.hp.com. 2. Identify the names of any logical volumes that have extents defined on the failed physical volume. 3.
Troubleshooting Your Cluster Replacing Disks Replacing a Lock Disk You can replace an unusable lock disk while the cluster is running, provided you do not change the device file name (DSF).
Troubleshooting Your Cluster Replacing Disks NOTE If you restore or recreate the volume group for the lock disk and you need to re-create the cluster lock (for example, if no vgcfgbackup is available), you can run cmdisklock to re-create the lock. See the cmdisklock (1m) manpage for more information. Replacing a Lock LUN You can replace an unusable lock LUN while the cluster is running, provided you do not change the device file name (DSF).
Troubleshooting Your Cluster Replacing Disks cmdisklock checks that the specified device is not in use by LVM, VxVM, ASM, or the file system, and will fail if the device has a label marking it as in use by any of those subsystems. cmdisklock -f overrides this check. CAUTION You are responsible for determining that the device is not being used by any subsystem on any node connected to the device before using cmdisklock -f. If you use cmdisklock -f without taking this precaution, you could lose data.
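A sketch of typical usage follows; the device file name is hypothetical, and the check and reset operations shown are assumptions based on the cmdisklock (1m) manpage, which you should consult for the exact syntax on your release:
# cmdisklock check /dev/dsk/c0t1d2
# cmdisklock reset /dev/dsk/c0t1d2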
Troubleshooting Your Cluster Replacing I/O Cards Replacing I/O Cards Replacing SCSI Host Bus Adapters After a SCSI Host Bus Adapter (HBA) card failure, you can replace the card using the following steps. Normally disconnecting any portion of the SCSI bus will leave the SCSI bus in an unterminated state, which will cause I/O errors for other nodes connected to that SCSI bus, so the cluster would need to be halted before disconnecting any portion of the SCSI bus.
Troubleshooting Your Cluster Replacing LAN or Fibre Channel Cards Replacing LAN or Fibre Channel Cards If a LAN or fibre channel card fails and the card has to be replaced, you can replace it on-line or off-line depending on the type of hardware and operating system you are running. It is not necessary to bring the cluster down to do this. Off-Line Replacement Follow these steps to replace an I/O card off-line. 1. Halt the node by using the cmhaltnode command. 2.
Troubleshooting Your Cluster Replacing LAN or Fibre Channel Cards NOTE After replacing a Fibre Channel I/O card, it may be necessary to reconfigure the SAN to use the World Wide Name (WWN) of the new Fibre Channel card if Fabric Zoning or other SAN security requiring WWN is used.
Troubleshooting Your Cluster Replacing a Failed Quorum Server System Replacing a Failed Quorum Server System When a quorum server fails or becomes unavailable to the clusters it is providing quorum services for, this does not in itself cause any cluster to fail. However, the loss of the quorum server does increase the vulnerability of the clusters in case there is an additional failure. Use the following procedure to replace a defective quorum server system.
Troubleshooting Your Cluster Replacing a Failed Quorum Server System The command will output an error message if the specified nodes cannot communicate with the quorum server.
Troubleshooting Your Cluster Troubleshooting Approaches Troubleshooting Approaches The following sections offer a few suggestions for troubleshooting by reviewing the state of the running system and by examining cluster status data, log files, and configuration files.
Troubleshooting Your Cluster Troubleshooting Approaches
IPv6:
Name    Mtu   Address/Prefix   Ipkts   Opkts
lan1*   1500  none             0       0
lo0     4136  ::1/128          10690   10690
Reviewing the System Log File
Messages from the Cluster Manager and Package Manager are written to the system log file. The default location of the log file is /var/adm/syslog/syslog.log. Also, package-related messages are logged into the package log file. The package log file is located in the package directory, by default.
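For example, you might scan the system log for recent cluster daemon messages and then examine a package's own log file (the pkg5 directory shown here matches the sample output that follows):
# grep cmcld /var/adm/syslog/syslog.log | tail -20
# more /etc/cmcluster/pkg5/pkg5_run.log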
Troubleshooting Your Cluster Troubleshooting Approaches Dec 14 14:34:44 star04 cmcld[2048]: Request from node star04 to start package pkg5 on node star04. Dec 14 14:34:44 star04 cmcld[2048]: Executing '/etc/cmcluster/pkg5/pkg5_run start' for package pkg5. Dec 14 14:34:45 star04 LVM[2066]: vgchange -a n /dev/vg02 Dec 14 14:34:45 star04 cmcld[2048]: Package pkg5 run script exited with NO_RESTART. Dec 14 14:34:45 star04 cmcld[2048]: Examine the file /etc/cmcluster/pkg5/pkg5_run.log for more details.
Troubleshooting Your Cluster Troubleshooting Approaches Reviewing the System Multi-node Package Files If you are running Veritas Cluster Volume Manager and you have problems starting the cluster, check the log file for the system multi-node package. For Cluster Volume Manager (CVM) 3.5, the file is VxVM-CVM-pkg.log. For CVM 4.1 and later, the file is SG-CFS-pkg.log. Reviewing Configuration Files Review the following ASCII configuration files: • Cluster configuration file. • Package configuration files.
Troubleshooting Your Cluster Troubleshooting Approaches cmquerycl -v -C /etc/cmcluster/verify.ascii -n ftsys9 -n ftsys10 cmcheckconf -v -C /etc/cmcluster/verify.ascii The cmcheckconf command checks: • The network addresses and connections. • The cluster lock disk connectivity. • The validity of configuration parameters of the cluster and packages for: — The uniqueness of names. — The existence and permission of scripts. It doesn’t check: • The correct setup of the power circuits.
Troubleshooting Your Cluster Troubleshooting Approaches you should see the following message displayed:
Link Connectivity to LAN station: 0x08000993AB72 OK
• cmscancl can be used to verify that primary and standby LANs are on the same bridged net.
• cmviewcl -v shows the status of primary and standby LANs.
Use these commands on all nodes.
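The "Link Connectivity ... OK" message shown above is typically produced by the HP-UX linkloop command, which tests link-level connectivity between LAN cards. A sketch, assuming PPA 0 for the local interface (see the linkloop (1m) manpage):
# linkloop -i 0 0x08000993AB72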
Troubleshooting Your Cluster Solving Problems Solving Problems Problems with Serviceguard may be of several types.
Troubleshooting Your Cluster Solving Problems
nslookup ftsys9
Name Server: server1.cup.hp.com
Address: 15.13.168.63
Name: ftsys9.cup.hp.com
Address: 15.13.172.229
If the output of this command does not include the correct IP address of the node, then check your name resolution services further. In many cases, a symptom such as Permission denied... or Connection refused... is the result of an error in the networking or security configuration.
Troubleshooting Your Cluster Solving Problems System Administration Errors There are a number of errors you can make when configuring Serviceguard that will not show up when you start the cluster. Your cluster can be running, and everything appears to be fine, until there is a hardware or software failure and control of your packages is not transferred to another node as you would have expected.
Troubleshooting Your Cluster Solving Problems • AUTO_RUN (automatic package switching) will be disabled. • The current node will be disabled from running the package. Following such a failure, since the control script is terminated, some of the package's resources may be left activated. Specifically: NOTE • Volume groups may be left active. • File systems may still be mounted. • IP addresses may still be installed. • Services may still be running.
Troubleshooting Your Cluster Solving Problems where the first argument is the address shown in the "Address" or "Address/Prefix" column, and the second is the corresponding entry in the "Network" column for IPv4, or the prefix (which can be derived from the IPv6 address) for IPv6. 3. Ensure that package volume groups are deactivated. First unmount any package logical volumes that are being used for file systems; you can identify these by inspecting the output of the bdf -l command.
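A sketch of this manual cleanup, using hypothetical address, mount point, and volume group names (see the cmmodnet (1m) manpage before removing addresses by hand):
# cmmodnet -r -i 15.13.172.231 15.13.172.0
# umount /mnt_app
# vgchange -a n /dev/vg_app
The first command removes a relocatable IP address left installed by the package; the last two unmount a package file system and deactivate its volume group.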
Troubleshooting Your Cluster Solving Problems Problems with Cluster File System (CFS) If you have a system multi-node package for Veritas CFS, you may not be able to start the cluster until SG-CFS-pkg starts. Check SG-CFS-pkg.log for errors. You will have trouble running the cluster if there is a discrepancy between the CFS cluster and the Serviceguard cluster. To check, enter the gabconfig -a command. The ports that must be up are: 1. a - which is llt, gab 2. b - vxfen 3. v w - cvm 4.
Troubleshooting Your Cluster Solving Problems Force Import and Deport After Node Failure After certain failures, packages configured with VxVM disk groups will fail to start, logging an error such as the following in the package log file: vxdg: Error dg_01 may still be imported on ftsys9 ERROR: Function check_dg failed This can happen if a package is running on a node which then fails before the package control script can deport the disk group.
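One way to recover, sketched here for the dg_01 disk group named in the error message, is to force-import the disk group on the adoptive node, clearing the stale import locks, and then deport it so the package can import it normally. Use the force and clear options with care; see the vxdg (1m) manpage:
# vxdg -tfC import dg_01
# vxdg deport dg_01
After the deport, restart the package with cmrunpkg or re-enable switching with cmmodpkg -e.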
Troubleshooting Your Cluster Solving Problems Adding a set -x statement in the second line of your control script will cause additional details to be logged into the package log file, which can give you more information about where your script may be failing. Node and Network Failures These failures cause Serviceguard to transfer control of a package to another node.
Troubleshooting Your Cluster Solving Problems Troubleshooting Quorum Server Authorization File Problems The following kind of message in a Serviceguard node’s syslog file or in the output of cmviewcl -v may indicate an authorization problem: Access denied to quorum server 192.6.7.4 The reason may be that you have not updated the authorization file. Verify that the node is included in the file, and try using /usr/lbin/qs -update to re-read the quorum server authorization file.
Troubleshooting Your Cluster Solving Problems Oct 008 16:10:05:0: There is no connection to the applicant 2 for lock /sg/lockTest1 Oct 08 16:10:05:0:Request for lock /sg/lockTest1 from applicant 1 failed: not connected to all applicants. This condition can be ignored. The request will be retried a few seconds later and will succeed. The following message is logged: Oct 008 16:10:06:0: Request for lock /sg/lockTest1 succeeded. New lock owners: 1,2.
Serviceguard Commands A Serviceguard Commands The following is an alphabetical list of commands used for Serviceguard cluster configuration and maintenance. Manpages for these commands are available on your system after installation. NOTE Check the Serviceguard, SGeRAC, and SMS Compatibility and Feature Matrix and the latest Release Notes for your version of Serviceguard for up-to-date information about support for Cluster Volume manager (CVM) and Cluster File System (CFS): http://www.docs.hp.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cfsdgadm Description • Display the status of CFS disk groups. • Add shared disk groups to a Veritas Cluster File System CFS cluster configuration, or remove existing CFS disk groups from the configuration. Serviceguard automatically creates the multi-node package SG-CFS-DG-id# to regulate the disk groups. This package has a dependency on the SG-CFS-pkg created by cfscluster command.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmapplyconf Description Verify and apply Serviceguard cluster configuration and package configuration files. cmapplyconf verifies the cluster configuration and package configuration specified in the cluster_ascii_file and the associated pkg_ascii_file(s), creates or updates the binary configuration file, called cmclconfig, and distributes it to all nodes.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmapplyconf (continued) Description Run cmgetconf to get either the cluster configuration file or package configuration file whenever changes to the existing configuration are required. Note that cmapplyconf will verify and distribute cluster configuration or package files. It will not cause the cluster daemon to start or be removed from the cluster configuration.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmdeleteconf Description Delete either the cluster or the package configuration. cmdeleteconf deletes either the entire cluster configuration, including all its packages, or only the specified package configuration. If neither cluster_name nor package_name is specified, cmdeleteconf will delete the local cluster’s configuration and all its packages.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmhaltcl Description Halt a high availability cluster. cmhaltcl causes all nodes in a configured cluster to stop their cluster daemons, optionally halting all packages or applications in the process. This command halts the daemons on all currently running systems. To shut down the cluster daemon on only a subset of the nodes, use the cmhaltnode command instead. cmhaltnode Halt a node in a high availability cluster.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmhaltserv Description Halt a service from the high availability package halt script. This is not a command-line executable command; it runs only from within the package control script. cmhaltserv is used in the high availability package halt script to halt a service. If any part of the package is marked down, the package halt script is executed as part of the recovery process.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmmodnet Description Add or remove an address from a high availability cluster. cmmodnet is used to add or remove a relocatable package IP_address for the current network interface running the given subnet_name. cmmodnet can also be used to enable or disable a LAN_name currently configured in a cluster.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmquerycl Description Query cluster or node configuration information. cmquerycl searches all specified nodes for cluster configuration and Logical Volume Manager (LVM) information. Cluster configuration information includes network information such as LAN interface, IP addresses, bridged networks and possible heartbeat networks. LVM information includes volume group (VG) interconnection and file system mount point information.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmruncl Description Run a high availability cluster. cmruncl causes all nodes in a configured cluster, or all nodes specified, to start their cluster daemons and form a new cluster. This command should only be run when the cluster is not active on any of the configured nodes.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmrunserv Description Run a service from the high availability package run script. This is not a command line executable command, it runs only from within the package control script. cmrunserv is used in the high availability package run script to run a service. If the service process dies, cmrunserv updates the status of the service to down.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmscancl Description Gather system configuration information from nodes with Serviceguard installed. cmscancl is a configuration report and diagnostic tool which gathers system software and hardware configuration information from a list of nodes, or from all the nodes in a cluster.
Serviceguard Commands Table A-1 Serviceguard Commands (Continued) Command cmstartres Description This command is run by package control scripts, and not by users! Starts resource monitoring on the local node for an EMS resource that is configured in a Serviceguard package. cmstartres starts resource monitoring for an EMS resource on the local node. This resource must be configured in the specified package_name.
Enterprise Cluster Master Toolkit B Enterprise Cluster Master Toolkit The Enterprise Cluster Master Toolkit (ECMT) provides a group of example scripts and package configuration files for creating Serviceguard packages for several major database and internet software products. Each toolkit contains a README file that explains how to customize the package for your needs. The ECMT can be installed on HP-UX 11i v1, 11i v2, or 11i v3.
Designing Highly Available Cluster Applications C Designing Highly Available Cluster Applications This appendix describes how to create or port applications for high availability, with emphasis on the following topics: • Automating Application Operation • Controlling the Speed of Application Failover • Designing Applications to Run on Multiple Systems • Restoring Client Connections • Handling Application Failures • Minimizing Planned Downtime Designing for high availability means reducing the
Designing Highly Available Cluster Applications Automating Application Operation Automating Application Operation Can the application be started and stopped automatically or does it require operator intervention? This section describes how to automate application operations to avoid the need for user intervention. One of the first rules of high availability is to avoid manual intervention.
Designing Highly Available Cluster Applications Automating Application Operation Define Application Startup and Shutdown Applications must be restartable without manual intervention. If the application requires a switch to be flipped on a piece of hardware, then automated restart is impossible. Procedures for application startup, shutdown and monitoring must be created so that the HA software can perform these functions automatically.
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Controlling the Speed of Application Failover What steps can be taken to ensure the fastest failover? If a failure does occur causing the application to be moved (failed over) to another node, there are many things the application can do to reduce the amount of time it takes to get the application back up and running.
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Use Raw Volumes If your application uses data, use raw volumes rather than filesystems. Raw volumes do not require an fsck of the filesystem, thus eliminating one of the potentially lengthy steps during a failover. Evaluate the Use of JFS If a file system must be used, a JFS offers significantly faster file system recovery as compared to an HFS. However, performance of the JFS may vary with the application.
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Keep Logs Small Some databases permit logs to be buffered in memory to increase online performance. Of course, when a failure occurs, any in-flight transaction will be lost. However, minimizing the size of this in-memory log will reduce the amount of completed transaction data that would be lost in case of failure.
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Another example is an application where a clerk is entering data about a new employee. Suppose this application requires that employee numbers be unique, and that after the name and number of the new employee is entered, a failure occurs.
Designing Highly Available Cluster Applications Controlling the Speed of Application Failover Design for Multiple Servers If you use multiple active servers, multiple service points can provide relatively transparent service to a client. However, this capability requires that the client be smart enough to have knowledge about the multiple servers and the priority for addressing them. It also requires access to the data of the failed server or replicated data.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems Designing Applications to Run on Multiple Systems If an application can be failed to a backup node, how will it work on that different system? The previous sections discussed methods to ensure that an application can be automatically restarted. This section will discuss some ways to ensure the application can run on multiple systems.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems Each application or package should be given a unique name as well as a relocatable IP address. Following this rule separates the application from the system on which it runs, thus removing the need for user knowledge of which system the application runs on. It also makes it easier to move the application among different systems in a cluster for load balancing or other reasons.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems Avoid Using SPU IDs or MAC Addresses Design the application so that it does not rely on the SPU ID or MAC (link-level) addresses. The SPU ID is a unique hardware ID contained in non-volatile memory, which cannot be changed. A MAC address (also known as a LANIC id) is a link-specific address associated with the LAN hardware.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems Applications should not reference official hostnames or IP addresses. The official hostname and corresponding IP address for the hostname refer to the primary LAN card and the stationary IP address for that card.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems develop alternate means of verifying where they are running. For example, an application might check a list of hostnames that have been provided in a configuration file. Bind to a Fixed Port When binding a socket, a port address can be specified or one can be assigned dynamically.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems For TCP stream sockets, the TCP level of the protocol stack resolves this problem for the client since it is a connection-based protocol. On the client, TCP ignores the stationary IP address and continues to use the previously bound relocatable IP address originally used by the client. With UDP datagram sockets, however, there is a problem.
Designing Highly Available Cluster Applications Designing Applications to Run on Multiple Systems To prevent one node from inadvertently accessing disks being used by the application on another node, HA software uses an exclusive access mechanism to enforce access by only one node at a time. This exclusive access applies to a volume group as a whole.
Designing Highly Available Cluster Applications Restoring Client Connections Restoring Client Connections How does a client reconnect to the server after a failure? It is important to write client applications to specifically differentiate between the loss of a connection to the server and other application-oriented errors that might be returned. The application should take special action in case of connection loss.
Designing Highly Available Cluster Applications Restoring Client Connections the retry to the current server should continue for the amount of time it takes to restart the server locally. This will keep the client from having to switch to the second server in the event of an application failure. • Use a transaction processing monitor or message queueing software to increase robustness.
Designing Highly Available Cluster Applications Handling Application Failures Handling Application Failures What happens if part or all of an application fails? All of the preceding sections have assumed the failure in question was not a failure of the application, but of another component of the cluster. This section deals specifically with application problems.
Designing Highly Available Cluster Applications Handling Application Failures ensure that the application is behaving correctly. If the application fails and it is not detected automatically, it might take hours for a user to determine the cause of the downtime and recover from it.
Designing Highly Available Cluster Applications Minimizing Planned Downtime Minimizing Planned Downtime Planned downtime (as opposed to unplanned downtime) is scheduled; examples include backups, systems upgrades to new operating system revisions, or hardware replacements. For planned downtime, application designers should consider: • Reducing the time needed for application upgrades/patches.
Designing Highly Available Cluster Applications Minimizing Planned Downtime Provide for Rolling Upgrades Provide for a “rolling upgrade” in a client/server environment. For a system with many components, the typical scenario is to bring down the entire system, upgrade every node to the new version of the software, and then restart the application on all the affected nodes. For large systems, this could result in a long downtime. An alternative is to provide for a rolling upgrade.
Designing Highly Available Cluster Applications Minimizing Planned Downtime Providing Online Application Reconfiguration Most applications have some sort of configuration information that is read when the application is started. If the application must be halted and a new configuration file read in order to make a configuration change, downtime is incurred. To avoid this downtime, use configuration tools that interact with an application and make dynamic changes online.
Integrating HA Applications with Serviceguard D Integrating HA Applications with Serviceguard The following is a summary of the steps you should follow to integrate an application into the Serviceguard environment: 1. Read the rest of this book, including the chapters on cluster and package configuration, and the Appendix “Designing Highly Available Cluster Applications.” 2.
Integrating HA Applications with Serviceguard NOTE • Can the application be installed cluster-wide? • Does the application work with a cluster-wide file name space? • Will the application run correctly with the data (file system) available on all nodes in the cluster? This includes being available on cluster nodes where the application is not currently running.
Integrating HA Applications with Serviceguard Checklist for Integrating HA Applications Checklist for Integrating HA Applications This section contains a checklist for integrating HA applications in both single and multiple systems. Defining Baseline Application Behavior on a Single System Define a baseline behavior for the application on a standalone system: 1. Install the application, database, and other required resources on one of the systems.
Integrating HA Applications with Serviceguard Checklist for Integrating HA Applications c. Install the appropriate executables. d. With the application not running on the first system, try to bring it up on the second system. You might use the script you created in the step above. Is there anything different that you must do? Does it run? e. Repeat this process until you can get the application to run on the second system. 2. Configure the Serviceguard cluster: a. Create the cluster configuration. b.
Integrating HA Applications with Serviceguard Checklist for Integrating HA Applications NOTE Appendix D Check the Serviceguard, SGeRAC, and SMS Compatibility and Feature Matrix and the latest Release Notes for your version of Serviceguard for up-to-date information about support for CVM and CFS: http://www.docs.hp.com -> High Availability -> Serviceguard.
Integrating HA Applications with Serviceguard Checklist for Integrating HA Applications Testing the Cluster 1. Test the cluster: • Have clients connect. • Provide a normal system load. • Halt the package on the first node and move it to the second node: # cmhaltpkg pkg1 # cmrunpkg -n node2 pkg1 # cmmodpkg -e pkg1 • Move it back. # cmhaltpkg pkg1 # cmrunpkg -n node1 pkg1 # cmmodpkg -e pkg1 • Fail one of the systems. For example, turn off the power on node 1.
Software Upgrades E Software Upgrades There are three types of upgrade you can do: • rolling upgrade • non-rolling upgrade • migration with cold install Each of these is discussed below.
Software Upgrades Types of Upgrade Types of Upgrade Rolling Upgrade In a rolling upgrade, you upgrade the HP-UX operating system (if necessary) and the Serviceguard software one node at a time without bringing down your cluster. A rolling upgrade can also be done any time one system needs to be taken offline for hardware maintenance or patch installations. This method is the least disruptive, but your cluster must meet both general and release-specific requirements.
Software Upgrades Guidelines for Rolling Upgrade Guidelines for Rolling Upgrade You can normally do a rolling upgrade if: • You are not upgrading the nodes to a new version of HP-UX; or • You are upgrading to a new version of HP-UX, but using the update process (update-ux), rather than a cold install. update-ux supports many, but not all, upgrade paths. For more information, see the HP-UX Installation and Update Guide for the target version of HP-UX.
Software Upgrades Performing a Rolling Upgrade Performing a Rolling Upgrade Limitations of Rolling Upgrades The following limitations apply to rolling upgrades: • During a rolling upgrade, you should issue Serviceguard commands (other than cmrunnode and cmhaltnode) only on a node containing the latest revision of the software. Performing tasks on a node containing an earlier revision of the software will not work or will cause inconsistent results.
Software Upgrades Performing a Rolling Upgrade • Rolling upgrades are not intended as a means of using mixed releases of Serviceguard or HP-UX within the cluster. HP strongly recommends that you upgrade all cluster nodes as quickly as possible to the new release level. • You cannot delete Serviceguard software (via swremove) from a node while a rolling upgrade is in progress.
Software Upgrades Performing a Rolling Upgrade Step 5. If the Event Monitoring Service (EMS) is configured, restart it as follows: 1. Kill all EMS monitors. 2. Stop EMS clients. 3. Kill all registrar processes. 4. Kill the p_client daemon. The p_client process restarts immediately. The EMS registrar and monitor processes will be restarted automatically when they are needed. For more information, see "Using the Event Monitoring Service" on page 84. Step 6. Restart the cluster on the upgraded node.
Software Upgrades Example of a Rolling Upgrade Example of a Rolling Upgrade NOTE Warning messages may appear during a rolling upgrade while the node is determining what version of software is running. This is a normal occurrence and not a cause for concern. The following example shows a simple rolling upgrade on two nodes running one package each, as shown in Figure E-1.
Software Upgrades Example of a Rolling Upgrade This will cause pkg1 to be halted cleanly and moved to node 2. The Serviceguard daemon on node 1 is halted, and the result is shown in Figure E-2. Figure E-2 Running Cluster with Packages Moved to Node 2 Step 2. Upgrade node 1 to the next operating system release (“HP-UX (new)”), and install the next version of Serviceguard (“SG (new)”).
Software Upgrades Example of a Rolling Upgrade Step 3. When upgrading is finished, enter the following command on node 1 to restart the cluster on node 1. # cmrunnode -n node1 At this point, different versions of the Serviceguard daemon (cmcld) are running on the two nodes, as shown in Figure E-4. Figure E-4 Node 1 Rejoining the Cluster Step 4. Repeat the process on node 2. Halt the node, as follows: # cmhaltnode -f node2 This causes both packages to move to node 1.
Software Upgrades Example of a Rolling Upgrade Figure E-5 Running Cluster with Packages Moved to Node 1 Step 5. Move pkg2 back to its original node. Use the following commands: # cmhaltpkg pkg2 # cmrunpkg -n node2 pkg2 # cmmodpkg -e pkg2 The cmmodpkg command re-enables switching of the package, which was disabled by the cmhaltpkg command. The final running cluster is shown in Figure E-6.
Software Upgrades Example of a Rolling Upgrade Figure E-6 Appendix E Running Cluster After Upgrades 493
Software Upgrades Guidelines for Non-Rolling Upgrade Guidelines for Non-Rolling Upgrade Do a non-rolling upgrade if: • Your cluster does not meet the requirements for rolling upgrade as specified in the Release Notes for the target version of Serviceguard; or • The limitations imposed by rolling upgrades make it impractical for you to do a rolling upgrade (see “Limitations of Rolling Upgrades” on page 486); or • For some other reason you need or prefer to bring the cluster down before performing the u
Software Upgrades Performing a Non-Rolling Upgrade Performing a Non-Rolling Upgrade Limitations of Non-Rolling Upgrades The following limitations apply to non-rolling upgrades: • Binary configuration files may be incompatible between releases of Serviceguard. Do not manually copy configuration files between nodes. • You must halt the entire cluster before performing a non-rolling upgrade. Steps for Non-Rolling Upgrades Use the following steps for a non-rolling software upgrade: Step 1.
Software Upgrades Guidelines for Migrating a Cluster with Cold Install Guidelines for Migrating a Cluster with Cold Install There may be circumstances when you prefer to do a cold install of the HP-UX operating system rather than an upgrade. A cold install erases the existing operating system and data and then installs the new operating system and software; you must then restore the data. CAUTION The cold install process erases the existing software, operating system, and data.
Software Upgrades Guidelines for Migrating a Cluster with Cold Install See “Creating the Storage Infrastructure and Filesystems with LVM and VxVM” on page 215 for more information. 2. Halt the cluster applications, and then halt the cluster. 3. Do a cold install of the HP-UX operating system. For more information on the cold install process, see the HP-UX Installation and Update Guide for the target version of HP-UX: go to http://docs.hp.
Blank Planning Worksheets F Blank Planning Worksheets This appendix contains blank versions of the planning worksheets described in the chapter “Planning and Documenting an HA Cluster.” You can duplicate any of these worksheets that you find useful and fill them in as a part of the planning process.
Blank Planning Worksheets Worksheet for Hardware Planning Worksheet for Hardware Planning HARDWARE WORKSHEET Page ___ of ____ =============================================================================== Node Information: Host Name _____________________ Series No _____________________ Memory Capacity ____________________ Number of I/O Slots ________________ =============================================================================== LAN Information: Name of Subnet _________ Name of IP Interface
Blank Planning Worksheets Power Supply Worksheet Power Supply Worksheet POWER SUPPLY WORKSHEET Page ___ of ____ =============================================================================== SPU Power: Host Name _____________________ Power Supply _______________________ Host Name _____________________ Power Supply _______________________ =============================================================================== Disk Power: Disk Unit __________________________ Power Supply _____________________
Blank Planning Worksheets Quorum Server Worksheet Quorum Server Worksheet Quorum Server Data: ============================================================================== QS Hostname: _____________IP Address: _______________IP Address_______________ ============================================================================== Quorum Services are Provided for: Cluster Name: ___________________________________________________________ Host Names ____________________________________________ Host Names ___
Blank Planning Worksheets LVM Volume Group and Physical Volume Worksheet LVM Volume Group and Physical Volume Worksheet PHYSICAL VOLUME WORKSHEET Page ___ of ____ =============================================================================== Volume Group Name: ______________________________________________________ Physical Volume Name:_____________________________________________________ Physical Volume Name:_____________________________________________________ Physical Volume Name:_______________________
Blank Planning Worksheets VxVM Disk Group and Disk Worksheet VxVM Disk Group and Disk Worksheet DISK GROUP WORKSHEET Page ___ of ____ =========================================================================== Disk Group Name: __________________________________________________________ Physical Volume Name:______________________________________________________ Physical Volume Name:______________________________________________________ Physical Volume Name:____________________________________________________
Blank Planning Worksheets Cluster Configuration Worksheet Cluster Configuration Worksheet =============================================================================== Name and Nodes: =============================================================================== Cluster Name: __________________________ RAC Version: _______________ Node Names: _________________________________________________________ Volume Groups (for packages):________________________________________ ===================================
Blank Planning Worksheets Cluster Configuration Worksheet Autostart Timeout: ___________ =============================================================================== Access Policies: User name: Host node: Role: =============================================================================== 506 Appendix F
Blank Planning Worksheets Package Configuration Worksheet Package Configuration Worksheet Package Configuration File Data: ========================================================================== Package Name: __________________Package Type:______________ Primary Node: ____________________ First Failover Node:__________________ Additional Failover Nodes:__________________________________ Run Script Timeout: _____ Halt Script Timeout: _____________ Package AutoRun Enabled? ______ Local LAN Failover Allow
Blank Planning Worksheets Package Configuration Worksheet CVM Disk Groups [ignore CVM items if CVM is not being used]: cvm_vg___________cvm_dg_____________cvm_vg_______________ cvm_activation_cmd: ______________________________________________ VxVM Disk Groups: vxvm_dg_________vxvm_dg____________vxvm_dg_____________ vxvol_cmd ______________________________________________________ ________________________________________________________________________________ Logical Volumes and File Systems: fs_name_____
Blank Planning Worksheets Package Configuration Worksheet Service Name: _______ Command: _________ Restart: __ Fail Fast enabled:_____ ================================================================================ Package environment variable:________________________________________________ Package environment variable:________________________________________________ External pre-script:_________________________________________________________ External script:______________________________________________
Migrating from LVM to VxVM Data Storage G Migrating from LVM to VxVM Data Storage This appendix describes how to migrate LVM volume groups to VxVM disk groups for use with the Veritas Volume Manager (VxVM), or with the Cluster Volume Manager (CVM) on systems that support it.
Migrating from LVM to VxVM Data Storage Loading VxVM Loading VxVM Before you can begin migrating data, you must install the Veritas Volume Manager software and all required VxVM licenses on all cluster nodes. This step requires each system to be rebooted, so it requires you to remove the node from the cluster before the installation, and restart the node after installation. This can be done as a part of a rolling upgrade procedure, described in Appendix E.
Migrating from LVM to VxVM Data Storage Migrating Volume Groups Migrating Volume Groups The following procedure shows how to do the migration of individual volume groups for packages that are configured to run on a given node. You should convert all the volume groups for a package at the same time. It is assumed that VxVM software and an appropriate version of HP-UX and Serviceguard have been installed on the node, and that the node has rebooted and rejoined the cluster.
Migrating from LVM to VxVM Data Storage Migrating Volume Groups utility is described along with its limitations and cautions in the Veritas Volume Manager Migration Guide for your version, available from http://www.docs.hp.com. If using the vxconvert(1M) utility, then skip the next step and go ahead to the following section. NOTE Remember that the cluster lock disk, if used, must be configured on an LVM volume group and physical volume.
Migrating from LVM to VxVM Data Storage Customizing Packages for VxVM Customizing Packages for VxVM After creating the VxVM disk group, you need to customize the Serviceguard package that will access the storage. Use the following procedure for a legacy package that you will use with the Veritas Volume Manager (VxVM) disk groups. If you are using the Cluster Volume Manager (CVM), skip ahead to the next section.
Migrating from LVM to VxVM Data Storage Customizing Packages for VxVM
LV[0]="/dev/vx/dsk/dg01/lvol101"
LV[1]="/dev/vx/dsk/dg01/lvol102"
LV[2]="/dev/vx/dsk/dg02/lvol201"
LV[3]="/dev/vx/dsk/dg02/lvol202"
FS[0]="/mnt_dg0101"
FS[1]="/mnt_dg0102"
FS[2]="/mnt_dg0201"
FS[3]="/mnt_dg0202"
FS_MOUNT_OPT[0]="-o ro"
FS_MOUNT_OPT[1]="-o rw"
FS_MOUNT_OPT[2]="-o ro"
FS_MOUNT_OPT[3]="-o rw"
4.
Migrating from LVM to VxVM Data Storage Customizing Packages for CVM Customizing Packages for CVM NOTE Check the Serviceguard, SGeRAC, and SMS Compatibility and Feature Matrix and the latest Release Notes for your version of Serviceguard for up-to-date information about support for CVM and CFS: http://www.docs.hp.com -> High Availability -> Serviceguard. After creating the CVM disk group, you need to customize the Serviceguard package that will access the storage.
Migrating from LVM to VxVM Data Storage Customizing Packages for CVM For example lets say we have two volumes defined in each of the two disk groups from above, lvol101 and lvol102, and lvol201 and lvol202. These are mounted on /mnt_dg0101 and /mnt_dg0102, and /mnt_dg0201 and /mnt_dg0202, respectively. /mnt_dg0101 and /mnt_dg0201 are both mounted read-only.
Migrating from LVM to VxVM Data Storage Customizing Packages for CVM 9. Deport the disk group: # vxdg deport DiskGroupName 10. Start the cluster, if it is not already running: # cmruncl This will activate the special CVM package. 11. When CVM starts up, it selects a master node, and this is the node from which you must issue the disk group configuration commands.
Migrating from LVM to VxVM Data Storage Removing LVM Volume Groups Removing LVM Volume Groups After testing the new VxVM disk groups, remove any LVM volume groups that are no longer wanted from the system using the standard LVM commands lvremove, pvremove, and vgremove. At a convenient time, you should also edit the cluster ASCII configuration file to remove the VOLUME_GROUP statements that refer to the LVM volume groups that are no longer used in the cluster.
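For example, assuming a volume group /dev/vg01 that contained a single logical volume on the disk /dev/dsk/c2t3d0 (all names hypothetical), the removal might look like this; see the lvremove (1m), vgremove (1m), and pvremove (1m) manpages for prerequisites:
# vgchange -a y /dev/vg01
# lvremove /dev/vg01/lvol1
# vgremove /dev/vg01
# pvremove /dev/rdsk/c2t3d0
Repeat the lvremove step for each logical volume in the group; vgremove expects all logical volumes to have been removed first, and pvremove clears the LVM record from each disk that belonged to the group.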
IPv6 Network Support H IPv6 Network Support This appendix describes some of the characteristics of IPv6 network addresses.
IPv6 Network Support IPv6 Address Types IPv6 Address Types Several IPv6 types of addressing schemes are specified in the RFC 2373 (IPv6 Addressing Architecture). IPv6 addresses are 128-bit identifiers for interfaces and sets of interfaces. There are various address formats for IPv6 defined by the RFC 2373. IPv6 addresses are broadly classified into three types: unicast, anycast, and multicast, as the following table explains.
IPv6 Network Support IPv6 Address Types multiple groups of 16 bits of zeros. The "::" can appear only once in an address, and it can be used to compress the leading, trailing, or contiguous sixteen-bit zeroes in an address. Example: fec0:1:0:0:0:0:0:1234 can be represented as fec0:1::1234. • When dealing with a mixed environment of IPv4 and IPv6 nodes, there is an alternative form of IPv6 address that will be used. It is x:x:x:x:x:x:d.d.d.d.
IPv6 Network Support IPv6 Address Types
Unicast Addresses
IPv6 unicast addresses are classified into different types: global aggregatable unicast addresses, site-local addresses, and link-local addresses. Typically a unicast address is logically divided as follows:
Table H-2
n bits: Subnet prefix
128-n bits: Interface ID
Interface identifiers in an IPv6 unicast address are used to identify the interfaces on a link. Interface identifiers are required to be unique on that link.
IPv6 Network Support IPv6 Address Types
IPv4 Mapped IPv6 Address
There is a special type of IPv6 address that holds an embedded IPv4 address. This address is used to represent the addresses of IPv4-only nodes as IPv6 addresses, and is used especially by applications that support both IPv6 and IPv4. These addresses are called IPv4-mapped IPv6 addresses. Their format is as follows:
Table H-4
80 bits: zeros
16 bits: FFFF
32 bits: IPv4 address
Example: ::ffff:192.168.0.
IPv6 Network Support IPv6 Address Types
Link-Local Addresses
Link-local addresses have the following format:
Table H-6
10 bits: 1111111010
54 bits: 0
64 bits: interface ID
Link-local addresses are intended for addressing nodes on a single link. Packets originating from or destined to a link-local address will not be forwarded by a router.
IPv6 Network Support IPv6 Address Types “FF” at the beginning of the address identifies the address as a multicast address. The “flgs” field is a set of 4 flags “000T”. The higher order 3 bits are reserved and must be zero. The last bit ‘T’ indicates whether it is permanently assigned or not. A value of zero indicates that it is permanently assigned otherwise it is a temporary assignment. The “scop” field is a 4-bit field which is used to limit the scope of the multicast group.
IPv6 Network Support Network Configuration Restrictions Network Configuration Restrictions Serviceguard supports IPv6 for data links only. The heartbeat IP must still be IPv4, but the package IPs can be IPv4 or IPv6. To configure IPv6, the system should be set up in what is called a dual-stack configuration, which requires the IPv6 product bundle. The restrictions for supporting IPv6 in Serviceguard are listed below. NOTE 528 • The heartbeat IP address must be IPv4.
IPv6 Network Support Network Configuration Restrictions NOTE Appendix H • Quorum server, if used, has to be configured on an IPv4 network. It is not IPv6-capable. A quorum server configured on an IPv4 network can still be used by Serviceguard IPv6 clusters that have IPv6 networks as a part of their cluster configuration.
IPv6 Network Support IPv6 Relocatable Address and Duplicate Address Detection Feature IPv6 Relocatable Address and Duplicate Address Detection Feature The IPv6 networking stack has a new feature, Duplicate Address Detection (DAD), that was not previously available in IPv4. When an address is being added, the DAD detects a duplicate address that is already being used on the network.
IPv6 Network Support IPv6 Relocatable Address and Duplicate Address Detection Feature
# TRANSPORT_NAME[index]=ip6
# NDD_NAME[index]=ip6_nd_dad_solicit_count
# NDD_VALUE[index]=n
where index is the next available integer value in the nddconf file, and n is a number: either 1 to turn the feature ON or 0 to turn it OFF.
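For example, an nddconf entry that turns DAD off might look like this, assuming index 2 is the next free index in your /etc/rc.config.d/nddconf (uncomment or add the lines as appropriate for your system):
TRANSPORT_NAME[2]=ip6
NDD_NAME[2]=ip6_nd_dad_solicit_count
NDD_VALUE[2]=0
The settings in nddconf are normally applied at boot; running ndd -c re-reads the file and applies them immediately (see the ndd (1m) manpage):
# ndd -c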
IPv6 Network Support Local Primary/Standby LAN Patterns Local Primary/Standby LAN Patterns The use of IPv6 allows a number of different patterns of failover among LAN cards configured in the cluster. This is true because each LAN card can support several IP addresses when a dual IPv4/IPv6 configuration is used. This section describes several ways in that local failover to a standby LAN can be configured.
IPv6 Network Support Example Configurations Example Configurations An example of a LAN configuration on a cluster node using both IPv4 and IPv6 addresses is shown in below. Figure H-1 Example 1: IPv4 and IPv6 Addresses in Standby Configuration Following the loss of lan0 or lan2, lan1 can adopt either address, as shown below. Figure H-2 Example 1: IPv4 and IPv6 Addresses after Failover to Standby The same LAN card can be configured with both IPv4 and IPv6 addresses, as shown in below.
IPv6 Network Support Example Configurations Figure H-3 Example 2: IPv4 and IPv6 Addresses in Standby Configuration This type of configuration allows failover of both addresses to the standby. This is shown below.
Maximum and Minimum Values for Cluster and Package Configuration Parameters I Maximum and Minimum Values for Cluster and Package Configuration Parameters Table I-1 shows the range of possible values for cluster configuration parameters.
Maximum and Minimum Values for Cluster and Package Configuration Parameters Table I-2 shows the range of possible values for package configuration parameters.
Table I-2 Minimum and Maximum Values of Package Configuration Parameters
Run Script Timeout: Minimum Value 10 seconds; Maximum Value 4294 seconds if a non-zero value is specified; 0 (NO_TIMEOUT) is a recommended value.
A Access Control Policies, 239 Access Control Policy, 167 Access roles, 167 active node, 27 adding a package to a running cluster, 395 adding cluster nodes advance planning, 195 adding nodes to a running cluster, 351 adding packages on a running cluster, 322 additional package resources monitoring, 84 addressing, SCSI, 138 administration adding nodes to a ruuning cluster, 351 cluster and package states, 331 halting a package, 355 halting the entire cluster, 352 moving a package, 356 of packages and services
changes to cluster allowed while the cluster is running, 359 changes to packages allowed while the cluster is running, 398 changing the volume group configuration while the cluster is running, 373 checkpoints, 461 client connections restoring in applications, 470 cluster configuring with commands, 228 redundancy of components, 36 Serviceguard, 26 typical configuration, 25 understanding components, 36 cluster administration, 349 solving problems, 428 cluster and package maintenance, 329 cluster configuration
cmapplyconf, 249, 389 cmassistd daemon, 59 cmcheckconf, 248, 320, 388 troubleshooting, 425 cmclconfd daemon, 59 cmcld daemon, 59 and node TOC, 60 and safety timer, 60 functions, 60 runtime priority, 60 cmclnodelist bootstrap file, 200 cmdeleteconf deleting a package configuration, 395 deleting the cluster configuration, 273 cmfileassistd daemon, 59, 60 cmlogd daemon, 59, 61 cmlvmd daemon, 59, 61 cmmodnet assigning IP addresses in control scripts, 99 cmnetassist daemon, 63 cmnetassistd daemon, 59 cmomd daemo
disk choosing for volume groups, 216 data, 45 interfaces, 45 mirroring, 46 root, 45 sample configurations, 49, 52 disk enclosures high availability, 47 disk failure protection through mirroring, 27 disk group planning, 152 disk group and disk planning, 152 disk I/O hardware planning, 139 disk layout planning, 149 disk logical units hardware planning, 139 disk management, 112 disk monitor, 47 disk monitor (EMS), 84 disk storage creating the infrastructure with CFS, 251 creating the infrastructure with CVM, 2
responses to package and service failures, 129 restarting a service after failure, 130 failures of applications, 472 figures cluster with high availability disk array, 51, 52 eight-node active/standby cluster, 55 eight-node cluster with EMC disk array, 56 mirrored disks connected for high availability, 50 node 1 rejoining the cluster, 491 node 1 upgraded to new HP-UX vesion, 490 redundant LANs, 40 running cluster after upgrades, 493 running cluster before rolling upgrade, 489 running cluster with packages
parameter in cluster manager configuration, 165 HEARTBEAT_IP parameter in cluster manager configuration, 160 high availability, 26 HA cluster defined, 36 objectives in planning, 132 host IP address hardware planning, 136, 147 host name hardware planning, 135 how the cluster manager works, 65 how the network manager works, 99 HP, 118 HP Predictive monitoring in troubleshooting, 410 I I/O bus addresses hardware planning, 139 I/O slots hardware planning, 135, 139 I/O subsystem changes as of HP-UX 11i v3, 47, 1
4 or more nodes, 69 specifying, 229 lock volume group identifying in configuration file, 229 planning, 145, 156 lock volume group, reconfiguring, 363 logical volumes blank planning worksheet, 504 creating for a cluster, 218, 224, 225, 266 creating the infrastructure, 215, 222 planning, 149 worksheet, 150, 154 lssf using to obtain a list of disks, 216 LV in sample package control script, 383 lvextend creating a root mirror with, 209 LVM, 118 commands for cluster use, 215 creating a root mirror, 209 disks, 45
Network Failure Detection parameter, 101 network manager adding and deleting package IP addresses, 101 main functions, 99 monitoring LAN interfaces, 101 testing, 407 network planning subnet, 136, 147 network polling interval (NETWORK_POLLING_INTERVAL) parameter in cluster manager configuration, 166 network time protocol (NTP) for clusters, 207 networking redundant subnets, 136 networks binding to IP addresses, 467 binding to port addresses, 467 IP addresses and naming, 463 node and package IP addresses, 99
package and cluster maintenance, 329 package configuration distributing the configuration file, 320, 388 multi-node packages, 327 planning, 169 run and halt script timeout parameters, 310 service name parameter, 309 step by step, 275 subnet parameter, 309 system multi-node packages, 325 using Serviceguard commands, 377 verifying the configuration, 320, 388 writing the package control script, 383 package configuration file package dependency paramters, 293 successor_halt_timeout, 291 package coordinator defi
primary network interface, 38 primary node, 27 pvcreate creating a root mirror with, 209 PVG-strict mirroring creating volume groups with, 217 Q qs daemon, 59 QS_HOST parameter in cluster manager configuration, 157, 158 QS_POLLING_INTERVAL parameter in cluster manager configuration, 158 QS_TIMEOUT_EXTENSION parameter in cluster manager configuration, 158 quorum and cluster reformation, 127 quorum server and safety timer, 60 blank planning worksheet, 502 installing, 214 parameters in cluster manager configur
parameter in package configuration, 310 running cluster adding or removing packages, 322 S safety timer and node TOC, 60 and syslog.
parameter in package configuration, 309 subnet hardware planning, 136, 147 parameter in package configuration, 309 successor_halt_timeout parameter, 291 supported disks in Serviceguard, 45 switching ARP messages after switching, 107 local interface switching, 103 remote system switching, 107 switching IP addresses, 77, 78, 107 system log file troubleshooting, 423 system message changing for clusters, 271 system multi-node package, 74 used with CVM, 264 system multi-node package configuration, replacing dis
VLAN Critical Resource Analysis (CRA), 372 Volume, 112 volume group creating for a cluster, 217 creating physical volumes for clusters, 217 deactivating before export to another node, 219 for cluster lock, 69 planning, 149 relinquishing exclusive access via TOC, 127 setting up on another node with LVM Commands, 219 worksheet, 150, 154 volume group and physical volume planning, 149 volume managers, 112 comparison, 122 CVM, 120 LVM, 118 migrating from LVM to VxVM, 511 VxVM, 119 VOLUME_GROUP parameter in clu