Managing Serviceguard
Sixteenth Edition
HP Part Number: B3936-90140
Published: March 2009
Legal Notices © Copyright 1995-2009 Hewlett-Packard Development Company, L.P. Confidential computer software. Valid license from HP required for possession, use, or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor’s standard commercial license. The information contained herein is subject to change without notice.
Printing History

Table 1 Printing History

Printing Date     Part Number    Edition
January 1995      B3936-90001    First
June 1995         B3936-90003    Second
December 1995     B3936-90005    Third
August 1997       B3936-90019    Fourth
January 1998      B3936-90024    Fifth
October 1998      B3936-90026    Sixth
December 2000     B3936-90045    Seventh
September 2001    B3936-90053    Eighth
March 2002        B3936-90065    Ninth
June 2003         B3936-90070    Tenth
June 2004         B3936-90076    Eleventh
June 2005         B3936-90076    Eleventh, First reprint
October 20
Preface
This sixteenth printing of the manual applies to Serviceguard Version A.11.19. Earlier versions are available at http://www.docs.hp.com -> High Availability -> Serviceguard. This guide describes how to configure Serviceguard to run on HP 9000 or HP Integrity servers under the HP-UX operating system. The contents are as follows:
• “Serviceguard at a Glance” (page 29), describes a Serviceguard cluster and provides a roadmap for using this guide.
• Appendix H (page 443) describes the Serviceguard Manager GUI.
• “Maximum and Minimum Values for Parameters” (page 449) provides a reference to the supported ranges for Serviceguard parameters.
Related Publications
Use the following URL for HP’s high availability web page: http://www.hp.com/go/ha
Use the following URL to find the latest versions of a wide variety of HP-UX documentation: http://www.docs.hp.com
• From http://www.docs.hp.com -> High Availability -> Quorum Server:
  — HP Serviceguard Quorum Server Version A.04.00 Release Notes
• From http://www.docs.hp.com -> High Availability -> Event Monitoring Service and HA Monitors -> Installation and User’s Guide:
  — Using High Availability Monitors
  — Using the Event Monitoring Service
• From http://www.docs.hp.com
1 Serviceguard at a Glance
This chapter introduces Serviceguard on HP-UX, and shows where to find information in this book. It covers the following:
• What is Serviceguard?
• Using Serviceguard Manager (page 32)
• A Roadmap for Configuring Clusters and Packages (page 34)
If you are ready to start setting up Serviceguard clusters, skip ahead to Chapter 4: “Planning and Documenting an HA Cluster” (page 123). Specific steps for setup are given in Chapter 5: “Building an HA Cluster Configuration” (page 193).
network (LAN) component. In the event that one component fails, the redundant component takes over. Serviceguard and other high availability subsystems coordinate the transfer between components. A Serviceguard cluster is a networked grouping of HP 9000 or HP Integrity servers (or both), known as nodes, having sufficient redundancy of software and hardware that a single point of failure will not significantly disrupt service. A package groups application services (individual HP-UX processes) together.
services also are used for other types of inter-node communication. (The heartbeat is explained in more detail in the chapter “Understanding Serviceguard Software.”) Failover Any host system running in a Serviceguard cluster is called an active node. Under normal conditions, a fully operating Serviceguard cluster monitors the health of the cluster's components on all its active nodes. Most Serviceguard packages are failover packages.
provide as many separate power circuits as needed to prevent a single point of failure of your nodes, disks and disk mirrors. Each power circuit should be protected by an uninterruptible power source. For more details, refer to the section on “Power Supply Planning” in Chapter 4, “Planning and Documenting an HA Cluster.
You can use Serviceguard Manager to monitor, administer, and configure Serviceguard clusters.
• You can see properties, status, and alerts of clusters, nodes, and packages.
• You can do administrative tasks such as run or halt clusters, cluster nodes, and packages.
• You can create or modify a cluster and its packages.
Monitoring Clusters with Serviceguard Manager
From the main page of Serviceguard Manager, you can see status and alerts for the cluster, nodes, and packages.
on the command line. As of HP-UX 11i v3, SAM offers a Terminal User Interface (TUI) which also acts as a gateway to the web-based System Management Homepage (SMH).
• To get to the SMH for any task area, highlight the task area in the SAM TUI and press w.
• To go directly to the SMH from the command line, enter /usr/sbin/sam -w
For more information, see the HP-UX Systems Administrator’s Guide, posted at http://docs.hp.com
Figure 1-3 Tasks in Configuring a Serviceguard Cluster The tasks in Figure 1-3 are covered in step-by-step detail in chapters 4 through 7. HP recommends you gather all the data that is needed for configuration before you start. See “Planning and Documenting an HA Cluster ” (page 123) for tips on gathering data.
2 Understanding Serviceguard Hardware Configurations
This chapter gives a broad overview of how the Serviceguard hardware components work. The following topics are presented:
• Redundancy of Cluster Components
• Redundant Network Components (page 38)
• Redundant Disk Storage (page 43)
• Redundant Power Supplies (page 49)
• Larger Clusters (page 50)
Refer to the next chapter for information about Serviceguard software components.
Fibre Channel or HP StorageWorks XP or EMC Symmetrix disk technology can be configured for failover among 16 nodes. Note that a package that does not access data from a disk on a shared bus can be configured to fail over to as many nodes as you have configured in the cluster (regardless of disk technology).
NOTE: Serial (RS232) lines are no longer supported for the cluster heartbeat. Fibre Channel, Token Ring and FDDI networks are no longer supported as heartbeat or data LANs.
Rules and Restrictions
• A single subnet cannot be configured on different network interfaces (NICs) on the same node.
• In the case of subnets that can be used for communication between cluster nodes, the same network interface must not be used to route more than one subnet configured on the same node.
addresses themselves will be immediately configured into the cluster as stationary IP addresses. CAUTION: If you configure any address other than a stationary IP address on a Serviceguard network interface, it could collide with a relocatable package IP address assigned by Serviceguard. See “Stationary and Relocatable IP Addresses ” (page 91). (Oracle VIPs are an exception to this rule; such configurations require the HP add-on product Serviceguard Extension for Oracle RAC).
In the figure, a two-node Serviceguard cluster has one bridged net configured with both a primary and a standby LAN card for the data/heartbeat subnet (subnetA). Another LAN card provides an optional dedicated heartbeat LAN. Note that the primary and standby LAN segments are connected by a hub to provide a redundant data/heartbeat subnet. Each node has its own IP address for this subnet.
• You should not use the wildcard (*) for node_name in the package configuration file, as this could allow the package to fail over across subnets when a node on the same subnet is eligible. Instead, list the nodes in order of preference.
• You should configure IP monitoring for each subnet; see “Monitoring LAN Interfaces and Detecting Failure: IP Level” (page 99).
monitored for this package are configured for PARTIAL access, each node on the node_name list must have at least one of these subnets configured. — As in other configurations, a package will not start on a node unless the subnets configured on that node, and specified in the package configuration file as monitored subnets, are up. NOTE: See also the Rules and Restrictions (page 39) that apply to all cluster networking configurations.
package is moved, the storage group can be activated by the adoptive node. All of the disks in the storage group owned by a failover package must be connected to the original node and to all possible adoptive nodes for that package. Disk storage is made redundant by using RAID or software mirroring.
To protect against Fibre Channel or SCSI bus failures, each copy of the data must be accessed by a separate bus; that is, you cannot have all copies of the data on disk drives connected to the same bus. It is critical for high availability that you mirror both data and root disks. If you do not mirror your data disks and there is a disk failure, you will not be able to run your applications on any node in the cluster until the disk has been replaced and the data reloaded.
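For example, with MirrorDisk/UX you can add a mirror copy of a logical volume on a disk attached to a separate bus; a minimal sketch, in which the volume group, logical volume, and device file names are illustrative:
lvextend -m 1 /dev/vgpkgA/lvol1 /dev/dsk/c3t2d0
Here -m 1 adds one mirror copy of the logical volume on the specified physical volume.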
disk failure events to Serviceguard, to another application, or by email. For more information, refer to the manual Using High Availability Monitors (B5736-90074), available at http://docs.hp.com -> High Availability -> Event Monitoring Service and HA Monitors -> Installation and User’s Guide.
Monitoring VxVM and CVM Disks
The HP Serviceguard VxVM Volume Monitor provides a means for effective and persistent monitoring of VxVM and CVM volumes. The Volume Monitor supports Veritas Volume Manager versions 3.
separate bus. This arrangement eliminates single points of failure and makes either the disk or its mirror available in the event one of the buses fails. Figure 2-2 Mirrored Disks Connected for High Availability Figure 2-3 below shows a similar cluster with a disk array connected to each node on two I/O channels. See “About Multipathing” (page 45).
Figure 2-3 Cluster with High Availability Disk Array Details on logical volume configuration for Serviceguard are in the chapter “Building an HA Cluster Configuration.” Sample Fibre Channel Disk Configuration In Figure 2-4, the root disks are shown with simple mirroring, but the shared storage is now accessed via redundant Fibre Channel switches attached to a disk array.
Figure 2-4 Cluster with Fibre Channel Switched Disk Array This type of configuration uses native HP-UX or other multipathing software; see “About Multipathing” (page 45). Redundant Power Supplies You can extend the availability of your hardware by providing battery backup to your nodes and disks. HP-supported uninterruptible power supplies (UPS), such as HP PowerTrust, can provide this protection from momentary power loss.
Therefore, if all of the hardware in the cluster has 2 or 3 power inputs, then at least three separate power circuits will be required to ensure that there is no single point of failure in the power circuit design for the cluster. Larger Clusters You can create clusters of up to 16 nodes with Serviceguard. Clusters of up to 16 nodes may be built by connecting individual SPUs via Ethernet.
Figure 2-5 Eight-Node Active/Standby Cluster
Point to Point Connections to Storage Devices
Some storage devices allow point-to-point connection to a large number of host nodes without using a shared SCSI bus. An example is shown in Figure 2-6, a cluster consisting of eight nodes with a SCSI interconnect. The nodes access shared data on an XP or EMC disk array configured with 16 SCSI I/O ports. Each node is connected to the array using two separate SCSI channels.
Figure 2-6 Eight-Node Cluster with XP or EMC Disk Array Fibre Channel switched configurations also are supported using either an arbitrated loop or fabric login topology. For additional information about supported cluster configurations, refer to the HP Unix Servers Configuration Guide, available through your HP representative.
3 Understanding Serviceguard Software Components This chapter gives a broad overview of how the Serviceguard software components work.
NOTE: Veritas CFS may not yet be supported on the version of HP-UX you are running; see “About Veritas CFS and CVM from Symantec” (page 32).
• /usr/lbin/cmvxpingd—Serviceguard-to-Veritas Activation daemon. (Only present if Veritas CFS is installed.)
• /usr/lbin/cmdisklockd—Lock LUN daemon
• /usr/lbin/cmlockd—Utility daemon
• /opt/sgproviders/bin/cmwbemd—WBEM daemon
• /usr/lbin/cmproxyd—Proxy daemon
Each of these daemons logs to the /var/adm/syslog/syslog.log file except for /opt/cmom/lbin/cmomd, which logs to /var/opt/cmom/cmomd.log. The quorum server runs outside the cluster.
NOTE: Two of the central components of Serviceguard—Package Manager, and Cluster Manager—run as parts of the cmcld daemon. This daemon runs at priority 20 on all cluster nodes. It is important that user processes should have a priority lower than 20, otherwise they may prevent Serviceguard from updating the kernel safety timer, causing a system reset. File Management Daemon: cmfileassistd The cmfileassistd daemon is used by cmcld to manage the files that it needs to read from, and write to, disk.
The SNMP Master Agent and the cmsnmpd provide notification (traps) for cluster-related events. For example, a trap is sent when the cluster configuration changes, or when a Serviceguard package has failed. You must edit /etc/SnmpAgent.d/snmpd.conf to tell cmsnmpd where to send this information. You must also edit /etc/rc.config.d/cmsnmpagt to auto-start cmsnmpd. Configure cmsnmpd to start before the Serviceguard cluster comes up. For more information, see the cmsnmpd (1m) manpage.
Lock LUN Daemon: cmdisklockd If a lock LUN is being used, cmdisklockd runs on each node in the cluster and is started by cmcld when the node joins the cluster. Utility Daemon: cmlockd Runs on every node on which cmcld is running (though currently not actually used by Serviceguard on HP-UX systems).
deployed as part of the Serviceguard Storage Management Suite bundles, the file /etc/gabtab is automatically configured and maintained by Serviceguard. GAB provides membership and messaging for CVM and the CFS. GAB membership also provides orderly startup and shutdown of the cluster file system.
Heartbeat Messages Central to the operation of the cluster manager is the sending and receiving of heartbeat messages among the nodes in the cluster. Each node in the cluster exchanges UDP heartbeat messages with every other node over each monitored IP network configured as a heartbeat device. (LAN monitoring is discussed later, in the section “Monitoring LAN Interfaces and Detecting Failure: Link Level” (page 93).
IMPORTANT: When multiple heartbeats are configured, heartbeats are sent in parallel; Serviceguard must receive at least one heartbeat to establish the health of a node. HP recommends that you configure all subnets that connect cluster nodes as heartbeat networks; this increases protection against multiple faults at no additional cost. Heartbeat IP addresses are usually on the same subnet on each node, but it is possible to configure a cluster that spans subnets; see “Cross-Subnet Configurations” (page 41).
Dynamic Cluster Re-formation
A dynamic re-formation is a temporary change in cluster membership that takes place as nodes join or leave a running cluster. Re-formation differs from reconfiguration, which is a permanent modification of the configuration files. Re-formation of the cluster occurs under the following conditions (not a complete list):
• An SPU or network failure was detected on an active node.
• An inactive node wants to join the cluster.
possibility of two sub-clusters running at the same time. If the two sub-clusters are of unequal size, the sub-cluster with greater than 50% of the nodes will form the new cluster, and the cluster lock is not used. If you have a two-node cluster, you are required to configure a cluster lock. If communications are lost between these two nodes, the node that obtains the cluster lock will take over the cluster and the other node will halt (system reset).
Figure 3-2 Lock Disk or Lock LUN Operation Serviceguard periodically checks the health of the lock disk or LUN and writes messages to the syslog file if the device fails the health check. This file should be monitored for early detection of lock disk problems. If you are using a lock disk, you can choose between two lock disk options—a single or dual lock disk—based on the kind of high availability configuration you are building. A single lock disk is recommended where possible.
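In the cluster configuration file, a single cluster lock disk is identified by a lock volume group plus a physical volume entry for each node; a sketch, with illustrative volume group, node, and device file names:
FIRST_CLUSTER_LOCK_VG /dev/vglock
NODE_NAME ftsys9
  FIRST_CLUSTER_LOCK_PV /dev/dsk/c1t2d0
NODE_NAME ftsys10
  FIRST_CLUSTER_LOCK_PV /dev/dsk/c1t2d0
(Each NODE_NAME entry also carries its network interface definitions, omitted here.) A dual lock disk adds corresponding SECOND_CLUSTER_LOCK_VG and SECOND_CLUSTER_LOCK_PV entries.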
in two separate data centers, a single lock disk would be a single point of failure should the data center it resides in suffer a catastrophic failure. In these two cases only, a dual cluster lock, with two separately powered cluster disks, should be used to eliminate the lock disk as a single point of failure. NOTE: You must use Fibre Channel connections for a dual cluster lock; you can no longer implement it in a parallel SCSI configuration.
Figure 3-3 Quorum Server Operation The quorum server runs on a separate system, and can provide quorum services for multiple clusters. IMPORTANT: For more information about the quorum server, see the latest version of the HP Serviceguard Quorum Server release notes at http://docs.hp.com -> High Availability -> Quorum Server. No Cluster Lock Normally, you should not configure a cluster of three or fewer nodes without a cluster lock. In two-node clusters, a cluster lock is required.
server), the quorum device (for example from one quorum server to another), and the parameters that govern them (for example the quorum server polling interval). For more information about the quorum server and lock parameters, see “Cluster Configuration Parameters ” (page 138). NOTE: If you are using the Veritas Cluster Volume Manager (CVM) you cannot change the quorum configuration while SG-CFS-pkg is running. For more information about CVM, see “CVM and VxVM Planning ” (page 135).
Package Types Three different types of packages can run in the cluster; the most common is the failover package. There are also special-purpose packages that run on more than one node at a time, and so do not failover. They are typically used to manage resources of certain failover packages.
Figure 3-4 Package Moving During Failover Configuring Failover Packages You configure each package separately. You create a failover package by generating and editing a package configuration file template, then adding the package to the cluster configuration database; see Chapter 6: “Configuring Packages and Their Services ” (page 253). For legacy packages (packages created by the method used on versions of Serviceguard earlier than A.11.
that determine failover behavior. These are the auto_run parameter, the failover_policy parameter, and the failback_policy parameter.
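In a modular package configuration file these appear as simple keyword/value lines; a sketch, with illustrative package and node names:
package_name       pkg1
package_type       failover
node_name          node1
node_name          node2
auto_run           yes
failover_policy    configured_node
failback_policy    manual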
Figure 3-5 Before Package Switching Figure 3-6 shows the condition where Node 1 has failed and Package 1 has been transferred to Node 2 on the same subnet. Package 1’s IP address was transferred to Node 2 along with the package. Package 1 continues to be available and is now running on Node 2. Also note that Node 2 can now access both Package1’s disk and Package2’s disk.
NOTE: For design and configuration information about site-aware disaster-tolerant clusters (which span subnets), see the documents listed under “Cross-Subnet Configurations” (page 41). Figure 3-6 After Package Switching Failover Policy The Package Manager selects a node for a failover package to run on based on the priority list included in the package configuration file together with the failover_policy parameter, also in the configuration file.
this does not mean the lightest load; the only thing that is checked is the number of packages currently running on the node.) Package placement is also affected by package dependencies and weights, if you choose to use them. See “About Package Dependencies” (page 166) and “About Package Weights” (page 174). Automatic Rotating Standby Using the min_package_node failover policy, it is possible to configure a cluster that lets you use one node as an automatic rotating standby node for the cluster.
Figure 3-7 Rotating Standby Configuration before Failover If a failure occurs, any package would fail over to the node containing fewest running packages, as in Figure 3-8, which shows a failure on node 2:
Figure 3-8 Rotating Standby Configuration after Failover NOTE: Using the min_package_node policy, when node 2 is repaired and brought back into the cluster, it will then be running the fewest packages, and thus will become the new standby node.
Figure 3-9 CONFIGURED_NODE Policy Packages after Failover If you use configured_node as the failover policy, the package will start up on the highest priority node in the node list, assuming that the node is running as a member of the cluster. When a failover occurs, the package will move to the next highest priority node in the list that is available.
Figure 3-10 Automatic Failback Configuration before Failover

Table 3-2 Node Lists in Sample Cluster

Package Name    NODE_NAME List    FAILOVER POLICY    FAILBACK POLICY
pkgA            node1, node4      CONFIGURED_NODE    AUTOMATIC
pkgB            node2, node4      CONFIGURED_NODE    AUTOMATIC
pkgC            node3, node4      CONFIGURED_NODE    AUTOMATIC

node1 panics, and after the cluster reforms, pkgA starts running on node4:
Figure 3-11 Automatic Failback Configuration After Failover After rebooting, node 1 rejoins the cluster. At that point, pkgA will be automatically stopped on node 4 and restarted on node 1.
Figure 3-12 Automatic Failback Configuration After Restart of Node 1 NOTE: Setting the failback_policy to automatic can result in a package failback and application outage during a critical production period. If you are using automatic failback, you may want to wait to add the package’s primary node back into the cluster until you can allow the package to be taken out of service temporarily while it switches back to the primary node.
Using the Event Monitoring Service Basic package resources include cluster nodes, LAN interfaces, and services, which are the individual processes within an application. All of these are monitored by Serviceguard directly. In addition, you can use the Event Monitoring Service registry through which add-on monitors can be configured. This registry allows other software components to supply monitoring of their resources for Serviceguard.
How Packages Run Packages are the means by which Serviceguard starts and halts configured applications. Failover packages are also units of failover behavior in Serviceguard. A package is a collection of services, disk volumes and IP addresses that are managed by Serviceguard to ensure they are available. There can be a maximum of 300 packages per cluster and a total of 900 services per cluster.
packages and failover packages can name some subset of the cluster’s nodes or all of them. If the auto_run parameter is set to yes in a package’s configuration file Serviceguard automatically starts the package when the cluster starts. System multi-node packages are required to have auto_run set to yes. If a failover package has auto_run set to no, Serviceguard cannot start it automatically at cluster startup time; you must explicitly enable this kind of package using the cmmodpkg command.
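For example, to start such a package on a chosen node and then enable package switching for it, you could run the following commands (the package and node names are illustrative):
cmrunpkg -n ftsys9 pkg1
cmmodpkg -e pkg1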
1. Before the control script starts. (For modular packages, this is the master control script.)
2. During run script execution. (For modular packages, during control script execution to start the package.)
3. While services are running
4. When a service, subnet, or monitored resource fails, or a dependency is not met.
5. During halt script execution. (For modular packages, during control script execution to halt the package.)
7. Starts up any EMS (Event Monitoring Service) resources needed by the package that were specially marked for deferred startup.
8. Exits with an exit code of zero (0).
Figure 3-14 Package Time Line (Legacy Package)
At any step along the way, an error will result in the script exiting abnormally (with an exit code of 1). For example, if a package service is unable to be started, the control script will exit with an error.
NOTE: This diagram is specific to legacy packages.
NOTE: After the package run script has finished its work, it exits, which means that the script is no longer executing once the package is running normally. After the script exits, the PIDs of the services started by the script are monitored by the package manager directly. If the service dies, the package manager will then run the package halt script or, if service_fail_fast_enabled is set to yes, it will halt the node on which the package is running.
SERVICE_RESTART[0]=" "       ; do not restart
SERVICE_RESTART[0]="-r <n>"  ; restart as many as <n> times
SERVICE_RESTART[0]="-R"      ; restart indefinitely
NOTE: If you set a number of restarts and also set service_fail_fast_enabled to yes, the failfast will take place after the restart attempts have failed. It does not make sense to set service_restart to “-R” for a service and also set service_fail_fast_enabled to yes.
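In a modular package the same choices are expressed with the service parameters in the package configuration file; a sketch, with a hypothetical service name and command path:
service_name                pkg1_service
service_cmd                 "/opt/myapp/bin/run_app"
service_restart             unlimited
service_fail_fast_enabled   no
Here service_restart can be none, a number of allowed restarts, or unlimited.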
NOTE: If a package is dependent on a subnet, and the subnet fails on the node where the package is running, the package will start to shut down. If the subnet recovers immediately (before the package is restarted on an adoptive node), the package manager restarts the package on the same node; no package switch occurs.
Figure 3-15 Legacy Package Time Line for Halt Script Execution At any step along the way, an error will result in the script exiting abnormally (with an exit code of 1). Also, if the halt script execution is not complete before the time specified in the HALT_SCRIPT_TIMEOUT, the package manager will kill the script. During halt script execution, messages are written to a log file. For legacy packages, this is in the same directory as the run script and has the same name as the run script and the extension.
• 0—normal exit. The package halted normally, so all services are down on this node.
• 1—abnormal exit, also known as no_restart exit. The package did not halt normally. Services are killed, and the package is disabled globally. It is not disabled on the current node, however.
• Timeout—Another type of exit occurs when the halt_script_timeout is exceeded. In this scenario, the package is killed and disabled globally. It is not disabled on the current node, however.
Table 3-3 Error Conditions and Package Movement for Failover Packages (continued)

Package Error Condition
  Error or Exit Code: Halt Script Timeout
  Node Failfast Enabled: YES
  Service Failfast Enabled: Either Setting
Results
  HP-UX Status on Primary after Error: system reset
  Halt script runs after Error or Exit: N/A
  Package Allowed to Run on Primary Node after Error: N/A (system reset)
  Package Allowed to Run on Alternate Node: Yes, unless the timeout happened after the cmhaltpkg command was executed.
where the package is running and monitoring the health of all interfaces, switching them when necessary.
NOTE: Serviceguard monitors the health of the network interfaces (NICs) but does not perform network connectivity checking.
Stationary and Relocatable IP Addresses
Each node (host system) should have at least one IP address for each active network interface. This address, known as a stationary IP address, is configured in the node's /etc/rc.config.d/netconf file or in the node’s /etc/rc.config.d/netconf-ipv6 file.
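For example, a stationary IPv4 address for interface lan0 is defined in /etc/rc.config.d/netconf with entries like the following (the interface name and addresses are illustrative):
INTERFACE_NAME[0]="lan0"
IP_ADDRESS[0]="15.13.168.9"
SUBNET_MASK[0]="255.255.248.0"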
that applications can access the package via its relocatable address without knowing which node the package currently resides on. IMPORTANT: Any subnet that is used by a package for relocatable addresses should be configured into the cluster via NETWORK_INTERFACE and either STATIONARY_IP or HEARTBEAT_IP in the cluster configuration file. For more information about those parameters, see “Cluster Configuration Parameters ” (page 138).
IP addresses are configured only on each primary network interface card; standby cards are not configured with an IP address. Multiple IPv4 addresses on the same network card must belong to the same IP subnet. CAUTION: HP strongly recommends that you add relocatable addresses to packages only by editing ip_address (page 273) in the package configuration file (or IP [] entries in the control script of a legacy package) and running cmapplyconf (1m).
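For example, a relocatable address and its subnet might appear in a modular package configuration file as follows (the addresses are illustrative):
ip_subnet    15.13.168.0
ip_address   15.13.168.50
In a legacy package the equivalent entries are SUBNET[n] and IP[n] array assignments in the package control script.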
• calculates the time depending on the type of LAN card.) Serviceguard will not declare the card as bad if only the inbound or only the outbound count stops incrementing. Both must stop. This is the default. INONLY_OR_INOUT: This option will also declare the card as bad if both inbound and outbound counts stop incrementing. However, it will also declare it as bad if only the inbound count stops. This option is not suitable for all environments.
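The choice between these behaviors is made with the NETWORK_FAILURE_DETECTION parameter in the cluster configuration file; for example:
NETWORK_FAILURE_DETECTION INOUT
where INOUT is the default and INONLY_OR_INOUT selects the alternative described above.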
Jumbo Frames. Additionally, network interface cards running 1000Base-T or 1000Base-SX cannot do local failover to 10BaseT. During the transfer, IP packets will be lost, but TCP (Transmission Control Protocol) will retransmit the packets. In the case of UDP (User Datagram Protocol), the packets will not be retransmitted automatically by the protocol. However, since UDP is an unreliable service, UDP applications should be prepared to handle the case of lost network packets and recover appropriately.
Figure 3-17 Cluster After Local Network Switching As the standby interface takes over, IP addresses will be switched to the hardware path associated with the standby interface. The switch is transparent at the TCP/IP level. All applications continue to run on their original nodes. During this time, IP traffic on Node 1 will be delayed as the transfer occurs. However, the TCP/IP connections will continue to be maintained and applications will continue to run.
Figure 3-18 Local Switching After Cable Failure Local network switching will work with a cluster containing one or more nodes. You may wish to design a single-node cluster in order to take advantage of this local network switching feature in situations where you need only one node and do not wish to set up a more complex cluster. Switching Back to Primary LAN Interfaces after Local Switching If a primary interface fails, the IP address will be switched to a standby.
cmhaltnode command is issued or on all nodes in the cluster if a cmhaltcl command is issued.
• Configurable behavior: NETWORK_AUTO_FAILBACK = NO. Serviceguard will detect and log the recovery of the interface, but will not switch the IP address back from the standby to the primary interface. You can tell Serviceguard to switch the IP address back to the primary interface by means of the cmmodnet command:
cmmodnet -e <interface>
where <interface> is the primary interface.
configured on that node, and identified as monitored subnets in the package configuration file, must be available.) Note that remote switching is supported only between LANs of the same type. For example, a remote switchover between an Ethernet interface on one machine and an IPoIB interface on the failover machine is not supported. The remote switching of relocatable IP addresses is shown in Figure 3-5 and Figure 3-6.
NOTE: This applies only to subnets for which the cluster configuration parameter IP_MONITOR is set to ON. See “Cluster Configuration Parameters ” (page 138) for more information. — Errors that prevent packets from being received but do not affect the link-level health of an interface IMPORTANT: You should configure the IP Monitor in a cross-subnet configuration, because IP monitoring will detect some errors that link-level monitoring will not. See also “Cross-Subnet Configurations” (page 41).
…
Route Connectivity (no probing was performed):
IPv4:
1   16.89.143.192
    16.89.120.0
…
Possible IP Monitor Subnets:
IPv4:
16.89.112.0            Polling Target 16.89.112.1
IPv6:
3ffe:1000:0:a801::     Polling Target 3ffe:1000:0:a801::254
…
The IP Monitor section of the cluster configuration file will look similar to the following for a subnet on which IP monitoring is configured with target polling.
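Based on the output above, the entries would look something like this (a sketch using the subnet and polling target reported by cmquerycl):
SUBNET 16.89.112.0
IP_MONITOR ON
POLLING_TARGET 16.89.112.1
By contrast, a subnet for which cmquerycl does not detect a gateway appears with IP_MONITOR set to OFF, as shown below.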
SUBNET 192.168.3.0
IP_MONITOR OFF
NOTE: This is the default if cmquerycl does not detect a gateway for the subnet in question; it is equivalent to having no SUBNET entry for the subnet. See SUBNET under “Cluster Configuration Parameters ” (page 138) for more information.
Constraints and Limitations
• A subnet must be configured into the cluster in order to be monitored.
• Polling targets are not detected beyond the first-level router.
Example 1: If Local Switching is Configured
If local switching is configured and a failure is detected by link-level monitoring, output from cmviewcl -v will look something like this:
Network_Parameters:
INTERFACE   STATUS               PATH      NAME
PRIMARY     down (Link and IP)   0/3/1/0   lan2
PRIMARY     up                   0/5/1/0   lan3
cmviewcl -v -f line will report the same failure like this:
node:gary|interface:lan2|status=down
node:gary|interface:lan2|local_switch_peer=lan1
node:gary|interface:lan2|disabled=false
node:gary|interface:lan2
cmviewcl -v -f line will report the same failure like this:
node:gary|interface:lan2|status=down
node:gary|interface:lan2|disabled=false
node:gary|interface:lan2|failure_type=ip_only
Automatic Port Aggregation
Serviceguard supports the use of automatic port aggregation through HP-APA (Auto-Port Aggregation, HP product J4240AA). HP-APA is a networking technology that aggregates multiple physical Fast Ethernet or multiple physical Gigabit Ethernet ports into a logical link aggregate.
Figure 3-19 Aggregated Networking Ports Both the Single and Dual ported LANs in the non-aggregated configuration have four LAN cards, each associated with a separate non-aggregated IP address and MAC address, and each with its own LAN name (lan0, lan1, lan2, lan3). When these ports are aggregated all four ports are associated with a single IP address and MAC address. In this example, the aggregated ports are collectively known as lan900, the name by which the aggregate is known on HP-UX 11i.
What is VLAN? Virtual LAN (or VLAN) is a technology that allows logical grouping of network nodes, regardless of their physical locations. VLAN can be used to divide a physical LAN into multiple logical LAN segments or broadcast domains, helping to reduce broadcast traffic, increase network performance and security, and improve manageability.
Additional Heartbeat Requirements
VLAN technology allows great flexibility in network configuration. To maintain Serviceguard’s reliability and availability in such an environment, the heartbeat rules are tightened as follows when the cluster is using VLANs:
1. VLAN heartbeat networks must be configured on separate physical NICs or APA aggregates, to avoid single points of failure.
2. Heartbeats are still recommended on all cluster networks, including VLANs.
3.
to agile addressing when you upgrade to 11i v3, though you should seriously consider its advantages. For instructions on migrating a system to agile addressing, see the white paper Migrating from HP-UX 11i v2 to HP-UX 11i v3 at http://docs.hp.com.
NOTE: It is possible, though not a best practice, to use legacy DSFs (that is, DSFs using the older naming convention) on some nodes after migrating to agile addressing on others; this allows you to migrate different nodes at different times, if necessary.
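On an HP-UX 11i v3 node you can display the mapping between persistent (agile) DSFs and legacy DSFs with ioscan; for example:
ioscan -m dsf
lists each persistent DSF together with the legacy DSF(s) it replaces.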
NOTE: Under agile addressing (see “About Device File Names (Device Special Files)” (page 107)), the storage units in this example would have names such as disk1, disk2, disk3, etc. Figure 3-20 Physical Disks Within Shared Storage Units Figure 3-21 shows the individual disks combined in a multiple disk mirrored configuration.
Figure 3-21 Mirrored Physical Disks Figure 3-22 shows the mirrors configured into LVM volume groups, shown in the figure as /dev/vgpkgA and /dev/vgpkgB. The volume groups are activated by Serviceguard packages for use by highly available applications.
Examples of Storage on Disk Arrays Figure 3-23 shows an illustration of storage configured on a disk array. Physical disks are configured by an array utility program into logical units or LUNs which are then seen by the operating system. Figure 3-23 Physical Disks Combined into LUNs NOTE: LUN definition is normally done using utility programs provided by the disk array manufacturer. Since arrays vary considerably, you should refer to the documentation that accompanies your storage unit.
Figure 3-24 Multiple Paths to LUNs Finally, the multiple paths are configured into volume groups as shown in Figure 3-25.
Types of Volume Manager Serviceguard allows a choice of volume managers for data storage: • HP-UX Logical Volume Manager (LVM) and (optionally) Mirrordisk/UX • Veritas Volume Manager for HP-UX (VxVM)—Base and add-on Products • Veritas Cluster Volume Manager for HP-UX Separate sections in Chapters 5 and 6 explain how to configure cluster storage using all of these volume managers.
• do not have all nodes cabled to all disks. (required with CFS)
• need to use software RAID mirroring or striped mirroring.
• have multiple heartbeat subnets configured.
Propagation of Disk Groups in VxVM
A VxVM disk group can be created on any node, whether the cluster is up or not. You must validate the disk group by trying to import it on each node.
Package Startup Time with VxVM
With VxVM, each disk group is imported by the package control script that uses the disk group.
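As noted under “Propagation of Disk Groups in VxVM” above, you can validate a disk group by importing it, and then deporting it, on each node in turn; a sketch with an illustrative disk group name:
vxdg import dg_pkgA
vxdg deport dg_pkgA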
CVM supports concurrent storage read/write access between multiple nodes by applications which can manage read/write access contention, such as Oracle Real Application Cluster (RAC). CVM 4.1 and later can be used with Veritas Cluster File System (CFS) in Serviceguard. Several of the HP Serviceguard Storage Management Suite bundles include features to enable both CVM and CFS.
2) single heartbeat network with standby LAN card(s) 3) single heartbeat network with APA. CVM 3.5 supports only options 2 and 3. Options 1 and 2 are the minimum recommended configurations for CVM 4.1 and later. Comparison of Volume Managers The following table summarizes the advantages and disadvantages of the volume managers.
Table 3-4 Pros and Cons of Volume Managers with Serviceguard (continued)

Product: Base-VxVM
Advantages:
• Software is supplied free with HP-UX 11i releases.
• Java-based administration through graphical user interface.
• Striping (RAID 0) support.
• Concatenation.
• Online resizing of volumes.
• Supports multiple heartbeat subnets.
Tradeoffs:
• Cannot be used for a cluster lock
• root/boot disk supported only on VxVM 3.
Responses to Failures Serviceguard responds to different kinds of failures in specific ways. For most hardware failures, the response is not user-configurable, but for package and service failures, you can choose the system’s response, within limits. System Reset When a Node Fails The most dramatic response to a failure in a Serviceguard cluster is an HP-UX TOC or INIT, which is a system reset without a graceful shutdown (normally referred to in this manual simply as a system reset).
Failure. Only one LAN has been configured for both heartbeat and data traffic. During the course of operations, heavy application traffic monopolizes the bandwidth of the network, preventing heartbeat packets from getting through. Since SystemA does not receive heartbeat messages from SystemB, SystemA attempts to reform as a one-node cluster. Likewise, since SystemB does not receive heartbeat messages from SystemA, SystemB also attempts to reform as a one-node cluster.
is configured, the node fails with a system reset. If a monitored data LAN interface fails without a standby, the node fails with a system reset only if node_fail_fast_enabled (page 264) is set to YES for the package. Otherwise any packages using that LAN interface will be halted and moved to another node if possible (unless the LAN recovers immediately; see “When a Service, Subnet, or Monitored Resource Fails, or a Dependency is Not Met” (page 86)).
NOTE: In a very few cases, Serviceguard will attempt to reboot the system before a system reset when this behavior is specified. If there is enough time to flush the buffers in the buffer cache, the reboot succeeds, and a system reset does not take place. Either way, the system will be guaranteed to come down within a predetermined number of seconds. “Choosing Switching and Failover Behavior” (page 163) provides advice on choosing appropriate failover behavior.
4 Planning and Documenting an HA Cluster Building a Serviceguard cluster begins with a planning phase in which you gather information about all the hardware and software components of the configuration.
• electrical points of failure.
• application points of failure.
Serviceguard Memory Requirements
Serviceguard requires approximately 15.5 MB of lockable memory.
Planning for Expansion
When you first set up the cluster, you indicate a set of nodes and define a group of packages for the initial configuration. At a later time, you may wish to add additional nodes and packages, or you may wish to use additional disk hardware for shared data storage.
NOTE: Under agile addressing, the storage units in this example would have names such as disk1, disk2, disk3, etc. See “About Device File Names (Device Special Files)” (page 107). Figure 4-1 Sample Cluster Configuration Create a similar sketch for your own cluster.
Host Name
The name to be used on the system as the host name.
Memory Capacity
The memory in MB.
Number of I/O slots
The number of slots.
Network Information
Serviceguard monitors LAN interfaces.
NOTE: Serviceguard supports communication across routers between nodes in the same cluster; for more information, see the documents listed under “Cross-Subnet Configurations” (page 41).
An IPv6 address is a string of 8 hexadecimal values separated with colons, in this form: xxx:xxx:xxx:xxx:xxx:xxx:xxx:xxx. For more details of the IPv6 address format, see Appendix G (page 433).
NETWORK_FAILURE_DETECTION
When there is a primary and a standby network card, Serviceguard needs to determine when a card has failed, so it knows whether to fail traffic over to the other card.
Table 4-1 SCSI Addressing in Cluster Configuration

System or Disk      Host Interface SCSI Address
Primary System A    7
Primary System B    6
Primary System C    5
Primary System D    4
Disk #1             3
Disk #2             2
Disk #3             1
Disk #4             0
Disk #5             15
Disk #6             14
Others              13 - 8

NOTE: When a boot/root disk is configured with a low-priority address on a shared SCSI bus, a system panic can occur if there is a timeout on accessing the boot/root device.
This information is used in creating the mirrored disk configuration using Logical Volume Manager. In addition, it is useful to gather as much information as possible about your disk configuration. You can obtain information about available disks by using the following commands:
• diskinfo
• ioscan -fnC disk or ioscan -fnNC disk
• lssf /dev/*dsk/*
• bdf
• mount
• swapinfo
• vgdisplay -v
• lvdisplay -v
• lvlnboot -v
• vxdg list (VxVM and CVM)
• vxprint (VxVM and CVM)
These are standard HP-UX commands.
Bus Type _SCSI_ Slot Number _6_ Address _24_ Disk Device File __________ Bus Type ______ Slot Number ___ Address ____ Disk Device File _________ Attach a printout of the output from the ioscan -fnC disk command after installing disk hardware and rebooting the system. Mark this printout to indicate which physical volume group each disk belongs to.
Be sure to follow UPS and cabinet power limits as well as SPU power limits. Power Supply Configuration Worksheet You may find the following worksheet useful to help you organize and record your power supply configuration. This worksheet is an example; blank worksheets are in Appendix E. Make as many copies as you need.
four nodes can use only a quorum server as the cluster lock. In selecting a cluster lock configuration, be careful to anticipate any potential need for additional cluster nodes. For more information on lock disks, lock LUNs, and the Quorum Server, see “Choosing Cluster Lock Disks” (page 202), “Setting Up a Lock LUN” (page 202), and “Setting Up and Running the Quorum Server” (page 206).
IMPORTANT: If you plan to use a Quorum Server, make sure you read the HP Serviceguard Quorum Server Version A.04.00 Release Notes before you proceed. You can find them at: http://www.docs.hp.com -> High Availability -> Quorum Server. You should also consult the Quorum Server white papers at the same location. Quorum Server Worksheet You may find it useful to use a Quorum Server worksheet, as in the example that follows, to identify a quorum server for use with one or more clusters.
When designing your disk layout using LVM, you should consider the following:
• The root disk should belong to its own volume group.
• The volume groups that contain high availability applications, services, or data must be on a bus or busses available to the primary node and all adoptive nodes.
• High availability applications, services, and data should be placed in a separate volume group from non-high availability applications, services, and data.
Physical Volume Name: ____________/dev/dsk/c2t2d0__________________________ Physical Volume Name: ____________/dev/dsk/c3t2d0__________________________ Physical Volume Name: _____________________________________________________ Physical Volume Name: _____________________________________________________ Physical Volume Name: _____________________________________________________ Physical Volume Name: _____________________________________________________ Name of Second Physical Volume Group: ______
• High availability applications, services, and data should be placed in separate disk groups from non-high availability applications, services, and data.
• You must not group two different high availability applications, services, or data, whose control needs to be transferred independently, onto the same disk group.
• Your HP-UX root disk can belong to an LVM or VxVM volume group (starting from VxVM 3.5) that is not shared among cluster nodes.
Cluster Configuration Planning
A cluster should be designed to provide the quickest possible recovery from failures. The actual time required to recover from a failure depends on several factors:
• The value of the cluster MEMBER_TIMEOUT. See MEMBER_TIMEOUT under “Cluster Configuration Parameters ” (page 138) for recommendations.
• The availability of raw disk access. Applications that use raw disk access should be designed with crash recovery services.
• The application and database recovery time.
NOTE: For heartbeat configuration requirements, see the discussion of the HEARTBEAT_IP parameter later in this chapter. For more information about managing the speed of cluster re-formation, see the discussion of the MEMBER_TIMEOUT parameter, and further discussion under “What Happens when a Node Times Out” (page 118) ,“Modifying the MEMBER_TIMEOUT Parameter” (page 223), and, for troubleshooting, “Cluster Re-formations Caused by MEMBER_TIMEOUT Being Set too Low” (page 376).
NOTE: In addition, the following characters must not be used in the cluster name if you are using the Quorum Server: at-sign (@), equal-sign (=), or-sign (|), semicolon (;). These characters are deprecated, meaning that you should not use them, even if you are not using the Quorum Server, because they will be illegal in a future Serviceguard release. All other characters are legal. The cluster name can contain up to 39 characters (bytes).
IPv4 address in this file. In addition, /etc/hosts should contain the following entry: ::1 ipv6-localhost ipv6-loopback For more information and recommendations about hostname resolution, see “Configuring Name Resolution” (page 196). NOTE: — ANY cannot be used in a cluster that uses Veritas Volume Manager (VxVM), Cluster Volume Manager (CVM), or Cluster File System (CFS). — ANY cannot be used in a cross-subnet configuration; these configurations are explained under“Cross-Subnet Configurations” (page 41).
providing quorum server functionality. Can be (or resolve to) either an IPv4 or an IPv6 address. This parameter is used only when you employ a quorum server for tie-breaking services in the cluster. You can also specify an alternate address (QS_ADDR) by which the cluster nodes can reach the quorum server. See also the entry for HOSTNAME_ADDRESS_FAMILY later in this section. For more information, see “Using a Quorum Server” (page 132) and “Specifying a Quorum Server” (page 220).
minutes). Minimum is 10,000,000 (10 seconds). Maximum is 2,147,483,647 (approximately 35 minutes). Can be changed while the cluster is running; see “What Happens when You Change the Quorum Configuration Online” (page 66) for important information. QS_TIMEOUT_EXTENSION Optional parameter used to increase the time (in microseconds) to wait for a quorum server response. The default quorum server timeout is calculated from the Serviceguard cluster parameter MEMBER_TIMEOUT. For clusters of two nodes, it is 0.
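As an illustration, the quorum-server entries in the cluster configuration file might look like the following sketch (the hostname and addresses are hypothetical, and the interval values are examples only; QS_HOST names the system providing quorum server functionality):
QS_HOST qs-server.example.com
QS_ADDR 192.10.27.12
QS_POLLING_INTERVAL 120000000
QS_TIMEOUT_EXTENSION 2000000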
NODE_NAME The hostname of each system that will be a node in the cluster. CAUTION: Make sure that the node name is unique within the subnets configured on the cluster nodes; under some circumstances Serviceguard may not be able to detect a duplicate name and unexpected problems may result. Do not use the full domain name. For example, enter ftsys9, not ftsys9.cup.hp.com.
which the node identified by the preceding NODE_NAME entry belongs. Can be used only in a site-aware disaster-tolerant cluster, which requires additional software; see the documents listed under “Cross-Subnet Configurations” (page 41) for more information. If SITE is used, it must be used for each node in the cluster (that is, all the nodes must be associated with some defined site, though not necessarily the same one).
HEARTBEAT_IP IP notation indicating this node's connection to a subnet that will carry the cluster heartbeat. NOTE: Any subnet that is configured in this cluster configuration file as a SUBNET for IP monitoring purposes, or as a monitored_subnet in a package configuration file (or SUBNET in a legacy package; see “Package Configuration Planning ” (page 159)) must be specified in the cluster configuration file via NETWORK_INTERFACE and either STATIONARY_IP or HEARTBEAT_IP.
You cannot configure more than one heartbeat IP address on an interface; only one HEARTBEAT_IP is allowed for each NETWORK_INTERFACE. NOTE: The Serviceguard cmapplyconf, cmcheckconf, and cmquerycl commands check that these minimum requirements are met, and produce a warning if they are not met at the immediate network level. If you see this warning, you need to check that the requirements are met in your overall network configuration.
NOTE: Limitations: • Because Veritas Cluster File System from Symantec (CFS) requires link-level traffic communication (LLT) among the nodes, Serviceguard cannot be configured in cross-subnet configurations with CFS alone. But CFS is supported in specific cross-subnet configurations with Serviceguard and HP add-on products; see the documentation listed under “Cross-Subnet Configurations” (page 41) for more information. • IPv6 heartbeat subnets are not supported in a cross-subnet configuration.
NOTE: The use of a private heartbeat network is not advisable if you plan to use Remote Procedure Call (RPC) protocols and services. RPC assumes that each network adapter device or I/O card is connected to a route-able network. An isolated or private heartbeat LAN is not route-able, and could cause an RPC request-reply, directed to that LAN, to risk timeout without being serviced. NFS, NIS and NIS+, and CDE are examples of RPC based applications that are frequently used on HP-UX.
monitored non-heartbeat subnets here. You can identify any number of subnets to be monitored. IMPORTANT: In a cross-subnet configuration, each package subnet configured on an interface (NIC) must have a standby interface connected to the local bridged network; see “Cross-Subnet Configurations” (page 41). A stationary IP address can be either an IPv4 or an IPv6 address. For information about the IPv6 address format, see “IPv6 Address Types” (page 433).
FIRST_CLUSTER_LOCK_PV for the first physical lock volume and SECOND_CLUSTER_LOCK_PV for the second physical lock volume, if any. If there is a second physical lock volume, SECOND_CLUSTER_LOCK_PV must be on a separate line. These parameters are used only when you employ a lock disk for tie-breaking services in the cluster. Enter the physical volume name as it appears on each node in the cluster (the same physical volume may have a different path name on each node).
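By way of illustration, a per-node entry combining the network parameters described above with a lock physical volume might look like this sketch (interface names, addresses, and device files are hypothetical; cmquerycl normally generates these entries for you):
NODE_NAME ftsys9
NETWORK_INTERFACE lan0
HEARTBEAT_IP 192.10.25.18
NETWORK_INTERFACE lan1
STATIONARY_IP 192.10.26.18
FIRST_CLUSTER_LOCK_PV /dev/dsk/c1t2d0
NODE_NAME ftsys10
NETWORK_INTERFACE lan0
HEARTBEAT_IP 192.10.25.19
NETWORK_INTERFACE lan1
STATIONARY_IP 192.10.26.19
FIRST_CLUSTER_LOCK_PV /dev/dsk/c0t2d0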
Capacity definition is optional, but if CAPACITY_NAME is specified, CAPACITY_VALUE must also be specified; CAPACITY_NAME must come first. NOTE: cmapplyconf will fail if any node defines a capacity and any package has min_package_node as its failover_policy (page 266) or automatic as its failback_policy (page 267). To specify more than one capacity for a node, repeat these parameters for each capacity.
you are using a SCSI cluster lock or dual Fibre Channel cluster lock). Maximum supported value: 300 seconds (300,000,000 microseconds). If you enter a value greater than 60 seconds (60,000,000 microseconds), cmcheckconf and cmapplyconf will note the fact, as confirmation that you intend to use a large value.
Minimum supported values:
• 3 seconds for a cluster with more than one heartbeat subnet.
will lead to a cluster re-formation, and to the node being removed from the cluster and rebooted, if a system hang or network load spike prevents the node from sending a heartbeat signal within the MEMBER_TIMEOUT value. More than one node could be affected if, for example, a network event such as a broadcast storm caused kernel interrupts to be turned off on some or all nodes while the packets are being processed, preventing the nodes from sending and processing heartbeat messages.
Can be changed while the cluster is running.
NETWORK_POLLING_INTERVAL Specifies how frequently the networks configured for Serviceguard are checked. Default is 2,000,000 microseconds (2 seconds). This means that the network manager will poll each network interface every 2 seconds, to make sure it can still send and receive information. Changing this value can affect how quickly a network failure is detected. The minimum value is 1,000,000 (1 second) and the maximum supported value is 30,000,000 (30 seconds).
CONFIGURED_IO_TIMEOUT_EXTENSION The number of microseconds by which to increase the time Serviceguard waits after detecting a node failure, so as to ensure that all pending I/O on the failed node has ceased. This parameter must be set for extended-distance clusters using iFCP interconnects between sites. See the manual Understanding and Designing Serviceguard Disaster Tolerant Architectures on docs.hp.com —> High Availability —> Serviceguard for more information. Default is 0.
By default, each of the cluster subnets is listed under SUBNET, and, if at least one gateway is detected for that subnet, IP_MONITOR is set to ON and POLLING_TARGET entries are populated with the gateway addresses, enabling target polling; otherwise the subnet is listed with IP_MONITOR set to OFF. See “Monitoring LAN Interfaces and Detecting Failure: IP Level” (page 99) for more information.
POLLING_TARGET The IP address to which polling messages will be sent from all network interfaces on the subnet specified in the preceding SUBNET entry, if IP_MONITOR is set to ON. This is called target polling. Each subnet can have multiple polling targets; repeat POLLING_TARGET entries as needed. If IP_MONITOR is set to ON, but no POLLING_TARGET is specified, polling messages are sent between network interfaces on the same subnet (peer polling).
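For illustration, a monitored subnet with target polling enabled might appear in the cluster configuration file roughly as follows (the subnet and gateway addresses are hypothetical):
SUBNET 192.10.25.0
IP_MONITOR ON
POLLING_TARGET 192.10.25.1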
These parameters are optional, but if they are defined, WEIGHT_DEFAULT must follow WEIGHT_NAME, and must be set to a floating-point value between 0 and 1000000. If they are not specified for a given weight, Serviceguard will assume a default value of zero for that weight. In either case, the default can be overridden for an individual package via the weight_name and weight_value parameters in the package configuration file. For more information and examples, see “Defining Weights” (page 179).
VOLUME_GROUP The name of an LVM volume group whose disks are attached to at least two nodes in the cluster. Such disks are considered cluster-aware. The volume group name can have up to 39 characters (bytes). Cluster Configuration: Next Step When you are ready to configure the cluster, proceed to “Configuring the Cluster ” (page 216). If you find it useful to record your configuration ahead of time, use the worksheet in Appendix E.
NOTE: As of the date of this manual, the Framework for HP Serviceguard Toolkits deals specifically with legacy packages. Logical Volume and File System Planning NOTE: LVM Volume groups that are to be activated by packages must also be defined as cluster-aware in the cluster configuration file. See “Cluster Configuration Planning ” (page 137). Disk groups (for Veritas volume managers) that are to be activated by packages must be defined in the package configuration file, described below.
# /dev/vg01/lvoldb4 /general vxfs defaults 0 2
# /dev/vg01/lvoldb5 raw_free ignore ignore 0 0
# /dev/vg01/lvoldb6 raw_free ignore ignore 0 0
# logical volumes that
# exist for Serviceguard's
# HA package. Do not uncomment.
Create an entry for each logical volume, indicating its use for a file system or for a raw device. Don’t forget to comment out the lines (using the # character as shown).
NOTE: Do not use /etc/fstab to mount file systems that are used by Serviceguard packages.
CVM 4.1 and later with CFS CFS (Veritas Cluster File System) is supported for use with Veritas Cluster Volume Manager Version 4.1 and later. The system multi-node package SG-CFS-pkg manages the cluster’s volumes. Two sets of multi-node packages are also used: the CFS mount packages, SG-CFS-MP-id#, and the CFS disk group packages, SG-CFS-DG-id#. Create the multi-node packages with the cfs family of commands; do not edit the configuration file. CVM 4.
their configuration file, Serviceguard specifies the dependency on the CFS system multi-node package (SG-CFS-pkg). CAUTION: Once you create the disk group and mount point packages, it is critical that you administer the cluster with the cfs commands, including cfsdgadm, cfsmntadm, cfsmount, and cfsumount. If you use the general commands such as mount and umount, it could cause serious problems such as writing to the local file system instead of the cluster file system.
The following table describes different types of failover behavior and the settings in the package configuration file that determine each behavior. See “Package Parameter Explanations” (page 261) for more information.

Table 4-2 Package Failover Behavior

Switching Behavior                              Parameters in Configuration File
Package switches normally after                 • node_fail_fast_enabled set to no. (Default)
detection of service, network, or EMS           • service_fail_fast_enabled set to NO for all services.
Parameters for Configuring EMS Resources NOTE: The default form for parameter names and literal values in the modular package configuration file is lower case; for legacy packages the default is upper case. There are no compatibility issues; Serviceguard is case-insensitive as far as the parameters are concerned. This manual uses lower case, unless the parameter in question is used only in legacy packages, or the context refers exclusively to such a package.
NOTE: For a legacy package, specify the deferred resources again in the package control script, using the DEFERRED_RESOURCE_NAME parameter: DEFERRED_RESOURCE_NAME[0]="/net/interfaces/lan/status/lan0" DEFERRED_RESOURCE_NAME[1]="/net/interfaces/lan/status/lan1" If a resource is configured to be AUTOMATIC in a legacy configuration file, you do not need to define DEFERRED_RESOURCE_NAME in the package control script. About Package Dependencies Starting in Serviceguard A.11.
Rules for Simple Dependencies
Assume that we want to make pkg1 depend on pkg2.
NOTE: pkg1 can depend on more than one other package, and pkg2 can depend on another package or packages; we are assuming only two packages in order to make the rules as clear as possible.
• pkg1 will not start on any node unless pkg2 is running on that node.
halted first, then pkg2. If there were a third package, pkg3, that depended on pkg1, pkg3 would be halted first, then pkg1, then pkg2. If the halt script for any dependent package hangs, by default the package depended on will wait forever (pkg2 will wait forever for pkg1, and if there is a pkg3 that depends on pkg1, pkg1 will wait forever for pkg3). You can modify this behavior by means of the successor_halt_timeout parameter (page 265).
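To make this concrete, a simple same-node dependency of pkg1 on pkg2 might be expressed in pkg1's package configuration file roughly as follows (a sketch; the dependency parameters themselves are described in Chapter 6):
dependency_name       pkg2_dep
dependency_condition  pkg2 = up
dependency_location   same_node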
NOTE: Keep the following in mind when reading the examples that follow, and when actually configuring priorities: 1. auto_run (page 263) should be set to yes for all the packages involved; the examples assume that it is. 2. Priorities express a ranking order, so a lower number means a higher priority (10 is a higher priority than 30, for example).
— If both packages have moved from node1 to node2 and node1 becomes available, pkg2 will fail back to node1 only if pkg2’s priority is higher than pkg1’s: ◦ If the priorities are equal, neither package will fail back (unless pkg1 is not running; in that case pkg2 can fail back).
because that provides the best chance for a successful failover (and failback) if pkg1 fails. But you also need to weigh the relative importance of the packages. If pkg2 runs a database that is central to your business, you probably want it to run undisturbed, no matter what happens to application packages that depend on it. In this case, the database package should have the highest priority.
IMPORTANT: If you have not already done so, read the discussion of Simple Dependencies (page 166) before you go on.
The interaction of the legal values of dependency_location and dependency_condition creates the following possibilities:
• Same-node dependency: a package can require that another package be UP on the same node. This is the case covered in the section on Simple Dependencies (page 166).
• Different-node dependency: a package can require that another package be UP on a different node.
Rules for different_node and any_node Dependencies
These rules apply to packages whose dependency_condition is UP and whose dependency_location is different_node or any_node. For same-node dependencies, see Simple Dependencies (page 166); for exclusionary dependencies, see “Rules for Exclusionary Dependencies” (page 172).
• Both packages must be failover packages whose failover_policy (page 266) is configured_node.
will allow the failed package to halt after the successor_halt_timeout number of seconds whether or not the dependent packages have completed their halt scripts. 2. Halts the failing package. After the successor halt timer has expired or the dependent packages have all halted, Serviceguard starts the halt script of the failing package, regardless of whether the dependents' halts succeeded, failed, or timed out. 3.
Package Weights and Node Capacities You define a capacity, or capacities, for a node (in the cluster configuration file), and corresponding weights for packages (in the package configuration file). Node capacity is consumed by package weights. Serviceguard ensures that the capacity limit you set for a node is never exceeded by the combined weight of packages running on it; if a node's available capacity will be exceeded by a package that wants to run on that node, the package will not run there.
CAPACITY_VALUE 10 Now all packages will be considered equal in terms of their resource consumption, and this node will never run more than ten packages at one time. (You can change this behavior if you need to by modifying the weight for some or all packages, as the next example shows.) Next, define the CAPACITY_NAME and CAPACITY_VALUE parameters for the remaining nodes, setting CAPACITY_NAME to package_limit in each case. You may want to set CAPACITY_VALUE to different values for different nodes.
Points to Keep in Mind The following points apply specifically to the Simple Method (page 175). Read them in conjunction with the Rules and Guidelines (page 182), which apply to all weights and capacities. • If you use the reserved CAPACITY_NAME package_limit, then this is the only type of capacity and weight you can define in this cluster. • If you use the reserved CAPACITY_NAME package_limit, the default weight for all packages is 1.
could be misleading to identify single resources, such as “processor”, if packages really contend for sets of interacting resources that are hard to characterize with a single name. In any case, the real-world meanings of the names you assign to node capacities and package weights are outside the scope of Serviceguard. Serviceguard simply ensures that for each capacity configured for a node, the combined weight of packages currently running on that node does not exceed that capacity.
CAPACITY_NAME A
CAPACITY_VALUE 80
CAPACITY_NAME B
CAPACITY_VALUE 50

NODE_NAME node2
CAPACITY_NAME A
CAPACITY_VALUE 60
CAPACITY_NAME B
CAPACITY_VALUE 70
...
NOTE: You do not have to define capacities for every node in the cluster. If any capacity is not defined for any node, Serviceguard assumes that node has an infinite amount of that capacity.
Example 3
WEIGHT_NAME A
WEIGHT_DEFAULT 20
WEIGHT_NAME B
WEIGHT_DEFAULT 15
This means that any package for which weight A is not defined in its package configuration file will have a weight A of 20, and any package for which weight B is not defined in its package configuration file will have a weight B of 15. Given the capacities we defined in the cluster configuration file (see “Defining Capacities”), node1 can run any three packages that use the default for both A and B.
weight_name A
weight_value 40

In pkg3's package configuration file:
weight_name B
weight_value 35
weight_name A
weight_value 0

In pkg4's package configuration file:
weight_name B
weight_value 40

IMPORTANT: weight_name in the package configuration file must exactly match the corresponding CAPACITY_NAME in the cluster configuration file. This applies to case as well as spelling: weight_name a would not match CAPACITY_NAME A.
Rules and Guidelines The following rules and guidelines apply to both the Simple Method (page 175) and the Comprehensive Method (page 177) of configuring capacities and weights. • You can define a maximum of four capacities, and corresponding weights, throughout the cluster. NOTE: But if you use the reserved CAPACITY_NAME package_limit, you can define only that single capacity and corresponding weight. See “Simple Method” (page 175).
For further discussion and use cases, see the white paper Using Serviceguard’s Node Capacity and Package Weight Feature on docs.hp.com under High Availability —> Serviceguard —> White Papers. How Package Weights Interact with Package Priorities and Dependencies If necessary, Serviceguard will halt a running lower-priority package that has weight to make room for a higher-priority package that has weight.
About External Scripts
The package configuration template for modular scripts explicitly provides for external scripts. These replace the CUSTOMER DEFINED FUNCTIONS in legacy scripts, and can be run either:
• On package startup and shutdown, as essentially the first and last functions the package performs.
NOTE: Some variables, including SG_PACKAGE, and SG_NODE, are available only at package run and halt time, not when the package is validated. You can use SG_PACKAGE_NAME at validation time as a substitute for SG_PACKAGE. IMPORTANT: For more information, see the template in $SGCONF/examples/external_script.template. A sample script follows. It assumes there is another script called monitor.sh, which will be configured as a Serviceguard service to monitor some application. The monitor.
do case ${SG_SERVICE_CMD[i]} in *monitor.
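The sample above is truncated here; purely as an illustration, a bare-bones external script might be structured along these lines (a hypothetical sketch, not the supplied template; see $SGCONF/examples/external_script.template for the real structure and supported entry points):
#!/bin/sh
# Hypothetical external-script sketch. Serviceguard invokes the script
# with an argument such as "start", "stop", or "validate".
case "$1" in
  start)
      # application-specific startup steps would go here
      ;;
  stop)
      # application-specific shutdown steps would go here
      ;;
  validate)
      # checks to run at cmcheckconf/cmapplyconf time would go here
      ;;
esac
exit 0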
Suppose a script run by pkg1 does a cmmodpkg -d of pkg2, and a script run by pkg2 does a cmmodpkg -d of pkg1. If both pkg1 and pkg2 start at the same time, the pkg1 script now tries to cmmodpkg pkg2. But that cmmodpkg command has to wait for pkg2 startup to complete. The pkg2 script tries to cmmodpkg pkg1, but pkg2 has to wait for pkg1 startup to complete, thereby causing a command loop.
NOTE: last_halt_failed appears only in the line output of cmviewcl, not the default tabular format; you must use the -v and -f line options to see it. The value of last_halt_failed is no if the halt script ran successfully, or was not run since the node joined the cluster, or was not run since the package was configured to run on the node; otherwise it is yes.
• Deploying applications in this environment requires careful consideration; see “Implications for Application Deployment” (page 189).
• If a monitored_subnet (page 270) is configured for PARTIAL monitored_subnet_access in a package’s configuration file, it must be configured on at least one of the nodes on the node_name list for that package.
Configuring node_name First you need to make sure that pkg1 will fail over to a node on another subnet only if it has to. For example, if it is running on NodeA and needs to fail over, you want it to try NodeB, on the same subnet, before incurring the cross-subnet overhead of failing over to NodeC or NodeD.
ip_address 15.244.56.100
ip_address 15.244.56.101

Configuring a Package: Next Steps
When you are ready to start configuring a package, proceed to Chapter 6: “Configuring Packages and Their Services ” (page 253); start with “Choosing Package Modules” (page 254). (If you find it helpful, you can assemble your package configuration data ahead of time on a separate worksheet for each package; blank worksheets are in Appendix E.
5 Building an HA Cluster Configuration This chapter and the next take you through the configuration tasks required to set up a Serviceguard cluster. These procedures are carried out on one node, called the configuration node, and the resulting binary file is distributed by Serviceguard to all the nodes in the cluster. In the examples in this chapter, the configuration node is named ftsys9, and the sample target node is called ftsys10.
Appendix D (page 407) provides instructions for upgrading Serviceguard without halting the cluster. Make sure you read the entire Appendix, and the corresponding section in the Release Notes, before you begin. Learning Where Serviceguard Files Are Kept Serviceguard uses a special file, /etc/cmcluster.conf, to define the locations for configuration and log files within the HP-UX filesystem. The following locations are defined in the file: ################## cmcluster.
NOTE: For more information and advice, see the white paper Securing Serviceguard at http://docs.hp.com -> High Availability -> Serviceguard -> White Papers. Allowing Root Access to an Unconfigured Node To enable a system to be included in a cluster, you must enable HP-UX root access to the system by the root user of every other potential cluster node. The Serviceguard mechanism for doing this is the file $SGCONF/cmclnodelist.
IMPORTANT: If $SGCONF/cmclnodelist does not exist, Serviceguard will look at ~/.rhosts. HP strongly recommends that you use cmclnodelist. NOTE: When you upgrade a cluster from Version A.11.15 or earlier, entries in $SGCONF/cmclnodelist are automatically updated to Access Control Policies in the cluster configuration file. All non-root user-hostname pairs are assigned the role of Monitor.
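For example, cmclnodelist entries granting root access from two prospective cluster nodes might look like the following sketch (hostnames are examples; each line pairs a hostname with a user name):
ftsys9.localdomain    root
ftsys10.localdomain   root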
to resolve each of their primary addresses on each of those networks to the primary hostname of the node in question. In addition, HP recommends that you define name resolution in each node’s /etc/hosts file, rather than rely solely on a service such as DNS. Configure the name service switch to consult the /etc/hosts file before other services. See “Safeguarding against Loss of Name Resolution Services” (page 198) for instructions.
IMPORTANT: Serviceguard does not support aliases for IPv6 addresses. For more information about configuring a combination of IPv6 and IPv4 addresses, see the discussion of the HOSTNAME_ADDRESS_FAMILY parameter under “Cluster Configuration Parameters ” (page 138). Safeguarding against Loss of Name Resolution Services When you employ any user-level Serviceguard command (including cmviewcl), the command uses the name service you have configured (such as DNS) to obtain the addresses of all the cluster nodes.
nameserver 15.243.160.51
3. Edit or create the /etc/nsswitch.
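A hosts line in /etc/nsswitch.conf that consults /etc/hosts before DNS might, for example, look like this (a minimal sketch; your site may require additional services or action qualifiers):
hosts: files dns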
Tuning Network and Kernel Parameters
Serviceguard and its extension products, such as SGeSAP and SGeRAC, have been tested with default values of the network and kernel parameters supported by the ndd and kmtune utilities. You may need to adjust these parameters for larger cluster configurations and applications.
• ndd is the network tuning utility. For more information, see the man page for ndd (1m).
• kmtune is the system tuning utility. For more information, see the man page for kmtune (1m).
/dev/dsk/c4t6d0 is the mirror; be sure to use the correct device file names for the root disks on your system.
NOTE: Under agile addressing, the physical devices in these examples would have names such as /dev/[r]disk/disk1, and /dev/[r]disk/disk2. See “About Device File Names (Device Special Files)” (page 107).
1. Create a bootable LVM disk to be used for the mirror.
   pvcreate -B /dev/rdsk/c4t6d0
2. Add this disk to the current root volume group.
   vgextend /dev/vg00 /dev/dsk/c4t6d0
3.
Root: lvol3     on:     /dev/dsk/c4t5d0
                        /dev/dsk/c4t6d0
Swap: lvol2     on:     /dev/dsk/c4t5d0
                        /dev/dsk/c4t6d0
Dump: lvol2     on:     /dev/dsk/c4t6d0, 0

Choosing Cluster Lock Disks
The following guidelines apply if you are using a lock disk. See “Cluster Lock ” (page 62) and “Cluster Lock Planning” (page 131) for discussion of cluster lock options. The cluster lock disk is configured on an LVM volume group that is physically connected to all cluster nodes.
Keep the following points in mind when choosing a device for a lock LUN:
• All the cluster nodes must be physically connected to the lock LUN.
• A lock LUN must be a block device.
• All existing data on the LUN will be destroyed when you configure it as a lock LUN. This means that if you use an existing lock disk, the existing lock information will be lost, and if you use a LUN that was previously used as a lock LUN for a Linux cluster, that lock information will also be lost.
manpage for more information. Do this on one of the nodes in the cluster that will use this lock LUN.
CAUTION: Before you start, make sure the disk or LUN that is to be partitioned has no data on it that you need. idisk will destroy any existing data.
1. Use a text editor to create a file that contains the partition information. You need to create at least three partitions, for example:
   3
   EFI 100MB
   HPUX 1MB
   HPUX 100%
   This defines:
• If the pathname for the lock LUN is the same on all nodes, use a command such as: cmquerycl -C $SGCONF/config.ascii -L /dev/dsk/c0t1d1 -n -n • If the pathname for the lock LUN is different on some nodes, you must specify the path on each node; for example (all on one line): cmquerycl -C $SGCONF/config.
Use pvcreate(1M) to initialize a disk for LVM, or use vxdiskadm(1M) to initialize a disk for VxVM.
IMPORTANT: The purpose of cmnotdisk.conf is to exclude specific devices, usually CD and DVD drives, that Serviceguard does not recognize and which should not be probed. Make sure you do not add a DEVICE_FILE entry in cmnotdisk.conf for any device that should be probed; that is, disk devices being managed by LVM or LVM2. Excluding any such device will cause cmquerycl to fail.
NOTE: If you are configuring volume groups that use mass storage on HP’s HA disk arrays, you should use redundant I/O channels from each node, connecting them to separate ports on the array. As of HP-UX 11i v3, the I/O subsystem performs load balancing and multipathing automatically. Creating a Storage Infrastructure with LVM This section describes storage configuration with LVM.
If your volume groups have not been set up, use the procedures that follow. If you have already done LVM configuration, skip ahead to the section “Configuring the Cluster.” Obtain a list of the disks on both nodes and identify which device files are used for the same disk on both.
vgcreate -g bus0 /dev/vgdatabase /dev/dsk/c1t2d0
vgextend -g bus1 /dev/vgdatabase /dev/dsk/c0t2d0

CAUTION: Volume groups used by Serviceguard must have names no longer than 35 characters (that is, the name that follows /dev/, in this example vgdatabase, must be at most 35 characters long).

The first command creates the volume group and adds a physical volume to it in a physical volume group called bus0.
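The intervening steps, creating a logical volume and building a file system on it, are not shown above; a hedged sketch of typical commands, assuming a 120 MB logical volume and the /mnt1 mount point used in the next subsection:
lvcreate -L 120 /dev/vgdatabase
newfs -F vxfs /dev/vgdatabase/rlvol1
mkdir /mnt1
mount /dev/vgdatabase/lvol1 /mnt1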
Deactivating the Volume Group At the time you create the volume group, it is active on the configuration node (ftsys9, for example).The next step is to unmount the file system and deactivate the volume group; for example, on ftsys9: umount /mnt1 vgchange -a n /dev/vgdatabase NOTE: Do this during this set-up process only, so that activation and mounting can be done by the package control script at run time.
be performed on this node because of a disaster on the primary node and an LVM problem with the volume group.) Do this as shown in the example below:
   vgchange -a y /dev/vgdatabase
   vgcfgbackup /dev/vgdatabase
   vgchange -a n /dev/vgdatabase
6. If you are using mirrored individual disks in physical volume groups, check the /etc/lvmpvg file to ensure that each physical volume group contains the correct physical volume names for ftsys10.
(non-cluster-aware) disks. To make merging the files easier, be sure to keep a careful record of the physical volume group names. Use the following procedure to merge files between the configuration node (ftsys9) and a new node (ftsys10) to which you are importing volume groups:
1. Copy /etc/lvmpvg from ftsys9 to /etc/lvmpvg.new on ftsys10.
2. If there are volume groups in /etc/lvmpvg.new that do not exist on ftsys10, remove all entries for that volume group from /etc/lvmpvg.new.
Initializing the Veritas Cluster Volume Manager 3.5 NOTE: Check the Serviceguard/SGeRAC/SMS/Serviceguard Manager Plug-in Compatibility and Feature Matrix and the latest Release Notes for your version of Serviceguard for up-to-date information about support for CVM (and CFS — Cluster File System): http://www.docs.hp.com -> High Availability -> Serviceguard. If you are using CVM 3.5 and you are about to create disk groups for the first time, you need to initialize the Volume Manager.
Initializing Disks Previously Used by LVM If a physical disk has been previously used with LVM, you should use the pvremove command to delete the LVM header data from all the disks in the volume group. In addition, if the LVM disk was previously used in a cluster, you have to re-initialize the disk with the pvcreate -f command to remove the cluster ID from the disk. NOTE: These commands make the disk and its data unusable by LVM, and allow it to be initialized by VxVM.
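For example (the device file is hypothetical), the sequence might look like this:
pvremove /dev/rdsk/c0t3d2
pvcreate -f /dev/rdsk/c0t3d2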
Creating Volumes Use the vxassist command to create logical volumes. The following is an example: vxassist -g logdata make log_files 1024m This command creates a 1024 MB volume named log_files in a disk group named logdata. The volume can be referenced with the block device file /dev/vx/dsk/logdata/log_files or the raw (character) device file /dev/vx/rdsk/logdata/log_files.
When all disk groups have been deported, you must issue the following command on all cluster nodes to allow them to access the disk groups: vxdctl enable Re-Importing Disk Groups After deporting disk groups, they are not available for use on the node until they are imported again either by a package control script or with a vxdg import command.
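As a sketch, the logdata disk group used in the earlier examples might be re-imported manually and its volumes started as follows:
vxdg import logdata
vxvol -g logdata startall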
NOTE: You can use Serviceguard Manager to configure a cluster: open the System Management Homepage (SMH) and choose Tools-> Serviceguard Manager. See “Using Serviceguard Manager” (page 32) for more information. To use Serviceguard commands to configure the cluster, follow directions in the remainder of this section. Use the cmquerycl command to specify a set of nodes to be included in the cluster and to generate a template for the cluster configuration file.
-w none skips network querying. If you have recently checked the networks, this option will save time.

Specifying the Address Family for the Heartbeat
To tell Serviceguard to use only IPv4, or only IPv6, addresses for the heartbeat, use the -h option. For example, to use only IPv6 addresses:
cmquerycl -v -h ipv6 -C $SGCONF/clust1.conf -n ftsys9 -n ftsys10
• -h ipv4 tells Serviceguard to discover and configure only IPv4 subnets. If it does not find any eligible subnets, the command will fail.
that this disk meets your power wiring requirements. If necessary, choose a disk powered by a circuit which powers fewer than half the nodes in the cluster. To display the failover times of disks, use the cmquerycl command, specifying all the nodes in the cluster. The output of the command lists the disks connected to each node together with the re-formation time associated with each. Do not include the node’s entire domain name; for example, specify ftsys9, not ftsys9.cup.hp.
NODE_NAME hasupt21
NETWORK_INTERFACE lan1
HEARTBEAT_IP 15.13.173.189
NETWORK_INTERFACE lan2
NETWORK_INTERFACE lan3
CLUSTER_LOCK_LUN /dev/dsk/c0t1d1

Specifying a Quorum Server
IMPORTANT: The following are standard instructions. For special instructions that may apply to your version of Serviceguard and the Quorum Server see “Configuring Serviceguard to Use the Quorum Server” in the latest version HP Serviceguard Quorum Server Version A.04.00 Release Notes, at http://www.docs.hp.
Node Names:    nodeA
               nodeB
               nodeC
               nodeD

Bridged networks (full probing performed):

1       lan3        (nodeA)
        lan4        (nodeA)
        lan3        (nodeB)
        lan4        (nodeB)
2       lan1        (nodeA)
        lan1        (nodeB)
3       lan2        (nodeA)
        lan2        (nodeB)
4       lan3        (nodeC)
        lan4        (nodeC)
        lan3        (nodeD)
        lan4        (nodeD)
5       lan1        (nodeC)
        lan1        (nodeD)
6       lan2        (nodeC)
        lan2        (nodeD)

IP subnets:

IPv4:

15.13.164.0    lan1    (nodeA)
               lan1    (nodeB)
15.13.172.0    lan1    (nodeC)
               lan1    (nodeD)
15.13.165.0    lan2    (nodeA)
               lan2    (nodeB)
15.13.182.0    lan2    (nodeC)
               lan2    (nodeD)
15.244.65.
3ffe:1111::/64     lan3    (nodeA)
                   lan3    (nodeB)
3ffe:2222::/64     lan3    (nodeC)
                   lan3    (nodeD)

Possible Heartbeat IPs:

15.13.164.0    15.13.164.1      (nodeA)
               15.13.164.2      (nodeB)
15.13.172.0    15.13.172.158    (nodeC)
               15.13.172.159    (nodeD)
15.13.165.0    15.13.165.1      (nodeA)
               15.13.165.2      (nodeB)
15.13.182.0    15.13.182.158    (nodeC)
               15.13.182.159    (nodeD)

Route connectivity (full probing performed):

1    15.13.164.0
     15.13.172.0
2    15.13.165.0
     15.13.182.0
3    15.244.65.0
4    15.244.56.
Identifying Heartbeat Subnets The cluster configuration file includes entries for IP addresses on the heartbeat subnet. HP recommends that you use a dedicated heartbeat subnet, and configure heartbeat on other subnets as well, including the data subnet. The heartbeat can be configured on an IPv4 or IPv6 subnet. The heartbeat can comprise multiple subnets joined by a router. In this case at least two heartbeat paths must be configured for each cluster node.
Controlling Access to the Cluster Serviceguard access-control policies define cluster users’ administrative or monitoring capabilities. A Note about Terminology Although you will also sometimes see the term role-based access (RBA) in the output of Serviceguard commands, the preferred set of terms, always used in this manual, is as follows: • Access-control policies- the set of rules defining user access to the cluster.
Figure 5-1 Access Roles Levels of Access Serviceguard recognizes two levels of access, root and non-root: • Root access: Full capabilities; only role allowed to configure the cluster. As Figure 5-1 shows, users with root access have complete control over the configuration of the cluster and its packages. This is the only role allowed to use the cmcheckconf, cmapplyconf, cmdeleteconf, and cmmodnet -a commands.
IMPORTANT: Users on systems outside the cluster can gain Serviceguard root access privileges to configure the cluster only via a secure connection (rsh or ssh). • Non-root access: Other users can be assigned one of four roles: — Full Admin: Allowed to perform cluster administration, package administration, and cluster and package view operations. These users can administer the cluster, but cannot configure or create a cluster. Full Admin includes the privileges of the Package Admin role.
NOTE: For more information and advice, see the white paper Securing Serviceguard at http://docs.hp.com -> High Availability -> Serviceguard -> White Papers. Define access-control policies for a cluster in the cluster configuration file; see “Cluster Configuration Parameters ” (page 138). You can define up to 200 access policies for each cluster. A root user can create or modify access control policies while the cluster is running.
NOTE: If you set USER_HOST to ANY_SERVICEGUARD_NODE, set USER_ROLE to MONITOR; users connecting from outside the cluster cannot have any higher privileges (unless they are connecting via rsh or ssh; this is treated as a local connection). Depending on your network configuration, ANY_SERVICEGUARD_NODE can provide wide-ranging read-only access to the cluster.
Plan the cluster’s roles and validate them as soon as possible. If your organization’s security policies allow it, you may find it easiest to create group logins. For example, you could create a MONITOR role for user operator1 from ANY_CLUSTER_NODE. Then you could give this login name and password to everyone who will need to monitor your clusters.
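Such a policy might look like the following sketch in the cluster configuration file, using the ANY_SERVICEGUARD_NODE wildcard discussed above (operator1 is an example user name):
USER_NAME operator1
USER_HOST ANY_SERVICEGUARD_NODE
USER_ROLE MONITOR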
NOTE: Check spelling especially carefully when typing wildcards, such as ANY_USER and ANY_SERVICEGUARD_NODE. If they are misspelled, Serviceguard will assume they are specific users or nodes.
• If all nodes specified are in the same heartbeat subnet, except in cross-subnet configurations (page 41).
• If you specify the wrong configuration filename.
• If all nodes can be accessed.
• No more than one CLUSTER_NAME and AUTO_START_TIMEOUT are specified.
• The value for package run and halt script timeouts is less than 4294 seconds.
• The value for AUTO_START_TIMEOUT variables is >=0.
• Heartbeat network minimum requirement is met.
or cmapplyconf -k -v -C /etc/cmcluster/clust1.ascii NOTE: Using the -k option means that cmapplyconf only checks disk connectivity to the LVM disks that are identified in the ASCII file. Omitting the -k option (the default behavior) means that cmapplyconf tests the connectivity of all LVM disks on all nodes. Using -k can result in significantly faster operation of the command. • Deactivate the cluster lock volume group.
NOTE: You must use the vgcfgbackup command to store a copy of the cluster lock disk's configuration data whether you created the volume group using the System Management Homepage (SMH), SAM, or HP-UX commands. If the cluster lock disk ever needs to be replaced while the cluster is running, you must use the vgcfgrestore command to restore lock information to the replacement disk.
Preparing the Cluster and the System Multi-node Package
1. First, be sure the cluster is running:
   cmviewcl
2. If it is not, start it:
   cmruncl
3. If you have not initialized your disk groups, or if you have an old install that needs to be re-initialized, use the vxinstall command to initialize VxVM/CVM disk groups. See “Initializing the Veritas Volume Manager ” (page 242).
4.
Node            : ftsys10
Cluster Manager : up
CVM state       : up
MOUNT POINT     TYPE     SHARED VOLUME     DISK GROUP     STATUS

NOTE: Because the CVM system multi-node package automatically starts up the Veritas processes, do not edit /etc/llthosts, /etc/llttab, or /etc/gabtab.

Creating the Disk Groups
Initialize the disk group from the master node.
1. Find the master node using vxdctl or cfscluster status.
2. Initialize a new disk group, or import an existing disk group, in shared mode, using the vxdg command.
Node Name : ftsys9 (MASTER)
DISK GROUP         ACTIVATION MODE
logdata            off (sw)

Node Name : ftsys10
DISK GROUP         ACTIVATION MODE
logdata            off (sw)

3. Activate the disk group and start up the package:
   cfsdgadm activate logdata
4. To verify, you can use cfsdgadm or cmviewcl. This example shows the cfsdgadm output:
   cfsdgadm display -v logdata
   NODE NAME       ACTIVATION MODE
   ftsys9          sw (sw)
   MOUNT POINT     SHARED VOLUME
   ftsys10         sw (sw)
   MOUNT POINT     SHARED VOLUME
5.
Creating a File System and Mount Point Package CAUTION: Nested mounts are not supported: do not use a directory in a CFS file system as a mount point for a local file system or another cluster file system. For other restrictions, see “Unsupported Features” in the “Technical Overview” section of the VERITAS Storage Foundation™ Cluster File System 4.1 HP Serviceguard Storage Management Suite Extracts at http://docs.hp.com -> High Availability -> HP Serviceguard Storage Management Suite. 1.
4. Mount the filesystem:
   cfsmount /tmp/logdata/log_files
   This starts up the multi-node package and mounts a cluster-wide filesystem.
5.
Mount Point Packages for Storage Checkpoints The Veritas File System provides a unique storage checkpoint facility which quickly creates a persistent image of a filesystem at an exact point in time. Storage checkpoints significantly reduce I/O overhead by identifying and maintaining only the filesystem blocks that have changed since the last storage checkpoint or backup. This is done by a copy-on-write technique.
SG-CFS-pkg        up      running     enabled      yes
SG-CFS-DG-1       up      running     enabled      no
SG-CFS-MP-1       up      running     enabled      no
SG-CFS-CK-1       up      running     disabled     no

/tmp/check_logfiles now contains a point in time view of /tmp/logdata/log_files, and it is persistent.
CLUSTER          STATUS
cfs-cluster      up

NODE             STATUS       STATE
ftsys9           up           running
ftsys10          up           running

MULTI_NODE_PACKAGES

PACKAGE          STATUS      STATE       AUTO_RUN     SYSTEM
SG-CFS-pkg       up          running     enabled      yes
SG-CFS-DG-1      up          running     enabled      no
SG-CFS-MP-1      up          running     enabled      no
SG-CFS-SN-1      up          running     disabled     no

The snapshot file system /local/snap1 is now mounted and provides a point in time view of /tmp/logdata/log_files.
the Veritas Installation Guide for your version. For more information, refer to the Veritas Volume Manager Administrator’s Guide for your version.
Separate procedures follow for:
• Initializing the Volume Manager
• Preparing the Cluster for Use with CVM
• Creating Disk Groups for Shared Storage
For more information, including details about configuration of plexes (mirrors), multipathing, and RAID, refer to the HP-UX documentation for the Veritas Volume Manager.
NOTE: Cluster configuration is described in the previous section, “Configuring the Cluster ” (page 216).
Check the heartbeat configuration. The CVM 3.5 heartbeat requirement is different from that of version 4.1 and later:
• CVM 3.5 allows you to configure only a single heartbeat subnet.
• CVM 4.1 and later versions require that the cluster have either multiple heartbeat subnets or a single heartbeat subnet with a standby.
Neither version can use Auto Port Aggregation, Infiniband, or VLAN interfaces as a heartbeat subnet.
PACKAGE           STATUS     STATE       AUTO_RUN     SYSTEM
VxVM-CVM-pkg      up         running     enabled      yes

NOTE: Do not edit system multi-node package configuration files, such as VxVM-CVM-pkg.conf and SG-CFS-pkg.conf. Create and modify the configuration using the cfs administration commands.

Starting the Cluster and Identifying the Master Node
If it is not already running, start the cluster.
3. Activate the disk group, as follows, before creating volumes: vxdg -g logdata set activation=ew Creating Volumes Use the vxassist command to create volumes, as in the following example: vxassist -g logdata make log_files 1024m This command creates a 1024 MB volume named log_files in a disk group named logdata. The volume can be referenced with the block device file /dev/vx/dsk/logdata/log_files or the raw (character) device file /dev/vx/rdsk/logdata/log_files.
groups, filesystems, logical volumes, and mount options in the package control script. The package configuration process is described in detail in Chapter 6. Using DSAU during Configuration You can use DSAU to centralize and simplify configuration and monitoring tasks. See “What are the Distributed Systems Administration Utilities?” (page 34). Managing the Running Cluster This section describes some approaches to routine management of the cluster.
1. If the cluster is not already running, start it. From the Serviceguard Manager menu, choose Run Cluster. From the command line, use cmruncl -v.
   By default, cmruncl will check the networks. Serviceguard will probe the actual network configuration with the network information in the cluster configuration. If you do not need this validation, use cmruncl -v -w none instead, to turn off validation and save time.
2. When the cluster has started, make sure that cluster components are operating correctly.
NOTE: Special considerations apply in the case of the root volume group:
• If the root volume group is mirrored using MirrorDisk/UX, include it in the custom_vg_activation function so that any stale extents in the mirror will be re-synchronized.
• Otherwise, the root volume group does not need to be included in the custom_vg_activation function, because it is automatically activated before the /etc/lvmrc file is used at boot time.
NOTE: The /sbin/init.d/cmcluster file may call files that Serviceguard stored in the directories: /etc/cmcluster/rc (HP-UX) and ${SGCONF}/rc (Linux). The directory is for Serviceguard use only! Do not move, delete, modify, or add files to this directory. Changing the System Message You may find it useful to modify the system's login message to include a statement such as the following: This system is a node in a high availability cluster.
It is not necessary to halt the single node in this scenario, since the application is still running, and no other node is currently available for package switching. However, you should not try to restart Serviceguard, since data corruption might occur if the node were to attempt to start up a new instance of the application that is still running on the node.
NOTE: The cmdeleteconf command removes only the cluster binary file /etc/cmcluster/cmclconfig. It does not remove any other files from the /etc/cmcluster directory. Although the cluster must be halted, all nodes in the cluster should be powered up and accessible before you use the cmdeleteconf command. If a node is powered down, power it up and boot.
6 Configuring Packages and Their Services Serviceguard packages group together applications and the services and resources they depend on. The typical Serviceguard package is a failover package that starts on one node but can be moved (“failed over”) to another if necessary. See “What is Serviceguard? ” (page 29), “How the Package Manager Works” (page 67), and “Package Configuration Planning ” (page 159) for more information.
NOTE: This is a new process for configuring packages, as of Serviceguard A.11.18. This manual refers to packages created by this method as modular packages, and assumes that you will use it to create new packages; it is simpler and more efficient than the older method, allowing you to build packages from smaller modules, and eliminating the separate package control script and the need to distribute it manually. Packages created using Serviceguard A.11.17 or earlier are referred to as legacy packages.
them, and then start them up on another node selected from the package’s configuration list; see “node_name” (page 263). To generate a package configuration file that creates a failover package, include-m sg/failover on the cmmakepkg command line. See “Generating the Package Configuration File” (page 285). • Multi-node packages. These packages run simultaneously on more than one node in the cluster.
NOTE: On systems that support CFS, you configure the CFS system multi-node package by means of the cfscluster command, not by editing a package configuration file. See “Configuring Veritas System Multi-node Packages” (page 293). NOTE: The following parameters cannot be configured for multi-node or system multi-node packages: • failover_policy • failback_policy • ip_subnet • ip_address Volume groups configured for packages of these types must be activated in shared mode.
start on the first eligible node on which an instance of the multi-node package comes up; this may not be the dependent packages’ primary node. To ensure that dependent failover packages restart on their primary node if the multi-node packages they depend on need to be restarted, make sure the dependent packages’ package switching is not re-enabled before the multi-node packages are restarted.
Table 6-1 Base Modules Module Name Parameters (page) Comments failover package_name (page 262) * module_name (page 262) * module_version (page 262) * package_type (page 263) package_description (page 263) * node_name (page 263) auto_run (page 263) node_fail_fast_enabled (page 264) run_script_timeout (page 264) halt_script_timeout (page 265) successor_halt_timeout (page 265) * script_log_file (page 265) operation_sequence (page 266) * log_level (page 266) * failover_policy (page 266) failback_policy (pag
Table 6-2 Optional Modules

Module Name      Parameters (page)                          Comments
dependency       dependency_name (page 267) *               Add to a base module to create a
                 dependency_condition (page 268) *          package that depends on one or
                 dependency_location (page 269) *           more other packages.
weight           weight_name (page 269) *                   Add to a base module to create a
                 weight_value (page 269) *                  package that has weight that will be
                                                            counted against a node's capacity.
Table 6-2 Optional Modules (continued)

Module Name      Parameters (page)                                          Comments
filesystem       concurrent_fsck_operations (page 280) (S)                  Add to a base module to configure
                 concurrent_mount_and_umount_operations (page 280) (S)      filesystem options for the package.
                 fs_mount_retry_count (page 281) (S)
                 fs_umount_retry_count (page 281) * (S)
                 fs_name (page 281) * (S)
                 fs_directory (page 282) * (S)
                 fs_type (page 282) (S)
                 fs_mount_opt (page 282) (S)
                 fs_umount_opt (page 282) (S)
                 fs_fsck_opt (page 282) (S)
Table 6-2 Optional Modules (continued)

Module Name      Parameters (page)                                          Comments
multi_node_all   all parameters that can be used by a multi-node            Use if you are creating a multi-node
                 package; includes multi_node, dependency,                  package that requires most or all of
                 monitor_subnet, service, resource, volume_group,           the optional parameters that are
                 filesystem, pev, external_pre, external, and acp           available for this type of package.
                 modules.
NOTE: For more information, see the comments in the editable configuration file output by the cmmakepkg command, and the cmmakepkg manpage.
package_type The type can be failover, multi_node, or system_multi_node. Default is failover. You can configure only failover or multi-node packages; see “Types of Package: Failover, Multi-Node, System Multi-Node” (page 254). package_description The application that the package runs. This is a descriptive parameter that can be set to any value you choose, up to a maximum of 80 characters. Default value is Serviceguard Package. New for 11.
This is also referred to as package switching, and can be enabled or disabled while the package is running, by means of the cmmodpkg (1m) command. auto_run should be set to yes if the package depends on another package, or is depended on; see “About Package Dependencies” (page 166). For system multi-node packages, auto_run must be set to yes. In the case of a multi-node package, setting auto_run to yes allows an instance to start on a new node joining the cluster; no means it will not.
NOTE: VxVM disk groups are imported at package run time and exported at package halt time. If a package uses a large number of VxVM disks, the timeout value must be large enough to allow all of them to finish the import or export.
NOTE: If no_timeout is specified, and the script hangs or takes a very long time to complete during the validation step (cmcheckconf (1m)), cmcheckconf will wait 20 minutes to allow the validation to complete before giving up.
Kept” (page 194) for more information about Serviceguard pathnames.) See also log_level (page 266). operation_sequence Defines the order in which the scripts defined by the package’s component modules will start up. See the package configuration file for details. This parameter is not configurable; do not change the entries in the configuration file. New for modular packages.
failback_policy Specifies what action the package manager should take when a failover package is not running on its primary node (the first node on its node_name list) and the primary node is once again available. Can be set to automatic or manual. The default is manual. • • manual means the package will continue to run on the current (adoptive) node.
IMPORTANT: Restrictions on dependency names in previous Serviceguard releases were less stringent. Packages that specify dependency_names that do not conform to the above rules will continue to run, but if you reconfigure them, you will need to change the dependency_name; cmcheckconf and cmapplyconf will enforce the new rules.
— Both packages must be failover packages whose failover_policy is configured_node. — At least one of the packages must specify a priority (page 267). For more information, see “About Package Dependencies” (page 166). dependency_location Specifies where the dependency_condition must be met. • If dependency_condition is UP, legal values fordependency_location are same_node, any_node, and different_node. — same_node means that the package depended on must be running on the same node.
weight_value is an unsigned floating-point value between 0 and 1000000 with at most three digits after the decimal point. You can use these parameters to override the cluster-wide default package weight that corresponds to a given node capacity. You can define that cluster-wide default package weight by means of the WEIGHT_NAME and WEIGHT_DEFAULT parameters in the cluster configuration file (explicit default).
monitored_subnet_access In cross-subnet configurations, specifies whether each monitored_subnet is accessible on all nodes in the package’s node list (see node_name (page 263)), or only some. Valid values are PARTIAL, meaning that at least one of the nodes has access to the subnet, but not all; and FULL, meaning that all nodes have access to the subnet. The default is FULL, and it is in effect if monitored_subnet_access is not specified.
ip_subnet Specifies an IP subnet used by the package for relocatable addresses; see also ip_address (page 273) and “Stationary and Relocatable IP Addresses ” (page 91). Replaces SUBNET, which is still supported in the package control script for legacy packages. CAUTION: HP recommends that this subnet be configured into the cluster.
If you want the subnet to be monitored, specify it in the monitored_subnet parameter as well. In a cross-subnet configuration, you also need to specify which nodes the subnet is configured on; see ip_subnet_node below. See also monitored_subnet_access (page 271) and “About Cross-Subnet Failover” (page 188). This parameter can be set for failover packages only. Can be added or deleted while the package is running.
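For example, a failover package using a single relocatable address might specify (a sketch with hypothetical addresses):
ip_subnet  192.10.25.0
ip_address 192.10.25.12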
The length and formal restrictions for the name are the same as for package_name (page 262). service_name must be unique among all packages in the cluster. IMPORTANT: Restrictions on service names in previous Serviceguard releases were less stringent. Packages that specify services whose names do not conform to the above rules will continue to run, but if you reconfigure them, you will need to change the name; cmcheckconf and cmapplyconf will enforce the new rules.
NOTE: Be careful when defining service run commands. Each run command is executed in the following way: • The cmrunserv command executes the run command. • Serviceguard monitors the process ID (PID) of the process the run command creates. • When the command exits, Serviceguard determines that a failure has occurred and takes appropriate action, which may include transferring the package to an adoptive node.
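As an illustration, a simple service definition in a modular package configuration file might look like the following sketch (the command path and values shown are examples only):
service_name              myapp_monitor
service_cmd               "/usr/local/bin/myapp_monitor.sh"
service_restart           none
service_fail_fast_enabled no
service_halt_timeout      300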
resource_name The name of a resource to be monitored. resource_name, in conjunction with resource_polling_interval, resource_start and resource_up_value, defines an Event Monitoring Service (EMS) dependency. In legacy packages, RESOURCE_NAME in the package configuration file requires a corresponding DEFERRED_RESOURCE_NAME in the package control script.
concurrent_vgchange_operations Specifies the number of concurrent volume group activations or deactivations allowed during package startup or shutdown. Legal value is any number greater than zero. The default is 1. If a package activates a large number of volume groups, you can improve the package’s start-up and shutdown performance by carefully tuning this parameter.
The default is vgchange -a e. The configuration file contains several other vgchange command variants; either uncomment one of these and comment out the default, or use the default. For more information, see the explanations in the configuration file, “LVM Planning ” (page 133), and “Creating the Storage Infrastructure and Filesystems with LVM, VxVM and CVM” (page 206).
in the configuration file, and uncomment the line
vxvol_cmd "vxvol -g \${DiskGroup} -o bg startall"
This allows package startup to continue while mirror re-synchronization is in progress.
vg Specifies an LVM volume group (one per vg, each on a new line) on which a file system needs to be mounted. A corresponding vgchange_cmd (page 277) specifies how the volume group is to be activated. The package script generates the necessary filesystem commands on the basis of the fs_ parameters (page 281).
Legal value is zero or any greater number. Default is zero.
kill_processes_accessing_raw_devices Specifies whether or not to kill processes that are using raw devices (for example, database applications) when the package shuts down. Default is no. See the comments in the package configuration file for more information.
File system parameters
A package can activate one or more storage groups on startup, and mount logical volumes to file systems.
If the package needs to mount and unmount a large number of filesystems, you can improve performance by carefully tuning this parameter during testing (increase it a little at a time and monitor performance each time).
fs_mount_retry_count The number of mount retries for each file system. Legal value is zero or any greater number. The default is zero. If the mount point is busy at package startup and fs_mount_retry_count is set to zero, package startup will fail.
fs_directory The root of the file system specified by fs_name. Replaces FS, which is still supported in the package control script for legacy packages; see “Configuring a Legacy Package” (page 338). See the mount (1m) manpage for more information. fs_type The type of the file system specified by fs_name. This parameter is in the package control script for legacy packages. See the mount (1m) and fstyp (1m) manpages for more information.
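As a sketch only, a single file system in a modular package is typically described by a group of fs_ parameters along the following lines; the logical volume, mount point, and options shown here are placeholders, not recommendations:

fs_name            /dev/vg01/lvol1
fs_directory       /mnt/appdata
fs_type            "vxfs"
fs_mount_opt       "-o rw"
fs_umount_opt      ""
fs_fsck_opt        ""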
external_pre_script The full pathname of an external script to be executed before volume groups and disk groups are activated during package startup, and after they have been deactivated during package shutdown (that is, effectively the first step in package startup and last step in package shutdown). New for modular packages.
user_host The system from which a user specified by user_name can execute package-administration commands. Legal values are any_serviceguard_node, or cluster_member_node, or a specific cluster node. If you specify a specific node it must be the official hostname (the hostname portion, and only the hostname portion, of the fully qualified domain name). As with user_name, be careful to spell the keywords exactly as given.
These two parameters allow you to separate package run instructions and package halt instructions for legacy packages into separate scripts if you need to. In this case, make sure you include identical configuration information (such as node names, IP addresses, etc.) in both scripts. In most cases, though, HP recommends that you use the same script for both run and halt instructions. (When the package starts, the script is passed the parameter start; when it halts, it is passed the parameter stop.)
NOTE: If you do not include a base module (or default or all) on the cmmakepkg command line, cmmakepkg will ignore the modules you specify and generate a default configuration file containing all the parameters. For a complex package, or if you are not yet sure which parameters you will need to set, the default may be the best choice; see the first example below. You can use the -v option with cmmakepkg to control how much information is displayed online or included in the configuration file.
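For example, the following commands illustrate the two approaches; the module names and file path are illustrative, and the modules available on your system are listed in the cmmakepkg (1m) manpage:

# Generate a default configuration file containing all parameters:
cmmakepkg $SGCONF/pkg1/pkg1.conf

# Generate a configuration file containing only the failover, service,
# and package-IP modules:
cmmakepkg -m sg/failover -m sg/service -m sg/package_ip $SGCONF/pkg1/pkg1.conf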
Editing the Configuration File When you have generated the configuration file that contains the modules your package needs (see “Generating the Package Configuration File” (page 285)), you need to edit the file to set the package parameters to the values that will make the package function as you intend. IMPORTANT: Do not edit the package configuration file of a Veritas Cluster Volume Manager (CVM) or Cluster File System (CFS) multi-node or system multi-node package.
NOTE: Optional parameters are commented out in the configuration file (with a # at the beginning of the line). In some cases these parameters have default values that will take effect unless you uncomment the parameter (remove the #) and enter a valid value different from the default. Read the surrounding comments in the file, and the explanations in this chapter, to make sure you understand the implications both of accepting and of changing a given default.
• If this package will depend on another package or packages, enter values for dependency_name (page 267), dependency_condition, dependency_location, and optionally priority. See “About Package Dependencies” (page 166) for more information. NOTE: The package(s) this package depends on must already be part of the cluster configuration by the time you validate this package (via cmcheckconf(1m); see “Verifying and Applying the Package Configuration” (page 291)); otherwise validation will fail.
• To configure the package to monitor a registered EMS resource, enter values for the following parameters (page 276):
— resource_name
— resource_polling_interval
— resource_up_value
— resource_start
See “Parameters for Configuring EMS Resources” (page 165) for more information and an example.
• If your package uses a large number of volume groups or disk groups, or mounts a large number of file systems, consider increasing the values of the following parameters:
— concurrent_vgchange_operations (page 277)
— concurrent_fsck_operations (page 280)
— concurrent_mount_and_umount_operations (page 280)
You can also use the fsck_opt and fs_umount_opt parameters (page 282) to specify the -s option of the fsck and mount/umount commands.
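As a combined sketch of the entries discussed in the preceding bullets, the uncommented lines in the configuration file might look something like this; all names, resources, and values are placeholders chosen for illustration, not recommendations:

# Dependency on another package:
dependency_name           pkg2_dep
dependency_condition      pkg2 = up
dependency_location       same_node

# EMS resource monitoring (the resource path shown is hypothetical):
resource_name             /net/interfaces/lan/status/lan0
resource_polling_interval 60
resource_start            automatic
resource_up_value         = up

# Concurrency settings for packages with many volume groups and file systems:
concurrent_vgchange_operations           2
concurrent_fsck_operations               4
concurrent_mount_and_umount_operations   4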
cmcheckconf -v -P $SGCONF/pkg1/pkg1.conf
Errors are displayed on the standard output. If necessary, re-edit the file to correct any errors, then run cmcheckconf again until it completes without errors. The following items are checked:
• Package name is valid, and at least one node_name entry is included.
• There are no duplicate parameter entries (except as permitted for multiple volume groups, etc).
• Values for all parameters are within permitted ranges.
The control script imports disk groups using the vxdg command with the -tfC options. The -t option specifies that the disk is imported with the noautoimport flag, which means that the disk will not be automatically re-imported at boot time. Since disk groups included in the package control script are only imported by Serviceguard packages, they should not be auto-imported. The -f option allows the disk group to be imported even if one or more disks (a mirror, for example) is not currently available.
NOTE: Check the Serviceguard/SGeRAC/SMS/Serviceguard Manager Plug-in Compatibility and Feature Matrix and the latest Release Notes for your version of Serviceguard for up-to-date information on support for CVM and CFS: http://www.docs.hp.com -> High Availability -> Serviceguard. The SG-CFS-pkg for CVM Version 4.
NOTE: Check the Serviceguard/SGeRAC/SMS/Serviceguard Manager Plug-in Compatibility and Feature Matrix and the latest Release Notes for your version of Serviceguard for up-to-date information about support for CFS (http://www.docs.hp.com -> High Availability -> Serviceguard). CAUTION: Once you create the disk group and mount point packages, it is critical that you administer these packages using the cfs commands, including cfsdgadm, cfsmntadm, cfsmount, and cfsumount.
7 Cluster and Package Maintenance This chapter describes how to see cluster configuration and status information, how to start and halt a cluster or an individual node, how to perform permanent reconfiguration, and how to start, halt, move, and modify packages during routine maintenance of the cluster.
TIP: Some commands take longer to complete in large configurations. In particular, you can expect Serviceguard’s CPU usage to increase during cmviewcl -v as the number of packages and services increases.
You can also specify that the output should be formatted as it was in a specific earlier release by using the -r option to specify the release format you want, for example:
cmviewcl -r A.11.16
(See the cmviewcl(1m) manpage for the supported release formats.)
Node Status and State
The status of a node is either up (as an active member of the cluster) or down (inactive in the cluster), depending on whether its cluster daemon is running or not. Note that a node might be down from the cluster perspective, but still up and running HP-UX.
A node may also be in one of the following states:
• Failed. Active members of the cluster will see a node in this state if that node was active in a cluster, but is no longer, and is not Halted.
• Reforming.
• reconfiguring — The node where this package is running is adjusting the package configuration to reflect the latest changes that have been applied.
• reconfigure_wait — The node where this package is running is waiting to adjust the package configuration to reflect the latest changes that have been applied.
• unknown — Serviceguard could not determine the status at the time cmviewcl was run.
A system multi-node package is up when it is running on all the active cluster nodes.
The following states are possible only for multi-node packages:
• blocked — The package has never run on this node, either because a dependency has not been met, or because auto_run is set to no.
• changing — The package is in a transient state, different from the status shown, on some nodes. For example, a status of starting with a state of changing would mean that the package was starting on at least one node, but in some other, transitory condition (for example, failing) on at least one other node.
Failover and Failback Policies
Failover packages can be configured with one of two values for the failover_policy parameter (page 266), as displayed in the output of cmviewcl -v:
• configured_node. The package fails over to the next node in the node_name list in the package configuration file (page 263).
• min_package_node. The package fails over to the node in the cluster that has the fewest running packages.
NODE           STATUS       STATE
ftsys10        up           running

  Network_Parameters:
  INTERFACE    STATUS       PATH         NAME
  PRIMARY      up           28.1         lan0
  STANDBY      up           32.1         lan1

PACKAGE        STATUS       STATE        AUTO_RUN     NODE
pkg2           up           running      enabled      ftsys10

  Policy_Parameters:
  POLICY_NAME        CONFIGURED_VALUE
  Failover           configured_node
  Failback           manual

  Script_Parameters:
  ITEM       STATUS   MAX_RESTARTS   RESTARTS   NAME
  Service    up       0              0          service2
  Subnet     up       0              0          15.

  Node_Switching_Parameters:
  NODE_TYPE    STATUS       SWITCHING    NAME
  Primary      up           enabled      ftsys10
  Alternate    up           enabled      ftsys9
Quorum Server Status:
NAME           STATUS       STATE
lp-qs          up           running

CVM Package Status
If the cluster is using the Veritas Cluster Volume Manager (CVM), version 3.5, for disk storage, the system multi-node package VxVM-CVM-pkg must be running on all active nodes for applications to be able to access CVM disk groups. The system multi-node package is named SG-CFS-pkg if the cluster is using version 4.1 or later of the Veritas Cluster Volume Manager.
    ITEM        STATUS   MAX_RESTARTS   RESTARTS   NAME
    Service     up       0              0          VxVM-CVM-pkg.srv

  NODE          STATUS   SWITCHING
  ftsys10       up       enabled

    Script_Parameters:
    ITEM        STATUS   MAX_RESTARTS   RESTARTS   NAME
    Service     up       0              0          VxVM-CVM-pkg.srv

CFS Package Status
If the cluster is using the Veritas Cluster File System (CFS), the system multi-node package SG-CFS-pkg must be running on all active nodes, and the multi-node packages for disk group and mount point must also be running on at least one of their configured nodes.
Status After Halting a Package
After we halt the failover package pkg2 with the cmhaltpkg command, the output of cmviewcl -v is as follows:

CLUSTER        STATUS
example        up

  NODE         STATUS       STATE
  ftsys9       up           running

    Network_Parameters:
    INTERFACE    STATUS       PATH
    PRIMARY      up           56/36.
    STANDBY      up
    Failback             manual

    Script_Parameters:
    ITEM       STATUS   NODE_NAME    NAME
    Resource   up       ftsys9       /example/float
    Subnet     up       ftsys9       15.13.168.0
    Resource   down     ftsys10      /example/float
    Subnet     up       ftsys10      15.13.168.0

    Node_Switching_Parameters:
    NODE_TYPE    STATUS       SWITCHING    NAME
    Primary      up           enabled      ftsys10
    Alternate    up           enabled      ftsys9

pkg2 now has the status down, and it is shown as unowned, with package switching disabled. Resource /example/float, which is configured as a dependency of pkg2, is down on one node.
    Node_Switching_Parameters:
    NODE_TYPE    STATUS       SWITCHING    NAME
    Primary      up           enabled      ftsys9
    Alternate    up           enabled      ftsys10

  PACKAGE      STATUS       STATE        AUTO_RUN             NODE
  pkg2         up           running      disabled (current)   ftsys9

    Policy_Parameters:
    POLICY_NAME        CONFIGURED_VALUE
    Failover           configured_node
    Failback           manual

    Script_Parameters:
    ITEM       STATUS   MAX_RESTARTS   RESTARTS   NAME
    Service    up       0              0          service2.1
    Subnet     up                                 15.13.168.
Both packages are now running on ftsys9 and pkg2 is enabled for switching. ftsys10 is running the cmcld daemon but no packages.
    Node_Switching_Parameters:
    NODE_TYPE    STATUS       SWITCHING    NAME
    Primary      up           enabled      manx
    Alternate    up           enabled      burmese
    Alternate    up           enabled      tabby
    Alternate    up           enabled      persian

Viewing Information about System Multi-Node Packages
The following example shows a cluster that includes system multi-node packages as well as failover packages. The system multi-node packages are running on all nodes in the cluster, whereas the standard packages run on only one node at a time.
tmp/mnt/dev/vx/dsk/
vg_for_cvm1_dd5/lvol1    regular   lvol1   vg_for_cvm_dd5   MOUNTED

/var/opt/sgtest/
tmp/mnt/dev/vx/dsk/
vg_for_cvm1_dd5/lvol4    regular   lvol4   vg_for_cvm_dd5   MOUNTED

Node             :  ftsys8
Cluster Manager  :  up
CVM state        :  up

  MOUNT POINT              TYPE      SHARED VOLUME   DISK GROUP              STATUS
  /var/opt/sgtest/
  tmp/mnt/dev/vx/dsk/
  vg_for_cvm1_dd5/lvol1    regular   lvol1           vg_for_cvm_veggie_dd5   MOUNTED
  /var/opt/sgtest/
  tmp/mnt/dev/vx/dsk/
  vg_for_cvm1_dd5/lvol4    regular   lvol4           vg_for_cvm_dd5          MOUNTED

Status of the Packa
Status of CFS Disk Group Packages
To see the status of the disk group, use the cfsdgadm display command. For example, for the disk group logdata, enter:
cfsdgadm display -v logdata
  NODE NAME            ACTIVATION MODE
  ftsys9               sw (sw)
         MOUNT POINT          SHARED VOLUME          TYPE
  ftsys10              sw (sw)
         MOUNT POINT          SHARED VOLUME          TYPE
...
To see which package is monitoring a disk group, use the cfsdgadm show_package command.
You can use Serviceguard Manager or the Serviceguard command line to start or stop the cluster, or to add or halt nodes. Starting the cluster means running the cluster daemon on one or more of the nodes in a cluster. You use different Serviceguard commands to start the cluster, depending on whether all nodes are currently down (that is, no cluster daemons are running), or whether you are starting the cluster daemon on an individual node.
cmruncl -v -n ftsys9 -n ftsys10 CAUTION: Serviceguard cannot guarantee data integrity if you try to start a cluster with the cmruncl -n command while a subset of the cluster's nodes are already running a cluster. If the network connection is down between nodes, using cmruncl -n might result in a second cluster forming, and this second cluster might start up the same applications that are already running on the other cluster. The result could be two applications overwriting each other's data on the disks.
NOTE: HP recommends that you remove a node from participation in the cluster (by running cmhaltnode as shown below, or Halt Node in Serviceguard Manager) before running the HP-UX shutdown command, especially in cases in which a packaged application might have trouble during shutdown and not halt cleanly. Use cmhaltnode to halt one or more nodes in a cluster. The cluster daemon on the specified node stops, and the node is removed from active participation in the cluster.
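For example, to remove ftsys9 from the cluster, halting (or failing over) any packages it is running, you might enter the following; the node name is a placeholder:

cmhaltnode -f -v ftsys9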
• Changing package switching behavior
• Maintaining a package using maintenance mode
Non-root users with the appropriate privileges can perform these tasks. See “Controlling Access to the Cluster” (page 224) for information about configuring access. You can use Serviceguard Manager or the Serviceguard command line to perform these tasks.
Starting a Package
Ordinarily, when a cluster starts up, the packages configured as part of the cluster will start up on their configured nodes.
Starting the Special-Purpose CVM and CFS Packages
Use CFS administration commands to start the special-purpose multi-node packages used with CVM and CFS. For example, to start the special-purpose multi-node package for the disk group package (SG-CFS-DG-id#), use the cfsdgadm command. To start the special-purpose multi-node package for the mount package (SG-CFS-MP-id#) use the cfsmntadm command.
You cannot halt a package unless all packages that depend on it are down. If you try, Serviceguard will take no action, except to send a message indicating that not all dependent packages are down. Before you halt a system multi-node package, or halt all instances of a multi-node package, halt any packages that depend on them.
Moving a Failover Package
You can use Serviceguard Manager, or Serviceguard commands as shown below, to move a failover package from one node to another.
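A typical sequence, sketched here with placeholder package and node names, is to halt the package, run it on the target node, and then re-enable switching:

cmhaltpkg pkg1
cmrunpkg -n ftsys10 pkg1
cmmodpkg -e pkg1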
package, use the cmmodpkg command. For example, if pkg1 is currently running, and you want to prevent it from starting up on another node, enter the following: cmmodpkg -d pkg1 This does not halt the package, but will prevent it from starting up elsewhere. You can disable package switching to particular nodes by using the -n option of the cmmodpkg command.
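For example, the following (with placeholder names) prevents pkg1 from switching to ftsys10 while leaving it free to fail over to other configured nodes:

cmmodpkg -d -n ftsys10 pkg1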
2. Place the package in maintenance mode:
cmmodpkg -m on pkg1
3. Run the package in maintenance mode. In this example, we'll start pkg1 such that only the modules up to and including the package_ip module are started. (See “Package Modules and Parameters” (page 257) for a list of package modules. The modules used by a package are started in the order shown near the top of its package configuration file.)
cmrunpkg -m sg/package_ip pkg1
4.
apart from the module whose components you are going to work on. In this case you can use the -e option: cmrunpkg -e sg/service pkg1 This runs all the package's modules except the services module. You can also use -e in combination with -m. This has the effect of starting all modules up to and including the module identified by -m, except the module identified by -e.
• “About Package Weights” (page 174) for a discussion of weights and capacities.
This allows you to perform maintenance on any node in the cluster. Node-wide and cluster-wide events affect the package as follows:
— If the node the package is running on is halted, the package will also be halted, and will remain in maintenance mode; it will not be automatically re-started.
— If the node crashes, the package will remain in maintenance mode and will not be automatically re-started.
Reconfiguring a Cluster
You can reconfigure a cluster either when it is halted or while it is still running. Some operations can only be done when the cluster is halted. Table 7-1 shows the required cluster state for many kinds of changes.

Table 7-1 Types of Changes to the Cluster Configuration

Change to the Cluster Configuration         Required Cluster State

Add a new node                              All systems configured as members of this
                                            cluster must be running.
Table 7-1 Types of Changes to the Cluster Configuration (continued)

Change to the Cluster Configuration         Required Cluster State

Reconfigure IP addresses for a NIC used     Must delete the interface from the cluster
by the cluster                              configuration, reconfigure it, then add it back
                                            into the cluster configuration. See “What You
                                            Must Keep in Mind” (page 332). Cluster can be
                                            running throughout.
Serviceguard provides two ways to do this: you can use the preview mode of Serviceguard commands, or you can use the cmeval (1m) command to simulate different cluster states. Alternatively, you might want to model changes to the cluster as a whole; cmeval allows you to do this; see “Using cmeval” (page 326).
cmmodpkg -e -t pkg1
You will see output something like this:
package:pkg3|node:node2|action:failing
package:pkg2|node:node2|action:failing
package:pkg2|node:node1|action:starting
package:pkg3|node:node1|action:starting
package:pkg1|node:node1|action:starting
cmmodpkg: Command preview completed successfully
This shows that pkg1, when enabled, will “drag” pkg2 and pkg3 to its primary node, node1. It can do this because of its higher priority; see “Dragging Rules for Simple Dependencies” (page 168).
In the output of cmviewcl -v -f line, you would find the line package:pkg1|autorun=disabled and change it to package:pkg1|autorun=enabled. You should also make sure that the nodes the package is configured to run on are shown as available; for example: package:pkg1|node:node1|available=yes. Then save the file (for example as newstate.in) and run cmeval:
cmeval -v newstate.in
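Putting the steps together, a minimal cmeval session might look like this; the file name is only an example:

# Capture the current cluster state in a form cmeval can read:
cmviewcl -v -f line > newstate.in
# Edit newstate.in to describe the state you want to test (for example,
# change autorun=disabled to autorun=enabled for pkg1), then evaluate it:
cmeval -v newstate.in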
IMPORTANT: See “What Happens when You Change the Quorum Configuration Online” (page 66) for important information.
1. In the cluster configuration file, modify the values of FIRST_CLUSTER_LOCK_PV and SECOND_CLUSTER_LOCK_PV for each node.
2. Run cmcheckconf to check the configuration.
3. Run cmapplyconf to apply the configuration.
For information about replacing the physical disk, see “Replacing a Lock Disk” (page 366).
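Sketched with placeholder device file names, the procedure amounts to editing the lock entries and re-applying the configuration:

# In the cluster configuration file, for each node, for example:
#   FIRST_CLUSTER_LOCK_PV    /dev/dsk/c0t1d2
#   SECOND_CLUSTER_LOCK_PV   /dev/dsk/c1t1d2
cmcheckconf -C clconfig.ascii
cmapplyconf -C clconfig.ascii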
line. This file overwrites any previous version of the binary cluster configuration file.
4. Start the cluster on all nodes or on a subset of nodes. Use Serviceguard Manager’s Run Cluster command, or cmruncl on the command line.

Reconfiguring a Running Cluster
This section provides instructions for changing the cluster configuration while the cluster is up and running. Note the following restrictions:
• You cannot remove an active node from the cluster. You must halt the node first.
5. Apply the changes to the configuration and distribute the new binary configuration file to all cluster nodes:
cmapplyconf -C clconfig.ascii
Use cmrunnode to start the new node, and, if you so decide, set the AUTOSTART_CMCLD parameter to 1 in the /etc/rc.config.d/cmcluster file to enable the new node to join the cluster automatically each time it reboots.
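The complete sequence for adding a node, shown here only as an outline with placeholder cluster, node, and file names, typically looks like this:

# Get a configuration that includes the new node ftsys10:
cmquerycl -C clconfig.ascii -c cluster1 -n ftsys8 -n ftsys9 -n ftsys10
# Verify and apply it:
cmcheckconf -C clconfig.ascii
cmapplyconf -C clconfig.ascii
# Start the cluster daemon on the new node:
cmrunnode ftsys10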
5. Verify the new configuration:
cmcheckconf -C clconfig.ascii
6. From ftsys8 or ftsys9, apply the changes to the configuration and distribute the new binary configuration file to all cluster nodes:
cmapplyconf -C clconfig.ascii
NOTE: If you are trying to remove an unreachable node on which many packages are configured to run (especially if the packages use a large number of EMS resources) you may see the following message:
The configuration change is too large to process while the cluster is running.
What You Must Keep in Mind
The following restrictions apply:
• You must not change the configuration of all heartbeats at one time, or change or delete the only configured heartbeat. At least one working heartbeat, preferably with a standby, must remain unchanged.
• In a CVM configuration, you can add and delete only data LANs and IP addresses. You cannot change the heartbeat configuration while a cluster that uses CVM is running.
Examples of when you must do this include:
— moving a NIC from one subnet to another
— adding an IP address to a NIC
— removing an IP address from a NIC
CAUTION: Do not add IP addresses to network interfaces that are configured into the Serviceguard cluster, unless those IP addresses themselves will be immediately configured into the cluster as stationary IP addresses.
2. Edit the file to uncomment the entries for the subnet that is being added (lan0 in this example), and change STATIONARY_IP to HEARTBEAT_IP:
NODE_NAME ftsys9
  NETWORK_INTERFACE lan1
    HEARTBEAT_IP 192.3.17.18
  NETWORK_INTERFACE lan0
    HEARTBEAT_IP 15.13.170.18
  NETWORK_INTERFACE lan3
# Possible standby Network Interfaces for lan1, lan0: lan2.
NODE_NAME ftsys10
  NETWORK_INTERFACE lan1
    HEARTBEAT_IP 192.3.17.19
  NETWORK_INTERFACE lan0
    HEARTBEAT_IP 15.13.170.
2. Run cmquerycl to get the cluster configuration file:
cmquerycl -c cluster1 -C clconfig.ascii
3. Comment out the network interfaces lan0 and lan3 and their IP addresses, if any, on all affected nodes. The networking portion of the file now looks like this:
NODE_NAME ftsys9
  NETWORK_INTERFACE lan1
    HEARTBEAT_IP 192.3.17.18
#  NETWORK_INTERFACE lan0
#    STATIONARY_IP 15.13.170.18
#  NETWORK_INTERFACE lan3
# Possible standby Network Interfaces for lan1, lan0: lan2.
NODE_NAME ftsys10
  NETWORK_INTERFACE lan1
    HEARTBEAT_IP 192.3.17.
3. Edit clconfig.ascii and delete the line(s) specifying the NIC name and its IP address(es) (if any) from the configuration.
4. Run cmcheckconf to verify the new configuration.
5. Run cmapplyconf to apply the changes to the configuration and distribute the new configuration file to all the cluster nodes.
6. Run olrad -d to remove the NIC.
See also “Replacing LAN or Fibre Channel Cards” (page 368).
NOTE: Check the Serviceguard/SGeRAC/SMS/Serviceguard Manager Plug-in Compatibility and Feature Matrix and the latest Release Notes for your version of Serviceguard for up-to-date information about support for CVM and CFS: http://www.docs.hp.com -> High Availability -> Serviceguard. Create CVM disk groups from the CVM Master Node: • • For CVM 3.5, and for CVM 4.1 and later without CFS, edit the configuration file of the package that uses CVM storage.
Use cmcheckconf to verify the new configuration. Using the -k or -K option can significantly reduce the response time. Use cmapplyconf to apply the changes to the configuration and send the new configuration file to all cluster nodes. Using -k or -K can significantly reduce the response time.

Configuring a Legacy Package
IMPORTANT: You can still create a new legacy package. If you are using a Serviceguard Toolkit such as Serviceguard NFS Toolkit, consult the documentation for that product.
NOTE: For instructions on creating Veritas special-purpose system multi-node and multi-node packages, see “Configuring Veritas System Multi-node Packages” (page 293) and “Configuring Veritas Multi-node Packages” (page 294).
1. Create a subdirectory for each package you are configuring in the /etc/cmcluster directory:
mkdir /etc/cmcluster/pkg1
You can use any directory names you like.
2. Generate a package configuration file for each package, for example:
cmmakepkg -p /etc/cmcluster/pkg1/pkg1.
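In a later step you will also generate a control script template for each legacy package with the -s option of cmmakepkg; for example (the script name is arbitrary):

cmmakepkg -s /etc/cmcluster/pkg1/pkg1.cntl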
NOTE: HP strongly recommends that you never edit the package configuration file of a CVM/CFS multi-node or system multi-node package, although Serviceguard does not prohibit it. Create VxVM-CVM-pkg and SG-CFS-pkg by issuing the cmapplyconf command. Create and modify SG-CFS-DG-id# and SG-CFS-MP-id# using cfs commands. • PACKAGE_TYPE. Enter the package type; see “Types of Package: Failover, Multi-Node, System Multi-Node” (page 254) and package_type (page 263).
line. Note that CVM storage groups are not entered in the cluster configuration file.
NOTE: You should not enter LVM volume groups or VxVM disk groups in this file.
• Enter the SUBNET or SUBNETs that are to be monitored for this package. They can be IPv4 or IPv6 subnets, but must not be link-local subnets (link-local package IPs are not allowed).
NOTE: For legacy packages, DEFERRED resources must be specified in the package control script.
• ACCESS_CONTROL_POLICY. You can grant a non-root user PACKAGE_ADMIN privileges for this package. See the entries for user_name, user_host, and user_role (page 283), and “Controlling Access to the Cluster” (page 224), for more information.
• If the package will depend on another package, enter values for DEPENDENCY_NAME, DEPENDENCY_CONDITION, and DEPENDENCY_LOCATION.
Customizing the Package Control Script
You need to customize as follows. See the entries for the corresponding modular-package parameters under “Package Parameter Explanations” (page 261) for more discussion.
• Update the PATH statement to reflect any required paths needed to start your services.
• If your package will use relocatable IP addresses, define IP subnet and IP address pairs. IPv4 or IPv6 addresses are allowed. CAUTION: HP recommends that the subnet(s) be specified in the cluster configuration file via the NETWORK_INTERFACE parameter and either the HEARTBEAT_IP or STATIONARY_IP parameter; see “Cluster Configuration Parameters ” (page 138).
# You should define all actions you want to happen here, before the service is
# halted.

function customer_defined_halt_cmds
{
# ADD customer defined halt commands.
: # do nothing instruction, because a function must contain some command.
date >> /tmp/pkg1.datelog
echo 'Halting pkg1' >> /tmp/pkg1.
cmcheckconf -v -P /etc/cmcluster/pkg1/pkg1.conf
Errors are displayed on the standard output. If necessary, edit the file to correct any errors, then run the command again until it completes without errors.
The following items are checked (whether you use Serviceguard Manager or cmcheckconf command):
• Package name is valid, and at least one NODE_NAME entry is included.
• There are no duplicate parameter entries.
• Values for parameters are within permitted ranges.
Distributing the Binary Cluster Configuration File with HP-UX Commands
Use the following steps from the node on which you created the cluster and package configuration files:
• Verify that the configuration file is correct. Use the following command (all on one line):
cmcheckconf -C /etc/cmcluster/cmcl.conf -P /etc/cmcluster/pkg1/pkg1.
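Once cmcheckconf completes without errors, applying and distributing the binary file is typically a single command along these lines; the file names are illustrative:

cmapplyconf -v -C /etc/cmcluster/cmcl.conf -P /etc/cmcluster/pkg1/pkg1.conf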
to try NodeB, on the same subnet, before incurring the cross-subnet overhead of failing over to NodeC or NodeD. NOTE: If you are using a site-aware disaster-tolerant cluster, which requires additional software, you can use the SITE parameter to accomplish this. See the description of that parameter under “Cluster Configuration Parameters ” (page 138).
Control-script entries for nodeA and nodeB
IP[0] = 15.244.65.82
SUBNET[0] = 15.244.65.0
IP[1] = 15.244.65.83
SUBNET[1] = 15.244.65.0
Control-script entries for nodeC and nodeD
IP[0] = 15.244.56.100
SUBNET[0] = 15.244.56.0
IP[1] = 15.244.56.101
SUBNET[1] = 15.244.56.
NOTE: The cmmigratepkg command requires Perl version 5.8.3 or higher on the system on which you run the command. It should already be on the system as part of the HP-UX base product. Reconfiguring a Package on a Running Cluster You can reconfigure a package while the cluster is running, and in some cases you can reconfigure the package while the package itself is running. You can do this in Serviceguard Manager (for legacy packages), or use Serviceguard commands.
Adding a Package to a Running Cluster You can create a new package and add it to the cluster configuration while the cluster is up and while other packages are running. The number of packages you can add is subject to the value of MAX_CONFIGURED_PACKAGES in the cluster configuration file. To create the package, follow the steps in the chapter “Configuring Packages and Their Services ” (page 253).
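For example, after creating and editing the configuration file for the new package (the path is a placeholder), verify and apply it without disturbing the packages that are already running:

cmcheckconf -v -P /etc/cmcluster/pkg2/pkg2.conf
cmapplyconf -v -P /etc/cmcluster/pkg2/pkg2.conf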
NOTE: Any form of the mount command other than cfsmount or cfsumount should be used with caution in a CFS environment. Non-CFS commands (for example, mount -o cluster, dbed_chkptmount, or sfrac_chkptmount) could cause conflicts with subsequent operations on the file system or Serviceguard packages, and will not create an appropriate multi-node package, with the result that cluster packages are not aware of file system changes. 1. Remove any dependencies on the package being deleted.
In general, you have greater scope for online changes to a modular than to a legacy package. In some cases, though, the capability of legacy packages has been upgraded to match that of modular packages as far as possible; these cases are shown in the table. For more information about legacy and modular packages, see Chapter 6 (page 253). NOTE: If neither legacy nor modular is called out under “Change to the Package”, the “Required Package State” applies to both types of package.
Table 7-2 Types of Changes to Packages (continued)

Change to the Package                       Required Package State

Add or delete a service: legacy package     Package must not be running.

Change service_restart: modular package     Package can be running. Serviceguard will not
                                            allow the change if the new value is less than
                                            the current restart count. (You can use
                                            cmmodpkg -R to reset the restart count if you
                                            need to.)

Change SERVICE_RESTART: legacy package      Package must not be running.
Table 7-2 Types of Changes to Packages (continued) Change to the Package Required Package State Add or remove a resource: modular package Package can be running. Serviceguard will not allow the change if it would cause the package to fail. In addition, Serviceguard will reject the change if the resource is not UP within about 30 seconds. Multiple changes that take longer than this should be done incrementally; resources that are very slow to come up may need to be configured offline.
Table 7-2 Types of Changes to Packages (continued) Change to the Package Required Package State Change a file system: modular package Package should not be running (unless you are only changing fs_umount_opt). Changing file-system options other than fs_umount_opt may cause problems because the file system must be unmounted (using the existing fs_umount_opt) and remounted with the new options; the CAUTION under “Remove a file system: modular package” applies in this case as well.
Table 7-2 Types of Changes to Packages (continued)

Change to the Package                       Required Package State

Remove a vxvm_dg or cvm_dg: modular         Package should not be running. See CAUTION
package                                     under “Remove a volume group: legacy package”.

Change vxvm_dg_retry: modular package       Package can be running.

Add or remove a VXVM_DG or CVM_DG:          Package must not be running.
legacy package

Change VXVM_DG_RETRY: legacy package        Package must not be running.

Add, change, or delete external             Package can be running.
Table 7-2 Types of Changes to Packages (continued) Change to the Package Required Package State Add or delete a configured dependency Both packages can be either running or halted. Special rules apply to packages in maintenance mode; see “Dependency Rules for a Package in Partial-Startup Maintenance Mode” (page 322). For dependency purposes, a package being reconfigured is considered to be UP.
Responding to Cluster Events Serviceguard does not require much ongoing system administration intervention. As long as there are no failures, your cluster will be monitored and protected. In the event of a failure, those packages that you have designated to be transferred to another node will be transferred automatically. Your ongoing responsibility as the system administrator will be to monitor the cluster and determine if a transfer of package has occurred.
Disabling Serviceguard
If for some reason you want to disable Serviceguard on a system, you can do so by commenting out the following entries in /etc/inetd.conf:
hacl-cfg dgram  udp wait root /usr/lbin/cmclconfd cmclconfd -p
hacl-cfg stream tcp nowait root /usr/lbin/cmclconfd cmclconfd -c
Then force inetd to re-read inetd.conf:
/usr/sbin/inetd -c
You can check that this did in fact disable Serviceguard by trying the following command:
cmquerycl -n nodename
where nodename is the name of the local system.
8 Troubleshooting Your Cluster This chapter describes how to verify cluster operation, how to review cluster status, how to add and replace hardware, and how to solve some typical cluster problems.
kill PID
3. To view the package status, enter
cmviewcl -v
The package should be running on the specified adoptive node.
4. Move the package back to the primary node (see “Moving a Failover Package ” (page 318)).

Testing the Cluster Manager
To test that the cluster manager is operating correctly, perform the following steps for each node on the cluster:
1. Turn off the power to the node SPU.
2.
3. Verify that a local switch has taken place so that the Standby card is now the Primary card. In Serviceguard Manager, check the cluster properties. On the command line, use cmviewcl -v.
4. Reconnect the LAN to the original Primary card, and verify its status. In Serviceguard Manager, check the cluster properties. On the command line, use cmviewcl -v.
Using EMS (Event Monitoring Service) Hardware Monitors A set of hardware monitors is available for monitoring and reporting on memory, CPU, and many other system values. Some of these monitors are supplied with specific hardware products. Hardware Monitors and Persistence Requests When hardware monitors are disabled using the monconfig tool, associated hardware monitor persistent requests are removed from the persistence files.
Replacing a Faulty Mechanism in an HA Enclosure If you are using software mirroring with Mirrordisk/UX and the mirrored disks are mounted in a high availability disk enclosure, you can use the following steps to hot plug a disk mechanism: 1. Identify the physical volume name of the failed disk and the name of the volume group in which it was configured. In the following example, the volume group name is shown as /dev/vg_sg01 and the physical volume name is shown as /dev/dsk/c2t3d0.
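After the failed mechanism has been replaced, the remaining steps normally consist of LVM commands along the following lines; this is a sketch only, using the volume group and device file identified in step 1:

# Restore the LVM configuration information onto the replacement disk:
vgcfgrestore -n /dev/vg_sg01 /dev/rdsk/c2t3d0
# Resynchronize the mirrored logical volumes in the volume group:
vgsync /dev/vg_sg01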
Replacing a Lock Disk
You can replace an unusable lock disk while the cluster is running. You can do this without any cluster reconfiguration if you do not change the device file name (Device Special File, or DSF); or, if you need to change the DSF, you can do the necessary reconfiguration while the cluster is running.
Special File, or DSF); or, if you need to change the DSF, you can do the necessary reconfiguration while the cluster is running.
Online Hardware Maintenance with In-line SCSI Terminator In some shared SCSI bus configurations, online SCSI disk controller hardware repairs can be made if HP in-line terminator (ILT) cables are used. In-line terminator cables are supported with most SCSI-2 Fast-Wide configurations. In-line terminator cables are supported with Ultra2 SCSI host bus adapters only when used with the SC10 disk enclosure. This is because the SC10 operates at slower SCSI bus speeds, which are safe for the use of ILT cables.
Offline Replacement
Follow these steps to replace an I/O card off-line.
1. Halt the node by using the cmhaltnode command.
2. Shut down the system using /usr/sbin/shutdown, then power down the system.
3. Remove the defective I/O card.
4. Install the new I/O card. The new card must be exactly the same card type, and it must be installed in the same slot as the card you removed.
5. Power up the system.
6. If necessary, add the node back into the cluster by using the cmrunnode command.
This procedure updates the binary file with the new MAC address and thus avoids data inconsistency between the outputs of the cmviewconf and lanscan commands. Replacing a Failed Quorum Server System When a quorum server (QS) fails or becomes unavailable to the clusters it is providing quorum services for, this will not cause a failure on any cluster. However, the loss of the quorum server does increase the vulnerability of the clusters in case there is an additional failure.
NOTE: While the old quorum server is down and the new one is being set up, these things can happen:
• These three commands will not work:
cmquerycl -q
cmapplyconf -C
cmcheckconf -C
• If there is a node or network failure that creates a 50-50 membership split, the quorum server will not be available as a tie-breaker, and the cluster will fail.
NOTE: Make sure that the old Quorum Server system does not rejoin the network with the old IP address.
lan1*      1500  none            none            418623      0     55822

IPv6:
Name       Mtu   Address/Prefix                  Ipkts       Opkts
lan1*      1500  none                            0           0
lo0        4136  ::1/128                         10690       10690

Reviewing the System Log File
Messages from the Cluster Manager and Package Manager are written to the system log file. The default location of the log file is /var/adm/syslog/syslog.log. Also, package-related messages are logged into the package log file. The package log file is located in the package directory, by default.
Dec 14 14:39:27 star04 cmclconfd[2097]: Command execution message
Dec 14 14:39:33 star04 cmcld[2098]: 3 nodes have formed a new cluster
Dec 14 14:39:33 star04 cmcld[2098]: The new active cluster membership is: star04(id=1), star05(id=2), star06(id=3)
Dec 14 17:39:33 star04 cmlvmd[2099]: Clvmd initialized successfully.
Dec 14 14:39:34 star04 cmcld[2098]: Executing '/etc/cmcluster/pkg4/pkg4_run start' for package pkg4.
Information about the starting and halting of each package is found in the package’s control script log. This log provides the history of the operation of the package control script. By default, it is found at /etc/cmcluster/<package name>/control_script.log; but another location may have been specified in the package configuration file’s script_log_file parameter. This log documents all package run and halt activities.
• linkloop verifies the communication between LAN cards at MAC address levels. For example, if you enter
linkloop -i4 0x08000993AB72
you should see the following message displayed:
Link Connectivity to LAN station: 0x08000993AB72 OK
• cmscancl can be used to verify that primary and standby LANs are on the same bridged net.
• cmviewcl -v shows the status of primary and standby LANs.
Use these commands on all nodes.
Solving Problems
Problems with Serviceguard may be of several types.
Name Server:  server1.cup.hp.com
Address:      15.13.168.63
Name:         ftsys9.cup.hp.com
Address:      15.13.172.229
If the output of this command does not include the correct IP address of the node, then check your name resolution services further. In many cases, a symptom such as Permission denied... or Connection refused... is the result of an error in the networking or security configuration. Most such problems can be resolved by correcting the entries in /etc/hosts.
What to do: If this message appears once a month or more often, increase MEMBER_TIMEOUT to more than 10 times the largest reported delay. For example, if the message that reports the largest number says that cmcld was unable to run for the last 1.6 seconds, increase MEMBER_TIMEOUT to more than 16 seconds.
2. This node is at risk of being evicted from the running cluster. Increase MEMBER_TIMEOUT.
• strings /etc/lvmtab - to ensure that the configuration is correct.
• ioscan -fnC disk - to see physical disks.
• diskinfo -v /dev/rdsk/cxtydz - to display information about a disk.
• lssf /dev/d*/* - to check logical volumes and paths.
• vxdg list - to list Veritas disk groups.
• vxprint - to show Veritas disk group details.
package on an alternate node. This might include such things as shutting down application processes, removing lock files, and removing temporary files.
2. Ensure that package IP addresses are removed from the system; use the cmmodnet(1m) command. First determine which package IP addresses are installed by inspecting the output of netstat -in.
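For example, to remove a relocatable address that was left configured on the adoptive node, you might enter something like the following; the address and subnet are placeholders for the values shown by netstat -in:

cmmodnet -r -i 192.10.25.12 192.10.25.0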
cleans up any side effects of the package's run or halt attempt. In this case the package will be automatically restarted on any available alternate node for which it is configured.
Problems with Cluster File System (CFS)
NOTE: Check the Serviceguard/SGeRAC/SMS/Serviceguard Manager Plug-in Compatibility and Feature Matrix and the latest Release Notes for your version of Serviceguard for up-to-date information about support for CFS (http://www.docs.hp.com -> High Availability -> Serviceguard).
This can happen if a package is running on a node which then fails before the package control script can deport the disk group. In these cases, the host name of the node that had failed is still written on the disk group header. When the package starts up on another node in the cluster, a series of messages is printed in the package log file. Follow the instructions in the messages to use the force import option (-C) to allow the current node to import the disk group.
In the event of a TOC, a system dump is performed on the failed node and numerous messages are also displayed on the console. You can use the following commands to check the status of your network and subnets:
• netstat -in - to display LAN status and check to see if the package IP is stacked on the LAN card.
• lanscan - to see if the LAN is on the primary interface or has switched to the standby interface.
• arp -a - to check the arp tables.
• lanadmin - to display, test, and reset the LAN cards.
A message such as the following in a Serviceguard node’s syslog file indicates that the node did not receive a reply to its lock request on time. This could be because of delay in communication between the node and the Quorum Server or between the Quorum Server and other nodes in the cluster:
Attempt to get lock /sg/cluser1 unsuccessful. Reason: request_timedout

Messages
The coordinator node in Serviceguard sometimes sends a request to the quorum server to set the lock state.
A Enterprise Cluster Master Toolkit The Enterprise Cluster Master Toolkit (ECMT) provides a group of example scripts and package configuration files for creating Serviceguard packages for several major database and internet software products. Each toolkit contains a README file that explains how to customize the package for your needs. The ECMT can be installed on HP-UX 11i v1, 11i v2, or 11i v3.
B Designing Highly Available Cluster Applications
This appendix describes how to create or port applications for high availability, with emphasis on the following topics:
• Automating Application Operation
• Controlling the Speed of Application Failover (page 389)
• Designing Applications to Run on Multiple Systems (page 392)
• Restoring Client Connections (page 397)
• Handling Application Failures (page 398)
• Minimizing Planned Downtime (page 400)
Designing for high availability means reducing the amount
There are two principles to keep in mind for automating application relocation:
• Insulate users from outages.
• Applications must have defined startup and shutdown procedures.
You need to be aware of what happens currently when the system your application is running on is rebooted, and whether changes need to be made in the application's response for high availability.
Insulate Users from Outages
Wherever possible, insulate your end users from outages.
To reduce the impact on users, the application should not simply abort in case of error, since aborting would cause an unneeded failover to a backup system. Applications should determine the exact error and take specific action to recover from the error rather than, for example, aborting upon receipt of any error.
is advisable to take certain actions to minimize the amount of data that will be lost, as explained in the following discussion.
Minimize the Use and Amount of Memory-Based Data
Any in-memory data (the in-memory context) will be lost when a failure occurs. The application should be designed to minimize the amount of in-memory data that exists unless this data can be easily recalculated.
A common example is a print job. Printer applications typically schedule jobs. When that job completes, the scheduler goes on to the next job.
the second system simply takes over the load of the first system. This eliminates the start up time of the application. There are many ways to design this sort of architecture, and there are also many issues with this sort of design. This discussion will not go into details other than to give a few examples. The simplest method is to have two applications running in a master/slave relationship where the slave is simply a hot standby application for the master.
Avoid Node-Specific Information Typically, when a new system is installed, an IP address must be assigned to each active network interface. This IP address is always associated with the node and is called a stationary IP address. The use of packages containing highly available applications adds the requirement for an additional set of IP addresses, which are assigned to the applications themselves. These are known as relocatable application IP addresses.
Avoid Using SPU IDs or MAC Addresses Design the application so that it does not rely on the SPU ID or MAC (link-level) addresses. The SPU ID is a unique hardware ID contained in non-volatile memory, which cannot be changed. A MAC address (also known as a LANIC id) is a link-specific address associated with the LAN hardware.
be avoided for the same reason. Also, the gethostbyaddr() call may return different answers over time if called with a stationary IP address. Instead, the application should always refer to the application name and relocatable IP address rather than the hostname and stationary IP address. It is appropriate for the application to call gethostbyname(2), specifying the application name rather than the hostname. gethostbyname(2) will pass in the IP address of the application.
Network applications can bind to a stationary IP address, a relocatable IP address, or INADDR_ANY. If the stationary IP address is specified, then the application may fail when restarted on another node, because the stationary IP address is not moved to the new system. If an application binds to the relocatable IP address, then the application will behave correctly when moved to another system. Many server-style applications will bind to INADDR_ANY, meaning that they will receive requests on any interface.
To prevent one node from inadvertently accessing disks being used by the application on another node, HA software uses an exclusive access mechanism to enforce access by only one node at a time. This exclusive access applies to a volume group as a whole.
Use Multiple Destinations for SNA Applications
SNA is point-to-point link-oriented; that is, the services cannot simply be moved to another system, since that system has a different point-to-point link which originates in the mainframe.
There are a number of strategies to use for client reconnection: • Design clients which continue to try to reconnect to their failed server. Put the work into the client application rather than relying on the user to reconnect. If the server is back up and running in 5 minutes, and the client is continually retrying, then after 5 minutes, the client application will reestablish the link with the server and either restart or continue the transaction. No intervention from the user is required.
with application problems. For instance, software bugs may cause an application to fail or system resource issues (such as low swap/memory space) may cause an application to die. This section deals with how to design your application to recover after these types of failures.
Create Applications to be Failure Tolerant
An application should be tolerant to failure of a single component. Many applications have multiple processes running on a single node.
Minimizing Planned Downtime Planned downtime (as opposed to unplanned downtime) is scheduled; examples include backups, systems upgrades to new operating system revisions, or hardware replacements. For planned downtime, application designers should consider: • Reducing the time needed for application upgrades/patches.
some of the application servers are at revision 4.0. The application must be designed to handle this type of situation. For more information about the rolling upgrades, see “Software Upgrades ” (page 407), and the Release Notes for your version of Serviceguard at http://docs.hp.com -> High Availability.
Do Not Change the Data Layout Between Releases
Migration of the data to a new format can be very time intensive. It also almost guarantees that rolling upgrade will not be possible.
C Integrating HA Applications with Serviceguard The following is a summary of the steps you should follow to integrate an application into the Serviceguard environment: 1. Read the rest of this book, including the chapters on cluster and package configuration, and the Appendix “Designing Highly Available Cluster Applications.” 2.
NOTE: Check the Serviceguard/SGeRAC/SMS/Serviceguard Manager Plug-in Compatibility and Feature Matrix and the latest Release Notes for your version of Serviceguard for up-to-date information about support for CFS (http:// www.docs.hp.com -> High Availability -> Serviceguard). Checklist for Integrating HA Applications This section contains a checklist for integrating HA applications in both single and multiple systems.
a. b. c. d. 3. 4. Create the cluster configuration. Create a package. Create the package script. Use the simple scripts you created in earlier steps as the customer defined functions in the package control script. Start the cluster and verify that applications run as planned. If you will be building an application that depends on a Veritas Cluster File System (CFS) and Cluster Volume Manager (CVM), then consider the following: a.
• Fail one of the systems. For example, turn off the power on node 1. Make sure the package starts up on node 2.
• Repeat failover from node2 back to node1.
2. Be sure to test all combinations of application load during the testing. Repeat the failover processes under different application states such as heavy user load versus no user load, batch jobs vs online transactions, etc.
3. Record timelines of the amount of time spent during the failover for each application state.
D Software Upgrades There are three types of upgrade you can do: • rolling upgrade • non-rolling upgrade • migration with cold install Each of these is discussed below. Special Considerations for Upgrade to Serviceguard A.11.19 Serviceguard A.11.19 introduces a new cluster manager. In a running cluster, this affects node timeout and failover and cluster re-formation; see the discussion of MEMBER_TIMEOUT under “Cluster Configuration Parameters ” (page 138) for more information.
and finished upgrading the node to the new cluster manager (Cluster Membership Protocol version 2), and then, once all nodes have been upgraded, you will see a message indicating that the new cluster has formed. Watch for three messages similar to the following (for clarity, intervening messages have been replaced with ellipses [...]): Nov 14 13:52:46 bbq1 cmcld[20319]: Starting to upgrade this node to Cluster Membership Protocol version 2 [....
Guidelines for Rolling Upgrade You can normally do a rolling upgrade if: • You are not upgrading the nodes to a new version of HP-UX; or • You are upgrading to a new version of HP-UX, but using the update process (update-ux), rather than a cold install. update-ux supports many, but not all, upgrade paths. For more information, see the HP-UX Installation and Update Guide for the target version of HP-UX.
• Performing tasks on a node containing an earlier revision of the software will not work or will cause inconsistent results.
• You cannot modify the hardware configuration—including the cluster’s network configuration—during rolling upgrade.
• You cannot modify the cluster or package configuration until the upgrade is complete.
3. Upgrade the node to the new HP-UX release, including Serviceguard. You can perform other software or hardware upgrades if you wish (such as installation of Veritas Volume Manager software), provided you do not detach any SCSI cabling. See the section on hardware maintenance in the “Troubleshooting” chapter. For instructions on upgrading HP-UX, see the HP-UX Installation and Update Guide for the target version of HP-UX: go to http://docs.hp.
in the cluster configuration file, and all have the role of Monitor. If you want to grant administration roles to non-root users, add more entries in the configuration file. “Controlling Access to the Cluster” (page 224) for more information about access control policies. Example of a Rolling Upgrade NOTE: Warning messages may appear during a rolling upgrade while the node is determining what version of software is running. This is a normal occurrence and not a cause for concern.
Figure D-2 Running Cluster with Packages Moved to Node 2

Step 2. Upgrade node 1 to the next operating system release (“HP-UX (new)”), and install the next version of Serviceguard (“SG (new)”).

Figure D-3 Node 1 Upgraded to new HP-UX version

Step 3. When upgrading is finished, enter the following command on node 1 to restart the cluster on node 1.
# cmrunnode node1
At this point, different versions of the Serviceguard daemon (cmcld) are running on the two nodes, as shown in Figure D-4.
Figure D-4 Node 1 Rejoining the Cluster

Step 4. Repeat the process on node 2. Halt the node, as follows:
# cmhaltnode -f node2
This causes both packages to move to node 1. Then upgrade node 2 to the new versions of HP-UX and Serviceguard.

Figure D-5 Running Cluster with Packages Moved to Node 1

Step 5. Move pkg2 back to its original node.
cmhaltpkg pkg2
cmrunpkg -n node2 pkg2
cmmodpkg -e pkg2
The cmmodpkg command re-enables switching of the package, which was disabled by the cmhaltpkg command. The final running cluster is shown in Figure D-6.
Performing a Non-Rolling Upgrade Limitations of Non-Rolling Upgrades CAUTION: Stricter limitations apply to an upgrade to A.11.19; do not proceed with an upgrade to A.11.19 until you have read and understood the Special Considerations for Upgrade to Serviceguard A.11.19 (page 407). The following limitations apply to non-rolling upgrades: • Binary configuration files may be incompatible between releases of Serviceguard. Do not manually copy configuration files between nodes.
NOTE: Data on shared disks, or on local disks in volumes that are not touched by the HP-UX installation process, will not normally be erased by the cold install; you can re-import this data after the cold install. If you intend to do this, you must do the following before you do the cold install:
• For LVM: create a map file for each LVM volume group and save it as part of your backup.
• For VxVM: deport disk groups (halting the package should do this).
E Blank Planning Worksheets

This appendix contains blank versions of the planning worksheets mentioned in Chapter 4, “Planning and Documenting an HA Cluster”. You can duplicate any of these worksheets that you find useful and fill them in as a part of the planning process.
Worksheet for Hardware Planning

HARDWARE WORKSHEET                                              Page ___ of ____
===============================================================================
Node Information:
    Host Name _____________________     Series No _____________________
    Memory Capacity ____________________     Number of I/O Slots ________________
===============================================================================
LAN Information:
    Name of Subnet _________   Name of IP Interface __________   Addr_____________
    Traffic Type ___________   Name o
Power Supply Worksheet

POWER SUPPLY WORKSHEET                                          Page ___ of ____
===============================================================================
SPU Power:
    Host Name _____________________     Power Supply _______________________
    Host Name _____________________     Power Supply _______________________
===============================================================================
Disk Power:
    Disk Unit __________________________     Power Supply _______________________
    Disk Unit __________________________     Power Supp
Quorum Server Worksheet

Quorum Server Data:
==============================================================================
QS Hostname: _____________   IP Address: _______________   IP Address: _______________
==============================================================================
Quorum Services are Provided for:

    Cluster Name: ___________________________________________________________

    Host Names ____________________________________________

    Host Names ____________________________________________

    Cluster
LVM Volume Group and Physical Volume Worksheet

PHYSICAL VOLUME WORKSHEET                                       Page ___ of ____
===============================================================================
Volume Group Name: ______________________________________________________

Physical Volume Name: _____________________________________________________

Physical Volume Name: _____________________________________________________

Physical Volume Name: _____________________________________________________

Physical Volume Name: _____________________________
VxVM Disk Group and Disk Worksheet

DISK GROUP WORKSHEET                                            Page ___ of ____
===========================================================================
Disk Group Name: __________________________________________________________

Physical Volume Name: ______________________________________________________

Physical Volume Name: ______________________________________________________

Physical Volume Name: ______________________________________________________

Physical Volume Name: _____________________________________
Cluster Configuration Worksheet
===============================================================================
Name and Nodes:
===============================================================================
Cluster Name: __________________________     RAC Version: _______________

Node Names: _________________________________________________________

Volume Groups (for packages): ________________________________________
===========================================================================
Subnets:
=========
Package Configuration Worksheet

Package Configuration File Data:
===============================================================
Package Name: __________________     Package Type: ______________

Primary Node: ____________________     First Failover Node: __________________

Additional Failover Nodes: __________________________________

Run Script Timeout: _____     Halt Script Timeout: _____________

Package AutoRun Enabled? ______     Local LAN Failover Allowed? _____

Node Failfast Enabled? ________

Failover Policy: _______________
F Migrating from LVM to VxVM Data Storage

This appendix describes how to migrate LVM volume groups to VxVM disk groups for use with the Veritas Volume Manager (VxVM), or with the Cluster Volume Manager (CVM) on systems that support it.
1. Halt the package that activates the volume group you wish to convert to VxVM:
   cmhaltpkg PackageName
2. Activate the LVM volume group in read-only mode:
   vgchange -a r VolumeGroupName
3. Back up the volume group’s data, using whatever means are most appropriate for the data contained on this volume group. For example, you might use a backup/restore utility such as Omniback, or you might use an HP-UX utility such as dd.
4. Back up the volume group configuration:
   vgcfgbackup
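Using a hypothetical package pkg1 and volume group /dev/vg01, the sequence above might look like this (the backup step itself depends on the tool you use):

cmhaltpkg pkg1
vgchange -a r /dev/vg01
(back up the data with your chosen utility)
vgcfgbackup /dev/vg01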
1. Rename the old package control script as follows:
   mv Package.ctl Package.ctl.bak
2. Create a new package control script with the same name as the old one:
   cmmakepkg -s Package.ctl
3. Edit the new script to include the names of the new VxVM disk groups and logical volumes.
   The new portions of the package control script that are needed for VxVM use are as follows:
   • The VXVM_DG[] array. This defines the VxVM disk groups that are used for this package.
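For example, a package that now uses two VxVM disk groups might contain entries such as the following; the disk group, volume, and mount point names are illustrative and follow the same array conventions shown in the excerpt later in this appendix:

VXVM_DG[0]="dg01"
VXVM_DG[1]="dg02"

LV[0]="/dev/vx/dsk/dg01/lvol101"
FS[0]="/mnt_dg0101"
FS_MOUNT_OPT[0]="-o rw"

Remember that the edited control script must be present on every node that is configured to run the package.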
8. Make the disk group visible to the other nodes in the cluster by issuing the following command on all other nodes:
   vxdctl enable
9. Restart the package.

Customizing Packages for CVM

NOTE: Check the Serviceguard/SGeRAC/SMS/Serviceguard Manager Plug-in Compatibility and Feature Matrix and the latest Release Notes for your version of Serviceguard for up-to-date information about support for CVM and CFS: http://www.docs.hp.com -> High Availability -> Serviceguard.
LV[3]="/dev/vx/dsk/dg02/lvol202"

FS[0]="/mnt_dg0101"
FS[1]="/mnt_dg0102"
FS[2]="/mnt_dg0201"
FS[3]="/mnt_dg0202"

FS_MOUNT_OPT[0]="-o ro"
FS_MOUNT_OPT[1]="-o rw"
FS_MOUNT_OPT[2]="-o ro"
FS_MOUNT_OPT[3]="-o rw"

4. Be sure to copy from the old script any user-specific code that may have been added, including environment variables and customer defined functions.
Removing LVM Volume Groups

After testing the new VxVM disk groups, remove any LVM volume groups that are no longer wanted from the system using the standard LVM commands lvremove, pvremove, and vgremove. At a convenient time, you should also edit the cluster ASCII configuration file to remove the VOLUME_GROUP statements that refer to the LVM volume groups that are no longer used in the cluster. These entries should be removed before the next time you re-apply the cluster configuration.
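As an illustrative cleanup for a migrated volume group vg01 (the volume, group, and disk device names are examples; check the lvremove, vgremove, and pvremove manpages for the exact preconditions, such as activation state and remaining physical volumes):

lvremove /dev/vg01/lvol1
vgremove /dev/vg01
pvremove /dev/rdsk/c1t2d0

Then delete the corresponding VOLUME_GROUP entry from the cluster ASCII configuration file before you next re-apply the cluster configuration.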
G IPv6 Network Support

This appendix describes some of the characteristics of IPv6 network addresses. Topics:
• IPv6 Address Types
• Network Configuration Restrictions
• Local Primary/Standby LAN Patterns
• IPv6 Relocatable Address and Duplicate Address Detection Feature (page 438)

IPv6 Address Types

Several types of IPv6 addressing schemes are specified in RFC 2373 (IPv6 Addressing Architecture). IPv6 addresses are 128-bit identifiers for interfaces and sets of interfaces.
can appear only once in an address, and it can be used to compress leading, trailing, or contiguous sixteen-bit groups of zeroes in an address. Example: fec0:1:0:0:0:0:0:1234 can be represented as fec0:1::1234.
• When dealing with a mixed environment of IPv4 and IPv6 nodes, an alternative form of IPv6 address can be used. It is x:x:x:x:x:x:d.d.d.d, where the ‘x’s are the hexadecimal values of the higher-order 96 bits of the address and the ‘d’s are the decimal values of the lower-order 32 bits.
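For example, the following are three ways of writing addresses under these conventions (the addresses themselves are arbitrary examples):

fec0:1:0:0:0:0:0:1234        full form
fec0:1::1234                 the same address using “::” compression
0:0:0:0:0:0:192.168.0.1      mixed IPv6/IPv4 notation (can also be written as ::192.168.0.1)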
IPv4 and IPv6 Compatibility

There are a number of techniques for using IPv4 addresses within the framework of IPv6 addressing.

IPv4 Compatible IPv6 Addresses

The IPv6 transition mechanisms use a technique for tunneling IPv6 packets over the existing IPv4 infrastructure. IPv6 nodes that support such mechanisms use a special kind of IPv6 address that carries an IPv4 address in its lower-order 32 bits. These addresses are called IPv4 Compatible IPv6 addresses.
TLA ID = Top-level Aggregation Identifier.
RES = Reserved for future use.
NLA ID = Next-Level Aggregation Identifier.
SLA ID = Site-Level Aggregation Identifier.
Interface ID = Interface Identifier.

Link-Local Addresses

Link-local addresses have the following format:

Table G-6
10 bits        54 bits    64 bits
1111111010     0          interface ID

Link-local addresses are used for addressing nodes on a single link. Packets originating from or destined to a link-local address will not be forwarded by a router.
A value of ‘2’ indicates that the scope is link-local; a value of ‘5’ indicates that the scope is site-local. The “group ID” field identifies the multicast group. Some frequently used multicast groups are the following:

All Node Addresses   = FF02:0:0:0:0:0:0:1 (link-local)
All Router Addresses = FF02:0:0:0:0:0:0:2 (link-local)
All Router Addresses = FF05:0:0:0:0:0:0:2 (site-local)

Network Configuration Restrictions

Serviceguard supports IPv6 for data and heartbeat IP.
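A minimal sketch of cluster configuration file entries for a node using IPv6, subject to the restrictions described in this section; the node name, interface names, and addresses are illustrative, and HEARTBEAT_IP and STATIONARY_IP are the parameters described in Chapter 4:

NODE_NAME node1
  NETWORK_INTERFACE lan1
  HEARTBEAT_IP 192.10.25.18
  NETWORK_INTERFACE lan2
  STATIONARY_IP fec0:0:0:1::18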
NOTE: Even though link-local IP addresses are not supported in the Serviceguard cluster configuration, the primary link-local address on the Serviceguard primary interface will be switched over to the standby during a local switch. This is because of two requirements: First, the dual-stack (IPv4/IPv6) kernel requires that the primary IP address associated with an interface must always be a link-local address.
Local Primary/Standby LAN Patterns

The use of IPv6 allows a number of different patterns of failover among LAN cards configured in the cluster. This is possible because each LAN card can support several IP addresses when a dual IPv4/IPv6 configuration is used. This section describes several ways in which local failover to a standby LAN can be configured.
Following the loss of lan0 or lan2, lan1 can adopt either address, as shown below. The same LAN card can be configured with both IPv4 and IPv6 addresses, as shown below.
This type of configuration allows failover of both addresses to the standby. This is shown below.
H Using Serviceguard Manager

HP Serviceguard Manager is a web-based HP System Management Homepage (HP SMH) tool that replaces the functionality of the earlier Serviceguard management tools. HP Serviceguard Manager allows you to monitor, administer, and configure a Serviceguard cluster from any system with a supported web browser. Serviceguard Manager does not require additional software installation.
— A user with HP SMH Administrator access has full cluster management capabilities.
— A user with HP SMH Operator access can monitor the cluster and has restricted cluster management capabilities as defined by the user’s Serviceguard role-based access configuration.
— A user with HP SMH User access does not have any cluster management capabilities.
See the online help topic About Security for more information.
• Have created the security “bootstrap” file cmclnodelist.
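For example, a bootstrap file might contain one line per host and user to be trusted before the cluster is configured. The entries below are hypothetical, and /etc/cmcluster/cmclnodelist is assumed as the location; see the chapter on building the cluster for the details that apply to your installation:

clusternode1.cup.hp.com   root
clusternode2.cup.hp.com   root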
1. Enter the standard URL “http://<hostname>:2301/”
   For example: http://clusternode1.cup.hp.com:2301/
2. When the System Management Homepage login screen appears, enter your login credentials and click Sign In.
   The System Management Homepage for the selected server appears.
3. From the Serviceguard Cluster box, click the name of the cluster.

NOTE: If a cluster is not yet configured, then you will not see the Serviceguard Cluster section on this screen.
Number   What is it?            Description
3        Tab bar                The default Tab bar allows you to view additional cluster-related information. The Tab bar displays different content when you click on a specific node or package.
4        Node information       Displays information about the Node status, alerts and general information.
5        Package information    Displays information about the Package status, alerts and general information.
Figure H-2 Cluster by Type

4. Expand HP Serviceguard, and click on a Serviceguard cluster.

NOTE: If you click on a cluster running an earlier Serviceguard release, the page will display a link that will launch Serviceguard Manager A.05.01 (if installed) via Java Webstart.
I Maximum and Minimum Values for Parameters

Table I-1 shows the range of possible values for cluster configuration parameters.

Table I-1 Minimum and Maximum Values of Cluster Configuration Parameters

Cluster Parameter    Minimum Value                               Maximum Value
Member Timeout       See MEMBER_TIMEOUT under “Cluster           14,000,000 microseconds; see MEMBER_TIMEOUT under
                     Configuration Parameters” in Chapter 4.     “Cluster Configuration Parameters” in Chapter 4.
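For example, MEMBER_TIMEOUT is specified in microseconds in the cluster configuration file, so a 14-second member timeout would be written as follows (the value shown is only an example; see Chapter 4 for the supported range and guidance on choosing a value):

MEMBER_TIMEOUT 14000000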
Index A Access Control Policies, 224 Access Control Policy, 158 Access roles, 158 active node, 31 adding a package to a running cluster, 351 adding cluster nodes advance planning, 191 adding nodes to a running cluster, 314 adding packages on a running cluster, 292 additional package resources monitoring, 80 addressing, SCSI, 127 administration adding nodes to a ruuning cluster, 314 cluster and package states, 298 halting a package, 317 halting the entire cluster, 315 moving a package, 318 of packages and se
configuring with commands, 217 redundancy of components, 37 Serviceguard, 30 typical configuration, 29 understanding components, 37 cluster administration, 312 solving problems, 375 cluster and package maintenance, 297 cluster configuration creating with SAM or Commands, 216 file on all nodes, 59 identifying cluster lock volume group, 218 identifying cluster-aware volume groups, 223 planning, 137 planning worksheet, 159 sample diagram, 125 verifying the cluster configuration, 230 cluster configuration file
defined, 155 Configuring clusters with Serviceguard command line, 217 configuring multi-node packages, 294 configuring packages and their services, 253 configuring system multi-node packages, 293 control script adding customer defined functions, 344 in package configuration, 342 pathname parameter in package configuration, 284 support for additional productss, 345 troubleshooting, 373 controlling the speed of application failover, 389 creating the package configuration, 338 Critical Resource Analysis (CRA)
redundant configuration, 40 Event Monitoring Service for disk monitoring, 45 in troubleshooting, 363, 364 event monitoring service using, 80 exclusive access relinquishing via TOC, 119 expanding the cluster planning ahead, 124 expansion planning for, 163 F failback policy used by package manager, 76 FAILBACK_POLICY parameter used by package manager, 76 failover controlling the speed in applications, 389 defined, 31 failover behavior in packages, 164 failover package, 68 failover policy used by package mana
parameter in cluster configuration, 145 HEARTBEAT_IP configuration requirements, 145 parameter in cluster configuration, 145 high availability, 29 HA cluster defined, 37 objectives in planning, 123 host IP address hardware planning, 126, 133 host name hardware planning, 125 HOSTNAME_ADDRESS_FAMILY defined, 139 how the cluster manager works, 59 how the network manager works, 90 HP Predictive monitoring in troubleshooting, 364 I I/O bus addresses hardware planning, 128 I/O slots hardware planning, 126, 128 I
planning, 133 setting up volume groups on another node, 210 LVM configuration worksheet, 134, 136 M MAC addresses, 394 managing the cluster and nodes, 312 manual cluster startup, 61 MAX_CONFIGURED_PACKAGES defined, 157 maximum number of nodes, 37 MEMBER_TIMEOUT and cluster re-formation, 118 and safety timer, 55 configuring, 152 defined, 151 maximum and minimum values , 152 modifying, 223 membership change reasons for, 62 memory capacity hardware planning, 126 memory requirements lockable memory for Service
primary, 31 NODE_FAIL_FAST_ENABLED effect of setting, 120 NODE_NAME cluster configuration parameter, 143 parameter in cluster manager configuration, 138 nodetypes primary, 31 NTP time protocol for clusters, 199 O olrad command removing a LAN or VLAN interface, 335 online hardware maintenance by means of in-line SCSI terminators, 368 OTS/9000 support, 449 outages insulating users from, 388 P package adding and deleting package IP addresses, 92 base modules, 257 basic concepts, 37 changes while the cluster
package configuration, 159 power, 130 quorum server, 132 SCSI addresses, 127 SPU information, 125 volume groups and physical volumes, 133 worksheets, 129 worksheets for physical volume planning, 423 planning and documenting an HA cluster, 123 planning for cluster expansion, 124 planning worksheets blanks, 419 point of failure in networking, 40 point to point connections to storage devices, 51 POLLING_TARGET defined, 156 ports dual and single aggregated, 105 power planning power sources, 130 worksheet, 131 p
rotating standby configuring with failover policies, 73 setting package policies, 73 RUN_SCRIPT parameter in package configuration, 284 RUN_SCRIPT_TIMEOUT (run script timeout) parameter in package configuration, 285 running cluster adding or removing packages, 292 S safety timer and node TOC, 55 and syslog.
system multi-node package, 68 used with CVM, 243 system multi-node package configuration, 293 system multi-node packages configuring, 293 T tasks in Serviceguard configuration, 35 testing cluster manager, 362 network manager, 362 package manager, 361 testing cluster operation, 361 time protocol (NTP) for clusters, 199 timeout node, 118 TOC and MEMBER_TIMEOUT, 118 and package availability, 119 and safety timer, 153 and the safety timer, 55 defined, 55 when a node fails, 118 toolkits for databases, 385 traff
quorum server configuration, 133 use in planning, 123 volume group and physical volumes, 134, 136 worksheets physical volume planning, 423 worksheets for planning blanks, 419