Using Serviceguard Extension for RAC Second Edition February 2005 Update Manufacturing Part Number : T1859-90017 February 2005 © Copyright 2005 Hewlett-Packard Development Company, L.P. All rights reserved.
Legal Notices The information in this document is subject to change without notice. Hewlett-Packard makes no warranty of any kind with regard to this manual, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. Hewlett-Packard shall not be held liable for errors contained herein or direct, indirect, special, incidental or consequential damages in connection with the furnishing, performance, or use of this material.
Java™ and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. Hewlett-Packard is independent of Sun Microsystems. MS-DOS® and Microsoft® are U.S. registered trademarks of Microsoft Corporation. Netscape ® is a registered trademark of Netscape Communications Corporation. Oracle ® is a registered trademark of Oracle Corporation. Oracle8 ™ is a trademark of Oracle Corporation.
Contents

1. Introduction to Serviceguard Extension for RAC
   What is a Serviceguard Extension for RAC Cluster?
   Group Membership
   Using Packages in a Cluster
   Serviceguard Extension for RAC Architecture

2. Serviceguard Configuration for Oracle RAC
   Creating Volumes
   Oracle Demo Database Files
   Adding Disk Groups to the Cluster Configuration
   Using Packages to Configure Startup and Shutdown of RAC Instances
   Starting Oracle Instances

3. Maintenance and Troubleshooting
   On-Line Replacement
   After Replacing the Card
   Monitoring RAC Instances

A. Blank Planning Worksheets
   LVM Volume Group and Physical Volume Worksheet
Printing History

Table 1

Printing Date     Part Number     Edition
June 2003         T1859-90006     First Edition
                                  Print, CD-ROM (Instant Information), and Web (http://www.docs.hp.com/)
June 2004         T1859-90017     Second Edition
                                  Print, CD-ROM (Instant Information), and Web (http://www.docs.hp.com/)
February 2005     T1859-90017     Second Edition February 2005 Update
                                  Web (http://www.docs.hp.com/)

The last printing date and part number indicate the current edition, which applies to the 11.14.03, 11.15 and 11.
Preface

This guide describes how to use the Serviceguard Extension for RAC (Oracle Real Application Cluster) to configure Serviceguard clusters for use with Oracle Real Application Cluster software on HP 9000 High Availability clusters running the HP-UX operating system. The contents are as follows:

• Chapter 1, “Introduction,” describes a Serviceguard cluster and provides a roadmap for using this guide.

Related Publications
Before attempting to use VxVM storage with Serviceguard, please refer to the following: • VERITAS Volume Manager Administrator’s Guide. This contains a glossary of VERITAS terminology. • VERITAS Volume Manager Storage Administrator Administrator’s Guide • VERITAS Volume Manager Reference Guide • VERITAS Volume Manager Migration Guide • VERITAS Volume Manager for HP-UX Release Notes Use the following URL to access HP’s high availability web page: • http://www.hp.
UserInput Commands and other text that you type. Command A command name or qualified command phrase. Variable The name of a variable that you may replace in a command or function or information in a display that represents several possible values. [ ] The contents are optional in formats and command descriptions. If the contents are a list separated by |, you must choose one of the items. { } The contents are required in formats and command descriptions.
Introduction to Serviceguard Extension for RAC 1 Introduction to Serviceguard Extension for RAC Serviceguard Extension for RAC (SGeRAC) enables the Oracle Real Application Cluster (RAC), formerly known as Oracle Parallel Server RDBMS, to run on HP 9000 high availability clusters under the HP-UX operating system. This chapter introduces Serviceguard Extension for RAC and shows where to find different kinds of information in this book.
Introduction to Serviceguard Extension for RAC What is a Serviceguard Extension for RAC Cluster? What is a Serviceguard Extension for RAC Cluster? A high availability cluster is a grouping of HP 9000 series 800 servers having sufficient redundancy of software and hardware components that a single point of failure will not disrupt the availability of computer services. High availability clusters configured with Oracle Real Application Cluster software are known as RAC clusters.
Introduction to Serviceguard Extension for RAC What is a Serviceguard Extension for RAC Cluster? RAC on HP-UX lets you maintain a single database image that is accessed by the HP 9000 servers in parallel, thereby gaining added processing power without the need to administer separate databases. Further, when properly configured, Serviceguard Extension for RAC provides a highly available database that continues to operate even if one hardware component should fail. Group Membership Oracle RAC 8.1.
Introduction to Serviceguard Extension for RAC What is a Serviceguard Extension for RAC Cluster? Figure 1-2 Group Membership Services Using Packages in a Cluster In order to make other important applications highly available (in addition to the Oracle Real Application Cluster), you can configure your RAC cluster to use packages.
NOTE: In RAC clusters, you create packages to start and stop RAC itself as well as to run applications that access the database instances. For details on the use of packages with RAC, refer to the chapter “Configuring Packages and Their Services.”
Introduction to Serviceguard Extension for RAC Serviceguard Extension for RAC Architecture Serviceguard Extension for RAC Architecture This chapter discusses the main software components used by Serviceguard Extension for RAC in some detail.
How Serviceguard Works with Oracle Real Application Clusters

Serviceguard provides the cluster framework for Oracle Real Application Clusters, a relational database configuration in which multiple database instances run on different cluster nodes. A central component of Real Application Clusters is the distributed lock manager (DLM), which provides parallel cache management for database instances.
Introduction to Serviceguard Extension for RAC Configuring Packages for Oracle RAC Instances Configuring Packages for Oracle RAC Instances Oracle instances can be configured as packages with a single node in their node list. Package configuration is described in Chapter 2. NOTE Packages that start and halt Oracle instances (called instance packages) do not fail over from one node to another; they are single-node packages. You should include only one node name in the package ASCII configuration file.
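A minimal sketch of such an instance package configuration file follows. The package name, node name, and script paths shown here are examples only, and the exact parameter set depends on your Serviceguard version; treat this as an illustration rather than a complete file.

PACKAGE_NAME       ops_pkg1
NODE_NAME          ftsys9        # only one node: instance packages do not fail over
AUTO_RUN           YES
RUN_SCRIPT         /etc/cmcluster/ops_pkg1/ops_pkg1.cntl
HALT_SCRIPT        /etc/cmcluster/ops_pkg1/ops_pkg1.cntl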
Introduction to Serviceguard Extension for RAC Configuring Packages for Oracle RAC Instances For example, on a two node cluster with one database, each node can have one RAC instance and one listener package. Oracle clients can be configured to connect to either package IP address (or corresponding hostname) using Oracle Net Services. When a node failure occurs, existing client connection to the package IP address will be reset after the listener package fails over and adds the package IP address.
Introduction to Serviceguard Extension for RAC Node Failure Node Failure RAC cluster configuration is designed so that in the event of a node failure, another node with a separate instance of Oracle can continue processing transactions. Figure 1-3 shows a typical cluster with instances running on both nodes. Figure 1-3 Before Node Failure Figure 1-4 shows the condition where Node 1 has failed and Package 1 has been transferred to Node 2.
Introduction to Serviceguard Extension for RAC Node Failure and is now running on Node 2. Also note that Node 2 can now access both Package 1’s disk and Package 2’s disk. Oracle instance 2 now handles all database access, since instance 1 has gone down. Figure 1-4 After Node Failure In the above figure, pkg1 and pkg2 are not instance packages. They are shown to illustrate the movement of packages in general.
Larger Clusters

Serviceguard Extension for RAC supports clusters of up to 16 nodes. The actual cluster size is limited by the type of storage and the type of volume manager used.

Up to Four Nodes with SCSI Storage

You can configure up to four nodes using a shared F/W SCSI bus; for more than four nodes, FibreChannel must be used. An example of a four-node RAC cluster appears in the following figure.
Introduction to Serviceguard Extension for RAC Larger Clusters The figure shows a dual Ethernet configuration with all four nodes connected to a disk array (the details of the connections depend on the type of disk array). In addition, each node has a mirrored root disk (R and R'). Nodes may have multiple connections to the same array using alternate links (PV links) to take advantage of the array's use of RAID levels for data protection.
Introduction to Serviceguard Extension for RAC Larger Clusters Figure 1-6 Eight-Node Cluster with XP or EMC Disk Array FibreChannel switched configurations also are supported using either an arbitrated loop or fabric login topology. For additional information about supported cluster configurations, refer to the HP 9000 Servers Configuration Guide, available through your HP representative.
Introduction to Serviceguard Extension for RAC Extended Distance Cluster Using Serviceguard Extension for RAC Extended Distance Cluster Using Serviceguard Extension for RAC Simple Serviceguard clusters are usually configured in a single data center, often in a single room, to provide protection against failures in CPUs, interface cards, and software.
Serviceguard Configuration for Oracle RAC 2 Serviceguard Configuration for Oracle RAC This chapter shows the additional planning and configuration that is needed to use Oracle Real Application Clusters with Serviceguard.
Serviceguard Configuration for Oracle RAC Planning Database Storage Planning Database Storage The files needed by the Oracle database must be placed on physical volumes that are accessible to all RAC cluster nodes. This section shows how to plan the volumes using either SLVM or VERITAS CVM storage groups. Volume Planning with SLVM Storage capacity for the Oracle database must be provided in the form of logical volumes located in shared volume groups.
ORACLE LOGICAL VOLUME WORKSHEET FOR LVM                         Page ___ of ____
===============================================================================
                        RAW LOGICAL VOLUME NAME                 SIZE (MB)
Oracle Control File 1:  /dev/vg_ops/ropsctl1.ctl                100
Oracle Control File 2:  /dev/vg_ops/ropsctl2.ctl                100
Oracle Control File 3:  /dev/vg_ops/ropsctl3.ctl                100
Instance 1 Redo Log 1:  /dev/vg_ops/rops1log1.
Volume Planning with CVM

Storage capacity for the Oracle database must be provided in the form of volumes located in shared disk groups. The Oracle software requires at least two redo log files (and one undo tablespace for Oracle9) for each Oracle instance, several Oracle control files, and data files for the database itself.
ORACLE LOGICAL VOLUME WORKSHEET FOR CVM                         Page ___ of ____
===============================================================================
                        RAW LOGICAL VOLUME NAME                 SIZE (MB)
Oracle Control File 1:  /dev/vx/rdsk/ops_dg/opsctl1.ctl         100
Oracle Control File 2:  /dev/vx/rdsk/ops_dg/opsctl2.ctl         100
Oracle Control File 3:  /dev/vx/rdsk/ops_dg/opsctl3.
Serviceguard Configuration for Oracle RAC Installing Serviceguard Extension for RAC Installing Serviceguard Extension for RAC Installing Serviceguard Extension for RAC includes updating the software and rebuilding the kernel to support high availability cluster operation for Oracle Real Application Clusters.
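The installation itself is performed with swinstall from an SD-UX depot. The depot path and bundle name below are placeholders only; substitute the ones that apply to your SGeRAC version and operating system release.

# swinstall -s <depot_path> <SGeRAC_bundle_name>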
Serviceguard Configuration for Oracle RAC Configuration File Parameters Configuration File Parameters You need to code specific entries for all the storage groups that you want to use in an Oracle RAC configuration. If you are using LVM, the OPS_VOLUME_GROUP parameter is included in the cluster ASCII file. If you are using VERITAS CVM, the STORAGE_GROUP parameter is included in the package ASCII file.
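For example, using the storage names that appear later in this chapter, the entries take the following form.

In the cluster configuration ASCII file (SLVM):

OPS_VOLUME_GROUP    /dev/vg_ops

In the package configuration ASCII file (CVM):

STORAGE_GROUP       ops_dg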
Serviceguard Configuration for Oracle RAC Creating a Storage Infrastructure with LVM Creating a Storage Infrastructure with LVM In addition to configuring the cluster, you create the appropriate logical volume infrastructure to provide access to data from different nodes. This is done with Logical Volume Manager (LVM), VERITAS Cluster Volume Manager (CVM), or VERITAS Volume Manager (VxVM).
Serviceguard Configuration for Oracle RAC Creating a Storage Infrastructure with LVM Creating Volume Groups and Logical Volumes If your volume groups have not been set up, use the procedure in the next sections. If you have already done LVM configuration, skip ahead to the section “Installing Oracle Real Application Clusters.” Selecting Disks for the Volume Group Obtain a list of the disks on both nodes and identify which device files are used for the same disk on both.
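The individual steps of this procedure are not reproduced in this excerpt. A minimal sketch, with example device files and an example minor number, shows listing the disks and then creating the volume group directory and control file that the following text refers to:

# ioscan -fnC disk                # list the disks and their device files on each node
# lssf /dev/dsk/*                 # confirm which device files map to the same hardware path

# mkdir /dev/vg_ops
# mknod /dev/vg_ops/group c 64 0x060000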
Serviceguard Configuration for Oracle RAC Creating a Storage Infrastructure with LVM where hh must be unique to the volume group you are creating. Use the next hexadecimal number that is available on your system, after the volume groups that are already configured. Use the following command to display a list of existing volume groups: # ls -l /dev/*/group 3.
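The steps that follow in the full procedure are not included here; a typical continuation, using example device file names, defines the disk to LVM and creates the shared volume group:

# pvcreate -f /dev/rdsk/c1t2d0            # define the disk to LVM as a physical volume
# vgcreate /dev/vg_ops /dev/dsk/c1t2d0    # create the volume group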
Serviceguard Configuration for Oracle RAC Creating a Storage Infrastructure with LVM PVG-strict, that is, it occurs between different physical volume groups; the -n redo1.log option lets you specify the name of the logical volume; and the -L 4 option allocates 4 megabytes. NOTE It is important to use the -M n and -c y options for both redo logs and control files. These options allow the redo log files to be resynchronized by SLVM following a system crash before Oracle recovery proceeds.
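The command being described is not shown in this excerpt. A sketch consistent with the options discussed above, using example logical volume and volume group names, is:

# lvcreate -m 1 -M n -c y -s g -n redo1.log -L 4 /dev/vg_ops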
Serviceguard Configuration for Oracle RAC Creating a Storage Infrastructure with LVM If Oracle performs resilvering of RAC data files that are mirrored logical volumes, choose a mirror consistency policy of “NONE” by disabling both mirror write caching and mirror consistency recovery. With a mirror consistency policy of “NONE”, SLVM does not perform the resynchronization.
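The corresponding command is not shown here; a sketch for one of the demo database data files (the logical volume name is an example) is:

# lvchange -M n -c n /dev/vg_ops/opsdata1    # disable mirror write caching and consistency recovery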
the array. If you are using SAM, choose the type of disk array you wish to configure, and follow the menus to define alternate links. If you are using LVM commands, specify the links on the command line. The following example shows how to configure alternate links using LVM commands. The following disk configuration is assumed:

8/0.15.0
8/0.15.1
8/0.15.2
8/0.15.3
8/0.15.4
8/0.15.
Serviceguard Configuration for Oracle RAC Creating a Storage Infrastructure with LVM # ls -l /dev/*/group 3. Use the pvcreate command on one of the device files associated with the LUN to define the LUN to LVM as a physical volume. # pvcreate -f /dev/rdsk/c0t15d0 It is only necessary to do this with one of the device file names for the LUN. The -f option is only necessary if the physical volume was previously used in some other volume group. 4.
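The remainder of the procedure is not included in this excerpt. A sketch of the usual next step, assuming the group file has already been created as described earlier and using example device files that address the same LUN through two different controllers, is:

# vgcreate /dev/vg_ops /dev/dsk/c0t15d0    # create the volume group on the primary path
# vgextend /dev/vg_ops /dev/dsk/c1t15d0    # add the second path as an alternate (PV) link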
Oracle Demo Database Files

The following set of files is required for the Oracle demo database which you can create during the installation process.

Table 2-1  Required Oracle File Names for Demo Database

Logical Volume Name   LV Size (MB)   Raw Logical Volume Path Name   Oracle File Size (MB)*
opsctl1.ctl           108            /dev/vg_ops/ropsctl1.ctl       100
opsctl2.ctl           108            /dev/vg_ops/ropsctl2.ctl       100
opsctl3.
Table 2-1  Required Oracle File Names for Demo Database (Continued)

Logical Volume Name   LV Size (MB)   Raw Logical Volume Path Name   Oracle File Size (MB)*
ops3log3              28             /dev/vg_ops/rops2log3.log      20
opsdata1              208            /dev/vg_ops/ropsdata1.dbf      200
opsdata2              208            /dev/vg_ops/ropsdata2.dbf      200
opsdata3              208            /dev/vg_ops/ropsdata3.
Serviceguard Configuration for Oracle RAC Creating a Storage Infrastructure with LVM Exporting the Logical Volume Infrastructure Before the Oracle volume groups can be shared, their configuration data must be exported to other nodes in the cluster. This is done either in Serviceguard Manager or by using HP-UX commands, as shown in the following sections. Exporting with Serviceguard Manager In Serviceguard Manager, choose Disks and File Systems, then choose Volume Groups.
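The equivalent LVM command sequence, run on the configuration node (ftsys9), is not included in this excerpt. A typical sketch (the map file name is an example) deactivates the volume group, previews the export while writing a map file, and copies the map file to the other nodes:

# vgchange -a n /dev/vg_ops                        # deactivate the volume group first
# vgexport -p -s -m /tmp/vg_ops.map /dev/vg_ops    # preview export and write a map file with the VGID
# rcp /tmp/vg_ops.map ftsys10:/tmp/vg_ops.map      # copy the map file to the other node(s)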
Serviceguard Configuration for Oracle RAC Creating a Storage Infrastructure with LVM 3. On ftsys10 (and other nodes, as necessary), create the volume group directory and the control file named group: # mkdir /dev/vg_ops # mknod /dev/vg_ops/group c 64 0xhh0000 For the group file, the major number is always 64, and the hexadecimal minor number has the form 0xhh0000 where hh must be unique to the volume group you are creating. If possible, use the same number as on ftsys9.
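The import step itself is not shown in this excerpt; a sketch, assuming the map file copied earlier, is:

# vgimport -s -m /tmp/vg_ops.map /dev/vg_ops    # import the volume group using the VGID in the map file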
Installing Oracle Real Application Clusters

NOTE: Some versions of Oracle RAC require the installation of additional software. Refer to the documentation for your version of Oracle for specific requirements.

Before installing the Oracle Real Application Cluster software, make sure the cluster is running.
Serviceguard Configuration for Oracle RAC Cluster Configuration ASCII File Cluster Configuration ASCII File The following is an example of an ASCII configuration file generated with the cmquerycl command using the -w full option on a system with Serviceguard Extension for RAC. The OPS_VOLUME_GROUP parameters appear at the end of the file.
# Primary Network Interfaces on Bridged Net 2: lan3.
#   Possible standby Network Interfaces on Bridged Net 2: lan4.
# Primary Network Interfaces on Bridged Net 3: lan1.
#   Warning: There are no standby network interfaces on bridged net 3.

# Cluster Timing Parameters (microseconds).

# The NODE_TIMEOUT parameter defaults to 2000000 (2 seconds).
# This default setting yields the fastest cluster reformations.
# is also still supported for compatibility with earlier
# versions.) For example:
# OPS_VOLUME_GROUP /dev/vg_ops
# OPS_VOLUME_GROUP /dev/vg02
Serviceguard Configuration for Oracle RAC Creating a Storage Infrastructure with CVM Creating a Storage Infrastructure with CVM In addition to configuring the cluster, you create the appropriate logical volume infrastructure to provide access to data from different nodes. This is done with Logical Volume Manager (LVM), VERITAS Volume Manager (VxVM), or VERITAS Cluster Volume Manager (CVM).
Serviceguard Configuration for Oracle RAC Creating a Storage Infrastructure with CVM Preparing the Cluster for Use with CVM In order to use the VERITAS Cluster Volume Manager (CVM), you need a cluster that is running with a special CVM package. This means that the cluster must already be configured and running before you create disk groups. NOTE Cluster configuration is described in the previous section.
Serviceguard Configuration for Oracle RAC Creating a Storage Infrastructure with CVM Starting the Cluster and Identifying the Master Node Run the cluster, which will activate the special CVM package: # cmruncl When CVM starts up, it selects a master node, and this is the node from which you must issue the disk group configuration commands. To determine the master node, issue the following command from each node in the cluster: # vxdctl -c mode One node will identify itself as the master.
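A sketch of creating a shared disk group from the master node follows; the disk group name matches the examples in this chapter, but the disk name is only an example, and you should verify the exact options against your VxVM documentation. The result can then be checked with vxdg list, as shown below.

# vxdg -s init ops_dg c0t3d2    # create a shared (CVM) disk group on the master node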
# vxdg list
NAME         STATE                 ID
rootdg       enabled               971995699.1025.node1
ops_dg       enabled,shared        972078742.1084.node2

Creating Volumes

Use the vxassist command to create logical volumes. The following is an example:

# vxassist -g ops_dg make log_files 1024m

This command creates a 1024 MB volume named log_files in a disk group named ops_dg.
Serviceguard Configuration for Oracle RAC Creating a Storage Infrastructure with CVM Mirror Detachment Policies with CVM The required CVM disk mirror detachment policy is ‘global’, which means that as soon as one node cannot see a specific mirror copy (plex), all nodes cannot see it as well. The alternate policy is ‘local’, which means that if one node cannot see a specific mirror copy, then CVM will deactivate access to the volume for that node only.
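The command for setting the policy is not shown in this excerpt. Based on the VxVM administration commands, a hedged sketch (verify the syntax against your VxVM release) is:

# vxedit set diskdetpolicy=global ops_dg    # set the required global disk detach policy for the disk group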
Table 2-2  Required Oracle File Names for Demo Database (Continued)

Volume Name     Size (MB)   Raw Device File Name                Oracle File Size (MB)
temp.dbf        108         /dev/vx/rdsk/ops_dg/temp.dbf        100
users.dbf       128         /dev/vx/rdsk/ops_dg/users.dbf       120
tools.dbf       24          /dev/vx/rdsk/ops_dg/tools.dbf       15
opsdata1.dbf    208         /dev/vx/rdsk/ops_dg/opsdata1.dbf    200
opsdata2.dbf    208         /dev/vx/rdsk/ops_dg/opsdata2.dbf    200
opsdata3.
# ln -s /dev/vx/rdsk/ops_dg/opsctl1.ctl \
  /u01/ORACLE/db001/ctrl01_1.ctl

Example, Oracle9:

1. Create an ASCII file, and define the path for each database object.

   # control1= /dev/vg_ops/ropsctl1.ctl \
     /u01/ORACLE/db001/ctrl01_1.ctl

2. Set the following environment variable, where filename is the name of the ASCII file created.
Serviceguard Configuration for Oracle RAC Using Packages to Configure Startup and Shutdown of RAC Instances Using Packages to Configure Startup and Shutdown of RAC Instances To automate the startup and shutdown of RAC instances on the nodes of the cluster, you can create packages which activate the appropriate volume groups and then run RAC.
1. Shut down the Oracle applications, if any.
2. Shut down Oracle.
3. Deactivate the database volume groups or disk groups.
4. Shut down the cluster (cmhaltnode or cmhaltcl).

If the shutdown sequence described above is not followed, cmhaltcl or cmhaltnode may fail with a message that GMS clients (RAC 9i) are active or that shared volume groups are active.
Serviceguard Configuration for Oracle RAC Using Packages to Configure Startup and Shutdown of RAC Instances If you are using CVM disk groups for the RAC database, be sure to include the name of each disk group on a separate STORAGE_GROUP line in the configuration file. Configuring Packages that Access the Oracle RAC Database You can also use packages to start up applications that access the RAC instances.
Using Serviceguard Manager to Write the Package Control Script

As you complete the tabs for the configuration, the control script can be generated automatically. When asked to supply the pathname of the package run and halt scripts, use the filenames from the ECM toolkit. For more information, use the Help key.
NOTE: If you are using CVM, enter the names of disk groups to be activated using the CVM_DG[] array parameters, and select the appropriate storage activation command, CVM_ACTIVATION_CMD. Do not use the VG[] or VXVM_DG[] parameters for CVM disk groups.
Serviceguard Configuration for Oracle RAC Using Packages to Configure Startup and Shutdown of RAC Instances To avoid problems in the execution of control scripts, ensure that each run command is the name of an actual service and that its process remains alive until the actual service stops. If you need to define a set of run and halt operations in addition to the defaults, create functions for them in the sections under the heading CUSTOMER DEFINED FUNCTIONS.
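As an illustration only, the customer defined functions might call the ECMT toolkit script in the following way. The path is based on the example names shown later in this chapter, and the halt argument is an assumption; check the toolkit script itself for the supported actions.

function customer_defined_run_cmds
{
    # Start the RAC instance through the ECMT-generated script (example path)
    /etc/cmcluster/ORACLE_TEST0/ORACLE_TEST0.sh start
}

function customer_defined_halt_cmds
{
    # Halt the RAC instance; the "stop" argument is an assumption --
    # verify the action name in your toolkit script
    /etc/cmcluster/ORACLE_TEST0/ORACLE_TEST0.sh stop
}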
If your disks are mirrored with LVM mirroring on separate physical paths and you want to override quorum, use the following setting:

VGCHANGE="vgchange -a s -q n"

Enter the names of the CVM disk groups you wish to activate in shared mode in the CVM_DG[] array. Use a different array element for each RAC disk group.
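For reference, a fragment of the storage definitions in the control script might look like the following. The volume group and disk group names are examples, and the CVM activation mode shown is an assumption; pick the command your control script template provides for shared write access.

VGCHANGE="vgchange -a s"                                  # activate SLVM volume groups in shared mode
VG[0]="vg_ops"                                            # SLVM volume group used by the RAC database

# or, for CVM disk groups:
CVM_ACTIVATION_CMD="vxdg -g \$DiskGroup set activation=sw"  # shared-write activation (verify against your template)
CVM_DG[0]="ops_dg"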
Information for Creating the Oracle RAC Instance Package on an SGeRAC Node

Use the following steps to set up the pre-package configuration on an SGeRAC node:

1. Gather the RAC instance SID_NAME. If you are using Serviceguard Manager, this is in the cluster Properties.

   Example: SID_NAME=ORACLE_TEST0

   For an Oracle RAC instance on a two-node cluster, each node would have its own SID_NAME.

2.
Example:

SERVICE_NAME[0]=ORACLE_TEST0
SERVICE_CMD[0]="/etc/cmcluster/ORACLE_TEST0/ORACLE_TEST0.sh monitor"
SERVICE_RESTART[0]="3"

7. Gather how to start the database using an ECMT script. In Serviceguard Manager, enter this filename for the control script start command.

   /etc/cmcluster/${SID_NAME}/${SID_NAME}.sh start

   Example: /etc/cmcluster/ORACLE_TEST0/ORACLE_TEST0.sh start

8.
Figure 2-1  Serviceguard Manager display for a RAC Instance package

2. Create the package.

3. Select the Parameters tab and select the parameters to edit. Next select the check box “Enable template(x)” to enable the Package Template for Oracle RAC. The template defaults can be reset with the “Reset template defaults” push button.
7. Select the Control Script tab and configure its parameters. Configure volume groups and customer defined run/halt functions.

8. Apply the package configuration after filling in the specified parameters.
3 Maintenance and Troubleshooting

This chapter includes information about carrying out routine maintenance on a Real Application Cluster configuration. As presented here, these tasks differ in some details from the similar tasks described in Chapter 7 of Managing Serviceguard.
Maintenance and Troubleshooting Reviewing Cluster and Package States with the cmviewcl Command Reviewing Cluster and Package States with the cmviewcl Command A cluster or its component nodes may be in several different states at different points in time. Status information for clusters, packages and other cluster elements is shown in the output of the cmviewcl command and in some displays in Serviceguard Manager.
Maintenance and Troubleshooting Reviewing Cluster and Package States with the cmviewcl Command Cluster Status The status of a cluster may be one of the following: • Up. At least one node has a running cluster daemon, and reconfiguration is not taking place. • Down. No cluster daemons are running on any cluster node. • Starting. The cluster is in the process of determining its active membership. At least one cluster daemon is running. • Unknown.
• Up. The package control script is active.
• Down. The package control script is not active.
• Unknown.

The state of the package can be one of the following:

• Starting. The start instructions in the control script are being run.
• Running. Services are active and being monitored.
• Halting. The halt instructions in the control script are being run.
GROUP        MEMBER    PID      MEMBER_NODE
DAALL_DB     0         10396    comanche
             1         10501    chinook
IGOPALL      2         10423    comanche
             1         10528    chinook

where the cmviewcl output values are:

GROUP          the name of a configured group
MEMBER         the ID number of a member of a group
PID            the process ID of the group member
MEMBER_NODE    the node on which the group member is running

Service Status

Services have only status, as follows:

• Up. The service is being monitored.
• Down.
Maintenance and Troubleshooting Reviewing Cluster and Package States with the cmviewcl Command • Recovering. A corrupt message was received on the serial line, and the line is in the process of resynchronizing. • Unknown. We cannot determine whether the serial line is up or down. This can happen when the remote node is down. Failover and Failback Policies Packages can be configured with one of two values for the FAILOVER_POLICY parameter: • CONFIGURED_NODE.
   Network_Parameters:
   INTERFACE    STATUS    PATH       NAME
   PRIMARY      up        56/36.1    lan0
   STANDBY      up        60/6       lan1

    PACKAGE      STATUS    STATE      AUTO_RUN    NODE
    ops_pkg1     up        running    disabled    ftsys9

      Policy_Parameters:
      POLICY_NAME       CONFIGURED_VALUE
      Start             configured_node
      Failback          manual

      Node_Switching_Parameters:
      NODE_TYPE    STATUS    SWITCHING    NAME
      Primary      up        enabled      ftsys9    (current)

  NODE         STATUS    STATE
  ftsys10      up        running

   Network_Parameters:
   INTERFACE    STATUS    PATH       NAME
   PRIMARY      up        28.
   STANDBY      up
CLUSTER      STATUS
example      up

  NODE         STATUS    STATE
  ftsys9       up        running

  Quorum Server Status:
  NAME         STATUS
  lp-qs        up

...
NODE         STATUS
ftsys8       down

NODE         STATUS
ftsys9       up

  Script_Parameters:
  ITEM         STATUS    NAME
  Service      up        VxVM-CVM-pkg.
      Node_Switching_Parameters:
      NODE_TYPE    STATUS    SWITCHING    NAME
      Primary      up        enabled      ftsys9    (current)
      Alternate    up        enabled      ftsys10

    PACKAGE      STATUS    STATE      AUTO_RUN    NODE
    pkg2         up        running    disabled    ftsys9

      Policy_Parameters:
      POLICY_NAME       CONFIGURED_VALUE
      Failover          min_package_node
      Failback          manual

      Script_Parameters:
      ITEM         STATUS    NAME            MAX_RESTARTS
      Service      up        service2.1      0
      Subnet       up        15.13.168.
    PACKAGE      STATUS    STATE      AUTO_RUN    NODE
    pkg1         up        running    enabled     ftsys9
    pkg2         up        running    enabled     ftsys9

  NODE         STATUS    STATE
  ftsys10      up        running

Both packages are now running on ftsys9 and pkg2 is enabled for switching. Ftsys10 is running the daemon and no packages are running on ftsys10.
   Network_Parameters:
   INTERFACE    STATUS    PATH       NAME
   PRIMARY      up        56/36.1    lan0

   Serial_Heartbeat:
   DEVICE_FILE_NAME    STATUS    CONNECTED_TO:
   /dev/tty0p0         up        ftsys10 /dev/tty0p0

  NODE         STATUS    STATE
  ftsys10      up        running

   Network_Parameters:
   INTERFACE    STATUS    PATH
   PRIMARY      up        28.

   Serial_Heartbeat:
   DEVICE_FILE_NAME
   /dev/tty0p0
Maintenance and Troubleshooting Reviewing Cluster and Package States with the cmviewcl Command Viewing Data on Unowned Packages The following example shows packages that are currently unowned, that is, not running on any configured node. Information on monitored resources is provided for each node on which the package can run; this allows you to identify the cause of a failure and decide where to start the package up again.
Managing the Shared Storage

Making LVM Volume Groups Shareable

Normally, volume groups are marked to be activated in shared mode when they are listed with the OPS_VOLUME_GROUP parameter in the cluster configuration file or in Serviceguard Manager; the marking takes effect when the configuration is applied. However, in some cases you may want to manually make a volume group shareable.
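The commands themselves are not included in this excerpt. A sketch, using the volume group from the earlier examples and assuming the cluster is running, is:

# vgchange -S y -c y /dev/vg_ops    # mark the volume group shareable and cluster-aware

To reverse the marking:

# vgchange -S n -c n /dev/vg_ops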
Maintenance and Troubleshooting Managing the Shared Storage The above example marks the volume group as non-shared and not associated with a cluster. Activating an LVM Volume Group in Shared Mode Activation and deactivation of shared volume groups is normally done through a control script. If you need to perform activation from the command line, you can issue the following command from each node to activate the volume group in shared mode.
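The activation command itself is not shown in this excerpt; a sketch, with the volume group name as an example, is:

# vgchange -a s /dev/vg_ops    # activate the volume group in shared mode on this node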
Maintenance and Troubleshooting Managing the Shared Storage Making Changes to Shared Volume Groups You may need to change the volume group configuration of RAC shared logical volumes to add capacity to the data files or to add log files. No configuration changes are allowed on shared LVM volume groups while they are activated. The volume group must be deactivated first on all nodes, and marked as non-shareable.
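A sketch of that preparation, with the volume group name as an example, is:

# vgchange -a n /dev/vg_ops         # deactivate the volume group on every node
# vgchange -S n -c n /dev/vg_ops    # then mark it non-shareable on one node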
    Make a copy of /etc/lvmpvg in /tmp/lvmpvg, then copy the file to /tmp/lvmpvg on node 2. Copy the file /tmp/vg_ops.map to node 2.

10. Use the following command to make the volume group shareable by the entire cluster again:

    # vgchange -S y -c y /dev/vg_ops

11. On node 2, issue the following command:

    # mkdir /dev/vg_ops

12.
Maintenance and Troubleshooting Managing the Shared Storage • Volume groups should include different PV links to each logical unit on the disk array. • Volume group names must be the same on all nodes in the cluster. • Logical volume names must be the same on all nodes in the cluster. Changing the VxVM or CVM Storage Configuration You can add VxVM disk groups to the cluster configuration while the cluster is running. To add new CVM disk groups, the cluster must be running.
Removing Serviceguard Extension for RAC from a System

If you wish to remove a node from Serviceguard Extension for RAC operation, use the swremove command to delete the software. Note the following:

• The cluster should not be running on the node from which you will be deleting Serviceguard Extension for RAC.
Maintenance and Troubleshooting Monitoring Hardware Monitoring Hardware Good standard practice in handling a high availability system includes careful fault monitoring so as to prevent failures if possible or at least to react to them swiftly when they occur.
Maintenance and Troubleshooting Monitoring Hardware Using HP Predictive Monitoring In addition to messages reporting actual device failure, the logs may accumulate messages of lesser severity which, over time, can indicate that a failure may happen soon. One product that provides a degree of automation in monitoring is called HP Predictive, which gathers information from the status queues of a monitored system to see what errors are accumulating.
Adding Disk Hardware

As your system expands, you may need to add disk hardware. This also means modifying the logical volume structure. Use the following general procedure:

1. Halt packages.
2. Ensure that the Oracle database is not active on either node.
3. Deactivate and mark as unshareable any shared volume groups.
4. Halt the cluster.
5. Deactivate automatic cluster startup.
6. Shut down and power off the system before installing new hardware.
7.
Maintenance and Troubleshooting Replacing Disks Replacing Disks The procedure for replacing a faulty disk mechanism depends on the type of disk configuration you are using and on the type of Volume Manager software. For a description of replacement procedures using VERITAS VxVM or CVM, refer to the chapter on “Administering Hot-Relocation” in the VERITAS Volume Manager 3.2 Administrator’s Guide. Additional information is found in the VERITAS Volume Manager 3.2 Troubleshooting Guide.
1. Identify the physical volume name of the failed disk and the name of the volume group in which it was configured. In the following examples, the volume group name is shown as /dev/vg_sg01 and the physical volume name is shown as /dev/dsk/c2t3d0. Substitute the volume group and physical volume names that are correct for your system.

2. Identify the names of any logical volumes that have extents defined on the failed physical volume.

3.
1. Make a note of the physical volume name of the failed mechanism (e.g., /dev/dsk/c2t3d0).

2. Deactivate the volume group on all nodes of the cluster:

   # vgchange -a n vg_ops

3. Replace the bad disk mechanism with a good one.

4.
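The remaining steps are not reproduced in this excerpt. A typical continuation, with example device file and volume group names, restores the LVM configuration data to the replacement disk and then reactivates the volume group:

# vgcfgrestore -n /dev/vg_ops /dev/rdsk/c2t3d0    # restore LVM configuration data to the replacement disk
# vgchange -a s /dev/vg_ops                       # reactivate the volume group in shared mode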
Maintenance and Troubleshooting Replacing Disks On-line Hardware Maintenance with In-line SCSI Terminator ServiceGuard allows on-line SCSI disk controller hardware repairs to all cluster nodes if you use HP’s in-line terminator (C2980A) on nodes connected to the end of the shared FW/SCSI bus. The in-line terminator cable is a 0.5 meter extension cable with the terminator on the male end, which connects to the controller card for an external bus.
Maintenance and Troubleshooting Replacing Disks Figure 3-1 F/W SCSI Buses with In-line Terminators The use of in-line SCSI terminators allows you to do hardware maintenance on a given node by temporarily moving its packages to another node and then halting the original node while its hardware is serviced. Following the replacement, the packages can be moved back to the original node.
2. Halt the node that requires maintenance. The cluster will re-form, and activity will continue on other nodes. Packages on the halted node will switch to other available nodes if they are configured to switch.

3. Disconnect the power to the node.

4. Disconnect the node from the in-line terminator cable or Y cable if necessary.
Replacement of I/O Cards

After an I/O card failure, you can replace the card using the following steps. It is not necessary to bring the cluster down to do this if you are using SCSI inline terminators or Y cables at each node.

1. Halt the node by using Serviceguard Manager or the cmhaltnode command. Packages should fail over normally to other nodes.

2. Remove the I/O cable from the card.
Replacement of LAN Cards

If you have a LAN card failure which requires the LAN card to be replaced, you can replace it on-line or off-line depending on the type of hardware and operating system you are running. It is not necessary to bring the cluster down to do this.

Off-Line Replacement

The following steps show how to replace a LAN card off-line. These steps apply to both HP-UX 11.0 and 11i:

1. Halt the node by using the cmhaltnode command.

2.
Maintenance and Troubleshooting Replacement of LAN Cards After Replacing the Card After the on-line or off-line replacement of LAN cards has been done, Serviceguard will detect that the MAC address (LLA) of the card has changed from the value stored in the cluster binary configuration file, and it will notify the other nodes in the cluster of the new MAC address. The cluster will operate normally after this.
Monitoring RAC Instances

The DB provider provides the capability to monitor RAC databases. RBA (Role Based Access) enables a non-root user to monitor RAC instances using Serviceguard Manager.
Blank Planning Worksheets A Blank Planning Worksheets This appendix reprints blank planning worksheets used in preparing the RAC cluster. You can duplicate any of these worksheets that you find useful and fill them in as a part of the planning process.
Blank Planning Worksheets LVM Volume Group and Physical Volume Worksheet LVM Volume Group and Physical Volume Worksheet VG and PHYSICAL VOLUME WORKSHEET Page ___ of ____ ========================================================================== Volume Group Name: ______________________________________________________ PV Link 1 PV Link2 Physical Volume Name:_____________________________________________________ Physical Volume Name:_____________________________________________________ Physical Volume Name:
Blank Planning Worksheets VxVM Disk Group and Disk Worksheet VxVM Disk Group and Disk Worksheet DISK GROUP WORKSHEET Page ___ of ____ =========================================================================== Disk Group Name: __________________________________________________________ Physical Volume Name:______________________________________________________ Physical Volume Name:______________________________________________________ Physical Volume Name:____________________________________________________
Oracle Logical Volume Worksheet

                        NAME                                                   SIZE
Oracle Control File 1:  _____________________________________________________
Oracle Control File 2:  _____________________________________________________
Oracle Control File 3:  _____________________________________________________
Instance 1 Redo Log 1:  _____________________________________________________
Instance 1 Redo Log 2:  _____________________________________________________
Instance 1 Red