Red Hat Cluster Manager
The Red Hat Cluster Manager Installation and Administration Guide
ISBN: N/A

Red Hat, Inc.
1801 Varsity Drive
Raleigh, NC 27606 USA
+1 919 754 3700 (Voice)
+1 919 754 3701 (FAX)
888 733 4281 (Voice)
P.O. Box 13588
Research Triangle Park, NC 27709 USA

© 2002 Red Hat, Inc.
© 2000 Mission Critical Linux, Inc.
© 2000 K.M. Sorenson

rh-cm(EN)-1.0-Print-RHI (2002-04-17T17:16-0400)

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation.
Acknowledgments

The Red Hat Cluster Manager software was originally based on the open source Kimberlite cluster project (http://oss.missioncriticallinux.com/kimberlite/), which was developed by Mission Critical Linux, Inc. Since then, developers at Red Hat have made a large number of enhancements and modifications. The following is a non-comprehensive list highlighting some of these enhancements.
Contents

Acknowledgments
Chapter 1  Introduction to Red Hat Cluster Manager
    1.1  Cluster Overview
    1.2  Cluster Features
Chapter 2  Hardware Installation and Operating System Configuration
Chapter 3  Cluster Software Installation and Configuration
Chapter 4  Service Configuration and Administration
Chapter 5  Database Services
    5.1  Setting Up an Oracle Service
    5.2  Tuning Oracle Services
    5.3  Setting Up a MySQL Service
    5.4  Setting Up a DB2 Service
Chapter 6  Network File Sharing Services
Chapter 7  Apache Services
Chapter 8  Cluster Administration
Chapter 9  Configuring and using the Red Hat Cluster Manager GUI
Appendix A  Supplementary Hardware Information
    A.1  Setting Up Power Switches
    A.2  SCSI Bus Configuration Requirements
    A.3  SCSI Bus Termination
    A.4  SCSI Bus Length
    A.5  SCSI Identification Numbers
1 Introduction to Red Hat Cluster Manager

The Red Hat Cluster Manager is a collection of technologies working together to provide data integrity and the ability to maintain application availability in the event of a failure. Using redundant hardware, shared disk storage, power management, and robust cluster communication and application failover mechanisms, a cluster can meet the needs of the enterprise market.
Figure 1–1 Example Cluster

Figure 1–1, Example Cluster shows an example of a cluster in an active-active configuration. If a hardware or software failure occurs, the cluster will automatically restart the failed system's services on the functional cluster system. This service failover capability ensures that no data is lost, and there is little disruption to users.
1.2 Cluster Features

A cluster includes the following features:

• No-single-point-of-failure hardware configuration
  Clusters can include a dual-controller RAID array, multiple network and serial communication channels, and redundant uninterruptible power supply (UPS) systems to ensure that no single failure results in application downtime or loss of data. Alternatively, a low-cost cluster can be set up to provide less availability than a no-single-point-of-failure cluster.
Chapter 1:Introduction to Red Hat Cluster Manager the two systems from simultaneously accessing the same data and corrupting it. Although not required, it is recommended that power switches are used to guarantee data integrity under all failure conditions. Watchdog timers are an optional variety of power control to ensure correct operation of service failover.
Figure 1–2 Cluster Communication Mechanisms

Figure 1–2, Cluster Communication Mechanisms shows how systems communicate in a cluster configuration. Note that the terminal server used to access system consoles via serial ports is not a required cluster component.

• Service failover capability
  If a hardware or software failure occurs, the cluster will take the appropriate action to maintain application availability and data integrity.
Chapter 1:Introduction to Red Hat Cluster Manager • Manual service relocation capability In addition to automatic service failover, a cluster enables administrators to cleanly stop services on one cluster system and restart them on the other system. This allows administrators to perform planned maintenance on a cluster system, while providing application and data availability.
2 Hardware Installation and Operating System Configuration

To set up the hardware configuration and install the Linux distribution, follow these steps:

• Choose a cluster hardware configuration that meets the needs of applications and users; see Section 2.1, Choosing a Hardware Configuration.
• Set up and connect the cluster systems and the optional console switch and network switch or hub; see Section 2.2, Steps for Setting Up the Cluster Systems.
Cost restrictions
The hardware configuration chosen must meet budget requirements. For example, systems with multiple I/O ports usually cost more than low-end systems with fewer expansion capabilities.
Section 2.1:Choosing a Hardware Configuration RAID units. These products require extensive testing to ensure reliable operation, especially if the shared RAID units are based on parallel SCSI buses. These products typically do not allow for online repair of a failed system. No host RAID adapters are currently certified with Red Hat Cluster Manager. Refer to the Red Hat web site at http://www.redhat.com for the most up-to-date supported hardware matrix.
Chapter 2:Hardware Installation and Operating System Configuration Problem Solution Power source failure Redundant uninterruptible power supply (UPS) systems.
Section 2.1:Choosing a Hardware Configuration 2.1.3 Choosing the Type of Power Controller The Red Hat Cluster Manager implementation consists of a generic power management layer and a set of device specific modules which accommodate a range of power management types. When selecting the appropriate type of power controller to deploy in the cluster, it is important to recognize the implications of specific device types.
Chapter 2:Hardware Installation and Operating System Configuration with a power controller type of "None" is useful for simple evaluation purposes, but because it affords the weakest data integrity provisions, it is not recommended for usage in a production environment. Ultimately, the right type of power controller deployed in a cluster environment depends on the data integrity requirements weighed against the cost and availability of external power switches.
2.1.4 Cluster Hardware Tables

Use the following tables to identify the hardware components required for your cluster configuration. In some cases, the tables list specific products that have been tested in a cluster, although a cluster is expected to work with other products. The complete set of qualified cluster hardware components changes over time; consequently, the tables below may be incomplete.
Chapter 2:Hardware Installation and Operating System Configuration Table 2–4 Power Switch Hardware Table Hardware Quantity Description Serial power switches Two Null modem cable Two Null modem cables connect a serial port on a cluster system to a serial power switch. This serial connection enables each cluster system to power-cycle the other system. Some power switches may require different cables.
Section 2.1:Choosing a Hardware Configuration Description 21 Hardware Quantity Required Network power switch One Network attached power switches enable each cluster member to power cycle all others. Refer to Section 2.4.2, Configuring Power Switches for information about using network attached power switches, as well as caveats associated with each. The following network attached power switch has been fully tested: · WTI NPS-115, or NPS-230, available from http://www.wti.com.
Chapter 2:Hardware Installation and Operating System Configuration Table 2–5 Shared Disk Storage Hardware Table Hardware Quantity External disk storage enclosure One Description Use Fibre Channel or single-initiator parallel SCSI to connect the cluster systems to a single or dual-controller RAID array. To use single-initiator buses, a RAID controller must have multiple host ports and provide simultaneous access to all the logical units on the host ports.
Section 2.1:Choosing a Hardware Configuration Hardware Quantity Description Host bus adapter Two To connect to shared disk storage, you must install either a parallel SCSI or a Fibre Channel host bus adapter in a PCI slot in each cluster system. For parallel SCSI, use a low voltage differential (LVD) host bus adapter. Adapters have either HD68 or VHDCI connectors.
Chapter 2:Hardware Installation and Operating System Configuration Hardware Quantity Description Required SCSI terminator Two For a RAID storage enclosure that uses "out" ports (such as FlashDisk RAID Disk Array) and is connected to single-initiator SCSI buses, connect terminators to the "out" ports in order to terminate the buses. Only for parallel SCSI configurations and only if necessary for termination Fibre Channel hub or switch One or two A Fibre Channel hub or switch is required.
Table 2–7 Point-To-Point Ethernet Heartbeat Channel Hardware Table

Hardware: Network interface
Quantity: Two for each channel
Description: Each Ethernet heartbeat channel requires a network interface installed in both cluster systems.
Chapter 2:Hardware Installation and Operating System Configuration Table 2–8 Point-To-Point Serial Heartbeat Channel Hardware Table Hardware Quantity Serial card Two for each serial channel Null modem cable One for each channel Table 2–9 Description Each serial heartbeat channel requires a serial port on both cluster systems. To expand your serial port capacity, you can use multi-port serial PCI cards.
Table 2–10 UPS System Hardware Table

Hardware: UPS system
Quantity: One or two
Description: Uninterruptible power supply (UPS) systems protect against downtime if a power outage occurs. UPS systems are highly recommended for cluster operation. Ideally, connect the power cables for the shared storage enclosure and both power switches to redundant UPS systems.
Chapter 2:Hardware Installation and Operating System Configuration Hardware Quantity RAID storage enclosure The RAID storage enclosure contains one controller with at least two host ports. Two HD68 SCSI cables Each cable connects one HBA to one port on the RAID controller, creating two single-initiator SCSI buses. 2.1.
Section 2.1:Choosing a Hardware Configuration Hardware Quantity One network crossover cable A network crossover cable connects a network interface on one cluster system to a network interface on the other system, creating a point-to-point Ethernet heartbeat channel. Two RPS-10 power switches Power switches enable each cluster system to power-cycle the other system before restarting its services. The power cable for each cluster system is connected to its own power switch.
Chapter 2:Hardware Installation and Operating System Configuration Figure 2–1 No-Single-Point-Of-Failure Configuration Example 2.2 Steps for Setting Up the Cluster Systems After identifying the cluster hardware components described in Section 2.1, Choosing a Hardware Configuration, set up the basic cluster system hardware and connect the systems to the optional console switch and network switch or hub. Follow these steps: 1.
Section 2.2:Steps for Setting Up the Cluster Systems 31 2.2.1 Installing the Basic System Hardware Cluster systems must provide the CPU processing power and memory required by applications. It is recommended that each system have a minimum of 450 MHz CPU speed and 256 MB of memory. In addition, cluster systems must be able to accommodate the SCSI or FC adapters, network interfaces, and serial ports that the hardware configuration requires.
Chapter 2:Hardware Installation and Operating System Configuration devices on one channel and the shared disks on the other channel. Using multiple SCSI cards is also possible. See the system documentation supplied by the vendor for detailed installation information. See Appendix A, Supplementary Hardware Information for hardware-specific information about using host bus adapters in a cluster.
Set up the console switch according to the documentation provided by the vendor. After the console switch has been set up, connect it to each cluster system. The cables used depend on the type of console switch. For example, a Cyclades terminal server uses RJ45-to-DB9 crossover cables to connect a serial port on each cluster system to the terminal server.
Chapter 2:Hardware Installation and Operating System Configuration • Use the cat /proc/devices command to display the devices configured in the kernel. See Section 2.3.5, Displaying Devices Configured in the Kernel for more information about performing this task. 8. Verify that the cluster systems can communicate over all the network interfaces by using the ping command to send test packets from one system to the other. 9.
Section 2.3:Steps for Installing and Configuring the Red Hat Linux Distribution • Do not place local file systems, such as /, /etc, /tmp, and /var on shared disks or on the same SCSI bus as shared disks. This helps prevent the other cluster member from accidentally mounting these file systems, and also reserves the limited number of SCSI identification numbers on a bus for cluster disks. • Place /tmp and /var on different file systems. This may improve system performance.
point heartbeat connection on each cluster system (ecluster2 and ecluster3) as well as the IP alias clusteralias used for remote cluster monitoring. Verify the formatting of the local host entry in the /etc/hosts file to ensure that it does not include non-local systems in the entry for the local host. An example of an incorrect local host entry that includes a non-local system (server1) is shown next:

127.0.0.1    localhost.localdomain localhost server1
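The corrected entry lists only names for the local host itself; assuming the standard loopback naming, it would look like the following:

127.0.0.1    localhost.localdomain localhost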
To modify the kernel boot timeout limit for a cluster system, edit the /etc/lilo.conf file and specify the desired value (in tenths of a second) for the timeout parameter. The following example sets the timeout limit to three seconds:

timeout = 30

To apply any changes made to the /etc/lilo.conf file, invoke the /sbin/lilo command. Similarly, when using the grub boot loader, adjust the timeout parameter (specified in seconds) in /boot/grub/grub.conf.
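As a sketch, assuming the stock Red Hat layout of grub.conf, the same three-second limit would be expressed as:

timeout=3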
Chapter 2:Hardware Installation and Operating System Configuration May 22 14:02:11 storage3 kernel: Detected scsi disk sde at scsi1, channel 0, id 3, lun 0 May 22 14:02:11 storage3 kernel: Vendor: SEAGATE Model: ST318203LC Rev: 0001 May 22 14:02:11 storage3 kernel: Detected scsi disk sdf at scsi1, channel 0, id 8, lun 0 May 22 14:02:11 storage3 kernel: Vendor: SEAGATE Model: ST318203LC Rev: 0001 May 22 14:02:11 storage3 kernel: Detected scsi disk sdg at scsi1, channel 0, id 9, lun 0 May 22 14:02:11 sto
 19 ttyC
 20 cub
128 ptm
136 pts
162 raw

Block devices:
  2 fd
  3 ide0
  8 sd
 65 sd
#

The previous example shows:
• Onboard serial ports (ttyS)
• Serial expansion card (ttyC)
• Raw devices (raw)
• SCSI devices (sd)

2.4 Steps for Setting Up and Connecting the Cluster Hardware
Chapter 2:Hardware Installation and Operating System Configuration 4. Set up the shared disk storage according to the vendor instructions and connect the cluster systems to the external storage enclosure.See Section 2.4.4, Configuring Shared Disk Storage for more information about performing this task. In addition, it is recommended to connect the storage enclosure to redundant UPS systems. See Section 2.4.3, Configuring UPS Systems for more information about using optional UPS systems. 5.
Section 2.4:Steps for Setting Up and Connecting the Cluster Hardware To set up a redundant Ethernet heartbeat channel, use a network crossover cable to connect a network interface on one cluster system to a network interface on the other cluster system. To set up a serial heartbeat channel, use a null modem cable to connect a serial port on one cluster system to a serial port on the other cluster system.
If power switches are not used in the cluster, and a cluster system determines that a hung system is down, it will set the status of the failed system to DOWN on the quorum partitions and then restart the hung system's services. If the hung system becomes responsive again, it will notice that its status is DOWN and initiate a system reboot.
Section 2.4:Steps for Setting Up and Connecting the Cluster Hardware It is not recommended to use a large UPS infrastructure as the sole source of power for the cluster. A UPS solution dedicated to the cluster itself allows for more flexibility in terms of manageability and availability. A complete UPS system must be able to provide adequate voltage and current for a prolonged period of time.
Chapter 2:Hardware Installation and Operating System Configuration Figure 2–4 Single UPS System Configuration Many vendor-supplied UPS systems include Linux applications that monitor the operational status of the UPS system through a serial port connection. If the battery power is low, the monitoring software will initiate a clean system shutdown. As this occurs, the cluster software will be properly stopped, because it is controlled by a System V run level script (for example, /etc/rc.d/init.
Section 2.4:Steps for Setting Up and Connecting the Cluster Hardware Multi-initiator SCSI configurations are not supported due to the difficulty in obtaining proper bus termination. • The Linux device name for each shared storage device must be the same on each cluster system. For example, a device named /dev/sdc on one cluster system must be named /dev/sdc on the other cluster system. Using identical hardware for both cluster systems usually ensures that these devices will be named the same.
Chapter 2:Hardware Installation and Operating System Configuration two single-initiator SCSI buses to connect each cluster system to the RAID array is possible. If a logical unit can fail over from one controller to the other, the process must be transparent to the operating system. Note that some RAID controllers restrict a set of disks to a specific controller or port. In this case, single-initiator bus setups are not possible.
Setting Up a Fibre Channel Interconnect

Fibre Channel can be used in either single-initiator or multi-initiator configurations. A single-initiator Fibre Channel interconnect has only one cluster system connected to it. This may provide better host isolation and better performance than a multi-initiator bus.
Section 2.4:Steps for Setting Up and Connecting the Cluster Hardware Figure 2–9 Dual-Controller RAID Array Connected to Single-Initiator Fibre Channel Interconnects If a dual-controller RAID array with two host ports on each controller is used, a Fibre Channel hub or switch is required to connect each host bus adapter to one port on both controllers, as shown in Figure 2–9, Dual-Controller RAID Array Connected to Single-Initiator Fibre Channel Interconnects.
Chapter 2:Hardware Installation and Operating System Configuration partition. Data consistency is maintained through checksums and any inconsistencies between the partitions are automatically corrected. If a system is unable to write to both quorum partitions at startup time, it will not be allowed to join the cluster.
1. Invoke the interactive fdisk command, specifying an available shared disk device. At the prompt, specify the p command to display the current partition table.

# fdisk /dev/sde
Command (m for help): p

Disk /dev/sde: 255 heads, 63 sectors, 2213 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sde1             1       262  2104483+   83  Linux
/dev/sde2           263       288    208845   83  Linux
Syncing disks.

7. If a partition was added while both cluster systems are powered on and connected to the shared storage, reboot the other cluster system in order for it to recognize the new partition.

After partitioning a disk, format the partition for use in the cluster. For example, create file systems or raw devices for quorum partitions. See Creating Raw Devices in Section 2.4.4 and Creating File Systems in Section 2.4.4 for more information.
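For example, an ext2 file system could be created on a newly added service partition (here the hypothetical /dev/sde3, continuing the fdisk example above) with a command such as:

# mke2fs /dev/sde3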
# service rawdevices restart

Query all the raw devices by using the command raw -aq:

# raw -aq
/dev/raw/raw1   bound to major 8, minor 17
/dev/raw/raw2   bound to major 8, minor 18

Note that, for raw devices, there is no cache coherency between the raw device and the block device. In addition, requests must be 512-byte aligned both in memory and on disk.
3 Cluster Software Installation and Configuration

After installing and configuring the cluster hardware, the cluster software can be installed. The following sections describe installing and initializing the cluster software, checking the cluster configuration, configuring syslog event logging, and using the cluadmin utility.

3.1 Steps for Installing and Initializing the Cluster Software
Chapter 3:Cluster Software Installation and Configuration • Number of heartbeat connections (channels), both Ethernet and serial • Device special file for each heartbeat serial line connection (for example, /dev/ttyS1) • IP host name associated with each heartbeat Ethernet interface • IP address for remote cluster monitoring, also referred to as the "cluster alias". Refer to Section 3.1.2, Configuring the Cluster Alias for further information.
3.1.1 Editing the rawdevices File

The /etc/sysconfig/rawdevices file is used to map the raw devices for the quorum partitions each time a cluster system boots. As part of the cluster software installation procedure, edit the rawdevices file on each cluster system and specify the raw character devices and block devices for the primary and backup quorum partitions. This must be done prior to running the cluconfig utility.
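For example, assuming the primary and backup quorum partitions are the hypothetical devices /dev/sdb1 and /dev/sdb2, the entries in /etc/sysconfig/rawdevices might look like the following:

/dev/raw/raw1   /dev/sdb1
/dev/raw/raw2   /dev/sdb2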
Chapter 3:Cluster Software Installation and Configuration While running cluconfig, you will be prompted as to whether or not you wish to configure a cluster alias. This appears as the following prompt: Enter IP address for cluster alias [NONE]: 172.16.33.105 As shown above, the default value is set to NONE, which means that there is no cluster alias, but the user overrides this default and configures an alias using an IP address of 172.16.33.105.
Section 3.1:Steps for Installing and Initializing the Cluster Software 59 /sbin/cluconfig Red Hat Cluster Manager Configuration Utility (running on storage0) - Configuration file exists already. Would you like to use those prior settings as defaults? (yes/no) [yes]: yes Enter cluster name [Development Cluster]: Enter IP address for cluster alias [10.0.0.154]: 10.0.0.
Chapter 3:Cluster Software Installation and Configuration Enter hostname of the cluster member on heartbeat channel 0 \ [storage1]: storage1 Looking for host storage1 (may take a few seconds)...
Chapter 3:Cluster Software Installation and Configuration 3.2 Checking the Cluster Configuration To ensure that the cluster software has been correctly configured, use the following tools located in the /sbin directory: • Test the quorum partitions and ensure that they are accessible. Invoke the cludiskutil utility with the -t option to test the accessibility of the quorum partitions. See Section 3.2.1, Testing the Quorum Partitions for more information. • Test the operation of the power switches.
/sbin/cludiskutil -p
----- Shared State Header ------
Magic#  = 0x39119fcd
Version = 1
Updated on Thu Sep 14 05:43:18 2000
Updated by node 0
--------------------------------

The Magic# and Version fields will be the same for all cluster configurations. The last two lines of output indicate the date that the quorum partitions were initialized with cludiskutil -I, and the numeric identifier for the cluster system that invoked the initialization command.
Chapter 3:Cluster Software Installation and Configuration invoked, it checks the status of the cluster software. If the cluster software is running, the command exits with a message to stop the cluster software.
Section 3.3:Configuring syslog Event Logging – Verify that the network connection to network-based switches is operational. Most switches have a link light that indicates connectivity. – It should be possible to ping the network switch; if not, then the switch may not be properly configured for its network parameters. – Verify that the correct password and login name (depending on switch type) have been specified in the cluster configuration database (as established by running cluconfig).
Chapter 3:Cluster Software Installation and Configuration The importance of an event determines the severity level of the log entry. Important events should be investigated before they affect cluster availability. The cluster can log messages with the following severity levels, listed in order of severity level: • emerg — The cluster system is unusable. • alert — Action must be taken immediately to address the problem. • crit — A critical condition has occurred. • err — An error has occurred.
After configuring the cluster software, optionally edit the /etc/syslog.conf file to enable the cluster to log events to a file that is different from the default log file, /var/log/messages. The cluster utilities and daemons log their messages using a syslog tag called local4. Using a cluster-specific log file facilitates cluster monitoring and problem solving.
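For example, to direct all cluster messages to a dedicated log file (the path /var/log/cluster here is only illustrative), an entry along the following lines could be added to /etc/syslog.conf, after which syslogd must be restarted (for example, with service syslog restart):

local4.*                        /var/log/cluster

To keep the same messages out of /var/log/messages, local4.none can also be appended to the selector list for that file.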
Chapter 3:Cluster Software Installation and Configuration If another user holds the lock, a warning will be displayed indicating that there is already a lock on the database. The cluster software allows for the option of taking the lock. If the lock is taken by the current requesting user, the previous holder of the lock can no longer modify the cluster database. Take the lock only if necessary, because uncoordinated simultaneous configuration sessions may cause unpredictable cluster behavior.
Section 3.4:Using the cluadmin Utility Table 3–1 69 cluadmin Commands cluadmin Command cluadmin Subcommand help cluster Description Example None Displays help for the specified cluadmin command or subcommand. help service add status cluster status Displays a snapshot of the current cluster status. See Section 8.1, Displaying Cluster and Service Status for information. loglevel Sets the logging for the specified cluster daemon to the specified severity level. See Section 8.
Chapter 3:Cluster Software Installation and Configuration cluadmin Command service cluadmin Subcommand Description Example restore Restores the cluster configuration database from the backup copy in the /etc/cluster.conf.bak file. See Section 8.5, Backing Up and Restoring the Cluster Database for information. cluster restore saveas Saves the cluster configuration database to the specified file. See Section 8.5, Backing Up and Restoring the Cluster Database for information.
Section 3.4:Using the cluadmin Utility cluadmin Command cluadmin Subcommand Description 71 Example relocate Causes a service to be stopped on the service relocate cluster member its currently running nfs1 on and restarted on the other. Refer to Section 4.6, Relocating a Service for more information. show config Displays the current configuration for service show the specified service. See Section 4.2, config dbservice Displaying a Service Configuration for information.
Chapter 3:Cluster Software Installation and Configuration While using the cluadmin utility, press the [Tab] key to help identify cluadmin commands. For example, pressing the [Tab] key at the cluadmin> prompt displays a list of all the commands. Entering a letter at the prompt and then pressing the [Tab] key displays the commands that begin with the specified letter. Specifying a command and then pressing the [Tab] key displays a list of all the subcommands that can be specified with that command.
4 Service Configuration and Administration

The following sections describe how to configure, display, enable/disable, modify, relocate, and delete a service, as well as how to handle services that fail to start.

4.1 Configuring a Service

The cluster systems must be prepared before any attempt to configure a service. For example, set up the disk storage or applications used in the services.
Chapter 4:Service Configuration and Administration • Section 5.4, Setting Up a DB2 Service • Section 6.1, Setting Up an NFS Service • Section 6.2, Setting Up a High Availability Samba Service • Section 7.1, Setting Up an Apache Service 4.1.1 Gathering Service Information Before creating a service, gather all available information about the service resources and properties. When adding a service to the cluster database, the cluadmin utility will prompt for this information.
Section 4.1:Configuring a Service Service Property or Resource IP address Disk partition Mount points, file system types, mount options, NFS export options, and Samba shares Description One or more Internet protocol (IP) addresses may be assigned to a service.
Chapter 4:Service Configuration and Administration Service Property or Resource Description Service Check Interval Specifies the frequency (in seconds) that the system will check the health of the application associated with the service. For example, it will verify that the necessary NFS or Samba daemons are running. For additional service types, the monitoring consists of examining the return status when calling the "status" clause of the application service script.
The /usr/share/cluster/doc/services/examples directory contains a template that can be used to create service scripts, in addition to examples of scripts. See Section 5.1, Setting Up an Oracle Service, Section 5.3, Setting Up a MySQL Service, Section 7.1, Setting Up an Apache Service, and Section 5.4, Setting Up a DB2 Service for sample scripts.
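A minimal sketch of such a script (the application name mydaemon and its paths are hypothetical) follows the usual System V start/stop/status convention that the cluster infrastructure expects:

#!/bin/sh
#
# Sketch of a cluster service script for a hypothetical application.
# The cluster calls this script with "start", "stop", and "status".

case "$1" in
  start)
        # Start the application in the background.
        /usr/sbin/mydaemon &
        ;;
  stop)
        # Stop the application.
        killall mydaemon
        ;;
  status)
        # Exit non-zero if the application is not running so that
        # service monitoring can detect the failure.
        pidof mydaemon > /dev/null || exit 1
        ;;
  *)
        echo "usage: $0 {start|stop|status}"
        exit 1
        ;;
esac
exit 0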
• Whether the service was disabled after it was added
• Preferred member system
• Whether the service will relocate to its preferred member when it joins the cluster
• Service monitoring interval
• Service start script location
• IP addresses
• Disk partitions
• File system type
• Mount points and mount options
• NFS exports
• Samba shares

To display cluster service status, see Section 8.1, Displaying Cluster and Service Status.
Section 4.4:Enabling a Service NFS export 0: /mnt/users/engineering/brown Client 0: brown, rw cluadmin> If the name of the service is known, it can be specified with the service show config service_name command. 4.3 Disabling a Service A running service can be disabled in order to stop the service and make it unavailable. Once disabled, a service can then be re-enabled. See Section 4.4, Enabling a Service for information.
Chapter 4:Service Configuration and Administration 4.5 Modifying a Service All properties that were specified when a service was created can be modified. For example, specified IP addresses can be changed. More resources can also be added to a service (for example, more file systems). See Section 4.1.1, Gathering Service Information for information. A service must be disabled before it can be modified. If an attempt is made to modify a running service, the cluster manager will prompt to disable it.
Section 4.8:Handling Services that Fail to Start 4.7 Deleting a Service A cluster service can be deleted. Note that the cluster database should be backed up before deleting a service. See Section 8.5, Backing Up and Restoring the Cluster Database for information. To delete a service by using the cluadmin utility, follow these steps: 1. Invoke the cluadmin utility on the cluster system that is running the service, and specify the service disable service_name command. See Section 4.
Chapter 4:Service Configuration and Administration 2. Use the cluadmin utility to attempt to enable or disable the service on the cluster system that owns the service. See Section 4.3, Disabling a Service and Section 4.4, Enabling a Service for more information. 3. If the service does not start or stop on the owner system, examine the /var/log/messages log file, and diagnose and correct the problem.
5 Database Services

This chapter contains instructions for configuring Red Hat Linux Advanced Server to make database services highly available.

Note
The following descriptions present example database configuration instructions. Be aware that differences may exist in newer versions of each database product. Consequently, this information may not be directly applicable.

5.1 Setting Up an Oracle Service
Chapter 5:Database Services start and stop a Web application that has been written using Perl scripts and modules and is used to interact with the Oracle database. Note that there are many ways for an application to interact with an Oracle database. The following is an example of the oracle script, which is used to start and stop the Oracle service. Note that the script is run as user oracle, instead of root.
Section 5.1:Setting Up an Oracle Service # ORACLE_SID # # Specifies the Oracle system identifier or "sid", which is the name of # the Oracle Server instance. # ######################################################################## export ORACLE_SID=TESTDB ######################################################################## # # ORACLE_BASE # # Specifies the directory at the top of the Oracle software product and # administrative file structure.
Chapter 5:Database Services # Verify that the users search path includes $ORCLE_HOME/bin # ######################################################################## export PATH=$PATH:/u01/app/oracle/product/${ORACLE_RELEASE}/bin ######################################################################## # # This does the actual work. # # The oracle server manager is used to start the Oracle Server instance # based on the initSID.ora initialization parameters file specified.
Section 5.1:Setting Up an Oracle Service # Specifies the Oracle system identifier or "sid", which is the name # of the Oracle Server instance. # ###################################################################### export ORACLE_SID=TESTDB ###################################################################### # # ORACLE_BASE # # Specifies the directory at the top of the Oracle software product # and administrative file structure.
Chapter 5:Database Services ###################################################################### export PATH=$PATH:/u01/app/oracle/product/${ORACLE_RELEASE}/bin ###################################################################### # # This does the actual work. # # The oracle server manager is used to STOP the Oracle Server instance # in a tidy fashion.
Section 5.1:Setting Up an Oracle Service # # This line does the real work. # /usr/bin/dbiproxy --logfile /home/oracle/dbiproxy.log --localport 1100 & exit 0 The following is an example of the stopdbi script, which is used to stop a networking DBI proxy daemon: #!/bin/sh # # ################################################################### # # Our Web Server application (perl scripts) work in a distributed # environment. The technology we use is base upon the # DBD::Oracle/DBI CPAN perl modules.
Chapter 5:Database Services c - Cancel and return to the top-level cluadmin command r - Restart to the initial prompt while keeping previous responses p - Proceed with the next prompt Preferred member [None]: ministor0 Relocate when the preferred member joins the cluster (yes/no/?) \ [no]: yes User script (e.g., /usr/foo/script or None) \ [None]: /home/oracle/oracle Do you want to add an IP address to the service (yes/no/?): yes IP Address Information IP address: 10.1.16.132 Netmask (e.g. 255.255.255.
Section 5.2:Tuning Oracle Services Disable service (yes/no/?) [no]: no name: oracle disabled: no preferred node: ministor0 relocate: yes user script: /home/oracle/oracle IP address 0: 10.1.16.132 netmask 0: 255.255.255.0 broadcast 0: 10.1.16.
Chapter 5:Database Services in the cluster environment. This will ensure that failover is transparent to database client application programs and does not require programs to reconnect. 5.3 Setting Up a MySQL Service A database service can serve highly-available data to a MySQL database application. The application can then provide network access to database client systems, such as Web servers. If the service fails over, the application accesses the shared database data through the new cluster system.
Section 5.3:Setting Up a MySQL Service # Mysql daemon start/stop script. # Usually this is put in /etc/init.d (at least on machines SYSV R4 # based systems) and linked to /etc/rc3.d/S99mysql. When this is done # the mysql server will be started when the machine is started. # Comments to support chkconfig on RedHat Linux # chkconfig: 2345 90 90 # description: A very fast and reliable SQL database engine.
Chapter 5:Database Services else if test -d "$datadir" then pid_file=$datadir/‘hostname‘.pid fi fi if grep "^basedir" $conf > /dev/null then basedir=‘grep "^basedir" $conf | cut -f 2 -d= | tr -d ’ ’‘ bindir=$basedir/bin fi if grep "^bindir" $conf > /dev/null then bindir=‘grep "^bindir" $conf | cut -f 2 -d=| tr -d ’ ’‘ fi fi # Safeguard (relative paths, core dumps..) cd $basedir case "$mode" in ’start’) # Start daemon if test -x $bindir/safe_mysqld then # Give extra arguments to mysqld with the my.
Section 5.3:Setting Up a MySQL Service echo "No mysqld pid file found. Looked for $pid_file." fi ;; *) # usage echo "usage: $0 start|stop" exit 1 ;; esac The following example shows how to use cluadmin to add a MySQL service. cluadmin> service add The user interface will prompt you for information about the service. Not all information is required for all services. Enter a question mark (?) at a prompt to obtain help.
Chapter 5:Database Services Broadcast (e.g. X.Y.Z.255 or None) [None]: [Return] Do you want to (a)dd, (m)odify, (d)elete or (s)how an IP address, or are you (f)inished adding IP addresses: f Do you want to add a disk device to the service (yes/no/?): yes Disk Device Information Device special file (e.g., /dev/sda1): /dev/sda1 Filesystem type (e.g., ext2, reiserfs, ext3 or None): ext2 Mount point (e.g., /usr/mnt/service1 or None) [None]: /var/mysql Mount options (e.g.
Section 5.4:Setting Up a DB2 Service 97 1. On both cluster systems, log in as root and add the IP address and host name that will be used to access the DB2 service to /etc/hosts file. For example: 10.1.16.182 ibmdb2.class.cluster.com ibmdb2 2. Choose an unused partition on a shared disk to use for hosting DB2 administration and instance data, and create a file system on it. For example: # mke2fs /dev/sda3 3. Create a mount point on both cluster systems for the file system created in Step 2.
Chapter 5:Database Services ADMIN.HOME_DIRECTORY = /db2home/db2as ---------Administration Server Profile Registry Settings--------------------------------------------------------ADMIN.DB2COMM = TCPIP ---------Global Profile Registry Settings-----------------------------------------------------------------DB2SYSTEM = ibmdb2 7. Start the installation. For example: devel0# cd /mnt/cdrom/IBM/DB2 devel0# ./db2setup -d -r /root/db2server.rsp 1>/dev/null \ 2>/dev/null & 8.
Section 5.4:Setting Up a DB2 Service 99 2>/dev/null & 14. Check for errors during the installation by examining the installation log file.
Chapter 5:Database Services ;; esac 17. Modify the /usr/IBMdb2/V6.1/instance/db2ishut file on both cluster systems to forcefully disconnect active applications before stopping the database. For example: for DB2INST in ${DB2INSTLIST?}; do echo "Stopping DB2 Instance "${DB2INST?}"...
Section 5.4:Setting Up a DB2 Service To test the database from the DB2 client system, invoke the following commands: # db2 connect to db2 user db2inst1 using ibmdb2 # db2 select tabname from syscat.
6 Network File Sharing Services

This chapter contains instructions for configuring Red Hat Linux Advanced Server to make network file sharing services through NFS and Samba highly available.

6.1 Setting Up an NFS Service

Highly available network file system (NFS) services are one of the key strengths of the clustering infrastructure.
NFS services will not start unless the following NFS daemons are running: nfsd, rpc.mountd, and rpc.statd (the example commands below show one way to ensure this).

• Filesystem mounts and their associated exports for clustered NFS services should not be included in /etc/fstab or /etc/exports. Rather, for clustered NFS services, the parameters describing mounts and exports are entered via the cluadmin configuration utility.
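On Red Hat Linux these daemons are provided by the standard nfs and nfslock init scripts, so one way to make sure they are running (and started at boot) is, for example:

# /sbin/service nfs start
# /sbin/service nfslock start
# /sbin/chkconfig nfs on
# /sbin/chkconfig nfslock on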
Section 6.1:Setting Up an NFS Service – Mount options — The mount information also designates the mount options. Note: by default, the Linux NFS server does not guarantee that all write operations are synchronously written to disk. In order to ensure synchronous writes, specify the sync mount option. Specifying the sync mount option favors data integrity at the expense of performance. Refer to mount(8) for detailed descriptions of the mount related parameters.
Chapter 6:Network File Sharing Services The following are the service configuration parameters which will be used as well as some descriptive commentary. Note Prior to configuring an NFS service using cluadmin, it is required that the cluster daemons are running. • Service Name — nfs_accounting. This name was chosen as a reminder of the service’s intended function to provide exports to the members of the accounting department. • Preferred Member — clu4.
Section 6.1:Setting Up an NFS Service 107 Service name: nfs_accounting Preferred member [None]: clu4 Relocate when the preferred member joins the cluster (yes/no/?) \ [no]: yes Status check interval [0]: 30 User script (e.g., /usr/foo/script or None) [None]: Do you want to add an IP address to the service (yes/no/?) [no]: yes IP Address Information IP address: 10.0.0.10 Netmask (e.g. 255.255.255.0 or None) [None]: Broadcast (e.g. X.Y.Z.
Chapter 6:Network File Sharing Services are you (f)inished adding CLIENTS [f]: a Export client name [*]: dwalsh Export client options [None]: rw Do you want to (a)dd, (m)odify, (d)elete or (s)how NFS CLIENTS, or are you (f)inished adding CLIENTS [f]: f Do you want to (a)dd, (m)odify, (d)elete or (s)how NFS EXPORTS, or are you (f)inished adding EXPORTS [f]: Do you want to (a)dd, (m)odify, (d)elete or (s)how DEVICES, or are you (f)inished adding DEVICES [f]: Disable service (yes/no/?) [no]: name: nfs_en
Section 6.1:Setting Up an NFS Service 6.1.5 Active-Active NFS Configuration In the previous section, an example configuration of a simple NFS service was discussed. This section describes how to setup a more complex NFS service. The example in this section involves configuring a pair of highly available NFS services. In this example, suppose two separate teams of users will be accessing NFS filesystems served by the cluster. To serve these users, two separate NFS services will be configured.
Chapter 6:Network File Sharing Services Service name: nfs_engineering Preferred member [None]: clu3 Relocate when the preferred member joins the cluster (yes/no/?) [no]: yes Status check interval [0]: 30 User script (e.g., /usr/foo/script or None) [None]: Do you want to add an IP address to the service (yes/no/?) [no]: yes IP Address Information IP address: 10.0.0.11 Netmask (e.g. 255.255.255.0 or None) [None]: Broadcast (e.g. X.Y.Z.
Chapter 6:Network File Sharing Services Avoid using exportfs -r File systems being NFS exported by cluster members do not get specified in the conventional /etc/exports file. Rather, the NFS exports associated with cluster services are specified in the cluster configuration file (as established by cluadmin). The command exportfs -r removes any exports which are not explicitly specified in the /etc/exports file.
Section 6.2:Setting Up a High Availability Samba Service Note A complete explanation of Samba configuration is beyond the scope of this document. Rather, this documentation highlights aspects which are crucial for clustered operation. Refer to The Official Red Hat Linux Customization Guide for more details on Samba configuration. Additionally, refer to the following URL for more information on Samba configuration http://www.redhat.com/support/resources/print_file/samba.html.
the specified Windows clients. It also designates access permissions and other mapping capabilities. In the single-system model, a single instance of each of the smbd and nmbd daemons is automatically started by the /etc/rc.d/init.d/smb runlevel script. In order to implement high availability Samba services, rather than having a single /etc/samba/smb.conf file, there is an individual per-service Samba configuration file.
Section 6.2:Setting Up a High Availability Samba Service 6.2.3 Gathering Samba Service Configuration Parameters When preparing to configure Samba services, determine configuration information such as which filesystems will be presented as shares to Windows based clients. The following information is required in order to configure NFS services: • Service Name — A name used to uniquely identify this service within the cluster.
Chapter 6:Network File Sharing Services – Forced unmount — As part of the mount information, you will be prompted as to whether forced unmount should be enabled or not. When forced unmount is enabled, if any applications running on the cluster server have the designated filesystem mounted when the service is being disabled or relocated, then that application will be killed off to allow the unmount to proceed. • Export Information — this information is required for NFS services only.
Section 6.2:Setting Up a High Availability Samba Service 6.2.4 Example Samba Service Configuration In order to illustrate the configuration process for a Samba service, an example configuration is described in this section. This example consists of setting up a single Samba share which houses the home directories of four members of the accounting team. The accounting team will then access this share from their Windows based systems.
Chapter 6:Network File Sharing Services Service name: samba_acct Preferred member [None]: clu4 Relocate when the preferred member joins the cluster (yes/no/?) [no]: yes User script (e.g., /usr/foo/script or None) [None]: Status check interval [0]: 90 Do you want to add an IP address to the service (yes/no/?) [no]: yes IP Address Information IP address: 10.0.0.10 Netmask (e.g. 255.255.255.0 or None) [None]: Broadcast (e.g. X.Y.Z.
Section 6.2:Setting Up a High Availability Samba Service 119 relocate: yes user script: None monitor interval: 90 IP address 0: 10.0.0.
workgroup = RHCLUSTER
lock directory = /var/cache/samba/acct
log file = /var/log/samba/%m.log
encrypt passwords = yes
bind interfaces only = yes
interfaces = 10.0.0.10

[acct]
comment = High Availability Samba Service
browsable = yes
writable = no
public = yes
path = /mnt/service12

The following are descriptions of the fields that are most relevant from a clustering perspective. In this example, the file is named /etc/samba/smb.conf.acct, following the /etc/samba/smb.conf.sharename convention.
Section 6.2:Setting Up a High Availability Samba Service writable By default, the share access permissions are conservatively set as non-writable. Tune this parameter according to your site-specific preferences. path Defaults to the first filesystem mount point specified within the service configuration. This should be adjusted to match the specific directory or subdirectory intended to be available as a share to Windows clients. 6.2.
Chapter 6:Network File Sharing Services measures to respond to the lack of immediate response from the Samba server. In the case of a planned service relocation or a true failover scenario, there is a period of time where the Windows clients will not get immediate response from the Samba server. Robust Windows applications will retry requests which timeout during this interval.
7 Apache Services

This chapter contains instructions for configuring Red Hat Linux Advanced Server to make the Apache Web server highly available.

7.1 Setting Up an Apache Service

This section provides an example of setting up a cluster service that will fail over an Apache Web server. Although the actual variables used in the service depend on the specific configuration, the example may assist in setting up a service for a particular environment.
Chapter 7:Apache Services 1. On a shared disk, use the interactive fdisk utility to create a partition that will be used for the Apache document root directory. Note that it is possible to create multiple document root directories on different disk partitions. See Partitioning Disks in Section 2.4.4 for more information. 2. Use the mkfs command to create an ext2 file system on the partition you created in the previous step. Specify the drive letter and the partition number.
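For example, assuming the partition created in the previous step is the hypothetical device /dev/sdb3:

# mkfs -t ext2 /dev/sdb3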
Section 7.1:Setting Up an Apache Service • If the script directory resides in a non-standard location, specify the directory that will contain the CGI programs. For example: ScriptAlias /cgi-bin/ "/mnt/apacheservice/cgi-bin/" • Specify the path that was used in the previous step, and set the access permissions to default to that directory.
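A sketch of the corresponding httpd.conf block, assuming the same /mnt/apacheservice/cgi-bin path used above:

<Directory "/mnt/apacheservice/cgi-bin">
    AllowOverride None
    Options None
    Order allow,deny
    Allow from all
</Directory>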
Chapter 7:Apache Services Before the Apache service is added to the cluster database, ensure that the Apache directories are not mounted. Then, on one cluster system, add the service. Specify an IP address, which the cluster infrastructure will bind to the network interface on the cluster system that runs the Apache service. The following is an example of using the cluadmin utility to add an Apache service.
Section 7.1:Setting Up an Apache Service Do you want to (a)dd, (m)odify, (d)elete or (s)how devices, or are you (f)inished adding device information: f Disable service (yes/no/?) [no]: no name: apache disabled: no preferred node: node1 relocate: yes user script: /etc/rc.d/init/httpd IP address 0: 10.1.16.150 netmask 0: 255.255.255.0 broadcast 0: 10.1.16.
8 Cluster Administration

The following chapter describes the various administrative tasks involved in maintaining a cluster after it has been installed and configured.

8.1 Displaying Cluster and Service Status

Monitoring cluster and service status can help identify and resolve problems in the cluster environment.
Chapter 8:Cluster Administration Table 8–2 Power Switch Status Power Switch Status Description OK The power switch is operating properly. Wrn Could not obtain power switch status. Err A failure or error has occurred. Good The power switch is operating properly. Unknown The other cluster member is DOWN. Timeout The power switch is not responding to power daemon commands, possibly because of a disconnected serial cable. Error A failure or error has occurred.
Section 8.1:Displaying Cluster and Service Status Table 8–4 Service Status Service Status Description running The service resources are configured and available on the cluster system that owns the service. The running state is a persistent state. From this state, a service can enter the stopping state (for example, if the preferred member rejoins the cluster) disabled The service has been disabled, and does not have an assigned owner. The disabled state is a persistent state.
Chapter 8:Cluster Administration clu2 Up 1 Good =================== H e a r t b e a t S t Name Type ------------------------------ ---------clu1 <--> clu2 network =================== S e r v i c e Restart Service ------------nfs1 nfs2 nfs3 a t u s =================== Status -----------ONLINE S t a t u s ======================= Last Monitor Status Owner -------started started started ------------clu1 clu2 clu1 Transition ---------------16:07:42 Feb 27 00:03:52 Feb 28 07:43:54 Feb 28 Interva
Section 8.5:Backing Up and Restoring the Cluster Database When the system is able to rejoin the cluster, use the following command: /sbin/chkconfig --add cluster Then reboot the system or run the cluster start command located in the System V init directory. For example: /sbin/service cluster start 8.4 Modifying the Cluster Configuration It may be necessary at some point to modify the cluster configuration.
Chapter 8:Cluster Administration 2. On the remaining cluster system, invoke the cluadmin utility and restore the cluster database. To restore the database from the /etc/cluster.conf.bak file, specify the cluster restore command. To restore the database from a different file, specify the cluster restorefrom file_name command. The cluster will disable all running services, delete all the services, and then restore the database. 3.
Section 8.7:Updating the Cluster Software 8.7 Updating the Cluster Software Before upgrading Red Hat Cluster Manager, be sure to install all of the required software, as described in Section 2.3.1, Kernel Requirements. The cluster software can be updated while preserving the existing cluster database. Updating the cluster software on a system can take from 10 to 20 minutes. To update the cluster software while minimizing service downtime, follow these steps: 1.
Chapter 8:Cluster Administration cluconfig --init=/dev/raw/raw1 9. Start the cluster software on the second cluster system by invoking the cluster start command located in the System V init directory. For example: /sbin/service cluster start 8.8 Reloading the Cluster Database Invoke the cluadmin utility and use the cluster reload command to force the cluster to re-read the cluster database. For example: cluadmin> cluster reload 8.
Section 8.12:Diagnosing and Correcting Problems in a Cluster /sbin/cluconfig --init=/dev/raw/raw1 6. Start the cluster daemons by invoking the cluster start command located in the System V init directory on both cluster systems. For example: /sbin/service cluster start 8.11 Disabling the Cluster Software It may become necessary to temporarily disable the cluster software on a member system.
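As a sketch of the usual approach (mirroring the re-enable commands shown earlier), the cluster software can be stopped and prevented from starting at boot with:

# /sbin/service cluster stop
# /sbin/chkconfig --del cluster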
Table 8–5 Diagnosing and Correcting Problems in a Cluster

Problem: SCSI bus not terminated
Symptom: SCSI errors appear in the log file
Solution: Each SCSI bus must be terminated only at the beginning and end of the bus. Depending on the bus configuration, it might be necessary to enable or disable termination in host bus adapters, RAID controllers, and storage enclosures. To support hot plugging, external termination is required to terminate a SCSI bus.

Problem: SCSI identification numbers not unique
Symptom: SCSI errors appear in the log file
Solution: Each device on a SCSI bus must have a unique identification number. See Section A.5, SCSI Identification Numbers for more information.

Problem: SCSI commands timing out before completion
Symptom: SCSI errors appear in the log file
Solution: The prioritized arbitration scheme on a SCSI bus can result in low-priority devices being locked out for some period of time.

Problem: Mounted quorum partition
Symptom: Messages indicating checksum errors on a quorum partition appear in the log file
Solution: Be sure that the quorum partition raw devices are used only for cluster state information. They cannot be used for cluster services or for non-cluster purposes, and cannot contain a file system. See Configuring Quorum Partitions in Section 2.4.4 for more information.

Problem: Quorum partitions not set up correctly
Symptom: Messages indicating that a quorum partition cannot be accessed appear in the log file
Solution: Run the cludiskutil -t command to check that the quorum partitions are accessible. If the command succeeds, run the cludiskutil -p command on both cluster systems. If the output is different on the systems, the quorum partitions do not point to the same devices on both systems.

Problem: Cluster service stop fails because a file system cannot be unmounted
Symptom: Messages indicating the operation failed appear on the console or in the log file
Solution: Use the fuser and ps commands to identify the processes that are accessing the file system.

Problem: Incorrect entry in the cluster database
Symptom: Cluster operation is impaired

Problem: Incorrect Ethernet heartbeat entry in the cluster database or /etc/hosts file
Symptom: Cluster status indicates that an Ethernet heartbeat channel is OFFLINE even though the interface is valid

Problem: Loose cable connection to power switch
Symptom: Power switch status is Timeout
Solution: Check the serial cable connection.

Problem: Power switch serial port incorrectly specified in the cluster database
Symptom: Power switch status indicates a problem
Solution: Examine the current settings and modify the cluster configuration by running the cluconfig utility, as specified in Section 8.

Problem: Heartbeat channel problem
Symptom: Heartbeat channel status is OFFLINE
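For the quorum partition checks referenced in Table 8–5, a minimal verification pass might look like the following; run the same commands on both cluster members and compare the output:

# Verify that the quorum partitions are accessible
cludiskutil -t

# Print the quorum partition header; the output should be identical on both members
cludiskutil -p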
9 Configuring and using the Red Hat Cluster Manager GUI
Red Hat Cluster Manager includes a graphical user interface (GUI) which allows an administrator to graphically monitor cluster status. The GUI does not allow configuration changes or management of the cluster, however.

9.1 Setting up the JRE
The Red Hat Cluster Manager GUI can be run directly on a cluster member, or from a non-cluster member to facilitate remote web-based monitoring.
9.1.2 Setting up the Sun JRE
If the cluster GUI is to be installed on a non-cluster member, it may be necessary to download and install the JRE. The JRE can be obtained from Sun's java.sun.com site. For example, at the time of publication, the specific page is http://java.sun.com/j2se/1.3/jre/download-linux.html. After downloading the JRE, run the downloaded program (for example, j2re-1_3_1_02-linux-i386-rpm.
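Once the JRE package has been installed, its presence and version can typically be confirmed from a shell; this is only a quick sanity check and assumes the java binary ends up on the PATH (the exact package name varies by JRE release):

# Confirm that a JRE package is installed (package name may differ by release)
rpm -qa | grep -i j2re

# Confirm the version of the java binary on the PATH
java -version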
Do you wish to enable monitoring, both locally and remotely, via \
the Cluster GUI? yes/no [yes]:

Answering no disables Cluster GUI access completely.

9.3 Enabling the Web Server
In order to use the Cluster Manager GUI, all cluster members must be running a web server; for example, if the Apache web server is used, its httpd daemon must be running.
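A minimal sketch of enabling the web server on each member, assuming the Apache httpd package shipped with Red Hat Linux and System V init:

# Start the Apache web server now
/sbin/service httpd start

# Ensure the web server starts automatically at boot
/sbin/chkconfig httpd on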
Figure 9–1 Red Hat Cluster Manager GUI Splashscreen

Double-clicking on the cluster name within the tree view fills the right side of the GUI with cluster statistics, as shown in Figure 9–2, Red Hat Cluster Manager GUI Main Screen. These statistics depict the status of the cluster members, the services running on each member, and the heartbeat channel status.
Figure 9–2 Red Hat Cluster Manager GUI Main Screen

By default, the cluster statistics are refreshed every 5 seconds. Right-clicking on the cluster name within the tree view loads a dialog that allows modification of the default update interval.

9.4.1 Viewing Configuration Details
After initiating cluster monitoring, it is possible to obtain detailed configuration information by double-clicking on any of the cluster status items.
Figure 9–3 Red Hat Cluster Manager GUI Configuration Details Screen

In Figure 9–3, Red Hat Cluster Manager GUI Configuration Details Screen, notice that the detailed device information appears after clicking on the individual device parameters.
Appendix A Supplementary Hardware Information
The information in the following sections can help you set up a cluster hardware configuration. In some cases, the information is vendor specific.

A.1 Setting Up Power Switches
A.1.1 Setting up RPS-10 Power Switches
If an RPS-10 Series power switch is used as a part of a cluster, be sure of the following:
• Set the rotary address on both power switches to 0.
Figure A–1 RPS-10 Power Switch Hardware Configuration

See the RPS-10 documentation supplied by the vendor for additional installation information. Note that the information provided in this document supersedes the vendor information.

A.1.2 Setting up WTI NPS Power Switches
The WTI NPS-115 and NPS-230 power switches are network-attached devices. Essentially, such a switch is a power strip with network connectivity that enables power cycling of individual outlets.
• Assign system names to the Plug Parameters (for example, clu1 to plug 1 and clu2 to plug 2, assuming these are the cluster member names).

When running cluconfig to specify power switch parameters:
• Specify a switch type of WTI_NPS.
• Specify the password you assigned to the NPS switch (refer to Step 1 in the prior section).
• When prompted for the plug/port number, specify the same name as assigned in Step 3 in the prior section.
A.1.3 Setting up Baytech Power Switches
The following information pertains to the RPC-3 and RPC-5 power switches. The Baytech power switch is a network-attached device. Essentially, it is a power strip with network connectivity that enables power cycling of individual outlets. Only one Baytech switch is needed within the cluster (unlike the RPS-10 model, where a separate switch per cluster member is required).
• When prompted for the plug/port number, specify the same name as assigned in Step 4 in the prior section.

The following is example screen output from configuring the Baytech switch, showing that the outlets have been named according to the example cluster names clu1 and clu2:

Outlet Operation Configuration Menu
Enter request, CR to exit.
1)...Outlet Status Display: enabled
2)...Command Confirmation : enabled
3)...Current Alarm Level (amps): 4.1
4)...
Configuring the Software Watchdog Timer
Any cluster system can utilize the software watchdog timer as a data integrity provision, as no dedicated hardware components are required. If you have specified a power switch type of SW_WATCHDOG while using the cluconfig utility, the cluster software will automatically load the corresponding loadable kernel module called softdog.
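Although the cluster software loads softdog automatically when SW_WATCHDOG is configured, the module can be checked, or loaded by hand for testing, as sketched below; this is only an illustration and assumes the softdog module is provided by the running kernel:

# Check whether the software watchdog module is currently loaded
/sbin/lsmod | grep softdog

# Load the module manually for testing, if desired
/sbin/modprobe softdog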
Note
There may be other server types that support NMI watchdog timers aside from ones with Intel-based SMP system boards. Unfortunately, there is no simple way to test for this functionality other than trial and error.

The NMI watchdog is enabled on supported systems by adding nmi_watchdog=1 to the kernel's command line. Here is an example /etc/grub.conf:

#
# grub.conf
#
default=0
timeout=10
splashimage=(hd0,0)/grub/splash.xpm.gz
title HA Test Kernel (2.4.
In order to determine if the server supports the NMI watchdog timer, first try adding "nmi_watchdog=1" to the kernel command line as described above.
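One common way to confirm that the NMI watchdog is actually firing on a 2.4-era kernel is to watch the NMI count in /proc/interrupts; if the count keeps increasing after booting with nmi_watchdog=1, the timer is active. This is a hedged illustration rather than a vendor-documented procedure:

# Display the NMI count; repeat after a minute and confirm the count is increasing
grep NMI /proc/interrupts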
Note
It has been observed that the Master Switch may become unresponsive when placed on networks which have high occurrences of broadcast or multicast packets. In these cases, isolate the power switch to a private subnet.

• APC Serial On/Off Switch (part AP9211): http://www.apc.com

Note
This switch type does not provide a means for the cluster to query its status. Therefore, the cluster always assumes it is connected and operational.

A.1.
• Buses must be terminated at each end. See Section A.3, SCSI Bus Termination for more information.
• Buses must not extend beyond the maximum length restriction for the bus type. Internal cabling must be included in the length of the SCSI bus. See Section A.4, SCSI Bus Length for more information.
• All devices (host bus adapters and disks) on a bus must have unique SCSI identification numbers. See Section A.5, SCSI Identification Numbers for more information.
• To disconnect a host bus adapter from a single-initiator bus, you must disconnect the SCSI cable first from the RAID controller and then from the adapter. This ensures that the RAID controller is not exposed to any erroneous input.
• Protect connector pins from electrostatic discharge while the SCSI cable is disconnected by wearing a grounded anti-static wrist guard and physically protecting the cable ends from contact with other objects.
The previous order specifies that 7 is the highest priority and 8 is the lowest priority. The default SCSI identification number for a host bus adapter is 7, because adapters are usually assigned the highest priority. It is possible to assign identification numbers for logical units in a RAID subsystem by using the RAID management interface. To modify an adapter's SCSI identification number, use the system BIOS utility.
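On a running Red Hat Linux system, the SCSI IDs that the kernel currently sees can be reviewed to help confirm that every device on the shared bus is unique; this is a quick check that assumes the 2.4 kernel's /proc/scsi interface:

# List attached SCSI devices with their host, channel, ID, and LUN values
cat /proc/scsi/scsi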
Table A–3 Host Bus Adapter Features and Configuration Requirements

Host Bus Adapter: Adaptec 2940U2W
Features: Ultra2, wide, LVD. HD68 external connector. One channel, with two bus segments. Set the onboard termination by using the BIOS utility. Onboard termination is disabled when the power is off.
Single-Initiator Configuration: Set the onboard termination to automatic (the default). Use the internal SCSI connector for private (non-cluster) storage.

Host Bus Adapter: Tekram DC-390U2W
Features: Ultra2, wide, LVD. HD68 external connector. One channel, two segments. Onboard termination for a bus segment is disabled if internal and external cables are connected to the segment. Onboard termination is enabled if there is only one cable connected to the segment. Termination is disabled when the power is off.

Host Bus Adapter: Adaptec 29160LP, Adaptec 39160, Qlogic QLA12160
Features: Ultra160. VHDCI external connector. One channel. Set the onboard termination by using the BIOS utility. Termination is disabled when the power is off, unless jumpers are used to enforce termination.
Single-Initiator Configuration: Set the onboard termination to automatic (the default). Use the internal SCSI connector for private (non-cluster) storage.

Host Bus Adapter: LSI Logic SYM22915
Features: Ultra160. Two VHDCI external connectors. Two channels. Set the onboard termination by using the BIOS utility. The onboard termination is automatically enabled or disabled, depending on the configuration, even when the module power is off. Use jumpers to disable the automatic termination.
Single-Initiator Configuration: Set onboard termination to automatic (the default).
Table A–4 QLA2200 Features and Configuration Requirements

Host Bus Adapter: QLA2200 (minimum driver: QLA2x00 V2.23)
Features: Fibre Channel arbitrated loop and fabric. One channel.
Single-Initiator Configuration: Can be implemented with point-to-point links from the adapter to a multi-ported storage device. Hubs are required to connect an adapter to a dual-controller RAID array or to multiple RAID arrays.
Name: sameTimeNetdown
Default (sec.): 7
Description: The number of intervals that must elapse before concluding that a cluster member has failed, when the cluhbd heartbeat daemon is unable to communicate with the other cluster member.

Name: sameTimeNetup
Default (sec.): 12
Description: The number of intervals that must elapse before concluding that a cluster member has failed, when the cluhbd heartbeat daemon is able to communicate with the other cluster member.
Appendix B Supplementary Software Information
The information in the following sections can assist in the management of the cluster software configuration.

B.1 Cluster Communication Mechanisms
A cluster uses several intra-cluster communication mechanisms to ensure data integrity and correct cluster behavior when a failure occurs.
The complete failure of the heartbeat communication mechanism does not automatically result in a failover. If a cluster system determines that the quorum timestamp from the other cluster system is not up-to-date, it will check the heartbeat status. If heartbeats to the system are still operating, the cluster will take no action at this time.
B.3 Failover and Recovery Scenarios
Understanding cluster behavior when significant events occur can assist in the proper management of a cluster. Note that cluster behavior depends on whether power switches are employed in the configuration. Power switches enable the cluster to maintain complete data integrity under all failure conditions. The following sections describe how the system will respond to various failure and error scenarios.

B.3.1
B.3.2 System Panic
A system panic (crash) is a controlled response to a software-detected error. A panic attempts to return the system to a consistent state by shutting down the system. If a cluster system panics, the following occurs:
1. The functional cluster system detects that the cluster system that is experiencing the panic is not updating its timestamp on the quorum partitions and is not communicating over the heartbeat channels.
2.
• All the heartbeat network cables are disconnected from a system.
• All the serial connections and network interfaces used for heartbeat communication fail.

If a total network connection failure occurs, both systems detect the problem, but they also detect that the SCSI disk connections are still active. Therefore, services remain running on the systems and are not interrupted.
If a quorum daemon fails, and power switches are used in the cluster, the following occurs:
1. The functional cluster system detects that the cluster system whose quorum daemon has failed is not updating its timestamp on the quorum partitions, although the system is still communicating over the heartbeat channels.
2. After a period of time, the functional cluster system power-cycles the cluster system whose quorum daemon has failed.
/sbin/service cluster stop

Then, to restart the cluster software, perform the following:

/sbin/service cluster start

B.3.10 Monitoring Daemon Failure
If the cluster monitoring daemon (clumibd) fails, it is not possible to use the cluster GUI to monitor status. Note that to enable the cluster GUI to remotely monitor cluster status from non-cluster systems, you must enable this capability when prompted in cluconfig.
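If GUI monitoring stops working, a quick first check is whether the monitoring daemon is still running on each member; the sketch below simply looks for the clumibd process named in this section:

# Check whether the cluster monitoring daemon is running on this member
ps -ef | grep [c]lumibd

If the daemon is not running, restart the cluster software with the service cluster stop and start commands shown above.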
B.4 Cluster Database Fields

id = id
name = system_name
    Specifies the identification number (either 0 or 1) for the cluster system and the name that is returned by the hostname command (for example, storage0).

powerSerialPort = serial_port
    Specifies the device special file for the serial port to which the power switches are connected, if any (for example, /dev/ttyS0).

powerSwitchType = power_switch
    Specifies the power switch type, either RPS10, APC, or None.
start device0
name = device_file
    Specifies the special device file, if any, that is used in the service (for example, /dev/sda1). Note that it is possible to specify multiple device files for a service.
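As a purely illustrative fragment, not a complete cluster database, the member fields described above might appear as follows for one cluster system, using the example values mentioned in their descriptions; the surrounding structure of the file is omitted here and may differ in an actual installation:

# Hypothetical member entry built from the documented example values
id = 0
name = storage0
powerSerialPort = /dev/ttyS0
powerSwitchType = RPS10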
Figure B–1 Cluster in an LVS Environment

In a Piranha configuration, client systems issue requests on the World Wide Web. For security reasons, these requests enter a Web site through a firewall, which can be a Linux system serving in that capacity or a dedicated firewall device. For redundancy, you can configure firewall devices in a failover configuration.
For example, the figure could represent an e-commerce site used for online merchandise ordering through a URL. Client requests to the URL pass through the firewall to the active Piranha load-balancing system, which then forwards the requests to one of the three Web servers. The Red Hat Cluster Manager systems serve dynamic data to the Web servers, which forward the data to the requesting client system.